Riposte: A trace-driven compiler and parallel VM for vector code in R
@article{Talbot2012RiposteAT,
  title   = {Riposte: A trace-driven compiler and parallel VM for vector code in R},
  author  = {Justin Talbot and Zach DeVito and Pat Hanrahan},
  journal = {2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year    = {2012},
  pages   = {43-51},
  url     = {https://api.semanticscholar.org/CorpusID:1989369}
}
Riposte is a new runtime for the R language that uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code; it achieves an overall average speed-up of over 150× without explicit programmer parallelization.
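To make the target concrete, here is a minimal, hypothetical sketch (not from the paper) of the vector-style R code a trace-driven VM like Riposte is designed to accelerate: every statement operates on whole vectors, so the runtime can record the sequence of vector operations and fuse it into a single parallel loop instead of materializing an intermediate vector per step. Names and sizes are illustrative.

    # Hypothetical vector-style R code; each line is a whole-vector operation.
    n <- 1e7
    x <- runif(n)
    a <- 2.0
    b <- 0.5
    y <- a * x + b   # the standard interpreter allocates an intermediate vector here
    z <- exp(y)      # ...and another one here
    total <- sum(z)  # a reduction; a fused trace needs no intermediates

Where the standard R interpreter executes each line eagerly, a tracing VM can defer these operations, compile the recorded trace once, and evaluate the scale, exp, and sum steps in one pass over x across cores.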
Topics
Riposte, Dynamically Typed Languages, Single Instruction Multiple Data, Hardware, Runtime, Vector Codes, Processor Designs, Workloads, Programming Language
36 Citations
Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization
- 2018
Computer Science
This paper presents MegaGuards, a new approach for speculatively executing dynamic languages on heterogeneous platforms in a fully automatic and transparent manner; it removes guards from compute-intensive loops and improves sequential performance.
Reflections on the compatibility, performance, and scalability of parallel Python
- 2019
Computer Science
This paper reports on experience with three parallel Python VMs by comparing their compatibility, performance, and scalability, and shows that fine-grained locking can yield better scalability than the STM approach.
Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation
- 2017
Computer Science
This paper uses just-in-time compilation to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices, and shows that large speedups are achievable even when start-up time is taken into account and the applications run for as little as a few seconds.
Parallelizing Julia with a Non-Invasive DSL
- 2017
Computer Science
ParallelAccelerator is presented, a library and compiler for high-level, high-performance scientific computing in Julia that exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code.
Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization
- 2014
Computer Science
A first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue code) is introduced; it shows that the most serious overheads of R manifest mostly in Type I code, whereas many Type III codes can be quite fast.
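As a hedged illustration of this classification (the snippets are ours, not from the paper), the same computation can be written in Type I or Type II style:

    # Type I: looping over data, one scalar operation per element (slow in GNU R)
    s <- 0
    for (i in seq_along(x)) {
      s <- s + x[i] * x[i]
    }

    # Type II: vector programming, whole-vector expressions (much faster)
    s <- sum(x * x)

    # Type III ("glue code") mostly shuttles data between library calls,
    # e.g. reading a file and handing it to compiled code, so interpreter
    # overhead matters far less there.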
Just-in-time Length Specialization of Dynamic Vector Code
- 2014
Computer Science
A trace-based just-in-time compilation strategy is presented that performs partial length specialization of dynamically typed vector code, avoiding excessive compilation overhead while still enabling the generation of efficient machine code through length-based optimizations.
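A brief sketch of why length specialization matters, using our own illustrative example rather than the paper's: if a trace records that the argument vectors have equal length, the compiler can emit a single loop with a known trip count and drop R's per-operation recycling and length checks.

    # If a trace observes length(x) == length(y) == n, the expression below
    # can compile to one loop over n elements; without that specialization,
    # R's recycling rules force length checks at every binary operation.
    hypot <- function(x, y) sqrt(x * x + y * y)

Partial specialization, as the title suggests, keeps the compiled code valid for a class of lengths rather than one exact value, limiting recompilation.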
Dynamic page sharing optimization for the R language
- 2014
Computer Science
This work presents a low-overhead page sharing approach for R that significantly reduces the interpreter's memory overhead; concentrating on the most rewarding optimizations avoids the high runtime overhead of existing generic approaches to memory deduplication or compression.
ROSA: R Optimizations with Static Analysis
- 2017
Computer Science
ROSA is presented, a static analysis framework that improves the performance and space efficiency of R programs, and shows substantial reductions in execution time and memory consumption over both CRAN R and Microsoft R Open.
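For flavor, one class of inefficiency such a static analysis could target (our illustrative example, not taken from the paper) is a vector copy that R's copy-on-modify semantics triggers even though the original binding is never used again:

    # 'v' is modified inside g, so R copies the 10^7-element vector; if
    # analysis proves the caller's 'big' is dead after the call, the copy
    # and its memory can be elided.
    g <- function(v) {
      v[1] <- 0
      v
    }
    big <- numeric(1e7)
    big <- g(big)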
Run-time data analysis to drive compiler optimizations
- 2021
Computer Science
This project proposes integrating data analysis into a dynamic runtime to speed up big data applications, using detailed run-time information about the shape and characteristics of the data to drive speculative compiler optimizations that improve performance.
Contextual dispatch for function specialization
- 2020
Computer Science
This paper proposes an approach to further the specialization of dynamic language compilers by disentangling classes of behaviors into separate optimization units, and describes a compiler for the R language that uses this approach.
39 References
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
- 2011
Computer Science, Engineering
This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework that focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++.
ispc: A SPMD compiler for high-performance CPU programming
- 2012
Computer Science, Engineering
A compiler, the Intel® SPMD Program Compiler (ispc), is developed that delivers very high performance on CPUs thanks to effective use of both multiple processor cores and SIMD vector units.
Compiling for stream processing
- 2006
Computer Science
A compiler for stream programs is presented that efficiently schedules computational kernels and stream memory operations and allocates on-chip storage; it overlaps memory operations with computation and manages local storage so that 78% to 96% of program execution time is spent running computational kernels.
Copperhead: compiling an embedded data parallel language
- 2011
Computer Science
The language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code are discussed and the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations are introduced.
HotpathVM: an effective JIT compiler for resource-constrained devices
- 2006
Computer Science
A just-in-time compiler for a Java VM that is small enough to fit on resource-constrained devices, yet is surprisingly effective, and benchmarks show a speedup that in some cases rivals heavyweight just-in-time compilers.
Harnessing the Multicores: Nested Data Parallelism in Haskell
- 2008
Computer Science
This talk will describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC, and will focus particularly on the vectorisation transformation, which transforms nested data parallelism into flat data parallelism.
Dynamo: a transparent dynamic optimization system
- 2000
Computer Science
The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system; it demonstrates that even statically optimized native binaries can be accelerated by Dynamo, often by a significant degree.
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
- 2010
Computer Science
Lazy Binary Splitting is presented, a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing eager binary splitting work-stealing but improves performance and ease of programming.
Larrabee: a many-core x86 architecture for visual computing
- 2008
Computer Science, Engineering
This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analyses of several applications, demonstrating Larrabee's potential for a broad range of parallel computation.
Scalable aggregation on multicore processors
- 2011
Computer Science, Engineering
This paper provides a solution for performing in-memory parallel aggregation on the Intel Nehalem architecture, considering several previously proposed techniques, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected.