Riposte: A trace-driven compiler and parallel VM for vector code in R
@article{Talbot2012RiposteAT,
  title   = {Riposte: A trace-driven compiler and parallel VM for vector code in R},
  author  = {Justin Talbot and Zach DeVito and Pat Hanrahan},
  journal = {2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year    = {2012},
  pages   = {43-51},
  url     = {https://api.semanticscholar.org/CorpusID:1989369}
}
Riposte is a new runtime for the R language that uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code; it achieves an overall average speed-up of over 150× without explicit programmer parallelization.
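To make the target concrete, here is a minimal, hypothetical sketch (not from the paper) of the vector-style R code a trace-driven VM like Riposte is designed to accelerate: every statement operates on whole vectors, so the runtime can record the sequence of vector operations and fuse it into a single parallel loop instead of materializing an intermediate vector per step. Names and sizes are illustrative.

    # Hypothetical vector-style R code; each line is a whole-vector operation.
    n <- 1e7
    x <- runif(n)
    a <- 2.0
    b <- 0.5
    y <- a * x + b   # the standard interpreter allocates an intermediate vector here
    z <- exp(y)      # ...and another one here
    total <- sum(z)  # a reduction; a fused trace needs no intermediates

Where the standard R interpreter executes each line eagerly, a tracing VM can defer these operations, compile the recorded trace once, and evaluate the scale, exp, and sum steps in one pass over x across cores.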
Topics
Riposte, Dynamically Typed Languages, Single Instruction Multiple Data, Hardware, Runtime, Vector Codes, Processor Designs, Workloads, Programming Language
36 Citations
Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization
- 2018
Computer Science
This paper presents MegaGuards, a new approach for speculatively executing dynamic languages on heterogeneous platforms in a fully automatic and transparent manner; it removes guards from compute-intensive loops and improves sequential performance.
Reflections on the compatibility, performance, and scalability of parallel Python
- 2019
Computer Science
This paper reports on experience with three parallel Python VMs by comparing their compatibility, performance, and scalability, and shows that fine-grained locking can yield better scalability than the STM approach.
Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation
- 2017
Computer Science
This paper uses just-in-time compilation to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices, and shows that large speedups are achievable even when start-up time is taken into account and the applications run for as little as a few seconds.
Parallelizing Julia with a Non-Invasive DSL
- 2017
Computer Science
ParallelAccelerator is presented, a library and compiler for high-level, high-performance scientific computing in Julia that exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code.
Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization
- 2014
Computer Science
A first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue code) is introduced; it shows that the most serious overheads of R manifest mostly in Type I code, whereas many Type III codes can be quite fast.
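As a hedged illustration of this classification (the snippets are ours, not from the paper), the same computation can be written in Type I or Type II style:

    # Type I: looping over data, one scalar operation per element (slow in GNU R)
    s <- 0
    for (i in seq_along(x)) {
      s <- s + x[i] * x[i]
    }

    # Type II: vector programming, whole-vector expressions (much faster)
    s <- sum(x * x)

    # Type III ("glue code") mostly shuttles data between library calls,
    # e.g. reading a file and handing it to compiled code, so interpreter
    # overhead matters far less there.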
Just-in-time Length Specialization of Dynamic Vector Code
- 2014
Computer Science
A trace-based just-in-time compilation strategy is presented that performs partial length specialization of dynamically typed vector code, avoiding excessive compilation overhead while still enabling the generation of efficient machine code through length-based optimizations.
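A brief sketch of why length specialization matters, using our own illustrative example rather than the paper's: if a trace records that the argument vectors have equal length, the compiler can emit a single loop with a known trip count and drop R's per-operation recycling and length checks.

    # If a trace observes length(x) == length(y) == n, the expression below
    # can compile to one loop over n elements; without that specialization,
    # R's recycling rules force length checks at every binary operation.
    hypot <- function(x, y) sqrt(x * x + y * y)

Partial specialization, as the title suggests, keeps the compiled code valid for a class of lengths rather than one exact value, limiting recompilation.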
Dynamic page sharing optimization for the R language
- 2014
Computer Science
This work presents a low-overhead page sharing approach for R that significantly reduces the interpreter's memory overhead; concentrating on the most rewarding optimizations avoids the high runtime overhead of existing generic approaches to memory deduplication or compression.
ROSA: R Optimizations with Static Analysis
- 2017
Computer Science
ROSA is presented, a static analysis framework that improves the performance and space efficiency of R programs, and shows substantial reductions in execution time and memory consumption over both CRAN R and Microsoft R Open.
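For flavor, one class of inefficiency such a static analysis could target (our illustrative example, not taken from the paper) is a vector copy that R's copy-on-modify semantics triggers even though the original binding is never used again:

    # 'v' is modified inside g, so R copies the 10^7-element vector; if
    # analysis proves the caller's 'big' is dead after the call, the copy
    # and its memory can be elided.
    g <- function(v) {
      v[1] <- 0
      v
    }
    big <- numeric(1e7)
    big <- g(big)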
Run-time data analysis to drive compiler optimizations
- 2021
Computer Science
This project proposes integrating data analysis into a dynamic runtime to speed up big data applications, using detailed run-time information about the shape and characteristics of the data to drive speculative compiler optimizations that improve performance.
Contextual dispatch for function specialization
- 2020
Computer Science
This paper proposes an approach to further the specialization of dynamic language compilers by disentangling classes of behaviors into separate optimization units, and describes a compiler for the R language that uses this approach.
39 References
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
- 2011
Computer Science, Engineering
This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework that focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++.
ispc: A SPMD compiler for high-performance CPU programming
- 2012
Computer Science, Engineering
A compiler, the Intel® SPMD Program Compiler (ispc), is developed that delivers very high performance on CPUs thanks to effective use of both multiple processor cores and SIMD vector units.
Compiling for stream processing
- 2006
Computer Science
A compiler for stream programs is presented that efficiently schedules computational kernels and stream memory operations and allocates on-chip storage; it overlaps memory operations with computation and manages local storage so that 78% to 96% of program execution time is spent running computational kernels.
Copperhead: compiling an embedded data parallel language
- 2011
Computer Science
The language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code are discussed and the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations are introduced.
HotpathVM: an effective JIT compiler for resource-constrained devices
- 2006
Computer Science
A just-in-time compiler for a Java VM that is small enough to fit on resource-constrained devices, yet is surprisingly effective, and benchmarks show a speedup that in some cases rivals heavyweight just-in-time compilers.
Harnessing the Multicores: Nested Data Parallelism in Haskell
- 2008
Computer Science
This talk will describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC, and will focus particularly on the vectorisation transformation, which transforms nested data parallelism into flat data parallelism.
Dynamo: a transparent dynamic optimization system
- 2000
Computer Science
The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system; it demonstrates that even statically optimized native binaries can be accelerated by Dynamo, often by a significant degree.
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
- 2010
Computer Science
Lazy Binary Splitting is presented, a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing eager binary splitting work-stealing but improves performance and ease of programming.
Larrabee: a many-core x86 architecture for visual computing
- 2008
Computer Science, Engineering
This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analyses of several applications, demonstrating Larrabee's potential for a broad range of parallel computation.
Scalable aggregation on multicore processors
- 2011
Computer Science, Engineering
This paper provides a solution for performing in-memory parallel aggregation on the Intel Nehalem architecture, considering several previously proposed techniques, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected.