Codelet Extractor and REplayer

Introducing CERE

What is CERE?

Figure: CERE steps

Codelet Extractor and REplayer (CERE) is an open source framework for code isolation developed jointly at the UVSQ Li-PaRAD lab and the ECR lab. CERE finds the hotspots of an application and extracts them as isolated fragments of code called codelets. Codelets can be modified, compiled, run, and measured independently of the original application. Code isolation reduces benchmarking cost and allows piecewise optimization of an application. Unlike previous approaches, CERE isolates code at the compiler Intermediate Representation (IR) level. CERE is therefore language agnostic and supports many input languages such as C, C++, Fortran, and D. CERE automatically detects codelet invocations that have the same performance behavior. It then selects a reduced set of representative codelets and invocations that is much faster to replay but still accurately captures the behavior of the original application. In addition, CERE supports recompiling and retargeting the extracted codelets, so it can be used for cross-architecture performance prediction or piecewise code optimization. On the SPEC 2006 FP benchmarks, CERE codelets cover 90.9% of the execution time and accurately replay 66.3% of it. We use CERE codelets in a realistic study to evaluate three different architectures on the NAS benchmarks. CERE accurately estimates the performance of each architecture and is 7.3 to 46.6 times cheaper than running the full benchmark. Within the Montblanc European Research Project, CERE is being ported to the ARM AArch64 architecture.
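The two SPEC 2006 FP figures measure different things. One plausible reading is that coverage is the share of execution time spent inside extracted codelets, while accurate replay is the share spent in codelets whose replayed time stays close to the time measured in the original run. The minimal Python sketch below shows that bookkeeping. It assumes per-codelet cycle counts are already available and uses a hypothetical 15% error threshold (`max_error`). The names `CodeletResult` and `coverage_and_accuracy` are illustrative only and are not part of CERE.

```python
from dataclasses import dataclass


@dataclass
class CodeletResult:
    # Hypothetical record: one entry per extracted codelet.
    name: str
    original_cycles: float  # time spent in the codelet inside the full application
    replay_cycles: float    # time measured when replaying the isolated codelet


def coverage_and_accuracy(results, total_app_cycles, max_error=0.15):
    """Return (coverage, accurate_fraction) as fractions of the application's time.

    coverage: share of total time spent in extracted codelets.
    accurate_fraction: share of total time spent in codelets whose replay
    error is at most max_error (an assumed threshold, not CERE's setting).
    """
    covered = sum(r.original_cycles for r in results)
    accurate = sum(
        r.original_cycles
        for r in results
        if abs(r.replay_cycles - r.original_cycles) / r.original_cycles <= max_error
    )
    return covered / total_app_cycles, accurate / total_app_cycles


# Usage: codelet "a" replays within 4%, codelet "b" is off by 50%.
results = [CodeletResult("a", 500, 520), CodeletResult("b", 300, 450)]
coverage, accurate = coverage_and_accuracy(results, total_app_cycles=1000)
print(coverage, accurate)
```

In this toy case the codelets cover 80% of the execution time, but only 50% of the time is accurately replayed, because codelet "b" misses the threshold.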


CERE: LLVM based codelet extractor and REplayer for piecewise benchmarking and optimization. Pablo de Oliveira Castro, Chadi Akel, Eric Petit, Mihail Popov, and William Jalby. ACM Transactions on Architecture and Code Optimization, 2015.

PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction. Mihail Popov, Chadi Akel, Florent Conti, William Jalby, and Pablo de Oliveira Castro. In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015). IEEE, 2015.

Piecewise Holistic Autotuning of Compiler and Runtime Parameters. Mihail Popov, Chadi Akel, William Jalby, and Pablo de Oliveira Castro. In Euro-Par 2016 Parallel Processing - 22nd International Conference, Lecture Notes in Computer Science. Springer, 2016.

Automatic Decomposition of Parallel Programs for Optimization and Performance Prediction. Mihail Popov. PhD thesis, defended in 2016.

Piecewise Holistic Autotuning of Parallel Programs with CERE. Mihail Popov, Chadi Akel, Yohan Chatelain, William Jalby, and Pablo de Oliveira Castro. Concurrency and Computation: Practice and Experience, 2017.

Efficient Thread/Page/Parallelism Autotuning for NUMA Systems. Mihail Popov, Alexandra Jimborean, and David Black-Schaffer. International Conference on Supercomputing (ICS '19). ACM, 2019.

How does it work?

CERE takes as input the source files of an application or a benchmark suite. All the languages supported by the LLVM front-ends (C, C++, Fortran, D, etc.) are accepted. The application loops are outlined and instrumented with profiling probes to identify the application hotspots. The hot loops are kept and form the full codelet set. CERE can prune this full codelet set and keep only a reduced set of representative codelets. Then, a clustering algorithm analyzes the invocation performance trace of each codelet to find a representative subset of invocations. The memory and cache state of each representative invocation is then captured and dumped to disk. The output of this process is a set of representative codelets and invocations, which can be redistributed, recompiled, and replayed on different systems and architectures. The codelet set can be used as a proxy for the original application in optimization or benchmarking studies.
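The invocation-selection step can be illustrated with a small sketch. The Python code below is a stand-in and not CERE's actual clustering algorithm. It groups the invocations of one codelet by measured cycle count using a simple greedy one-dimensional rule with a hypothetical relative `tolerance`. It then picks the invocation closest to each cluster mean as the representative and estimates the codelet's total time by weighting each representative by its cluster size. All function names and parameters are hypothetical.

```python
def cluster_invocations(cycles, tolerance=0.1):
    """Group invocation indices whose cycle counts are within `tolerance`
    (relative) of the smallest value in the current group."""
    order = sorted(range(len(cycles)), key=lambda i: cycles[i])
    clusters = []
    for i in order:
        # The first member of the last cluster is its smallest value (sorted order).
        if clusters and cycles[i] <= cycles[clusters[-1][0]] * (1 + tolerance):
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def pick_representatives(cycles, clusters):
    """Choose, for each cluster, the invocation closest to the cluster mean."""
    reps = []
    for members in clusters:
        mean = sum(cycles[i] for i in members) / len(members)
        reps.append(min(members, key=lambda i: abs(cycles[i] - mean)))
    return reps


def predict_total(cycles, clusters, reps):
    """Estimate total codelet time: representative cost times cluster size."""
    return sum(cycles[r] * len(members) for r, members in zip(reps, clusters))


# Usage: six invocations with two distinct performance behaviors.
cycles = [100, 102, 98, 500, 510, 101]
clusters = cluster_invocations(cycles)
reps = pick_representatives(cycles, clusters)
print(clusters, reps, predict_total(cycles, clusters, reps), sum(cycles))
```

With this input, only two of the six invocations (indices 0 and 3) would need to be captured and replayed. The weighted estimate of 1400 cycles is close to the measured total of 1411 cycles.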

Get in touch

For questions or discussions please use the mailing list.

Getting CERE

Download CERE

CERE is distributed under the GNU Lesser General Public License. Sources and latest releases are available at

Install CERE

Please follow the instructions here.

Replay Accuracy Reports


NAS Parallel Benchmarks

Benchmark  Description
BT         Block Tri-diagonal solver
CG         Conjugate Gradient
EP         Embarrassingly Parallel
FT         Discrete 3D fast Fourier Transformation
IS         Integer Sort
LU         Lower-Upper Gauss-Seidel solver
MG         Multi-Grid on a sequence of meshes
SP         Scalar Penta-diagonal solver


SPEC CPU 2006 FP

Benchmark      Description
410-bwaves     Fluid Dynamics
416-gamess     Quantum Chemistry
433-milc       Quantum Chromodynamics
434-zeusmp     CFD
435-gromacs    Biochemistry: Molecular Dynamics
436-cactusADM  General Relativity
437-leslie3d   Fluid Dynamics
444-namd       Biology: Molecular Dynamics
447-dealII     Finite Element Analysis
450-soplex     Linear Programming, Optimization
453-povray     Image Ray-tracing
454-calculix   Structural Mechanics
459-GemsFDTD   Computational Electromagnetics
465-tonto      Quantum Chemistry
470-lbm        Fluid Dynamics
481-wrf        Weather Prediction
482-sphinx3    Speech Recognition
999-specrand   Random Number Generation

LULESH 2.0

Benchmark  Architecture   Description
lulesh2.0  Intel Haswell  Hydrodynamics
lulesh2.0  ARM AArch64    Hydrodynamics