Codelet Extractor and REplayer (CERE) is an open source framework for code isolation developed jointly at the UVSQ Li-PaRAD lab and the ECR lab. CERE finds and extracts the hotspots of an application as isolated fragments of code, called codelets. Codelets can be modified, compiled, run, and measured independently from the original application. Code isolation reduces benchmarking cost and allows piecewise optimization of an application. Unlike previous approaches, CERE isolates code at the compiler Intermediate Representation level. Therefore CERE is language agnostic and supports many input languages such as C, C++, Fortran, and D. CERE automatically detects codelet invocations that have the same performance behavior. It then selects a reduced set of representative codelets and invocations, which is much faster to replay yet still accurately captures the original application. In addition, CERE supports recompiling and retargeting the extracted codelets. Therefore, CERE can be used for cross-architecture performance prediction or piecewise code optimization. On the SPEC 2006 FP benchmarks, CERE codelets cover 90.9% and accurately replay 66.3% of the execution time. We use CERE codelets in a realistic study to evaluate three different architectures on the NAS benchmarks. CERE accurately estimates each architecture's performance and is 7.3 to 46.6 times cheaper than running the full benchmark. Within the Mont-Blanc European Research Project, CERE is being ported to ARM AArch64 architectures.
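To make the idea of representative invocations concrete, here is a minimal Python sketch. It is not CERE's actual clustering algorithm: it assumes a simple greedy one-dimensional grouping on measured invocation times, and every name in it (`cluster_invocations`, `estimate_total`, the `tolerance` parameter, the sample timings) is hypothetical. It only illustrates the principle that replaying one invocation per cluster and scaling by the cluster size can approximate the cost of all invocations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    representative: int  # index of the invocation chosen for replay
    members: List[int]   # indices of all invocations in this cluster


def cluster_invocations(times, tolerance=0.10):
    """Greedy 1-D clustering (illustrative stand-in, not CERE's algorithm).

    Invocations are visited in increasing time order. An invocation joins
    the current cluster while its time stays within `tolerance` of the
    cluster's fastest member; otherwise it starts a new cluster.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    clusters = []
    for i in order:
        if clusters and times[i] <= times[clusters[-1].members[0]] * (1 + tolerance):
            clusters[-1].members.append(i)
        else:
            clusters.append(Cluster(representative=i, members=[i]))
    # Use the median member of each cluster as its representative.
    for c in clusters:
        c.representative = c.members[len(c.members) // 2]
    return clusters


def estimate_total(clusters, times):
    """Extrapolate total time from one replayed invocation per cluster."""
    return sum(times[c.representative] * len(c.members) for c in clusters)


# Hypothetical per-invocation timings (arbitrary units) for one codelet.
invocation_times = [100, 102, 98, 500, 510, 101]
clusters = cluster_invocations(invocation_times)
estimated = estimate_total(clusters, invocation_times)
measured = sum(invocation_times)
relative_error = abs(estimated - measured) / measured
```

With these made-up timings, six invocations collapse into two clusters, so only two invocations would need to be replayed to estimate the total.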
CERE: LLVM based codelet extractor and REplayer for piecewise benchmarking and optimization. Pablo de Oliveira Castro, Chadi Akel, Eric Petit, Mihail Popov, and William Jalby. ACM Transactions on Architecture and Code Optimization, 2015.
PCERE: Fine-grained Parallel Benchmark Decomposition for Scalability Prediction. Mihail Popov, Chadi Akel, Florent Conti, William Jalby, and Pablo de Oliveira Castro. In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium IPDPS 2015. IEEE, 2015.
Piecewise Holistic Autotuning of Compiler and Runtime Parameters. Mihail Popov, Chadi Akel, William Jalby, and Pablo de Oliveira Castro. In Euro-Par 2016 Parallel Processing - 22nd International Conference, Lecture Notes in Computer Science. Springer, 2016.
Automatic Decomposition of Parallel Programs for Optimization and Performance Prediction. Mihail Popov. PhD thesis, defended in 2016.
CERE takes as input the source files of an application or a benchmark suite. All the languages supported by the LLVM front-ends (C, C++, Fortran, D, etc.) are accepted. The application loops are outlined and instrumented with profiling probes to identify the application hotspots. The hot loops are kept and form the full codelet set. CERE can prune this full codelet set and keep only a reduced set of representative codelets. Then, a clustering algorithm analyzes the invocation performance trace of each codelet to find a representative subset of invocations. The memory and cache state of each representative invocation is then captured and dumped to disk. The output of this process is a set of representative codelets and invocations, which can be redistributed, recompiled, and replayed on different systems and architectures. The codelet set can be used as a proxy for the original application in optimization or benchmarking studies.
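The hotspot selection step described above can be illustrated with a short Python sketch. It is a simplified assumption about how a profile might be filtered, not CERE's implementation: the function `select_hot_loops`, the `min_share` threshold, the loop names, and the timings are all hypothetical. The sketch keeps the loops whose share of the total execution time reaches a threshold and reports the fraction of execution time the kept loops cover.

```python
def select_hot_loops(profile, total_time, min_share=0.01):
    """Keep loops that account for at least `min_share` of `total_time`.

    `profile` maps a loop name to the time spent in that loop.
    Returns the kept loop names (hottest first) and their combined
    coverage of the total execution time. Illustrative only.
    """
    hot = sorted(
        (name for name, t in profile.items() if t / total_time >= min_share),
        key=lambda name: profile[name],
        reverse=True,
    )
    coverage = sum(profile[name] for name in hot) / total_time
    return hot, coverage


# Hypothetical profile: seconds spent in each outlined loop.
loop_profile = {
    "solve_loop": 60.0,
    "rhs_loop": 25.0,
    "io_loop": 5.0,
    "init_loop": 0.5,
}
application_time = 100.0  # includes time spent outside any loop

hot_loops, coverage = select_hot_loops(loop_profile, application_time)
```

In this made-up profile, `init_loop` falls below the 1% threshold and is dropped, while the three remaining loops form the candidate codelet set.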
For questions or discussions please use the email@example.com mailing list.
CERE is distributed under the GNU Lesser General Public License. Sources and latest releases are available at https://github.com/benchmark-subsetting/cere
Please follow the instructions here.
| BT | Block Tri-diagonal solver |
| FT | Discrete 3D fast Fourier Transformation |
| LU | Lower-Upper Gauss-Seidel solver |
| MG | Multi-Grid on a sequence of meshes |
| SP | Scalar Penta-diagonal solver |
| 435-gromacs | Biochemistry: Molecular Dynamics |
| 444-namd | Biology: Molecular Dynamics |
| 447-dealII | Finite Element Analysis |
| 450-soplex | Linear Programming, Optimization |
| 999-specrand | Random Number Generation |