Preliminary Experiments for Tier 2 Interpreter Ideas in CPython

kj0 · April 16, 2023, 4:24pm

Hi all,

For the past 3 months, me and a friend (Jules) have been working on an experiment to see how a Tier 2 interpreter in CPython could look like. To that end, we implemented lazy basic block versioning in CPython (this is different than the superblock approach the Faster CPython team is taking). We achieved a 39% speedup on microbenchmarks involving float operations (without JIT compiler), with no speedup yet on benchmarks from pyperformance. We used a combination of techniques, such as type propagation, guard elimination, and float unboxing. Of note is that our type propagation algorithm works on CPython bytecode, and does not need a CFG or AST.

This project not meant to compete with the superblock approach proposed by the Faster CPython team, but rather a showcase and validation of one of its optimisation steps - type propagation.

For quick viewing and animations for certain parts, here are the slides for our presentation: Exploring a Tier 2 Interpreter for CPython - Google Slides

The project report goes into more depth about details. It details how we overcame certain problems like jump rewriting: cpython/CPython_Tier_2_LBBV_Report_For_Repo.pdf at tier2_interpreter_no_separate_eval_no_tracer_contiguous_bbs · Fidget-Spinner/cpython · GitHub

The repo is a branch of CPython: GitHub - Fidget-Spinner/cpython at tier2_interpreter_no_separate_eval_no_tracer_contiguous_bbs.

This project was a very close knit collaboration with the Faster CPython and Cinder team, and we hope everyone gets to learn something from it!

Special thanks to Prof Martin Henz who supervised this project!

kj0 · June 5, 2023, 4:09pm

NOTE

This report contains many limitations, and is not a guide for language implementers to follow. Rather it is a set of experiments in CPython. We encourage people to learn from it, but please do not follow the implementation as-is if you want better performance.

Chiefly among the features, it has the following drawbacks:

It lacks unboxing of ints, which has higher payoffs.
We added significantly more opcodes which adds way more overhead.

We go into a lot more details about the limitations and lack of speedups in the report.

Thanks for Stefan Brunthaler for reminding me and pointing these out.