I have been studying the effect of the new CPython JIT, and more generally the performance of simple CPU-bound pure Python code.
I focused on a very simple benchmark involving only basic pure Python:
```python
def short_calcul(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_calcul(num):
    result = 0
    for i in range(num):
        result += short_calcul(i) - short_calcul(i)
    return result
```
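To reproduce rough timings like these, a minimal timing harness could look as follows. This is a sketch, not the full benchmark code linked in this post: the helper name `best_time`, the repeat count and the input size are arbitrary assumptions. It keeps the best of several runs to reduce noise, and it checks that the result is 0, since the two calls in `long_calcul` cancel out.

```python
import sys
import time


def short_calcul(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_calcul(num):
    result = 0
    for i in range(num):
        result += short_calcul(i) - short_calcul(i)
    return result


def best_time(func, arg, repeat=3):
    """Return the best wall-clock time (in seconds) of `repeat` calls to func(arg)."""
    # Hypothetical helper: time.perf_counter is a monotonic high-resolution clock.
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    num = 1000  # arbitrary size, chosen for illustration only
    # The two calls in long_calcul cancel out, so the result must be 0.
    assert long_calcul(num) == 0
    print(sys.version)
    print(f"long_calcul({num}): best of 3 = {best_time(long_calcul, num):.4f} s")
```

Running the same script with each interpreter keeps the comparison as fair as possible; `timeit` or `pyperf` would be more rigorous alternatives.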
I’m not claiming that this code is representative of real-world CPU-bound Python code. One interest of this benchmark is that it involves only very simple operations (loops, integer additions/subtractions and function calls), so it should be relatively easy for compilers to accelerate.
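As a cross-check that every interpreter computes the same thing, note that `short_calcul(n)` is the arithmetic series 1 + 2 + ... + n = n(n+1)/2, and that `long_calcul` always returns 0. The sketch below (the `*_closed_form` helpers are hypothetical names) verifies this on small inputs. It is only a correctness check, not a suggestion that a JIT is expected to perform this algebraic rewrite.

```python
def short_calcul(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def long_calcul(num):
    result = 0
    for i in range(num):
        result += short_calcul(i) - short_calcul(i)
    return result


def short_calcul_closed_form(n):
    # 1 + 2 + ... + n == n * (n + 1) / 2; n * (n + 1) is always even,
    # so integer division is exact.
    return n * (n + 1) // 2


def long_calcul_closed_form(num):
    # Every iteration adds short_calcul(i) - short_calcul(i), i.e. 0.
    return 0


# Cross-check the loop versions against the closed forms on small inputs.
for n in range(100):
    assert short_calcul(n) == short_calcul_closed_form(n)
    assert long_calcul(n) == long_calcul_closed_form(n)
```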
For example, for this benchmark PyPy 3.11 is 25 times faster than the system CPython 3.11 on Debian (compiled with GCC).
Unfortunately, the results are a bit depressing. The full code and full results are available here.
I used only solutions available without compiling the interpreters (installing Python with UV and Miniforge). I ran these benchmarks only on Linux x86_64 (Debian). All the Pythons used for this experiment have the GIL, and everything runs sequentially.
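Since the interpreters from UV and Miniforge are built differently, it can help to record how each one was configured. Below is a minimal sketch using only the standard library (`platform`, `sysconfig`, `shlex`); the helper `describe_build` is hypothetical, and what `CONFIG_ARGS` contains depends on the distributor (it may be missing on some builds).

```python
import platform
import shlex
import sysconfig


def describe_build():
    """Collect a few facts about how the running interpreter was built."""
    # CONFIG_ARGS holds the ./configure arguments of POSIX builds; it may be
    # missing on some builds, hence the fallback to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    try:
        args = shlex.split(config_args)
    except ValueError:
        # Unbalanced quoting: fall back to a plain whitespace split.
        args = config_args.split()
    # Keep only the JIT-related configure flags, e.g. "=yes-off".
    jit_flags = [arg for arg in args if arg.startswith("--enable-experimental-jit")]
    return {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
        "jit_flags": jit_flags,
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

An empty `jit_flags` list suggests the interpreter was configured without the experimental JIT, although some distributors may pass flags in ways this simple check does not catch.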
Here are a few important points:
- The compiler used to build the interpreter has a non-negligible effect (see issue #535 of astral-sh/python-build-standalone on GitHub, "cpython 3.13 installed with UV slow and not compiled with `--enable-experimental-jit=yes-off`").
- The CPython JIT still has a relatively small effect on this benchmark. Even in the best case (3.13 from conda-forge), the speedup due to the JIT is only x1.2, compared to x25 with PyPy!
- Comparing only the Pythons provided by UV, 3.14a6 is a bit faster than 3.13 (x1.2) but only about as fast as 3.11 (x1.05, to be precise).
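To isolate the JIT's contribution on a single interpreter, one option is to run the benchmark in child processes with the `PYTHON_JIT` environment variable set to `0` or `1`. As far as I know, this variable only matters on CPython builds configured with `--enable-experimental-jit` (for example `yes-off`, where the JIT stays off unless `PYTHON_JIT=1`); on other builds it is ignored and both timings should be similar. The helper `run_with_jit` and the default input size are assumptions for illustration.

```python
import os
import subprocess
import sys
import textwrap

# The benchmark from the post, run in a fresh child interpreter so that
# PYTHON_JIT is read at startup. The input size comes from sys.argv[1].
BENCH = textwrap.dedent("""
    import sys
    import time

    def short_calcul(n):
        result = 0
        for i in range(1, n + 1):
            result += i
        return result

    def long_calcul(num):
        result = 0
        for i in range(num):
            result += short_calcul(i) - short_calcul(i)
        return result

    start = time.perf_counter()
    long_calcul(int(sys.argv[1]))
    print(time.perf_counter() - start)
""")


def run_with_jit(enabled, num=1000, python=sys.executable):
    """Time long_calcul(num) in a child process with PYTHON_JIT set to 1 or 0."""
    env = dict(os.environ, PYTHON_JIT="1" if enabled else "0")
    completed = subprocess.run(
        [python, "-c", BENCH, str(num)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(completed.stdout.strip())


if __name__ == "__main__":
    for enabled in (False, True):
        print(f"PYTHON_JIT={int(enabled)}: {run_with_jit(enabled):.4f} s")
```

Using a separate process per setting avoids any state (warmed-up code, specialization) leaking from one measurement into the other.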
For such simple code, this is in my humble opinion quite disappointing. PyPy is still more than 22 times faster than the best CPython result (Python 3.13 from conda-forge with the JIT). And let’s not talk about other languages…
This seems to indicate that the CPython JIT does not manage to avoid boxing and unboxing integers.
I have a few questions about these results:
- Is it possible to get information on what happens internally in the CPython JIT with this code? What goes wrong in this case?
- Is there a chance that a future version of the CPython JIT will really accelerate this kind of pure Python CPU-bound code?
- For 3.14 (compiled with Clang), the JIT has in practice no measurable effect. Is this related to the new interpreter using tail calls?
- Are there examples for which the CPython JIT leads to a non-negligible speedup?
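For the first question, one partial starting point is `dis.dis(..., adaptive=True)` (CPython 3.11+), which shows the specialized instructions the adaptive interpreter has chosen after warm-up: for example, whether a generic `BINARY_OP` has been replaced by an integer-specific variant such as `BINARY_OP_ADD_INT`. As I understand it, this does not show the machine code produced by the JIT, which works on traces of lower-level micro-ops, and instruction names vary between versions. The helper `specialized_listing` and the warm-up count are assumptions.

```python
import dis
import io
import sys


def short_calcul(n):
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


def specialized_listing(func, *args, warmup=100):
    """Call `func` repeatedly, then return its bytecode listing as a string.

    On CPython 3.11+, adaptive=True shows the specialized instructions
    selected by the specializing interpreter during the warm-up calls.
    """
    for _ in range(warmup):
        func(*args)
    buffer = io.StringIO()
    if sys.implementation.name == "cpython" and sys.version_info >= (3, 11):
        dis.dis(func, file=buffer, adaptive=True)
    else:
        # Older CPython or other implementations: plain bytecode only.
        dis.dis(func, file=buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(specialized_listing(short_calcul, 1000))
```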