PEP 744: JIT Compilation

PEP 744 is an informational PEP answering many common questions about CPython 3.13’s new experimental JIT compiler.

My main goal for this PEP is to build community consensus around the specific criteria that the JIT should meet in order to become a permanent, non-experimental part of CPython. The “Specification” section lists three basic requirements as a starting point, but I expect that more will be added as a result of our discussions here:

The JIT will become non-experimental once all of the following conditions are met:

  1. It provides a meaningful performance improvement for at least one popular platform (realistically, on the order of 5%).

  2. It can be built, distributed, and deployed with minimal disruption.

  3. The Steering Council, upon request, has determined that it would provide more value to the community if enabled than if disabled (considering tradeoffs such as maintenance burden, memory usage, or the feasibility of alternate designs).

If you have any questions or concerns that are not addressed in the PEP or its linked materials, please let me know and I’ll be happy to add them.


Would it make sense to contrast the 3.13 JIT with PyPy’s JIT?

Thank you for the write-up, Brandt. This is an exciting development :smile:

A few questions:

  • LLVM: I am a bit confused about the statements related to LLVM. In the “Rationale” section you write that using LLVM would incur significant overhead and a heavy dependency. Then in the “Support” section you write that some platforms may not receive JIT support because LLVM does not support them well. Could you clarify whether you are using LLVM or not? Other comments in the PEP suggest that you are. If that’s the case, how do you manage to keep the overhead low? (In that case, it may also make sense to update the “Rationale” section to avoid such questions.)

  • Patents: Are you aware of any patents on the copy-and-patch approach or other advanced techniques you are using in the JIT, which may encumber the use of a JIT-enabled CPython?

  • RAM usage: Does the JIT limit itself to using a certain percentage of available RAM (as a measure against DoS-style attacks or bugs in the code)? How quickly are less-used JIT-compiled parts garbage collected? Can the JIT GC be controlled programmatically (e.g. via sys module calls)?

  • Dependencies: Assuming you are using LLVM, is it still possible to build CPython without JIT support on platforms where LLVM is not available or supported?


LLVM: A typical approach to creating a JIT is to use LLVM at runtime. This is the “cost of introducing heavy runtime dependencies” mentioned in the Rationale that we want to avoid. The copy-and-patch approach is novel in that it only introduces the LLVM dependency at build time, where the overhead is less important since it only affects CPython developers, not Python users. I think some judicious use of “runtime” and “build time” in the doc will hopefully clear up this confusion.

Patents: We aren’t aware of any patents, and the copy-and-patch paper on which this is based is open access (which isn’t the same thing, but helps). However, I am not a lawyer – is there a typical process that happens when this is a concern?

Dependencies: There are no plans to remove the ability to build CPython without the JIT on any platform. It is likely that the default build will remain “without JIT”, even after the default binaries on supported platforms become “with JIT”, just as PGO and LTO are today.

RAM usage: I’ll leave this for Brandt to answer.



Thanks for the PEP (still reading it). I did watch your talk this morning. As someone who hasn’t really done much with the virtual machine in a couple of decades, learning how 3.11 & 3.12 laid the groundwork for the JIT work was quite useful. I’m guessing most people won’t be familiar with the copy-and-patch technique, so it might be a good idea to add those Lua references to the PEP bibliography.


Note that I stopped reading the PEP when I encountered the link to your talk, so missed the two links right after it. I still think an explicit bibliography makes sense.

Maybe. Despite the maybe-overly-broad title, I’ve tried to keep the PEP focused on this implementation specifically, rather than making it “a survey of all of the Python JITs”. What sort of information are you hoping to get from a section contrasting the two JITs?

While I have had short conversations with a couple of PyPy devs on different occasions (including brief tours of parts of their code), I’m far from an expert on PyPy’s JIT backend. So I would either need to do a much deeper dive into the source code, or provide a perhaps-not-detailed-enough summary based on my own incomplete understanding from various papers, blog posts, and other resources available online.

Mostly echoing @mdroettboom’s excellent answers:

LLVM is used at build time to compile individual micro-op instructions into mostly-opaque blobs of machine code. These blobs are dumped into a header file, which is used to build CPython itself.

At runtime, we compile a sequence of micro-ops by “simply” copying the machine code for each one, almost verbatim. So instead of firing up LLVM, it’s “just” a memcpy and a couple of loops to fix up parts of the code. Implementation here, in C.
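To make the copy/patch step concrete, here is a toy Python model of the idea. Everything in it is invented for illustration (the byte templates, operand format, and micro-op names bear no relation to CPython’s real machine-code templates), but it shows the shape of the technique: copy a pre-built template per micro-op, then patch operand “holes” in the copy.

```python
import struct

# Purely illustrative templates: fake "machine code" bytes with 0xFF-filled
# holes where a 4-byte little-endian operand gets patched in at runtime.
# These byte values are invented, not CPython's actual templates.
HOLE = b"\xff\xff\xff\xff"
TEMPLATES = {
    "_LOAD_CONST": b"\x48\xb8" + HOLE,  # pretend "mov rax, imm32" with a hole
    "_ADD":        b"\x48\x01\xd8",     # pretend "add rax, rbx", no hole
}

def copy_and_patch(trace):
    """Concatenate a copy of each micro-op's template, patching its holes."""
    out = bytearray()
    for name, operand in trace:
        chunk = bytearray(TEMPLATES[name])   # the "copy" step (a memcpy)
        hole = chunk.find(HOLE)
        if hole != -1:                       # the "patch" step
            chunk[hole:hole + 4] = struct.pack("<I", operand)
        out += chunk
    return bytes(out)
```

Here `copy_and_patch([("_LOAD_CONST", 42), ("_ADD", 0)])` just concatenates and patches byte strings; the real JIT does the analogous memcpy into executable memory and fixes up addresses as well as constants.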

The PEP could probably be updated to make the run-time/build-time distinction clearer.

I’m not aware of any patents on the approaches used here.

No, when their refcount hits zero, and no. :slight_smile:

…at least, that’s the state right now. The current memory allocation/freeing scheme can be summed up as “I got it working a while back and haven’t touched it since”. All of the points you make above are things I want to do, but simply haven’t had the time to work on yet.

One main thing to keep in mind is that I’d like to avoid introducing any “official” JIT APIs (or encouraging users to tweak them) while it’s still experimental. I could definitely see tunable controls for enabling/disabling the JIT, controlling max memory, and more control over GC of cold traces in the future, though, exposed through the sys module, environment variables, etc.

The good news is that all of the things you ask for should be easy to do. We already maintain a doubly-linked list of all JIT code (for unrelated correctness reasons) and have ways to turn the JIT on and off at runtime (for testing).
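For illustration only, here is a minimal sketch of what such a doubly-linked registry of JIT code could look like. The class and method names are hypothetical, not CPython’s actual internals; the point is just that a doubly-linked list lets you walk every live executor and unlink any one of them in O(1) when it dies.

```python
class Executor:
    """Toy stand-in for one piece of JIT-compiled code (names are invented)."""

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


class ExecutorList:
    """Minimal doubly-linked registry so all JIT code can be walked or freed."""

    def __init__(self):
        self.head = None

    def link(self, ex):
        # Push onto the front of the list.
        ex.next = self.head
        if self.head is not None:
            self.head.prev = ex
        self.head = ex

    def unlink(self, ex):
        # Called e.g. when the executor's refcount hits zero.
        if ex.prev is not None:
            ex.prev.next = ex.next
        else:
            self.head = ex.next
        if ex.next is not None:
            ex.next.prev = ex.prev

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next
```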

Yep, and it always will be (even on platforms where LLVM is available).


I was just curious if there were any fundamental design differences. No need for a detailed comparison!

  • Patents: Are you aware of any patents on the copy-and-patch approach or other advanced techniques you are using in the JIT, which may encumber the use of a JIT-enabled CPython?

IIRC, the trace trees technique or tracing JITs in general (I forget which) are patented by UC Irvine / Andreas Gal and Michael Franz. But IIUC, they allow open source code to use it without getting sued. I am not a lawyer, so please don’t take my word on this and consult a real expert.

The US Patent number is US 8,769,511 B2.


Thanks for your answers, @mdroettboom and @brandtbucher

For companies using CPython this could be a concern (simply to avoid the risk of being sued for patent infringement) and it’s good to collect such knowledge, since good patent research is hard and expensive.

I did a quick search on Google Patents, but the only patent which came up is this one:

which only appears remotely related and is expired by now.

Which doesn’t mean that there aren’t other patents or pending ones related to the JIT techniques used. The paper was written by Stanford scientists, so it wouldn’t be surprising to find that a patent application is underway.

Thanks, @kj0, for the added information. Here’s the link to the patent: US8769511B2 - Dynamic incremental compiler and method - Google Patents

I tried to find out whether the UC has a policy of allowing open source projects to use such patents, but wasn’t successful.

This may actually be something the PSF could get involved in for the benefit of our (commercial) users, by setting up a patent license pool to which PSF sponsors would then get access.

Thanks for clearing this up. So the comment in the “Rationale” section was about using LLVM at runtime. You do use LLVM at build time to compile the templates, which are then copied and patched with the needed details at runtime, but LLVM itself is never invoked at runtime.

BTW: I think it would make the PEP easier to follow if you added a sentence explaining what copy-and-patch compilation is all about, e.g. based on the one found on Wikipedia:

“Copy-and-patch compilation is a simple compiler technique that uses pre-written machine code fragments that are then patched to insert memory addresses, register addresses, constants and other parameters to produce executable code.”

Great, so these things are on the radar. I was just wondering, since I still remember how Armin’s Psyco used to have issues with significant RAM usage due to the JIT keeping too many traces in memory.

Great. Thanks for the confirmation.


Why is the JIT any more of a concern patent wise than any other part of CPython?


It is not, but since JIT technology has seen quite a few patents in recent years (esp. due to the Java and JS VMs heavily using JITs), and because the copy-and-patch technique sounds like a rather new and clever idea, this triggered my question :slight_smile:

“Hard, expensive, and valuable to corporations” sounds exactly like the type of work that corporations need to pony up for. :wink:

Maybe some group of them would be willing to dedicate funds for such a thing.


Could you perhaps mention the slides in the PEP text, and make sure that PDF file is hosted in the PEPs repo itself, rather than in a personal GH repo that might become unavailable at some point?

General question: since half of the tasks in Improving JIT code quality · Issue #115802 · python/cpython · GitHub already seem to be done on the main platforms, by how much do you expect the remaining tasks to improve performance?

It seems there is a general contradiction between working on tiny units of work (micro-ops) and using a JIT compilation scheme that doesn’t optimize across micro-ops. Is it possible to break out of this contradiction while still benefiting from the advantages of the copy-and-patch approach?
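As a concrete (and purely illustrative) example of what optimizing across micro-ops could look like, here is a toy peephole pass that folds two constant loads and an add into a single load. The micro-op names and tuple format are simplified stand-ins, not CPython’s real uop set or optimizer.

```python
def fold_constants(trace):
    """Toy peephole pass: fold LOAD_CONST, LOAD_CONST, BINARY_ADD into one.

    Each micro-op is a tuple of (name, *operands); this is an invented
    representation, used only to illustrate cross-micro-op optimization.
    """
    out = []
    for op in trace:
        if (op[0] == "_BINARY_ADD" and len(out) >= 2
                and out[-1][0] == "_LOAD_CONST"
                and out[-2][0] == "_LOAD_CONST"):
            rhs = out.pop()[1]
            lhs = out.pop()[1]
            # One micro-op replaces three: an optimization spanning uops.
            out.append(("_LOAD_CONST", lhs + rhs))
        else:
            out.append(op)
    return out
```

A pass like this runs on the micro-op trace before copy-and-patch ever sees it, so the per-template code generation can stay simple while optimizations still cross micro-op boundaries.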


We are now at a point where code execution can take different paths: tier 1, tier 2, JIT… I found this explanation easy to understand, and I took the liberty of creating a diagram from it.

UPDATE: the above diagram is incorrect. Please scroll down to find the fixed version.

Maybe spend a few words (or even a simple diagram) to explain where the JIT fits in the whole picture.


Thanks for that diagram. I think there’s one decision point misplaced: the “hot enough?” question is only asked when either the JIT is enabled or the “uops” option is selected (via command line or environment variable).
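The corrected flow can be sketched as a toy decision function. Every name and threshold here is a placeholder for illustration, not CPython’s actual logic or configuration surface:

```python
def choose_execution(hotness, uops_enabled, jit_enabled, threshold=16):
    """Toy model of the execution-path decision (invented names/thresholds)."""
    # With neither the uops option nor the JIT enabled, hotness is never
    # even measured against a threshold: everything stays in tier 1.
    if not (uops_enabled or jit_enabled):
        return "tier 1 (specializing interpreter)"
    # "Hot enough?" is only asked once one of those options is on.
    if hotness < threshold:
        return "tier 1 (specializing interpreter)"
    if jit_enabled:
        return "tier 2, JIT-compiled machine code"
    return "tier 2, micro-op interpreter"
```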

I’m confused by the use of the terms “tier 1” and “tier 2” in this diagram. I thought those terms referred to the level of support for different platforms. That doesn’t seem to be how they are being used in this diagram. Does a tier 1 platform become tier 2 part of the way through the decision bits related to JIT execution?

It’s a totally different set of tiers: the tiers of optimization. There are only so many synonyms for “level”.