A note from the Faster CPython project in response to Greg’s question about bytecode specialization and optimization in a nogil world. Our ultimate goal is to integrate a JIT into CPython, although this is still several releases away (most optimistically, an experimental JIT could be shipped with 3.13).
We’ve had a group discussion about how our work would be affected by free threading. Our key conclusion is that merging nogil would set our current progress back significantly and would also reduce our velocity in the future. We don’t see this as a reason to reject nogil; it’s just a new set of problems we would have to overcome, and we expect that our ultimate design would be quite different as a result. But there is a significant cost, and it’s not a one-time cost. We could use help from someone (Sam?) who has experience thinking about the problems posed by the new environment.
I expect that Brandt will post more details, but the key issue appears to be that many of our current and future designs use inline caches and divide the execution of a specialized bytecode instruction into guards and actions. Guards check whether the cache is still valid and deoptimize (jump to an unspecialized version) when it isn’t. Actions use the cache to access the internals of objects (e.g. instance attributes or the globals dict) to get the desired result faster. An important optimization is guard elimination, which removes redundant guards. This is performed before the JIT generates machine code.
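To make the guard/action split concrete, here is a toy Python model. It is not CPython’s actual C implementation: `Klass`, `Obj`, `InlineCache`, the helper functions, and the micro-op names in the trace are all hypothetical. A specialized attribute load is modeled as a guard on a class version tag (loosely inspired by CPython’s type version tags) followed by an action that reads a cached slot index, falling back to a generic lookup when the guard fails. A small pass then drops guards that repeat an earlier check.

```python
from dataclasses import dataclass

class Klass:
    """Toy class object (hypothetical): `layout` lists attribute names in
    slot order, and `version` identifies that particular layout."""
    _next_version = 1

    def __init__(self, layout):
        self.layout = list(layout)
        self.version = Klass._next_version
        Klass._next_version += 1

class Obj:
    """Toy instance: attribute values stored in slots, ordered per its class."""
    def __init__(self, klass, values):
        self.klass = klass
        self.slots = list(values)

@dataclass
class InlineCache:
    version: int = 0   # class version recorded when specializing
    index: int = -1    # slot index of the attribute under that version

def load_attr_generic(obj, name):
    # Unspecialized path: search the layout on every execution.
    return obj.slots[obj.klass.layout.index(name)]

def specialize(cache, obj, name):
    # Fill the inline cache from the object we just saw.
    cache.version = obj.klass.version
    cache.index = obj.klass.layout.index(name)

def guard_class_version(cache, obj):
    # Guard: does the cached information still describe this object?
    return obj.klass.version == cache.version

def action_load_slot(cache, obj):
    # Action: use the cache to read the object's internals directly.
    return obj.slots[cache.index]

def load_attr_specialized(cache, obj, name):
    if not guard_class_version(cache, obj):
        return load_attr_generic(obj, name)  # deoptimize
    return action_load_slot(cache, obj)

# Hypothetical micro-ops. A guard is redundant if the same guard already
# ran and nothing since could have invalidated it; ops that may run
# arbitrary code (here just "CALL") forget everything established so far.
MAY_RUN_ARBITRARY_CODE = {"CALL"}

def eliminate_redundant_guards(trace):
    established = set()
    out = []
    for op, arg in trace:
        if op.startswith("GUARD_"):
            if (op, arg) in established:
                continue  # already checked: drop the duplicate guard
            established.add((op, arg))
        elif op in MAY_RUN_ARBITRARY_CODE:
            established.clear()
        out.append((op, arg))
    return out

A = Klass(["x", "y"])
B = Klass(["y", "x"])
a, b = Obj(A, [1, 2]), Obj(B, [3, 4])
cache = InlineCache()
specialize(cache, a, "x")
print(load_attr_specialized(cache, a, "x"))  # → 1 (guard passes, fast path)
print(load_attr_specialized(cache, b, "x"))  # → 4 (guard fails, deoptimized)

trace = [("GUARD_CLASS_VERSION", "a"), ("LOAD_SLOT", 0),
         ("GUARD_CLASS_VERSION", "a"), ("LOAD_SLOT", 1)]
print(eliminate_redundant_guards(trace))  # second guard is removed
```

Note that the elimination pass is only sound if nothing outside the trace can run between two ops. That is exactly the assumption the GIL provides and free threading takes away.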
Free threading complicates the design of guards and actions. In 3.11 and 3.12, the GIL ensures that the state of the world doesn’t change between a guard and its action(s), so that the condition checked by the guard still holds when the actions run. Our plans for 3.13 include splitting bytecode instructions into micro-ops (uops), where each uop is either a guard or an action. But with free threading, it is possible for the world to change between a guard and a corresponding action. This can lead to unsafe use of an object’s internals, producing incorrect results and even crashes.
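The hazard can be illustrated with a deterministic toy simulation in Python. Everything here is hypothetical and invented for illustration (`VersionedDict`, `Cache`, `load_global`, and the `between` hook). The guard checks a version tag on a globals-like dict, the action trusts a cached index, and the `between` callback stands in for another thread being scheduled after the guard has passed, something the GIL rules out.

```python
from dataclasses import dataclass

class VersionedDict:
    """Toy globals dict (hypothetical): parallel key/value lists plus a
    version tag that changes whenever the key layout changes."""
    _next_version = 1

    def __init__(self, items):
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]
        self._bump()

    def _bump(self):
        self.version = VersionedDict._next_version
        VersionedDict._next_version += 1

    def delete(self, key):
        i = self.keys.index(key)
        del self.keys[i]
        del self.values[i]
        self._bump()  # invalidates any cache keyed on the old version

@dataclass
class Cache:
    version: int  # dict version seen when specializing
    index: int    # position of the key under that version

def specialize(d, key):
    return Cache(d.version, d.keys.index(key))

def load_global(d, key, cache, between=lambda: None):
    # Guard: is the dict layout still the one we specialized for?
    if d.version != cache.version:
        return d.values[d.keys.index(key)]  # deoptimize: generic lookup
    # Under a GIL nothing else can run here; `between` simulates another
    # thread being scheduled after the guard has already passed.
    between()
    # Action: trust the cached index without re-checking the version.
    return d.values[cache.index]

# Mutation before the guard: the guard sees the new version and deoptimizes.
g = VersionedDict([("a", 1), ("b", 2), ("c", 3)])
cache = specialize(g, "b")
g.delete("a")
print(load_global(g, "b", cache))  # → 2

# Mutation between guard and action: the stale index reads the wrong value.
h = VersionedDict([("a", 1), ("b", 2), ("c", 3)])
cache = specialize(h, "b")
print(load_global(h, "b", cache, between=lambda: h.delete("a")))  # → 3, not 2
```

In this toy, a stale index produces either a silently wrong value or, if the index now points past the end, an `IndexError`. In C code that indexes without bounds checks, the same stale index could read past the end of an array, which is how such a race turns into a crash.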
Solving such problems with locking would likely be slower than just not specializing at all, so we will need to be cleverer. We’re entering uncharted territory here, since most relevant academic papers and real-world implementations use either a statically typed language (Java) or a single-threaded runtime (JavaScript).
A relatively quick decision would benefit us, since our schedule for 3.13 and 3.14 will be quite different depending on what happens with nogil. If nogil is accepted, we’ll first have to make sure that the specialization implementation is thread-safe, and then design a new, thread-safe approach to a JIT.
It looks like Sam left us quite a bit of work regarding the thread safety of the current specialization code (i.e., what’s already in 3.12); Brandt can explain it better. We might decide that, for now, we’re better off just not specializing when CPython is built with nogil (since that build will be used mainly to create multi-core apps). But that’s only a temporary measure that can buy us 1-2 releases. I don’t expect that we would continue work on our current JIT plans, which depend on a GIL; instead, after salvaging the existing specializations, I expect us to go back to the drawing board and come up with a new plan. This will set those JIT plans back by 1-2 releases, unless additional funding appears.
In the meantime, we’re treading water, unsure whether to put our efforts into continuing with the current plan or into designing a new, thread-safe optimization architecture.