PEP 703: Making the Global Interpreter Lock Optional (3.12 updates)

Pure Python code will not require changes.

I’d expect an abi4 by the time we switch to a single build, but not immediately.

Extension modules would produce two wheels per platform for 3.13 but that would not lead to doubling the total number of wheels, because extensions typically build for multiple previous Python releases. My expectation is that the two build modes would only be for 2-3 CPython releases, as described in the Python Build Modes section. (And based on comments, I think a single build mode after 2 releases is a better goal than after 3 releases.)

  • C code (compile time): The Py_NOGIL macro is defined.
  • Build: For building extensions, etc., the appropriate information is automatically propagated to sysconfig variables when the ABI flags are changed. These are used by setuptools, for example. (e.g., sysconfig.get_config_var('EXT_SUFFIX')).
  • Run time: The reference implementations provide sys.flags.nogil to indicate whether the GIL is enabled at runtime, but that is not specified in the PEP.
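
As a rough illustration of the build-time and run-time checks above, a script could probe its own interpreter like this. This is only a sketch: `sys.flags.nogil` exists in the reference implementations but is not specified by the PEP, so the code falls back to `None` when it is missing, and the helper name `describe_build` is made up for this example.

```python
import sys
import sysconfig


def describe_build():
    """Collect build- and run-time hints about the running interpreter."""
    return {
        # Build: the extension suffix is derived from the ABI flags and is
        # what tools such as setuptools read when naming compiled modules.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # sys.abiflags is not available on every platform, so default to None.
        "abiflags": getattr(sys, "abiflags", None),
        # Run time: only the reference implementations expose this flag
        # (assumption: it is absent on any other interpreter).
        "nogil_flag": getattr(sys.flags, "nogil", None),
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value!r}")
```

On an interpreter without the flag, `nogil_flag` is simply `None`, so the same script runs on either build.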

So in the interim, when there are two build options, and projects will want to publish wheels for both nogil and gil builds, what would the appropriate wheel tags be for the nogil build? The tags are Python interpreter, abi and platform, as defined here. If there isn’t a different abi, which tag will distinguish?

As described in the Build Configuration Changes section, the ABI tag includes “n” for “no GIL”. For example,
cp313-cp313n-PLATFORM. (Or it could be “t” for “threading” if Brett prefers.)
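
To make the tag layout concrete, here is a small sketch that splits a tag triple and checks the ABI part for the proposed suffix. The helper names and the sample platform `linux_x86_64` are only illustrative, and the suffix is a parameter because the letter (“n” or “t”) was still under discussion.

```python
def parse_wheel_tag(tag):
    """Split 'interpreter-abi-platform' into its three parts."""
    interpreter, abi, platform = tag.split("-", 2)
    return interpreter, abi, platform


def looks_like_nogil_abi(abi, suffix="n"):
    """Guess whether a CPython ABI tag carries the proposed no-GIL suffix.

    Assumption: the suffix is appended to the usual 'cpXY' ABI tag.
    """
    return abi.startswith("cp") and abi.endswith(suffix)


if __name__ == "__main__":
    for tag in ("cp313-cp313-linux_x86_64", "cp313-cp313n-linux_x86_64"):
        interpreter, abi, platform = parse_wheel_tag(tag)
        print(tag, "->", looks_like_nogil_abi(abi))
```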

Brett’s question was about a new stable ABI.


I guess there’s a bit of misunderstanding here. With nogil, you can actually leverage multithreading for performance, which implies more people will probably start to use threading than today, which also implies there will be some amount of pressure on existing pure-Python libraries that aren’t thread safe today [1] to change in this respect. Therefore, even if the behavior of pure Python code does not substantially change, we can expect to see a shift towards multithreading and it could be useful to have, e.g., a PyPI classifier “Tested in multicore scenarios” [2].

Ultimately, though, what the expectations from the wider community towards libraries will be is not something the core team has control over.

Personally I don’t view these possible expectations as a problem; it is similar to how you don’t need to add type annotations to your code if you don’t want to (and it still works the same) but some users will request them and if you don’t want them, you’re free to just turn down the request. I think the ecosystem has lived fine with this principle on typing so far.


  1. For some definition of “thread-safe”. Of course, pure Python code doesn’t cause UB, and won’t with nogil either, but it can exhibit buggy behavior in multithread contexts. ↩︎

  2. or another bikeshed color ↩︎


Probably. As long as the right things are exposed in the right places it will just fall through via sysconfig and other packages that have to figure this stuff out.

Correct, that’s more my point. For instance, I have never once concerned myself with making any of my code up on PyPI thread-safe, and in a no-GIL world I will very likely need to start caring.

I admittedly don’t share your optimism that the transition could happen in 2-3 years/releases, so in my head this takes 5 years and thus you do end up in a position where it’s doubled. Regardless, the key point is it’s increasing the number of wheels, not decreasing.


I doubt that similarity holds. Typing will stay optional in perpetuity (unless Python radically changes its character), while nogil is clearly aimed at becoming the new default.

Also, typing annotations are a somewhat subjective trade-off for an improved development experience, which is (IMO) nowhere near the level of impatience that tantalizingly unrealized :sparkles:performance:sparkles: gains inspire in people.

Do you really think numpy / pandas would be free to “just turn down the request”? And what would the reaction be if that were to happen? It’s a hypothetical example because the devs there are working very hard to stay abreast of all CPython changes (and Sam already ported them for his fork), but for moderately widespread packages, I think the pressure will be enormous.


I’ve been thinking about how to reduce that pressure (and the runtime-switch discussion), and was wondering if there could be a mechanism for packages to opt-out of nogil completely, even if they publish wheels built for the new ABI? E.g. a module-level setting along the lines of “all calls in the namespace of our package need to be protected by the GIL, even if the rest of CPython and other libraries aren’t”.

That way, users would be able to install everything they want, and those packages declaring themselves nogil-ready would be able to realize those benefits, while those packages that aren’t ready would have time to work on it without depriving users of the gains in other libraries.

Basically, trying to uncouple the new ABI (+ the multi-threading safety review, etc.) from the dependency requirements of users – making it less “all or nothing”, which to me sounds like a recipe for frayed tempers.


We aren’t on the same page. My post that you are responding to is about pure Python packages, for which caring about correctness in multithreaded contexts will definitely remain optional, while your concerns are about C extensions.

I’m cognizant of not going in circles here. I believe a key point in this discussion is “how optional is nogil”, with several voices in this thread overestimating (IMO[1]) how easy it will be for projects to opt out, for the broader infra to handle, and for the ecosystem to digest.

FWIW, I believe the parallel to typing (re: opt-in) came up for the first time, at least in this thread. Similarly, I don’t think someone mentioned a module-level opt-out of nogil as a thought experiment yet.


  1. while I’m not a CPython developer, I’m a SciPy maintainer, and very involved in conda-forge for many years (regularly across 100+ packages), so I don’t think my concerns fall under pointless bickering. ↩︎


I think the PEP would be helped by making the point that pure Python code will not require changes much more central. I understand you are saying this “by omission” in the Backwards Compatibility section, buried deeply in the PEP, but a lot of the confusion in discussions seems to come from a vague idea that the GIL offers protection at the Python level. So perhaps the PEP could explicitly, and prominently in the abstract, explain that pure Python code which currently isn’t worried about concurrency will not have to worry about concurrency under the proposed changes.


This is described in the Py_mod_gil Slot section of the PEP.


That would enable the GIL for everyone, right? I think Vetinari was asking if a given module could say “I need the GIL whenever I’m running, but you can turn it off the rest of the time”

That sounds more complicated than the existing proposal, to me, but it would certainly be very useful if possible.


Here is a PR plan: https://github.com/colesbury/nogil-3.12/wiki/PR-plan-for-PEP-703. I’ve noted the places where I’d expect the commits might be further broken up into a few PRs, but the extent of that would depend on reviewer preference.

Daan and I have talked about this by email and Skype and he’s offered to help with the issues and extensions on his side. We haven’t gotten into all the specifics and I don’t want to impose on Daan’s time while the SC still hasn’t decided if they want to pursue this PEP. I can ask Daan to comment here if you’d like.

I’ll need to think about what those microbenchmarks would look like. A number of people have written about using it in real applications, and I’ve also discussed scaling on a real application in my EuroPython talk. Here’s a link to the relevant section of the talk. I think those are more useful to understand scaling capabilities than microbenchmarks.

In terms of high-level limitations, the most common bottleneck is reference counting contention, but the extent to which it comes up depends on the application.


A note from the Faster CPython project in response to Greg’s question about bytecode specialization and optimization in a nogil world. Our ultimate goal is to integrate a JIT into CPython, although this is still several releases away (most optimistically, an experimental JIT could be shipped with 3.13).

We’ve had a group discussion about how our work would be affected by free threading. Our key conclusion is that merging nogil will set back our current results by a significant amount of time, and in addition will reduce our velocity in the future. We don’t see this as a reason to reject nogil – it’s just a new set of problems we would have to overcome, and we expect that our ultimate design would be quite different as a result. But there is a significant cost, and it’s not a one-time cost. We could use help from someone (Sam?) who has experience thinking about the problems posed by the new environment.

I expect that Brandt will post more details, but the key issue appears to be that much of our current and future designs use inline caches and divide the execution of a specialized bytecode instruction into guards and actions. Guards check whether the cache is still valid, and deoptimize (jump to an unspecialized version) when it isn’t. Actions use the cache to access the internals of objects (e.g. instance attributes or the globals dict) to get the desired result faster. An important optimization is guard elimination, which removes redundant guards. This is performed before machine code generation (JIT).

Free threading complicates the design of guards and actions. In 3.11 and 3.12, the GIL ensures that the state of the world doesn’t change between a guard and its action(s), so that the condition checked by the guard still holds when the actions run. Our plans for 3.13 include splitting bytecode instructions into micro-ops (uops), where each uop is either a guard or an action. But with free threading, it is possible for the world to change between a guard and a corresponding action. This can lead to unsafe use of an object’s internals, producing incorrect results and even crashes.
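
The window between a guard and its action can be pictured with a toy model in pure Python. Everything here is hypothetical (the `Instance` class, the `version` field, the `interleave` hook); it is not how CPython's specializing interpreter is written. In the real interpreter the action would touch object internals directly, so the failure could be a wrong result or a crash rather than the `IndexError` this sketch raises.

```python
class Instance:
    """Toy object whose attribute storage layout can change."""

    def __init__(self):
        self.version = 1          # bumped whenever the layout changes
        self.slots = ["spam"]     # the attribute value lives at index 0


def guard(obj, cached_version):
    """Check that the cached assumption about the layout still holds."""
    return obj.version == cached_version


def action(obj, cached_index):
    """Use the cached index without re-checking anything."""
    return obj.slots[cached_index]


def mutate(obj):
    """What another thread might do: change the layout, bump the version."""
    obj.slots = []
    obj.version += 1


def specialized_load(obj, cached_version, cached_index, interleave=None):
    if not guard(obj, cached_version):
        return "deopt"            # fall back to the unspecialized path
    if interleave is not None:
        interleave(obj)           # stands in for another thread running here
    return action(obj, cached_index)


if __name__ == "__main__":
    print(specialized_load(Instance(), 1, 0))
    try:
        specialized_load(Instance(), 1, 0, interleave=mutate)
    except IndexError:
        print("guard passed, but the action saw a changed world")
```

With the GIL, nothing can run at the `interleave` point inside one specialized instruction; without it, the guard's answer may already be stale when the action executes.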

Solving such problems with locking would likely be slower than just not specializing at all, so we will need to be cleverer. We’re entering uncharted territory here, since most relevant academic papers and real-world implementations use either a statically typed language (Java) or a single-threaded runtime (JavaScript).

A relatively speedy decision timeline would benefit us, since our schedule for 3.13 and 3.14 will be quite different depending on what happens with nogil. If nogil is accepted, we’ll have to prioritize making sure that the specialization implementation is thread-safe, and then we have to design a new, thread-safe approach to a JIT.

It looks like Sam left us quite a bit of work regarding the thread-safety of the current specialization code (i.e., what’s already in 3.12). Brandt can explain it better. Even if we decide that for now we’re better off just not specializing when CPython is built with nogil (since it will be used mainly to create multi-core apps), that’s only a temporary measure that can buy us 1-2 releases, but I don’t expect that we would continue work on our current JIT plans that depend on a GIL; instead, after salvaging the existing specializations, I expect us to go back to the drawing board and come up with a new plan. This will set those JIT plans back by 1-2 releases, unless additional funding appears.

In the meantime we’re treading water, unsure whether to put our efforts in continuing with the current plan, or in designing a new, thread-safe optimization architecture.


This is not the kind of thing we’ll plan out; just something we’d expect to happen in the wider community.

Free-threading is a low-level offering. I expect higher-level frameworks will always help people make better sense of it; concurrent.futures, for example.
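
For instance, a minimal sketch of handing work to a thread pool through concurrent.futures might look like this. The `word_count` and `count_all` functions are placeholders; the point is only that the caller never touches a thread object directly.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Placeholder unit of work: count whitespace-separated words."""
    return len(text.split())


def count_all(texts, max_workers=4):
    """Run word_count over all texts on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(word_count, texts))


if __name__ == "__main__":
    print(count_all(["free threading", "is a low-level offering", ""]))
```

The same code runs with or without the GIL; whether the workers actually execute in parallel depends on the build.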

I’m sure async frameworks could construct their own concepts of parallel event loops, or even a fiber-like concept on top of threading, defining on their own how to specify when work can happen in another OS thread. Implementations that successfully do this on top of threads are, AFAICT, already widely used in other languages.

Would you (or Brandt) please be more specific?

nogil-3.12 identifies one possible thread-safety issue with specialization: reading from inline bytecode caches while another thread is writing them. Since the caches are only written once (during specialization), nogil-3.12 gets around this particular issue by specializing each instruction only one time in the presence of threads and locking around all specialization attempts. In practice, this means that a failing specialization (such as an attribute lookup on an instance of a new or modified class at the same site) cannot “give up” and re-specialize as it can when run single-threaded. This is problematic, since a failing specialization is slower than no specialization at all.
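
The “specialize once, under a lock” behavior described above can be modeled in a few lines of Python. This is a toy with made-up names (`InlineCache`, `try_specialize`), not the code from nogil-3.12; it only shows why a site that specialized badly stays stuck with its first entry.

```python
import threading


class InlineCache:
    """Toy inline cache that may be written at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.specialized = False
        self.entry = None

    def try_specialize(self, compute_entry):
        """Write the cache if it is still empty; return True if we wrote it."""
        with self._lock:
            if self.specialized:
                # Already written once: no second attempt, even if the
                # first specialization keeps failing its guards.
                return False
            self.entry = compute_entry()
            self.specialized = True
            return True


if __name__ == "__main__":
    cache = InlineCache()
    print(cache.try_specialize(lambda: "layout of class A"))
    print(cache.try_specialize(lambda: "layout of class B"))
    print(cache.entry)
```

Because readers never see a cache that is rewritten after the first write, the read/write race goes away, at the cost of never recovering from a poor first guess.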

We’ve spitballed a few possible fixes for this, including locking (or tolerating races on) the deopt counters of the specializations that don’t use inline caches, or loading the opcode and all caches in one atomic operation. We don’t yet have a good idea of how viable these approaches really are.

However, this is not the only type of race that can occur. It’s often unsafe to run code between a specialized instruction’s “guards” and “actions”, since that can defeat the purpose of guards. In general, it’s not enough to just protect code objects, since guards and their actions may depend on the state of any arbitrary object for thread-safety.

While nogil-3.12 modifies 8 of the ~60 total specializations to improve their thread-safety, it is still possible to crash at least 7 of the modified instructions (as well as several unmodified ones) from (admittedly contrived) pure-Python code.

I’ve identified ~20 specializations that appear to be unsafe in their current form, so there is probably still quite a bit of work to do before specialization is truly thread-safe. It’s not yet clear to us how best to fix them, what the performance hit will be now, or how making them thread-safe will complicate our attempts to remove and reorder guards later.


It makes sense as a mechanism for initial experimentation and stability. Consider it a transitional period. If existential issues crop up as the ecosystem readies itself and attempts using it in practice, those would be signs of things to fix before we would be comfortable declaring it ready for prime time (ie: default/only behavior).

(We’ve done this in the past: from __future__ import annotations is a nice recent example. We planned that future and its default behavior change release and found via the community that the original plan was wrong… so we paused and altered our future.)

The underspecified future in the PEP is actually something I think we here and the steering council can help craft a better “Plan A” set of goals for.

It should be clear to everyone that if we ship a --disable-gil option and its ABI (which I’ll call t because I don’t think the negative n is a good letter for it): The ultimate goal would be for it to be the default and only behavior in the long run iff no existential blockers are found during the exploratory release period. It is probably too early to declare a fixed time frame for that up front. I’d just state minimum times (N releases) and goals on when and how we expect to actually decide if we’re good to move forward.

Questions exist in the interim that we should really offer advice for in the PEP: What is a CPython distributor supposed to do? Which build should they ship? Does python dot org also ship t release builds? Do we need an official new name for a python3 entry point binary built that way? Is this an opportunity to require that newfangled py launcher for this selection purpose or would that become a curse?


When talking about pure Python code I do not think this is true. Your pure Python PyPI packages are already in use by people using multithreaded applications in which any given Python operation can be interrupted with a thread change at any point. Our existing GIL-threading and signal handling implementation has always allowed interrupts at almost any Python bytecode point.

It isn’t really relevant whether the threads run concurrently in time or in today’s cooperative execution style that the GIL has so far enforced. At the high level of Python, the risks are the same, just as multithreading’s requirements from code don’t change on a one-core system vs a multi-core system at the application level.

I believe there are some constructs that people might “rely” on (intentionally or not) as being “atomic” from a Python point of view (even though I don’t have any off the top of my head); but doing so has always been dangerous because we’ve even changed some of that behavior over the years as we tweak when GIL-thread-swap/signal-checks happen within our evolving eval loop dispatch system. Our atomicity guarantees are mostly around individual basic data type modification not causing crashes: dict or list modification from multiple threads, for example. You could never tell what the order of operations would be, but nothing is going to corrupt the internal state of either (that’d be unpythonic). Hence the per-data-structure locks being added and the lock acquisition ordering trick laid out in the PEP, IIUC.
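
A small sketch of what not relying on implicit atomicity looks like in pure Python: a read-modify-write such as `value += 1` is not guaranteed to be atomic with or without the GIL, so shared state that needs it gets an explicit lock. The `Counter` class and `count_in_threads` helper are illustrative names only.

```python
import threading


class Counter:
    """A counter whose increments are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, `self.value += 1` is a separate load, add, and
        # store, so two threads can interleave and lose an update.
        with self._lock:
            self.value += 1


def count_in_threads(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_threads())
```

Code written this way behaves the same on a GIL build and on a free-threaded one, which is the sense in which the requirements on pure Python code do not change.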


My intuition is telling me there are some existing reusable mechanisms for synchronization of specialization guard checks leading into actions involving thread local structures and potentially versioned global ones and a less-frequently acquired specialization/JIT lock (SJL pronounced “Sigil” spelled “SiJiL”?) for the designed-to-be-less-frequent times at which un-ignorable[^] conflicts could arise.

But if I try to spell out how I see that working without spending some hours mulling things over and sketching it out, I’m likely to get it wrong and add confusion. So I’ll just drop that as a food-for-thought hint for those whose heads are already in that space.

[^] - “ignorable” because execution could just bail to the safe/slow code path as you’ll be back here executing the same action in the future and get another chance at the dynamic code improvement lottery if it hasn’t already been done for you.


Non-fleshed-out :blinking-under-construction-banner-from-the-90s: brainstorm: Something like a per-thread guard pointer having an action version and always checking that the thread local guard pointer’s version matched the action version, else either triggering a resync from the global pointers or setting a “resync” bit and bailing to the slow safe path to not pause execution during an unlikely version conflict? A resync might require a lock or at least an atomic read and maybe a write barrier or two - it’d be responsible for updating thread local pointers from the global state. The viability of that may depend on the complexity of guards - I’m thinking high level right now in lieu of personal specialization internals/plan knowledge. (I’m used to thinking of guards as intentionally trivial boolean operations).

I like to assume this kind of thing has already been done in JVMs or way back into Smalltalk and whatnot land and covered in related papers. I’m the wrong person to ask, y’all likely already know of any such references. =)


Thanks for the list of issues and test script. I will look into them.
