PEP 703: Making the Global Interpreter Lock Optional (3.12 updates)

I tried to do something like that above, but a) I didn’t test it very well and b) I didn’t explain it well. I think those of us who reach for multiprocessing on a regular basis [1] are a bit dazzled by the possibilities here and aren’t taking the time to make an argument for general Python. :star_struck:

I don’t think the idea needs a keyword–it’s just a parallel version of itertools (a Python equivalent to Rust’s rayon). These functions don’t protect you from shooting yourself in the foot but for lots of common tasks (like “iterate over a million things and do a thing to each one”) it should be a drop-in replacement.
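As a sketch of how the "parallel itertools" idea might extend beyond a plain parallel map: `par_starmap` and `par_filter` below are hypothetical names, not an existing library API. They are thin wrappers over `concurrent.futures.ThreadPoolExecutor`, and under the GIL they only speed things up when the work itself releases the GIL (I/O, some C extensions).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import compress


def par_starmap(fn, iterable, n=8):
    """Parallel analogue of itertools.starmap: each fn(*args) runs in a worker thread."""
    with ThreadPoolExecutor(n) as exc:
        # Executor.map yields results in input order, like the built-in map.
        yield from exc.map(lambda args: fn(*args), iterable)


def par_filter(pred, iterable, n=8):
    """Parallel analogue of filter(): predicates run concurrently, order is kept."""
    items = list(iterable)
    with ThreadPoolExecutor(n) as exc:
        keep = list(exc.map(pred, items))
    # compress() keeps each item whose matching selector is truthy.
    return list(compress(items, keep))


print(list(par_starmap(pow, [(2, 3), (3, 2)])))   # [8, 9]
print(par_filter(lambda x: x % 2 == 0, range(10)))  # [0, 2, 4, 6, 8]
```

Unlike `filter`, this `par_filter` is eager and returns a list, since all predicates have to be evaluated before the results can be reassembled in order.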

I realized my previous example was partially hamstrung by using a 2-vCPU machine (which is one physical core). On an n2d-standard-8 (4 physical cores), I can use the parmap defined in the previous post to parallelize a map operation and get about a 4x speedup vs Python 3.11.

Test details

I cloned nogil 3.12 and built it, and installed Python 3.11 from conda-forge (I could have cloned Python 3.12 and built that, but this was simpler). In each, I just started an interpreter and defined:

>>> from concurrent.futures import ThreadPoolExecutor
>>> from time import time
>>> def parmap(fn, *args, n=8):
...     with ThreadPoolExecutor(n) as exc:
...             yield from exc.map(fn, *args)
... 
>>> def a(n):
...     for i in range(n):
...             i += 1
... 
>>> m = [1_000_000] * 100

Then I timed list(map(a, m)) against list(parmap(a, m)).
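For anyone who wants to rerun this outside the REPL, here is a standalone sketch of the same measurement. It reuses `parmap` and `a` as defined above, swaps `time()` for `time.perf_counter()` (which is intended for measuring intervals), and wraps everything in hypothetical `timed`/`main` helpers; the defaults match the sizes used above.

```python
import sys
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter


def parmap(fn, *args, n=8):
    # Same helper as above: an order-preserving map backed by a thread pool.
    with ThreadPoolExecutor(n) as exc:
        yield from exc.map(fn, *args)


def a(n):
    # Pure-Python busy loop: CPU-bound, so with the GIL threads take turns.
    for i in range(n):
        i += 1


def timed(thunk):
    start = perf_counter()
    thunk()
    return perf_counter() - start


def main(size=1_000_000, count=100, workers=8):
    m = [size] * count
    serial = timed(lambda: list(map(a, m)))
    threaded = timed(lambda: list(parmap(a, m, n=workers)))
    print(f"{sys.version.split()[0]}: map {serial:.3f} s, "
          f"parmap {threaded:.3f} s, parmap/map {threaded / serial:.3f}")
    return serial, threaded
```

Calling `main()` in each interpreter reproduces the comparison; the absolute numbers will of course depend on the machine and core count.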

Results (wall-clock seconds):

command               py311    nogil    nogil / py311
a(1_000_000)          0.0214   0.0231   1.08
list(map(a, m))       2.123    2.577    1.21
list(parmap(a, m))    2.191    0.545    0.249

It’s curious that the plain map is so much slower (21%) when a single function call is only 8% slower [2], but I hope this demonstration is somewhat useful for illustrating why nogil has the potential to be broadly useful.


  1. people using Python for science, finance, ML, etc.

  2. which is still nothing to ignore, for sure


Thank you for the example.

Consider this quote from Charlie Marsh (ruff author)

The “fearless concurrency” that you get with Rust is a big one though. Ruff has a really simple parallelism model right now (each file is a separate task), but even that goes a long way. I always found Python’s multi-processing to be really challenging – hard to get right, but also, the performance characteristics are often confusing and unintuitive to me.

One could wonder whether ruff would even be needed in a world where you could lint one file per thread on multi-core Python.

I’ll just preface this by saying that I greatly appreciate all the work that’s gone into this so far and I also appreciate the time taken to talk to randos like me who wandered in here.

That’s definitely true. But if I’m dependent on a library that’s multi-threaded, surely I have to wait until they release a nogil version? If their nogil version has problems, that’s a bug in their library, not a problem with mine.

I recognize that the transition could be painful [1], but it doesn’t feel fair to count every dependency of a dependency as needing a big lift to be compatible (any more than the usual effort needed for a major release, for instance).

And to be clear, I wasn’t making a demand or anything. I just thought that was a useful discussion to have (as Eric Snow also gets into above).


  1. and I think more info about how some major packages had to be modified would be really useful!

I am also mostly a bystander here but I would like to express how much this would help Datadog (the company I work for) and its customers. I will be speaking in a personal capacity and therefore anything I say is not necessarily endorsed by my employer.

The source of most of the data we collect comes from our Agent that users install on their infrastructure. This collects metrics, events, logs, traces, process information, network information (eBPF), and more that I don’t feel like enumerating (and definitely other stuff that I don’t even know about). This software runs on an extremely large number of hosts across the globe for many customers, a small subset of which have names that would be familiar to the average person anywhere.

I work on a team that creates and maintains integrations that are shipped OOTB with the Agent. These could be for databases like Postgres, web servers like NGINX, Kubernetes running anywhere, hypervisors like vSphere, IaaS cloud offerings like Azure IoT Edge, SaaS cloud offerings like Amazon MSK (Apache Kafka), hardware devices like Cisco routers, system internals like disk partitions or Windows services, broadly used things like TLS/certificates, etc.

Basically, for anything our customers care about, it is my job to find a way to extract meaningful data :slightly_smiling_face:

Most of these Agent integrations are written in Python, with each being its own (namespaced) package. While the Datadog Agent itself is written in Go, the integrations usually aren’t: Python allows more rapid maintenance and makes things easier for customers, who can not only contribute but also run custom integrations for their own use cases.

A single Agent will often have at least a dozen integrations enabled, usually with many instances of each. Running that many instances concurrently, with the resources they require, is directly hindered by the GIL, and the way we work around the constraint is hacky.

We wrote a little about connecting Go to Python here. This component is called the rtloader and here is an example of exposing a new binding to integrations (fun fact: CI passing on the first commit and getting merged was by sheer luck because my dev environment broke that morning and I could not build!).

We run each instance of each enabled integration in its own goroutine, and the goroutines are assigned to runners for work. By default, there are 4 runners and each instance is scheduled to run every 15 seconds. The GIL is managed from Go, and while there is “concurrency” and performance is better than without it, there is no parallelism: execution can ping-pong rapidly between instances based on syscalls and other heuristics, but only one instance can ever be running at any given moment.
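To make that scheduling model concrete for Python readers, here is a rough analogue using threads and a queue. The real Agent schedules goroutines from Go via rtloader; `run_checks` and the check callables below are hypothetical. With the GIL, CPU-bound checks still execute one at a time no matter how many runners there are; on a no-GIL build the same structure could run them in parallel.

```python
import queue
import threading


def run_checks(checks, runners=4, rounds=1):
    """Hypothetical model: `runners` threads drain a shared queue of check instances.

    checks: list of (name, zero-argument callable) pairs standing in for
    integration instances; each round queues every check once.
    """
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def runner():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this runner
                return
            name, check = item
            value = check()
            with results_lock:
                results.append((name, value))

    threads = [threading.Thread(target=runner) for _ in range(runners)]
    for t in threads:
        t.start()
    for _ in range(rounds):
        for item in checks:
            work.put(item)
    for _ in threads:
        work.put(None)  # one sentinel per runner, queued after all real work
    for t in threads:
        t.join()
    return results
```

Because the queue is FIFO and the sentinels are enqueued last, every real check is handed out before any runner is told to stop.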

Being limited by the GIL has had a negative, material impact on us. Occasionally, a customer’s environment is so large that we simply have to rewrite the integration in Go. Other times a customer has to work out how to best spread the load between different configured instances of an integration or even run multiple hosts to distribute load to different Agents. Every time I hear of a performance issue for a large customer that cannot be resolved I think “darn, a rewrite is coming” or I just feel sad that so much compute is wasted working around the lack of parallelism.

Having the option to remove the GIL (and hopefully, eventually, having that be the default and only mode) would make our use case actually work without hardship for customers and us engineers. Additionally, although I’m not allowed to say how many Agents are deployed at the moment, nor can I give my savings estimate for fear of implying the former, the optimization of our resource usage would have a very positive impact on energy consumption and the environment, for those who are concerned about that. We are still growing rapidly, so the environmental impact would grow commensurately.

If it is optional for some time rather than the default and only, dependencies that provide extension modules would have minimal impact for us since we already have a build system that can do everything from scratch and is not necessarily reliant on provided wheels. To be clear, my preference would be for this to be the default (Python 4?).

Some notes:

  1. I don’t have much insight into our backend, but much of it is still written in Python (less so over time), and we would benefit there just as much as Instagram and other organizations.
  2. I write a lot of CLIs, and single-threaded performance would help me the most, so please note that my advocacy for proper parallelism runs directly against my personal preference; I advocate for it because I view it as the best path for the long-term future of Python.

Thanks, Eric!

Does this mean it will be possible to declare an object graph (e.g., a dictionary with many complex objects) read-only in many subinterpreters and modify it in one other subinterpreter?

Or would it need to be “frozen” for all subinterpreters before being shared? As long as the sharing has little overhead, that could be repeated often, so could be workable as well.

In the end, I guess everything in this space is a discussion of who bears overheads: CPU and memory, core devs, extension devs, application devs / language users, and also documenters/teachers.

So far, Python’s low conceptual overhead placed a tremendous burden on core devs and (low-level) extension devs to the great benefit of language users and application devs. Perhaps this share of the burden is no longer fair in the world of many cores, given the maturity of the language and the huge ecosystem it grew.

:wave: Hello from Pantsbuild! (Mentioned above by @davidhewitt, PyO3’s author.)
I couldn’t tell how community feedback was expected to be collected, so I’ll put my little blurb here.

Our project uses a Rust engine that leverages tokio’s async event loop and wires it up to Python’s. The engine itself is implemented as an extension module, and the application is a Python app. The only reason we still use Python is that it’s a REALLY great language for rapid development and readability. We just couldn’t expect our wonderful plugin authors to write everything in Rust.

At the boundary, we’re consuming deep, read-only Python data structures. We leverage PyO3 to make that simple. (An interpreter pool could work if the interpreters could read shared memory AND PyO3 could consume it, but that still wouldn’t be ideal.)

Python was also the first language supported in our build system in V2, and I’d argue we have best-in-class monorepo support for Python. So, we <3 Python.

The GIL is one of the reasons we see poor multi-core performance in a lot of scenarios. When we measure those, we port them to Rust: not because Rust is fast, but because we can do work outside of the GIL.

I can likely make a very strong case for porting the Rust code back to Python if the GIL weren’t holding us back. Pure-Python code is easier for everyone to read, we wouldn’t have to split ourselves between Rust and .pyi files to get good type support, and development would be faster.

Lastly, our use case doesn’t rely on third-party libraries (we do today, but those can easily be removed), so we’d immediately benefit from a no-GIL CPython (once PyO3 support was merged). I actually tried it out, but hit a few bugs/snags. I’m eager to try it again.

So, there’s our data point :smile:


For sure, yes. But even more important: with typing already in place, new code can be written directly in Python and compiled to native code. Pure Python code can’t release the GIL the way extensions can, but if there is no GIL and the code can be compiled (or JIT-compiled), many libraries will be written directly in Python.

mypyc is already a good example and many other compilers would surely pop up.

Let’s envision a future in which most of Python is written in Python itself until it can 100% bootstrap itself.


Yes, because most of the C code that I write leans heavily on the GIL for its own sanity. Given the choice between implementing my own fine-grained locking vs. assuming/hoping/dreaming that someone will make ctypes totally safe, I’d probably rest my hopes on someone adopting ctypes maintenance. (I’m not doing perf-sensitive stuff mostly, it’s OS integration.)


I’ve been revisiting this thread and had some additional thoughts/questions [1]:

I’m curious why? Do you think some people will prefer to keep the GIL around for simplicity, or that single-threaded performance will always remain better with it enabled, or some other reason?

I would think the ideal would be “no GIL, but also no performance hit for single-threaded code”, in which case there’s no need for a runtime mode. But maybe that isn’t possible [2].

Separate builds are a big infrastructure challenge [3], but from a user perspective it’s nice that they make compatibility crystal-clear: if you’ve released a wheel for nogil, you’re claiming compatibility. Whereas with a runtime flag, I might mix together compatible and incompatible extensions without realizing it (even if there are warnings and documentation and so on, people will do this).

This is all to ask: can people imagine a transition of a split build for a release or two, and then no GIL thereafter (i.e. skip the runtime flag)? To me this seems like the most desirable outcome, and potentially a simpler transition, both for users (if you can install it, it should work) and for developers (release normally for 3.X, develop/test on 3.X-nogil, and be ready for 3.X+1 which is nogil by default).


  1. can you tell I’m excited about this PEP?

  2. if future versions of single-threaded Python rely on the GIL for maximized optimizations

  3. maybe a corporation with nogil aspirations could pay the bot bill…? :wink:

Not necessarily. For instance, will pure Python projects be expected to declare they support free threading?


@colesbury Speaking of packaging and declaring support: after re-reading the PEP this weekend, I assume the expectation is that we will have to define an abi4 or abi3t to signal a stable ABI that supports no-GIL? (FYI, as an ABI suffix I prefer t for “threading” over n for “no GIL”.)

The PEP also means extension module wheels will double the number of CPython wheels they produce if they want to support both with and without a GIL, correct?

And what’s your suggestion on how to tell if you are running a CPython with or without the GIL? Via sysconfig or some other mechanism (i.e., I didn’t see any mention of some change to sys to expose this)? Up until this point you could tell e.g. CPython from PyPy by the binary name as well as the feature version, but that isn’t the case here.


I was assuming that a pure Python project is not going to have issues. This is almost the definition of nogil “working” at all, in my eyes: correctly run Python code. As I understand it [1], if a pure Python project is working now with the GIL, it shouldn’t have a problem without it. As H Vetinari pointed out, it’s totally possible that there are some deep assumptions somewhere in CPython that make this untrue, but it doesn’t sound like they’ve been uncovered by the testing so far.

For more complicated dependency graphs, it seems like there’s a transitive property. If I need A, and A depends on B, … and Z is a C extension that hasn’t been released for nogil, then I can’t install A because pip won’t resolve the dependencies.
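The transitive rule can be sketched as a graph walk: A is installable on a nogil build only if nothing reachable from it is blocked. The graph, the package names, and the `nogil_blockers` helper below are hypothetical, and real resolvers such as pip do far more (versions, markers, backtracking).

```python
def nogil_blockers(pkg, deps, ready):
    """Return every package reachable from `pkg` (including itself) that is not nogil-ready.

    deps:  mapping of package name -> direct dependencies (hypothetical graph)
    ready: packages that are pure Python or have published nogil wheels
    """
    seen, stack, blocked = set(), [pkg], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; this also protects against cycles
        seen.add(name)
        if name not in ready:
            blocked.add(name)
        stack.extend(deps.get(name, ()))
    return blocked


deps = {"A": ["B"], "B": ["Z"], "Z": []}
print(nogil_blockers("A", deps, ready={"A", "B"}))       # {'Z'}
print(nogil_blockers("A", deps, ready={"A", "B", "Z"}))  # set()
```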

If we imagine there were no compatibility concerns at all, and it was just a breaking change to the ABI that required all extensions to be rebuilt, that would be the scenario, right? Everything in pure Python still works but anything that depends on a compiled extension has to wait until it’s available.


  1. e.g. based on Sam’s post earlier


Hmm, if you can’t tell from the running interpreter that it’s a nogil build, how can packaging build backends know to build a nogil version of a package for that interpreter? Pip, for example, will try to build a wheel if one doesn’t exist, and it does that using default build settings, so there’s a built-in assumption that “build a wheel” will by default build one that’s compatible with the running Python.

Also, packaging tools need to know if the running python is nogil in order to determine what wheels are compatible with this interpreter…


This point is crucial: are pip and related infrastructure ready to serve a dual-ABI world? That also breaks an ancient assumption. I brought it up in the other thread, but didn’t get much response.

Given the situation in the packaging space (maintainers far above capacity, huge amount of complexity / tooling diversity, many long-standing & severe painpoints, lots of legacy cruft, etc.), I doubt that trying to push such a huge change through this bottleneck will be helpful[1], i.e. this should factor into the decision of whether to aim for parallel ABIs or version-separated ones (i.e. Python 3.x=with gil / Python 4.x=nogil).

It doesn’t need “deep assumptions somewhere in CPython” for pure python code to be possibly affected.[2]

Yes, and that’s entirely reasonable. If you depend on Z, and it’s not nogil-safe, you cannot use it in a nogil-only context without stuff potentially blowing up. The situation does explain the desire towards the runtime switch though (or more generally, running gil & nogil code in the same interpreter). If that encapsulation could somehow be made fool-proof, then you could just[3] run Z with the gil, but keep all the other dependencies running without it.

To be clear – I want nogil to succeed – but I’m trying to point out how large the surface for potential problems is (which explains a large part of the hesitation in this thread). There’s also a severe mismatch here (well, even more than usual) in how much the natural desire of users for an obvious improvement will translate into a huge chunk of work for volunteer maintainers all over. It’s great that there’s so much excitement about it, but more often than not, that excitement will turn into impatience rather than someone assisting the maintainers in climbing down some really deep debugging rabbit holes.


  1. but then, as someone who’s very involved in packaging, I acknowledge my biases on this

  2. In a footnote because I’m repeating myself: it’s enough to rely on non-trivial third-party dependencies, or do some things (like IO) where threading might change observable behaviour (especially losing some implicit ordering in handling external resources like files, opening the possibility for data races, concurrent access problems, etc.); aside from the fact that even pure python libraries might not be ready for being called in parallel from other, now suddenly nogil, python code.

  3. a very load-bearing “just”


I’m trying to understand the boundaries of that surface, partially because I’m just curious but mostly because I think defining that surface more precisely would be useful for this discussion. It’s somewhere between “everything could break” and “extensions using the ABI need an update” and narrowing down that range is helpful.

To be clear, I think this is a feature of having a different ABI, not a problem[1].

I’m just trying to understand the precise failure modes here [2].

  • If my code doesn’t use any threading now, it shouldn’t change behavior as long as the dependencies work. nogil Python isn’t going to change the ordering of what I’m doing in a single-threaded program.
  • If my dependencies work and they aren’t compiled extensions, they should continue to work whether or not they use threading, because a broken use of threading is already broken. The existing threading tools allow for arbitrary execution of different threads, and anyone using them has to account for this with the appropriate design, locks, etc.
  • If I need dependencies with compiled extensions, they might have issues with nogil, but they will need to be updated and released for nogil before I can worry about them.
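To illustrate the second bullet: code that shares mutable state between threads already needs synchronization under the GIL, because the interpreter can switch threads between bytecodes. A minimal sketch (`Counter` and `hammer` are illustrative names, not from any library):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A shared counter that is safe to increment from many threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read-modify-write. Without the lock, a thread
        # switch between the read and the write can lose an update, with or
        # without the GIL; the lock is what makes this correct in both cases.
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    def work(_):
        for _ in range(per_thread):
            counter.increment()

    with ThreadPoolExecutor(threads) as exc:
        list(exc.map(work, range(threads)))
    return counter.value


print(hammer(Counter()))  # 80000
```

Code written this way is already correct under the GIL and stays correct without it, which is the point of the bullet above.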

Regarding “this library isn’t safe to call from a multi-threaded context”, it seems like that would have to already be true, and it’s just that no one is doing it right now. In my opinion that doesn’t mean the package is broken by this PEP, but the interest in adding this feature would increase. I already encounter this all the time in the multiprocessing context [3]. It can be annoying but I don’t usually consider it a bug in the library.
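A small illustration of the multiprocessing pain point: anything sent to a subprocess has to survive `pickle`, while threads share objects directly. `can_pickle` and `guarded_double` are illustrative helpers; the lock is just a convenient example of an unpicklable object.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


def can_pickle(obj):
    """Return True if obj survives pickle.dumps (required to send it to a subprocess)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


lock = threading.Lock()
print(can_pickle([1, 2, 3]))  # True
print(can_pickle(lock))       # False: locks cannot be pickled


def guarded_double(x):
    # Worker threads share the interpreter's objects, so the unpicklable
    # lock is used directly, with no serialization step.
    with lock:
        return 2 * x


with ThreadPoolExecutor(4) as exc:
    print(list(exc.map(guarded_double, [1, 2, 3])))  # [2, 4, 6]
```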

If there are issues beyond the above, I guess that’s what I meant by “deep”, in that they are not something the package developer realized they were relying on and they were (probably?) not documented behavior. If there are data races lurking in CPython that are exposed by (otherwise correct) multithreading code [4], they have yet to be found by Sam Gross or the others who have been using nogil thus far. Therefore, pretty “deep”, in my eyes. That’s not to say they don’t exist.

I understand the hesitation but it’s also a little funny to me. It seems…good? to introduce language features that are so popular that users clamor for them to be implemented. That seems like the sign of a good PEP, and more generally the sign of a good major version.

For what it’s worth, I solemnly swear to help debug and contribute PRs to any of my dependencies that need to be adapted. :sweat_smile:


  1. the packaging question is a good one though, and strengthens the argument for version separation

  2. at the very least, I need to understand this for my own code!

  3. because many objects can’t be pickled and passed to subprocesses

  4. and again, this should cover compatibility for any set of pure-Python packages


Pure Python code will not require changes.

I’d expect an abi4 by the time we switch to a single build, but not immediately.

Extension modules would produce two wheels per platform for 3.13 but that would not lead to doubling the total number of wheels, because extensions typically build for multiple previous Python releases. My expectation is that the two build modes would only be for 2-3 CPython releases, as described in the Python Build Modes section. (And based on comments, I think a single build mode after 2 releases is a better goal than after 3 releases.)
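The arithmetic behind "two wheels per platform, but not double overall" can be sketched as below. The supported-version lists and the platform count are hypothetical, chosen only to show the ratio; `wheel_count` is an illustrative helper.

```python
def wheel_count(versions, dual_mode_versions, platforms):
    # One wheel per (CPython version, platform), plus an extra no-GIL wheel per
    # platform for each version that ships both build modes.
    return (len(versions) + len(dual_mode_versions)) * platforms


supported = ["3.10", "3.11", "3.12", "3.13"]
platforms = 5  # hypothetical platform count

baseline = wheel_count(supported, [], platforms)          # 20
with_nogil = wheel_count(supported, ["3.13"], platforms)  # 25
print(with_nogil / baseline)  # 1.25

# A longer transition in which every supported version ships both modes.
later = ["3.13", "3.14", "3.15", "3.16"]
print(wheel_count(later, later, platforms) / wheel_count(later, [], platforms))  # 2.0
```

So the increase stays partial while only the newest releases are dual-mode, and approaches doubling only if both modes persist across the whole support window.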

  • C code (compile time): The Py_NOGIL macro is defined.
  • Build: For building extensions, etc., the appropriate information is automatically propagated to sysconfig variables when the ABI flags are changed. These are used by setuptools, for example (e.g., sysconfig.get_config_var('EXT_SUFFIX')).
  • Run time: The reference implementations provide sys.flags.nogil to indicate whether the GIL is enabled at run time, but that is not specified in the PEP.
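A best-effort detection helper along these lines might look as follows. It reads `sys.flags.nogil` only if present (the reference implementation's flag, which the PEP does not specify) and also checks the names that released free-threaded CPython 3.13+ builds expose: the `Py_GIL_DISABLED` sysconfig variable and the private `sys._is_gil_enabled()`. `gil_status` itself is a hypothetical helper; every lookup falls back to `None` when the name does not exist.

```python
import sys
import sysconfig


def gil_status():
    """Best-effort report of how this interpreter was built and whether the GIL is on."""
    # Reference-implementation flag (not part of the PEP); absent on regular builds.
    nogil_flag = getattr(sys.flags, "nogil", None)
    # Set to 1 on free-threaded CPython 3.13+ builds; None or 0 elsewhere.
    gil_disabled_build = sysconfig.get_config_var("Py_GIL_DISABLED")
    # Private helper added in CPython 3.13; missing on older versions.
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    return {
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        "sys.flags.nogil": nogil_flag,
        "Py_GIL_DISABLED": gil_disabled_build,
        "gil_enabled_now": is_enabled() if is_enabled else None,
    }


print(gil_status())
```

Build tools would care mostly about the build-time values (`EXT_SUFFIX`, `Py_GIL_DISABLED`), while the run-time value matters for code deciding whether threading will actually run in parallel.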

So in the interim, when there are two build options, and projects will want to publish wheels for both nogil and gil builds, what would the appropriate wheel tags be for the nogil build? The tags are Python interpreter, abi and platform, as defined here. If there isn’t a different abi, which tag will distinguish?

As described in the Build Configuration Changes section, the ABI tag includes “n” for “no GIL”. For example,
cp313-cp313n-PLATFORM. (Or it could be “t” for “threading” if Brett prefers.)
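Under that proposal, a tool could tell a no-GIL wheel apart purely from its filename. The sketch below assumes the trailing "n" ABI flag described here ("t" being the floated alternative); the package name and filenames are made up, and `parse_wheel_tags`/`is_free_threaded_abi` are illustrative helpers rather than the `packaging` library's API.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def parse_wheel_tags(filename: str) -> WheelTags:
    # Wheel filenames end in {python tag}-{abi tag}-{platform tag}.whl; an
    # optional build tag may precede them, so take the last three fields.
    stem = filename.removesuffix(".whl")
    python, abi, platform = stem.split("-")[-3:]
    return WheelTags(python, abi, platform)


def is_free_threaded_abi(abi: str, flag: str = "n") -> bool:
    # Assumption: the no-GIL marker is a trailing flag on a CPython ABI tag.
    return abi.startswith("cp") and abi.endswith(flag)


tags = parse_wheel_tags("example_pkg-1.0-cp313-cp313n-manylinux_2_17_x86_64.whl")
print(tags.abi, is_free_threaded_abi(tags.abi))  # cp313n True
```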

Brett’s question was about a new stable ABI.


I guess there’s a bit of misunderstanding here. With nogil, you can actually leverage multithreading for performance, which implies more people will probably start using threading than today, which in turn implies there will be some pressure on existing pure-Python libraries that aren’t thread-safe today [1] to change in this respect. Therefore, even if the behavior of pure Python code does not substantially change, we can expect to see a shift towards multithreading, and it could be useful to have, e.g., a PyPI classifier “Tested in multicore scenarios” [2].
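As an example of the kind of change such pressure might prompt in a pure-Python library: a lazily built shared table is a classic check-then-act race, where two threads can both see "not built yet" and both build it. This is buggy behavior, not undefined behavior, and a lock fixes it. `LazyRegistry` is a hypothetical class for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyRegistry:
    """Builds an expensive shared table once, even if many threads ask at once."""

    def __init__(self, build):
        self._build = build
        self._table = None
        self._lock = threading.Lock()

    def get(self):
        # Without the lock, two threads could both observe None and both call
        # build(): a check-then-act race in plain Python, no C code involved.
        with self._lock:
            if self._table is None:
                self._table = self._build()
            return self._table


calls = []


def build():
    calls.append(1)
    return {"answer": 42}


reg = LazyRegistry(build)
with ThreadPoolExecutor(8) as exc:
    tables = list(exc.map(lambda _: reg.get(), range(32)))
print(len(calls))  # 1
```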

Ultimately, though, what the expectations from the wider community towards libraries will be is not something the core team has control over.

Personally I don’t view these possible expectations as a problem; it is similar to how you don’t need to add type annotations to your code if you don’t want to (and it still works the same) but some users will request them and if you don’t want them, you’re free to just turn down the request. I think the ecosystem has lived fine with this principle on typing so far.


  1. For some definition of “thread-safe”. Of course, pure Python code doesn’t cause undefined behavior (UB), and won’t with nogil either, but it can exhibit buggy behavior in multithreaded contexts.

  2. or another bikeshed color


Probably. As long as the right things are exposed in the right places it will just fall through via sysconfig and other packages that have to figure this stuff out.

Correct, that’s more my point. For instance, I have never once concerned myself with making any of my code on PyPI thread-safe, and in a no-GIL world I will very likely need to start caring.

I admittedly don’t share your optimism that the transition could happen in 2-3 years/releases, so in my head this takes 5 years and thus you do end up in a position where it’s doubled. Regardless, the key point is it’s increasing the number of wheels, not decreasing.


I doubt that similarity holds. Typing will stay optional in perpetuity (unless Python radically changes its character), while nogil is clearly aimed at becoming the new default.

Also, type annotations are a somewhat subjective trade-off for an improved development experience, and (IMO) they inspire nowhere near the level of impatience that tantalizingly unrealized :sparkles:performance:sparkles: gains do.

Do you really think numpy / pandas would be free to “just turn down the request” (and imagine the reaction if that were to happen)? It’s a hypothetical example because the devs there are working very hard to stay abreast of all CPython changes (and Sam already ported them for his fork), but for moderately widespread packages, I think the pressure will be enormous.


I’ve been thinking about how to reduce that pressure (and the runtime-switch discussion), and was wondering if there could be a mechanism for packages to opt out of nogil completely, even if they publish wheels built for the new ABI? E.g. a module-level setting along the lines of “all calls in the namespace of our package need to be protected by the GIL, even if the rest of CPython and other libraries aren’t”.

That way, users would be able to install everything they want, and those packages declaring themselves nogil-ready would be able to realize those benefits, while those packages that aren’t ready would have time to work on it without depriving users of the gains in other libraries.

Basically, trying to uncouple the new ABI (+ the multi-threading safety review, etc.) from the dependency requirements of users – making it less “all or nothing”, which to me sounds like a recipe for frayed tempers.
