PEP 703: Making the Global Interpreter Lock Optional (3.12 updates)

FWIW, from a technical standpoint, subinterpreters would work fine right now for sharing a large set of read-only data. However, from a practical standpoint, the Python 3.12 stdlib won’t have a good way to take advantage of that. I do expect 3.13 to have such tools, via PEP 554. (There will probably be good tools for 3.12 on PyPI within the next year too.)

1 Like

I know this is going to sound a bit entitled, but I would like to see some folks from the scientific and financial python communities band together and show how Python the language and the stdlib can be improved in the noGIL fork. Given the enthusiasm for this PEP (and the lamenting that it isn’t accepted yet) I’m surprised no one from those communities have suggested so in this thread already. For you the benefits might be obvious but unless you provide evidence that it can benefit all Python users there is still room for doubt. I know I would very much like to see this and I am mostly in favor of this PEP!

I was thinking of a feature inspired by what @ntessore mentioned earlier: a par soft keyword. This keyword would be put in front of a for block to tell python that the loop should be run concurrently if possible (in the CPython case using threads but it should be up to the implementation to decide on the concurrency model). Implementing this based on the noGIL fork and showing that it can give immediate benefit for certain tasks would be a huge benefit to this PEP. I actually want to do this myself but I don’t know enough about CPython nor the noGIL fork to do it without considerable effort.

3 Likes

Do you mean sharing data that is purely binary/ can be treated as a buffer, like numpy arrays? or any set of python objects - I think Mark Shannon alluded to the fact that the latter is presumably harder. But serializing and deserializing objects would make subinterpreters less performance competitive and have other hurdles.

I assume there is the issue that there isn’t a well-defined way to check or guarantee a general object is “read only/immutable”.

I finally found some time to read through PEP 703 in detail. It’s well written and covers a lot of ground effectively. Thanks for such diligent work, @colesbury! There are a small number of gaps, which I’ll note below. I didn’t notice any strictly incorrect information. I do have some questions. I’ll also provide a few related insights and recommendations. Thanks for all the great work!


Here are some initial observations/recommendations/corrections which would be worth addressing in the PEP:

  • GC: stop-the-world should only affect the current interpreter
  • the PEP should clearly enumerate every way it might break interpreter isolation, if any
  • PyThreadState: a status field already exists (AKA _status)
  • PyThreadState: we could track attached/detached status right now (AKA “holds_gil”)
  • some exclusions from the limited API seem problematic (e.g. critical sections)
  • consider adding Py_mod_gil_used for the Py_mod_gil slot, to let people be explicit
  • enabling the GIL (due to ext. module/$PYTHONGIL) should only affect the current interpreter
  • “Backward Compatibility”: ABI incompatibility is more than PyObject header (e.g. PyThreadState, PyTypeObject)
  • “Backward Compatibility”: clarify that “global state” includes per-interpreter module state
  • “Build Bots”: FYI, the buildbot fleet is already lagging behind demand, so ISTM doubling the demand will require a lot of new resources and supervision
  • “How to Teach This”: part of the purpose of “How to Teach This” is to identify the various things that would need to be taught, which would benefit readers of this PEP

Likewise, here are some questions:

  • how are tp_version_tag and keys_version kept thread-safe?
  • how to relax restrictions on specialization?
  • are single-phase init (legacy) extensions allowed? (presumably yes, and treated same as missing Py_mod_gil_not_used)
  • how to minimize burden on extension maintainers? (very similar to the concern with per-interpreter GIL)
  • are there other global no-gil details that should be per-interpreter?
  • does PyThread_Swap() work the same with no-gil? (i.e. same thread, different thread states, maybe different interpreters) (breaking this would be a serious problem)
  • just to restate, does the per-thread state (thread-local var?) accommodate swapping different PyThreadStates in and out (whether same interpreter or different interpreters)?
  • extensions: how to ensure thread-safety for global state in external libraries? (basically same as with per-interpreter GIL)
  • “Backward Compatibility”: how can we be crystal clear about which borrowed reference APIs are safe? (alt name + macro? hide old name?)
  • “Backward Compatibility”: does the PyMem_SetAllocator() restriction only apply to object allocator? (i.e. are “raw” & “mem” allocators left alone?)
  • “Backward Compatibility”: who might be affected by the restriction on PyMem_SetAllocator()?
  • “Backward Compatibility”: is it okay to replace the actual “malloc” (not via PyMem_SetAllocator)?
  • “Backward Compatibility”: how to enforce restrictions on when the object allocator should be used?
  • “Build Bots”: how to support double(?) the demand?
  • “Build Bots”: who is going to set up and admin the new buildbots?
  • “Rejected Ideas”: how would introducing write barriers break the C-API? (this is relevant to recent C-API discussions)
  • what would it take to build a python binary with both modes provided?
  • could no-gil be done as a single ABI?
  • “Integration”: merge base mimalloc sooner, regardless of no-gil?
  • “Integration”: what other parts might we want to merge, regardless of no-gil?
  • “Integration”: in what other ways could the no-gil patch be split up? (seems like a lot of it is pretty deeply coupled)

Here’s something we should do in 3.12+ from which no-gil can benefit:

Let’s rename the “own_gil” field of PyInterpreterConfig to just “gil”. It’s currently a bool, so we’d have two values: “OWN” and “SHARED”. With no-gil we’d add two more values: “NEVER” (strictly disallow unsupported extensions, AKA PYTHONGIL=0) and “NO”. What would PYTHONGIL=1 match?

(This should be done in 3.12 since PyInterpreterConfig is new, so we wouldn’t have to break compatibility or deprecate it. I’ll make the change right away if our dear 3.12 release manager, @thomas, is okay with it. See gh-105603.)


Here’s one idea on how to build a single python binary with both modes provided (hence no ABI incompatibility for extensions and no new ABI tag):

  • use code generation to produce a distinct version of each affected C-API/ABI item
  • add macros matching the existing C-API for the relevant symbols/typedefs
  • use the build-time flag to dictate which version of the functions, etc. the corresponding macros point to

(There may be good reasons not to do this, or it might be viable. I didn’t spend much time thinking it through, but at first glance it seemed in the direction of reasonable.)


Here are some questions related to enabling the GIL at runtime:

  • what extra work (overhead) is skipped after the GIL is enabled? what extra work sticks around?
  • how to enable GIL without keeping any of the overhead?
  • can the GIL be re-disabled after it is enabled (e.g. by single-phase init extension)?
  • “Open Issues”: how would a runtime-controlled no-gil be different from the proposed PYTHONGIL/Py_mod_gil effect?

Related, how hard would it be to make no-gil selectable at runtime for each interpreter separately? That would be especially useful if the mode with the GIL were to use the code that doesn’t have any of the no-gil overhead (e.g. via generated code like conjectured above). A per-interpreter no-gil would allow folks to mix and match more safely, I think. It could facilitate faster adoption and less pressure on extension maintainers.


Finally, here’s one last thing to consider (one which echoes what I’ve heard others say on this thread and elsewhere):

The PEP is a bit timid about the feature’s end game. I’m particularly referring to the “Python Build Modes” Open Issue. That position’s understandable given past feedback from core developers. However, given the potential costs for core development and for the community, an explicit commitment in the PEP to the final goal is important.

Let’s be honest. Adding the --disable-gil build flag doesn’t make sense if the intention isn’t to eventually make no-gil the default/only runtime option. Why should the community invest [valuable, mostly volunteer] time to supporting no-gil if there’s even a remote chance it gets yanked? It would be “too big to fail”. Furthermore, I expect the maintenance burden of the two build/runtime modes will be enough that the core team will not want to support both forever.

To me, it’s clear that no-gil would never get yanked once we start down the path, so the PEP should be explicit about that expected outcome. Otherwise I think you’re being a bit disingenuous.

Along those lines, it seems like a runtime mode would be inevitable. The current proposal is for a build-time flag, with a runtime-switching fallback. I recognize that this approach is at least partly at the recommendation of the core devs. However, if we’re not going back once we have it, and the runtime mode is inevitable, then why go through all the pain that the build mode approach introduces? If the PEP doesn’t change on this then it should explain why we should take on the burden of a second build mode when we are planning to converge on just one eventually.

17 Likes

Yeah, I was referring to read-only buffers (or other read-only, non-object data). Solving that more generally is yet to be done, but I expect something effective for 3.13.

I suppose that depends on the data and on the serialization method. It doesn’t have to be pickle. That said, it’s better if there’s no serialization/copying involved at all. I doubt anyone disagrees on that.

Not yet, anyway. :smile:

True, but obviously switches are less frequent today due to how the GIL works and lets be honest, less people use threading today than they might in a no-GIL world. :sweat_smile: So I think people have been able to naively get away with e.g. module-level state and not worry about race conditions in their library code.

Just because your code is single-threaded it doesn’t mean some library code isn’t multi-threaded. At the pure Python level, the failure mode is something has shared state (probably module-level) that isn’t protected by a lock that is suddenly used in a multi-threaded context. And the only way to fix that is either get it fixed upstream or to monkeypatch the data structure to lock as appropriate (and then this gets back to whether the community already has or is ready to build tooling like ways to lock data structures for this sort of scenario).

To be clear, we don’t know what is “realistic” yet to then ask Sam to add it to the PEP.

Something I don’t think everyone realizes is this PEP was only submitted to the SC on May 12, which is less than a month from today. And there was already things we had to discuss in terms of other PEPs, etc. on the SC before we even get to this PEP. So while some people have pointed out how long work has been going on, there are multiple steps towards this PEP’s eventual acceptance/rejection, and the step of sending it to the SC is rather recent.

But there’s a range there of how long it will take to eliminate the bugs, how bad the bugs may be, and how many bugs there will be. That’s the risk assessment we will have to make to determine the cost:benefit ratio of this and why it’s not a simple “all things have bugs” comparison.


Here’s a question I have been thinking about (and polled about on Mastodon): would no GIL cause you to rewrite C (or other native language) code in Python? We have seen responses from people who have said “I don’t want things to get slower”, “I would hope libraries I use will take advantage”, or “I will directly take advantage of this”. And I would hope people who benchmark their code before reaching for C or Rust will then use those languages less with this work as they will be able to get more performance to begin with. But one response I haven’t seen is folks saying they would actually be willing to remove C or Rust code for Python code if there wasn’t a GIL.

5 Likes

I tried to do something like that above but a) I didn’t test it very well and b) I didn’t explain it well. I think those use who reach for multiprocessing on a regular basis [1] are a bit dazzled by the possibilities here and not taking the time to make a argument for general python. :star_struck:

I don’t think the idea needs a keyword–it’s just a parallel version of itertools (a Python equivalent to Rust’s rayon). These functions don’t protect you from shooting yourself in the foot but for lots of common tasks (like “iterate over a million things and do a thing to each one”) it should be a drop-in replacement.

I realized my previous example was partially hamstrung by using a 2-vCPU machine (which is one physical CPU). On an n2d-standard-8 (4 physical CPUs), I can use the parmap defined in the previous post to parallelize a map operation and get about a 4x speedup vs python 3.11.

Test details

I cloned nogil 3.12 and built it, and installed python 3.11 from conda-forge (I could clone python 3.12 and build that, but this was simpler). In each I just ran an interpreter and defined:

>>> from concurrent.futures import ThreadPoolExecutor
>>> from time import time
>>> def parmap(fn, *args, n=8):
...     with ThreadPoolExecutor(n) as exc:
...             yield from exc.map(fn, *args)
... 
>>> def a(n):
...     for i in range(n):
...             i += 1
... 
>>> m = [1_000_000] * 100

Then I just timed the difference between list(map(a, m)) and list(parmap(a, m))

Results:

command py311 nogil nogil / py311
a(1_000_000) 0.0214 0.0231 1.08
list(map(a, m)) 2.123 2.577 1.21
list(parmap(a, m)) 2.191 0.545 0.249

It’s curious that the plain map is so much slower when the single function call is only 8% slower [2] but I hope this demonstration is somewhat useful for illustrating why nogil has the potential to be broadly useful.


  1. people using python for science, finance, ML, etc ↩︎

  2. which is still nothing to ignore, for sure ↩︎

6 Likes

Thank you for the example.

Consider this quote from Charlie Marsh (ruff author)

The “fearless concurrency” that you get with Rust is a big one though. Ruff has a really simple parallelism model right now (each file is a separate task), but even that goes a long way. I always found Python’s multi-processing to be really challenging – hard to get right, but also, the performance characteristics are often confusing and unintuitive to me.

One could wonder if ruff would even be needed in a world where you could lint one file per thread on multi-core python

I’ll just preface this by saying that I greatly appreciate all the work that’s gone into this so far and I also appreciate the time taken to talk to randos like me who wandered in here.

That’s definitely true. But if I’m dependent on a library that’s multi-threaded, surely I have to wait until they release a nogil version? If their nogil version has problems, that’s a bug in their library, not a problem with mine.

I recognize that the transition could be painful [1], it doesn’t feel fair to count every dependency of a dependency as needing a big lift to be compatible (any more than the usual effort needed for a major release, for instance)

And to be clear, I wasn’t making a demand or anything. I just thought that was a useful discussion to have (as Eric Snow also gets into above).


  1. and I think more info about how some major packages had to be modified would be really useful! ↩︎

I am also mostly a bystander here but I would like to express how much this would help Datadog (the company I work for) and its customers. I will be speaking in a personal capacity and therefore anything I say is not necessarily endorsed by my employer.

The source of most of the data we collect comes from our Agent that users install on their infrastructure. This collects metrics, events, logs, traces, process information, network information (eBPF), and more that I don’t feel like enumerating (and definitely other stuff that I don’t even know about). This software runs on an extremely large number of hosts across the globe for many customers (small subset) whose names would be familiar to the average person anywhere.

I work on a team that creates and maintains integrations that are shipped OOTB with the Agent. These could be for databases like Postgres, web servers like NGINX, Kubernetes running anywhere, hypervisors like vSphere, IaaS cloud offerings like Azure IoT Edge, SaaS cloud offerings like Amazon MSK (Apache Kafka), hardware devices like Cisco routers, system internals like disk partitions or Windows services, broadly used things like TLS/certificates, etc.

Basically, for anything our customers care about, it is my job to find a way to extract meaningful data :slightly_smiling_face:

Most of these Agent integrations are written in Python, with each being its own (namespaced) package. While the Datadog Agent is written in Go, integrations aren’t usually for more rapid maintenance and to make it easier for customers since they can not only contribute but also run custom integrations just for their own use cases.

A single Agent will often have enabled at least a dozen integrations, usually with many instances of each. This level of concurrency with the resource usage required is directly hindered by the GIL and the way we ameliorate the constraint is hacky.

We wrote a little about connecting Go to Python here. This component is called the rtloader and here is an example of exposing a new binding to integrations (fun fact: CI passing on the first commit and getting merged was by sheer luck because my dev environment broke that morning and I could not build!).

We run each instance of each enabled integration in its own goroutine, which are assigned to a runner for work. By default, the number of runners is set to 4 and each instance is scheduled to run every 15 seconds. The GIL is managed by Go and while there is “concurrency” and performance is better than without, there is not parallelism, meaning although execution of instances can ping-pong back and forth between each other rapidly based on syscalls and other heuristics there can only ever be one running at any given moment.

Being limited by the GIL has had a negative, material impact on us. Occasionally, a customer’s environment is so large that we simply have to rewrite the integration in Go. Other times a customer has to work out how to best spread the load between different configured instances of an integration or even run multiple hosts to distribute load to different Agents. Every time I hear of a performance issue for a large customer that cannot be resolved I think “darn, a rewrite is coming” or I just feel sad that so much compute is wasted working around the lack of parallelism.

Having the option (hopefully default and only eventually) to remove the GIL would make our use case actually work without hardship for customers and us engineers. Additionally, although I’m not allowed to say how many Agents are deployed this moment nor can I give my savings estimate for fear of inferring the former, the optimization of our resource usage would have a very positive impact on energy consumption/the environment, for those who are concerned about that. We are still growing rapidly so the environmental impact would be commensurate to our growth.

If it is optional for some time rather than the default and only, dependencies that provide extension modules would have minimal impact for us since we already have a build system that can do everything from scratch and is not necessarily reliant on provided wheels. To be clear, my preference would be for this to be the default (Python 4?).

Some notes:

  1. I don’t have much insight into our backend but much of it is still written in Python (less so over time) and we would benefit there just as much as Instagram and other organizations
  2. I write a lot of CLIs and single threaded performance would help me the most, so please interpret my advocacy for proper parallelism as in direct opposition to my personal preference because I view that as the best path here for the long term future of Python
14 Likes

Thanks, Eric!

Does this mean it will be possible to declare an object graph (e.g., a dictionary with many complex objects) read-only in many subinterpreters and modify it in one other subinterpreter?

Or would it need to be “frozen” for all subinterpreters before being shared? As long as the sharing has little overhead, that could be repeated often, so could be workable as well.

In the end, I guess everything in this space is a discussion of who bears overheads: CPU and memory, core devs, extension devs, application devs / language users, and also documenters/teachers.

So far, Python’s low conceptual overhead placed a tremendous burden on core devs and (low-level) extension devs to the great benefit of language users and application devs. Perhaps this share of the burden is no longer fair in the world of many cores, given the maturity of the language and the huge ecosystem it grew.

:wave: Hello from Pantsbuild! (Mentioned above by @davidhewitt, PyO3’s author above)
I couldn’t tell how community feedback was expected to be collected, so I’ll put my little blurb here.

Our project uses a Rust engine leveraging tokio’s async event loop and wiring that up to Python’s. The engine itself is implemented as an extension module, and the application is a Python app. The only reason we use Python anymore is because it’s a VERY great language for rapid development and readability. We just couldn’t expect our wonderful plugin authors to write everything in Rust.

At the boundary, we’re consuming read-only deep Python data structures. We leverage PyO3 to make that simple. (An interpreter pool could work if those could read shared memory AND PyO3 consumed it, but still not ideal)

Python was also the first language supported in our build system in V2, and I’d argue we have best-in-class monorepo support for Python. So, we <3 Python.

The GIL is one of the reasons we see poor multi-core performance in a lot of scenarios. When we measure those we port them to Rust. Not because Rust is fast but because we can do work outside of the GIL.

I can likely make a very strong case for porting the Rust code back to Python if the GIL wasn’t holding us back. Having the code be pure-Python means it’s easier to read for everyone, and we don’t split ourselves between Rust and .pyi to get good type support. And faster development.

Lastly, our use case doesn’t rely on 3rdparty libraries (we do today, but those can easily be removed). So we’d immediately benefit from a no-gil CPython (once PyO3 support was merged). I actually tried it out, but hit a few bugs/snags. I’m eager to try it again.

So, there’s our data point :smile:

20 Likes

For sure yes. But even more important: with typing already in place, new code can be directly written in Python and compiled to native code. Pure Python libraries do not disable the GIL as extensions may do, but if the GIL is not in place and the code can be compiled (or JIT-compiled), many libraries will be directly written in Python.

mypyc is already a good example and many other compilers would surely pop up.

Let’s envision a future in which most of Python is written in Python itself until it can 100% bootstrap itself.

3 Likes

Yes, because most of the C code that I write leans heavily on the GIL for its own sanity. Given the choice between implementing my own fine-grained locking vs. assuming/hoping/dreaming that someone will make ctypes totally safe, I’d probably rest my hopes on someone adopting ctypes maintenance. (I’m not doing perf-sensitive stuff mostly, it’s OS integration.)

1 Like

Revisiting this thread and had some additional thoughts/questions [1]

I’m curious why? Do you think some people will prefer to keep the GIL around for simplicity, or that single-threaded performance will always remain better with it enabled, or some other reason?

I would think the ideal would be “no GIL, but also no performance hit for single-threaded code”, in which case there’s no need for a runtime mode. But maybe that isn’t possible [2]

Separate builds is a big infrastructure challenge [3] , but from a user perspective it’s nice that it makes compatibility crystal-clear: if you’ve released a wheel for nogil, you’re claiming compatibility. Whereas with a runtime flag, I might mix together compatible and incompatible extensions without realizing (even if there are warnings and documentation and so on, people will do this).

This is all to ask: can people imagine a transition of a split build for a release or two, and then no GIL thereafter (i.e. skip the runtime flag)? To me this seems like the most desirable outcome, and potentially a simpler transition, both for users (if you can install it, it should work) and for developers (release normally for 3.X, develop/test on 3.X-nogil, and be ready for 3.X+1 which is nogil by default).


  1. can you tell I’m excited about this PEP? ↩︎

  2. if future versions of single-threaded Python rely on the GIL for maximized optimizations ↩︎

  3. maybe a corporation with nogil aspirations could pay the bot bill…? :wink: ↩︎

Not necessarily. For instance, will pure Python projects be expected to declare they support free threading?


@colesbury Speaking of packaging and declaring support, after re-reading the PEP this weekend I assume the expectation is we will have to define an abi4 or abi3t to signal a stable ABI that supports no-GIL (FYI I prefer t or n to signal “threading” over “no GIL” as an ABI suffix)?

The PEP also mean extension module wheels will double the number of CPython wheels they produce if they want to support both with and without a GIL, correct?

And what’s your suggestion on how to tell if you are running a CPython with or without the GIL? Via sysconfig or some other mechanism (i.e., I didn’t see any mention of some change to sys to expose this)? Up until this point you could tell e.g. CPython from PyPy by the binary name as well as the feature version, but that isn’t the case here.

1 Like

I was assuming that a pure Python project is not going to have issues. This is almost the definition of nogil “working” at all, in my eyes: correctly run Python code. As I understand it [1], if a pure Python project is working now with the GIL, it shouldn’t have a problem without it. As H Vetinari pointed out, it’s totally possible that there are some deep assumptions somewhere in CPython that make this untrue, but it doesn’t sound like they’ve been uncovered by the testing so far.

For more complicated dependency graphs, it seems like there’s a transitive property. If I need A, and A depends on B, … and Z is a C extension that hasn’t been released for nogil, then I can’t install A because pip won’t resolve the dependencies.

If we imagine there were no compatibility concerns at all, and it was just a breaking change to the ABI that required all extensions to be rebuilt, that would be the scenario, right? Everything in pure Python still works but anything that depends on a compiled extension has to wait until it’s available.


  1. e.g. based on Sam’s post earlier ↩︎

1 Like

Hmm, if you can’t tell from the running interpreter that it’s a nogil build, how can packaging build backends know to build a nogil version of a package for that interpreter? Pip, for example, will try to build a wheel if one doesn’t exist - and it does that using default build settings, so there’s a built in assumption that “build a wheel” will by default build one that’s compatible with the running python.

Also, packaging tools need to know if the running python is nogil in order to determine what wheels are compatible with this interpreter…

1 Like

This point is crucial – is pip and related infrastructure ready to serve a dual-ABI world? That’s also breaking an ancient assumption. I brought it up in the other thread, but didn’t get much response.

Given the situation in the packaging space (maintainers far above capacity, huge amount of complexity / tooling diversity, many long-standing & severe painpoints, lots of legacy cruft, etc.), I doubt that trying to push such a huge change through this bottleneck will be helpful[1], i.e. this should factor into the decision of whether to aim for parallel ABIs or version-separated ones (i.e. Python 3.x=with gil / Python 4.x=nogil).

It doesn’t need “deep assumptions somewhere in CPython” for pure python code to be possibly affected.[2]

Yes, and that’s entirely reasonable. If you depend on Z, and it’s not nogil-safe, you cannot use it in a nogil-only context without stuff potentially blowing up. The situation does explain the desire towards the runtime switch though (or more generally, running gil & nogil code in the same interpreter). If that encapsulation could somehow be made fool-proof, then you could just[3] run Z with the gil, but keep all the other dependencies running without it.

To be clear – I want nogil to succeed – but I’m trying to point out how large the surface for potential problems is (which explains a large part of the hesitation in this thread). There’s also a severe mismatch here (well, even more than usual) in how much the natural desire of users for an obvious improvement will translate into a huge chunk of work for volunteer maintainers all over. It’s great that there’s so much excitement about it, but more often than not, that excitement will turn into impatience rather than someone assisting the maintainers in climbing down some really deep debugging rabbit holes.


  1. but then, as someone who’s very involved in packaging, I acknowledge my biases on this ↩︎

  2. In a footnote because I’m repeating myself: it’s enough to rely on non-trivial third-party dependencies, or do some things (like IO) where threading might change observable behaviour (especially losing some implicit ordering in handling external resources like files, opening the possibility for data races, concurrent access problems, etc.); aside from the fact that even pure python libraries might not be ready for being called in parallel from other, now suddenly nogil, python code. ↩︎

  3. a very load-bearing “just” ↩︎

3 Likes

I’m trying to understand the boundaries of that surface, partially because I’m just curious but mostly because I think defining that surface more precisely would be useful for this discussion. It’s somewhere between “everything could break” and “extensions using the ABI need an update” and narrowing down that range is helpful.

To be clear, I think this is a feature of having a different ABI, not a problem[1].

I’m just trying to understand the precise failure modes here [2].

  • If my code doesn’t use any threading now, it shouldn’t change behavior as long as the dependencies work. nogil Python isn’t going to change the ordering of what I’m doing in a single-threaded program.
  • If my dependencies work and they aren’t compiled extensions, they should continue to work whether or not they use threading, because a broken use of threading is already broken. The existing threading tools allow for arbitrary execution of different threads, and anyone using them has to account for this with the appropriate design, locks, etc.
  • If I need dependencies with compiled extensions, they might have issues with nogil, but they will need to be updated and released for nogil before I can worry about them.

Regarding “this library isn’t safe to call from a multi-threaded context”, it seems like that would have to already be true, and it’s just that no one is doing it right now. In my opinion that doesn’t mean the package is broken by this PEP, but the interest in adding this feature would increase. I already encounter this all the time in the multiprocessing context [3]. It can be annoying but I don’t usually consider it a bug in the library.

If there are issues beyond the above, I guess that’s what I meant by “deep”, in that they are not something the package developer realized they were relying on and they were (probably?) not documented behavior. If there are data races lurking in CPython that are exposed by (otherwise correct) multithreading code [4], they have yet to be found by Sam Gross or the others who have been using nogil thus far. Therefore, pretty “deep”, in my eyes. That’s not to say they don’t exist.

I understand the hesitation but it’s also a little funny to me. It seems…good? to introduce language features that are so popular that users clamor for them to be implemented. That seems like the sign of a good PEP, and more generally the sign of a good major version.

For what it’s worth, I solemnly swear to help debug/contribute PRs any of my dependencies that need to be adapted. :sweat_smile:


  1. the packaging question is a good one though, and strengthens the argument for version separation ↩︎

  2. at the very least, I need to understand this for my own code! ↩︎

  3. because many objects can’t be pickled and passed to subprocesses ↩︎

  4. and again, this should cover compatibility for any set of pure-Python packages ↩︎

3 Likes