PEP 703 (Making the Global Interpreter Lock Optional in CPython) acceptance

(Posted for the whole Steering Council.)

As we’ve announced before, the Steering Council has decided to accept PEP 703 (Making the Global Interpreter Lock Optional in CPython). We want to make it clear why we’re doing so, and under what expectations.

It is clear to the Steering Council that theoretically, a no-GIL (or free-threaded) Python would be of great benefit, and the majority of the community seems in agreement. Threads have significant downsides and caveats, but they are widely adopted, both by software and hardware, and they do enable more scalable solutions to problems. The GIL clearly inhibits CPython in this, and removing that barrier would be a good thing.

At the same time we’re not sure if it’s possible to remove the GIL without fundamentally breaking all extension modules out there, or significantly reducing the performance or maintainability of CPython. The third-party/PyPI package ecosystem is one of Python’s strengths, and the tight, efficient integration with C libraries is one of CPython’s. It has enabled the existence of a diverse selection of packages that’s a unique selling point for Python. We need to be careful that we do not destroy those benefits, or discard decades’ worth of package development.

Assessing the practical impact, and the practicality of adapting third-party packages to the new free-threaded situation, is difficult without a finished implementation. The unpredictable nature of thread-related issues makes it extra difficult, as some issues won’t show up until put under significant load. The changes necessary to remove the GIL are substantive enough, and require so much coordination with other CPython development happening at the same time, that we can’t reasonably do these experiments in a fork of CPython. We also want to avoid the risk of ecosystem fragmentation and unnecessarily diverging changes because of the work being done in a fork. For PEP 703 to move forward, it has to be included in CPython’s main, and released as part of regular releases (albeit not necessarily by default).

But while we think removal of the GIL is a worthy and necessary goal, and PEP 703 is the best proposal for it so far, we can’t at this stage guarantee that it will work out. We have to, as we develop PEP 703’s implementation and the necessary user-visible changes to semantics, APIs and ABIs, continually evaluate the feasibility, and be prepared to change course – or reverse it, if that turns out to be necessary.

As mentioned before, we see this as a rollout in roughly three stages:

  • Phase I: Experimental phase, which can start immediately, in which the free-threaded build is enabled through a build-time option. This should not be the default install anywhere. At least one major Python release should include this experimental free-threaded build, to allow third-party packages to test and do their own experimentation. In this stage we should make it clear the build is experimental, not supported for “production use”, and may be reverted.
  • Phase II: Supported-but-not-default phase, which would start when the API and ABI changes have sufficiently settled, and there is sufficient community support. Exact criteria for this phase are hard to pin down at this stage, so this will have to involve some discussion among Core Devs and the community, and a decision by the SC at the time. At this point reverting should still be possible (so for example preprocessor guards should remain in place) although obviously we aren’t expecting it.
  • Phase III: Default phase, at which point the free-threaded build becomes the default (but can initially still be disabled). Again, the exact criteria are hard to pin down this far ahead, but the aim is to make this as seamless and painless a default flip as possible. Like the previous phase, the SC at the time will need to make a decision as to when this occurs. Some time after the default flip, when we have a good indication that the GIL build is no longer widely used, we should start the discussion on removing the GIL build entirely.

The details of the phases are deliberately vague, simply because we can’t know all the ecosystem impact details yet, and we don’t want to set conservative standards now and then hold people to them when in practice we’re being too cautious. (We don’t want to set overly ambitious goals and break too many things, either.)

For some of the changes necessary for the free-threaded build, like switching to mimalloc or significant changes to the GC, it may be useful to make these separately build-time opt-ins, and perhaps make them the default before the free-threaded build becomes the default build (but not before it becomes fully supported). Having them separately enableable allows for experimentation, performance measurements and debugging focusing on the isolated set of changes. We don’t want a complex matrix of build flags, so this should probably be limited to one or two.

Regarding the expected performance impact of the free-threaded build, the SC expects a significant performance penalty in a free-threaded build, and thinks the benefit is probably worth that price. At this point we’re expecting a (worst case) performance penalty of 10-15%. We don’t want to set strict limits on acceptable performance, partly because we don’t want to get stuck in arguments about how to measure performance, and partly because it will depend on user expectations and, for example, how much performance work is invalidated and how much we can expect to see recovered over time. Solutions for the free-threaded build that are fundamentally problematic for performance improvements going forward are less acceptable than solutions that are currently suboptimal but have room for improvement.

The performance impact should be isolated to a free-threaded build; a GIL build should not see any performance impact in existing code. For API and ABI changes necessary to support both GIL and free-threaded builds (e.g. avoiding APIs that return borrowed references or that rely on the GIL to protect shared data), it’s reasonable for the new interfaces to be slightly less performant, but we expect this to be very limited and usually lost in the noise.

In a similar vein, it’s important that, as the free-threaded build lands in main, its implementation is considered in other development work that’s going on. New features can’t land without proper support for the free-threaded build, when the two intersect. It may seem tempting to ignore the free-threaded build when developing thread-adjacent changes, but in the end someone will have to make it work, and it’s neither fair nor particularly forward-thinking to expect the free-threaded maintainers to do all the work. We also need to be mindful of the cohesiveness of the language, the implementation and the C API. We have to assume the free-threaded build will be the only build in the reasonable future, and like other fundamental changes to CPython internals, we all have to learn the new way of approaching these problems. This may be a bit of a jump in terms of complexity (the GIL implicitly simplified so much), but it is a necessary step. We do expect, at least initially, the free-threaded build experts to help others ramp up here. The experimental phase is there for CPython and the Core Devs to get used to the free-threaded build as much as it is for users.

We do need a few specific things resolved before PEP 703 can leave the experimental phase. For starters, we need a solution for the ABI. The SC believes strongly that a single ABI serving both with-GIL and free-threaded builds should be possible, should be made possible, and should be required before leaving the experimental phase. If this turns out to be an unreasonable requirement, we’ll have to look at alternative solutions to ease the pressure on package maintainers (e.g. building two extension modules in the same wheel, or providing a compatibility layer through a separate library).

We also need to consider the testing matrix, both for CPython and for third-party packages. Even with a stable ABI we still need to multiply the test matrix. We probably don’t need complete coverage on all supported platforms, but we do want at least one free-threaded buildbot for each of the T1/T2 platforms, as well as some way to test the validity of the unified ABI (a way to build things in one build mode and test against an interpreter built in the other build mode). We currently rely on the stable ABI check and third-party testing of the stable ABI, but that will probably not be good enough to ensure the compatibility between the GIL and free-threaded builds. To get to the supported phase we also expect CI checks on GitHub for free-threaded builds on each of the major platforms.

There are a few specific things we want to avoid. We do not want the free-threaded build to be used as the default Python anywhere until the Core Devs and the Python community are ready for that. Obviously we can’t stop users and distributors from installing a free-threaded build by default, but we think at this stage it would be a mistake to do so for anything besides end-to-end experimentation. We also want to avoid labelling the free-threaded build “experimental” after the experimental phase. Build-time flags, defines, comments in the code should avoid the word. We want to avoid negatives in terms and flags and such, so we won’t get into double-negative terrain (like we do when we talk about ‘non no-GIL’). We’d like a positive, clear term to talk about the no-GIL build, and we’re suggesting ‘free-threaded’. (Relatedly, that’s why the build mode/ABI letter is ‘t’ and not ‘n’; that change was already made.)
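
For readers who want to see how the build mode surfaces at runtime, here is a minimal sketch. It assumes Python 3.13 or later, the first release to ship the free-threaded build, which is configured with `--disable-gil`. It relies on the `Py_GIL_DISABLED` config variable and the underscore-prefixed (so not fully public) `sys._is_gil_enabled()`. The helper names `is_free_threaded_build` and `gil_currently_enabled` are hypothetical. The two checks differ because a free-threaded build can still re-enable the GIL at runtime, for example when an imported extension module has not declared free-threading support.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled as a free-threaded build.

    The build system sets Py_GIL_DISABLED to 1 on free-threaded builds;
    regular builds report 0, or None on versions that predate the variable.
    """
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_currently_enabled() -> bool:
    """True if the GIL is active in this process right now.

    sys._is_gil_enabled() only exists on 3.13+; older interpreters
    always run with the GIL, so fall back to True there.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    # On free-threaded POSIX builds the ABI flags contain the 't' letter.
    print("ABI flags:", getattr(sys, "abiflags", "n/a"))
    print("GIL enabled:", gil_currently_enabled())
```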

In short, the SC accepts PEP 703, but with a clear proviso: that the rollout be gradual and break as little as possible, and that we can roll back any changes that turn out to be too disruptive, which includes potentially rolling back all of PEP 703 entirely if necessary (however unlikely or undesirable we expect that to be).

For the whole SC,
Thomas.

Do you have a more specific timeline than “immediately” for the integration of free-threaded builds in git main?
(I ask quite egoistically, because I may have to rework PR #110848, “GH-110829: Ensure Thread.join() joins the OS thread”, if the nogil work lands before that PR :) )

That’s not really an SC question (it’s not like we can force people to merge or even send in PRs on a specific schedule :) ), but maybe @colesbury or @ambv have an idea when the bulk of things would be ready?

The integration into git main has already started. There will be a bit of a bottleneck on two PRs: gh-110764 biased reference counting, which is under review, and the per-object locking PR, which will come next. Those two are prerequisites for the bulk of the other library changes.

I expect your Thread.join() PR will land before the free-threading work touches the _thread module. There were some changes necessary in the nogil-3.12 fork to make Thread.join() thread-safe, but I think your PR will make those changes easier if they are still necessary.

I expect your Thread.join() PR will land before the free-threading work touches the _thread module.

I am naive in the ways of git/github, but if it’s desirable that Antoine’s PR land first, would it be possible to make your _thread changes dependent on it?

Skip

Yes, that’s what I plan to do.

This is the single feature I’ve wanted most since starting out with Python some 3 years ago (pure Python threads without the GIL); it will benefit several of my compute-heavy projects.

Thank you for this effort!

In case it is useful to know and not already known, OCaml went through a similar transition recently to get to its “multicore” version. In previous versions there was the equivalent of a GIL.
Their approach to backward compatibility was:
The unit of parallelism is something new called a “domain”. At startup there is a single domain and any thread created using the standard library’s Thread module works just as before, as though there were a GIL. When there are multiple domains, domains run in parallel, but threads running in each domain have the equivalent of an inside-of-domain GIL. When there is a single thread per domain, that is like what I understand to be called “free-threading” here.

You just described subinterpreters with a per-interpreter GIL (PEP 684), which landed in 3.12. :)

Can ocaml freely exchange objects between domains?
Free-threaded Python is able to do this.
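
Here is a small sketch of what that means in practice. Python threads share one object heap, so one thread can hand another the very same object rather than a copy. It uses only the standard library's `queue.Queue`, which does its own locking. The function name `pass_objects_between_threads` is hypothetical. The code runs unchanged on GIL and free-threaded builds, although concurrent mutation of a shared object still needs its own synchronization.

```python
import queue
import threading


def pass_objects_between_threads(items):
    """Send arbitrary Python objects from a producer thread to a consumer thread.

    queue.Queue is internally locked, so the hand-off itself needs no
    extra synchronization on either build.
    """
    q = queue.Queue()
    received = []
    sentinel = object()  # unique marker meaning "no more items"

    def producer():
        for item in items:
            q.put(item)  # puts a reference to the object, not a copy
        q.put(sentinel)

    def consumer():
        while True:
            item = q.get()
            if item is sentinel:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # after join, reading `received` from this thread is safe
    return received
```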

I believe so, yes.
I have not personally written any programs using domains, yet.
I think I did not explain it well in my previous comment. A domain is a context in which a single system thread at a time may run OCaml code.

From the documentation (OCaml - Parallel programming):

5.1 Interaction with systhreads

How do systhreads interact with domains? The systhreads created on a particular domain remain pinned to that domain. Only one systhread at a time is allowed to run OCaml code on a particular domain. However, systhreads belonging to a particular domain may run C library or system code in parallel. Systhreads belonging to different domains may execute in parallel.

Also, this seems crucial:

While the user may observe non sequentially consistent behaviours, there are no crashes.

Since the discussion about naming the new feature is still undecided, I propose “multicore”.

First I want to distinguish between Python the language and the (cpython) implementation. The nogil/multicore work is an implementation detail, AFAIK it should have no effect on the language definition (except maybe clarifying the memory model). So technically speaking it is really “Multicore CPython”.

You say that cpython already works on multicore setups, but… it doesn’t? When running the Python language, cpython runs on a single core. If you’re referring to multiprocess, I’d say it doesn’t count; the language as implemented by cpython is single-core.

What does “running the Python language” mean exactly?

If I add two Python integers, it invokes a C function that implements the addition of two Python integers. If I add two NumPy arrays, it invokes a C function that implements the addition of two NumPy arrays.

If adding two Python integers is “running the Python language”, why wouldn’t adding two NumPy arrays also be “running the language” as well?

Use Numpy, Pandas, Numba, Cython, Dask… and you can easily get the benefits of several CPU cores. You may not relate to that ecosystem, but it exists and is an essential part of Python’s growing popularity.

Why doesn’t it? multiprocessing has been part of the standard library for 10+ years, and is widely used. So is its cousin concurrent.futures.
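
Here is a minimal sketch that makes this concrete using only `concurrent.futures`. The same code can fan CPU-bound work out to processes, which use multiple cores on today’s GIL build because each worker has its own interpreter. It can also send the work to threads, which only run pure-Python code in parallel on a free-threaded build. The names `cpu_bound` and `run_all` are illustrative, not from any library.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    """Pure-Python busy work: sum of squares of 0..n-1."""
    return sum(i * i for i in range(n))


def run_all(executor: Executor, sizes: list[int]) -> list[int]:
    """Run cpu_bound over sizes on whatever executor the caller supplies.

    executor.map preserves input order, so results line up with sizes.
    """
    return list(executor.map(cpu_bound, sizes))


if __name__ == "__main__":
    sizes = [1_000_000] * 4
    # Processes sidestep the GIL on a regular build: one interpreter per worker.
    with ProcessPoolExecutor() as pool:
        print(run_all(pool, sizes))
    # Threads share one interpreter; this pure-Python work only runs
    # in parallel on a free-threaded build.
    with ThreadPoolExecutor() as pool:
        print(run_all(pool, sizes))
```

On platforms where workers are started with `spawn` (the default on Windows and macOS), the `if __name__ == "__main__":` guard and a module-level worker function are required. The child processes re-import the main module and look the function up by name.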

You don’t get to choose what is “Python” and what is “not Python”. There’s nothing special about the multiprocessing module that makes it less Python than the threading module. There’s nothing special about NumPy or Dask that makes it less Python than Pillow, Django or SQLAlchemy.

Yes, removing the GIL is a big leap forward and will help exploit CPU parallelism in more workloads. No, it doesn’t mean that previously CPython was “monocore”. Saying so is just misrepresenting the current state of the CPython implementation.

I’m not sure why that suggestion struck such a chord, but this is a warning to check the language you use when you respond. You can express that you disagree with something without suggesting that someone is deliberately misinforming, lying to, or insulting others.

@davidism Since you’ve edited my post, can I ask you to make it clear that you’ve edited it?

(I don’t think this is the first time that such a mention is requested on this forum, by the way)

There is an edit icon on every post that has been edited that shows the diff of each edit. Please get back on topic.

I am already confused by calling Python ‘Multicore’. I just did a test with a simple TCP server, and I can make use of each CPU core, to at least 50%, despite the Global Interpreter Lock (GIL). My assumption has been that Python already runs on multiple CPU cores.
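
One likely explanation assumes the server spends much of its time blocked on sockets. CPython releases the GIL around blocking system calls such as socket reads and writes, so threads waiting on the network overlap. Work done inside the OS, such as the kernel’s network stack, is not serialized by the GIL at all. The sketch uses `time.sleep` as a hypothetical stand-in for a blocking `recv`, and `blocking_io` and `elapsed_with_threads` are illustrative names.

```python
import threading
import time


def blocking_io(seconds: float) -> None:
    """Stand-in for a blocking socket call; time.sleep releases the GIL while waiting."""
    time.sleep(seconds)


def elapsed_with_threads(n_threads: int, seconds: float) -> float:
    """Run n_threads blocking calls concurrently and return the wall-clock time taken."""
    threads = [threading.Thread(target=blocking_io, args=(seconds,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"{elapsed_with_threads(4, 0.5):.2f} s")
```

Four half-second waits should finish in roughly half a second of wall time rather than two. Pure-Python request handling between those calls still takes turns on the GIL.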

Isn’t any non-dry term, i.e. something with branding appeal, for GIL removal going to be some level of technically inaccurate?

And I would push back on the idea that CPython can’t be associated with being “monocore”. In my experience as of Python 3.12, without spawning subprocesses, any pure Python code will almost certainly perform no worse, if not in fact better, if you reserve and pin it to a single core on your CPU.
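
Here is a quick way to test that claim on a given machine, as a rough sketch rather than a rigorous benchmark. It times the same pure-Python loop run serially and in several threads. `spin` and `serial_vs_threaded` are hypothetical helpers, and timings vary with hardware, load and Python version. To check the pinning part as well, on Linux you can restrict the process to one core with `os.sched_setaffinity(0, {0})` before running it.

```python
import threading
import time


def spin(n: int) -> int:
    """Pure-Python CPU work; on a regular build it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i
    return total


def serial_vs_threaded(n: int, workers: int) -> tuple[float, float]:
    """Time `workers` calls of spin(n) run one after another vs. in parallel threads."""
    start = time.perf_counter()
    for _ in range(workers):
        spin(n)
    serial = time.perf_counter() - start

    threads = [threading.Thread(target=spin, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.perf_counter() - start
    return serial, threaded


if __name__ == "__main__":
    serial, threaded = serial_vs_threaded(2_000_000, 4)
    print(f"serial {serial:.2f}s, threaded {threaded:.2f}s")
```

On a GIL build, expect the threaded time to be about the same as or worse than the serial time, since only one thread runs Python bytecode at once. On a free-threaded build with enough idle cores, it should come out noticeably lower.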

Maybe I’m missing some obvious example where this isn’t the case though.