PEP 703: Making the Global Interpreter Lock Optional

The Python Critical Sections part of the document describes how threads can hold only a single lock at a time to avoid deadlocks: existing locks are released when a new lock is acquired.

If multiple locks are taken to ensure mutual consistency across multiple resources, is it possible that a second thread can acquire a released lock and make changes that are inconsistent with the work performed by the first thread? At first glance, this has the potential to violate program correctness in a multithreaded environment.
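Concretely, the pattern I have in mind looks something like this minimal sketch (the critical-section macros are the ones proposed in the PEP, though the exact spellings may differ; update_a and update_b are hypothetical helpers):

```c
#include <Python.h>

/* Hypothetical helpers standing in for real mutations. */
static void update_a(PyObject *a);
static void update_b(PyObject *b);

static void
update_pair(PyObject *obj_a, PyObject *obj_b)
{
    Py_BEGIN_CRITICAL_SECTION(obj_a);
    update_a(obj_a);                    /* protected by obj_a's lock */
    Py_BEGIN_CRITICAL_SECTION(obj_b);   /* if this blocks, obj_a's lock is
                                           released, and another thread may
                                           observe obj_a mid-update */
    update_b(obj_b);
    Py_END_CRITICAL_SECTION();
    Py_END_CRITICAL_SECTION();
}
```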

Hi Barry - I think you are referring to PyDict_GetItem vs. PyDict_FetchItem (and the list object equivalents). Let me know if you are referring to something else. For general context: the proposed new APIs (PyDict_FetchItem) return “new references”. The existing APIs (e.g., PyDict_GetItem) return “borrowed references.” The borrowed references are sometimes unsafe without the GIL, such as when another thread might concurrently modify the container.

The PEP does not propose making the borrowed reference versions a compile error (nor does it deprecate them). This is for a few reasons:

  • Many of the uses of PyDict/List_GetItem are safe even without the GIL. For example, accessing kwargs with PyDict_GetItem is safe because the kwargs generally cannot be modified by another thread.
  • I tried this approach briefly early on, and, in my experience, wholesale replacing “borrowed references” with “new references” in existing code made it too easy to introduce new reference counting bugs.

I think the risk of introducing new reference counting bugs generally outweighs the risks of missing a PyDict/List_GetItem call that’s unsafe without the GIL. That said, extension authors know their own code best, so if someone wants to replace PyDict_GetItem with PyDict_FetchItem they certainly can do so.
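To make the difference concrete, here’s a minimal sketch (use_value is a hypothetical helper; PyDict_FetchItem is the new-reference API proposed by this PEP):

```c
#include <Python.h>

/* Hypothetical helper, just for illustration. */
static void use_value(PyObject *value);

static void
lookup_both_ways(PyObject *dict, PyObject *key)
{
    /* Borrowed reference: safe only if no other thread can mutate `dict`
       while `value` is in use (e.g., a kwargs dict private to this call). */
    PyObject *value = PyDict_GetItem(dict, key);
    if (value != NULL) {
        use_value(value);       /* no Py_DECREF: we don't own the reference */
    }

    /* New reference: safe even if another thread concurrently modifies
       `dict`, but the caller must remember to release it. */
    value = PyDict_FetchItem(dict, key);
    if (value != NULL) {
        use_value(value);
        Py_DECREF(value);       /* we own this reference */
    }
}
```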

I’ll add this reasoning to the PEP.

4 Likes

PEP 683 tries to be compatible with the stable ABI (abi3) and makes some trade-offs to support that, such as putting the immortal bit as the 2nd highest bit. You could make this PEP look a bit more like PEP 683 (by placing the immortal bit in the same spot), but I think it would only be a superficial similarity because this PEP depends on biased reference counting (there are two reference counting fields instead of one), so the overall reference counting scheme is going to be different regardless. On x86-64, using one of the low bits instead of the 2nd highest bit makes checking if an object is immortal a teeny bit faster. I don’t think PEP 683 should be changed – I think it’s a good solution given the constraints of ABI stability. I’ll wait a bit to see if there are more comments on immortalization and then try to summarize this in the PEP.
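(For the curious, the speedup is roughly the following: a low-bit mask fits a shorter x86-64 instruction encoding than a high-bit mask. Bit positions in this sketch are illustrative, not the exact values from either PEP.)

```c
#include <stdint.h>

static inline int immortal_high_bit(uint32_t refcnt) {
    return (refcnt & (1u << 30)) != 0;  /* mask needs a 4-byte immediate */
}

static inline int immortal_low_bit(uint32_t refcnt) {
    return (refcnt & 1u) != 0;          /* fits a 1-byte immediate: test al, 1 */
}
```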

Regarding motivation: I’m hoping that people that use Python for the many other use cases not mentioned in the PEP can weigh in here about either the problems the GIL poses for them or with their concerns about this PEP. To the extent that there are common themes, I’ll try to capture them in the PEP.

Regarding the build modes: I wrote this in the PEP, but my hope is for the two build modes to be temporary. I think having two build modes – for a short period – is worthwhile because it reduces the risks during integration into CPython and gives a bit more time for extensions to adapt. I don’t really see the build modes as a means for extension authors to “opt-out” of supporting it, although it’s possible it’s used that way. Like you, I think there will be pressure from users on extension authors to support it, and I think that pressure will be because the users see value in running without the GIL.

2 Likes

I like --enable-gil / --disable-gil too. I’ll wait a bit to see if there are additional comments on the build option naming and then update the PEP.

8 Likes

What is the impact on packaging, wheel tags, PyPI, pip, python org installers etc.?

2 Likes

Thanks for the response!

So if I’m reading this (and the PEP) correctly:

  • nogil needs both biased & deferred reference counting (for thread safety and performance, respectively; see the header sketch after this list)
  • nogil needs a break of the stable ABI[1]
  • it’s thus independent of PEP 683, but would replace(?) PEP 683’s changes in this space with its own
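For reference, here is a rough sketch of the object header layout as I read it from the PEP (field names follow the PEP; exact widths and ordering may change, and debug-only fields are omitted). The two reference-count fields are what biased reference counting needs:

```c
#include <stdint.h>

typedef struct _typeobject PyTypeObject;  /* stand-in for the real type */

struct _object_nogil_sketch {
    uintptr_t  ob_tid;         /* id of the owning thread (the "bias") */
    uint16_t   padding;        /* reserved for future use */
    uint8_t    ob_mutex;       /* per-object mutex (PyMutex in the PEP) */
    uint8_t    ob_gc_bits;     /* GC tracking bits */
    uint32_t   ob_ref_local;   /* owner thread's count: plain, non-atomic ops */
    intptr_t   ob_ref_shared;  /* other threads' count + state: atomic ops */
    PyTypeObject *ob_type;
};
```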

Not to nitpick, but the linked section says “These may be worthwhile if a longer term goal is to have a single build mode”, which is… at odds… with “for a short period”. BTW, this choice is a really hard problem, and it’s fine that the PEP doesn’t really commit to a timeline, but IMO those trade-offs (additional build mode – what would that imply over short/long term? – or new default) could use some more discussion.

I absolutely see libraries opting out of this, at least for a while (e.g. “making our library threadsafe needs a large overhaul of core component X, and we’ll do it in the next Y months”), which will create a situation where people want to install a given library on the nogil ABI, only to find out that it doesn’t exist (or worse: since there are no nogil wheels, it falls back to trying to install from source). Would pip even be able to take the Python ABI flavour into account in its resolver, or at least give a reasonable error?

It’s also going to create confusion for users who have no idea what an ABI is, when a library advertises itself as “we support Python 3.12 (but not nogil)”, and then users need to understand that distinction.

The least painful scenario I could imagine (very subjective, obviously) is to redeclare Python 3.13 as 4.0, no additional build mode but with changed ABI, merge nogil immediately after branching 3.12 (and publish an alpha right after), then work with the extension ecosystem over the following year to get as far as possible in getting the most used libraries “4.0/nogil-ready”. I know you have some experience with that already with the libraries you patched for the existing nogil fork, but it’s gonna take a while for the ecosystem to digest such a change IMO (whether with a separate build mode or not[2]).


  1. that in and of itself probably makes it “Python 4.0” material? ↩︎

  2. I just think a changed default will cause less friction overall, rather than subjecting the library maintainers (as well as tooling & infrastructure) to having to deal with both modes + users who can’t wait for nogil either way. ↩︎

10 Likes

This would place a huge burden on authors of third-party C-API extensions. I don’t see how we could have nogil without a situation similar to the Python 2→3 transition – two incompatible versions of Python existing in parallel for a long time.

10 Likes

In CPython Free Lists you mention that freelists will be disabled when building without the GIL – is there a reason not to have a per-thread freelist instead?

Oh, because allocations are global, and a per-thread list just doesn’t really make sense. Sorry, I haven’t had enough coffee yet :slight_smile:

In Garbage Collection (Cycle Collection) it says

  • Elimination of generational garbage collection in favor of a non-generational collector.

There is a proposal to change this. Maybe there should be an explicit section of the PEP to discuss the implications for C-extension library authors.

In any case, the Distribution section should be extended to describe what resources lie behind the statement below. Has this been discussed with Anaconda (maybe the author means “the conda-forge community”, which maintains the cutting-edge versions of packages like NumPy)? Will the conda package be a fork of the upstream package or does it depend on the package maintainers to do the work?

“the author will work with Anaconda to distribute a --without-gil version of Python.”

2 Likes

Somewhat relatedly, this came up in my feed today about a similar issue in BIND: Safe memory reclamation for BIND – Tony Finch. It talks about https://liburcu.org/ (which is LGPL’d, so I suspect not of use to Python core).

Yes, exactly. So what happens to existing calls like gc.collect(generation=1)? Do they start raising an exception?

One of the ideas that PyCXX implements is that you do not need to know the hard-to-get-right aspects of the Python C API.

Borrowed refs are one hard-to-get-right aspect that PyCXX makes go away.

In the same way, I would like to make any unsafe calls go away.
In my view, it is hard to get right whether you can or cannot safely use PyDict/List_GetItem.

I would like a way to make them raise compiler errors so that I can avoid ever needing to figure out whether it’s safe or not. I’m happy to #define PyGIL_UNSAFE_API_COMPILER_ERRORS to make this an opt-in feature.
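As a sketch of what that opt-in could look like (the define name is the one from this post, not anything in the PEP; #pragma GCC poison is supported by both GCC and Clang):

```c
#ifdef PyGIL_UNSAFE_API_COMPILER_ERRORS
/* Any use of these borrowed-reference APIs after this point becomes a
   compile error. (Uses expanded from macros defined earlier are not
   caught, so this is a best-effort guard.) */
#pragma GCC poison PyDict_GetItem PyDict_GetItemString PyList_GetItem
#endif
```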

I have been thinking about how I could have PyCXX allow a single binary extension to work both --without-gil and --with-gil.

The ABI for both builds of Python would need to support a common set of operations that work in both builds.

I think for this to work Python would need to:

  1. always have the object header allocate the extra fields used for --without-gil
  2. always use the function versions of INCREF and DECREF, so that the python.so/.dll carries the right implementation for how Python was built (see the sketch below this list).
  3. never use the Py_INCREF or Py_DECREF macros.
  4. any macro that uses Py_INCREF or Py_DECREF would need to use the function version, which I would opt into via a #define.
  5. the other 10 things I have not considered…

As currently defined, Py_XINCREF and Py_XDECREF inline the macro contents, so they are not suitable as-is.
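As a rough sketch of point 2, something like the following could route the macros through the existing out-of-line functions (Py_IncRef/Py_DecRef are real exported functions, and NULL-safe, unlike the macros; the opt-in define name here is made up):

```c
#include <Python.h>

#ifdef Py_PORTABLE_REFCOUNT_ABI      /* hypothetical opt-in define */
#  undef Py_INCREF
#  undef Py_DECREF
/* The calls land in python.so/.dll, which carries whichever refcounting
   implementation matches how the interpreter itself was built. */
#  define Py_INCREF(op) Py_IncRef((PyObject *)(op))
#  define Py_DECREF(op) Py_DecRef((PyObject *)(op))
#endif
```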

1 Like

Small question: should the design, technically, perform roughly as well on Windows as on other operating systems?

I’m curious what people think is an acceptable slowdown for single-threaded code?

My estimate for the performance impact of this PEP is in the 15-20% range, but it could be more, depending on the impact on PEP 659.

Personally, I don’t like the shared memory model of concurrency, so any slowdown is too much. But that’s just my opinion.

5 Likes

In my personal test, the slowdown was 40%, though Windows antivirus may be the culprit.

1 Like

We can’t actually test the performance, since the nogil branch is a fork of 3.9 with extensive modifications to counteract the slowdown from the extra locking and reference counting overhead.

Consequently, comparisons to 3.9 are meaningless, and because the nogil branch lacks the improvements in 3.11, comparisons to 3.11 are also meaningless.

3 Likes

I was comparing Python 3.9 to nogil Python 3.9, on Windows. Unfortunately, there are no Windows benchmarks run on the CI systems, nor on GitHub - faster-cpython/benchmarking-public: A public mirror of our benchmarking runner repository

To assess how it behaves on ordinary ‘user’ Windows machines, even a rough measurement would be nice.

Very cool proposal! As an average Python dev who sometimes has had to work with concurrency and parallelism, I ask this (maybe obvious) question: if this gets implemented, why would we devs ever use multiprocessing over multithreading?

2 Likes