PEP 703: Making the Global Interpreter Lock Optional

If I understand correctly, changing the ABI around refcounting is a hard break, and every package is theoretically affected. I say theoretically because a package presumably needs to push the refcount high enough to collide with the additional bits that are now reserved for special purposes, so I guess an argument could be made that “if an ABI falls in the forest, and there’s no one around to observe it, does it constitute a break?”.

So it’s possible that some packages could realistically get by with a single build, but mixing ABI tags is a bad idea and shouldn’t be encouraged IMO (aside from cases where that’s explicitly part of the design scope, like the stable ABI being a compatible subset of the version-specific one). If/once that viewpoint is taken, every package would need an additional build for the nogil ABI.

I expect this would surface much like debug or other ABI-affecting Python build flags, and packaging tooling is certainly capable of (or can be made capable of) handling the difference.
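By analogy with the existing `d` (debug) ABI flag, wheels might carry a hypothetical nogil flag in their ABI tag (the actual letter and mechanics would of course be up to the PEP and the packaging folks):

```text
example_pkg-1.0-cp312-cp312-linux_x86_64.whl    # regular build
example_pkg-1.0-cp312-cp312d-linux_x86_64.whl   # debug build (existing "d" flag)
example_pkg-1.0-cp312-cp312n-linux_x86_64.whl   # hypothetical "n" (nogil) flag
```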

1 Like

An ABI change is tractable. There are many issues, but it’s doable, and probably reasonable for a change of this magnitude. It’ll need more work than the PEP admits, but that’s expected – it’s not a one-person job.

The API change is worse. There’s no longer a lock protecting all your C code; you now need to think about concurrency all the time. It’s like going from asyncio (other code can run only at explicit await points) to greenlet/threads (other code can run at any time). AFAIK, issues with the greenlet/thread model were a major reason asyncio was created in the first place. The nogil issues will be the same.
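To make the analogy concrete, here is a minimal Python sketch (the function names are purely illustrative) of a check-then-act pattern that is safe when nothing else can run between the steps, but racy once other code can interleave at any point:

```python
import threading

cache = {}

def expensive_create(key):
    return object()  # stand-in for real work

def get_or_create(key):
    # Safe under a model where nothing else runs between these two
    # lines (asyncio code with no await in between, or C code relying
    # on the GIL). With free-running threads, another thread can
    # interleave between the check and the store.
    if key not in cache:
        cache[key] = expensive_create(key)  # two threads may both get here
    return cache[key]

# With threads, an explicit lock around the critical section is needed:
lock = threading.Lock()

def get_or_create_safe(key):
    with lock:
        if key not in cache:
            cache[key] = expensive_create(key)
        return cache[key]
```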

1 Like

FWIW, if the PEP author thinks it’d be helpful/useful, I think it’s worth starting a thread over in the packaging category (which has the relevant folks watching it) to discuss the packaging aspects brought up so far[1]. Encoding the conclusions from that into the PEP would certainly help clarify this.


  1. My 2 cents: collect the concerns expressed so far around packaging + this PEP, and put them in the first post on the topic as a numbered list, so that people have a shared vocabulary for addressing them. ↩︎

The PEP makes it clear that the code that increments/decrements refcounts uses additional fields in a PyObject. That means you must recompile to use nogil. Also, because the inc/dec is done in macros, you cannot swap in a gil vs. nogil implementation of inc/dec underneath an extension.

I think you could offer inc/dec as functions rather than macros (the C API already exports Py_IncRef/Py_DecRef as real functions), and then a single ABI for gil/nogil becomes thinkable. The gil build would also need to reserve the additional fields in the PyObject.

For the macOS installer it should be fairly easy to provide a second “nogil” installer (although that doubles the amount of release work @nad has to do to build the installers). An installer containing both build variants is more work, and could end up being confusing for users.

1 Like

Given the complexity of this topic, it’s worth considering whether to split the PEP to help focus discussion: for example, one PEP focused on the technical aspects of creating a new GIL-less Python version, and another focused on how to deploy the change in terms of installers, packaging, etc.

4 Likes

Similar for Windows, though I expect both options would end up being confusing.

While this proposal looks impressive and a no-GIL Python would be exciting, the transition has to be handled very carefully to avoid another 2->3 situation.

If Python code and C extensions actually worked fine without changes (except for a rebuild, or for code that stops compiling under Py_NOGIL), it would be a huge PR win to call the default-nogil version “Python 4” and say “look, a huge new step without all the hassle of the last migration”.

However, for the reasons @encukou already mentioned, I am afraid that it is more like a 2->3 migration, but more subtle: instead of more-or-less consistent runtime errors due to e.g. a missed bytes/str conversion, you’ll get intermittent memory safety bugs due to e.g. missing locking in a C extension.

Every C extension would have to be gone over with a fine-tooth comb, and the well-known projects that @colesbury mentions are probably among the least problematic, since they have lots of highly competent eyes and comprehensive test suites. Many C extensions won’t, especially closed-source ones that barely get the maintenance needed to compile on the latest minor release.

IMO the most conservative path is called for: provide versions with the alternate ABI, call them experimental, let people get used to them, and don’t flip the default until much later, when “runs fine with nogil” has become the general expectation.

9 Likes

That could easily be worked around by making the ABI change unconditional, regardless of whether the GIL is enabled. Then you don’t need to build separate packages for GIL and no-GIL.

8 Likes

I believe there is good sense in talking about Python 4 as the multicore Python. OCaml did the same recently (OCaml Multicore - November 2021) and also made a major version jump, to 5.0.

I also like the marketing part of it: Python 4 for the fast future, because the world is multicore :slight_smile:

But is a major version step feasible? Is the steering council in on it? And where does GvR stand on this? In the recent interview with Lex Fridman, GvR clearly had some reservations about 4.0. Can we get support for something like this? Will people think “oh no, not another 2->3 nightmare”? (I know it is not, but that would probably have to be addressed.)

2 Likes

If it’s not, then what’s the point of calling it Python 4? Address that part too, please.

This is not entirely true, and it will probably make many Python programs (those that do not need multithreading) slower. The PEP sidesteps that question by proposing nogil be optional, but if/when the decision comes to make it the default, the SC will need to decide what performance loss is acceptable compared to the GIL version. If I were running something primarily single-threaded, like an asyncio server, I would not be very happy to lose performance because of nogil.

I don’t envy the current SC either, choosing between accepting and rejecting this PEP seems very difficult to me.

1 Like

I think for the Python developer at large the transition from Python 3 to Python 4 (no-GIL) would be extremely easy: no change at all, or maybe some updates to a requirements.txt file.

For the 2->3 transition every developer had to change their code, and even though most changes were straightforward, it was still a big (also mental) block to get past.

Personally I think Python 4 should be the concurrent Python, one that also adds real concurrency primitives (think Erlang, Pony, ABCL, Go channels, etc.). Python 4 could also remove the async/await keywords (which I guess would supply the nightmare mentioned above :-)).

Good point, and I don’t envy the SC either :slight_smile: This is such a good technical implementation that it deserves to get approved.

If only @colesbury had made this 10 years ago (instead of just being a kid in school) we would likely not have had asyncio and the async/await mess. I actually started a project 10 years ago (GitHub - pylots/pyworks: Concurrent object framework) which has been waiting for exactly this, so I am somewhat biased. I did a lightning talk on the subject at PyCon 2013, but 30 minutes later GvR introduced asyncio. :slight_smile:

I for one really hope that this PEP gets accepted, but I was just wondering whether calling it Python 4 would make acceptance easier or harder.

1 Like

Just for the record, since a few people have suggested this, async/await is fundamentally different from multithreading. They are for different things, and having one doesn’t remove the need for the other.

16 Likes

I don’t think async/await is messy. The concurrency model takes a bit of getting used to, but once you get there it’s very nice to know that once you have control of execution you will keep it until you release it. Threads, on the other hand, seem messier to me, since you cannot control exactly when you release control and you need locks to keep it. However, the red/blue function split is not a great part of async, though that could be fixed in a Python 4 by making any function usable as a coroutine when awaited.
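For readers unfamiliar with the “red/blue” problem: an async (“red”) function can only be awaited from another async function, so the colour propagates up the call stack. A minimal illustration:

```python
import asyncio

async def fetch_data():           # a "red" function
    await asyncio.sleep(0.1)      # stand-in for real I/O
    return 42

def business_logic():             # a "blue" (sync) function
    # return await fetch_data()   # SyntaxError: 'await' outside async def
    # The sync caller must either become async itself (recolouring its
    # whole call stack) or spin up an event loop:
    return asyncio.run(fetch_data())

print(business_logic())  # 42
```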

And while I also think that this proposal is impressive, that is not enough to “deserve” acceptance IMO.

Interesting points, because I have the opposite opinion. Concurrency is way too complex to leave up to the developer to handle. In particular, the developer being in control of context switches is, in my experience, something you should avoid; it is much too complex.

In languages like Erlang and Pony, it is built into the language. I get that it can be difficult to add on to a language like Python. Something like Go’s concurrency implementation might work, and would also work across threads.

So Python 4: no GIL and a chan keyword :slight_smile:

2 Likes

Fair enough, and as Steve said above, each model has its uses and strengths. Just a question on the Go model: I haven’t written any Go, so correct me if I’m wrong, but doesn’t each goroutine have its own state, with state shared via a messaging system? And doesn’t that mean each goroutine is effectively single-threaded? That model seems more similar to the work on subinterpreters than to what is presented in this PEP.

1 Like

You are correct, each goroutine can be thought of as a single thread. That is a huge strength, since you don’t need any locks on (local) data. You use channels to send messages between two goroutines. There seem to be two general models: shared state (with lots of locks) or message passing (with lots of messages). I advocate for message passing, since then the Actor (goroutine) can be written without locks or any concern about other threads messing with your data.
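For the Python-minded, the channel model maps roughly onto a `queue.Queue` between threads. A minimal sketch (the producer/consumer functions are purely illustrative):

```python
import threading
import queue

def producer(chan):
    for i in range(5):
        chan.put(i)          # send a message into the channel
    chan.put(None)           # sentinel: no more messages

def consumer(chan):
    while True:
        msg = chan.get()     # receive; blocks until a message arrives
        if msg is None:
            break
        print(f"got {msg}")  # the consumer owns its state; no locks needed

chan = queue.Queue()         # the "channel"
threading.Thread(target=producer, args=(chan,)).start()
t = threading.Thread(target=consumer, args=(chan,))
t.start()
t.join()
```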

You are also right that subinterpreters look like this, but in Go you can have millions of goroutines without much overhead. Subinterpreters are at a different scale, and you probably won’t start several thousand of them.

To be fair, for performance you would ideally need only one subinterpreter per core, and at startup each subinterpreter would select which Tasks/Processes/Actors it will handle.

This is how Pony does it. At startup a Pony program counts the number of CPU cores and starts a “worker” per core. Each Pony actor is then assigned to a worker, where it lives for its lifetime.
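A rough Python analogue of that scheme (purely illustrative: threads stand in for per-core workers, and the actor/inbox names are made up) would be to start one worker per core and pin each actor to a worker for its lifetime:

```python
import os
import queue
import threading

NUM_WORKERS = os.cpu_count() or 1

# One inbox per worker; an actor is pinned to a worker by hashing it.
inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]

def worker_loop(inbox):
    while True:
        actor, message = inbox.get()
        if actor is None:        # shutdown sentinel
            break
        actor(message)           # run the actor's handler on this worker

def send(actor, message):
    inboxes[hash(actor) % NUM_WORKERS].put((actor, message))

workers = [threading.Thread(target=worker_loop, args=(q,)) for q in inboxes]
for w in workers:
    w.start()

send(print, "hello from a pinned actor")   # 'print' stands in for an actor
for q in inboxes:
    q.put((None, None))                    # stop all workers
for w in workers:
    w.join()
```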

3 Likes