A fast, free threading Python

I just want to throw in a use case which has not yet been discussed here and in the discussions of PEP 703: GUI toolkits.

GUI toolkits naturally use threads, so an approach where free threading is replaced by a different concept, such as subinterpreters or multiprocessing, is problematic for the design and architecture of GUI applications written in Python: the toolkits exposed in Python are not aware of such concepts and most likely offer plain threading for offloading computational workload from the GUI. I’m a user of the PySide (Qt for Python) project, and the PySide devs did struggle with the GIL, as explained in “Qt for Python 5.15.0 is out!” (and also in the links inside that document).

The question of when it’s better to release or not release the GIL in the C extensions of GUI toolkits is not straightforward, sometimes counter-intuitive (at least to me), and often involves a compromise tuned for the most common use cases but with drawbacks for others.

Moreover, when creating GUI applications in Python, you start struggling with the GIL as soon as you have heavy workloads happening in the background. My use case is a computer vision GUI that acts as a monitor and development environment for remote embedded systems. It is similar to what @lunixbochs reported for the realtime audio use case: keeping latencies and stutters at an acceptable level is unnecessarily difficult when you have to fight against the GIL mechanism.
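As a rough sketch of the pattern (names are illustrative, not any PySide API): offloading CPU-bound work to plain threads keeps the application code simple, but under the GIL the pure-Python workers time-share a single core, which is exactly where the latency and stutter come from:

```python
import threading

def busy_work(n):
    """CPU-bound stand-in for e.g. a computer-vision processing step."""
    total = 0
    for i in range(n):
        total += i * i
    return total

results = []
results_lock = threading.Lock()

def worker(n):
    r = busy_work(n)
    with results_lock:
        results.append(r)

# Offload two heavy jobs so the (hypothetical) GUI event loop stays responsive
threads = [threading.Thread(target=worker, args=(200_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The results are correct, but under the GIL the two workers time-share one
# core, so wall-clock time is roughly the same as running them sequentially.
```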


Wouldn’t this be a potential problem today? “Not thread-safe” doesn’t mean “not thread-safe only when used in a free-threaded environment.”

I’m not trying to be difficult, maybe a bit pedantic. The presence of the GIL can obscure threading bugs or make them rear their ugly heads less often, but it doesn’t make code thread-safe. @colesbury’s work to remove the GIL has done a lot to remove places in the interpreter, stdlib, and some third-party libraries that relied on the GIL (knowingly or not). Most (all? almost all?) of that work will have been in C code, not Python code.
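The point that the GIL does not make Python code thread-safe can be shown in a few lines. This is a deliberately rigged sketch: the barrier only makes the unlucky interleaving deterministic, but the same lost update can happen with a plain read-modify-write under the GIL, just less reliably:

```python
import threading

counter = 0
barrier = threading.Barrier(10)

def unsafe_increment():
    global counter
    tmp = counter      # read the shared value
    barrier.wait()     # force every thread to read before anyone writes
    counter = tmp + 1  # write back a now-stale value

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1, not 10: the GIL never made read-modify-write atomic
```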


I think it could be a problem today, don’t know, I have the same question. I’m here to be curious.

Again, as an interesting anecdotal data point, here is a port of that troublesome C code to Python: mtenv.py
Here’s how it runs:

  • Using Python 3.11.3 this program loops for a long time without problem
  • I compiled and used nogil-3.12 commit 4526c07caee8f2e (current tip of the repo)
    and it runs 1-2 seconds before it segfaults in getenv just like the C code from 2017.

This is a reduced example. It doesn’t look like a normal Python program; it has a strange shape so that it can reproduce the crash easily. But the fundamental elements can occur in normal Python programs: libraries make various C calls that use getenv (say, mktime, to use the example from that blog post), and for setenv we have a plain interface to it in os (os.getenv is not a plain interface to C getenv).
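For readers who don’t want to dig up mtenv.py, here is a hypothetical minimal sketch of the same shape (variable and environment-variable names are mine, not from the original): one thread mutates the C environment via os.putenv while other threads call time.mktime, which can consult the environment via C getenv. On a GIL build this loops harmlessly; per the report above, a free-threaded build can crash in getenv:

```python
import os
import threading
import time

STOP = time.monotonic() + 1.0  # run the race for about a second

def mutator():
    # os.putenv goes through C putenv/setenv, mutating environ in place
    i = 0
    while time.monotonic() < STOP:
        os.putenv("MTENV_DEMO", f"value-{i}")
        i += 1

def reader():
    # time.mktime may consult the TZ environment variable via C getenv
    t = time.localtime()
    while time.monotonic() < STOP:
        time.mktime(t)

threads = [threading.Thread(target=mutator)] + \
          [threading.Thread(target=reader) for _ in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()

finished = True  # on a GIL build, the C calls are serialized and we get here
```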

Yes, it is a problem today, without free threading. getenv + setenv thread safety is a problem for Python applications I run at work. We had to do a bunch of whack-a-mole to work around segfaults resulting from extension libraries using getenv + setenv (for a while we gave up and used a terrible LD_PRELOAD hack).



At the high level this is the kind of thing I’d love someone to try creating for per-subinterpreter-GIL use! This is also quite hard, but I assume there are interested folks out there.

Intuitively I expect this winds up being the same problem that needs to be solved for free threading (which PEP-703 appears to do): our pure reference counting model is the most significant reason we have a GIL - in order to share objects between multiple threads you need to make the reference counts work without that single lock.

Someone really needs to try creating an explicitly shared objects implementation for CPython and subinterpreters to prove or disprove its actual utility. In the absence of that, I wouldn’t point to it and suggest it is a better solution. I consider it an open avenue of future work. (Even if we get free threading, performant explicit sharing would be something I expect many would appreciate having.)


We’ve had a chance to discuss this internally with the right people. Our team believes in the value that nogil will provide, and we are committed to working collaboratively to improve Python for everyone.

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.


What’s blocking the acceptance of PEP 703?


See prior and current discussion, it’s not a small decision and has large ramifications:



TL;DR Option 1 is the best choice IMHO

It seems to me there needs to be a better rationale for free threading in Python: why would we want it, and which use cases exactly are we looking to enable? Is it worth (potentially) upending the ecosystem for those use cases?

I propose that the mere fact that it can be done is not a good enough reason.

In my 12+ years of practical experience using Python (previously 18+ years with Java and other languages) for web, full-stack and data science use cases in an enterprise context, I have never found the GIL to be a problem. For CPU-bound parallelism, multiprocessing works fine; if more parallel processing power is needed, multi-node is the way to go. For latency-bound work, threading is usually a good option, although multiprocessing works in this case too.
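For completeness, a minimal sketch of the multiprocessing route described above (the function and the numbers are illustrative): CPU-bound work split across worker processes, each with its own interpreter and its own GIL:

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound work that benefits from real (process-level) parallelism."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Four independent chunks, no shared state between the workers
    chunks = [20_000, 20_000, 20_000, 20_000]
    with ProcessPoolExecutor() as pool:
        totals = list(pool.map(count_primes, chunks))
    print(sum(totals))
```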

One argument that often comes up in my discussions with fellow devs is along the lines of “I don’t want to choose between threading or multiprocessing, I just want the code to do what it already does, in parallel and faster”. While it seems sensible to want that, of course it is not that easy. With this expectation, the GIL most of the time becomes a point of frustration only because the first thing that gets tried is multithreading. It seems natural to do so: threads are among the first ideas that come up when looking for ways to make computers do things concurrently. Next thing you know, there is a complaint about the GIL limiting performance, when in fact the root cause is the wrong choice of concurrency model.

I suggest that we should raise awareness of best practices for parallel and concurrent task execution. We should also highlight that the general consensus (afaik) is that shared-nothing approaches work best and that you should avoid shared-object concurrency if you can. In this view, having the GIL is a blessing in disguise.
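A shared-nothing style is easy to keep even with plain threads. A sketch (illustrative names) where workers own no shared state and communicate only through queues:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # The worker owns no shared state; it only exchanges messages
    while True:
        item = tasks.get()
        if item is None:  # sentinel: shut this worker down
            break
        results.put(item * item)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for i in range(10):
    tasks.put(i)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```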


But why - what is the aim of all this effort for nogil? Are we doing it just because we can?

See my thoughts A fast, free threading Python - #102 by miraculixx

I’m sorry, but this is incredibly ignorant: it not only dismisses the use cases that have been vocalized for over a decade by individuals from multiple domains, but also implies that organizations are willing to subsidize this effort with millions of dollars for merely theoretical benefits that have not been thought through.


This is the part where I worry most about:

This PEP poses new challenges for distributing Python. At least for some time, there will be two versions of Python requiring separately compiled C-API extensions. It may take some time for C-API extension authors to build --disable-gil compatible packages and upload them to PyPI. Additionally, some authors may be hesitant to support the --disable-gil mode until it has wide adoption, but adoption will likely depend on the availability of Python’s rich set of extensions.

To mitigate this, the author will work with Anaconda to distribute a --disable-gil version of Python together with compatible packages from conda channels. This centralizes the challenges of building extensions, and the author believes this will enable more people to use Python without the GIL sooner than they would otherwise be able to.

For extension developers, in practice this would mean that both the --disable-gil and GIL versions will need to be maintained for an extended period of time; given the lifecycle of Linux releases such as Ubuntu and Debian, at least 5 years. During this time there will be limitations on how extensions can be developed because of the dual compatibility. There is not really a choice to develop only for x or y: if you want your extension/program to be seriously considered, it needs to run on both versions of the interpreter. I really don’t want another era where I have to write a limited subset of Python, literally getting the worst of both worlds, for compatibility reasons. Python 2’s well-deserved rest felt like a liberation to me, and going back to a similar scenario fills me with dread.
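For what it’s worth, code that must behave sensibly on both builds can at least detect which one it is running on. A small sketch, assuming the names the free-threaded build exposes in current CPython (the Py_GIL_DISABLED build-config variable and sys._is_gil_enabled() in 3.13+); older interpreters simply report a GIL build:

```python
import sys
import sysconfig

# Free-threaded CPython builds define Py_GIL_DISABLED in the build config;
# interpreters without the option return None for the unknown variable.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() (3.13+) reports the runtime state, since the GIL
# can be re-enabled even on a free-threaded build (e.g. via PYTHON_GIL=1).
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(free_threaded_build, gil_enabled)
```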


Thanks for your feedback, I appreciate the candor. However, my question is not ignorant of the use cases, nor of the potential benefits a nogil version would have for those use cases.

I am merely observing, from my own background, that “GIL issues” are often raised because the program and data design being considered are amenable to a shared-all free-threading concurrency model, whereas Python (or rather CPython) effectively prefers shared-nothing concurrency. Incidentally, the latter is known to be easier to reason about, less prone to failures, and in many cases also easier to scale.

I propose that this can be seen as a benefit, not a burden, and I am asking whether the risks inherent in a change of this dimension are warranted, including giving up a fundamental benefit.


Two things can be true at once. This comment is generally true, and there are also situations where shared-nothing does not work well, as pointed out in many posts on the various topics about removing the GIL.


Even if you have no use for shared memory parallelism, and only want to use the safer share-nothing approach, it is beneficial if the underlying VM can support sharing objects as it can make message passing much more efficient.

I agree that share-nothing is superior in terms of safety, but multiprocessing is an inefficient way to do it.

Erlang processes are conceptually isolated, but generally run in the same OS process (if running on the same physical machine).
The JVM supports a shared memory concurrency model, but nothing prevents you building a share-nothing application on top of it.
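A small illustration of why in-process sharing makes message passing cheaper and more flexible (names are hypothetical): a thread-safe queue hands over a reference to the very object that was sent, while crossing a process boundary, as multiprocessing queues and pipes do, forces serialization, which costs time and restricts what can be sent at all:

```python
import pickle
import queue

# A message that cannot cross a process boundary: lambdas don't pickle
handler = lambda x: x + 1

# In-process message passing hands over a reference; nothing is copied
q = queue.Queue()
q.put(handler)
received = q.get()
print(received is handler)  # True: the very same object comes out

# Sending the same object between processes would require serialization
try:
    pickle.dumps(handler)
    pickled_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_ok = False
print(pickled_ok)  # False
```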


Yes, but do we want to? I for one deliberately left Java behind in favor of Python, for all the (very often unnecessary) complexity that the powers that be and the common dogma of the time would declare best practice. In particular, Java’s multithreading and memory model is not something I miss.

Also it took the JVM years to be free of concurrency issues.

Yes, many people want to have free-threading in Python. It might be a good idea to read this thread, and the two PEP 703 discussion threads (first, second), in full (yes, all 370+ messages). Many people have shown why the existing models are not enough, and have presented real-life examples where the GIL was a bottleneck, and sometimes caused a rewrite in another language.

Free-threading might not be easy to implement at first, but it is certainly worth the effort in the long term. If PEP 703 is rejected, I would (in my personal opinion) consider it as an admission that Python is a toy language, requiring “serious” workflows to be implemented in C.


That’s a bit too strong of a position IMO. If PEP 703 is rejected, I would consider it an admission that free threading is hard, requires difficult decisions in terms of what consequences are acceptable, and will require further planning before it gets implemented.

The general tone from most people has been “this would be great, but what are the costs”. Rejecting one specific proposal for free threading doesn’t mean that nobody wants free threading.


I think this does depend on the form of rejection, though. If it came down in the same tone as some of the posts in this thread [1], I don’t imagine another Sam Gross coming along for the foreseeable future.

  1. not that I think it would, based on the discussions I’ve read ↩︎


It’s already implemented and working, with minimal performance losses compared to previous efforts and at least one mega-org committed to subsidizing the effort for years with engineers that have expertise with the code.

If this doesn’t happen now then I would be shocked if it ever did.