A fast, free threading Python

Can you provide a (hypothetical?) example?

Yep, CPython can switch threads in the middle of these two operations. So, there’s a problem.

3 Likes

Again, of course. [1] But I understood that @pf_moore made the very fine point that due to specialisations we are discussing here (e.g. BINARY_SUBSCR_DICT), and hence the GIL, things which are nominally not thread-safe are effectively so in current CPython, because they are specialised to a single native instruction. And this, I think, only needs an explicit specification for whether or not such operations are to be considered effectively “atomic” or not. Otherwise, yes, these are just undiscovered bugs, currently protected by a CPython implementation detail.


  1. Except that there’s the timer. ↩︎

… of a library starting a background thread. Not exactly a library, but idlelib.run has

    sockthread = threading.Thread(target=manage_socket,
                                  name='SockThread',
                                  args=((LOCALHOST, port),))

as a result of which threading.activecount() and theading.enumerate() returns are greater when running on IDLE. Someone once asked why the difference on Stackoverflow. (I have not idea whether no-gil will require any change to manage_socket or that chance of user code having a problem.)

import threading

THREAD_COUNT = 3
BY_HOW_MUCH = 1_000_000


class Incrementor:
    def __init__(self):
        self.c = 0

def incr(incrementor, by_how_much):
    for i in range(by_how_much):
        incrementor.c += 1

incrementor = Incrementor()

threads = [
    threading.Thread(target=incr, args=(incrementor, BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]

for t in threads:
    t.start()

for t in threads:
    t.join()

print(incrementor.c)

prints 3 million when ran it on 3.10. Does it mean you can rely on += being atomic when writing Python code? No! If you run it on 3.9 it prints between 1.5 and 2 million. Soon a Faster CPython team member can swoop in and change (not break!) it again.

BTW if Java and .net developers can have free threads, then so can we.

4 Likes

The word “can” here translates to (potentially) decades of work, which was the case for Java:

Yes we “can” (and likely should), but it requires serious commitment, and off-hand “others do it too” is not helpful here.

In the context Java and threading, it’s worth noting how threads commonly need quite a lot of developer-facing infrastructure (e.g. thread pools) that’s probably very hard to make beginner-friendly / “Pythonic”, and that they’re on a similarly large multi-{year,person} effort to move from free threads to virtual threads[1] under Project “Loom” (where – arguably – the boundaries to async programming start getting blurred), and encapsulating a lot of that in simpler interfaces through “structured concurrency”, which we have already (at least through trio).

All that to say: if we argue “Java can”, then we should also look at where those choices have led them, and what they consider as “moving forward” from there. But realistically, we’re very far from an apples-to-apples comparison in any case, and it’s better to leave that rhetorical tool hanging in the shed.


  1. latest incarnation in JDK 21 ↩︎

5 Likes

I meant it from the user perspective, referring to “maybe PEP-684 is better, because it’s safer” part of the discussion. For years Python was “parallel, but…” and now adding subinterpreters will help, but it won’t solve the entire problem.

PEP-703 on the other hand, goes pretty much all the way (though stop-the-world GC may still be a limitation) for those willing to learn how to use it and for those who already have experience with threads from other languages. Python “popularity” will increase with projects choosing it for a multicore program when it will become an option.

Will some users hurt themselves with free threading? I’ve been tracking nogil for a long while now and from what I’ve seen, for someone who writes threadsafe code already (but not native extensions) it will be really hard to run into trouble.

2 Likes

There are some multithreading traps inside glibc (relevant for Linux) that are unfortunate and sort of perennial issues (not just in Python!). For example getenv and setenv; glibc maintains that multithreaded programs must not use setenv, it’s not thread safe.

A library could start a thread, and the library wants to and would use C getenv in this thread (getenv is allowed according to glibc in a multithreaded program, following the usual logic).
The user’s program then has a threading issue: they must not use setenv, that could possibly cause segfaults (Python has setenv interfaces through os.environ and os.putenv).

(Does this issue exist in Python already today? Is there some mitigating factor that I don’t know about? How do subinterpreters deal with this? It would be great if C getenv/setenv had a major revision to be somewhat compatible with threading.)

1 Like

I just want to throw in a use case which has not yet been discussed here and in the discussions of PEP 703: GUI toolkits.

GUI toolkits are naturally using threads and therefore an approach where free threading is replaced by a different concept like sub interpreters or multiprocessing is problematic in the design & architecture of GUI applications written in python (because the toolkits being exposed in python are not aware of such concepts and most likely offer plain threading for offloading computational workload from the GUI). I’m a user of the PySide (Qt for python) project, and the PySide devs did struggle with the GIL as explained here Qt for Python 5.15.0 is out! (and also the links inside the document).

The problem of when it’s better to release or not to release the GIL in the C extensions of GUI toolkits is not straightforward, sometimes counter-intuitive (at least to me) and often there is a compromise involved depending on most common use cases but with drawbacks for other use cases.

Moreover, when creating GUI applications in python, you start struggling with the GIL when you have huge workloads happening in the background. My use case is a computer vision GUI which acts as a monitor and development environment for remote embedded systems. It is similar to what @lunixbochs reported for the realtime audio use case - keeping latencies and stutters on an acceptable level is unnecessarily difficult when you have to fight against the GIL mechanism.

12 Likes

Wouldn’t this be a potential problem today? “Not thread-safe” doesn’t mean “not thread-safe only when used in a free-threaded environment.”

I’m not trying to be difficult, maybe a bit pedantic. The presence of the GIL can obscure threading bugs or make them rear their ugly heads less often, but it doesn’t make code thread-safe. @colesbury’s work to remove the GIL has done a lot to remove places in the interpreter, stdlib, and some third-party libraries that relied on the GIL (knowingly or not). Most (all? almost all?) of that work will have been in C code, not Python code.

3 Likes

I think it could be a problem today, don’t know, I have the same question. I’m here to be curious.

Again as an interesting anecdotal data point here is a port of that troublesome C code to Python: mtenv.py
Here’s how it runs:

  • Using Python 3.11.3 this program loops for a long time without problem
  • I compiled and used nogil-3.12 commit 4526c07caee8f2e (current tip of the repo)
    and it runs 1-2 seconds before it segfaults in getenv just like the C code from 2017.

This is a reduced example. It doesn’t look like a normal Python program, it has a strange shape so that it can reproduce a crash easily. But the fundamental elements can occur in normal Python programs - various C calls that libraries use that use getenv - let’s say mktime to use the example from that blog post - and for setenv we have plain interface to it in os (os.getenv is not a plain interface to C getenv).

Yes, it is a problem today, without free threading. getenv + setenv thread safety is a problem for Python applications I run at work. We had to do a bunch of whackamole to work around segfaults resulting from extension libraries using getenv + setenv (for a while we gave up and used a terrible LD_PRELOAD hack)

3 Likes

&

At the high level this is the kind of thing I’d love someone to try creating for per-subinterpreter-GIL use! This is also quite hard, but I assume there are interested folks out there.

Intuitively I expect this winds up being the same problem that needs to be solved for free threading (which PEP-703 appears to do): our pure reference counting model is the most significant reason we have a GIL - in order to share objects between multiple threads you need to make the reference counts work without that single lock.

Someone really needs to try creating explicitly shared objects implementation for CPython and subinterpreters to prove or disprove it’s actual utility. In the absence of that, I wouldn’t point to it and suggest it is a better solution. I consider it an open avenue of future work. (Even if we get free threading, performant explicit sharing would be something I expect many would appreciate having.)

5 Likes

We’ve had a chance to discuss this internally with the right people. Our team believes in the value that nogil will provide, and we are committed to working collaboratively to improve Python for everyone.

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.

128 Likes

What’s blocking the acceptance of PEP 703?

2 Likes

See prior and current discussion, it’s not a small decision and has large ramifications:

https://discuss.python.org/t/pep-703-making-the-global-interpreter-lock-optional
https://discuss.python.org/t/pep-703-making-the-global-interpreter-lock-optional-3-12-updates
https://discuss.python.org/t/poll-feedback-to-the-sc-on-making-cpython-free-threaded-and-pep-703

3 Likes

TL;DR Option 1 is the best choice IMHO

It seems to me there needs to be a better rationale as to the justification of free threading in Python - why would we want that, what use cases are we looking to enable exactly? Is it worth (potentially) upending the ecosystem for these use cases?

Alas I propose the mere fact that it can be done is not a good enough reason.

In my 12+y of practical experience in using Python (previously 18y+ with Java & other languages) for web, full stack and data science use cases within an enterprise context, I have never found any problem with the GIL. For CPU bound parallelism multiprocessing works fine, if more parallel processing power is needed, multi-node is the way to go; for latency bound threading is usually a good option, although multiprocessing works in this case too.

One argument that is often brought up in my discussions with fellow devs is along the lines of “I don’t want to choose between threading or multiprocessing, I just want the code to do what it does already in parallel & faster”. While it seems sensible to want that, of course it is not that easy. With this expectation, the GIL most of the time becomes a point of frustration only because the first thing that gets tried is multi threading. It seems natural to do so: threads are among the top ideas that come up when looking up ways to make computers do things concurrently. Next thing you know, there is a complaint about the GIL limitting performance, when in fact the root cause is the wrong choice of the concurrency model.

I suggest that we should raise awaress of the best practices to parallel and concurrent task execution. We should also highlight that the general concensus (afaik) is that shared-nothing approaches work best and that you should avoid shared-object concurrency if you can. In this view having the GIL is a blessing in disguise.

2 Likes

But why - what is the aim of all this effort for no-gil? Are we doing it just bc we can?

See my thoughts A fast, free threading Python - #102 by miraculixx

I’m sorry but this is incredibly ignorant and dismisses not just the use cases that have been vocalized for over a decade by individuals from multiple domains but also implies that organizations are willing to subsidize this effort with millions of dollars for some theoretical benefits that have not been considered.

18 Likes

This is the part where I worry most about:

This PEP poses new challenges for distributing Python. At least for some time, there will be two versions of Python requiring separately compiled C-API extensions. It may take some time for C-API extension authors to build --disable-gil compatible packages and upload them to PyPI. Additionally, some authors may be hesitant to support the --disable-gil mode until it has wide adoption, but adoption will likely depend on the availability of Python’s rich set of extensions.

To mitigate this, the author will work with Anaconda to distribute a --disable-gil version of Python together with compatible packages from conda channels. This centralizes the challenges of building extensions, and the author believes this will enable more people to use Python without the GIL sooner than they would otherwise be able to.

For extension developers, in practice this would mean that both the --disable-gil and GIL versions will need to be maintained for an extended period of time. Given the lifecycle of Linux releases such as ubuntu and debian at least 5 years. During this time there will be limitations on how extensions can be developed because of the dual compatibility. There is not really a choice to only develop for x or y. If you want your extension/prorgram to be seriously considered, it needs to run on both versions of the interpreter. I really don’t want another era where I have to write a limited subset of python, literally getting the worst of both worlds, for compatibility reasons. Python 2’s well deserved rest felt as a liberation to me and going back to a similar scenario fills me with dread.

1 Like