Is Free-Threading Our Only Option?

tl;dr We have other viable options to support the multi-core needs of Python users. We don’t need to feel like free-threading is the only option.


(FTR, this is an expansion of what I proposed to present at the language summit this year. In fact, this first part is essentially the summary and outline I submitted.)

PEP 703 and the free-threaded build of CPython have brought a lot of attention to Python’s multi-core story, which has historically been murky at best. There are a variety of things we can do to improve that story, including writing better docs, exposing existing functionality, or even removing the GIL.

Most importantly, we need to clearly understand and agree about what users actually need solved. This is especially important before we make a final decision about PEP 703.

The main point here is that, when it comes to multi-core parallelism, we need to be clear about:

  • what Python’s users really need
  • what solutions we offer
  • what solutions we could offer
  • what their downsides are (and any secondary benefits)

Clarity in this area will help us both make good decisions and communicate better with users.

I’m not aware of any significant analysis along these lines (other than my own meager-but-best effort back in 2015).

Summary:

Part 1: status quo

  1. use cases for Python in a threaded application
    a. in Python programs
    b. in extension modules
    c. in embedded Python
    d. what data do users want to share between threads
  2. status quo solutions
    • threads w/GIL
    • multiprocessing
    • distributed
    • multiple interpreters (C-API-only)
  3. deficiencies in the status quo
    • threads: GIL (incl. blocking embedded threads unnecessarily)
    • multiprocessing: slow, extra system resources
    • no stdlib module for subinterpreters (PEP 734)
    • subinterpreter rough corners
    • documentation (howto) - show users how to do concurrency with Python

Part 2: possible improvements

  • stdlib “interpreters” module (PEP 734)
  • better interpreter perf
  • interpreter/threading helpers in stdlib (e.g. proxies, immutable)
  • get rid of GIL (PEP 703)

Part 3: is the free-threading build necessary?

  • being unnecessary does not mean PEP 703 should be rejected
  • alternatives allow us to make better decisions

Regarding part 3, I think it’s important that we have consensus about any available alternatives to removing the GIL. We shouldn’t need to feel like we have to accept PEP 703. To be clear, I’m not against removing the GIL; I only think we should make an informed decision.

Status Quo: Use Cases for Python in a Threaded App

This is an area where I think we could have substantially more clarity.

  1. in Python programs
    • ???
  2. in extension modules
    • ???
  3. in embedded Python
    • ???
  4. what data do users want to share between threads?
    • ???

Status Quo: Solutions

  • threads w/GIL
  • multiprocessing
  • distributed
  • multiple interpreters (C-API-only)

Status Quo: Deficiencies

  • threads: GIL (incl. blocking embedded threads unnecessarily)
  • multiprocessing: slow, extra system resources
  • no stdlib module for subinterpreters (PEP 734)
  • subinterpreter rough corners
  • users tend to have little understanding of how free-threading will impact them

Possible Improvements

  • stdlib “interpreters” module (PEP 734)
  • better interpreter perf
  • interpreter/threading helpers in stdlib (e.g. proxies, immutable)
  • get rid of GIL (PEP 703)

Is the Free-threading Build Necessary?

First of all, it’s important to note the following:

  • Sam and his team have been responsive, highly collaborative, and consistently cooperative
  • it may be easier for some users to take advantage of free-threading than the alternatives
  • being unnecessary does not mean PEP 703 should be rejected
  • alternatives allow us to make better decisions

The question of necessity is partly a function of the following:

  1. who actually benefits from free-threading? (what are the motivating use cases?)
  2. how do those users benefit, and how much?
  3. what new costs offset those benefits?
  4. what new costs does everyone else face?

FWIW, PEP 703 does describe a number of motivating use cases. I don’t mean to suggest otherwise; rather, it would help to have a clear, broad analysis of use cases for multi-core parallelism that’s independent of PEP 703. That would put us in a position to better assess the options for supporting all the use cases.

The other part of the equation involves what alternatives are available. I’m most familiar with the use of multiple interpreters, but that isn’t the only viable alternative. The same questions from just above should be answered for the alternatives, so we can measure where the different solutions overlap and where they don’t. And I wouldn’t be surprised if there were a significant amount of overlap. (We just can’t be sure yet.)

Conclusion

We shouldn’t feel like we have to accept PEP 703. We have viable alternatives that don’t have the same downsides. I’m not opposed to us keeping free-threading, as long as we are deliberate about accepting the costs. However, we must not do it solely because there doesn’t seem to be any other way to meet certain users’ needs. That just isn’t the case.

11 Likes

One very rough corner of subinterpreters is that many extension modules are currently not compatible with them, and it’s not really obvious what’s needed to make a project compatible. As a maintainer of PyArrow, I can say this isn’t even on our radar, even though our project works on free-threaded builds.

5 Likes

Web applications are clearly a use case that would benefit from PEP 703.

The synchronous programming model is easier for many people than the async/await programming model. And connection pooling for DBs and external HTTP services, as well as sharing large in-memory data, saves resources.

While multiprocessing can share some memory between processes using fork, the combination of fork and multithreading is difficult to use correctly. Sharing data between subinterpreters is even more difficult than with fork.

Sharing connection pools for external DBs or HTTP connections is much harder with either multiprocessing or subinterpreters.

Therefore, if we can use multithreading alone to take advantage of multiple CPU cores without using multiprocessing, it will be easier to create efficient web applications.
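As a toy illustration of the sharing point (assuming free threads can run in parallel; `Pool` here is a hypothetical stand-in for a real DB/HTTP connection pool):

```python
import threading

class Pool:
    # Stand-in for a connection pool; a real pool also needs
    # free-threading-safe internals, hence the lock here.
    def __init__(self):
        self.checkouts = 0
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            self.checkouts += 1

pool = Pool()  # one shared instance for the whole process

def handle_request():
    # Every thread sees the very same pool object; no pickling,
    # no IPC, no fork - this is what multiprocessing can't give you.
    pool.acquire()

threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(pool.checkouts)  # → 8
```

With processes or subinterpreters, each worker would instead need its own pool (or an explicit sharing mechanism), which is exactly the resource cost described above.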

14 Likes

I am taking the opposite approach for PyQt. I feel that I better understand what is needed to support subinterpreters, that the work is going to be needed to support free threading anyway, and my gut feeling is that subinterpreters will prove to be more useful.

3 Likes

I’m a bit surprised, because GUIs typically have a lot of shared state and benefit from low latencies, so I would intuitively consider free-threading more useful than multiple interpreters.

2 Likes

That may be true within the implementation of a GUI toolkit, but less so at the level of its API.

Subinterpreters have only recently started to gain value with the per-interpreter GIL, yet despite that change predating the free-threaded build, I’ve found more libraries already “just work” with free-threading[1].

Free-threading seems more promising than multiple interpreters for most of my use cases, at least assuming we ever reach a point where it’s the norm and not a separate build, which I believe was a goal.

Free-threading is already showing performance gains in real-world code at $work, where we’ve decided to commit to using the free-threaded build for multiple services[2] written in Python, even given its experimental nature. Despite my wanting to get support for subinterpreter use, many people are skeptical of running another Python interpreter in a thread rather than using any number of other possibilities for horizontal scaling at a different layer: since subinterpreters can’t share data easily, there’s no visible benefit over just launching multiple interpreters via orchestration.


  1. whether that’s intentional, library design, or user-pressure for support, I can’t say. ↩︎

  2. Ones that we’re willing to assume the risk on, given the known state of the interpreter and our dependencies. ↩︎

10 Likes

I was talking about actual GUI apps.

For me, free threading feels like it increases the barriers to writing multi-threaded code. There are performance benefits, certainly, but the cost is that, as a user, I need to be aware of all the risks and complexities of writing safe multi-threaded code in places where traditionally the GIL protected me. It’s “the devil you know”, in that free threading is just threading, with a bit more concurrency.

Conversely, multiple interpreters feel like they provide an inherently safe model. The disadvantage is that they are not actually available at the Python level yet :slightly_smiling_face: And as such, it’s harder to know how well they will work in practice.

Ultimately, what matters to me is “structured concurrency”. I don’t want to think about concurrency, I want a library to do that for me. All of my work tends to be at the level of something like concurrent.futures, and so what will matter to me is what gets built on top of the various threading options (and how effectively it makes things “just go faster”). Rust’s rayon crate is the sort of thing I’m looking for - I sped up a piece of code that had been written with no thought for concurrency[1] by a factor of 3x, by literally just switching from a map method to a par_map method.

There are a lot of promises that free threading will enable people to develop structured-concurrency libraries. If that doesn’t happen, I don’t see free threading being a significant improvement for anyone who isn’t already using traditional threading. On the other hand, multiple interpreters offer a structured model from the start - what matters there is whether it’s easy to work with and delivers “drop in” performance improvements.

I think both are probably worthwhile, for different audiences.


  1. Admittedly, the fact that the random number crate, and rust in general, is threadsafe by default helped a lot ↩︎

18 Likes

The Python layer is available on PyPI, and works with Python 3.13+ (even the non-experimental builds) :‍)

7 Likes

Actual GUI apps tend to follow a “core thread” model[1]. Basically, one thread handles all the GUI, and it spawns off independent workers that occasionally post updates back to the core thread. You often never have interactions between those workers, and you certainly don’t want the core UI thread to be blocked by any task.[2]
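A minimal sketch of that core-thread model with nothing but the stdlib (names here are illustrative):

```python
import queue
import threading

updates: queue.Queue = queue.Queue()

def worker(name: str) -> None:
    # Workers never touch UI state directly; they only post
    # messages back to the core thread through the queue.
    updates.put(f"{name} finished")

threads = [threading.Thread(target=worker, args=(f"task-{i}",))
           for i in range(3)]
for t in threads:
    t.start()

# The core (UI) thread is the only one that applies updates,
# so all state changes are serialized without per-widget locks.
results = [updates.get() for _ in range(3)]
print(sorted(results))

for t in threads:
    t.join()
```

The arrival order is nondeterministic, which is exactly why the single consuming thread matters: it gives each update a definite place in the sequence of UI changes.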

But because you need all updates to be atomic, calls back into the core thread need to be coordinated. This is the tricky bit. Full free-threading doesn’t offer anything at all to help here - building your own locks is a recipe for deadlocks, and standard queues always eventually run into issues (backpressure, contention, etc.).

The temptation to just share state directly is what gets everyone into trouble, and you have to be really strict to avoid it. The benefit of multiple interpreters here is that they are really strict for you, and so you have to work extra hard to get into trouble (as opposed to laziness getting you into trouble).

Of course, useful primitives are still needed for efficient and safe sharing of state, and those aren’t widely available yet, but that’s part of the concern being raised here: there’s no interest or investment in creating those primitives, because the hype makes it look like free-threading has already “won”, and so it becomes self-fulfilling. Eric’s trying to make sure that if we end up with self-fulfilling 1980s-style multithreading, it’s on purpose, because we want it, not because it seemed like the only option on offer.


  1. I made that particular name up, but you’ll find many variants on this concept with different names out there. ↩︎

  2. I forget the exact number, but when Windows last revised their API performance guidelines I think the rule was any function that might ever take longer than (around) 10ms had to be an async/worker thread. This is too short for a spinning HDD that’s asleep, and so open() was required to be async. Unsurprisingly, devs didn’t like this much, and so we still have apps that freeze - the mobile phone OSes did a better job of forcing devs into this world. ↩︎

14 Likes

I’ll say it again, but the GIL doesn’t protect third-party Python code. It protects CPython internals, and it also protects a bit third-party C extension code (but only a bit, and less than people usually expect).

9 Likes

I’d argue the main way the GIL “protects” library authors is by causing multithreaded scaling issues. That means people don’t try to use many libraries with threads, so no one runs into or worries about latent thread-safety issues.

Free-threading makes multithreaded parallelism viable in a wider variety of circumstances, so preexisting issues become more pressing.

IMO that doesn’t mean free-threading shouldn’t happen, it means we as a community have a lot of work to do to make libraries safe(r) to use with threads.

19 Likes

I feel like there’s an inherent tension in this and the other two threads: PEP 703 is going too fast, but also there are still a lot of open questions about whether it’s worth it. Yet a lot of these questions are open precisely because it’s still not ready for many users!

It is not surprising to me that the early adopters of the freethreading build are in companies, where there are the resources to invest in switching, and they don’t have to worry about supporting external users.

There’s no way to release a package only for free-threading builds, right? I suppose a rayon-like package could be installable on GILed builds too, it just wouldn’t improve performance. That could still be helpful for people to test their code, though.
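One sketch of that fallback idea: a package can detect the build at import time and choose the parallel or sequential path accordingly (function names here are mine, not from any real package):

```python
import sys
import sysconfig

def free_threaded_build() -> bool:
    # Py_GIL_DISABLED is set in free-threaded (PEP 703) builds
    # of 3.13+; on regular builds it is 0 or absent.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

def gil_enabled_now() -> bool:
    # Even on a free-threaded build the GIL can be re-enabled at
    # runtime, e.g. by importing an incompatible extension module.
    check = getattr(sys, "_is_gil_enabled", None)  # added in 3.13
    return check() if check is not None else True

print(free_threaded_build(), gil_enabled_now())
```

On a GIL-enabled interpreter such a package would still install and run; it just wouldn’t scale across cores, which is enough for users to test their code.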

Apologies. I keep making that mistake (it’s more a mistake in how I describe things than in how I write code, which I guess is at least some consolation).

What I was trying to say is that I’m likely to use threading in cases where I’d previously not have done so (precisely because of the performance benefits, and let’s be honest, because if free-threading isn’t used, what’s the point?) But I may not be as familiar with how to write safe threaded code as I need to be.

So I guess the GIL “protects” me by blocking me from using threading, as @ngoldbaum said.

And I think this is where the issue lies. We also need to bring safer ways of using multi-threading to all the new users who will now benefit from multi-threading where they haven’t before. If we don’t do that (the “structured concurrency” question) then multiple interpreters will likely be a safer option for those users.

2 Likes

Well, concurrent.futures.ThreadPoolExecutor is quite a reasonable way to exploit multi-threading if what you’re doing fits in that execution model (i.e. data parallelism: execute the same code over independent pieces of data).

Similar things are possible if you’re doing asynchronous programming, for example running tasks in worker threads using asyncio.
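For example, `asyncio.to_thread` (available since 3.9) runs a blocking callable in a worker thread while the event loop keeps going:

```python
import asyncio
import time

def blocking_work(n: int) -> int:
    time.sleep(0.01)  # stand-in for a blocking call
    return n * n

async def main() -> list[int]:
    # Each call runs in a worker thread from the default executor;
    # gather() preserves the argument order in its result list.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_work, i) for i in range(4))
    )

print(asyncio.run(main()))  # → [0, 1, 4, 9]
```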

There are also third-party libraries like Dask which give you higher-level constructs for domain-specific tasks.

2 Likes

Yeah, ThreadPoolExecutor is 95% of the way to what a user might want for mapping tasks. The Python version of par_map is just

from concurrent.futures import ThreadPoolExecutor

def par_map(fn, *args, max_workers=None):
    with ThreadPoolExecutor(max_workers) as executor:
        yield from executor.map(fn, *args)

1 Like

Agreed. Much of the remainder comes from improving the concurrency-awareness of the stdlib and 3rd party libraries (simply using locks everywhere makes concurrency gains much harder to achieve). But this is off-topic now, so let’s leave it there.

4 Likes

I’ve added free-threading support to PyObjC because that fits nicely into the platform (some of Apple’s system APIs can run callbacks on other threads).

I haven’t even started work on subinterpreter support because PyObjC changes process-global state in a way that cannot be undone, which means subinterpreter support would be incompatible with using short-lived subinterpreters (e.g. as an alternative to multiprocessing). I also expect that adding subinterpreter support would be harder than free-threading support.

3 Likes

You can by setting the ABI in the wheel tag to e.g. cp313t and only releasing such wheels.

3 Likes