Unresolved Concerns About the Free-Threading Build and PEP 703

eric.snow · May 13, 2025, 1:58am

tl;dr I don’t think the major concerns we’ve brought up have been fully addressed yet.

While I support the effort to make CPython free-threaded, I also feel it’s important that we consider if it’s worth it. Specifically, from day 1 there were three main concerns I have heard consistently from the core team (and still have myself):

impact on single-threaded performance (least of the three)
additional maintenance burden on core development
additional maintenance burden for the Python community (extension maintainers, etc.)

I’m not convinced any of them have been sufficiently addressed, especially relative to how far the (in-repo) implementation has gone.

—

As has always been our way, the burden to resolve these concerns rests on the proposer. I’m not suggesting that Sam has been delinquent in this. He’s generally been quite responsive and diligent. If anything, we haven’t been so diligent in bringing these concerns forward clearly, and the free-threading effort has understandably been more focused on solving the various technical challenges that have been more present than the above concerns. Thus I’m starting this discussion to bring the concerns closer to the forefront.

(Related: I’m going to open a separate thread to discuss reservations I have about how the process has played out for PEP 703.)

—

Those 3 concerns are significant. They represent the price the entire Python community must pay for the sake of support for free-threading in the runtime. With that in mind, these key questions surface (and have been partially answered by PEP 703):

A. who actually benefits from free-threading? (what are the motivating use cases?)
B. how do those users benefit, and how much?
C. what new costs offset those benefits?
D. what new costs does everyone else face?

(I’m going to start a separate thread exploring those questions directly, specifically whether free-threading is the only viable way to meet the needs of those users.)

Do we have satisfactory answers to those questions? I believe it is essential that we make sure we do, before we get further swept away by the momentum of the project. I don’t believe it’s too late.

All that brings us back to the main concerns. In each of the following three sections, I’ve broken those concerns down by the ways in which I believe they have not really been addressed.

1. impact on single-threaded performance

we don’t have reliable-enough information about single-threaded performance impact
- current benchmarks don’t represent actual workloads well
- consequently, current benchmark results can’t provide us with sufficient confidence
we don’t have much information on how well threads perform
- do threaded programs under free-threading really run substantially faster?
- how close do we get to scaling performance (e.g. 4 threads == 4x perf)?
- mostly there are no concurrency-oriented benchmarks (especially representing actual workloads)
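
To make the scaling question concrete, here is a minimal sketch of how one might measure it. The workload `cpu_bound` and the helper `measure_speedup` are hypothetical names chosen for illustration, and a toy loop like this is exactly the kind of benchmark that does not represent actual workloads, so treat any number it prints as a rough indication only.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    """Hypothetical pure-Python CPU-bound task: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure_speedup(num_threads: int = 4, n: int = 200_000) -> float:
    """Return sequential time divided by threaded time for num_threads equal tasks."""
    # Run the tasks one after another in the calling thread.
    start = time.perf_counter()
    for _ in range(num_threads):
        cpu_bound(n)
    sequential = time.perf_counter() - start

    # Run the same tasks concurrently, one per worker thread.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(cpu_bound, [n] * num_threads))
    threaded = time.perf_counter() - start

    return sequential / threaded


if __name__ == "__main__":
    # sys._is_gil_enabled() exists only on Python 3.13+; assume the GIL is on otherwise.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    print(f"GIL enabled: {gil_enabled}")
    print(f"speedup with 4 threads: {measure_speedup(4):.2f}x")
```

Perfect scaling would give a ratio close to the thread count. Running the same script on a default build and on a free-threaded build gives one data point for the "4 threads == 4x perf" question, nothing more.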

I do want to point out that the people working on free-threading have worked very hard toward performance parity for single-threaded code, measuring with the existing pyperformance benchmarks. What they’ve accomplished is a big deal, very impressive.

I’m not saying that effort is pointless, nor that a hypothetical workload-oriented benchmark suite would disagree much with the existing benchmarks. Instead, I’m saying we currently can’t be sure how much they might disagree. Plus, the existing benchmarks aren’t particularly useful for communicating with users about performance impact.

(Honestly, this is the concern I’m least worried about. I’d guess that the existing benchmarks are close enough. I’m just not convinced yet that such an educated guess is sufficient given the potential impact to all Python users.)

2. additional maintenance burden on core development

the initially proposed changes were substantial enough to draw concerns
extensive changes have been required (much more than initially proposed)
significant extra complexity in the runtime implementation
support for free-threading is a cross-cutting concern that leaks across the code base
introduces the invasive pain of dealing with races to all contributors in nearly all code
long tail of inadvertently introduced races (easy to cause and problem will never go away)

FWIW, I have substantial experience with the pain of free-threading through my efforts of the last ~8 years with subinterpreters, where I’ve frequently worked in the space outside the GIL. However, my pain was restricted to very limited parts of the runtime, and yet there are a few races we still haven’t solved.

Expanding that to the entire codebase magnifies the scale of the pain and extends it to all contributors. Most contributors aren’t going to be particularly familiar with thread-safety and even those who are will still get things wrong. Threads are perilous and there’s nothing we can do about it. We should be sure that long-term burden is worth it.

3. additional maintenance burden for the Python community

This applies to users and module maintainers, but especially to extension module maintainers.

similar concerns as for core development
most? modules will be subject to some sort of thread unsafety
most maintainers don’t know how to deal with races
lack of utilities to reduce the negative impact
double-ish builds currently
C-API breakage (incompatible new stable ABI)

Bonus: Threads Are a Source of Pain

I also want to reiterate the negative consequences of free-threading, which many don’t really grasp until it’s too late:

dealing with races is a massive pain
races typically manifest at irregular times
races can manifest at unexpected times, including far enough in the future that it can be difficult to identify causality
just about any change can introduce a race unexpectedly
races can manifest in very indirect ways, making debugging particularly difficult
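
As an illustration of the kind of race meant here, consider a minimal sketch of an unsynchronized read-modify-write next to a locked one. The `Counter` class and `run` helper are hypothetical names invented for this example. Whether the unsafe variant actually loses updates on a given run depends on the build, the interpreter version, and timing, which is the irregularity described above.

```python
import threading


class Counter:
    """Hypothetical shared counter with an unsafe and a safe increment."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # Read-modify-write without synchronization: two threads can read the
        # same value, and one of the two updates is then lost.
        self.value += 1

    def safe_increment(self) -> None:
        # The lock makes the read-modify-write atomic with respect to other
        # threads that take the same lock.
        with self._lock:
            self.value += 1


def run(increment_name: str, num_threads: int = 4, per_thread: int = 10_000) -> int:
    """Call the named increment method from several threads; return the final count."""
    counter = Counter()
    increment = getattr(counter, increment_name)

    def worker() -> None:
        for _ in range(per_thread):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    expected = 4 * 10_000
    print("expected:", expected)
    print("safe:    ", run("safe_increment"))
    # May equal the expected value or fall short of it, depending on the run.
    print("unsafe:  ", run("unsafe_increment"))
```

The locked version always reaches the expected total. The unlocked version can pass many runs before it ever falls short, which is what makes this class of bug hard to catch in review or testing.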

There’s a reason why modern languages tend to pursue solutions for parallelism other than threads.

Conclusion

There are a lot of costs and risks that come with free-threading. We must be sure it’s worth it before any binding decision.

stonebig · May 22, 2025, 5:50am

The JIT was supposed to favor multiple interpreters, but we don’t yet see a meaningful effect of the JIT on performance.

pitrou · May 22, 2025, 9:36am

How is that the case? Because the JIT compiler itself serializes its work over a single thread? Or do you mean the JIT-compiled code is not multi-threading friendly?

stonebig · May 22, 2025, 5:19pm

If I remember correctly, Mark initially conjectured that the JIT and similar work would greatly reduce the time spent in the interpreting part of CPython, so the “penalty” time of free-threading would grow in proportion, degrading its gain/pain ratio.