Good call, it was probably a mistake to bring up the 2-3 transition there.
A more precise way to say it is that not supporting the threading module, or only supporting it with strong caveats, is a valid choice to make. See, for example, the pytest or python-zstandard documentation.
That’s more or less the status quo for a lot of libraries.
From my own experience of having done this for a few projects, I would say that running the single-threaded test suite under the free-threading build is unlikely to surface any issues. What did reveal issues was running the test suite with Quansight’s pytest-run-parallel. If you pip install pytest-run-parallel, you can run your tests with
pytest --parallel-threads=5 --iterations=10
This runs your test suite but runs each test multiple times simultaneously using a thread pool. You can use it with the free-threaded build or with the GIL build. I found that this threw up a lot of false positives but also surfaced what look like a few real issues (although I haven’t looked into them in detail yet). The false positives come from test suites that are not designed for multithreading and mutate global state (e.g. the warnings module filters don’t work with multithreading). The real issues are that I have seen a test failure from pure Python code that I assume has something to do with a global cache, and I have seen a segfault from a particular extension module under the free-threading build.
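To make the false-positive category concrete, here is a minimal sketch (a hypothetical test, not taken from any particular project) of a test that passes in isolation but can fail under pytest-run-parallel because it mutates process-global state:

import warnings

def do_work():
    warnings.warn("deprecated", DeprecationWarning)

def test_warns_once():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        do_work()
    # When several copies of this test run concurrently, another thread's
    # catch_warnings() save/restore can clobber the global filter list and
    # warning registry, so `caught` may be empty or contain extra entries.
    assert len(caught) == 1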
Using pytest-run-parallel is a cheap way to test multithreading using your single-threaded test suite, and it probably gives a reasonable first approximation of what it might look like if the library is used in a multithreaded program. It does not, however, stress some of the biggest problems, like what happens when objects are shared between threads, so for that you would probably need to write some explicit tests. In my case it was fairly obvious that some extension types would potentially crash if mutated by multiple threads, but it was still not trivial to produce a test that stressed this enough to manifest the crash; a sketch of the kind of test I mean follows.
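Something along these lines is a reasonable starting point for those explicit tests. SharedThing and its mutate method are placeholders for whatever object and operation you suspect are unsafe; the barrier maximizes the chance that the mutations actually overlap:

import threading

def test_concurrent_mutation():
    obj = SharedThing()          # placeholder for the type under test
    n_threads, n_ops = 8, 10_000
    barrier = threading.Barrier(n_threads)
    errors = []

    def worker():
        barrier.wait()           # release all threads at (nearly) the same moment
        try:
            for i in range(n_ops):
                obj.mutate(i)    # placeholder mutating operation
        except Exception as exc: # a real crash (segfault) kills the process instead
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not errors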
Of course having identified any issues you are left with a choice for what to do about them. @ngoldbaum outlined many valid choices. I would say that right now it should be considered reasonable to just document any known issues, say that it is all experimental, and put the package out for others to experiment with. Anyone who uses the free-threaded build with or without your package should understand that they are an early adopter and it is expected that they might run into problems.
The Steering Council (SC) approves PEP 779, with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14.
Along with this, the SC considers that the following requirements must be addressed during Phase II. Meeting these ensures that free-threading support can mature into a safe and scalable default in future versions of Python. This is not an exhaustive list, and any other previous requirements should still be met.
C API/ABI compatibility and Stable ABI for free-threading
Any proposed changes that break existing API or ABI guarantees must be agreed to in advance with the C API Working Group. Limited C APIs for free-threading must provide a reliable foundation so that third-party library maintainers, particularly those who rely on the C API, can plan their support with confidence. The Steering Council also expects that Stable ABI for free-threading should be prepared and defined for Python 3.15.
Requirements for new experimental projects within CPython
New experimental projects within CPython must be compatible with, and should be based on the free-threading build. The SC encourages this direction to reduce engineering complexity caused by supporting both GIL and free-threaded builds.
Performance and memory guard rails
While the official target is to stay within 10% performance degradation, any changes expected to cause a slowdown beyond that, up to a hard limit of 15%, should be discussed with the SC in advance. We expect a review of the specific changes contributing to such degradation to understand and evaluate their impact. Memory usage increases must stay below 15% compared to the GIL build. Similarly, any changes expected to increase memory above 15% (with a hard limit of 20%) should be discussed with the SC in advance.
Documentation expectations
Documentation must be clearly written and maintained.
For Python users: What guarantees exist and how they are affected by the free-threaded build, for all APIs in all modules of the standard library.
For both Python and C API developers: Documentation on signal-safety, thread-safety, and other concurrency-related guarantees in all APIs that are publicly exposed without exceptions.
For CPython developers: Documentation on the impact of free-threading and how it should be taken into consideration while working on the language implementation.
We recommend a central “free threading landing page”, location to be decided, which provides a guide to all the disparate documentation, PEPs, timelines, other decisions, and information regarding the free threading feature in Python. If https://py-free-threading.github.io/ is that site, we recommend making it an official page and improving its discoverability and visibility (e.g. possibly moving to the python.org domain).
Preparation for high-level concurrency primitives
The Python core team should begin considering and proposing higher-level concurrency primitives that users can use safely and effectively, without requiring a deep understanding of the underlying threading mechanism. The SC wishes this task to be prioritized once the above tasks are stable. We recommend using the concurrent package in the stdlib for this, where appropriate.
Benchmark requirements
All claims regarding performance, memory usage, and correctness must be supported by comprehensive and repeatable tests. These evaluations should be conducted using the existing benchmark infrastructure based on pyperformance. The SC encourages the community to contribute additional benchmarks, particularly those that are affected by or relevant to free-threading and which better represent real-world workloads, to ensure broad and realistic coverage.
We are confident that the project is on the right path, and we appreciate the continued dedication from everyone working to make free-threading ready for broader adoption across the Python community.
With these recommendations and the acceptance of this PEP, we as the Python developer community should broadly advertise that free-threading is a supported Python build option now and into the future, and that it will not be removed without following a proper deprecation schedule. As approved by the 3.14 Release Manager Hugo van Kemenade, we recommend officially removing the “experimental” tag from the CPython free-threading build with the 3.14 beta 3 release.
Keep in mind that any decision to transition to Phase III, with free-threading as the default or sole build of Python, is still undecided and dependent on many factors both within CPython itself and the community. We leave that decision for the future.
Donghee, on behalf of the Steering Council
Note: This announcement has been updated to clarify terminology. For details, see the updated post.
I’m sorry, but is “Stable C API” here the same as “Limited API”? I see that the “C API Stability” doc distinguishes between “Unstable API” and “Limited API”, but I see only a few references to “Stable API” in the docs, and it is all quite confusing to me.
As approved by the 3.14 Release Manager Hugo van Kemenade, we recommend officially removing the “experimental” tag from the CPython free-threading build with the 3.14 beta 3 release.
I think there’s a risk of this being confusing if there’s no Stable C API designation change for 3.14 as well.
Thank you! That’s great news! I do have a few questions about the requirements for 3.15.
I assume that here you mean Stable ABI, not API. The APIs are already (small “s”) stable (as mentioned in PEP 779, we haven’t had to make any significant changes to the free-threaded APIs in 3.14, other than adding new APIs). The (big “S”) Stable ABI is definitely an important target for 3.15 though.
I assume the numbers you’re talking about are the geomean of pyperformance benchmarks as reported by, say, the Faster CPython benchmarks? I ask because those benchmarks are configurable and it’s easy to omit certain benchmarks that cause swings in numbers, despite taking the geomean. If this is really a hard requirement, we should agree on which specific benchmarks we’re talking about here (The faster-cpython configuration already omits a couple of benchmarks, and would have ignored more if it hadn’t been for bugs in the configuration…) However, in the case of single-threaded CPU performance it’s unlikely to matter; we’re well below 15% even on antique compilers, and I don’t think there’s any reason to expect new slowdowns at this stage.
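To make that concern concrete, here is a toy calculation with made-up ratios (free-threaded time divided by GIL-build time) showing how much omitting a single benchmark can move the headline number even when using a geometric mean:

import math

def geomean(ratios):
    return math.exp(sum(map(math.log, ratios)) / len(ratios))

ratios = [1.05, 1.08, 1.02, 1.60]   # one outlier benchmark
print(geomean(ratios))               # ≈ 1.17, i.e. roughly 17% slower
print(geomean(ratios[:-1]))          # ≈ 1.05, i.e. roughly 5% slower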
I really ask because the way pyperformance measures memory use is fairly inexact, and it uncovers details about mimalloc rather than Free-threaded Python itself. Pyperformance reports max RSS use, which is a dodgy number to rely on for multiple reasons. When instrumenting benchmarks with the really excellent memray tool, it’s very clear that free-threaded Python allocates about 5-7% more memory (because of the bigger object header as well as things like QSBR) – and we can easily prove this – but max RSS is much higher because of how mimalloc manages its pools and arenas. If we have to make sure the max RSS change as reported by pyperformance stays below 20%, we may have to consider trade-offs between single-threaded performance, multi-threaded performance and memory use. I don’t know if the pyperformance benchmarks are the best way to guide those trade-offs.
Is the intent here to set hard limits, or to make sure we’re doing our due diligence, protecting against degradations over time, and documenting the expected changes in memory use?
Does this mean you want experimental packages for this in the standard library? I believe it would be much better to develop these as regular third-party packages until we have a clear view of what APIs and use-cases warrant standard library implementations.
As we’re already doing both of those things (having comprehensive, repeatable evaluation based on pyperformance, and encouraging people to submit additional benchmarks) I see no problem with this, but I will point out that new benchmarks will affect existing performance numbers. This is a potential concern for us, if a bunch of benchmarks would get added that perform a lot worse (in performance or memory use) in free-threaded Python, but also for people who care about single-threaded performance, as adding a lot of multi-threaded benchmarks “devalues” the single-threaded numbers. If we make the pyperformance benchmarks an on-going requirement for work on CPython (rather than just for Faster CPython and Free-threaded Python), I think CPython (and the SC in particular) should be more involved with decisions around those benchmarks. Not just infrastructure for running them, but actually developing them into sound, stable benchmarks.
I’ll work with the SC members to revise this paragraph. Sorry again for the confusion; I agree that we need to use exact terms here. (And I don’t want to make a second mistake.)
I assume the numbers you’re talking about are the geomean of pyperformance benchmarks as reported by, say, the Faster CPython benchmarks
Yes, every time we talk about performance we mean the geometric mean, and we believe the benchmark suite will not be abused.
I really ask because the way pyperformance measures memory use is fairly inexact.
I think we can discuss this in more detail using an OH. But what the SC feels about the memory increase is that it could become a burden for large-scale infrastructure companies when we switch to free-threading as the default build. So what the SC expects is that we should continue trying to optimize memory usage as much as possible, if we can. Until 3.14, we focused more on performance than memory usage, so now is a good time to start digging into that.
if a bunch of benchmarks would get added that perform a lot worse (in performance or memory use) in free-threaded Python, but also for people who care about single-threaded performance, as adding a lot of multi-threaded benchmarks “devalues” the single-threaded numbers.
I believe we always add them if they reflect realistic workloads, and that’s part of the natural evolution process. We’ve always overcome the challenges that come with it.
I believe you’re already well under this CPU-wise, so it’d be surprising to me for CPU performance to be an issue at this point.
The intent is “overall”. As pointed out upthread, there is no grand definition of a canonical set of benchmarks, or of which ones are more important than others. So focus on agreeing amongst the core team that the ballpark slowdown, across a slew of non-dismissed benchmarks and compared to a build without free-threading, is under that. Presumably these would be run within pyperformance, though we shouldn’t blindly assume that a benchmark being in pyperformance implies it is useful for this purpose; pyperformance is just a place where we put benchmarks that were useful for some purpose at the time they were added.
Note: This is not an invitation for anyone to try and make non free-threading builds faster as a way to attempt to block free-threading from happening. We’ve already declared :its-happening: status. It is time for future performance improvement work to consider free-threading up front. Perhaps that might be better worded as: “consider 3.14 a baseline to measure 3.14t and 3.15t against”?
The SC’s role is not to be the right people to pick and choose individual benchmarks. Consider the SC as tie breakers who can be deciders and pick favorites within whatever people are not coming to agreement over.
FWIW agreed with what Thomas noted about our lack of memory performance benchmarking goodness. I suggest using this as an opportunity to help define some core team consensus on what meaningful memory utilization benchmarks are with a repeatable way to measure them. Agreed that max RSS is generally a poor measure.
No, this means that the allowed performance regression is 10%, but if that isn’t possible, the regression may go up to 15%, though it must be discussed with the SC first. 15% is the hard limit (but I imagine if we’re over that, there would be further discussion). The same applies for the memory limit (up to 15% is allowable by default, and if we have to go up to 20%, it must be discussed with the SC).
The intent really is to ensure there are no surprises, to give the FT team some workable leeway to do the engineering work needed without micromanagement from the SC, but also to ensure that if some choices require larger trade-offs in memory or performance, that those choices are made deliberately and with full transparency to everyone (core dev, Python user, etc.). The great progress that’s been made so far gives us confidence that we’re striking the right balance, but as they say, there are unknown unknowns. We accept that’s the case, but really want to prevent folks from making choices in isolation in CPython main that have visible effects in either dimension without appropriate discussion ahead of time.
Agreed! Maybe ft-utils is already the place, and it’s totally appropriate to experiment there or elsewhere. We encourage moving useful APIs, libraries, and utils into the stdlib when y’all think they’re ready and useful to the general FT user base. There’s already a lot of appetite for such APIs, and I think we should help folks work together on those rather than keep reinventing the wheel for what those look like.
Does this mean you want experimental packages for this in the standard library?
One possible idea is to adopt the free-threading-optimized dictionary (aka atomic dict) that was introduced at last year’s Language Summit, rather than replacing the current dictionary directly.
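As a rough sketch of the gap such a type would fill (AtomicDict below is hypothetical, not an existing API): on the free-threaded build each individual dict operation is thread-safe, but compound read-modify-write sequences still race without an external lock, and that compound operation is exactly what a free-threading-optimized dict could provide directly:

import threading

counts = {}
lock = threading.Lock()

def count(key):
    # Without the lock, the load and the store can interleave across threads
    # and increments get lost, even though each dict operation is atomic.
    with lock:
        counts[key] = counts.get(key, 0) + 1

# A hypothetical AtomicDict would expose the compound operation itself, e.g.:
#     counts = AtomicDict()
#     counts.increment(key)    # no user-visible lock needed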
The Steering Council has updated one paragraph of its earlier announcement to clarify terminology.
Original wording
1. C API/ABI compatibility and Stable C API for free-threading
Any proposed changes that break existing API or ABI guarantees must be agreed to in advance with the C API Working Group. The Steering Council also expects that Stable C APIs for free-threading should be prepared and defined for Python 3.15. These APIs must provide a reliable foundation so that third-party library maintainers, particularly those who rely on the C API, can plan their support with confidence.
Updated wording
1. C API/ABI compatibility and Stable ABI for free-threading
Any proposed changes that break existing API or ABI guarantees must be agreed to in advance with the C API Working Group. Limited C APIs for free-threading must provide a reliable foundation so that third-party library maintainers, particularly those who rely on the C API, can plan their support with confidence. The Steering Council also expects that Stable ABI for free-threading should be prepared and defined for Python 3.15.