PEP 779: Criteria for supported status for free-threaded Python

I can’t speak for the other authors, but I certainly don’t expect anyone to do anything (though perhaps that says more about my expectations of people in general). Whether pure-Python packages are automatically thread-safe to the extent their maintainers want them to be is very difficult to say. “Probably”, if the package doesn’t have global state and users don’t expect other state (like state held in instances) to be meaningfully usable from multiple threads at once. Ultimately, free-threaded Python does not change that.

But certainly there are no expectations on anyone but CPython developers in PEP 779, because that’s not what the PEP is about. It’s about whether the free-threaded design is stable enough, and tested enough, to stop pretending it can break without notice.

I’m unclear what this addresses. Are we talking about stability in CPython, or what we tell users they are allowed to demand of library authors? Because we’re already at beta-level stability. The point of PEP 779 is that we’re ready for stable (but still opt-in) free-threading support in CPython. I do not think pretending it’s not stable is a good way to move forward.

So far, the objections to calling it supported that I’ve seen are about how users who don’t actually understand the relationship between CPython, Python, and Python packages would perceive the status in the larger community. Does it help if instead of ‘supported’ we just drop “experimental”, possibly expanding that to “the regular deprecation policies now apply”? I still want CPython itself to consider it supported – including Core Dev buy-in – but if “supported” is the word that somehow causes users to appear at package maintainers’ doorsteps with pitchforks demanding that support, we don’t have to use that specific term.

Sure! What changed my mind is seeing the benefits in real programs, actually working to make code thread-safe, and seeing the enthusiasm for this feature from everyone who has worked with it.

I wish there were more concrete work to share to show the massive potential of free-threading. To be clear, I was never skeptical about the benefit of free-threading, but seeing actual programs that were 10 times more efficient still blew my mind. That’s not a micro-benchmark by any means; it was something that switched from using multiprocessing to using threads for a big workload. On the kinds of programs I’m talking about here, 1% is considered a big deal. Other examples include the numpy PR @ngoldbaum linked to earlier (https://github.com/numpy/numpy/pull/27896 – keep in mind the graph is log scale), which shows a ~20x speedup. I didn’t expect to see that big a difference, and I didn’t expect it to be this easy to achieve in the kinds of programs that would benefit from it.

The other thing that I didn’t expect was how easy it would be to make things work. I did not truly realise how simple Sam’s design is for the end user. (Not Sam’s fault, PEP 703 does try to convey this, I just didn’t listen :P) The way it builds on the fact that existing Python C API code already has to deal with the GIL being released at certain (many!) points, and uses that to provide an eventually consistent view of the world, non-deadlocking critical sections, and lockless access to things like lists and dicts, is incredibly clever and elegant. It means almost all of the assumptions made by existing C code just work.

Another thing I hadn’t really appreciated about making things work right: it’s very easy to test thread-safety in a free-threaded build. In a GIL build it’s often very difficult to trigger thread races, just because of how the timing usually ends up. You can write code that you think is safe, passes all the tests even when you throw a few hundred threads at it, and then you see it crash in 0.01% of the calls when you push it to production. It’s much easier to trigger races in a free-threaded build, which makes it much easier to find them too. Honestly, the free-threading-specific things I’ve worked on were way easier than understanding the dict implementation, or the current bytecode interpreter(s).
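
To make that concrete, here is a minimal sketch (my own illustration, not from any particular package) of the kind of lost-update race that slips through GIL-build testing: under the GIL the window between the load and the store is tiny, so the final count usually comes out right, while on a free-threaded build the mismatch tends to show up within a run or two.

    import threading

    counter = 0
    N_THREADS, N_ITERS = 16, 100_000
    start = threading.Barrier(N_THREADS)

    def worker():
        global counter
        start.wait()                 # line all threads up to maximise contention
        for _ in range(N_ITERS):
            counter += 1             # not atomic: load, add, store

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter, "expected:", N_THREADS * N_ITERS)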

Oh god no. We do very little maintenance. Everything right now is development (mostly on performance). The maintenance cost of free-threading should be very low, all things considered. It’s highest for things like lists and dicts, which have fairly intricate mechanisms for lock-free access that rely on co-operation from the memory allocator (which is why we use mimalloc). Changing the internals of those types, yeah, that became more complicated. For the most part, the complexity in CPython is all hidden away in a few basic building blocks, and the basic rules about thread-safety are pretty simple, even if we haven’t written them all down yet. (That’s definitely on the short-term TODO list.) Ongoing maintenance on the free-threaded implementation is not going to cost any kind of full-time commitment.

Yes, that’s fair. PEP 779 is not meant to determine the overall outcome of the experiment. It’s only about the implementation being stable, and about it not going away without the usual deprecation cycle.

I see “supported” as mid-way between “experimental” and “default”, because we don’t have any meaningful scale. But I’m not hung up on the term, it’s just the term the SC used in its acceptance. I am hung up on the term “experimental”, because it leads to incorrect expectations. The free-threaded build isn’t breaking in incompatible ways, and it’s not just going away. It’s not a toy, an experimental idea, an unproven attempt to solve the underlying problem. Even in 3.13 it’s none of those things, although in 3.13 the performance cost was much, much higher (40%, if I recall correctly). It certainly isn’t any of those things in 3.14.

The reason I’m adamant that we do not call it experimental is that we need the community to move with us, in order to shape things like documentation, APIs, utility functions, and ways of using threads. Not just library authors, but end users too. We can’t implement everything in Python, declare it done, then have libraries start using it, add their free-threaded Python support, declare that done, and only then have users use it. The feedback cycle between users, package maintainers, and CPython is critically important to get everything into good shape. We cannot get to “stable enough to be the default” without users actually using it, and we cannot expect users to use it if we say it’s not supported.

And yeah, I think this means users will come to package maintainers and ask “when free-threaded Python support?”. I think they have to, because otherwise how will package maintainers know their users care? How will they know what users want from the free-threaded support? How will they know what they’re missing from CPython to make their ideas work?

17 Likes

FWIW, this is the same observation and motivation as for the Windows ARM64 builds (which have been labelled “experimental” since 3.10 or so, and have largely not gained any ecosystem support due to the lack of demand). But nothing incompatible has happened there since the very earliest release (and that was a default install directory change, nothing in the runtime itself, IIRC).

I’d be happy to have a middle ground state between “experimental” and “default”, and we probably just need to bikeshed a name that we aren’t already using to imply other things (which I think is the problem with “supported”). Something that implies that package maintainers should put it in their matrix, but regular users/consumers should treat it as a test scenario and be prepared to not push it to production. (Incidentally, this is basically the same aim as our beta and RC releases, which also suffer from the issue of not getting into package maintainers’ matrices early enough.)

9 Likes

In this thread, “beta” has been proposed. I like the sound of that, even though we also have (orthogonal) alpha and beta releases. Maybe “beta-quality”?

7 Likes

The FAQ makes some promises about certain operations on built-in types being atomic “in practice”, and this has been essentially unchanged for the last 16 years, so there are probably significant amounts of Python code relying on the behaviors mentioned there.
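
For concreteness, this is roughly the kind of code the FAQ blesses (a minimal sketch, not taken from any real project): many threads appending to one shared list with no explicit lock, relying on list.append being atomic “in practice”. As far as I understand, the free-threaded build preserves this behaviour through its internal per-object locking on the builtin containers.

    import threading

    results = []

    def producer(i):
        results.append(i * i)    # the FAQ describes this as atomic in practice

    threads = [threading.Thread(target=producer, args=(i,)) for i in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert len(results) == 100   # no appends are lost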

2 Likes

I’m afraid “beta” might still signal to many (end) users that things can still change considerably; at least, that has been my experience when preparing packages for Python beta releases. I tend to wait until the release candidate to make a release a required element of my test matrices.

I have little insight into the stability of the APIs and therefore can’t comment on whether “release candidate” would be appropriate. But if you want some end user perspective on the bikeshed, I would argue that “release candidate” might be a good indicator to users that the build is ready to test with, if that is indeed the goal of this PEP.

Edit after @guido pointed out I was muddling things: with (end) users in the original comment I meant the package developers that seem to be the target of these builds. From the discussions I have read so far those currently seem to be the intended “users” of the free threaded builds.

1 Like

This is perhaps the source of our different POVs. Making the implementation subject to the usual deprecation rules does feel like “calling it” on the overall experiment to me, as there won’t be any realistic way to undo nogil once deprecation & removal takes 5+ years.

1 Like

And that’s exactly what we want end users to think. The beta is for testing by package developers.

1 Like

Sorry, I should have been clearer. I meant package developers with “users” as currently they seem to be the intended audience of these builds.

I’ll edit my post to reflect this. The rest of my comment still stands: as package maintainer I tend to wait until release candidates to fully add the build to the test matrix.

1 Like

What could “change considerably” exactly? I can understand performance improving, some bugs being fixed, perhaps new APIs to better take advantage of free-threading, etc., but the fundamental behavioral change of no-GIL vs. with-GIL should pretty much remain the same.

1 Like

We shouldn’t even be having considerable changes during regular betas. The biggest permitted changes are to revert things to be closer to the previous release, and while that can be annoying for early adopters of new features, it really shouldn’t impact anyone just trying to be compatible between versions.

It’s unfortunate that people interpret it as “too unstable to test my code with”, because the sole reason we do betas is for people to test their code with it. But this has been an ongoing issue that doesn’t need to be re-argued right now, except that it probably rules out trying to use “beta” again?

What about something along the lines of a “validation” release, or “migration” or even “pre-migration” build? So we would have “experimental” releases (for early adopters), “migration” releases (for downstream developers/distros) and “stable” releases (for end users).

3 Likes

Given that this is partially (maybe largely) a social issue, I wonder if a more enticing name would be helpful for attracting adoption. Something like “early-adopter phase” or “developer phase” might convey that this stage is for people to work with the release and adapt their packages.

I know that marketing feels a little icky here but it does matter.

18 Likes

That’s fair, and describing the feature as stable in 3.14+ makes sense. It really is just the plausible misreading of “supported” as “production ready” that is concerning me.

I’m not sure we’re going to find a word that captures our intent on that front better than “beta” does, but a phrase seems possible. If we can also adopt similar phrasing to describe the regular alpha/beta/final release cycle, that wouldn’t be a bad thing.

We tend to use “production ready” to describe final releases, and just say “not yet production ready” before that point.

Perhaps it would work to explicitly describe free threading (and beta releases in general) as “library testing ready”?

Then tweak the wording in release announcements along the lines of:

  • alpha: not yet production ready, library regression testing ready
  • beta: not yet production ready, library testing ready (including new feature adoption)
  • rc: expected to be production ready
  • final: production ready
7 Likes

“Feature complete”?
The feature itself is ready and will be covered by the backwards-compatibility policies we use for all other code [beta, rc; final]. But, users should expect gaps in docs and packaging support, and some bugs in the stdlib.

1 Like

Well, somewhat amusingly, we described Python 3.0 as a “production release”.

The original announcement by @barry on python-dev was a bit more detailed but similarly confident:

We are confident that Python 3.0 is of the same high quality as our previous releases, such as the recently announced Python 2.6. We will continue to support and develop both Python 3 and Python 2 for the foreseeable future, and you can safely choose either version (or both) to use in your projects.

4 Likes

Thoughts from an old-but-new Python developer: I’ve been coding professionally since 1990 and started with Python around six years ago, at 3.5.

Free threading should be either experimental or supported. As long as it is not the default, the user is primarily responsible for understanding the consequences of starting an alternate interpreter. Bikeshedding euphemisms for “we think it’s ready, but you can shoot yourself in the foot if you don’t read the documentation” isn’t helpful.

With multi-core consumer-grade computers having been common for the last decade or so, and threading being a common topic in undergraduate computer science courses, I don’t consider threading a niche programming technique but a standard part of a software engineer’s toolkit.

I’ve never considered the global interpreter lock (GIL) to be part of the Python language but rather an artifact of the reference implementation that I’ve hoped would go away sooner rather than later. (This is not intended to be a complaint; Python is awesome and free, and I appreciate the community’s efforts in making it what it is.)

(I’m using the term lock to mean any valid concurrency control).

“Supported” should mean that existing single-threaded Python 3.13 code produces the same results as it does under the GIL build.

In the absence of explicit documentation to the contrary:

  • I would expect to be able to use classes without locking, e.g. a = Widget() and b = Widget() should work simultaneously in different threads.
  • I would not expect to be able to mutate objects in multiple threads without locking. This includes the standard types dict, list, set et al. (I’ve always thought collections.deque existed to be the sequence to share across threads.) A minimal sketch of these expectations follows the list.
  • I would not expect to be able to share iterators across threads.
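
A minimal sketch of those expectations (Widget is just a stand-in class): instances created and used within a single thread need no locking, while mutation of a shared standard container is guarded by the caller.

    import threading

    class Widget:                        # stand-in for any ordinary class
        def __init__(self):
            self.count = 0

    shared = {}                          # shared mutable container...
    shared_lock = threading.Lock()       # ...so the caller supplies the locking

    def worker(i):
        w = Widget()                     # per-thread instances: no locking needed
        w.count += 1                     # only this thread touches this object
        with shared_lock:
            shared[i] = w                # shared mutation goes through the lock

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()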

Big picture: Single thread code should generally not have to pay the runtime cost of concurrency checking unless a class is explicitly designed for threaded applications.

Long term, consistent with the “batteries included” philosophy, Python should have an atomic module to support thread-safe operations on basic types.

1 Like

Is there a guide for what it means for a package to support free threading?

Like I get that it means it works, but does it mean all the objects are thread safe or just that they don’t crash when used in multiple threads?

I guess I’m not really even sure what it means as a package developer myself. I could run my unit tests in a free-threaded build, but that doesn’t really test cases with multiple threads, since that wasn’t part of the test plan before. Should it be now?

Is there an easy way to see if a requirements list is also free-threading safe? Say I use requests from PyPI. Do I need to manually check requests and its dependencies, or do we have tooling to do that?

2 Likes

I hope that https://py-free-threading.github.io answers the questions you have. If you have specific things you’re looking for that you’d like to see, please feel free to open issues.

Like I get that it means it works, but does it mean all the objects are thread safe or just that they don’t crash when used in multiple threads?

I think we haven’t really settled as a community on exactly what “supporting free-threading” means. This is complicated by the fact that people assume the GIL means their code is thread-safe, when often the opposite is true. Multi-threaded test coverage is also not very good in many packages, including fundamental packages.

Here’s my take:

  • Thread safety bugs due to use of mutable global state should definitely be fixed. Those are likely bugs in the GIL-enabled build too but may be hard to trigger in the default configuration.
  • Pure functions and immutable objects should definitely be thread-safe, and any lack of thread safety should be fixed.
  • You should think about and clearly document what thread safety guarantees you want to provide for any classes defining objects with mutable state in your public API. There are a number of options here and I don’t think there’s a one-size-fits-all solution.
    • If you want to guarantee thread-safe operation in all multithreaded contexts, it’s likely you will need to add locking to guarantee that, which may have a performance penalty.
    • It’s much cheaper (at least in a language with access to atomic intrinsics or with atomics in the standard library; I’m not sure how this would work in pure Python) to define a flag that is set on an object when a thread takes ownership. If another thread tries to read or write the object while the flag is set, that can be a runtime error. This might make sense for a library wrapping a low-level compression object where a compression context can only be safely accessed by a single thread at a time, for example. (A minimal sketch of this idea follows the list.)
    • You could also leave it up to the user of your library to use your library safely. Of course exactly what is safe and isn’t will take some careful thought. Just as an example, right now (and also for a very long time in the GIL-enabled build) it’s pretty easy to crash the interpreter in a multithreaded program using NumPy. We should make it so the interpreter doesn’t crash in situations like that, but getting to the point where that is true will be complicated and there will always be escape hatches via the NumPy C API that we can’t really do anything about. IMO there’s a lot of room for libraries to provide immutable data structures that are thread-safe by construction to avoid issues like this.
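
As a rough illustration of the ownership-flag option, a pure-Python sketch with made-up names (a C extension would typically use an atomic flag rather than a small lock around the flag):

    import threading

    class SingleOwner:
        """Guards a resource that only one thread may use at a time.

        Concurrent use is reported as a usage error instead of being
        serialised with a lock, e.g. for a low-level compression context.
        """

        def __init__(self, resource):
            self._resource = resource
            self._owner = None
            self._flag_lock = threading.Lock()   # protects only the flag itself

        def __enter__(self):
            with self._flag_lock:
                if self._owner is not None:
                    raise RuntimeError(
                        f"resource already in use by thread {self._owner}"
                    )
                self._owner = threading.get_ident()
            return self._resource

        def __exit__(self, *exc):
            with self._flag_lock:
                self._owner = None

Used as a context manager (with SingleOwner(ctx) as c: ...), a second thread entering while the first is still inside gets a RuntimeError instead of silently corrupted state.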

IMO, yes, you should add multithreaded tests, even if the free-threaded build wasn’t a thing. Since the threading module exists, your users are free to use your library in a multithreaded context whether or not you would like to support that.

See Validating thread safety with testing - py-free-threading for more guidance on how to write multithreaded tests.
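
As a rough shape for such a test (mylib.Counter is a hypothetical stand-in for whatever your library exposes; the barrier just makes all workers hit the object at the same time):

    import threading
    from concurrent.futures import ThreadPoolExecutor

    from mylib import Counter            # hypothetical object under test

    def test_concurrent_increments():
        n_threads, n_iters = 32, 10_000
        counter = Counter()
        start = threading.Barrier(n_threads)

        def hammer():
            start.wait()                  # maximise contention on the object
            for _ in range(n_iters):
                counter.increment()

        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            futures = [pool.submit(hammer) for _ in range(n_threads)]
            for f in futures:
                f.result()                # surface any exception from a worker

        assert counter.value == n_threads * n_iters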

Is there an easy way to see if a requirements list is also free threading safe? Say I use requests from pypi. Do I need to manually check requests and it’s dependencies or do we have tooling to do that?

If it’s a pure-Python library, the assumption is that it’s safe in the sense that CPython is safe (e.g. no UB and no data races, but race conditions are possible). Also, single-threaded use will be identical to the GIL-enabled build, so unless you have native extensions somewhere, you should probably expect existing single-threaded tests to pass.

If a dependency publishes a cp313t wheel on PyPI, that is one way of signalling that they support free-threading, but you should probably look through their documentation to see to what extent they support free-threading, particularly for direct dependencies you want to use in a multi-threaded context.

If you are able to install your library, but at runtime CPython re-enabled the GIL, then you either have a native extension in your library or in a dependency that has not explicitly marked free-threaded support. It’s possible to publish cp313t wheels that re-enable the GIL like this and IMO it’s a valid choice for a library author to do this (although it does make things more difficult for people who want to experiment). You can force the GIL to be disabled with an environment variable or command-line flag to the interpreter.
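
For reference, detecting the re-enabled-GIL case at runtime and forcing the GIL off look roughly like this (sys._is_gil_enabled() is the runtime check added in 3.13; on older Pythons the attribute is absent, hence the getattr fallback):

    import sys

    # On a free-threaded (3.13t+) build, this reports whether the GIL was
    # re-enabled at runtime, e.g. by an extension module that does not
    # declare free-threaded support.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    if gil_enabled:
        print("GIL is active; some dependency may not support free-threading")

    # To force the GIL off anyway (at your own risk), start the interpreter with
    #   PYTHON_GIL=0 python app.py
    # or
    #   python -X gil=0 app.py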

IMO having more tooling and support in packaging tools to help with this would be great, but right now most of the tracking for this is happening in our tracking table and in @hugovk’s automatically updated tracking table for projects that publish platform-specific wheels.

4 Likes

I think that in fact, many packages simply don’t care about their thread safety status. One reason for that might be that not many users of the package actually use it in a multi-threaded context, but it may simply be that the author genuinely isn’t interested.

Free-threading is sparking people’s interest in multi-threading, but that’s often library consumers. So I see library authors getting increased pressure from outside to care about thread safety, but they may well still have no personal interest in the matter.

I disagree. They should be fixed if the library author cares about thread safety, and if the runtime penalty for single-threaded code is sufficiently low that the library author is willing to accept it in return for thread safety (although if they deliberately choose not to make that trade-off, they should document that fact, of course).

Maybe you view the above as obvious, but I’ve seen enough well-meaning users raising issues and insisting that “you must fix X”, pointing to statements like your post as justification. The reality is that (as has been said elsewhere in this thread) most pure Python libraries are no more or less thread-safe with free threading than they were with the GIL. So if they have survived this long, free threading shouldn’t make any significant difference unless the library author wants it to.

1 Like

I fully agree with this. I was responding in the context of a question from a library author interested in adding free-threaded support. I think not supporting the free-threaded build is an entirely valid choice for a library to make, just like not supporting Python 3.0 was.

3 Likes

Unlike “not supporting Python 3.0”, the free threading build is not a backward incompatible change, though. So it’s reasonable to “assume I don’t have to do anything” without feeling that’s a choice to deliberately not support free threading.

(Sorry, this is getting off-topic, so I should stop. But I couldn’t let the implication that free threading was like the Python 2-3 transition pass unchallenged…)

2 Likes