PEP 779: Criteria for supported status for free-threaded Python

I think the discussions are heading into a few different directions, most of which aren’t really on point for PEP 779. I don’t mind relitigating PEP 703 (it’s not hard to show the positive impact it will have, and that it’s well worth the cost), and I don’t mind making a re-evaluation of PEP 703 part of the requirements listed in PEP 779 if the SC wants to re-visit their decision, but I don’t think it’s helpful to use this thread for that discussion. I would like to focus the discussion in this thread on the promotion from phase I to phase II.

Normally when we introduce potentially breaking changes, we don’t have phase I. We start at phase II: the feature is available, but off by default in case it breaks someone. Phase I for PEP 703 is explicitly to prove to the SC and the Core Devs that the implementation is sound. It’s also to stop people from relying on it too much, because the implementation was yet to be proven sound. I think the ‘experimental’ tag is now only going to delay and hamper further work.

Concretely, for PEP 779, I would like to hear from more people what they need from us (the people working on the PEP 703 implementation), to be convinced the implementation is sound enough (and maintainable, desirable, stable enough).

That said, I do think all the discussion has been informative so I’m going to respond to a few comments ;p

How would keeping the CPython implementation in a state of flux, with no guarantees for stability, improve that situation? As I mentioned, we have seen pushback from packages on PRs for free-threading compatibility, and I don’t blame them at all. Signing up to support this when the feature is explicitly experimental and subject to more change than PEP 387 normally allows is a big ask.

If your concern is that users will be too demanding, I’m not sure how PEP 703 is different from past features, be it asyncio or binary wheels or what not. I think it’s a good idea to remind everyone in all communication that it’s still going to take a long time for most packages to support free-threading in a meaningful way. I don’t think claiming the feature is experimental when it is not is the way to do that.

Concretely, what would you use as the criteria for determining whether this CPython feature is ready to start its life like any other CPython feature?

This sounds like you’re reading this PEP as proposing that the free-threaded Python build become the default. I would very much interpret the “this is premature” response to be because the feature is still experimental, which is the thing this PEP proposes to change. We are not talking about making it the default here.

Warning on improper concurrent access would be a really nice thing, but it’s expensive. It’s basically what ThreadSanitizer does for C/C++. I’m not sure we’ll ever be able to make that work as a -X flag, given how much overhead that will likely require, but I do hope we can eventually create something TSan-like for Python. It’s going to need use-cases – real-world threading problems – to motivate and guide it, though. It’s not really something you can make without knowing which actual problems you need to detect.

As for adding things to a list without a lock, I just want to mention that list.append() is safe. All the list operations are atomic. You will not lose items if multiple threads call list.append(). (Of course, something like “if item not in mylist: mylist.append(item)” is not safe, but that’s not something we can fix in list objects.)
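To make both halves of that concrete, here’s a minimal sketch (the counts are arbitrary): the plain appends never lose items, while the check-then-act version needs a lock to become a single atomic step.

import threading

items = []

def append_many():
    for i in range(100_000):
        items.append(i)  # append itself is atomic: no items are lost

threads = [threading.Thread(target=append_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(items) == 400_000  # holds, because each append is atomic

# The composite check-then-act is NOT atomic: two threads can both pass
# the membership test before either appends. A lock makes it one step:
unique = []
unique_lock = threading.Lock()

def add_unique(item):
    with unique_lock:
        if item not in unique:
            unique.append(item)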

I think pushing the Stable ABI to 3.15, and starting it in an early alpha, is the prudent thing to do. I don’t think we need to wait for the Stable ABI to land for PEP 703’s implementation to be considered supported, though. (I wasn’t thinking of the Stable ABI, as such, when I wrote “provide feedback on APIs and ABIs”.)

Do you think Stable ABI support should be required for the PEP 703 implementation to be considered supported?

Threading is absolutely a specialised tool! Nobody should ever doubt that. I’m sorry if anything I said suggested otherwise. (Please point me at it so I can fix that.) I tried to make it clear that free-threading is not a magic go-fast button and that code will probably have to be adapted, possibly redesigned, to make full use of it. Threading is definitely very complex, and not just because of thread-safety. Writing correct threaded code is hard (even without free-threading), and writing performant threaded code is doubly so.

I think for users, the language is pretty straightforward (even if the implications are not): the build is experimental, which means PEP 387 does not apply. Once it’s supported, PEP 387 applies.

PEP 779 isn’t targeted at users, though, it’s targeted at the Core Devs, because it’s specifically about whether they think the PEP 703 implementation is stable enough to support. So, yes, the PEP talks about internals, because that’s the thing that matters for phase II. That’s what phase I was about.

Documenting the exact semantics of concurrent access is something we should do, although it’s also something we have to figure out. We don’t have a concrete list (yet), because it’s very much a trade-off between performance and thread-safety (or thread-consistency), which we make with real-world use-cases in mind. For example, should using the same iterator from multiple threads guarantee sequential consistency (each item is produced once, and only once)? Even the people working on fixing these issues don’t always agree :stuck_out_tongue: Additionally, we don’t want to guarantee too much from the outset because it’s very hard to revert that kind of promise once you’ve made it.
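To make the iterator example concrete, here’s a sketch of the kind of code whose guarantees are in question; whether the final assertion must hold (as opposed to merely “the interpreter doesn’t crash”) is exactly the undecided trade-off.

import threading

shared = iter(range(100_000))
chunks = []

def consume():
    local = []
    for item in shared:  # two threads draining one shared iterator
        local.append(item)
    chunks.append(local)

threads = [threading.Thread(target=consume) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Sequential consistency would mean each item is produced exactly once:
assert sorted(chunks[0] + chunks[1]) == list(range(100_000))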

7 Likes

Because Windows is basically always cross-compiling, ABI options (including the platform) should be treated as explicitly set, possibly with defaults. I assume not all build backends have caught up yet (since even platform cross-compilation isn’t an option), but it’s certainly the direction they should go.

Passing Py_GIL_DISABLED as Antoine showed is the intended outcome, although the build backend should decide whether or not to specify it based on its own options (which may include “am I being run in a free-threaded Python” for a default, but there ought to be no reason why you need an experimental runtime to run a build :wink: ).
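As a sketch of what that could look like in a setuptools-style build (the module name here is made up, and a real backend should take this from its own cross-compilation-aware options rather than the running interpreter, per the above):

import sysconfig
from setuptools import Extension, setup

# Default the macro from the running interpreter; a real build backend
# should allow overriding this when cross-compiling.
define_macros = []
if sysconfig.get_config_var("Py_GIL_DISABLED"):
    define_macros.append(("Py_GIL_DISABLED", "1"))

setup(
    ext_modules=[
        Extension("mymod", sources=["mymod.c"], define_macros=define_macros),
    ],
)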

Dynamically generating a header file in the Windows build system is significantly more complicated than autoconf[1]. And for the sake of one option that’s entirely portable and is intended to evaporate in time, really not worth it.


  1. In terms of making it reliable. For example, IDEs are currently broken because the file in the repo isn’t actually called pyconfig.h, but calling it that will certainly lead to interference in other builds, and eventually the distribution of the wrong file. So we’re kind of stuck, at least until we get rid of generation completely and just have a static file. ↩︎

From what I’ve experienced[1] working with free threading as a library author, I already have everything I need; the current APIs are sufficient. Enabling this for any reasonably designed concurrent code[2] has been relatively easy, though the people working on the code where I’ve enabled free-threading support have concurrency experience from other languages, and concurrency is a primary concern there. In one library, removing the reliance on the GIL measurably improved performance on the standard build as well, because the changes improved the code.

From interactions with others:

  • The experimental label is definitely slowing library authors from even attempting to support it. Many projects won’t even publish a development-only classified wheel (i.e. one pip won’t pick up without --pre) with free-threading support because of the experimental label.
  • For a while, many projects were blocked from even attempting to support it because they were using Cython, PyO3, or some other build-time dependency that did not yet work with free threading.

  1. audioop-lts supports it; minimal work was required. All 3 native extensions I maintain at work support it, and we intend to attempt to use free threading in production, with the knowledge that we may not be able to for 3.14. ↩︎

  2. Any code that properly splits problems so they either don’t require explicit synchronization, or can easily add explicit synchronization in a limited number of places, replacing spots that were previously protected by the GIL ↩︎

4 Likes

Can you point out where that is documented? Even under the GIL, there’s a huge amount of misinformation and uncertainty about details like that. And even more when you consider similar (but different under the hood) cases like count += 1.

I’m pretty sure there’s a lot of code out there which is technically unsafe, but because of a combination of the GIL and people not really stressing threads in Python, it broadly works fine. Free threading runs the risk of removing both of those safety nets. And authors won’t know how to fix such code, because the docs have so little information on thread safety guarantees.
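The canonical example of such technically-unsafe-but-mostly-fine code is an unlocked counter (a quick sketch):

import threading

count = 0

def bump():
    global count
    for _ in range(100_000):
        count += 1  # read, add, store: three steps, not one atomic step

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # can come up well short of 400_000, GIL or no GIL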

Getting advice from @ngoldbaum and @colesbury is nice, but it’s not scalable :slightly_smiling_face:

I wasn’t suggesting otherwise. But people do use threading in Python, and often relatively naively. None of the pip maintainers are threading specialists (to my knowledge) but we’re reviewing and writing PRs that try to improve performance by using multithreading. And free threading will make doing that harder. At least I assume so - I’ve not heard anyone suggest that thread safety under free threading will be easier than relying on the GIL (and not even that it will be “no harder”).

What I don’t want to see is free threading being viewed by the general programming community as “thread safe (and memory safe) code on Python just got harder to write”. And IMO the experimental phase (where we can change things without backward compatibility concerns) is exactly the right time to explore how we address that. We’ve been doing it at the C API level, but when are we going to do the same at the Python/stdlib level?

The example from my code is particularly relevant here. If we need to change the random module to expose a more thread-friendly API, isn’t the experimental phase precisely the right time to do this? The same with concurrent.futures.
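For reference, the workaround available today is a per-thread generator instance, something like the sketch below; it works, but it’s exactly the kind of boilerplate a friendlier API could absorb.

import random
import threading

# The module-level functions (random.random() etc.) all share one global
# Random instance; a per-thread instance sidesteps contention on it.
_tls = threading.local()

def thread_rng() -> random.Random:
    rng = getattr(_tls, "rng", None)
    if rng is None:
        rng = _tls.rng = random.Random()
    return rng

value = thread_rng().random()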

Personally, as a core dev who works on the Python code in the stdlib, rather than the C implementation, I don’t feel that I’m comfortable supporting free threading. Maybe I shouldn’t even be supporting GIL-based threading - that’s a fair point to make - but to whatever level my view matters, I’d rather we did a bit more on the Python level before moving free threading to “supported” status.

2 Likes

It’s here, although it then immediately tells you not to take advantage of implicit locking…

Yeah, I’d struggle to interpret that as reassuring in any way…

2 Likes

Depends on what you mean by “supported”, really.
As I think about it more, I’d say we’re at a “Phase I½” – the API is stable on CPython’s side, but there is not yet sufficient community support, and we’re still stabilizing the ABI.
And that’s something that can go in release notes (preferably with SC’s blessing). Even the original pronouncement sees the “rollout in roughly three stages” [emphasis mine].
I don’t think “Phase II” has been reached in 3.14, but “Phase II” is just an internal milestone. PEP 779 is about placing that pre-determined line so we can say whether we reached it, but is that important? Can we instead tell people where we are?

I see several relevant axes for “supported”:

  • Stability of CPython
    • :slight_smile: the feature itself is on track to become the default
    • :slight_smile: the API is ready to be covered by PEP 387
    • :slight_smile: the version-specific ABI is stable too (I imagine people would be quite angry if even the 3.13t ABI changed in a bugfix release!)
    • :frowning: the stable ABI is not there
    • :frowning: the stdlib has some outstanding bugs
  • Packaging ecosystem support – not really core[1], but seems like a big part of the required “community support”
  • Expectations for third-party library maintainers: … we need to be very careful here

  1. IMO due to past political divisions we should get over. ↩︎

8 Likes

Following that link I found this:

However, Python has not historically guaranteed specific behavior for concurrent modifications to these built-in types [list, dict etc.]

I believe that is incorrect, at least in pure Python: Those operations have always been protected by the GIL so multi-threaded use was always fine (at the operation level, like a.append(x) – but never for composite operations like a[i] += 1).

That was very much by design when threading using the GIL was first implemented. So I believe that specific claim is wrong.

There are different reasons for caution with those data types though – sharing a mutable object is slow because of the locking or lock-free algorithm. But the reason should not be that they aren’t thread-safe.

1 Like

I’m happy to update that paragraph if you’d like to phrase it differently. There were a few reasons I wrote it that way:

  • The language reference doesn’t specify any multithreading behavior, and I didn’t want the HOWTO to make stronger guarantees of future behavior than we currently provide.
  • Our documentation generally does not encourage behavior that is specific to CPython. Implementations like Jython do not have a GIL. I’m not sure what Jython’s behavior is for things like concurrent list.append().
  • Attempting to rely on the GIL for thread-safety for moderately complicated code often leads to bugs, because it’s hard to reason about what the GIL protects. (It’s similarly difficult to reason about what the per-object locks protect in the free threading build). That’s why the HOWTO recommends using threading.Lock. For example, WeakValueDictionary was prone to multithreading bugs until recently even with the GIL.

In general, I still think encouraging people to use threading.Lock or other synchronization primitives in multithreaded code is the right advice.
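As a minimal sketch of that recommended pattern:

import threading

class Cache:
    """Every access to _data happens under one lock, so there is no need
    to reason about which individual dict operations are atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_or_compute(self, key, compute):
        with self._lock:
            if key not in self._data:
                self._data[key] = compute(key)
            return self._data[key]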

2 Likes

I think this is the problematic part of the PEP, and of the three-step logic suggested for how to get to a default free-threaded build.

When a user reads “supported version” (regardless of whether enabled per default or enabled in special builds), they expect everything to be ready to be used. Not “we have a working implementation now, please test it and report any findings you may have to improve it”.

It would be much better to add a sound and ready-to-be-tested phase between steps I and II of what you wrote in the PEP 703 SC announcement, with a clear statement saying that while the implementation is ready for playing around with and using to migrate code to this new approach, it is not yet complete enough to be called “supported”, since a lot of work still needs to be done.

Some things which are really missing at the moment for a truly supported variant of CPython (just picking a few random ones which come to mind, in no particular order):

  • Many APIs in the stdlib don’t include any information in the docs about whether they are thread-safe, or what needs to be taken into consideration when using them in a free-threaded environment

  • It would be really helpful to know (and have documented) which basic operations are atomic in Python when run in the free-threading build, e.g. is list.pop() still thread-safe, and what about x += 1?

  • While figuring out which parts are not thread-safe, we’ll surely identify cases which will need to be made thread-safe (e.g. sys.modules access or the warnings context manager, to name two)

  • A basic primer as intro to threaded programming should go into the docs (probably as a tutorial). This should highlight the mindset needed to write such apps and the common pitfalls to be aware of. Threading is hard and in many ways non-intuitive. Just like in networking there are lots of ways things can break and all of them will eventually break, given enough time to run.

  • TextIOWrapper should probably be made thread-safe to avoid garbled output with open() or print().

  • Iteration over sequences and dictionaries would benefit a lot from being made thread-safe out-of-the-box or at least come with tooling to make this very straight forward. Yes, you can wrap access in locks or do copies and iterate over those, but this requires a lot of boilerplate code.

  • The interaction of threaded applications with asyncio code needs more care. E.g. asyncio.Queues and asyncio.Locks (et al.) are not thread-safe. It would be a shame to have to write code which needs two locks to be safe: one for asyncio and another for threads.

  • Thread locks will need to be made more user-friendly, e.g. the context managers should feature timeout parameters (which currently only exist on the .acquire() calls). Since we’ll be using these locks a lot, it would probably also make sense to make them fast to use, e.g. by having special syntax for this or having the JIT special-case “with thread_lock:” to generate fast PyMutex code.
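To illustrate the last point, the boilerplate needed today for a timeout on the context-manager form looks something like this sketch:

import threading
from contextlib import contextmanager

@contextmanager
def acquire(lock, timeout):
    # "with lock:" takes no timeout today; only .acquire() does.
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"lock not acquired within {timeout}s")
    try:
        yield
    finally:
        lock.release()

some_lock = threading.Lock()
with acquire(some_lock, timeout=5.0):
    ...  # critical section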

15 Likes
  • The language reference doesn’t specify any multithreading behavior, and I didn’t want the HOWTO to make stronger guarantees of future behavior than we currently provide.
  • Our documentation generally does not encourage behavior that is specific to CPython. Implementations like Jython do not have a GIL. I’m not sure what Jython’s behavior is for things like concurrent list.append().

But free-threading is an evolution of CPython, which has always had those guarantees (implied or explicit). So arguing that the language reference or the existence of Jython doesn’t guarantee safe behavior doesn’t feel like playing fair to me (Jython is missing other things that you would never be allowed to remove from CPython, and ditto for the language reference).

Best practices may recommend not relying on e.g. list.append() being atomic, but given that you did spend the effort of making it atomic (but not a[i] += 1) I think users deserve a complete write-up of which built-in operations are intended to be atomic. It will help debugging (“no, it can’t be this list.append(), because that’s atomic”). Maybe it will even help tooling intended to find possible races.

13 Likes

It seems like the difference between implied and explicit is the crux of a lot of uncertainty about free-threading. It’s impossible for most users/third-party developers to identify what’s an implicit guarantee versus an implementation detail.

4 Likes

Maximum RSS is a reasonable rough estimate for what really matters (though not perfect). pyperformance wasn’t explicitly designed to test memory usage, but it’s not bad as a rough relative yardstick.

For what it’s worth, we are measuring the memory usage of free-threaded builds on a weekly basis. For example, here’s the one from yesterday. Roughly a 20% geometric-mean increase. There’s probably something obviously broken on the tomllib benchmark (but let’s not rabbit-hole about that here – creating an issue would be better for that…):

benchmarking-public/results/bm-20250315-3.14.0a6±e82c2ca-NOGIL/bm-20250315-linux-x86_64-python-e82c2ca2a59235bc1965-3.14.0a6±e82c2ca-vs-base-mem.svg at main · faster-cpython/benchmarking-public

2 Likes

Indeed, the situation feels more like an alpha testing to beta testing transition to me, so perhaps we could literally use that framing?

  • 3.13: experimental free-threading with alpha level stability
  • 3.14: experimental free-threading with beta level stability
  • 3.15: target for stable (but still opt-in) free threading support

8 Likes

I honestly don’t think free threading will make anything harder in that regard. If it does, then it might be a bug worth fixing, or a misconception worth correcting.

On the contrary, the fact that you don’t have to think about the GIL anymore (in terms of: “does this operation release the GIL so that I can parallelize it over multiple Python threads?”), should make things easier.

There is no “right time to do this”; it’s a regular feature request, IMHO. While it’s unfortunate that random is so parallelization-averse currently, it’s not a deal-breaker for proper usage of free-threading in Python.

It sounds like “being comfortable” is a large part of the issue. I understand your reluctance, and it’s a sign that we core developers are scrupulous and intellectually humble (which is generally a good thing!).

We’ve built so much mythology around the GIL that one might get the impression that it’s protecting against all kinds of mishaps. In reality, when writing pure Python code, it doesn’t [1]. The protection provided by the GIL mostly applies to CPython internals (and even then, we have often been over-confident, because the GIL can be released in many more operations than one might imagine).

In your case, I would perhaps recommend you make the free-threading build your default CPython development build and just run with it. Hopefully you’ll end up concluding that it doesn’t make much of a difference in your daily practice.


  1. except when using ctypes perhaps :slight_smile: ↩︎

4 Likes

Any Python implementation worth its salt certainly makes list.append atomic. Too much software would break otherwise. So I would be +1 to make it a language guarantee.

2 Likes

I realise that at the C level special care is needed to prevent memory corruption during append if there is also concurrent read/write access in other threads. At the more abstract Python level though is there a difference between saying that “append is atomic” and the current docstring: “Append object to the end of the list”?

At the Python level it seems that the object either is or is not in the list so I’m not sure how append could not be atomic. In a multithreaded context there is no guarantee afterwards that the appended item actually is in the list if other threads can remove items or that it is at the end of the list if other threads also append. There is no guarantee that a thread concurrently iterating over the list will or will not see the appended item. I assume that if one thread calls list.append and another calls list.extend then there is no guarantee that the appended item would be before or after all of the extended items.

What exactly is guaranteed by saying that list.append is atomic?

A method I wonder about for atomicity is dict.setdefault:

class A:
    def __init__(self, val: int):
        self.val = val

interned: dict[int, A] = {}

def get_A(val: int) -> A:
    """Return a globally unique A for each val."""
    try:
        return interned[val]
    except KeyError:
        # Two racing threads may each construct their own A(val) here,
        # but if setdefault is atomic, both get back whichever instance
        # was stored first.
        return interned.setdefault(val, A(val))

# Every thread gets the same instance even if
# calling get_A(42) at the exact same time.
obj1 = get_A(42)
obj2 = get_A(42)
print(obj1 is obj2)  # True

Can it be depended on that this is atomic so that if two threads try to set the same key at the same time then both threads are guaranteed to receive the same value back from setdefault?

I have library code that uses setdefault like this (or the weakref equivalent). I am not sure that it is appropriate in that code to use threading.Lock, partly because I don’t even know whether the code is being used in a multithreaded context. It seems like if setdefault is atomic, then it is a useful primitive for making something simple like this thread-safe.

I assume that in CPython either the GIL or per-object locks make setdefault atomic. I can imagine implementations where it is not atomic, though, because you can implement setdefault in Python using the other dict methods, but I don’t see how you could make it atomic if you did.
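For illustration, a pure-Python setdefault necessarily has a window between the check and the store:

def setdefault(d, key, default):
    # Not atomic: another thread can store a different value for `key`
    # between the membership test and the assignment, so two racing
    # callers could receive different objects back.
    if key not in d:
        d[key] = default
    return d[key]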

2 Likes

And equally, if setdefault is not atomic, I don’t see how to make this thread safe without adding a Python-level threading.Lock - which I would expect to be a significant overhead for code that may never actually be called in a multithreaded context.

I appreciate that it’s hard to document the thread safety of every part of the stdlib. But without that, I don’t see how it’s reasonable to expect 3rd party libraries to make statements about their thread safety - not only are we not giving them the information that they need to do so, but we’re also expecting them to do something that we’re not willing to do ourselves.

4 Likes

I’m not sure what difference you’re talking about here.

That is unrelated. list.append appends the object at the end of the list at the time the actual appending is done. The fact that some other thread might sneak in just afterwards and mutate the list is unrelated, and is not even GIL-related.

Your get_A certainly isn’t atomic. I have no idea about dict.setdefault itself, but I would be surprised if the implementation didn’t guarantee that.

Can we agree that this is unrelated to the discussion about free-threaded Python? This lack of guarantees really applies to GIL-enabled Python as well. It’s just that you weren’t bothered about it before.