I can’t speak for the other authors, but I certainly don’t expect anyone to do anything (but perhaps that says more about my expectations of people in general). Whether pure-Python packages are automatically thread-safe to the extent their maintainers want them to be is very difficult to say. “Probably”, if the package doesn’t have global state and users don’t expect other state (like state in instances) to be meaningfully usable from multiple threads at once. Ultimately, free-threaded Python does not change that.
But certainly there are no expectations on anyone but CPython developers in PEP 779, because that’s not what the PEP is about. It’s about whether the free-threaded design is stable enough, and tested enough, to stop pretending it can break without notice.
I’m unclear what this addresses. Are we talking about stability in CPython, or what we tell users they are allowed to demand of library authors? Because we’re already at beta-level stability. The point of PEP 779 is that we’re ready for stable (but still opt-in) free-threading support in CPython. I do not think pretending it’s not stable is a good way to move forward.
So far, the objections to calling it supported that I’ve seen are about how users who don’t actually understand the relationship between CPython, Python, and Python packages would perceive the status in the larger community. Does it help if instead of ‘supported’ we just drop “experimental”, possibly expanding that to “the regular deprecation policies now apply”? I still want CPython itself to consider it supported – including Core Dev buy-in – but if “supported” is the word that somehow causes users to appear at package maintainers’ doorsteps with pitchforks demanding that support, we don’t have to use that specific term.
Sure! What changed my mind is seeing the benefits in real programs, actually working to make code thread-safe, and seeing the enthusiasm for this feature from anyone who worked with it.
I wish there were more concrete work to share to show the massive potential of free-threading. To be clear, I was never skeptical about the benefit of free-threading, but seeing actual programs that were 10 times more efficient still blew my mind. That wasn’t a micro-benchmark by any means; it was a big workload that switched from using multiprocessing to using threads. For the kinds of programs I’m talking about here, 1% is considered a big deal. Other examples include the numpy PR @ngoldbaum linked to earlier (https://github.com/numpy/numpy/pull/27896 – keep in mind the graph is log scale), which shows a ~20x speedup. I didn’t expect to see that big a difference, and I didn’t expect it to be this easy to achieve in the kinds of programs that would benefit from it.
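To give a sense of what that multiprocessing-to-threads switch looks like, here’s a rough sketch (illustrative only, not the actual code – the function names and workload are made up). The point is just that the structure of the program barely changes, but on a free-threaded build the threads genuinely run in parallel:

```python
# Illustrative sketch only -- names and workload are made up.
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def crunch(chunk):
    # Stand-in for real CPU-bound work.
    return sum(x * x for x in chunk)

def run_with_processes(chunks):
    # The old approach: pay for process startup and for pickling
    # arguments/results, just to get around the GIL.
    with Pool() as pool:
        return pool.map(crunch, chunks)

def run_with_threads(chunks):
    # On a free-threaded build, plain threads run the same CPU-bound
    # work in parallel, sharing memory and skipping the pickling.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(crunch, chunks))
```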
The other thing that I didn’t expect was how easy it would be to make things work. I did not truly realise how simple Sam’s design is to the end user. (Not Sam’s fault, PEP 703 does try to convey this, I just didn’t listen :P) The way it uses the fact that existing Python C API code has to deal with the GIL being released at certain (many!) points, to provide an eventually-consistent view of the world, to provide non-deadlocking critical sections, and to provide lockless access to things like lists and dicts, is incredibly clever and elegant. It means almost all of the assumptions made by existing C code just work.
Another thing I hadn’t really appreciated about making things work right: it’s very easy to test thread-safety in a free-threaded build. In a GIL build it’s often very difficult to trigger thread races, just because of how the timing usually ends up. You can write code that you think is safe, passes all the tests even when you throw a few hundred threads at it, and then you see it crash in 0.01% of the calls when you push it to production. It’s much easier to trigger races in a free-threaded build, which makes it much easier to find them too. Honestly, the free-threading-specific things I’ve worked on were way easier than understanding the dict implementation, or the current bytecode interpreter(s).
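To be concrete about what “easy to test” means here, this is roughly the kind of harness I have in mind (illustrative only; `work` stands in for whatever operation you’re exercising). The Barrier lines all the threads up so they hit the shared state at the same moment; on a free-threaded build that kind of hammering tends to surface races almost immediately, where the same test can pass indefinitely under the GIL:

```python
import threading

def hammer(work, *, threads=32, iterations=10_000):
    # Release every thread at the same moment to maximise contention.
    barrier = threading.Barrier(threads)
    errors = []

    def worker():
        barrier.wait()
        try:
            for _ in range(iterations):
                work()
        except Exception as exc:
            # Surface crashes/races as a test failure instead of
            # losing them silently in a worker thread.
            errors.append(exc)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    assert not errors, errors
```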

they’re working full-time on maintaining the free-threading build right now
Oh god no. We do very little maintenance. Everything right now is development (mostly on performance). The maintenance cost of free-threading should be very low, all things considered. It’s highest for things like lists and dicts, which have fairly intricate mechanisms for lock-free access that rely on co-operation from the memory allocator (which is why we use mimalloc). Changing the internals of those types, yeah, that became more complicated. For the most part, the complexity in CPython is all hidden away in a few basic building blocks, and the basic rules about thread-safety are pretty simple, even if we haven’t written them all down yet. (That’s definitely on the short-term TODO list.) Ongoing maintenance on the free-threaded implementation is not going to cost any kind of full-time commitment.

Declaring the implementation stable is fine, but it’s way too early IMO to determine the overall outcome of the nogil experiment on an ecosystem level
Yes, that’s fair. PEP 779 is not meant to determine the overall outcome of the experiment. It’s only about the implementation being stable, and about it not going away without the usual deprecation cycle.

Maybe part of the issue here is a difference in how we’re interpreting these terms like “experimental” and “supported”. Like, your statement above gives me the impression you see “supported” as closer to “experimental” than to “default”, but I would read “supported” as very close to “stable enough to be the default”. That means we shouldn’t call it supported until we’re within sight of making it the default.
I see “supported” as midway between “experimental” and “default”, because we don’t have any meaningful scale. But I’m not hung up on the term, it’s just the term the SC used in its acceptance. I am hung up on the term “experimental”, because it leads to incorrect expectations. The free-threaded build isn’t going to break in incompatible ways, and it isn’t just going to go away. It’s not a toy, an experimental idea, an unproven attempt to solve the underlying problem. Even in 3.13 it’s none of those things, although in 3.13 the performance cost was much, much higher (40%, if I recall correctly). It certainly isn’t any of those things in 3.14.
The reason I’m adamant that we do not call it experimental is that we need the community to move with us, in order to shape things like documentation, APIs, utility functions, and ways of using threads. Not just library authors, but end users too. We can’t implement everything in CPython, declare it done, then wait for libraries to add their free-threaded support and declare that done, and only then have users start using it. The feedback cycle between users, package maintainers, and CPython is critically important to get everything in good shape. We cannot get to “stable enough to be the default” without users actually using it, and we cannot expect users to use it if we say it’s not supported.
And yeah, I think this means users will come to package maintainers and ask “when free-threaded Python support?”. I think they have to, because otherwise how will package maintainers know their users care? How will they know what users want from the free-threaded support? How will they know what they’re missing from CPython to make their ideas work?