What about multiple trove classifiers for this? I think these will remain useful even later on, since there currently seem to be no plans to ever remove the GIL fallback for extensions entirely. Below is just a sample set, but there’s good reason to communicate to people testing whether they’re in “crash anywhere” waters, whether crashes are unexpected but possible, whether the extension author believes the extension is fully ready, or whether there are currently known limitations the user needs to go read about.
I like that, especially if the tiers came with specific suggested criteria.
In drafting suggested criteria, I ended up adjusting some of the suggested tier names, though:
- experimental: turns the GIL back on by default; for experimentation and feedback only
- beta: leaves the GIL off; documentation of limitations may be incomplete
- stable: leaves the GIL off; thread safety requirements are documented
- resilient: leaves the GIL off; breaking the documented rules will reliably give Python exceptions rather than segfaults. Silent data corruption may still be possible, but shouldn’t be trivial to provoke.
These could be given numbers (like the development stage classifiers) to make the relative ordering explicit.
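To make that ordering concrete, here’s a sketch of what numbered classifiers could look like. Note these strings are purely illustrative: neither the “Free Threading” prefix nor the numbering is a registered PyPI classifier.

```python
# Hypothetical classifier strings modeled on the "Development Status" family.
# None of these are registered on PyPI; this only illustrates how a numeric
# prefix would make the relative ordering of the tiers explicit.
FREE_THREADING_TIERS = [
    "Free Threading :: 1 - Experimental",
    "Free Threading :: 2 - Beta",
    "Free Threading :: 3 - Stable",
    "Free Threading :: 4 - Resilient",
]

def tier_number(classifier: str) -> int:
    """Extract the numeric tier, e.g. 3 from 'Free Threading :: 3 - Stable'."""
    return int(classifier.split("::")[1].split("-")[0].strip())
```

Tools could then compare tiers numerically (e.g. “at least Stable”) without hardcoding the tier names.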
When considering ecosystem readiness for stage 2 of the roll-out, we’d be looking for projects with extension modules declaring their free-threading support to be at least stable, and preferably resilient.
If we’re bikeshedding the “multiple classifiers” idea (which I find excellent), my preference is for @Liz’s suggestion. Perhaps shorten it as “unstable / beta / limited support / supported”.
My main concern with the “limited support” tier name is that if we make that tier name too unattractive, folks may be tempted to claim the highest tier before they’ve really achieved it.
Whereas if the names for those two tiers are some flavour of “good” and “better” rather than “worse” and “good”, that temptation should be lower.
(I don’t really mind whether the experimental tier is called “experimental” or “unstable”, although what mild preference I do have is again due to the latter having slightly more negative connotations)
Edit: after posting that, I realised that CPython using “unstable” and “limited” as terms related to API compatibility guarantees likely also plays a part in putting me off reusing them here. I wasn’t consciously giving that aspect any weight, though.
Personally, I also like the idea of the multiple classifiers, and I find the ones proposed by @Liz very clear.
I also like the meanings that @ncoghlan described, with Elizabeth’s naming, and with this modification:
Data corruption can take the form of semantically wrong data, or of corruption that can lead to crashes; I think here you’re referring to the former, but please correct me if I’m wrong.
By semantic data corruption I mean the cases in which a program gets into a state that is correct from the standpoint of the library, but incorrect from the standpoint of the semantics of the program.
E.g. if you have a counter that gets incremented once by each of three threads, you semantically intend it to end up at 3.
These are the cases which should be handled by the programmer and not the library maintainers, just as it is the case for built-in data types, for instance.
In the counter example it is up to the programmer to wrap the accesses to the shared integer with a lock — this has always been the case.
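A minimal sketch of that counter example: the read-modify-write on the shared integer is not atomic, so it is the programmer’s job to wrap the accesses with a lock, exactly as described above.

```python
import threading

counter = 0
lock = threading.Lock()

def increment() -> None:
    global counter
    # "counter += 1" is a read-modify-write on shared state; the programmer
    # (not the library) is responsible for guarding it with a lock.
    with lock:
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 3  # the semantically intended result
```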
Maybe another category on top, for libraries that explicitly test for and defend against cases the maintainer may consider pathological, such as resizing an array while it’s being used (which I’d personally consider not so improbable to happen organically if numpy met asyncio)? It would be a first step in deciding whether a given free-threaded library should be allowed anywhere even remotely secure.
We are currently talking about people putting out experimental builds of libraries for use with an experimental build of CPython. If you want to use these things in a secure production environment then it is your responsibility to know what you are doing but I would advise not making the decision based on trove classifiers.
In a few years I’m sure things will have settled down but there will always be a risk of bugs including those that cause segfaults and they don’t necessarily need to have anything to do with using threads.
Oh definitely. That’s why I say first step and not only factor to consider. Just knowing that a maintainer will take a segfault seriously even if it’s caused by doing something dumb is useful information.
I’ve repeated that summary below, but also have a process question: perhaps we should make this a PEP? Defining the trove classifiers is one thing, but we’d also like to see sites like https://py-free-threading.github.io/ updated with guidance on how to set them correctly, and potentially even the SC using them as a data point when deciding whether or not to advance the PEP 703 rollout to phase 2 (as described in the PEP 703 (Making the Global Interpreter Lock Optional in CPython) acceptance post).
Four status tiers have been identified so far; different suggested names for some of the tiers are separated by “/”:
- Experimental/Unstable: For experimentation and feedback only. Any binary extensions provided for free-threaded builds turn the GIL back on by default.
- Beta: Free-threaded usage is supported, but documentation of constraints and limitations may be incomplete. Any binary extensions provided for free-threaded builds leave the GIL disabled.
- Stable/Limited/Limited Support: Free-threaded usage is supported, and the associated constraints and limitations are documented. Violating the documented constraints may result in segfaults rather than Python exceptions. Any binary extensions provided for free-threaded builds leave the GIL disabled.
- Resilient/Supported: Free-threaded usage is supported, and the associated constraints and limitations are documented. Violating the documented constraints will reliably give Python exceptions rather than segfaults. Any binary extensions provided for free-threaded builds leave the GIL disabled.
Pure Python packages are permitted to use these classifiers (for example, a pure Python package may add cross-thread locking or subinterpreter invocations such that an underlying Experimental or Beta library effectively becomes Stable or Resilient when used via the free threading support layer).
An explicit numeric ordering (along the lines of that used for Development Stage classifiers) is not currently defined, but potentially could be.
I think it would be clearer to talk about the different levels in terms of enabling or disabling free-threading rather than disabling or enabling the GIL.
If I understand correctly the difference here between “experimental” and “beta” is just whether or not py_mod_gil_disabled is set so that free-threading is enabled by default. I’m not sure that the names really capture that distinction.
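For what it’s worth, the two halves of that distinction are observable at runtime. Here’s a sketch: `sysconfig.get_config_var("Py_GIL_DISABLED")` indicates a free-threaded build, and `sys._is_gil_enabled()` (added in CPython 3.13, hence the `getattr` guard for older versions) reports whether the GIL was re-enabled, e.g. by an extension that keeps it on by default.

```python
import sys
import sysconfig

def gil_status() -> str:
    """Classify the current interpreter's GIL situation (a sketch).

    - "no-gil build, GIL off": a free-threaded build actually running free-threaded
    - "no-gil build, GIL on": a free-threaded build where an extension (or
      PYTHON_GIL=1) re-enabled the GIL, as an "experimental"-tier wheel would
    - "regular build": ordinary GIL-only CPython
    """
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "regular build"
    # sys._is_gil_enabled() only exists on 3.13+; assume the GIL is on otherwise.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return "no-gil build, GIL on" if gil_enabled else "no-gil build, GIL off"
```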
When I proposed an “unstable” option above, it was not with the intention that extensions turn the GIL back on. Those extensions are the status quo and don’t need a trove classifier.
It was meant as an indicator that the maintainer knows the extension is unstable and subject to segfaults on normal code paths, and that it exists for others to test, debug, and contribute to during this transitional phase. Wheels in such an unstable state should probably only be published as prereleases on PyPI, to prevent the average user from getting them.
If I had to suggest well-defined definitions for these, and people want a classifier that means “this re-enables the GIL”:
- unsupported: re-enables the GIL.
- unstable: Known to segfault or cause other major stability issues with the GIL disabled, in cases the maintainers consider supported use. Wheels should not be published in this state in a way that pip or other resolvers will pick them up without user opt-in.
- beta: Not known to segfault or cause major issues when used as documented, but the maintainers are not ready to consider this production ready and believe it isn’t ready to be promoted to either of the next two tiers.
- documented limitations: When used only as documented, the extension is expected not to have issues caused by the removal of the GIL. The extension will only segfault if used outside of documented usage patterns.
- supported: When used only as documented, the extension is not subject to data races. The extension will raise Python exceptions rather than segfault, even in cases outside of documented support that are possible to trigger from Python code.
The last qualifier here is the reason I suggested that the appropriate way to publish these wheels is to have them turn the GIL back on by default. That way only folks setting PYTHON_GIL=0 will be exposed to the crashes - anyone else will just drop back to running with the GIL active (with the related import warning).
While running the free-threaded build at all is opting in to active participation in the ecosystem level free threading development process to some degree, triggering the import warning is still a nicer way to alert people that a particular library still has relatively immature free-threading support than either not providing wheels at all, or else letting them hit the crashes without any warning.
I understand the idea, but disagree with the approach. If development for freethreading is at a stage where even the intended way to use a library is anticipated to crash the interpreter, users can build it themselves. If users can’t build it themselves, they probably aren’t the right audience for that stage of development.
I don’t think setting an environment variable should result in crashing the interpreter. I don’t see that as an acceptable outcome, and I think people are playing way too fast and loose with this if we want any chance of getting enough feedback from more involved use. It’s going to slow down adoption of free threading if some devs say “just enable this environment var”, and then different devs have different ideas of what is an acceptable outcome of doing so.
Just like code with native dependencies may not support all platforms right away, user and developer expectations should not be that free threading is free to enable. It takes time, it changes the rules of what’s allowed, and it is an active choice when a project is ready to distribute for and support that configuration.
True, the level of wheel adoption has reached a point where falling back to a source build is itself a pretty decent warning that you’re treading on dubious ground.
With that definition, adding the Free Threading :: Unstable classifier would be the project saying “Free-threaded wheels aren’t missing as an oversight, they’re missing because they don’t work (yet)”.
Projects at that stage can also still leave the GIL enabled by default, though. That way, even if the source build goes off without a hitch at install time (which is entirely plausible for extensions which only need a C or C++ compiler), users will still get the GIL compatibility warning at import time.
Trying to summarise the proposed levels in a memorable way:
- Unstable: Free threading doesn’t work (without external locking)
- Beta: Free threading should work (as far as we know)
- Stable/Limited: Free threading works (as long as you don’t poke it too hard)
- Resilient/Supported: Free threading works (and you’re doing well if you break it)
With the adjusted meaning, I agree “Unstable” fits the first tier better than “Experimental” does. I still prefer Stable/Resilient for the upper two tiers, but I’d also be fine with Limited/Supported.
A big +1 from me on standardizing useful meanings and providing these. Being able to find which of my dependencies aren’t free-threading ready gives me a list of projects I can consider helping to become ready, and the declared level of readiness helps me understand how much scrutiny I should apply to a move to free threading, even where I fully trust an existing dependency’s maintainers.
I’m not sure the names matter as much as the definitions, and as much as people being encouraged to understand those definitions. The name should probably match the definition, but beyond that, I have few preferences.
I don’t distribute source dists anymore (people are welcome to grab from git), but if I did, I think the approach here would be to detect a free-threading build and check for PROJECT_NAME_YES_IM_READY_TO_CRASH or some other suitably alarming yet project-scoped env var, and conditionally allow building for 3.13t (with the GIL not disabled, for dev purposes). An env-var-based solution at runtime impacts other libraries too, and I wouldn’t encourage people to forcibly disable the GIL globally.
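A sketch of what that build-time guard could look like at the top of a setup.py, assuming the (hypothetical, project-scoped) PROJECT_NAME_YES_IM_READY_TO_CRASH opt-in variable from above; the `sysconfig` check is one way a build script can detect a free-threaded (3.13t-style) interpreter:

```python
import os
import sysconfig

def allow_free_threaded_build() -> bool:
    """Refuse to build for a free-threaded CPython unless the user opts in.

    PROJECT_NAME_YES_IM_READY_TO_CRASH is a hypothetical env var used for
    illustration; regular GIL builds are unaffected by the guard.
    """
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded:
        return True  # ordinary builds proceed as usual
    return os.environ.get("PROJECT_NAME_YES_IM_READY_TO_CRASH") == "1"

if not allow_free_threaded_build():
    raise SystemExit(
        "This project's free-threading support is unstable; "
        "set PROJECT_NAME_YES_IM_READY_TO_CRASH=1 to build anyway."
    )
```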
From my perspective, the whole point of putting out wheels for the free-threading build right now is so that people can try them with free threading. Likewise, the whole point of a user installing the free-threading build of CPython is that they want to test free threading. I don’t see the point of giving them wheels that disable free-threading by default.
I don’t imagine that many package authors have any intention of distributing packages that disable free-threading in the long term so putting out wheels to test that configuration does not really make sense. You can still test the package with free-threading disabled if you want by setting PYTHON_GIL=1.
So yes you could put out wheels that disable free-threading by default but what would be the point?
If you were particularly interested in collecting feedback on the mode where you have a free-threading build but free-threading is disabled at runtime then maybe it would make sense. That is not the intended mode though either now for testing or later for general use.
I am not saying that it is bad to have the mode where free-threading is disabled at runtime, or that extension modules are built like that by default. Right now a pip install will download and build extension modules for the free-threading build that have never been built or tested in that configuration. It might even be that import proj gives an immediate segfault in any multithreaded situation. It is reasonable that free-threading is disabled by default when building packages that have not seen any testing or updates since CPython 3.13 was released.
I just don’t really see why anyone would actually build and upload cp313t wheels that disable free-threading right now. If there were known problems that were likely in reasonable use then I would either not upload wheels at all or fix the problems first.
One reason to distribute experimental wheels, though, is to reveal the unknown unknowns: you might be wrong about what the main source of problems is, or about their likelihood, severity, etc. You need people to test the actual free-threading mode to reveal this, though, so again it doesn’t make sense to disable free-threading by default.
Yeah, Liz also made a solid case for adopting that point of view: if a project’s free threading support is so immature as to still need the GIL enabled, then make it require a source build rather than publishing pre-built wheels that still disable the GIL.
Even better would be if there was a way to say explicitly that the package should not even be built from source, something like requires-python: not-free-threaded. For most users a clear install-time error message would be better than a build failure or some trove classifiers.
There is a lack of packaging metadata around the free-threading build. For example, we currently want to put out a release with cython >=3.0,<3.1, which is the right version constraint for the non-free-threading build, but for cp313t we will need at least cython >= 3.1, i.e. a version that does not exist yet. Maybe I have missed it, but I don’t see how to give different version constraints for the two cases.