Yes, because most of the C code that I write leans heavily on the GIL for its own sanity. Given the choice between implementing my own fine-grained locking vs. assuming/hoping/dreaming that someone will make ctypes totally safe, I’d probably rest my hopes on someone adopting ctypes maintenance. (I’m not doing perf-sensitive stuff mostly, it’s OS integration.)
Revisiting this thread and had some additional thoughts/questions [1]
I’m curious why? Do you think some people will prefer to keep the GIL around for simplicity, or that single-threaded performance will always remain better with it enabled, or some other reason?
I would think the ideal would be “no GIL, but also no performance hit for single-threaded code”, in which case there’s no need for a runtime mode. But maybe that isn’t possible [2]
Separate builds are a big infrastructure challenge [3], but from a user perspective it’s nice that they make compatibility crystal-clear: if you’ve released a wheel for nogil, you’re claiming compatibility. Whereas with a runtime flag, I might mix together compatible and incompatible extensions without realizing (even if there are warnings and documentation and so on, people will do this).
This is all to ask: can people imagine a transition of a split build for a release or two, and then no GIL thereafter (i.e. skip the runtime flag)? To me this seems like the most desirable outcome, and potentially a simpler transition, both for users (if you can install it, it should work) and for developers (release normally for 3.X, develop/test on 3.X-nogil, and be ready for 3.X+1 which is nogil by default).
Not necessarily. For instance, will pure Python projects be expected to declare they support free threading?
@colesbury Speaking of packaging and declaring support, after re-reading the PEP this weekend I assume the expectation is we will have to define an abi4 or abi3t to signal a stable ABI that supports no-GIL (FYI I prefer t or n to signal “threading” over “no GIL” as an ABI suffix)?
The PEP also means extension modules will double the number of CPython wheels they produce if they want to support both with and without a GIL, correct?
And what’s your suggestion on how to tell if you are running a CPython with or without the GIL? Via sysconfig or some other mechanism (i.e., I didn’t see any mention of some change to sys to expose this)? Up until this point you could tell e.g. CPython from PyPy by the binary name as well as the feature version, but that isn’t the case here.
I was assuming that a pure Python project is not going to have issues. This is almost the definition of nogil “working” at all, in my eyes: correctly running Python code. As I understand it [1], if a pure Python project works now with the GIL, it shouldn’t have a problem without it. As H Vetinari pointed out, it’s totally possible that there are some deep assumptions somewhere in CPython that make this untrue, but it doesn’t sound like they’ve been uncovered by the testing so far.
For more complicated dependency graphs, it seems like there’s a transitive property. If I need A, and A depends on B, … and Z is a C extension that hasn’t been released for nogil, then I can’t install A because pip won’t resolve the dependencies.
If we imagine there were no compatibility concerns at all, and it was just a breaking change to the ABI that required all extensions to be rebuilt, that would be the scenario, right? Everything in pure Python still works but anything that depends on a compiled extension has to wait until it’s available.
Hmm, if you can’t tell from the running interpreter that it’s a nogil build, how can packaging build backends know to build a nogil version of a package for that interpreter? Pip, for example, will try to build a wheel if one doesn’t exist, and it does that using default build settings, so there’s a built-in assumption that “build a wheel” will by default build one that’s compatible with the running Python.
Also, packaging tools need to know if the running python is nogil in order to determine what wheels are compatible with this interpreter…
This point is crucial – are pip and related infrastructure ready to serve a dual-ABI world? That’s also breaking an ancient assumption. I brought it up in the other thread, but didn’t get much response.
Given the situation in the packaging space (maintainers far above capacity, huge amount of complexity / tooling diversity, many long-standing & severe pain points, lots of legacy cruft, etc.), I doubt that trying to push such a huge change through this bottleneck will be helpful[1], i.e. this should factor into the decision of whether to aim for parallel ABIs or version-separated ones (i.e. Python 3.x = with GIL / Python 4.x = nogil).
It doesn’t take “deep assumptions somewhere in CPython” for pure Python code to be affected.[2]
Yes, and that’s entirely reasonable. If you depend on Z, and it’s not nogil-safe, you cannot use it in a nogil-only context without things potentially blowing up. The situation does explain the desire for the runtime switch though (or more generally, for running gil & nogil code in the same interpreter). If that encapsulation could somehow be made fool-proof, then you could just[3] run Z with the GIL, but keep all the other dependencies running without it.
To be clear – I want nogil to succeed – but I’m trying to point out how large the surface for potential problems is (which explains a large part of the hesitation in this thread). There’s also a severe mismatch here (well, even more than usual) in how much the natural desire of users for an obvious improvement will translate into a huge chunk of work for volunteer maintainers all over. It’s great that there’s so much excitement about it, but more often than not, that excitement will turn into impatience rather than someone assisting the maintainers in climbing down some really deep debugging rabbit holes.
but then, as someone who’s very involved in packaging, I acknowledge my biases on this ↩︎
In a footnote because I’m repeating myself: it’s enough to rely on non-trivial third-party dependencies, or to do some things (like IO) where threading might change observable behaviour (especially losing some implicit ordering in the handling of external resources like files, opening the possibility of data races, concurrent-access problems, etc.); that’s aside from the fact that even pure Python libraries might not be ready to be called in parallel from other, now suddenly nogil, Python code. ↩︎
I’m trying to understand the boundaries of that surface, partially because I’m just curious but mostly because I think defining that surface more precisely would be useful for this discussion. It’s somewhere between “everything could break” and “extensions using the ABI need an update” and narrowing down that range is helpful.
To be clear, I think this is a feature of having a different ABI, not a problem[1].
I’m just trying to understand the precise failure modes here [2].
If my code doesn’t use any threading now, it shouldn’t change behavior as long as the dependencies work. nogil Python isn’t going to change the ordering of what I’m doing in a single-threaded program.
If my dependencies work and they aren’t compiled extensions, they should continue to work whether or not they use threading, because a broken use of threading is already broken. The existing threading tools allow for arbitrary execution of different threads, and anyone using them has to account for this with the appropriate design, locks, etc.
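To make that concrete, here’s a minimal sketch of a race that is already a bug today, with the GIL, because the interpreter can switch threads between the read and the write of the counter:

```python
import threading

counter = 0

def work():
    global counter
    for _ in range(100_000):
        counter += 1  # read-modify-write: a thread switch can occur in between

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # can print less than 400000, with or without a GIL
```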
If I need dependencies with compiled extensions, they might have issues with nogil, but they will need to be updated and released for nogil before I can worry about them.
Regarding “this library isn’t safe to call from a multi-threaded context”, it seems like that would have to already be true, and it’s just that no one is doing it right now. In my opinion that doesn’t mean the package is broken by this PEP, but the interest in adding this feature would increase. I already encounter this all the time in the multiprocessing context [3]. It can be annoying but I don’t usually consider it a bug in the library.
If there are issues beyond the above, I guess that’s what I meant by “deep”, in that they are not something the package developer realized they were relying on and they were (probably?) not documented behavior. If there are data races lurking in CPython that are exposed by (otherwise correct) multithreading code [4], they have yet to be found by Sam Gross or the others who have been using nogil thus far. Therefore, pretty “deep”, in my eyes. That’s not to say they don’t exist.
I understand the hesitation but it’s also a little funny to me. It seems…good? to introduce language features that are so popular that users clamor for them to be implemented. That seems like the sign of a good PEP, and more generally the sign of a good major version.
For what it’s worth, I solemnly swear to help debug/contribute PRs to any of my dependencies that need to be adapted.
the packaging question is a good one though, and strengthens the argument for version separation ↩︎
at the very least, I need to understand this for my own code! ↩︎
because many objects can’t be pickled and passed to subprocesses ↩︎
and again, this should cover compatibility for any set of pure-Python packages ↩︎
I’d expect an abi4 by the time we switch to a single build, but not immediately.
Extension modules would produce two wheels per platform for 3.13 but that would not lead to doubling the total number of wheels, because extensions typically build for multiple previous Python releases. My expectation is that the two build modes would only be for 2-3 CPython releases, as described in the Python Build Modes section. (And based on comments, I think a single build mode after 2 releases is a better goal than after 3 releases.)
C code (compile time): The Py_NOGIL macro is defined.
Build: For building extensions, etc., the appropriate information is automatically propagated to sysconfig variables (e.g., sysconfig.get_config_var('EXT_SUFFIX')) when the ABI flags are changed; these are used by setuptools, for example.
Run time: The reference implementation provides sys.flags.nogil to indicate whether the GIL is enabled at runtime, but that is not specified in the PEP.
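Putting those together, a sketch of how a tool could sniff the build mode (sys.flags.nogil is specific to the reference implementation and not guaranteed by the PEP, hence the guarded lookup):

```python
import sys
import sysconfig

# Build configuration: the ABI flags flow into sysconfig variables,
# e.g. the extension-module suffix used by setuptools; on a nogil
# build the suffix would reflect the changed ABI flags.
print(sysconfig.get_config_var("EXT_SUFFIX"))

# Run time: sys.flags.nogil exists only in the reference implementation,
# so guard the lookup for portability.
nogil = bool(getattr(sys.flags, "nogil", False))
print("GIL disabled" if nogil else "GIL enabled")
```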
So in the interim, when there are two build options and projects will want to publish wheels for both nogil and gil builds, what would the appropriate wheel tags be for the nogil build? The tags are Python interpreter, ABI, and platform, as defined here. If there isn’t a different ABI, which tag will distinguish them?
As described in the Build Configuration Changes section, the ABI tag includes “n” for “no GIL”. For example, cp313-cp313n-PLATFORM. (Or it could be “t” for “threading” if Brett prefers.)
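For concreteness, a project’s 3.13 wheels would then be distinguishable by filename alone (illustrative names, not taken from the PEP):

```
mypkg-1.0-cp313-cp313-manylinux_2_17_x86_64.whl    (default build, with GIL)
mypkg-1.0-cp313-cp313n-manylinux_2_17_x86_64.whl   (no-GIL build)
```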
I guess there’s a bit of misunderstanding here. With nogil, you can actually leverage multithreading for performance, which implies more people will probably start to use threading than today, which in turn implies there will be some amount of pressure on existing pure-Python libraries that aren’t thread-safe today [1] to change in this respect. Therefore, even though the behavior of pure Python code does not substantially change, we can expect to see a shift towards multithreading, and it could be useful to have, e.g., a PyPI classifier “Tested in multicore scenarios” [2].
Ultimately, though, what the expectations from the wider community towards libraries will be is not something the core team has control over.
Personally I don’t view these possible expectations as a problem; it is similar to how you don’t need to add type annotations to your code if you don’t want to (and it still works the same) but some users will request them and if you don’t want them, you’re free to just turn down the request. I think the ecosystem has lived fine with this principle on typing so far.
For some definition of “thread-safe”. Of course, pure Python code doesn’t cause UB, and won’t with nogil either, but it can exhibit buggy behavior in multithreaded contexts. ↩︎
Probably. As long as the right things are exposed in the right places it will just fall through via sysconfig and other packages that have to figure this stuff out.
Correct, that’s more my point. For instance, I have never once concerned myself with making any of my code up on PyPI thread-safe, and in a no-GIL world I will very likely need to start caring.
I admittedly don’t share your optimism that the transition could happen in 2-3 years/releases, so in my head this takes 5 years and thus you do end up in a position where it’s doubled. Regardless, the key point is it’s increasing the number of wheels, not decreasing.
I doubt that similarity holds. Typing will stay optional in perpetuity (unless Python radically changes its character), while nogil is clearly aimed at becoming the new default.
Also, typing annotations are a somewhat subjective trade-off for an improved development experience, which is (IMO) nowhere near the level of impatience that tantalizingly unrealized performance gains inspire in people.
Do you really think numpy / pandas would be free to “just turn down the request”, or can you imagine the reaction if that were to happen? It’s a hypothetical example because the devs there are working very hard to stay abreast of all CPython changes (and Sam already ported them for his fork), but for moderately widespread packages, I think the pressure will be enormous.
I’ve been thinking about how to reduce that pressure (and the runtime-switch discussion), and was wondering if there could be a mechanism for packages to opt-out of nogil completely, even if they publish wheels built for the new ABI? E.g. a module-level setting along the lines of “all calls in the namespace of our package need to be protected by the GIL, even if the rest of CPython and other libraries aren’t”.
That way, users would be able to install everything they want, and those packages declaring themselves nogil-ready would be able to realize those benefits, while those packages that aren’t ready would have time to work on it without depriving users of the gains in other libraries.
Basically, trying to uncouple the new ABI (+ the multi-threading safety review, etc.) from the dependency requirements of users – making it less “all or nothing”, which to me sounds like a recipe for frayed tempers.
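Purely as a thought experiment of what that could look like (nothing like this exists in the PEP or any implementation; the name is made up):

```python
# Hypothetical and not part of PEP 703: a module-level marker that the
# interpreter could check on import, meaning "acquire the GIL around
# every call into this package, even on a nogil build".
__requires_gil__ = True
```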
We aren’t on the same page. My post that you are responding to is about pure Python packages, for which caring about correctness in multithreaded contexts will definitely remain optional, while your concerns are about C extensions.
I’m cognizant of not going in circles here. I believe a key point in this discussion is “how optional is nogil”, with several voices in this thread overestimating (IMO[1]) how easy it will be for projects to opt out, and how readily the broader infrastructure can handle the change and the ecosystem digest it.
FWIW, I believe this is the first time the parallel to typing (re: opt-in) has come up, at least in this thread. Similarly, I don’t think anyone had mentioned a module-level opt-out of nogil as a thought experiment yet.
while I’m not a CPython developer, I’m a SciPy maintainer, and very involved in conda-forge for many years (regularly across 100+ packages), so I don’t think my concerns fall under pointless bickering. ↩︎
I think the PEP would be helped by making much more central the point that pure Python code will not require changes. I understand you are saying this “by omission” in the Backwards Compatibility section, buried deep in the PEP, but a lot of the confusion in these discussions seems to come from a vague idea that the GIL offers protection at the Python level. So perhaps the PEP could explicitly, and prominently in the abstract, explain that pure Python code which currently isn’t worried about concurrency will not have to worry about concurrency under the proposed changes.
That would enable the GIL for everyone, right? I think Vetinari was asking if a given module could say “I need the GIL whenever I’m running, but you can turn it off the rest of the time”
That sounds more complicated than the existing proposal, to me, but it would certainly be very useful if possible.
Daan and I have talked about this by email and Skype and he’s offered to help with the issues and extensions on his side. We haven’t gotten into all the specifics and I don’t want to impose on Daan’s time while the SC still hasn’t decided if they want to pursue this PEP. I can ask Daan to comment here if you’d like.
I’ll need to think about what those microbenchmarks would look like. A number of people have written about using it in real applications, and I’ve also discussed scaling on a real application in my EuroPython talk. Here’s a link to the relevant section of the talk. I think those are more useful to understand scaling capabilities than microbenchmarks.
In terms of high-level limitations, the most common bottleneck is reference counting contention, but the extent to which it comes up depends on the application.
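As a sketch of the pattern (illustrative only; the actual cost depends on the biased reference counting scheme and which objects are immortalized): when many threads repeatedly take and drop references to a single shared object, its reference count becomes a contended cache line.

```python
import threading

CONFIG = {"mode": "fast"}  # one hot object shared by every worker thread

def worker():
    total = 0
    for _ in range(1_000_000):
        # Each access takes and drops a reference to CONFIG, so all
        # threads keep writing to the same reference-count field.
        total += len(CONFIG)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```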
A note from the Faster CPython project in response to Greg’s question about bytecode specialization and optimization in a nogil world. Our ultimate goal is to integrate a JIT into CPython, although this is still several releases away (most optimistically, an experimental JIT could be shipped with 3.13).
We’ve had a group discussion about how our work would be affected by free threading. Our key conclusion is that merging nogil will set back our current results by a significant amount of time, and in addition will reduce our velocity in the future. We don’t see this as a reason to reject nogil – it’s just a new set of problems we would have to overcome, and we expect that our ultimate design would be quite different as a result. But there is a significant cost, and it’s not a one-time cost. We could use help from someone (Sam?) who has experience thinking about the problems posed by the new environment.
I expect that Brandt will post more details, but the key issue appears to be that much of our current and future designs use inline caches and divide the execution of a specialized bytecode instruction into guards and actions. Guards check whether the cache is still valid, and deoptimize (jump to an unspecialized version) when it isn’t. Actions use the cache to access the internals of objects (e.g. instance attributes or the globals dict) to get the desired result faster. An important optimization is guard elimination, which removes redundant guards. This is performed before machine code generation (JIT).
Free threading complicates the design of guards and actions. In 3.11 and 3.12, the GIL ensures that the state of the world doesn’t change between a guard and its action(s), so that the condition checked by the guard still holds when the actions run. Our plans for 3.13 include splitting bytecode instructions into micro-ops (uops), where each uop is either a guard or an action. But with free threading, it is possible for the world to change between a guard and a corresponding action. This can lead to unsafe use of an object’s internals, producing incorrect results and even crashes.
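To illustrate the hazard, here is a sketch in illustrative pseudocode (not CPython’s actual implementation; all names are made up) of a specialized attribute load split into a guard uop and an action uop:

```python
# Illustrative pseudocode, not CPython source; every name here is made up.

class Deoptimize(Exception):
    """Fall back to the generic, unspecialized instruction."""

def uop_guard_type_version(obj, cache):
    # Guard: bail out if type(obj) changed since the cache was filled.
    if getattr(type(obj), "_version_tag", None) != cache["version"]:
        raise Deoptimize

def uop_load_attr_cached(obj, cache):
    # Action: fetch the attribute via the cached key.  Under the GIL no
    # other thread can run between the guard and this action; with free
    # threading another thread may mutate type(obj) or obj.__dict__ in
    # between, so the cached lookup may now be stale or invalid.
    return obj.__dict__[cache["key"]]
```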
Solving such problems with locking would likely be slower than just not specializing at all, so we will need to be cleverer. We’re entering uncharted territory here, since most relevant academic papers and real-world implementations use either a statically typed language (Java) or a single-threaded runtime (JavaScript).
A relatively speedy decision timeline would benefit us, since our schedule for 3.13 and 3.14 will be quite different depending on what happens with nogil. If nogil is accepted, we’ll have to prioritize making sure that the specialization implementation is thread-safe, and then we have to design a new, thread-safe approach to a JIT.
It looks like Sam left us quite a bit of work regarding the thread-safety of the current specialization code (i.e., what’s already in 3.12). Brandt can explain it better. Even if we decide that for now we’re better off just not specializing when CPython is built with nogil (since it will be used mainly to create multi-core apps), that’s only a temporary measure that can buy us 1-2 releases, but I don’t expect that we would continue work on our current JIT plans that depend on a GIL; instead, after salvaging the existing specializations, I expect us to go back to the drawing board and come up with a new plan. This will set those JIT plans back by 1-2 releases, unless additional funding appears.
In the meantime we’re treading water, unsure whether to put our efforts in continuing with the current plan, or in designing a new, thread-safe optimization architecture.