For what it’s worth, I have an “application” that would definitely be interested in the nogil work. It’s a bit of an odd case, though, so I’m not sure how relevant it is. But I’ll add it here just in case it’s of interest.
My use case is an ongoing project of mine that tries to demonstrate that a high-level language can often be a better choice than a lower-level one, simply because higher-level languages make it easier to use more powerful abstractions that might be impractical in lower-level languages. As a result, actual runtime can be lower even though, in an absolute sense, the lower-level language has better raw performance.
In this particular context, I’m looking at Monte Carlo simulation of games of chance. So the basic workload is to generate a few million random game states, calculate a “score” for each, and aggregate the results. Performance is crucial here, because I’m trying to compete with a (more or less) hand-coded C++ program. At the moment Python sucks in that comparison, because there’s no way Python can compete with C++ on raw calculation speed. But my big advantage would be if I could run the “calculate a score” step across multiple CPUs, which would give me a significant speed boost over the single-threaded C++ code. The C++ developer has already said that rewriting his code to use multiple threads would be impractical, and that he doesn’t need to because the single-threaded code is fast enough. For me, though, a factor-of-n improvement through parallel execution could make me competitive in this (highly biased against Python) comparison.
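To make the workload concrete, here’s a minimal single-threaded sketch. The game, the state representation, and the scoring function are all hypothetical stand-ins for the real user-supplied code (I’ve used “roll five dice, score is the total” purely for illustration):

```python
import random
from statistics import mean

def random_state(rng):
    # Hypothetical "game state": a hand of five six-sided dice.
    return [rng.randint(1, 6) for _ in range(5)]

def score(state):
    # Stand-in for the arbitrary, user-supplied scoring function.
    return sum(state)

def simulate(n, seed=0):
    # Generate n random game states, score each, aggregate the results.
    rng = random.Random(seed)
    return mean(score(random_state(rng)) for _ in range(n))

print(simulate(100_000))
```

The real version generates a few million states and the scoring is far more expensive, which is exactly why the scoring step dominates and is the thing worth parallelising.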
The key here is being able to spread the workload across multiple cores without additional complexity. Threads would be great for this if the GIL didn’t prevent Python code from running concurrently. However, the “calculate a score” code is essentially an arbitrary, user-supplied calculation, and requiring that code to be thread-safe or re-entrant isn’t acceptable: it violates the requirement to demonstrate that high-level languages give easy access to advanced constructs, and it impacts the performance of code that’s right in the hot loop of the program.
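Under a free-threaded build, the fan-out itself could be this simple (again, `score` is a hypothetical stand-in for the user-supplied code). The sketch runs today too, but on a GIL build the CPU-bound scoring threads serialise, so you get no actual speedup; the point is how little the code changes from the serial version:

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def score(state):
    # Stand-in for the arbitrary, user-supplied scoring function.
    return sum(state)

def simulate_parallel(n, workers=4, seed=0):
    rng = random.Random(seed)
    # Generate the random game states up front...
    states = [[rng.randint(1, 6) for _ in range(5)] for _ in range(n)]
    # ...then farm the scoring step out across a thread pool and aggregate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return mean(pool.map(score, states, chunksize=1000))

print(simulate_parallel(100_000))
```

Note this only stays this simple if `score` can be arbitrary code; the moment users have to reason about shared mutable state, the “high-level languages make powerful abstractions easy” argument falls apart.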
I believe that this type of application would benefit from subinterpreters, because their isolation properties handle the need to safely run arbitrary code. I think there’s still some work to be done on adding appropriate co-ordination strategies for subinterpreters, but that’s a relatively straightforward matter of API design.
I believe that nogil won’t help directly with this problem, because in its raw form it exposes the co-ordination issues of free-threading to the user. But it does offer the opportunity to build additional safe, high-level abstractions for concurrency, which would help - at least as much as the subinterpreter model, and quite possibly more.
If there’s a follow-up plan with nogil to develop high-level concurrency abstractions, then IMO it will be a huge step forward for Python (and likely worth some level of performance cost in the short term). But given that we’re currently struggling to even work out how the transition to a “multi core by default” Python ecosystem will work, I fear that it’s premature to base my expectations on something like that happening any time soon. So for the short to medium term, I see subinterpreters as an important practical solution for me, with nogil being something that would be a much longer term benefit (for my use case - the fact that it’s an immediate benefit to other people is a relevant but separate point).