But while single-threaded performance improvements automatically benefit every Python program that needs more performance, multi-core only helps those folks who are able to rewrite the performance-critical parts of their application to exploit it.
Or do you see us changing the language to benefit from multi-core? Even if this took the form of new primitives (e.g. a parallel for or map), that would require application developers to modify their programs (not every for can be parallelized without changing the program’s meaning, and a compiler that can reason about this in a Python context seems like a research project).
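To make that point concrete, here’s a minimal sketch (my own illustration, not from the discussion) contrasting a loop with a loop-carried dependency – which a hypothetical parallel for could not run concurrently without changing the program’s meaning – with an element-wise loop whose iterations are independent:

```python
def prefix_sums(values):
    # Each iteration reads the result of the previous one (a loop-carried
    # dependency), so the iterations must run in order; parallelizing this
    # loop naively would change the program's meaning.
    totals = []
    running = 0
    for v in values:
        running += v
        totals.append(running)
    return totals

def squares(values):
    # By contrast, every iteration here is independent of the others, so a
    # parallel map could in principle run them concurrently and safely.
    return [v * v for v in values]
```

Telling these two cases apart automatically, for arbitrary Python code, is exactly the kind of compiler analysis that looks like a research project.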
Of course, until now, two alternatives (viable or not) have been to write libraries (e.g. numpy) in C or C++ and use multi-core at that level, and (for certain types of applications) to use multi-processing. In 3.12 we’re already adding subinterpreters with their own GIL to the palette.
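For readers unfamiliar with the multi-processing alternative, here’s a minimal sketch using the standard-library `concurrent.futures.ProcessPoolExecutor` (the function names are mine, and the workload is a placeholder): each worker process has its own interpreter and its own GIL, so CPU-bound work can use multiple cores, at the cost of pickling arguments and results across process boundaries.

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Placeholder for some CPU-heavy, pure-Python computation.
    return sum(i * i for i in range(n))

def parallel_sums(inputs):
    # Fan the work out over a pool of worker processes; each process runs
    # its own interpreter, so the GIL does not serialize this work.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_bound, inputs))

if __name__ == "__main__":
    print(parallel_sums([10, 100, 1000]))
```

The pickling requirement is also why this approach only works for certain types of applications: it suits coarse-grained, shared-nothing workloads, not fine-grained sharing of mutable objects.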
It’s not so much that the work cannot be done in parallel. The problem is more that in a GIL-free world the work on single-threaded performance requires a different approach (see e.g. Brandt’s post in the other thread). The experts (not just Mark, but also several academic folks whom I asked for advice) seem to agree that this different approach is not just different: it takes more effort, and there is less prior work we can borrow from.
This means that it would be helpful to know sooner rather than later what the SC is going to decide: if the SC decides to keep the GIL, the best road to the best single-threaded performance is to continue the work that Mark and the rest of the Faster CPython team have already planned – if we keep the GIL, we don’t need to worry about other threads invalidating our caches, versions, and what have you. OTOH, if the SC decides to accept free-threading (whether in the form of PEP 703 or some variant or alternative), we should stop the current work and start redesigning the optimization architecture to be truly thread-safe. And we should seek additional funding (or accept that we won’t get even close to the 5x-in-5-years goal for single-threaded performance).
I understand that this just increases the pressure on the SC, which I know you don’t need (if y’all resign under the pressure like I did in 2018, where would we be?). But I worry that you might be betting on hope as a strategy: choosing Mark’s option (2) and hoping that the demand will lead to (3) – exactly what Mark says is a mistake.
Like Mark, I hope that you’re choosing (3) – as Mark says, it’s clearly the best option. But we will need to be honest about it, and accept that we need more resources to improve single-threaded performance. (And, as I believe someone already pointed out, it will also be harder to do future maintenance on CPython’s C code, since so much of it would then be exposed to potential race conditions. This is a problem for a language that’s in large part maintained by volunteers.)
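As a Python-level analogy for the race-condition hazard (my own illustration, not from the post): an unsynchronized read-modify-write like `counter["value"] += 1` can lose updates when threads interleave, and the fix is to guard the critical section with a lock. Without the GIL, CPython’s own C code faces the same lost-update pattern on its internal state, which is what makes future maintenance harder.

```python
import threading

def increment_many(counter, lock, n):
    for _ in range(n):
        # The lock makes the read-modify-write atomic. Without it, two
        # threads can read the same old value and one update is lost.
        with lock:
            counter["value"] += 1

def run(threads=4, per_thread=10_000):
    counter = {"value": 0}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=increment_many, args=(counter, lock, per_thread))
        for _ in range(threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter["value"]
```

Under the GIL, most of CPython’s C internals get this kind of serialization for free; in a free-threaded build, each such spot has to be found and protected by hand.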