If we were to transition to nogil mode, the biggest concern for users would be extension modules. While Python code already has to deal with preemptive scheduling, C extensions have always seen their data structures protected by the GIL, and this would go away.
In particular, consider a hypothetical template extension that lets you create Template objects and call their expand() method. The expand() method doesn’t modify the user-visible state of the template, but it might cache some data on the Template object (for example, a cached parse of the template string into an AST that is created on demand). What might happen if an app creates a Template object and shares it between threads, each of which might call its expand() method? Potentially two threads might call expand() on a “virgin” Template object at the same time; then both would parse the template string, and both would write the AST to the internal cache field of the Template object.
That’s probably just a memory leak (the AST stored first is overwritten without being freed), and it’s easily avoided. But we could easily imagine a scenario that updates several fields and is subject to a more serious race condition: one that could potentially cause a segfault or (worse!) subtle data issues in the application (imagine accidentally revealing one user’s data to another whose request runs concurrently in a different thread).
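To make the cache scenario concrete, here is a minimal pure-Python sketch. A real extension would be written in C; this `Template` class, its `_parse()` helper, the `parse_count` counter and the `$name` syntax are all invented for illustration. It shows one way the double write could be avoided: a per-object lock with a second check once the lock is held.

```python
import re
import threading


class Template:
    """Pure-Python stand-in for the hypothetical extension type."""

    def __init__(self, source):
        self.source = source
        self._ast = None                # lazily filled cache
        self._lock = threading.Lock()   # guards the first write to _ast
        self.parse_count = 0            # instrumentation for this demo only

    def _parse(self):
        self.parse_count += 1
        # Alternating literal text and field names,
        # e.g. "Hello, $name!" becomes ["Hello, ", "name", "!"].
        return re.split(r"\$(\w+)", self.source)

    def expand(self, **values):
        if self._ast is None:              # unlocked check: cheap once cached
            with self._lock:
                if self._ast is None:      # re-check: another thread may have won
                    self._ast = self._parse()
        parts = self._ast
        return "".join(
            str(values[part]) if index % 2 else part
            for index, part in enumerate(parts)
        )


def expand_concurrently(template, n_threads=8):
    """Call expand() on one shared template from several threads at once."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()                     # line the threads up on the fresh object
        results.append(template.expand(name="world"))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Without the lock, several threads could each see `_ast is None`, parse, and store their own result. In C the unlocked first check would additionally need an atomic load, which this sketch glosses over.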
None of this should stop us from introducing nogil mode. But we might think twice before we adopt Sam’s idea of assuming that extension modules are thread-safe by default. Perhaps we could have extensions declare, at import time, whether they consider themselves thread-safe (default false), and give the user control over what happens: a flag or env var could select between the following:
- GIL mode (most backwards compatible),
- nogil mode (fails when an extension is imported that does not declare it is thread-safe), and
- unsafe nogil mode (allows using extensions even if they don’t declare themselves thread-safe).
Unsafe mode should probably print a warning when a non-thread-safe extension is imported, so it’s easier to know whom to blame. (The standard -W flag could be used to suppress those warnings.)
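The three modes and the warning could be sketched as a small policy function. Everything here is hypothetical: the `ThreadingMode` names, the `check_extension()` helper and the `declares_thread_safe` argument only model the proposal and do not correspond to any existing CPython API or flag.

```python
import warnings
from enum import Enum


class ThreadingMode(Enum):
    # Hypothetical names for the three modes in the list above.
    GIL = "gil"
    NOGIL = "nogil"
    UNSAFE_NOGIL = "unsafe-nogil"


def check_extension(mode, name, declares_thread_safe=False):
    """Decide what happens when extension `name` is imported under `mode`."""
    if mode is ThreadingMode.GIL or declares_thread_safe:
        return  # nothing to enforce
    if mode is ThreadingMode.NOGIL:
        raise ImportError(
            f"extension {name!r} does not declare itself thread-safe"
        )
    # Unsafe nogil mode: allow the import, but tell the user whom to blame.
    warnings.warn(
        f"extension {name!r} is not declared thread-safe; "
        "running it without the GIL",
        RuntimeWarning,
        stacklevel=2,
    )
```

Because the sketch goes through the standard `warnings` machinery, a filter such as `-W ignore::RuntimeWarning` would silence the message, in line with the suggestion above.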
Note that the core runtime and the stdlib are assumed to be thread-safe here. That’s not the user’s responsibility but ours, and I trust that Sam has already done a good job here. (And e.g. mimalloc helps too.)
Of course, there might be other options, such as a GIL-like lock that is only acquired before entering a non-thread-safe extension module (there were some clever ideas for this on Discord).
We might also consider the possibility that extensions might lie about their thread-safety, either by accident (the developer looks over the code, fixes some obvious race conditions, tests everything thoroughly, and then declares it thread-safe, but somehow has overlooked a subtle problem), or intentionally (the developer is lazy and believes their users should just write code that avoids the race conditions in the extension).
If the developer does due diligence before declaring their extension thread-safe but accidentally overlooks a race condition, so be it. Nobody’s perfect; users will hopefully report it, and it can be fixed in the extension’s next release.
If developers lie about their extension’s thread-safety, they do their users a disservice – we know from long experience in many languages with free threading that most users don’t know how to avoid (or even recognize!) race conditions, and Python users aren’t any different. So let’s assume this doesn’t happen.
Then why have unsafe nogil mode? Because there are exceptions to the rules, and there are certainly users who do know how to avoid race conditions. For example, you might have data that you can cleanly partition into separate chunks to be processed separately by multiple cores, and some library you use is not yet thread-safe, but you know your usage of it does not depend on the library being thread-safe. This is a compromise, for sure, but it will help some folks, and hopefully it will not be too attractive for those folks who shouldn’t use it. It should definitely have “unsafe” in the name of the flag or option used to enable it, so users know the ice is thin.
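The partitioning case might look roughly like this. `LegacyAccumulator` is a made-up stand-in for an object from a library that is not thread-safe; the sketch assumes the library keeps no hidden global state, so giving each chunk its own private instance is enough.

```python
from concurrent.futures import ThreadPoolExecutor


class LegacyAccumulator:
    """Hypothetical object from a library that is not thread-safe."""

    def __init__(self):
        self.total = 0

    def feed(self, value):
        self.total += value            # unsynchronized read-modify-write


def process_chunk(chunk):
    acc = LegacyAccumulator()          # private to this call, never shared
    for value in chunk:
        acc.feed(value)
    return acc.total


def process_partitioned(data, n_chunks=4):
    # Split the data into at most n_chunks contiguous slices.
    size = max(1, -(-len(data) // n_chunks))   # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(process_chunk, chunks))
```

No `LegacyAccumulator` is ever visible to two threads, so its lack of internal locking doesn’t matter. That is exactly the kind of knowledge about your own usage that unsafe mode would rely on.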