Nogil mode and extensions

If we were to transition to nogil mode, the biggest concern for users would be extension modules. While Python code already has to deal with preemptive scheduling, C extensions have always had their data structures protected by the GIL, and that protection would go away.

In particular, consider a hypothetical template extension that lets you create Template objects and call their expand() method. The expand() method doesn’t modify the user-visible state of the template, but it might cache some data on the Template object (for example, a parse of the template string into an AST, created on demand). What might happen if an app creates a Template object and shares it between threads, each of which might call its expand() method? Two threads might call expand() on a “virgin” Template object at the same time; both would then parse the template string, and both would write their AST into the internal cache field of the Template object.

That’s probably just a memory leak, and it’s easily avoided. But it’s easy to imagine a scenario that updates several fields and is subject to a more serious race condition, one that could cause a segfault or (worse!) subtle data issues in the application (imagine accidentally revealing one user’s data to another user whose request runs concurrently in a different thread).
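The lazy-cache race can be sketched at the Python level. This is an illustration under assumptions, not how any real template library works: Template and its _parse() helper are hypothetical, and a C extension would use a C-level mutex (plus an atomic load for the unlocked fast-path read) instead of threading.Lock. The idea shown is double-checked initialization: check the cache, and if it is empty, take the lock and check again before parsing, so at most one thread ever builds and stores the AST.

```python
import threading


class Template:
    """Hypothetical template object that caches its parsed form on demand."""

    def __init__(self, source: str) -> None:
        self.source = source
        self._ast = None               # cached parse, created lazily
        self._lock = threading.Lock()  # guards creation of _ast
        self.parse_count = 0           # counts parses, for demonstration

    def _parse(self) -> tuple:
        # Hypothetical stand-in for a real parser: split on whitespace.
        self.parse_count += 1
        return tuple(self.source.split())

    def _get_ast(self) -> tuple:
        ast = self._ast
        if ast is None:  # slow path: nothing cached yet
            with self._lock:
                # Re-check under the lock: another thread may have
                # finished parsing while this one was waiting.
                if self._ast is None:
                    self._ast = self._parse()
                ast = self._ast
        return ast

    def expand(self, **values: str) -> str:
        # Reads the cache but never changes user-visible state.
        return " ".join(values.get(tok, tok) for tok in self._get_ast())


# Share one Template between several threads.
t = Template("hello name")
workers = [threading.Thread(target=t.expand, kwargs={"name": "world"})
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(t.expand(name="world"))  # hello world
print(t.parse_count)           # 1
```

Without the lock and the second check, two threads could both see an empty cache and both parse, which is exactly the scenario described above.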

None of this should stop us from introducing nogil mode. But we might think twice before adopting Sam’s idea of assuming that extension modules are thread-safe by default. Perhaps extensions could declare at import time whether they consider themselves thread-safe (default: false), and the user could control what happens: a flag or env var could select between the following:

  • GIL mode (most backwards compatible),
  • nogil mode (fails when importing an extension that does not declare itself thread-safe), and
  • unsafe nogil mode (allows using extensions even if they don’t declare themselves thread-safe).

Unsafe mode should probably print a warning when a non-thread-safe extension is imported, so it’s easier to know whom to blame. (The standard -W flag could be used to suppress those warnings.)
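The import-time policy could look roughly like the following sketch. Everything in it is hypothetical: the __thread_safe__ marker, the Mode names, the ThreadSafetyWarning category and check_extension() are invented for illustration and are not an existing CPython API. A dedicated warning category would let users who opt into unsafe mode filter these warnings with the standard warnings machinery.

```python
import warnings
from enum import Enum
from types import ModuleType


class Mode(Enum):
    GIL = "gil"                    # most backwards compatible
    NOGIL = "nogil"                # refuse undeclared extensions
    UNSAFE_NOGIL = "unsafe-nogil"  # load them anyway, with a warning


class ThreadSafetyWarning(RuntimeWarning):
    """Hypothetical category for 'loaded a non-thread-safe extension'."""


def check_extension(module: ModuleType, mode: Mode) -> None:
    """Hypothetical import-time check against a hypothetical marker."""
    declared_safe = getattr(module, "__thread_safe__", False)  # default: false
    if mode is Mode.GIL or declared_safe:
        return  # nothing to enforce
    if mode is Mode.NOGIL:
        raise ImportError(
            f"{module.__name__} does not declare itself thread-safe"
        )
    # Unsafe nogil mode: allow the import, but record whom to blame.
    warnings.warn(
        f"{module.__name__} is not declared thread-safe",
        ThreadSafetyWarning,
        stacklevel=2,
    )


legacy = ModuleType("legacy_ext")    # declares nothing
modern = ModuleType("modern_ext")
modern.__thread_safe__ = True        # opts in
check_extension(modern, Mode.NOGIL)  # allowed
check_extension(legacy, Mode.GIL)    # allowed
```

In this sketch, check_extension(legacy, Mode.NOGIL) raises ImportError, while Mode.UNSAFE_NOGIL loads the module and emits one ThreadSafetyWarning.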

Note that the core runtime and the stdlib are assumed to be thread-safe here. That’s not the user’s responsibility but ours, and I trust that Sam has already done a good job of it. (And mimalloc, for example, helps too.)

Of course, there might be other options, such as a GIL-like lock that is only acquired before entering a non-thread-safe extension module (there were some clever ideas for this on Discord).
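A minimal sketch of the “lock only around legacy extensions” idea, assuming a hypothetical LegacyProxy wrapper and a single process-wide _legacy_lock (neither is an existing API). An RLock is used so that a legacy extension that calls back into another legacy extension on the same thread does not deadlock on itself.

```python
import threading
from types import ModuleType

# Hypothetical process-wide lock taken only around non-thread-safe extensions.
# RLock, so a legacy extension calling another legacy one on the same thread
# does not deadlock on itself.
_legacy_lock = threading.RLock()


class LegacyProxy:
    """Hypothetical wrapper: every call into the module holds _legacy_lock."""

    def __init__(self, module: ModuleType) -> None:
        self._module = module

    def __getattr__(self, name):
        attr = getattr(self._module, name)
        if not callable(attr):
            return attr  # plain data is passed through without locking

        def locked(*args, **kwargs):
            with _legacy_lock:
                return attr(*args, **kwargs)

        return locked


# A fake "legacy extension" with an unprotected read-modify-write.
legacy = ModuleType("legacy_counter")
legacy.count = 0


def _bump() -> None:
    n = legacy.count
    legacy.count = n + 1


legacy.bump = _bump
proxy = LegacyProxy(legacy)


def worker() -> None:
    for _ in range(1000):
        proxy.bump()


threads = [threading.Thread(target=worker) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(legacy.count)  # 4000
```

Note that this only serializes legacy code against other legacy code; thread-safe code that touches the same state can still race with it.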

We might also consider the possibility that extensions might lie about their thread-safety, either by accident (the developer looks over the code, fixes some obvious race conditions, tests everything thoroughly, and then declares it thread-safe, but somehow has overlooked a subtle problem), or intentionally (the developer is lazy and believes their users should just write code that avoids the race conditions in the extension).

If the developer does due diligence before declaring their extension thread-safe but accidentally overlooks a race condition, so be it. Nobody’s perfect; users will hopefully report it, and it can be fixed in the extension’s next release.

If developers lie about their extension’s thread-safety, they do their users a disservice: we know from long experience in many languages with free threading that most users don’t know how to avoid (or even recognize!) race conditions, and Python users aren’t any different. So let’s assume this doesn’t happen.

Then why have unsafe nogil mode? Because there are exceptions to the rules, and there are certainly users who do know how to avoid race conditions. For example, you might have data that you can cleanly partition into separate chunks to be processed independently by multiple cores, and some library you use is not yet thread-safe, but you know that your usage of it does not depend on the library being thread-safe. This is a compromise, for sure, but it will help some folks, and hopefully it will not be too attractive to those who shouldn’t use it. It should definitely have “unsafe” in the name of the flag or option used to enable it, so users know the ice is thin.
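The partitioned-data case might look like this sketch, where Summarizer is a hypothetical stand-in for a library object with no internal locking. Each worker builds its own instance and processes its own non-overlapping slice, so no library object is shared between threads. This assumes the library keeps no hidden module-level mutable state, which is exactly the kind of thing a user has to know before opting into unsafe mode. (Under the GIL this only shows the structure; the parallel speedup would come with nogil.)

```python
from concurrent.futures import ThreadPoolExecutor


class Summarizer:
    """Hypothetical library object that is NOT thread-safe: it keeps
    mutable state between calls and does no locking of its own."""

    def __init__(self) -> None:
        self.total = 0

    def feed(self, value: int) -> None:
        self.total += value


def partition(data: list, n: int) -> list:
    """Split data into n contiguous, non-overlapping chunks."""
    size, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk: list) -> int:
    # Each worker gets its own Summarizer and its own chunk, so no
    # Summarizer is ever touched by two threads.
    s = Summarizer()
    for value in chunk:
        s.feed(value)
    return s.total


def parallel_sum(data: list, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, partition(data, workers)))


print(parallel_sum(list(range(1001))))  # 500500
```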

I’m not fully convinced it will be that simple either. I think we may have to pick representative extensions, port them to nogil mode, and see what the diffs look like. That will hopefully give us a better sense of how difficult such a switch would be for the community overall. I don’t think we will ever get close to an accurate estimate of the overall impact, since we have no way to know how common the various “kinds” of extension modules are. But at least we can learn whether porting is hard for all forms that extension modules take, or only for some.

That seems to cover all the scenarios. The tricky parts are testing all of it and deciding how long we keep the options around.

I personally think whether we do nogil is going to come down to the difficulty of updating extension modules. I think Sam’s work shows the technical side within CPython can be solved to an acceptable level, but it’s really the impact on the community that’s going to make or break this.

We’d need only minimal testing of unsafe nogil mode as part of the core test suite, since it works exactly the same as regular nogil mode except for not refusing to load default-unsafe extensions.

+1 on the rest.

I sent him a question directly about this (before seeing your Discord thread). I don’t understand how it’s safe to make that kind of assumption. Maybe most extensions happen to work if called concurrently from multiple threads, but I agree with Guido that it seems a bad assumption to make: non-thread-safe extensions could cause crashes or very hard-to-debug issues.

Can we have a single lock, like the old GIL, that we take before entering any extension that’s not marked as nogil-safe? Obviously you lose concurrency once you start using those extensions, but it seems the safer approach. I thought Sam said that this kind of simple single-lock design doesn’t work, though I don’t know why.

I think he was talking about a design where you keep the GIL acquire/release calls in old modules and remove them from new modules. And he’s right that that wouldn’t work: it stops old modules from racing with each other, but it still lets old modules race with new modules that touch the same state.

This is fixable though, AFAICT. You’d have to convert the GIL into a shared/exclusive (“reader/writer”) lock. Then nogil-friendly code would still take the GIL, but in shared mode; garbage collection and old code would take it in exclusive mode. That way, when old code is running, it locks all other threads out of the interpreter state, whether they are running old or new code.
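A shared/exclusive lock of the kind described here can be sketched in Python with threading.Condition. This is a minimal, writer-preferring illustration under assumptions (the class and method names are hypothetical), not Sam’s implementation: nogil-friendly code would call acquire_shared()/release_shared(), while old code and the garbage collector would call acquire_exclusive()/release_exclusive().

```python
import threading


class SharedExclusiveLock:
    """Minimal writer-preferring reader/writer lock (illustrative only)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0          # threads holding the lock in shared mode
        self._writer = False       # True while one thread holds it exclusively
        self._waiting_writers = 0  # queued writers; new readers wait for them

    def acquire_shared(self) -> None:
        with self._cond:
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a writer may be waiting

    def acquire_exclusive(self) -> None:
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


gil = SharedExclusiveLock()
state = {"counter": 0}


def new_code() -> None:
    gil.acquire_shared()     # runs alongside other nogil-friendly code
    try:
        _ = state["counter"]
    finally:
        gil.release_shared()


def old_code() -> None:
    gil.acquire_exclusive()  # locks out every other thread
    try:
        n = state["counter"]
        state["counter"] = n + 1
    finally:
        gil.release_exclusive()


threads = [threading.Thread(target=f) for f in (new_code, old_code) * 50]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(state["counter"])  # 50
```

This lock is not reentrant and does not support upgrades: a thread that holds it in shared mode and then calls acquire_exclusive() will wait forever for its own shared hold to be released.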

I see the problem. But can’t you easily deadlock with the reader/writer lock? For example, a safe extension holding the read lock might call something that enters an unsafe extension, which then tries to acquire the write lock.

Edit: I think I know the answer: the “safe” extension should drop the read lock before it re-enters CPython or other extensions.

There are also rwlock designs that let you “upgrade” a read lock to a write lock.

How big of a problem is this? At first glance, nogil mode would increase the likelihood of data races actually happening, but many C API calls can already result in thread switches or reentrancy, because they interact with objects or code written in Python.

I expect that most of my extension code would continue to work correctly if the GIL were taken before entering any extension not marked as nogil-safe, except for my next point. That said, most of them are fairly simple wrappers around C/C++ libraries.

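I’m more worried about borrowed references in the C API. In nogil mode, code like item = PyList_GetItem(aList, anIndex); Py_INCREF(item) becomes racy: PyList_GetItem returns a borrowed reference, so another thread could remove the item from the list, possibly freeing it, before Py_INCREF runs.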

How likely is it that unrelated extension modules touch the same state in a thread-unsafe manner?

A global flag doesn’t sound desirable at all. Many scenarios will involve both thread-safe and thread-unsafe extensions, and you don’t want to give up the benefit of parallelism for the occasional legacy extension.

This does sound more desirable.

I don’t know! If someone can figure out how to gather some data, that would be interesting for sure. I guess the main thing to worry about would be extensions touching interpreter-global state without any lock besides the GIL. If we switch to finer-grained locking in the interpreter itself, but an extension module doesn’t know to take the finer-grained locks, that would be bad.

Maybe in a lot of cases this would turn out OK, because the extension module isn’t poking at C structs directly but calling some C API function, and we’ll add locking to those C API functions? But there might be extensions that touch shared C struct internals directly, or that need to perform multiple C API calls atomically, so fine-grained locking inside C API calls won’t be enough.
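Why per-call locking is not enough for multi-call sequences can be sketched with a Python analogy of the classic check-then-act race. LockedDict is a hypothetical stand-in for an API whose individual calls lock internally; increment_racy() can lose updates because another thread can run between get() and set(), while increment_atomic() holds a hypothetical extension-level lock across the whole sequence.

```python
import threading


class LockedDict:
    """Hypothetical stand-in for an API whose individual calls lock internally."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value) -> None:
        with self._lock:
            self._data[key] = value


def increment_racy(d: LockedDict, key) -> None:
    # Each call is atomic, but the pair is not: another thread can run
    # set() between our get() and set(), and its update is then lost.
    d.set(key, d.get(key, 0) + 1)


# Hypothetical extension-level lock held across the whole sequence.
_ext_lock = threading.Lock()


def increment_atomic(d: LockedDict, key) -> None:
    # Holding one lock across both calls makes the read-modify-write
    # atomic with respect to every other caller that also takes _ext_lock.
    with _ext_lock:
        d.set(key, d.get(key, 0) + 1)


d = LockedDict()


def worker() -> None:
    for _ in range(1000):
        increment_atomic(d, "hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(d.get("hits"))  # 8000
```

The outer lock only helps if every caller that updates the same key also takes it.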

That’s what I think too.