How to Get the "Current" Thread State

eric.snow · January 11, 2023, 12:53am

Right now there are multiple places we record the thread state that is associated with (and in use by) the current thread:

- in `_PyRuntime.gilstate.tstate_current`
  - the original (1997)
  - only guaranteed “current” if GIL held
- in thread-local storage via `_PyRuntime.gilstate.autoTSSkey`
  - part of the PEP 311 gilstate implementation (2003)
  - a little slower (maybe?) than `tstate_current` (due to TSS API)
  - easier to understand
  - does not rely on GIL
- in `_PyRuntime.ceval.gil.last_holder`
  - part of the “new” GIL implementation (2009)
  - overlaps with `tstate_current`

main question: Do we need all three?

I suspect we can get rid of tstate_current and would like to do so.

More info:

`_PyRuntime.gilstate.tstate_current`

This is the original “the current thread state”, added in the big Python 1.5 change where we added PyThreadState and PyInterpreterState.

- currently relatively fast to read/write (an offset on a static variable)
- only reliable if GIL held
- messy with a per-interpreter GIL
- used for more than just the gilstate API
- probably belongs on `_PyRuntime.ceval.gil` more than where it is (my bad, 2017)
- set exclusively in `_PyThreadState_Swap()` (incl. to acquire/release GIL) and `_PyThreadState_DeleteCurrent()`
- currently used (read) for:
  - `_PyThreadState_GET()`, etc. (hundreds of cases)
  - `PyEval_AcquireLock()` and `PyEval_ReleaseLock()`
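
To make the "fast, but only reliable if GIL held" point concrete, here is a minimal sketch in plain C. The `ThreadState` type and the `current_get()` / `current_swap()` helpers are hypothetical stand-ins, loosely modeled on `_PyThreadState_GET()` and `_PyThreadState_Swap()`; this is not CPython's actual code.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
} ThreadState;

/* One process-wide slot shared by every OS thread. Reading it is a plain
 * load from a static address, but the value only describes the calling
 * thread while that thread holds the GIL. */
static ThreadState *tstate_current = NULL;

/* Read the slot. The caller is assumed to hold the GIL. */
static ThreadState *current_get(void)
{
    return tstate_current;
}

/* Install a new "current" thread state and return the previous one. */
static ThreadState *current_swap(ThreadState *newts)
{
    ThreadState *old = tstate_current;
    tstate_current = newts;
    return old;
}
```

Because there is only one slot, a thread that reads it without holding the lock may see another thread's state, which is the reliability caveat listed above.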

`_PyRuntime.gilstate.autoTSSkey`

This is where I’d expect to find the “current” thread state.

- (currently) uses the “slower” TSS API
- could use faster C11 syntax (or older compiler extensions, e.g. `__thread`, `__declspec(thread)`)
- used for more than just the gilstate API
- set only during runtime init and when each new thread state is created
- currently used (read):
  - for gilstate API (incl. `PyGILState_GetThisThreadState()`)
  - internally in Python/pystate.c
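
The difference between a key-based TSS API and C11 thread-local syntax can be sketched roughly as follows. This assumes a POSIX platform and uses `pthread_key_t` directly as a stand-in for the TSS API; the `ThreadState` type and all helper names are hypothetical, and none of this is CPython's actual code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
} ThreadState;

/* Variant 1: key-based thread-specific storage.
 * Every read and write goes through a library call. */
static pthread_key_t state_key;

static int key_init(void)
{
    /* No destructor: the model does not own the stored pointer. */
    return pthread_key_create(&state_key, NULL);
}

static int key_set(ThreadState *ts)
{
    return pthread_setspecific(state_key, ts);
}

static ThreadState *key_get(void)
{
    return pthread_getspecific(state_key);
}

/* Variant 2: C11 thread-local variable.
 * The compiler emits the per-thread access inline. */
static _Thread_local ThreadState *tls_current = NULL;

static void tls_set(ThreadState *ts)
{
    tls_current = ts;
}

static ThreadState *tls_get(void)
{
    return tls_current;
}
```

In both variants each OS thread sees its own value, so neither depends on holding the GIL; they differ only in how the per-thread slot is reached.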

`_PyRuntime.ceval.gil.last_holder`

- used internally in the GIL implementation for some checks and for bookkeeping
- may completely duplicate `tstate_current` (while GIL held)

guido · January 11, 2023, 1:27am

I think there’s an excess .ceval in there? Looking at _PyRuntimeState_GetThreadState() in pycore_pystate.h I see a mention of a variable runtime->gilstate.tstate_current.

Ditto here.

FWIW I’m guessing that @pitrou is the only person who understands all this.

pitrou · January 11, 2023, 10:22am

Well, off the top of my head, that’s the main point: last_holder is not reset when the GIL is released, so the information of which thread last held the GIL is kept available.
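
The distinction described here can be illustrated with a small single-threaded model. It is only a sketch: there is no real locking, and `ThreadState`, `Gil`, `gil_take()` and `gil_drop()` are hypothetical names, not the functions in CPython's GIL implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
} ThreadState;

/* Hypothetical, heavily simplified GIL bookkeeping. */
typedef struct {
    int locked;
    ThreadState *last_holder;
} Gil;

/* Models the "current thread state" slot, valid only while the GIL is held. */
static ThreadState *tstate_current = NULL;

static void gil_take(Gil *gil, ThreadState *ts)
{
    gil->locked = 1;
    gil->last_holder = ts;
    tstate_current = ts;
}

static void gil_drop(Gil *gil)
{
    gil->locked = 0;
    /* In this model, "current" is cleared when the lock is released... */
    tstate_current = NULL;
    /* ...but last_holder is deliberately left alone, so the identity of
     * the previous holder stays available to the next thread. */
}
```

While the lock is held the two fields agree; after release only `last_holder` still carries information.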

As for autoTSSkey, I don’t remember working on that part.

steve.dower · January 11, 2023, 1:07pm

I doubt we can get rid of the thread-local storage field, at least without changing all public APIs to require passing in the thread state explicitly (like most other embeddable language runtimes do). But hopefully we can update internal APIs to pass threadstate around and so we only have to pay the cost at boundaries. Unfortunately, tying Python’s state to the OS thread like this causes some real pain when embedding. [1]
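
A rough sketch of "pay the cost only at boundaries", assuming a C11 compiler: the public entry point does a single thread-local lookup and the internal helpers receive the state as a parameter. All names here (`ThreadState`, `lookup_current()`, `internal_step()`, `public_api_work()`) are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
    int calls;   /* how many internal helpers ran with this state */
} ThreadState;

static _Thread_local ThreadState *tls_current = NULL;
static _Thread_local int tls_lookups = 0;   /* instrumentation only */

/* The only place that touches thread-local storage. */
static ThreadState *lookup_current(void)
{
    tls_lookups++;
    return tls_current;
}

/* Internal helpers take the thread state explicitly. */
static int internal_step(ThreadState *ts, int x)
{
    ts->calls++;
    return x + 1;
}

static int internal_work(ThreadState *ts, int x)
{
    return internal_step(ts, internal_step(ts, x));
}

/* Public boundary: one lookup, then the state is passed down.
 * Returns -1 if no thread state is attached to this thread. */
static int public_api_work(int x)
{
    ThreadState *ts = lookup_current();
    if (ts == NULL) {
        return -1;
    }
    return internal_work(ts, x);
}
```

However many internal helpers run, the lookup counter increases by one per public call.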

Presumably in a per-interpreter GIL world, gilstate has to be looked up from PyInterpreterState, which has to be found from PyThreadState which is going to need the TSS API (or an explicit parameter). So I’d say it’ll depend on which “context” object is going to be in TSS: PyInterpreterState or PyThreadState?

My gut feel is that PyThreadState will continue to be in TSS, and so we’ll need the TSS API to get it if we don’t have it, and gilstate.tstate_current can go. (If we have a PyObject* then I expect we can trace back to the interpreter state that owns it, but probably not safely without knowing that we hold that interpreter’s GIL, and if we knew that then we wouldn’t need to find it.) We’ll want to be careful not to use TSS any more often than we need to, and I really do hope that we one day make it easy for embedders to explicitly control the threadstate when they’re calling back into Python code.

[1] Though to be fair, I haven’t had to work on this since Python 3.7, so maybe we’ve actually got some APIs to “just set” the current state now. We didn’t at the time (or they did extra validation and would fail if you tried to move thread), and so it was impossible to use native thread pools to execute Python code, for example.

eric.snow · January 11, 2023, 3:05pm

Fixed! Thanks for noticing.

pitrou · January 12, 2023, 10:00am

Why do you say it’s impossible to execute Python code from a thread pool?

(side note: Discourse footnotes are pretty but they also prevent copy-pasting their contents…)

steve.dower · January 12, 2023, 1:35pm

I should say it’s impossible (or very nearly impossible) to resume Python code execution in a thread pool. It’s fine if you start it running and wait for it to finish, but that’s not how embedding typically works.

Take Blender for example. Most of its physics simulation is going to happen in multithreaded C++, but you can write custom expressions in Python as part of it. If the processing is being run in a thread pool, you have to create a new Python thread state every time you want to call back into it, because you can’t guarantee that you’re on a “known” native thread, even if you know (or have decided) that it doesn’t matter.

It gets worse in something like Minecraft, where you call Python code which calls back into the game and has to wait for something to complete (async/await style, though a custom native implementation). The “completion” signal arrives on any available thread from the thread pool, but the Python code has to be attached to its original thread, so you can’t do anything except native message passing from the completion signal. It also means you can’t be running the Python code on a thread pool thread (at least in this system), because you have to block it forever waiting for the extra signal.

Both cases would be fine to take a global interpreter lock and execute their code, because they’re controlled enough to not have any real reliance on the native thread they’re currently on. But because Python internally requires so much consistency between the GIL, the threadstate and the OS thread, it’s not possible to just do this. You always end up with a dedicated thread that only runs Python code, a ban on Python threading (which would interfere with your dedicated thread), and message passing primitives to interact with the host application.

(Incidentally, neither case is a “general purpose Python environment” situation. Nobody is installing arbitrary third-party modules or adding new native code. If your app is going to allow that, you don’t really have much choice but to run Python as a separate process. I’m more concerned about apps that want expression evaluation or short snippets run in the context of the main process, rather than running an entire app/script’s worth of Python code.)

pitrou · January 12, 2023, 1:50pm

That’s what the PyGILState API automatically does, for the record. It will create a Python thread state if one does not already exist for the native thread.
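
The "create a thread state if one does not already exist" behaviour can be modeled in plain C as below. This is a simplified sketch of the idea behind `PyGILState_Ensure()` / `PyGILState_Release()`, not their implementation: it ignores the GIL itself, and `ThreadState`, `model_ensure()` and `model_release()` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct {
    int id;
    int ensure_depth;   /* nesting count of ensure calls on this thread */
} ThreadState;

static _Thread_local ThreadState *tls_current = NULL;
static int next_id = 1;   /* not thread-safe; good enough for a model */

/* Returns 1 if a new state was created for this native thread,
 * 0 if an existing one was reused, -1 on allocation failure. */
static int model_ensure(void)
{
    int created = 0;
    if (tls_current == NULL) {
        ThreadState *ts = calloc(1, sizeof(*ts));
        if (ts == NULL) {
            return -1;
        }
        ts->id = next_id++;
        tls_current = ts;
        created = 1;
    }
    tls_current->ensure_depth++;
    return created;
}

/* Undo one ensure; the outermost release destroys the state again. */
static void model_release(void)
{
    if (tls_current == NULL) {
        return;
    }
    if (--tls_current->ensure_depth == 0) {
        free(tls_current);
        tls_current = NULL;
    }
}
```

In this model, a pool thread that calls in once and returns allocates and frees a whole state for that one call.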

That sounds like a very specific architecture TBH.

steve.dower · January 12, 2023, 2:00pm

At least up until 3.7 (as I said, I haven’t had to do this since then), the API changed in virtually every version, and sometimes in micro-versions. Sometimes “ensure” would do it, sometimes it would crash. Sometimes “create” would do it. Sometimes that would crash. Sometimes you had to do one before the other, sometimes after, sometimes not at all. We had the most hideous code to handle this in our old debugger, and it certainly was not amenable to embedders.

It’s also a big, heavyweight operation for potentially doing a single attribute lookup (if that’s all the user wants to do). Also not conducive to embedders. Lua keeps winning here for a range of reasons, but this is definitely one of them.

Perhaps, but it’s not uncommon, at least on the user side of applications. Actually, it’s not that uncommon on the server side either - everyone tries to parallelise operations using thread pools (or equivalents), and there’s a lot of completion-triggered event handling.

The only thing that makes it uncommon is that it’s really hard to do it with CPython, so most people give up. I’ve seen it done more often with IronPython, tbh, because it’s so much easier to make it work. But even more often with Lua or JavaScript (V8).