TLS-related code in Python/pystate.c

Hi,

I stumbled across the following code in pystate.c in Python 3.12.4:

#ifdef HAVE_THREAD_LOCAL
_Py_thread_local PyThreadState *_Py_tss_tstate = NULL;
#endif

static inline PyThreadState *
current_fast_get(_PyRuntimeState *Py_UNUSED(runtime))
{
#ifdef HAVE_THREAD_LOCAL
    return _Py_tss_tstate;
#else
    // XXX Fall back to the PyThread_tss_*() API.
#  error "no supported thread-local variable storage classifier"
#endif
}

I am trying to understand the comment // XXX Fall back to the PyThread_tss_*() API. If the code is supposed to fall back to the PyThread_tss_*() API when there is no TLS support, why is an #error raised instead? Does this mean Python 3.12.4 will not build without thread-local storage support, or is there a toggle that lets us build Python 3.12.4 without TLS? Python 3.11 did not require TLS (this code was not present in pystate.c in 3.11).

Any help is much appreciated.
Thank you.

Since Python 3.12 added support for a per-interpreter GIL (PEP 684), CPython uses TLS to store the thread state (additional info in the issue), so yeah, you can’t build CPython 3.12 and up without TLS.
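To make the per-thread behaviour concrete, here is a minimal sketch of a thread-local pointer, which is the kind of variable _Py_tss_tstate is. Everything prefixed demo_ or DEMO_ is hypothetical and only for illustration; it is not CPython code, and it assumes a POSIX threads environment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the real CPython struct. */
typedef struct { int id; } demo_tstate;

/* Pick a thread-local storage-class specifier. The macro name is made up
   for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_THREADS__)
#  define DEMO_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__)
#  define DEMO_THREAD_LOCAL __thread
#else
#  error "this sketch needs a compiler with thread-local variables"
#endif

/* One independent copy of this pointer exists per thread. */
static DEMO_THREAD_LOCAL demo_tstate *demo_current = NULL;

static void *demo_worker(void *arg)
{
    assert(arg != NULL);
    /* A new thread starts with its own copy, initialized to NULL. */
    int was_null = (demo_current == NULL);
    demo_current = (demo_tstate *)arg;
    /* Succeed only if the slot started empty and now holds our state. */
    return (was_null && demo_current == arg) ? arg : NULL;
}

/* Returns 1 if two workers and the caller each kept an independent value. */
int demo_thread_local_is_per_thread(void)
{
    demo_tstate main_state = {0}, a = {1}, b = {2};
    pthread_t ta, tb;
    void *ra = NULL, *rb = NULL;
    int ok = 1;

    demo_current = &main_state;
    if (pthread_create(&ta, NULL, demo_worker, &a) != 0) {
        demo_current = NULL;
        return 0;
    }
    if (pthread_create(&tb, NULL, demo_worker, &b) != 0) {
        pthread_join(ta, NULL);
        demo_current = NULL;
        return 0;
    }
    if (pthread_join(ta, &ra) != 0) ok = 0;
    if (pthread_join(tb, &rb) != 0) ok = 0;

    /* The workers' assignments must not have touched the caller's copy. */
    ok = ok && ra == &a && rb == &b && demo_current == &main_state;
    demo_current = NULL;  /* do not leave a pointer to a dead stack object */
    return ok;
}
```

Calling demo_thread_local_is_per_thread() from any thread exercises the check. On a toolchain with neither specifier, the #error branch fires, which mirrors the situation described in the question.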

The PR/commit message provides some context for your specific question about the #error directive:

Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.

Thank you @bytemarx

I am trying to replace thread_local with thread-specific storage APIs like pthread_getspecific and pthread_setspecific, since these are primitive and supported on many platforms, including mine.

Can something like the following be used in place of the above-mentioned code?

pthread_key_t tls_key;

// Function to set the thread-local tstate
void set_thread_local_pystate(PyThreadState* tstate) {
    pthread_setspecific(tls_key, tstate);
}

Then current_fast_set would call it as follows:

static inline void
current_fast_set(_PyRuntimeState *Py_UNUSED(runtime), PyThreadState *tstate)
{
    assert(tstate != NULL);
#ifdef HAVE_THREAD_LOCAL
    _Py_tss_tstate = tstate;
#elif defined(NO_THREAD_LOCAL)
    int status = 0;
    status = pthread_key_create((pthread_key_t *)&tls_key, NULL);
    set_thread_local_pystate(tstate);
    _Py_tss_tstate = tstate;
#else
    // XXX Fall back to the PyThread_tss_*() API.
#  error "no supported thread-local variable storage classifier"
#endif
}

Will this work, or are there other considerations that need to be taken care of?

Can we use any thread-specific storage APIs (PyThread_tss_*) in place of the #error directives?

Any help is much appreciated.

So I’m not too familiar with using the POSIX thread-specific data APIs so correct me if I’m wrong, but I don’t think this will work as you’re overwriting your data key with each call to current_fast_set (you also still have _Py_tss_tstate = tstate; but I assume that was by mistake).
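For illustration only, here is a rough sketch of how the key-creation problem is usually avoided with plain POSIX calls: the key is created once via pthread_once, and the setter only stores a value. The demo_ names are hypothetical, the struct is a stand-in for PyThreadState, and this is not how CPython does it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState. */
typedef struct { int id; } demo_tstate;

static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;
static int demo_key_status = -1;

static void demo_make_key(void)
{
    /* Runs exactly once per process, however many threads call in. */
    demo_key_status = pthread_key_create(&demo_key, NULL);
}

/* Store the calling thread's state. Returns 0 on success. */
int demo_current_set(demo_tstate *tstate)
{
    assert(tstate != NULL);
    pthread_once(&demo_key_once, demo_make_key);
    if (demo_key_status != 0) {
        return demo_key_status;
    }
    /* Only the per-thread value changes here; the key itself is reused. */
    return pthread_setspecific(demo_key, tstate);
}

/* Fetch the calling thread's state, or NULL if none has been set. */
demo_tstate *demo_current_get(void)
{
    pthread_once(&demo_key_once, demo_make_key);
    if (demo_key_status != 0) {
        return NULL;
    }
    return (demo_tstate *)pthread_getspecific(demo_key);
}
```

Unlike the snippet in the question, repeated calls to demo_current_set() never call pthread_key_create() again, so values stored by other threads under the same key stay reachable.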

I’m also not really sure what you mean by “these are primitive and supported on many platforms” since _Thread_local/thread_local is part of the standard since C11/C++11. Can you share why you’re trying to use the POSIX API over the thread_local specifier?

I am using a legacy platform that doesn’t support thread_local. It has most of the functionality of C11 but is missing TLS, so I have to use the POSIX thread-specific APIs, hence I am looking for alternatives. Python 3.11 already relied on the POSIX TSS APIs, so it worked on my platform.

In current_fast_set I saw the following code with this comment:

#else
    // XXX Fall back to the PyThread_tss_*() API.
#  error "no supported thread-local variable storage classifier"
#endif

The comment says to fall back to the PyThread_tss_*() API, hence I was trying to find out whether there is an alternative.

As the comment in pystate.c implies, we should be able to use the PyThread_tss_*() API to get the same effect. (Note that it will almost definitely be slower.) The caveat is that using that API means we must also manage the lifecycle of both the key (Py_tss_t *) and the values. At least part of that would be done during runtime initialization and finalization, rather than in current_fast_set(). None of this is especially tricky, but it does require some knowledge of the runtime implementation. If you’d like to take a stab at it, I’d be glad to point you at the right code and/or give you feedback.

To be clear, using platform-specific APIs for the implementation isn’t really an option. We should only be using Python-specific APIs that wrap the platform-specific ones.
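As a rough sketch of the lifecycle split described above (key created at initialization, deleted at finalization, and only set/get on the hot path), something along these lines might serve as a starting point. The demo_ functions, the DEMO_USE_PYTHON_H macro, and the placement of the key in a file-level static are all assumptions for illustration. Where the key would actually live in the runtime state, and which init and finalization functions would own it, is not settled in this thread. The #else branch is only a stand-in so the sketch compiles without CPython headers.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DEMO_USE_PYTHON_H   /* hypothetical opt-in: build against real CPython */
#  include <Python.h>
#else
/* Stand-in so the sketch compiles alone. It mimics the signatures of the
   four PyThread_tss_*() calls used below on top of pthreads; the real
   definitions live in CPython. */
#  include <pthread.h>
typedef struct { int is_initialized; pthread_key_t key; } Py_tss_t;
#  define Py_tss_NEEDS_INIT {0}
static int PyThread_tss_create(Py_tss_t *k)
{
    if (k->is_initialized) return 0;
    if (pthread_key_create(&k->key, NULL) != 0) return -1;
    k->is_initialized = 1;
    return 0;
}
static void PyThread_tss_delete(Py_tss_t *k)
{
    if (k->is_initialized) {
        pthread_key_delete(k->key);
        k->is_initialized = 0;
    }
}
static int PyThread_tss_set(Py_tss_t *k, void *v)
{
    return pthread_setspecific(k->key, v) != 0 ? -1 : 0;
}
static void *PyThread_tss_get(Py_tss_t *k)
{
    return pthread_getspecific(k->key);
}
#endif

/* Hypothetical stand-ins for PyThreadState and for the runtime's key slot. */
typedef struct { int id; } demo_tstate;
static Py_tss_t demo_tstate_key = Py_tss_NEEDS_INIT;

/* Would run once during runtime initialization. Returns 0 on success. */
int demo_runtime_init(void)
{
    return PyThread_tss_create(&demo_tstate_key);
}

/* Would run once during runtime finalization. */
void demo_runtime_fini(void)
{
    PyThread_tss_delete(&demo_tstate_key);
}

/* Counterpart of current_fast_set(): stores a value, never creates the key. */
int demo_current_fast_set(demo_tstate *tstate)
{
    assert(tstate != NULL);
    return PyThread_tss_set(&demo_tstate_key, tstate);
}

/* Drops the calling thread's value so no stale pointer is left behind. */
int demo_current_fast_clear(void)
{
    return PyThread_tss_set(&demo_tstate_key, NULL);
}

/* Counterpart of current_fast_get(). */
demo_tstate *demo_current_fast_get(void)
{
    return (demo_tstate *)PyThread_tss_get(&demo_tstate_key);
}
```

The values stored under the key are not owned by it: no destructor runs for them, so whoever allocates a thread state remains responsible for clearing the slot and freeing the state.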

Thank you, Eric. Yes, I would like to go with the PyThread_tss_*() approach. Please do help me with some pointers.