Platform-agnostic yielding of the GIL

ZeroIntensity · January 13, 2025, 6:00pm

This is similar to Add Yield() function to python Threading, but now we’ve got some proper motivation for it.

When developing a multithreaded program, it’s sometimes desirable to explicitly release the GIL to let another thread execute (it’s probably useful for free-threading too, but I’m not going to focus on that). This is generally done with time.sleep(0), so something like:

import time


def thread_2():
    do_some_other_computing()


def thread_1():
    do_some_computing()
    time.sleep(0)  # Explicitly give the GIL to thread 2, instead of relying on the switch interval

Recently, it was discovered that time.sleep(0) is significantly slower on Unix systems (see this issue). As it turns out, fixing this has other negative implications (take a look at @picnixz’s PR), so this behavior will probably remain for the time being.

Due to this issue, it’s sub-optimal to use time.sleep(0) for yielding the GIL, at least on non-Windows systems. The suggested alternative is to use os.sched_yield, but that doesn’t exist on Windows! Solutions using os.sched_yield will need an additional code path that falls back to time.sleep(0) if they want to support Windows.
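To illustrate the extra code path, here is a minimal sketch of what such a fallback tends to look like. The name yield_gil is hypothetical and only used for this example; it is not an existing or proposed API.

```python
import os
import time

# Hypothetical helper for illustration only; the name is not a real API.
if hasattr(os, "sched_yield"):
    def yield_gil():
        # POSIX path: ask the OS scheduler to run another thread.
        os.sched_yield()
else:
    def yield_gil():
        # Fallback for platforms without os.sched_yield (e.g. Windows).
        time.sleep(0)
```

Every project that wants this behavior on all platforms ends up carrying some variation of this branch.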

So, overall, this isn’t great. I propose adding a new function to threading (let’s call it yield_thread) that does this independently of the platform, which will both:

Shed some light on the problem with time.sleep(0), so people stop using it for scheduling.
Provide a seamless drop-in for time.sleep(0) (unlike os.sched_yield).

Using the example from before, it would be very similar in usage:

import threading


def thread_2():
    do_some_other_computing()


def thread_1():
    do_some_computing()
    threading.yield_thread()  # Proposed function; does not exist yet

FWIW, there’s some precedent in other languages for this, but I’m not particularly sure it matters:

Java’s Thread.yield()
C++11’s std::this_thread::yield
Rust’s thread::yield_now
JavaScript’s scheduler.yield()

csm10495 · January 13, 2025, 6:07pm

I wonder why that same method (os.sched_yield) can’t just call the Win32 SwitchToThread function (processthreadsapi.h) on Windows.
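For reference, a rough sketch of calling that function from Python today through ctypes. The helper name switch_to_thread is hypothetical, and this is not how os.sched_yield is implemented; it only shows that the Win32 call takes no arguments and returns a BOOL.

```python
import ctypes
import sys


def switch_to_thread():
    """Hypothetical helper: yield the rest of the time slice on Windows.

    Returns True if the OS switched to another thread, and False
    otherwise (including on non-Windows platforms, where it does nothing).
    """
    if sys.platform != "win32":
        return False
    # SwitchToThread() returns nonzero when another thread was scheduled.
    # Calls made through ctypes.windll release the GIL for their duration.
    return bool(ctypes.windll.kernel32.SwitchToThread())
```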

ZeroIntensity · January 13, 2025, 6:15pm

It could, but os.sched_yield is a thin wrapper around the POSIX sched_yield function. It wouldn’t make much sense to me to try to patch it for Windows, considering Windows doesn’t support POSIX.

The other issue is that sched_yield is supposed to be used in conjunction with the other scheduling functions; reliably patching all of those for Windows would be much more difficult (if not impossible).

hauntsaninja · January 14, 2025, 8:08am

The slowness here is that on Linux, when using the default scheduler, clock_nanosleep takes 50 microseconds. This is an explicit implementation choice in Linux and one that I think is basically fine… this is smaller than all the example small numbers mentioned in the thread you link. It’s two orders of magnitude smaller than the default sys.getswitchinterval() of 5 milliseconds.
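To check this comparison on a given machine, a quick measurement along these lines should do. It is only a sketch: the function name and iteration count are arbitrary, and results will vary with the OS, scheduler, and timer slack settings.

```python
import sys
import time


def average_sleep0_seconds(iterations=1000):
    """Return the average wall-clock cost of one time.sleep(0) call."""
    start = time.perf_counter()
    for _ in range(iterations):
        time.sleep(0)
    return (time.perf_counter() - start) / iterations


per_call = average_sleep0_seconds()
switch_interval = sys.getswitchinterval()  # 0.005 seconds unless changed

print(f"time.sleep(0): {per_call * 1e6:.1f} microseconds per call")
print(f"switch interval: {switch_interval * 1e6:.0f} microseconds")
```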

I think the users who encounter this (seemingly one in three years for CPython) are probably better served by changing the scheduling policy (e.g. via os.sched_setscheduler) or futzing with PR_SET_TIMERSLACK. After all, Linux man pages aren’t too bullish on sched_yield when used with the default scheduler: “Use of sched_yield [with the default scheduler] is unspecified and very likely means your application design is broken”. Though a yield API using sched_yield would put us in good company with other languages, as you point out.

I kind of feel time.sleep(0) is fine for 99.9% of users, and the other 0.1% are better off making exactly the syscalls they need or tuning their process settings.

I don’t know much about what fairness guarantees the current lock backing the GIL provides that would let us yield in user space — but if it does, that could be an interesting thing to expose.