AFAIK, in current CPython, all of the above is correct, works as you’d expect, and has no race conditions.
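At most one thread holds the GIL (and so runs Python code) at any moment, and handing the GIL to another thread goes through a memory barrier (e.g. a pthread mutex or a Windows CriticalSection). So whatever one thread wrote is visible to the next thread that runs. That is what makes this safe.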
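To make that concrete, here is a minimal sketch of the kind of lock-free hand-off the GIL makes safe. It is my own illustration, not the code from the post above: `Box`, `writer`, `reader` and `run_demo` are hypothetical names, and it assumes a regular GIL-enabled CPython build.

```python
import threading


class Box:
    """Hypothetical shared object: plain attributes, no locks."""

    def __init__(self):
        self.payload = None
        self.ready = False


def writer(box):
    box.payload = {"answer": 42}  # write the data first
    box.ready = True              # then publish it with a plain flag


def reader(box, out):
    # Busy-wait on the flag (fine for a demo; CPython periodically
    # forces the running thread to give up the GIL).
    while not box.ready:
        pass
    # The GIL hand-off acts as a barrier, so payload is already set here.
    out.append(box.payload["answer"])


def run_demo():
    box, out = Box(), []
    t_reader = threading.Thread(target=reader, args=(box, out))
    t_writer = threading.Thread(target=writer, args=(box,))
    t_reader.start()
    t_writer.start()
    t_reader.join()
    t_writer.join()
    return out


print(run_demo())
```

In real code I'd still reach for `threading.Event` or a `queue.Queue` rather than spinning on a flag; the point is only that the reader can't observe `ready` as true while `payload` is still `None`.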
(If the GIL is removed, the documentation you ask for will need to be written. That would be a development question about the current state of an experimental feature, not about using Python; it's best discussed in the parent topic.)