Proposal: Add Text to the Language Reference About Runtime Components

Hi all,

From gh-135944, which I created recently:

As I’ve been working on documentation for using multiple interpreters, I’ve found that there really isn’t any clear explanation of how the Python runtime operates from a high level. (Perhaps I missed it.)

In particular I think we’d benefit from having a brief overview of Python’s runtime components, both relative to the host and relative to Python’s own execution context. To me, the natural place to find this is as a section of the “execution model” page in the language reference.

I’ve proposed a PR: gh-135945. The added text isn’t meant to change the language, but rather to describe the status quo clearly so we can refer to the section from elsewhere. However, it still relates to the language specification (with corresponding wide effect), so I wanted to make sure it got broader visibility here.

Any objections? Thoughts?


FTR, here’s the proposed text, as of the time of this post:


Please make any suggestions about the new text via review comments on the PR.

From https://github.com/python/cpython/issues/135945:

.. _execcomponents:

Runtime Components
==================

Python's execution model does not operate in a vacuum.  It runs on a
computer.  When a program runs, the conceptual layers of how it runs
on the computer look something like this::

   host computer (or VM or container)
     process
       OS thread (runs machine code)

While a program always starts with exactly one of each of those, it may
grow to include multiple of each.  Hosts and processes are isolated and
independent from one another.  However, threads are not.
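As a small illustrative sketch (standard library only, not part of the proposed reference text), the process and initial thread from the diagram above can be observed directly:

```python
import os
import threading

# Every running program is one process containing at least one OS thread.
print("process id:", os.getpid())

# The code executing at startup runs in the initial thread; Python's
# threading module exposes it via main_thread().
print("running in the initial thread:",
      threading.current_thread() is threading.main_thread())
```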

Not all platforms support threads, though most do.  For those that do,
each thread does *run* independently, for the small segments of time it
is scheduled to execute its code on the CPU.  Otherwise, all threads
in a process share all the process' resources, including memory.
The initial thread is known as the "main" thread.

.. note::

   The way they share resources is exactly what can make threads a pain:
   two threads running at the same arbitrary time on different CPU cores
   can accidentally interfere with each other's use of some shared data.
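For example (a minimal sketch using only the standard library), shared memory means an unguarded read-modify-write can interleave between threads; a lock serializes access to the shared data:

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(10_000):
        # "counter += 1" is a read-modify-write of shared memory; without
        # the lock, two threads could interleave and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- deterministic only because of the lock
```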

The same layers apply to each Python program, with some extra layers
specific to Python::

   host
     process
       Python runtime
         interpreter
           Python thread (runs bytecode)

When a Python program starts, it looks exactly like that, with one
of each.  The process has a single global runtime to manage Python's
process-global resources.  Each Python thread has all the state it needs
to run Python code (and use any supported C-API) in its OS thread.
Depending on the implementation, this probably includes the current
exception and the Python call stack.
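As a hedged sketch of that per-thread state, the current exception can be observed with :func:`sys.exc_info`, which reflects only the exception being handled in the calling thread:

```python
import sys
import threading

seen = {}

def work():
    try:
        raise ValueError("worker error")
    except ValueError:
        # sys.exc_info() reports the exception being handled in *this*
        # thread; other threads are unaffected.
        seen["worker"] = sys.exc_info()[1].args[0]

t = threading.Thread(target=work)
t.start()
t.join()

# The main thread has no exception of its own being handled.
seen["main"] = sys.exc_info()[1]
print(seen)  # {'worker': 'worker error', 'main': None}
```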

In between the global runtime and the thread(s) lies the interpreter.
It completely encapsulates all of the non-process-global runtime state
that the interpreter's Python threads share.  For example, all its
threads share :data:`sys.modules`.  Every Python thread belongs to a
single interpreter and runs using that shared state.  The initial
interpreter is known as the "main" interpreter, and the initial thread,
where the runtime was initialized, is known as the "main" thread.
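To illustrate the shared :data:`sys.modules` (a small sketch, standard library only): an import performed in one thread is visible to every other thread in the same interpreter:

```python
import sys
import threading

# colorsys is chosen only because it is unlikely to be imported already.
print("before:", "colorsys" in sys.modules)

def importer():
    import colorsys  # imported only in this worker thread

t = threading.Thread(target=importer)
t.start()
t.join()

# The worker's import landed in the interpreter-wide sys.modules,
# so the main thread sees the module too.
print("after:", "colorsys" in sys.modules)
```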

.. note::

   The interpreter here is not the same as the "bytecode interpreter",
   which is what runs in each thread, executing compiled Python code.

Every Python thread is associated with a single OS thread, which is
where it runs.  However, multiple Python threads can be associated with
the same OS thread.  For example, an OS thread might run code with a
first interpreter and then with a second, each necessarily with its own
Python thread.  Still, regardless of how many are *associated* with
an OS thread, only one Python thread can be actively *running* in
an OS thread at a time.  Switching between interpreters means
changing the active Python thread.

Once a program is running, new Python threads can be created using the
:mod:`threading` module (on platforms and Python implementations that
support threads).  Additional processes can be created using the
:mod:`os`, :mod:`subprocess`, and :mod:`multiprocessing` modules.
You can run coroutines (async) in the main thread using :mod:`asyncio`.
Interpreters can be created using the :mod:`concurrent.interpreters`
module.
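The creation paths above can be sketched roughly as follows (a thread, a child process, and a coroutine; :mod:`concurrent.interpreters` is new in Python 3.14, so it is only noted in a comment rather than exercised):

```python
import asyncio
import subprocess
import sys
import threading

# A new Python thread inside the current interpreter.
t = threading.Thread(target=print, args=("hello from a thread",))
t.start()
t.join()

# A new process, with its own Python runtime.
proc = subprocess.run(
    [sys.executable, "-c", "print('hello from a process')"],
    capture_output=True, text=True,
)
print(proc.stdout.strip())

# Coroutines multiplex within a single Python thread.
async def greet():
    return "hello from a coroutine"

print(asyncio.run(greet()))

# On Python 3.14+, additional interpreters could be created with the
# concurrent.interpreters module (not shown here).
```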

Live thoughts as I read through that draft. I’m trying to play devil’s advocate and read this in all possible ways to come up with complaints. Make sense?


You start out with “host computer” (and mention VM or container, but I don’t think we should touch the VM or container terms; they don’t matter), then later say “for the small segments of time it is scheduled to execute its code on the CPU” in reference to threads, yet we’ve never defined what a CPU is. A host computer is not a CPU, as elliptically implied in the note. But we shouldn’t explain that, or explain the idea of time slicing. There really is no need to know whether there are multiple CPU cores or how parallel execution happens, just that it must be assumed to happen. The key point to understand, even in that first note, is that threads must be assumed to run at the same time, at potentially different non-synchronized rates, each seeing a different view of the overall changing global state.

I’d also suggest using the term “host machine” instead of “host computer”, as the word “machine” ties in better with “virtual machine” (which we don’t need to mention, but which will be familiar to many), whereas a “computer” at this point is thought of more as a physical object with a wide variety of sizes, from a campus of warehouses down to an SoC.

You mention “OS” several times but should probably put it in the diagram between host machine and process.


“Every Python thread belongs to a single interpreter and runs using that shared state.”

could be misread to imply that each thread is its own interpreter using shared interpreter state. Perhaps word that as “Every Python thread that is part of the same interpreter shares Python state with all other threads in that interpreter …”

“For example, an OS thread might run code with a first interpreter and then with a second, each necessarily with its own Python thread.”

I’m really not sure how to parse this. It could be read to imply that there are often multiple interpreters and that they must have their own threads, or alternatively that threads imply multiple interpreters. I think it needs to be clear. “each necessarily with its own Python thread state tied to the specific interpreter” would make more sense to me.

“However, multiple Python threads can be associated with the same OS thread.”

I think this is confusing because in the most common case, where there is only one interpreter, this is not true. Any OS thread that executes within an interpreter has to have a single Python thread state matching that OS thread. There cannot be multiple running Python threads in one interpreter associated with the same OS thread.


The trend to document better the architecture of the interpreter in the repo is very welcome to me. And clear terminology is invaluable, so thanks.

It is always helpful when statements about “Python” manage to distinguish what is specific to CPython from what belongs to the language, including what should be in every stdlib and how it behaves. It can be difficult to be sure sometimes, even for core devs, but I appreciate the effort.

Got it. I might well do that if I can usefully add to it.
