PEP 734: Multiple Interpreters in the Stdlib

PEP 734 is the new proposal I’m introducing to replace PEP 554. The new PEP is available online: https://peps.python.org/pep-0734/. I’ve also included the text at the bottom of this post.

Why a new PEP? PEP 554 was created over 6 years ago and has gone through numerous rounds of discussion and revision over that time. Consequently, it had grown a bit cluttered, disorganized, and unfocused. Rather than clean it up and discard the various artifacts of those discussions, I turned the latest revision into a new PEP, to replace 554.

PEP 734 has two main objectives:

  • expose the existing multi-interpreter C-API capability to Python code
  • add a mechanism for communicating between interpreters

The PEP generally aims for a minimal foundation on which we can build as needed in the future. We don’t need to solve all the problems in this proposal.

Main differences from PEP 554:

  • cleaned up the text
  • simplified the proposed APIs
  • channels have been replaced with queues
  • Interpreter.run() now automatically runs in the background (new OS thread)
  • added Interpreter.exec_sync(), which runs in the foreground (current OS thread)
  • added Interpreter.prepare_main()
  • added concurrent.futures.InterpreterPoolExecutor

Seeking feedback about “foreground” vs. “background”:

I’m interested in getting some feedback about things related to foreground vs. background execution. Here, “foreground” refers to executing in the current OS thread, effectively pausing whatever was already running (similar to the builtin exec() and to function calls). “Background” refers to executing in a new OS thread; what was already running keeps running (similar to threading.Thread).
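
A rough sketch of the contrast, using the names currently proposed in the PEP (nothing here is final):

    import interpreters  # the proposed stdlib module

    interp = interpreters.create()

    # Foreground: runs in the current OS thread, pausing this code
    # until it finishes (like the builtin exec()).
    interp.exec_sync("print('hello from the subinterpreter')")

    # Background: runs in a new OS thread; this code keeps going
    # (like threading.Thread).
    t = interp.run("print('hello from a background thread')")
    t.join()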

One occasional point of confusion in the past has been about whether subinterpreters execute in the foreground or the background. (The correct answer is “foreground”.) In order to help reduce possible confusion, I renamed Interpreter.run() to Interpreter.exec(), to emphasize the similarity with the builtin exec() (rather than with threading.Thread). I later changed it further to Interpreter.exec_sync(). I’m still not 100% comfortable with the name and would like feedback on perhaps finding something better. Here are some possibilities:

  • “exec()” (my preference, but too confusing?)
  • “exec_foreground()”
  • “exec_fg()”

I don’t consider this a critical point, and don’t mind leaving exec_sync(), but your thoughts on this would be helpful. :smile:


PEP text:

PEP: 734
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
Status: Draft
Type: Standards Track
Created: 06-Nov-2023
Python-Version: 3.13
Replaces: 554


.. note::
   This PEP is essentially a continuation of :pep:`554`.  That document
   had grown a lot of ancillary information across 7 years of discussion.
   This PEP is a reduction back to the essential information.  Much of
   that extra information is still valid and useful, just not in the
   immediate context of the specific proposal here.


Abstract
========

This PEP proposes to add a new module, ``interpreters``, to support
inspecting, creating, and running code in multiple interpreters in the
current process.  This includes ``Interpreter`` objects that represent
the underlying interpreters.  The module will also provide a basic
``Queue`` class for communication between interpreters.  Finally, we
will add a new ``concurrent.futures.InterpreterPoolExecutor`` based
on the ``interpreters`` module.


Introduction
============

Fundamentally, an "interpreter" is the collection of (essentially)
all runtime state which Python threads must share.  So, let's first
look at threads.  Then we'll circle back to interpreters.

Threads and Thread States
-------------------------

A Python process will have one or more OS threads running Python code
(or otherwise interacting with the C API).  Each of these threads
interacts with the CPython runtime using its own thread state
(``PyThreadState``), which holds all the runtime state unique to that
thread.  There is also some runtime state that is shared between
multiple OS threads.

Any OS thread may switch which thread state it is currently using, as
long as it isn't one that another OS thread is already using (or has
been using).  This "current" thread state is stored by the runtime
in a thread-local variable, and may be looked up explicitly with
``PyThreadState_Get()``.  It gets set automatically for the initial
("main") OS thread and for ``threading.Thread`` objects.  From the
C API it is set (and cleared) by ``PyThreadState_Swap()`` and may
be set by ``PyGILState_Ensure()``.  Most of the C API requires that
there be a current thread state, either looked up implicitly
or passed in as an argument.

The relationship between OS threads and thread states is one-to-many.
Each thread state is associated with at most a single OS thread and
records its thread ID.  A thread state is never used for more than one
OS thread.  In the other direction, however, an OS thread may have more
than one thread state associated with it, though, again, only one
may be current.

When there's more than one thread state for an OS thread,
``PyThreadState_Swap()`` is used in that OS thread to switch
between them, with the requested thread state becoming the current one.
Whatever was running in the thread using the old thread state is
effectively paused until that thread state is swapped back in.

Interpreter States
------------------

As noted earlier, there is some runtime state that multiple OS threads
share.  Some of it is exposed by the ``sys`` module, though much is
used internally and not exposed explicitly or only through the C API.

This shared state is called the interpreter state
(``PyInterpreterState``).  We'll sometimes refer to it here as just
"interpreter", though that is also sometimes used to refer to the
``python`` executable, to the Python implementation, and to the
bytecode interpreter (i.e. ``exec()``/``eval()``).

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997).  The feature has been
available via the :ref:`C API <python:sub-interpreter-support>`.

Interpreters and Threads
------------------------

Thread states are related to interpreter states in much the same way
that OS threads and processes are related (at a high level).  To
begin with, the relationship is one-to-many.
A thread state belongs to a single interpreter (and stores
a pointer to it).  That thread state is never used for a different
interpreter.  In the other direction, however, an interpreter may have
zero or more thread states associated with it.  The interpreter is only
considered active in OS threads where one of its thread states
is current.

Interpreters are created via the C API using
``Py_NewInterpreterFromConfig()`` (or ``Py_NewInterpreter()``, which
is a light wrapper around ``Py_NewInterpreterFromConfig()``).
That function does the following:

1. create a new interpreter state
2. create a new thread state
3. set the thread state as current
   (a current tstate is needed for interpreter init)
4. initialize the interpreter state using that thread state
5. return the thread state (still current)

Note that the returned thread state may be immediately discarded.
There is no requirement that an interpreter have any thread states
until it is actually meant to be used.  At that point it must be
made active in the current OS thread.

To make an existing interpreter active in the current OS thread,
the C API user first makes sure that interpreter has a corresponding
thread state.  Then ``PyThreadState_Swap()`` is called like normal
using that thread state.  If the thread state for another interpreter
was already current then it gets swapped out like normal and execution
of that interpreter in the OS thread is thus effectively paused until
it is swapped back in.

Once an interpreter is active in the current OS thread like that, the
thread can call any of the C API, such as ``PyEval_EvalCode()``
(i.e. ``exec()``).  This works by using the current thread state as
the runtime context.

The "Main" Interpreter
----------------------

When a Python process starts, it creates a single interpreter state
(the "main" interpreter) with a single thread state for the current
OS thread.  The Python runtime is then initialized using them.

After initialization, the script or module or REPL is executed using
them.  That execution happens in the interpreter's ``__main__`` module.

When the process finishes running the requested Python code or REPL,
in the main OS thread, the Python runtime is finalized in that thread
using the main interpreter.

Runtime finalization has only a slight, indirect effect on still-running
Python threads, whether in the main interpreter or in subinterpreters.
That's because right away it waits indefinitely for all non-daemon
Python threads to finish.

While the C API may be queried, there is no mechanism by which any
Python thread is directly alerted that finalization has begun,
other than perhaps with "atexit" functions that may have been
registered using ``threading._register_atexit()``.

Any remaining subinterpreters are themselves finalized later,
but at that point they aren't current in any OS threads.

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other.  That means interpreters never share objects (except in very
specific cases with immortal, immutable builtin objects).  Each
interpreter has its own modules (``sys.modules``), classes, functions,
and variables.  Even where two interpreters define the same class,
each will have its own copy.  The same applies to state in C, including
in extension modules.  The CPython C API docs `explain more`_.

.. _explain more:
   https://docs.python.org/3/c-api/init.html#bugs-and-caveats

Notably, there is some process-global state that interpreters will
always share, some mutable and some immutable.  Sharing immutable
state presents few problems, while providing some benefits (mainly
performance).  However, all shared mutable state requires special
management, particularly for thread-safety, some of which the OS
takes care of for us.

Mutable:

* file descriptors
* low-level env vars
* process memory (though allocators *are* isolated)
* the list of interpreters

Immutable:

* builtin types (e.g. ``dict``, ``bytes``)
* singletons (e.g. ``None``)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

Existing Execution Components
-----------------------------

There are a number of existing parts of Python that may help
with understanding how code may be run in a subinterpreter.

In CPython, each component is built around one of the following
C API functions (or variants):

* ``PyEval_EvalCode()``: run the bytecode interpreter with the given
  code object
* ``PyRun_String()``: compile + ``PyEval_EvalCode()``
* ``PyRun_File()``: read + compile + ``PyEval_EvalCode()``
* ``PyRun_InteractiveOneObject()``: compile + ``PyEval_EvalCode()``
* ``PyObject_Call()``: calls ``PyEval_EvalCode()``

builtins.exec()
^^^^^^^^^^^^^^^

The builtin ``exec()`` may be used to execute Python code.  It is
essentially a wrapper around the C API functions ``PyRun_String()``
and ``PyEval_EvalCode()``.

Here are some relevant characteristics of the builtin ``exec()``:

* It runs in the current OS thread and pauses whatever
  was running there, which resumes when ``exec()`` finishes.
  No other OS threads are affected.
  (To avoid pausing the current Python thread, run ``exec()``
  in a ``threading.Thread``.)
* It may start additional threads, which don't interrupt it.
* It executes against a "globals" namespace (and a "locals"
  namespace).  At module-level, ``exec()`` defaults to using
  ``__dict__`` of the current module (i.e. ``globals()``).
  ``exec()`` uses that namespace as-is and does not clear it before or after.
* It propagates any uncaught exception from the code it ran.
  The exception is raised from the ``exec()`` call in the Python
  thread that originally called ``exec()``.
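
For instance, the following sketch illustrates the namespace reuse and
exception propagation described above::

   ns = {'data': []}
   exec("data.append(1)", ns)   # runs in the current OS thread
   exec("data.append(2)", ns)   # the namespace is reused as-is
   assert ns['data'] == [1, 2]

   try:
       exec("1/0", ns)
   except ZeroDivisionError:
       # The uncaught exception propagates out of the exec() call.
       pass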

Command-line
^^^^^^^^^^^^

The ``python`` CLI provides several ways to run Python code.  In each
case it maps to a corresponding C API call:

* ``<no args>``, ``-i`` - run the REPL
  (``PyRun_InteractiveOneObject()``)
* ``<filename>`` - run a script (``PyRun_File()``)
* ``-c <code>`` - run the given Python code (``PyRun_String()``)
* ``-m module`` - run the module as a script
  (``PyEval_EvalCode()`` via ``runpy._run_module_as_main()``)

In each case it is essentially a variant of running ``exec()``
at the top-level of the ``__main__`` module of the main interpreter.

threading.Thread
^^^^^^^^^^^^^^^^

When a Python thread is started, it runs the "target" function
with ``PyObject_Call()`` using a new thread state.  The globals
namespace comes from ``func.__globals__`` and any uncaught
exception is discarded.
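
For example, an uncaught exception in the target does not reach the
thread that started it::

   import threading

   def task():
       raise ValueError('never propagated')

   t = threading.Thread(target=task)
   t.start()
   t.join()   # returns normally; the ValueError is not raised here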


Motivation
==========

The ``interpreters`` module will provide a high-level interface to the
multiple interpreter functionality.  The goal is to make the existing
multiple-interpreters feature of CPython more easily accessible to
Python code.  This is particularly relevant now that CPython has a
per-interpreter GIL (:pep:`684`) and people are more interested
in using multiple interpreters.

Without a stdlib module, users are limited to the
:ref:`C API <python:sub-interpreter-support>`, which restricts how much
they can try out and take advantage of multiple interpreters.

The module will include a basic mechanism for communicating between
interpreters.  Without one, multiple interpreters are a much less
useful feature.


Rationale
=========

A Minimal API
-------------

Since the core dev team has no real experience with
how users will make use of multiple interpreters in Python code, this
proposal purposefully keeps the initial API as lean and minimal as
possible.  The objective is to provide a well-considered foundation
on which further (more advanced) functionality may be added later,
as appropriate.

That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages.  It also factors in
experience from using subinterpreters in the CPython test suite and
using them in `concurrency benchmarks`_.

.. _concurrency benchmarks:
   https://github.com/ericsnowcurrently/concurrency-benchmarks

Interpreter.prepare_main() Sets Multiple Variables
--------------------------------------------------

``prepare_main()`` may be seen as a setter function of sorts.
It supports setting multiple names at once,
e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
set one item at a time.  The main reason is for efficiency.

To set a value in the interpreter's ``__main__.__dict__``, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead.  After
setting the value it must switch back.
Furthermore, there is some additional overhead to the mechanism
by which it passes objects between interpreters, which can be
reduced in aggregate if multiple values are set at once.

Therefore, ``prepare_main()`` supports setting multiple
values at once.
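
For example, a sketch using the API proposed below::

   interp = interpreters.create()
   # One switch into the interpreter to bind both names, rather than two.
   interp.prepare_main(spam=1, eggs=2)
   interp.exec_sync("assert spam == 1 and eggs == 2")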

Propagating Exceptions
----------------------

An uncaught exception from a subinterpreter,
via ``Interpreter.exec_sync()``,
could either be (effectively) ignored, like ``threading.Thread()`` does,
or propagated, like the builtin ``exec()`` does.  Since ``exec_sync()``
is a synchronous operation, like the builtin ``exec()``,
uncaught exceptions are propagated.

However, such exceptions are not raised directly.  That's because
interpreters are isolated from each other and must not share objects,
including exceptions.  That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it.
Any of those could preserve the traceback, which is useful for
debugging.  The ``ExecFailure`` that gets raised
is such a surrogate.

There's another concern to consider.  If a propagated exception isn't
immediately caught, it will bubble up through the call stack until
caught (or not).  In the case that code somewhere else may catch it,
it is helpful to identify that the exception came from a subinterpreter
(i.e. a "remote" source), rather than from the current interpreter.
That's why ``Interpreter.exec_sync()`` raises ``ExecFailure`` and why
it is a plain ``Exception``, rather than a copy or proxy with a class
that matches the original exception.  For example, an uncaught
``ValueError`` from a subinterpreter would never get caught in a later
``try: ... except ValueError: ...``.  Instead, ``ExecFailure``
must be handled directly.
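
For example, a sketch of handling a propagated exception::

   interp = interpreters.create()
   try:
       interp.exec_sync("raise ValueError('bad value')")
   except interpreters.ExecFailure as exc:
       # The original ValueError is not re-raised; only a summary of it
       # is available (see the ExecFailure attributes below).
       assert exc.type.__name__ == 'ValueError'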

Limited Object Sharing
----------------------

As noted in `Interpreter Isolation`_, only a small number of builtin
objects may be truly shared between interpreters.  In all other cases
objects can only be shared indirectly, through copies or proxies.

The set of objects that are shareable as copies through queues
(and ``Interpreter.prepare_main()``) is limited for the sake of
efficiency.

Supporting sharing of *all* objects is possible (via pickle)
but not part of this proposal.  For one thing, it's helpful to know
that only an efficient implementation is being used.  Furthermore,
for mutable objects pickling would violate the guarantee that "shared"
objects be equivalent (and stay that way).

Objects vs. ID Proxies
----------------------

For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs.  In both cases the state is likewise process-global
and will be used by multiple interpreters.  Thus they aren't suitable
to be implemented as ``PyObject``, which is only really an option for
interpreter-specific data.  That's why the ``interpreters`` module
instead provides objects that are weakly associated through the ID.


Specification
=============

The module will:

* expose the existing multiple interpreter support
* introduce a basic mechanism for communicating between interpreters

The module will wrap a new low-level ``_interpreters`` module
(in the same way as the ``threading`` module).
However, that low-level API is not intended for public use
and thus not part of this proposal.

Using Interpreters
------------------

The module defines the following functions:

* ``get_current() -> Interpreter``
      Returns the ``Interpreter`` object for the currently executing
      interpreter.

* ``list_all() -> list[Interpreter]``
      Returns the ``Interpreter`` object for each existing interpreter,
      whether it is currently running in any OS threads or not.

* ``create() -> Interpreter``
      Create a new interpreter and return the ``Interpreter`` object
      for it.  The interpreter doesn't do anything on its own and is
      not inherently tied to any OS thread.  That only happens when
      something is actually run in the interpreter
      (e.g. ``Interpreter.exec_sync()``), and only while running.
      The interpreter may or may not have thread states ready to use,
      but that is strictly an internal implementation detail.

Interpreter Objects
-------------------

An ``interpreters.Interpreter`` object represents the interpreter
(``PyInterpreterState``) with the corresponding unique ID.
There will only be one object for any given interpreter.

If the interpreter was created with ``interpreters.create()`` then
it will be destroyed as soon as all ``Interpreter`` objects have been
deleted.

Attributes and methods:

* ``id``
      (read-only) A non-negative ``int`` that identifies the
      interpreter that this ``Interpreter`` instance represents.
      Conceptually, this is similar to a process ID.

* ``__hash__()``
      Returns the hash of the interpreter's ``id``.  This is the same
      as the hash of the ID's integer value.

* ``is_running() -> bool``
      Returns ``True`` if the interpreter is currently executing code
      in its ``__main__`` module.  This excludes sub-threads.

      It refers only to whether there is an OS thread
      running a script (code) in the interpreter's ``__main__`` module.
      That basically means whether or not ``Interpreter.exec_sync()``
      is running in some OS thread.  Code running in sub-threads
      is ignored.

* ``prepare_main(**kwargs)``
      Bind one or more objects in the interpreter's ``__main__`` module.

      The keyword argument names will be used as the attribute names.
      The values will be bound as new objects, though exactly equivalent
      to the original.  Only objects specifically supported for passing
      between interpreters are allowed.  See `Shareable Objects`_.

      ``prepare_main()`` is helpful for initializing the
      globals for an interpreter before running code in it.

* ``exec_sync(code, /)``
      Execute the given source code in the interpreter
      (in the current OS thread), using its ``__main__`` module.
      It doesn't return anything.

      This is essentially equivalent to switching to this interpreter
      in the current OS thread and then calling the builtin ``exec()``
      using this interpreter's ``__main__`` module's ``__dict__`` as
      the globals and locals.

      The code running in the current OS thread (a different
      interpreter) is effectively paused until ``exec_sync()``
      finishes.  To avoid pausing it, create a new ``threading.Thread``
      and call ``exec_sync()`` in it.

      ``exec_sync()`` does not reset the interpreter's state nor
      the ``__main__`` module, neither before nor after, so each
      successive call picks up where the last one left off.  This can
      be useful for running some code to initialize an interpreter
      (e.g. with imports) before later performing some repeated task.

      If there is an uncaught exception, it will be propagated into
      the calling interpreter as an ``ExecFailure``, which
      preserves enough information for a helpful error display.  That
      means if the ``ExecFailure`` isn't caught then the full
      traceback of the propagated exception, including details about
      syntax errors, etc., will be displayed.  Having the full
      traceback is particularly useful when debugging.

      If exception propagation is not desired then an explicit
      try-except should be used around the *code* passed to
      ``exec_sync()``.  Likewise any error handling that depends
      on specific information from the exception must use an explicit
      try-except around the given *code*, since ``ExecFailure``
      will not preserve that information.

* ``run(code, /) -> threading.Thread``
      Create a new thread and call ``exec_sync()`` in it.
      Exceptions are not propagated.

      This is roughly equivalent to::

         def task():
             interp.exec_sync(code)
         t = threading.Thread(target=task)
         t.start()

Communicating Between Interpreters
----------------------------------

The module introduces a basic communication mechanism through special
queues.

There are ``interpreters.Queue`` objects, but they only proxy
the actual data structure: an unbounded FIFO queue that exists
outside any one interpreter.  These queues have special accommodations
for safely passing object data between interpreters, without violating
interpreter isolation.  This includes thread-safety.

As with other queues in Python, for each "put" the object is added to
the back and each "get" pops the next one off the front.  Every added
object will be popped off in the order it was pushed on.

Only objects that are specifically supported for passing
between interpreters may be sent through a ``Queue``.
Note that the actual objects aren't sent, but rather their
underlying data.  However, the popped object will still be
strictly equivalent to the original.
See `Shareable Objects`_.

The module defines the following functions:

* ``create_queue(maxsize=0) -> Queue``
      Create a new queue.  If the maxsize is zero or negative then the
      queue is unbounded.
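
For example, a sketch of two interpreters exchanging data through
a queue::

   queue = interpreters.create_queue()
   interp = interpreters.create()
   interp.prepare_main(queue=queue)

   queue.put('ping')
   interp.exec_sync("""if True:
       msg = queue.get()
       queue.put(msg + ' pong')
       """)
   assert queue.get() == 'ping pong'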

Queue Objects
-------------

``interpreters.Queue`` objects act as proxies for the underlying
cross-interpreter-safe queues exposed by the ``interpreters`` module.
Each ``Queue`` object represents the queue with the corresponding
unique ID.
There will only be one object for any given queue.

``Queue`` implements all the methods of ``queue.Queue`` except for
``task_done()`` and ``join()``, hence it is similar to
``asyncio.Queue`` and ``multiprocessing.Queue``.

Attributes and methods:

* ``id``
      (read-only) A non-negative ``int`` that identifies
      the corresponding cross-interpreter queue.
      Conceptually, this is similar to the file descriptor
      used for a pipe.

* ``maxsize``
      Number of items allowed in the queue.  Zero means "unbounded".

* ``__hash__()``
      Return the hash of the queue's ``id``.  This is the same
      as the hash of the ID's integer value.

* ``empty()``
      Return ``True`` if the queue is empty, ``False`` otherwise.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``full()``
      Return ``True`` if there are ``maxsize`` items in the queue.

      If the queue was initialized with ``maxsize=0`` (the default),
      then ``full()`` never returns ``True``.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``qsize()``
      Return the number of items in the queue.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``put(obj, timeout=None)``
      Add the object to the queue.

      The object must be `shareable <Shareable Objects_>`_, which means
      the object's data is passed through rather than the object itself.

      If ``maxsize > 0`` and the queue is full then this blocks until
      a free slot is available.  If *timeout* is a positive number
      then it blocks at most that many seconds and then raises
      ``interpreters.QueueFull``.  Otherwise it blocks forever.

* ``put_nowait(obj)``
      Like ``put()`` but effectively with an immediate timeout.
      Thus if the queue is full, it immediately raises
      ``interpreters.QueueFull``.

* ``get(timeout=None) -> object``
      Pop the next object from the queue and return it.  Block while
      the queue is empty.  If a positive *timeout* is provided and an
      object hasn't been added to the queue in that many seconds
      then raise ``interpreters.QueueEmpty``.

* ``get_nowait() -> object``
      Like ``get()``, but do not block.  If the queue is not empty
      then return the next item.  Otherwise, raise
      ``interpreters.QueueEmpty``.
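
For example, a sketch of non-blocking and timed-out access::

   queue = interpreters.create_queue(maxsize=1)
   queue.put('spam')
   try:
       queue.put_nowait('eggs')
   except interpreters.QueueFull:
       pass  # the single slot is already taken
   assert queue.get_nowait() == 'spam'
   try:
       queue.get(timeout=0.1)
   except interpreters.QueueEmpty:
       pass  # nothing was added within the timeout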

Shareable Objects
-----------------

Both ``Interpreter.prepare_main()`` and ``Queue`` work only with
"shareable" objects.

A "shareable" object is one which may be passed from one interpreter
to another.  The object is not necessarily actually directly shared
by the interpreters.  However, even if it isn't, the shared object
should be treated as though it *were* shared directly.  That's a
strong equivalence guarantee for all shareable objects.
(See below.)

For some types (builtin singletons), the actual object is shared.
For some, the object's underlying data is actually shared but each
interpreter has a distinct object wrapping that data.  For all other
shareable types, a strict copy or proxy is made such that the
corresponding objects continue to match exactly.  In cases where
the underlying data is complex but must be copied (e.g. ``tuple``),
the data is serialized as efficiently as possible.

Shareable objects must be specifically supported internally
by the Python runtime.  However, there is no restriction against
adding support for more types later.

Here's the initial list of supported objects:

* ``str``
* ``bytes``
* ``int``
* ``float``
* ``bool`` (``True``/``False``)
* ``None``
* ``tuple`` (only with shareable items)
* ``Queue``
* ``memoryview`` (underlying buffer actually shared)

Note that the last two on the list, queues and ``memoryview``, are
technically mutable data types, whereas the rest are not.  When any
interpreters share mutable data there is always a risk of data races.
Cross-interpreter safety, including thread-safety, is a fundamental
feature of queues.

However, ``memoryview`` does not have any native accommodations.
The user is responsible for managing thread-safety, whether passing
a token back and forth through a queue to indicate safety
(see `Synchronization`_), or by assigning sub-range exclusivity
to individual interpreters.

Most objects will be shared through queues (``Queue``), as interpreters
communicate information between each other.  Less frequently, objects
will be shared through ``prepare_main()`` to set up an interpreter
prior to running code in it.  However, ``prepare_main()`` is the
primary way that queues are shared, to provide another interpreter
with a means of further communication.

Finally, a reminder: for a few types the actual object is shared,
whereas for the rest only the underlying data is shared, whether
as a copy or through a proxy.  Regardless, it always preserves
the strong equivalence guarantee of "shareable" objects.

The guarantee is that a shared object in one interpreter is strictly
equivalent to the corresponding object in the other interpreter.
In other words, the two objects will be indistinguishable from each
other.  The shared object should be treated as though the original
had been shared directly, whether or not it actually was.
That's a slightly different and stronger promise than just equality.

The guarantee is especially important for mutable objects, like
``Queue`` and ``memoryview``.  Mutating the object in one interpreter
will always be reflected immediately in every other interpreter
sharing the object.
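
For example, a sketch with a shared buffer::

   buf = memoryview(bytearray(4))
   interp = interpreters.create()
   interp.prepare_main(buf=buf)
   interp.exec_sync("buf[0] = 42")
   assert buf[0] == 42   # the change is immediately visible here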

Synchronization
---------------

There are situations where two interpreters should be synchronized.
That may involve sharing a resource, worker management, or preserving
sequential consistency.

In threaded programming the typical synchronization primitives are
types like mutexes.  The ``threading`` module exposes several.
However, interpreters cannot share objects which means they cannot
share ``threading.Lock`` objects.

The ``interpreters`` module does not provide any such dedicated
synchronization primitives.  Instead, ``Queue`` objects provide
everything one might need.

For example, if there's a shared resource that needs managed
access then a queue may be used to manage it, where the interpreters
pass an object around to indicate who can use the resource::

   import threading
   import interpreters
   from mymodule import load_big_data, check_data

   numworkers = 10
   control = interpreters.create_queue()
   data = memoryview(load_big_data())

   def worker():
       interp = interpreters.create()
       interp.prepare_main(control=control, data=data)
       interp.exec_sync("""if True:
           from mymodule import edit_data
           while True:
               token = control.get()
               edit_data(data)
               control.put(token)
           """)
   threads = [threading.Thread(target=worker) for _ in range(numworkers)]
   for t in threads:
       t.start()

   token = 'football'
   control.put(token)
   while True:
       control.get()
       if not check_data(data):
           break
       control.put(token)

Exceptions
----------

* ``ExecFailure``
      Raised from ``Interpreter.exec_sync()`` when there's an
      uncaught exception.  The error display for this exception
      includes the traceback of the uncaught exception, which gets
      shown after the normal error display, much like happens for
      ``ExceptionGroup``.

      Attributes:

      * ``type`` - a representation of the original exception's class,
        with ``__name__``, ``__module__``, and ``__qualname__`` attrs.
      * ``msg`` - ``str(exc)`` of the original exception
      * ``snapshot`` - a ``traceback.TracebackException`` object
        for the original exception

      This exception is a subclass of ``RuntimeError``.

* ``QueueEmpty``
      Raised from ``Queue.get()`` (or ``get_nowait()`` with no default)
      when the queue is empty.

      This exception is a subclass of ``queue.Empty``.

* ``QueueFull``
      Raised from ``Queue.put()`` (with a timeout) or ``put_nowait()``
      when the queue is already at its max size.

      This exception is a subclass of ``queue.Full``.

InterpreterPoolExecutor
-----------------------

Along with the new ``interpreters`` module, there will be a new
``concurrent.futures.InterpreterPoolExecutor``.  Each worker executes
in its own thread with its own subinterpreter.  Communication may
still be done through ``Queue`` objects, set with the initializer.
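
For example, assuming an interface that parallels
``ThreadPoolExecutor`` (the exact signature is not specified here, and
the callable and arguments below are purely illustrative)::

   from concurrent.futures import InterpreterPoolExecutor

   def analyze(filename):
       # Each call runs in a worker's own subinterpreter.
       ...

   with InterpreterPoolExecutor(max_workers=4) as executor:
       for result in executor.map(analyze, ['a.dat', 'b.dat', 'c.dat']):
           ...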

Examples
--------

The following examples demonstrate practical cases where multiple
interpreters may be useful.

Example 1:

There's a stream of requests coming in that will be handled
via workers in sub-threads.

* each worker thread has its own interpreter
* there's one queue to send tasks to workers and
  another queue to return results
* the results are handled in a dedicated thread
* each worker keeps going until it receives a "stop" sentinel (``None``)
* the results handler keeps going until all workers have stopped

::

   import threading
   import interpreters
   from mymodule import iter_requests, handle_result

   tasks = interpreters.create_queue()
   results = interpreters.create_queue()

   numworkers = 20
   threads = []

   def results_handler():
       running = numworkers
       while running:
           try:
               res = results.get(timeout=0.1)
           except interpreters.QueueEmpty:
               # No workers have finished a request since last time.
               pass
           else:
               if res is None:
                   # A worker has stopped.
                   running -= 1
               else:
                   handle_result(res)
       empty = object()
       assert results.get_nowait(empty) is empty
   threads.append(threading.Thread(target=results_handler))

   def worker():
       interp = interpreters.create()
       interp.prepare_main(tasks=tasks, results=results)
       interp.exec_sync("""if True:
           from mymodule import handle_request, capture_exception

           while True:
               req = tasks.get()
               if req is None:
                   # Stop!
                   break
               try:
                   res = handle_request(req)
               except Exception as exc:
                   res = capture_exception(exc)
               results.put(res)
           # Notify the results handler.
           results.put(None)
           """)
   threads.extend(threading.Thread(target=worker) for _ in range(numworkers))

   for t in threads:
       t.start()

   for req in iter_requests():
       tasks.put(req)
   # Send the "stop" signal.
   for _ in range(numworkers):
       tasks.put(None)

   for t in threads:
       t.join()

Example 2:

This case is similar to the last as there are a bunch of workers
in sub-threads.  However, this time the code is chunking up a big array
of data, where each worker processes one chunk at a time.  Copying
that data to each interpreter would be exceptionally inefficient,
so the code takes advantage of directly sharing ``memoryview`` buffers.

* all the interpreters share the buffer of the source array
* each one writes its results to a second shared buffer
* a queue is used to send tasks to workers
* only one worker will ever read any given index in the source array
* only one worker will ever write to any given index in the results
  (this is how it ensures thread-safety)

::

   import threading
   import interpreters
   from mymodule import read_large_data_set, use_results

   numworkers = 3
   data, chunksize = read_large_data_set()
   buf = memoryview(data)
   numchunks = (len(buf) + chunksize - 1) // chunksize
   results = memoryview(bytearray(numchunks))

   tasks = interpreters.create_queue()

   def worker():
       interp = interpreters.create()
       interp.prepare_main(data=buf, results=results, tasks=tasks)
       interp.exec_sync("""if True:
           from mymodule import reduce_chunk

           while True:
               req = tasks.get()
               if req is None:
                   # Stop!
                   break
               resindex, start, end = req
               chunk = data[start: end]
               res = reduce_chunk(chunk)
               results[resindex] = res
           """)
   threads = [threading.Thread(target=worker) for _ in range(numworkers)]
   for t in threads:
       t.start()

   for i in range(numchunks):
       # Assume there's at least one worker running still.
       start = i * chunksize
       end = start + chunksize
       if end > len(buf):
           end = len(buf)
       tasks.put((i, start, end))
   # Send the "stop" signal.
   for _ in range(numworkers):
       tasks.put(None)

   for t in threads:
       t.join()

   use_results(results)


Rejected Ideas
==============

See :pep:`PEP 554 <554#rejected-ideas>`.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

quick initial thought without having read anything in detail: Interpreter.run() might be better named Interpreter.start() to be consistent with the threading.Thread API’s use of those terms. start launches the thread.

Thread.run happens to be the name of the main method that start() has the new thread call - people calling .run() directly on threading.Thread instances has been a point of user confusion at times.


FTR, @guido provided a substantial amount of feedback as I iterated on PEP 734 before posting here. Consequently, it’s a substantially better proposal than it would have been. Thanks Guido!


Good point.

I think the difference between run() and exec_sync() is too subtle based on the names. My initial reaction was that I had no idea about the difference, since one says “sync” while the other is a completely different name. If you’re going to try to differentiate between sync and async in the name then I think that should be the difference in the prefix or suffix.


Perhaps spawn for background and exec for foreground?


I’ve been following this work with excitement since I started writing Python in 2017, it seems like a very nice approach to parallelism!

I have only skimmed this version of the PEP and my initial question is regarding object “sharing”. The Queue seems a bit too restricted atm, but I think the motivation makes sense. Is the current idea that if I want to share more complex objects, I pickle them, send the strings, and unpickle them on the other side? I think that’s what the text hints at but I didn’t see that stated clearly anywhere. Maybe it could be stated more clearly if that is the intended workflow?


I briefly skimmed through the PEP. One thing I have in mind: instead of hardcoding the list of types that support sharing, would it be possible to make this a protocol? Something like __reduce__, but for in-process sharing.

Also, I think interpreters.Queue deviating from the threading and multiprocessing queue semantics by only allowing shareable objects will annoy users. Perhaps you could have two queue classes, for example a LowLevelQueue for shareable objects only, and a regular Queue that would use pickle for most objects?

Finally, I think a benchmark of sharing vs. pickling/unpickling would be useful to assess the benefits of the sharing mechanism. Right now it’s not obvious whether this really can make a difference.

(note that with pickle 5 out-of-band buffers, you could perhaps simply share those buffers while still using pickle for most datatypes?)
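
(For reference, a rough sketch of what out-of-band buffers look like with plain pickle today, independent of the proposed queues:)

    import pickle

    big = bytearray(b'x' * 10_000)
    buffers = []
    # With protocol 5, buffers wrapped in PickleBuffer are handed to the
    # buffer_callback instead of being copied into the pickle stream.
    data = pickle.dumps(pickle.PickleBuffer(big), protocol=5,
                        buffer_callback=buffers.append)
    # Supplying the buffers back at load time avoids the copy, so in
    # principle they could be shared rather than copied between interpreters.
    obj = pickle.loads(data, buffers=buffers)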


I think this will be a nice addition to Python. I would definitely like to try it out and see how it works in practice once it’s available.

Some comments on the PEP:

Minor comments

ExecFailure

I think this name doesn’t match Python stdlib convention. Maybe ExecError?

This exception is a subclass of RuntimeError.

Why RuntimeError?


create_queue(maxsize=0) -> Queue

Why is it a standalone function rather than Queue(maxsize=0) as in queue.Queue / asyncio.Queue /multiprocessing.Queue?


Here’s the initial list of supported objects:

  • Queue

I suggest writing this as interpreters.Queue for clarity.

Shared buffer safety?

However, memoryview does not have any native accommodations. The user is responsible for managing thread-safety, whether passing a token back and forth through a queue to indicate safety (see Synchronization), or by assigning sub-range exclusivity to individual interpreters.

What happens if the user does not properly synchronize? That is, multiple interpreters running in parallel writing and reading to the same address at the same time.

If the answer is “data race”, then this ties back to the no-GIL discussions about needing a memory model. Without one, it will not be possible to provide proper semantics for Python programs in the face of data races. Javascript (which has SharedArrayBuffer and Atomics) takes this approach. Here is the JS memory model: ECMAScript® 2024 Language Specification

If the answer is “undefined behavior”, i.e. it’s not allowed, it’s the programmer’s responsibility to avoid it, and there’s no saying what happens if it does happen, then that would be the first case of undefined behavior in Python AFAIK, and should be stated explicitly.

Transferable Objects

Speaking of JS (whose prior work is very relevant for this IMO), it has the notion of Transferable Objects, which is a different notion from the Shareable Objects defined in the PEP.

While Shareable Objects remain available on both sides, and therefore must be immutable or synchronized, Transferable Objects are transferred to the receiving side (without copying) and “hollowed out” on the sending side, i.e. transferring ownership. This is a very useful middle ground between “copying” and “shared memory + synchronization”.

The reason I’m bringing up Transferable Objects is not that it should be a part of the PEP, but that I think it would be good to either make sure the design does not preclude it as a future enhancement, or that it explicitly does preclude it in case it’s not relevant for Python.


Note that “undefined behavior” is a language-specific term, it does not relate to what actually happens on the CPU. As such, UB is a notion specific to C, C++ and languages depending on similar compiler infrastructure (such as Rust).

(some instances of so-called “undefined behavior”, such as signed integer overflow in C or C++, are actually perfectly well-defined on the CPU)

Talking about UB when discussing Python semantics is therefore unhelpful and obscures the underlying concerns. Let’s avoid it.

Regardless, I’m sure you can trigger unspecified behavior - i.e. race conditions - quite easily using shared memory, or of course using ctypes, so I don’t think there’s anything new here.


tl;dr I agree .exec_sync() and .run() need tweaking. I’ve summarized various alternatives at the bottom of this post. My preference is:

  • rename .exec_sync() to .exec()
  • add Interpreter.call(callable, args, kwargs) that runs in the current thread (like .exec())
  • replace .run() with .call_in_thread()

I actually did pick “run” partly because it felt more threading-related. :smile: You’re right that it could be confusing. I’ll have to consider alternatives, including “start”. (I also need to look up other places “run” shows up as a method name in the stdlib.)

(FWIW, I explain below, in my reply to Brett, the other reason I picked “run”.)

Thanks. That’s good feedback. I agree that the difference between the two methods is very minor:

  • .exec_sync() - minimal; runs in the current OS thread; almost identical to builtin exec()
  • .run() - runs in a new OS thread

As proposed, .run() is at best a simple convenience method to cover an anticipated common use case. It is only a light wrapper around .exec_sync():

    def run(self, code, /):
        t = threading.Thread(target=(lambda: self.exec_sync(code)))
        t.start()
        return t

In part, I added it to provide a contrast with .exec_sync(), to make it even more clear that .exec_sync() happens in the current OS thread. I think that contrast is worth it, but perhaps .exec_sync() should just be .exec() and .run() should be .exec_in_thread().

That said, I also added .run() with a broader future (and a deeper contrast) in mind. In a way, I think of .exec_sync() as the low-level method and .run() as the high-level alternative, even if, as proposed, that distinction is currently super weak. Allow me to elaborate…

One of the main reasons I used the name “run” is that I think of it as “just run this code” without restrictions (though in practice there are, of course). “run” seemed like the natural name for it, especially due to additional use cases (with extra complexity) stewing in the back of my head.

# START EXTRA COMPLEXITY


While currently .run() is a light wrapper over .exec_sync() to add threading, I was already considering the following high-level additions:

  • allow passing arbitrary callables (perhaps even args and kwargs)
  • serialize (e.g. pickle) as needed
  • optionally allow it to run in the current OS thread rather than always in a new thread

FWIW, some of that is things I’ll probably have to do for InterpreterPoolExecutor anyway.

Hmm, having written that out made me think. Maybe .run() should be named .call() instead and not support passing script text. (Passing script text would be exclusive to .exec_sync().) .call() would run in the current OS thread and support passing args/kwargs and maybe even returning a value. It might still make sense to support making calls in a new thread, so a separate .call_threaded() might be warranted.

Regarding .call() or other .run() alternatives, note that the PEP mostly aims to strongly parallel (or lightly wrap) the existing C-API, and the C-API has the following distinct function families:

  • PyRun_*File*() (not exposed)
  • PyRun_*String*() (Interpreter.exec_sync() and, currently, .run())
  • PyRun_Interactive*() (not exposed)
  • PyEval_EvalCode*() (restricted: Interpreter.exec_sync() and, currently, .run())
  • PyObject_Call*() (perhaps Interpreter.call())

It might also make sense to rename .run() to .threaded() (or .exec_threaded() if the shorter name isn’t clear enough). Perhaps it should just be .thread() (or .new_thread()):

    def thread(self, code, /):
        return threading.Thread(target=(lambda: self.exec_sync(code)))

If we had .call(), and .thread() (or .threaded()) likewise supported only callables, then .thread() would also support args/kwargs. For that matter, .thread() could even closely match the signature of threading.Thread().

# END EXTRA COMPLEXITY

I didn’t put any of that extra complexity in the PEP because I’d rather add things later than have to remove them (or, more likely, live with them). Honestly, I’d be fine with just dropping .run() from the PEP and adding it (or alternatives) to the module later, after letting the idea bake more as a high-level alternative to .exec_sync().

Ultimately, I’d prefer getting the PEP accepted in a minimal state sooner over taking the time to iron out the potential nice-to-haves first.

Good point. I agree that, if they stay so similar, they should share a base name and have a distinct prefix or suffix. Sometimes it also can make sense to use the same base name and leave the suffix off the more common case (e.g. Queue.get() vs. Queue.get_nowait()), so the common one is shorter.

Regardless, if they aren’t meant to be so similar then it might make sense to drop the “_sync” suffix (make it just .exec()).

Also, do note that it’s “sync” as in “not threaded”, rather than “not async”. Perhaps that’s too confusing too.

That’s definitely a possibility and demonstrates the spirit of the distinction. I just want to be careful not to over-promise on conceptual equivalence.

Summary

Based on @brettcannon’s, @gpshead’s, and @blink1073’s feedback, I’ve drawn 3 conclusions:

  • .exec_sync() should be .exec()
  • we should definitely have a method that explicitly relates to threading, to contrast with .exec()
  • .run() should be replaced (whether with the threading-focused method or another)

Possible alternatives to Interpreter.run():

  • threaded exec
    • starts a new thread (drop-in replacement for .run())
      • start(code, /) → threading.Thread
      • spawn(code, /) → threading.Thread
      • exec_threaded(code, /) → threading.Thread
      • exec_in_thread(code, /) → threading.Thread
      • threaded(code, /) → threading.Thread
    • does not start the thread
      • thread(code, /) → threading.Thread
  • calls, to complement .exec() (in current OS thread)
    • discarding return value
      • call(callable, args=None, kwargs=None, /) → None
      • call(callable, *args, **kwargs) → None
    • preserving the return value
      • call(callable, args=None, kwargs=None, /) → object
      • call(callable, *args, **kwargs) → object
  • threaded calls (assumes also adding .call())
    • starts a new thread
      • start(callable, …) → threading.Thread
      • spawn(callable, …) → threading.Thread
      • call_threaded(callable, …) → threading.Thread
      • call_in_thread(callable, …) → threading.Thread
      • threaded(callable, …) → threading.Thread
    • does not start the thread
      • thread(callable, …) → threading.Thread

All the call-oriented methods would require at least some pickling. Each of the “threaded exec” options could later be adjusted to also take arbitrary callables (in addition to “code”) but would never support args/kwargs. Each of the threaded call methods could have signatures matching threading.Thread() (minus “daemon”).

My preference is to add .call() and replace .run() with .call_in_thread():

  • adding .call() provides improved usability, while still matching what I had envisioned for .run()
  • .call_in_thread() provides the important contrast with the methods that operate in the current thread (.exec(), .call())
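
Roughly, the shape I have in mind (nothing here is settled):

    interp = interpreters.create()

    def work(x, y):
        return x + y

    # runs in the current OS thread, like .exec()
    interp.call(work, (1, 2))

    # runs in a new OS thread
    t = interp.call_in_thread(work, (1, 2))
    t.join()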

Thanks for saying so! That’s definitely the goal.

Yeah, I definitely want it to be more flexible. :smile:

Yep, that’s pretty accurate. I’ll make sure the PEP gets updated to make that more clear.

I’m currently working on a separate PEP for exactly that. :smile:

That’s a good point. Honestly, I was already considering falling back to pickle in interpreters.Queue. Here’s my main hesitation: I want a consistent explanation for the relationship between the sent object and the received one. My preference for that relationship is “they may not be the exact same object, but they might as well be”. At the least, it’s an illusion I’d like to preserve. For immutable objects, that’s easy. For mutable ones, not so much.

Sharing mutable objects would either require that the underlying data is truly shared between the interpreters (a hard-ish problem) or that each interpreter have its own copy. The former would preserve my desired relationship between the objects. The latter would invalidate it unrecoverably. Pickling would mean the latter.

That said, having two distinct classes does seem like a way to preserve the constraint I want on a selective basis. I’ll think it over. It might also be doable as an optional flag on create_queue().

That’s a good observation. I’ll keep it in mind.

That’s a good point.

Partly it’s because queues live in the space between interpreters. create_queue() seemed to communicate that well. In contrast, Queue() creates an interpreter-specific object and might communicate the wrong thing.

That said, I had the idea for create_queue() (rather, its equivalent) over 6(?) years ago and haven’t thought about it much since. I might think of it differently if I consider it now. There also isn’t much of a technical reason for the separation at this point, especially since I recently changed Queue instances to be singletons (in each interpreter) and require that they match existing queues. I’ll have to think this over.

(FWIW, the same goes for interpreters.create() and Interpreter.)

:+1:

As @pitrou noted, that’s something users already have to manage, whether from Python code or in extension modules.

Yeah, that’s a cool idea. We’ve discussed it before. As you noted, it’s something to be addressed after this PEP. You’re right about awareness affecting design though. Thanks for bringing this up.


That’s nice to hear! I do plan on releasing a 3.12-specific module on PyPI within the next few weeks.

It used to be RunFailedError. I had changed it to ExecFailedError but the redundancy vibes were bugging me, hence ExecFailure. Notable exception types without “Error”: KeyboardInterrupt, queue.Empty.

The problem with ExecError is that it could also represent something going wrong in the execution machinery, whereas we want an exception type that strictly covers “the code you ran raised an uncaught exception”.

That’s something that made sense to me 6+ years ago, but my 2023 self isn’t sure how subclassing RuntimeError provides any advantage. :smile: I’ll think it over. Thanks for pointing this out.


The cleanup and updates look like a good step forward to me!

I won’t repeat the points already discussed above, but other key thoughts & questions that occurred to me while reading:

  • the expected behaviour for Python implementations that don’t offer subinterpreters needs to be defined (could be as simple as “there is no interpreters module”, but an explicit query flag in sys.implementation could also make sense, as then the error message on importing the interpreters module could be more explicit about the problem)
  • can the subinterpreter specific ExecFailure be dropped and ExceptionGroup used instead? Even if it is only ever a group of one in the direct subinterpreter invocation use case, it still feels like the same flavour of error container to me (although it may get confusing when attempting to catch foreign exception types)
  • I think Antoine is correct that the cross-interpreter Queue design needs further thought. One advantage of the “channel” terminology was that the type restrictions were more acceptable without the semantic implications of the “Queue” terminology established by the threading and multiprocessing modules

Other comments:

  • Specification section should appear before the Rationale section (as the latter assumes reader familiarity with the former)
  • the create APIs need to be discussed in the Rationale section (i.e. as you noted above, they create both the underlying instance and the running interpreter’s proxy to it, they don’t just instantiate the proxy as implied by a direct constructor call)
  • I like your proposed exec()/call()/call_in_thread() method rename

Thanks for the feedback!

:+1:

That’s an interesting idea, but I’ll have to think about it more. The conceptual relationship between the two exception types isn’t standing out to me, other than that they both wrap other exceptions. The main difference is, I think, critical though: ExceptionGroup wraps actual exception objects, while ExecFailure wraps a snapshot of an exception (a “remote” one, at that). That said, being able to use except * to catch the wrapped exception is something that had not even crossed my mind. It’s an intriguing and alluring idea.

Ultimately, I want the API to feel as natural to users as possible (i.e. principle of least surprise). In this specific case, what will feel most natural (and be most useful)? I think what I’ve proposed fills that need (but also leaves room for a more elegant solution in the future).


The behavior of comparable existing features/modules doesn’t provide sufficient insight; I’m not sure there’s any precedent for the combination of (1) wrapping one exception in another, (2) the wrapped exception was uncaught in some “remote” execution, and (3) it is a snapshot of the exception (including, importantly, its traceback) rather than the actual exception object.

Again, I did consider precedent (though not ExceptionGroup):

  • multiprocessing: has “remote” uncaught exceptions, but doesn’t propagate them synchronously
  • threads: don’t propagate exceptions either, but there are several ways to add an exception hook to deal with uncaught exceptions
  • asyncio: doesn’t seem like a good fit (but I haven’t looked closely enough, I suppose)
  • subprocess: propagates remote errors but not uncaught exceptions (and certainly not with tracebacks)
  • exec(): uncaught exceptions are propagated synchronously but are neither remote nor snapshots
  • calls: similar to exec()
  • __cause__/__context__: expected to be set to exception objects, not snapshots

Also note that I’m trying to focus on what is actually useful for users here. I tried all sorts of variations, but, in the end, I realized that the one thing that mattered was showing the traceback for uncaught exceptions. (That realization came from actually trying to use subinterpreters in real code.) In effect, if the propagated ExecFailure is uncaught then the default sys.excepthook will show the remote exception as it would have shown it if it were “local” (i.e. traceback and str(exc)).
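
As a rough sketch of that user experience, using the names from the current draft (exact spellings and signatures may still change):

```python
import interpreters  # module name as proposed in the PEP

interp = interpreters.create()
try:
    # The ValueError goes uncaught inside the subinterpreter, so
    # exec_sync() raises ExecFailure here, wrapping a lightweight
    # snapshot (type, str(), traceback text) of the remote exception.
    interp.exec_sync("raise ValueError('boom')")
except interpreters.ExecFailure:
    # If this weren't caught, the default sys.excepthook would display
    # the remote traceback much as if the error had been raised locally.
    raise
```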

As I iterated through the design, at first I considered reconstituting the remote exception locally. However, that was too tricky, hid the remote origin of the exception, and introduced a conceptual ambiguity around whether to wrap the exception or not. I also looked into propagating a full snapshot that captured all the information of the exception (including attributes) and could be used as a substitute for the remote exception. Before I got in too far, it started to feel like overkill and was tricky enough to make me reconsider.

I also looked at this from a different angle: making use of ExecFailure.__cause__ to wrap the remote exception. However, that required either an actual exception object or changing __cause__ to work with exception surrogates/snapshots. It was at that point I realized all I really cared about was the error display for uncaught exceptions.

To show the traceback we don’t need more than a wrapper exception (ExecFailure) and a minimal snapshot. If the user needs to handle the remote exception then they can do it in the “remote” code (or perhaps via some kind of excepthook).

FWIW, I’ve also thought about introducing a builtin RemoteError type, to represent an exception originating in any kind of “remote” execution and formalize what I’m trying to do with interpreters.ExecFailure. (ExecFailure would be a subclass, or I’d drop it.) As a builtin, RemoteError could be special-cased by sys.excepthook()/traceback.TracebackException.

I tabled the idea since it would require extra design/specification, and I found I didn’t really need it for PEP 734 (which I’m trying to keep minimal). It also felt like overkill for what would currently be a single use case.

The main reason I went with a Queue-like type is familiarity for users. FWIW, I’m still ruminating over the potential consequences of using pickle by default, particularly as it relates to “sharing” mutable objects.
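
A toy illustration of that concern: pickling implies copy semantics, so anything built on pickle-by-default never propagates later mutations back to the sender.

```python
import pickle

data = {"count": 0}
received = pickle.loads(pickle.dumps(data))  # what a pickle-based queue would deliver

data["count"] += 1
assert received["count"] == 0  # the receiver's copy does not see the change
```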

Mostly, I was following PEP 1, as well as what I thought was precedent. However, switching the order definitely makes much more sense.

:+1:

That’s good to hear! It’s still my preference but I’m also still mulling it over.

I would suggest two different transfer methods/formats: one CopyQueue that can transport anything copyable (with potential optimizations for immutable things that can be shared) and one ShareQueue, or a similar interface, that can only transfer things that can be shared. Notably, these two queue objects would themselves not be copyable, only shareable: it wouldn’t make sense to write them out to disk, so they shouldn’t support the pickle interface and therefore wouldn’t be transportable via the CopyQueue. (Naming is hard: I am not really happy with either of these two.)

This separation would make it clear to the user what is happening and reduce surprises. Note that for something like call, it might make sense to specify that the arguments can be either shareable or copyable so that you can directly pass over the queues. But that would IMO not be as clean as forcing users to consciously make the choice to share objects.
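
A purely hypothetical sketch of how that split might look in use (neither CopyQueue nor ShareQueue is part of the PEP, and the names are placeholders):

```python
# Invented API, only to illustrate the suggested copy/share split.
copy_q = interpreters.CopyQueue()    # accepts anything copyable (e.g. picklable)
share_q = interpreters.ShareQueue()  # accepts only "shareable" objects

copy_q.put({"config": [1, 2, 3]})    # the receiver gets an equivalent copy
share_q.put(b"raw data")             # the receiver gets the same underlying data
share_q.put(copy_q)                  # the queues themselves are shareable...
# copy_q.put(share_q)                # ...but not copyable, so this would fail
```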


I think the idea of LowLevelQueue makes a lot of sense here, leaving the Queue name available for a richer version to come later.

You can probably write a recipe for using pickle by default with LowLevelQueue, but I think it’d be a waste to make that the “standard” protocol. We’ll want to be able to define a new __proxy__ or similar (to mirror __reduce__/__getnewargs__) that knows how to construct an object on the other side assuming the original one still exists (and hence, any native memory/etc. is also available), and that can be part of Queue.
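
A speculative illustration of that kind of protocol (the __proxy__ name and its return convention are not defined anywhere yet):

```python
# Sketch only: __proxy__ returns a (callable, args) pair, analogous to
# __reduce__, that the receiving interpreter could use to rebuild a view
# of the object while the original (and any native memory it owns) is
# kept alive on the sending side.
class SharedArray:
    def __init__(self, buffer):
        self.buffer = buffer

    def __proxy__(self):
        return (SharedArray, (memoryview(self.buffer),))
```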

But I think we’re still at the stage where we want 3rd party packages to design the Queue object, and users can use those on each end of the channel. So the PEP right now probably only wants to say we intend to add Queue, but not yet, and add a deliberately simple one as a building block now.

I think there will always be a notion of shareable objects — though that’s a poor name, it’s really about things that have value semantics. And the interpreters module can have a Queue that only allows values. Over time the definition of “value” can be adjusted.

Others can define more general queues in different namespaces, using pickle, proxies, or whatever else people invent.

Yeah, that’s where my mind was going, too. However, after further thought, I realised it won’t work in this case since the exception types won’t reliably match across interpreters (although they might, more on that below).

Instead, I think you’re right that the better precedent is the lightweight TracebackException objects defined in the traceback module, which allow reporting of full tracebacks without keeping entire frame stacks around. This is especially applicable with a shift to supporting pickle-able objects by default when calling subinterpreters, since the result could be composed in the executing interpreter and then passed back via pickle.
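
For reference, the traceback module can already capture that kind of lightweight snapshot today:

```python
import traceback

try:
    1 / 0
except ZeroDivisionError as exc:
    # Captures frame summaries and the exception's repr without keeping
    # the live frames (or the exception object itself) alive.
    snapshot = traceback.TracebackException.from_exception(exc)

print("".join(snapshot.format()))  # renders the familiar traceback text
```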

Alternatively, a more general solution that doesn’t make assumptions about which modules the calling interpreter has imported would replace object references with their string representations (and always skip exporting locals info).

Even with the exception existing solely in the interpreters module, I think further clarification is needed on the translation of the snapshot between the two interpreters:

  • what happens when an exception in the snapshot comes from a module that the caller hasn’t imported?
  • can the exception processing implicitly trigger code execution in the caller? (that question also applies to regular object returns & queue usage)
  • what default options are used when creating the TracebackException object in the called interpreter? Can those options be configured?

FWIW, I’ve also thought about introducing a builtin RemoteError type, to represent an exception originating in any kind of “remote” execution and formalize what I’m trying to do with interpreters.ExecFailure. (ExecFailure would be a subclass, or I’d drop it.) As a builtin, RemoteError could be special-cased by sys.excepthook()/traceback.TracebackException.

I tabled the idea since it would require extra design/specification, and I found I didn’t really need it for PEP 734 (which I’m trying to keep minimal). It also felt like overkill for what would currently be a single use case.

I agree that a built-in “ForeignException” type would need to consider more use cases (e.g. true FFI, where the called target isn’t written in Python), so doing something more specific to this use case makes sense.

I think I’d prefer a bikeshed painted that kind of colour from a naming perspective, though, and have the interpreters module docs refer consistently to local and “foreign” objects rather than local and remote ones. Unlike with multiprocessing, the foreign objects are still in the same process on the same machine (just as they are for a cross-language FFI); they’re just not part of the local Python object graph.

I’ll add my reaction as a user that I’m pretty excited by this PEP, and the potential concurrency patterns it enables! (I’ve started playing with the implementation a bit too).

I’m especially interested now that the proposal supports sharing buffer views. I come from the data analysis/ML area, and in my experience with concurrency libraries, transferring arrays/dataframes between processes had a major impact on performance. But for these kinds of objects, most of the “size” is backed by binary buffers, and as Antoine noted this aligns well with pickle’s out-of-band buffers, which offer an existing general protocol for this. (Unfortunately, none of the popular extension libraries seem able to run in subinterpreters yet.)
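
For anyone unfamiliar with the mechanism, a small self-contained example of pickle protocol 5’s out-of-band buffers:

```python
import pickle

big = bytearray(10 ** 7)  # stand-in for an array/dataframe backing buffer

# With protocol 5, the payload is handed to buffer_callback rather than
# being copied into the pickle stream itself.
buffers = []
data = pickle.dumps(pickle.PickleBuffer(big), protocol=5,
                    buffer_callback=buffers.append)

# The receiving side supplies the buffers separately (here we just hand the
# original object back), so the payload itself is never serialized in-band.
restored = pickle.loads(data, buffers=[big])
```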

My two cents on some of the choices debated:

  • +1 to having a low-level method like the current exec_sync as well as the higher-level ones like call(callable, args=None, kwargs=None, /) → object or call_threaded.
  • While I understand the reasons for preferring a more minimal initial API (e.g. we don’t know what will work well until there’s more user experience), IMO to get the adoption that will provide that experience it will be important to have an option for sending general objects via pickle or something like it (ideally customizable). If most people wanting to use it will need to write or find a wrapper for handling other types, and those wrappers will mostly work the same way, why not provide that code already to get them started? A rough sketch of such a wrapper is shown after this list. But maybe it makes sense to have that in a PyPI package until later, I don’t know.
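
A minimal sketch of the sort of wrapper meant above, assuming a low-level queue that only transports bytes-like objects (the wrapper itself is hypothetical):

```python
import pickle

class PickleQueue:
    """Pickle on put, unpickle on get, so arbitrary picklable objects can
    cross the interpreter boundary as bytes.  Purely illustrative."""

    def __init__(self, raw_queue):
        self._raw = raw_queue  # e.g. the PEP's low-level queue

    def put(self, obj):
        self._raw.put(pickle.dumps(obj))

    def get(self):
        return pickle.loads(self._raw.get())
```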