PEP 734: Multiple Interpreters in the Stdlib

The SC has been evaluating and discussing PEP 734 since it was sent our way last month (PEP 734 -- Multiple Interpreters in the Stdlib · Issue #234 · python/steering-council · GitHub), and we have some concerns. This is partly with my RM hat on, having seen the kinds of oversights we've had to fix late in the 3.12.0 release cycle and as backported fixes. It's understandable, given the kind of changes that are required, their scope, and the limited experience (and limited number of eyeballs) involved, but it does worry me. On top of that we have a completely new API that we have basically no experience with, even though it's somewhat similar to other solutions we've seen (like concurrent.futures).

How invasive is the PEP 734 implementation in the interpreter? To what extent could it be released as a PyPI package first, before we consider it for inclusion in the standard library? If it can’t entirely be a separate package, could we make a private module with the minimum number of hooks for such a PyPI package to work? Having it be a separate package would make it much easier to evolve the API and, for example, fix problematic semantics, since you can release at will and users can always pin to an older version of the package while they make their code work with the newer version.

Also, the SC would like to invite you to come talk to us about the PEP, either at the regularly scheduled office hour, or if that doesn’t work for you, we can schedule something separately.

8 Likes

Thanks for the update. I appreciate your thoughtfulness.

Most of those fixes related to subinterpreters generally, especially isolated interpreters. The effect of PEP 734 is primarily that it makes the existing feature more available and thus exposes existing flaws. What is your concern with this PEP specifically?

At its core, PEP 734 has only a few parts:

  • expose the existing C-API (which has been around a long time) via a thin wrapping abstraction
  • provide a simple, safe way for one interpreter to run code in another (similarly a thin layer on top of the existing C-API)
  • provide a basic mechanism for safely sending data from one interpreter to another

(There’s also a small amount of “sugar” on top of that.)
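
Concretely, those three parts fit together roughly like the minimal sketch below. Interpreter.exec and the queue are as described in the PEP; the other spellings (create(), create_queue(), prepare_main()) follow the current draft and may still shift, so treat them as illustrative.

    # Illustrative only: spellings follow the PEP draft and may change.
    import interpreters

    interp = interpreters.create()        # part 1: thin wrapper over the existing C-API
    queue = interpreters.create_queue()   # part 3: a queue.Queue-like cross-interpreter queue

    interp.prepare_main(queue=queue)      # bind the (shareable) queue into the target's __main__
    interp.exec("queue.put('hello from the subinterpreter')")   # part 2: run code over there

    print(queue.get())                    # -> 'hello from the subinterpreter'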

I’d consider the first part to be uncontroversial. The second part is focused and simple.

The third part is fairly important because subinterpreters aren’t nearly as useful without a way to communicate between them. That became clear almost immediately in the early PEP 554 discussions. PEP 734 presents an API that is almost identical to the existing queue.Queue, so there shouldn’t be any real surprises for users.

From my perspective, the PEP 734 API presents very little risk. My guiding principle from the beginning has been to present a minimal API on which we could build. Based on the many discussions this PEP has had, and on the practical experience that I and others have gained with the implemented API, I'm confident it is a good starting point.

To me, that’s the key thing here: we want a solid foundation in the stdlib on which people may start using multiple interpreters in their programs. We can build from there as appropriate.

There are many things I've implemented that were mostly inspired by the needs of PEP 734 or by my experiences with it, but most (or nearly all?) of them are valuable on their own. They've certainly helped us identify flaws in CPython over the last five years.

When it comes to invasiveness, I’ve been careful to keep a lot of the relevant code isolated to certain files. Nearly all the C-API I’ve added is strictly limited to the internal API (Include/internal/pycore_*.h).

If you were to ask what we would rip out if PEP 734 were rejected, I'm not sure that I'd put anything on that list. At the very least, nearly all of it is valuable for testing subinterpreters.

To put it a different way, there are only a handful of things in the repo that are currently only used by the PEP 734 implementation (and they are almost all consolidated in specific files). That includes only one piece of runtime state: the internal “cross-interpreter data registry”, which I’d already like to replace with a new type slot (via a separate PEP) rather than adding a public API.

Early on I worked hard to implement PEP 734 in a way that I could publish it on PyPI. In fact, my plan has been to publish such a package for use with 3.12. The same could be done for 3.13, though I think it’s better suited for the stdlib.

My main concern with publishing a PyPI module is that the feature would get less exposure and be less accessible. I'm certainly biased here, but I'm convinced the multiple interpreters feature is a great benefit to users and want it in their hands with the least friction possible. Again, I'm confident that the minimal API provided by PEP 734 is the right place to start.

That minimal module is basically what PEP 734 specifies. :smile: We could certainly move the PEP 734 implementation to a PyPI package, either by building on the necessary existing internal C-API or via what is currently the _xxsubinterpreters module. (That said, I don't see the value in doing so over adding the new stdlib module.)

I'll join the office hours this week.

7 Likes

I think this is true for the proposed semantics, but I’m not 100% convinced it is true for the exact naming choices.

Specifically, given the PEP (and presumably documentation) terminology focuses on “shareable objects” (vs arbitrary objects), the name of the syncobj flag on interpreters.Queue objects seems odd. It feels like share_data would better describe the behaviour being requested.
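
For concreteness, a small sketch of the two spellings; the put() signature and exactly where the flag lives are from my reading of the PEP draft, so treat the details as illustrative rather than settled.

    # Illustrative only: signature and flag placement per my reading of the PEP draft.
    import interpreters

    queue = interpreters.create_queue()
    data = b"payload"                   # a "shareable" object
    queue.put(data, syncobj=True)       # current spelling in the PEP
    queue.put(data, share_data=True)    # suggested spelling, describing the behaviour requested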

As for the idea of operating on PyPI for a release cycle, the natural API split point would presumably be to keep the _interpreters module in the standard library and put interpreters on PyPI (including InterpreterPoolExecutor).
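
A hypothetical layout for that split might look roughly like this; the internal helper names are purely illustrative, not a proposal.

    # interpreters/__init__.py in a hypothetical PyPI package,
    # sitting on top of a private _interpreters module in the stdlib.
    from _interpreters import *          # low-level API kept in CPython

    from ._highlevel import (            # the parts that can keep evolving on PyPI
        Queue,
        InterpreterPoolExecutor,
    )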

The main advantage I see to this more conservative approach is that it would allow some of the open questions in the PEP to be deferred until the module’s promotion to the standard library (whether that comes next release or later), specifically:

  • default behaviour of Queue objects when the interpreter adding the object to the queue goes away before the object is retrieved. The current default feels prone to “errors passing silently without being explicitly silenced” to me, akin to the original handling of async tasks that never get scheduled.
  • exact API design for InterpreterPoolExecutor (in particular, how the new interpreters are configured, and how the pools support execution of configuration code in a way that ensures every interpreter in each pool is configured exactly once). This will presumably be similar to ThreadPoolExecutor and ProcessPoolExecutor, but I'm not sure the existing initializer API will quite be sufficient for InterpreterPoolExecutor (see the sketch after this list)
  • whether some improved ergonomics are feasible for cross-interpreter exception handling (e.g. naming a parameterless context manager as a dotted string when calling Interpreter.exec, so the actual execution in the other interpreter runs inside a with statement using that context manager)
  • building out a list of other not-yet-shareable object types where it would be genuinely helpful to be able to share them
  • ensuring the data buffer sharing works as expected with other data buffer exporters (NumPy, etc)
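
To make the initializer point above concrete: today's executors take a callable plus arguments that each worker runs once on startup, but arbitrary callables can't cross interpreters, so an interpreter pool may need a different spelling. The sketch below shows only the existing ThreadPoolExecutor pattern plus a purely hypothetical alternative; none of the interpreter-specific parts are from the PEP.

    import logging
    from concurrent.futures import ThreadPoolExecutor

    def _init_worker(level):
        # Runs exactly once in each worker thread when it starts.
        logging.basicConfig(level=level)

    pool = ThreadPoolExecutor(max_workers=4,
                              initializer=_init_worker,
                              initargs=(logging.INFO,))

    # Purely hypothetical spelling for an interpreter pool (not from the PEP):
    # each worker interpreter would run this source exactly once before
    # accepting tasks, since a callable defined here can't be shared directly.
    # pool = InterpreterPoolExecutor(
    #     max_workers=4,
    #     initializer="import logging; logging.basicConfig()",
    # )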

While the initial level of adoption will definitely be lower than for a generally available standard library module, I think the adoption you'll see will come from exactly the people you most want at this stage of feature publication: folks who aren't happy with the trade-offs between threads and processes and are actively looking for something that strikes a middle ground, combining low-overhead data sharing with strong default data separation.

6 Likes

Hi Eric,

Thank you again for coming to the Steering Council’s office hours to discuss PEP 734. We all found it very helpful to talk about the PEP with you in real time.

After much discussion, the Steering Council thinks that the best way forward with this module is to maintain it separately for now, release it on PyPI, and let it mature there for a while before including it in the stdlib. There are several reasons leading us to this decision.

We think the API needs more real-world usage before it can be deemed stable. It may indeed be the best API available, but without some maturation on PyPI, we can't really know for sure. With the module being independently developed and released, its API can evolve much more quickly than it could once the module is in the stdlib. It will also not have to adhere to the strict backward compatibility and deprecation rules of the stdlib. The Steering Council thinks this is a good policy in general for new stdlib packages, and plans to propose it as "standard operating procedure" for most new packages.

You expressed a concern about the maintenance burden of a PyPI release, but we think that's solvable. You should be able to recruit co-maintainers, either from the current cohort of core developers or from users of the subinterpreters package. We're also confident that we can get help with setting up any automation needed for testing and releases. Developing and releasing it independently for now shouldn't be much more of a burden than it would be in the stdlib.

We are going to mark the PEP as Deferred, and we can always reevaluate stdlib inclusion for a future version of Python.

Cheers,
-Barry (on behalf of the Steering Council)

6 Likes

My understanding is that this would mean Eric can add the API as private but exposed (e.g. a _subinterpreters module) and then use a PyPI package to expose it? That allows some freedom to change it in a subsequent release, but the PyPI package probably wouldn't amount to much more than from _subinterpreters import * (and InterpreterPoolExecutor, which I agree could live outside of the stdlib).

In other words, it’s “provisional” except we don’t mark stuff as provisional anymore so it’s just “private”.

The whole concept is based around exporting internal functionality, so it’s not like the API can be separated from the interpreter. Behaviour changes have to occur within the runtime, not within the module, and the interesting development is all at a level higher than proposed here (again, except InterpreterPoolExecutor). Right now, ctypes would be needed to access our internal C APIs to get the same behaviour, and I don’t think there’s really any way to do the synchronisation needed to actually make that work.

I'm sure Eric has presented this analogy, but this module is essentially exposing os.fork so that third parties can develop multiprocessing outside of the stdlib. If the SC is okay with exposing os._fork for now, then I'm sure that'll be workable, but if the SC is going to be surprised at the new internal APIs showing up with no stdlib users, it'd be good to clear that up sooner rather than later.

2 Likes

Thanks for taking the time to consider the PEP and for being clear about the position of the Steering Council. While I don't agree with the decision and would have liked more direct discussion with the Steering Council, I'm on board and ready to move forward.

FWIW, I do see the point you (and Alyssa) have made about uncertainty with the design choices (e.g. names), regardless of the scale of the proposal. I also agree that community feedback from an implementation of the PEP on PyPI has a good chance of identifying potential improvements.

(Personally, I consider the advantage of exposure in the stdlib to outweigh the risk of having to tweak the API in later versions, given the small proposed surface area. That said, I'll readily admit that my perspective is skewed toward the value I anticipate the new module will add for Python users, making it hard for me to tell if I'm weighing things fairly.)

Here’s my plan for 3.13:

  • wrap up the various 3.13 fixes I have in flight (or planned)
  • rename the _xxsubinterpreters module to _interpreters
  • work on a PyPI package (3.12/3.13) that uses _interpreters (still keeping it minimal)
  • look for collaborators

There are alternatives for the second point: expose a bunch of the necessary internal C-API in the public API, or use the internal C-API directly (i.e. with Py_BUILD_CORE). However, I'd much prefer using the low-level _interpreters module, as it makes a lot of things simpler.
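
For the PyPI package itself, the low-level dependency could be handled with a small compatibility shim, roughly like the sketch below; the 3.12 fallback uses the current private module name, and the wrapper function is only illustrative.

    # Hypothetical compatibility shim inside the PyPI package.
    try:
        import _interpreters                         # 3.13+, after the planned rename
    except ImportError:
        import _xxsubinterpreters as _interpreters   # 3.12, the current private module

    def list_all():
        # Thin, illustrative wrapper; the real package would build the full
        # PEP 734 surface (create(), Interpreter, Queue, ...) on top of this.
        return _interpreters.list_all()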

9 Likes

Also, I have some feedback for the Steering Council on how things have played out with this PEP, concerning the level of interaction and the clarity of the decision. I'm not interested in complaining or venting; rather, I want to identify some key good and bad parts of my experience here, in the hope that it helps the Steering Council and the community. (I'd also like to see what we can do to better support the Steering Council, who are volunteers like the rest of us but have a distinct (and challenging) role as gatekeepers.)

Where would be the best place to start such a discussion?

5 Likes

That seems like a good plan to me [1]. Keep in mind too that the PEP is deferred so you can definitely come back and ask for a re-evaluation for a future Python release.


  1. wearing my core dev hat, not necessarily SC member hat ↩︎

3 Likes

Speaking with my SC member hat on, I certainly would welcome feedback. Maybe start with another SC office hours session? Or, if you'd rather write down your thoughts first, an email to the Steering Council would work too.

3 Likes