Clarification on PEP 734 subinterpreters with embedded C API

freakboy3742 · February 14, 2024, 3:59am

I’m currently elbow-deep in landing the patches for iOS support, and I think I’ve hit an edge case in either the definition or the implementation of PEP 734 (the new subinterpreters PEP).

PEP 734 says:

When a Python process starts, it creates a single interpreter state (the “main” interpreter) with a single thread state for the current OS thread. The Python runtime is then initialized using them.

That makes sense at a high level; however, in practice it’s unclear to me how and where this initialization is meant to occur.

The _PyInterpreterState_SetRunningMain() function that sets up the main subinterpreter is currently invoked in pymain_run_python(). If you’re using a standard CPython interpreter, this works great - the main interpreter state is set just before the user’s code is imported and executed, and cleaned up afterwards.

However, an iOS app doesn’t use pymain_run_python() - as there’s no command-line experience, an iOS app is effectively a custom executable that embeds a CPython interpreter using the C API.

As a result, when I run the test__xxsubinterpreters test case on iOS, all the tests pass except one - IsRunningTests.test_main() - because the main subinterpreter state hasn’t been initialized. The problem is that, unless I’m missing something, there’s no public API to do this. There’s a public API to initialize new subinterpreters, but not the main subinterpreter.
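The asymmetry is easier to see in a toy Python model of the two start-up paths. This is not CPython code: Runtime, InterpreterState, run_like_python_exe(), run_like_embedder() and main_is_running() are hypothetical names, and the boolean flag merely stands in for the internal state that _PyInterpreterState_SetRunningMain() manages.

```python
# Toy model only: none of these names exist in CPython.


class InterpreterState:
    """Stand-in for a PyInterpreterState with a 'running main' flag."""

    def __init__(self, interp_id):
        self.id = interp_id
        self.running_main = False


class Runtime:
    def __init__(self):
        self.main = None

    def initialize(self):
        # Like Py_Initialize(): create the main interpreter,
        # but do NOT mark it as running.
        self.main = InterpreterState(0)


def run_like_python_exe(runtime, user_code):
    # Like pymain_run_python(): flag the main interpreter as running
    # only while the user's code executes, then clear the flag.
    runtime.main.running_main = True
    try:
        return user_code(runtime)
    finally:
        runtime.main.running_main = False


def run_like_embedder(runtime, user_code):
    # Like an app that calls PyRun_SimpleString() after Py_Initialize():
    # nothing ever sets the flag.
    return user_code(runtime)


def main_is_running(runtime):
    # Models what interpreters.is_running(interpreters.get_main()) reports.
    return runtime.main.running_main


rt = Runtime()
rt.initialize()
print(run_like_python_exe(rt, main_is_running))  # True
print(run_like_embedder(rt, main_is_running))    # False
```

The same user code gets a different answer depending only on which start-up path ran it, which is exactly the test__xxsubinterpreters failure described above.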

Although I’m hitting this as part of the iOS work, I believe the same problem will exist with any embedded CPython usage. Any of the example code in the CPython embedding guide would report that the main subinterpreter is uninitialised.

It seems to me that one of the following is true:

1. _PyInterpreterState_SetRunningMain() should be invoked as a side effect of calling Py_Initialize()/Py_InitializeFromConfig(), and cleared in Py_Finalize().
2. There’s a missing C API endpoint to initialize the main interpreter.
3. An iOS app should be calling private APIs to initialize the main interpreter (essentially treating the iOS app as an alternate Py_BUILD_CORE target).
4. An embedded CPython interpreter is a special case that doesn’t have a main subinterpreter for some reason.

My best guess is that (1) is the most appropriate response - but I’m not sure if that refactoring would have other unintended consequences (or, for that matter, where in the interpreter initialisation process would be appropriate).
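Option (1) could look roughly like the following toy model, in which initialization sets the flag and finalization clears it, so an embedder gets the right answer without any extra calls. This is a hypothetical Python sketch of the control flow, not a patch; Runtime, initialize() and finalize() merely stand in for Py_Initialize()/Py_InitializeFromConfig() and Py_Finalize().

```python
# Toy model of option (1); names are hypothetical, not CPython APIs.


class Runtime:
    def __init__(self):
        self.main_created = False
        self.main_running = False

    def initialize(self):
        # Stand-in for Py_Initialize()/Py_InitializeFromConfig():
        # create the main interpreter and mark it running as a side effect.
        self.main_created = True
        self.main_running = True

    def finalize(self):
        # Stand-in for Py_Finalize(): clear the flag, then tear down.
        self.main_running = False
        self.main_created = False


rt = Runtime()
rt.initialize()
# Code run through the PyRun_* family now sees the flag set
# without the embedder calling any private API.
print(rt.main_running)  # True
rt.finalize()
print(rt.main_running)  # False
```

One trade-off of this approach is that "initialized" and "currently running code in __main__" become the same state, so the flag can no longer say whether anything is actually using the main interpreter's __main__ module.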

Have I missed something obvious here? If (1) is the right approach, any tips on where the initialisation should occur? I’m happy to work up a patch - I just need to be pointed in the right direction.

pitrou · February 14, 2024, 8:47am

Unless something changed recently, Py_Initialize() creates the main interpreter and activates its main thread for you. This is why the example program in the embedding documentation is able to run Python code immediately after Py_Initialize() returns successfully.

freakboy3742 · February 14, 2024, 9:11am

I agree that Py_Initialize() creates the main interpreter, and Python code is able to execute immediately afterwards.

My query is entirely about the subinterpreter API. As currently implemented, if the Python string that is executed by that example is altered to read:

import _xxsubinterpreters as interpreters
print(interpreters.is_running(interpreters.get_main()))

the output will be False, because _PyInterpreterState_SetRunningMain() is not invoked as part of Py_Initialize() - it’s only invoked as part of pymain_run_python().

pitrou · February 14, 2024, 10:25am

Ah, sorry for the misunderstanding. I guess that’s a bug? @eric.snow

steve.dower · February 14, 2024, 11:30am

It’s a bug right now, and also partway through a redesign. So whatever change occurs to fix things now, I’d expect the highest compatibility with existing code followed by immediate deprecation so that we can change it to a better overall structure.

(Eric and I are both keen to see initialisation become a process of “init runtime → create interpreter → run code” rather than the current “preinit runtime → init runtime → run code”. So eventually we hope that embedders will create the main interpreter themselves and mark it as such, assuming we can’t just entirely remove the main/sub distinction.)
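The "init runtime → create interpreter → run code" sequence might look something like this from an embedder's point of view. This is purely a hypothetical sketch: no such API exists yet, and Runtime, Interpreter, create_interpreter(), mark_main() and run() are invented names used only to illustrate the ordering and who decides which interpreter is "main".

```python
# Hypothetical staged lifecycle; none of these names are real CPython APIs.


class Interpreter:
    def __init__(self, interp_id):
        self.id = interp_id
        self.is_main = False
        self.namespace = {}  # stands in for the interpreter's __main__ dict

    def run(self, source):
        # Step 3: run code in this interpreter's __main__ namespace.
        exec(source, self.namespace)


class Runtime:
    def __init__(self):
        # Step 1: initialize the runtime; no interpreter exists yet.
        self.interpreters = []

    def create_interpreter(self):
        # Step 2: the embedder creates an interpreter explicitly.
        interp = Interpreter(len(self.interpreters))
        self.interpreters.append(interp)
        return interp

    def mark_main(self, interp):
        # The embedder, not the runtime, decides which interpreter is "main".
        for other in self.interpreters:
            other.is_main = other is interp


rt = Runtime()
main = rt.create_interpreter()
rt.mark_main(main)
main.run("x = 6 * 7")
print(main.is_main, main.namespace["x"])  # True 42
```

Making the "main" designation an explicit embedder step is one way to avoid the implicit coupling between Py_Main() and runtime initialization; the real design may differ or drop the main/sub distinction entirely, as noted above.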

eric.snow · February 14, 2024, 5:52pm

tl;dr Yeah, we need to fix something here. I’ll work on a short-term solution right away.

Thanks for pointing this case out, @freakboy3742! It’s definitely a bug, one way or another.

Have you opened an issue for this? If not, I’d be glad to open one.

_PyInterpreterState_SetRunningMain() is not part of initialization. Instead, it’s the means by which the application indicates that it’s taking control of an interpreter’s __main__ module, typically to run code in it. That’s happening after initialization. Sorry for how overloaded the word “main” is here! (Also for how tangled and convoluted Py_Main() and initialization still are!)

Anyway, the key word in the function name is actually “running”.

That test should definitely not be failing. The failure indicates that this “running”-tracking feature is incomplete, whether in design or implementation or both. It doesn’t matter that the feature is currently meant to be internal-only.

The easy thing would be to skip the test under embedded applications, and that might be an appropriate short-term fix. However, I’m not comfortable with that long-term (plus the test would be needed unconditionally if PEP 734 is accepted).

Ultimately, at least one of the following needs to happen:

always assume the main interpreter is running (in the main thread)
make calls to the “running”-tracking API implicit in calls to the PyRun_*() family (and similar)
infer that the “running”-tracking API should have been called in certain situations
make it public API
stop tracking if an interpreter is “running” (i.e. drop the API and the related state)
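The second option (implicit tracking) can be illustrated with a small Python decorator that marks the interpreter as running for the duration of each "run" call and restores the previous state afterwards. This is a hypothetical model: tracks_running, Interp and run_simple_string() stand in for C-level machinery and the PyRun_*() functions; they are not real APIs.

```python
import functools

# Hypothetical model of implicit "running" tracking; not CPython code.


class Interp:
    def __init__(self):
        self.running_main = False
        self.namespace = {}  # stands in for __main__.__dict__


def tracks_running(func):
    """Mark the interpreter as running while a PyRun_*-like call executes."""

    @functools.wraps(func)
    def wrapper(interp, *args, **kwargs):
        previous = interp.running_main
        interp.running_main = True
        try:
            return func(interp, *args, **kwargs)
        finally:
            # Restore rather than clear, so nested run calls stay consistent.
            interp.running_main = previous

    return wrapper


@tracks_running
def run_simple_string(interp, source):
    # Stand-in for PyRun_SimpleString(): execute source in __main__.
    exec(source, interp.namespace)


interp = Interp()
interp.namespace["interp"] = interp
run_simple_string(interp, "seen = interp.running_main")
print(interp.namespace["seen"], interp.running_main)  # True False
```

Restoring the previous value instead of unconditionally clearing it is one way to cope with nested calls; whether nesting should be allowed at all is left open in this sketch.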

FYI, my motivation for _PyInterpreterState_SetRunningMain(), etc. is to facilitate the Interpreter.is_running() method in PEP 734. Perhaps there are other cases that would benefit, but I haven’t made any effort to find them. Clearly I didn’t consider embedders as I focused on what I needed.

Making the API public is probably the best solution long-term. However, I’d rather not give embedders one more thing they have to do. If there’s any way we can avoid that then I think we should, though we’d still have the public API to cover any gaps.

Note that _PyInterpreterState_SetRunningMain() is currently only used in tests for a module that itself is only meant for use in the test suite (though that module is also helpful for people interested in subinterpreters, and it may become the PEP 734 implementation if the PEP is accepted).

eric.snow · February 14, 2024, 6:02pm

As an aside, yours is an excellent example contrasting how we (CPython) embed the runtime vs. how others embed it. There is still too much (non-zero) overlap between Py_Main() and runtime init. It’s all a bit tangled. That’s definitely part of the ongoing effort to which @steve.dower referred. Examples like yours are very helpful in clarifying where we’re deficient, so thank you!

eric.snow · February 14, 2024, 6:52pm

I opened python/cpython issue #115482: “`test.test_interpreters.test_api.TestInterpreterIsRunning.test_main` Fails in Embedded App”.

freakboy3742 · February 14, 2024, 10:56pm

I hadn’t got as far as an issue, as I wasn’t 100% certain it wasn’t an error of usage on my part.

For posterity - the PR you’ve submitted fixes the problem in my testing. Thanks for the fast turnaround on that one.

That’s actually the reason I hadn’t noticed the problem until now - I had the _xxsubinterpreters module commented out of my iOS patch on the basis that the name suggested it was an optional testing mechanism. I started seeing the failure when I re-introduced the module yesterday.

FWIW - the overall “init, create, run” workflow that Steve described makes sense to me. Happy to provide feedback on any API designs from the perspective of an “alternate python.exe implementer” when it gets to that point.