PEP 788: PyInterpreterRef: Interpreter References in the C API

I have nothing else to say, except that we’d really like to use this in PyArrow where we’ve had several reports of crashes or hangs due to calling into Python too late at process shutdown.

3 Likes

Belatedly catching up on this PEP after @encukou mentioned it in an old issue about PyGILState_Ensure() not working in subinterpreters.

The proposed API looks good to me, and the fact that any code using PyGILState_Ensure() would need to be updated to handle it potentially failing is a solid rationale for deprecating that API entirely in favour of the new one rather than trying to make the old API more flexible.

1 Like

Adding a cross-reference to the C-API working group issue, where there has been some further discussion in the past week (cc @ZeroIntensity):

A

1 Like

Just combining threads, the C API WG seems to want the following changes:

  1. Directly return PyInterpreterRefs and use 0 as the error value, rather than the extra hoops with passing local variable addresses.
  2. For future-proofing, PyInterpreter[Weak]Ref_Dup should be failable.
  3. Add clarification to when PyThreadState_Ensure can fail.
  4. Rename PyInterpreter[Weak]Ref_Get to PyInterpreter[Weak]Ref_FromCurrent.
  5. Move PyInterpreterRef_Main to the unstable API, and also rename it. I haven’t seen consensus from the WG on what to rename it to, but they seem to be leaning towards PyUnstable_InterpreterState_GetDefaultRef. I’m personally not a big fan of putting this under the PyInterpreterState namespace, but I won’t argue too much.
  6. Rename PyInterpreterWeakRef_AsStrong to something else, but hasn’t been agreed upon what yet. I’d personally like PyInterpreterWeakRef_Promote.

Given that most of this is bikeshedding, I feel that this is a good sign that the general approach of the proposal is liked by the WG.

cc @steve.dower, in case you wanted to share any of your thoughts here.

1 Like

Well, mostly Steve rather than the whole WG :‍)

I still think it would be better if references are opaque, so you can’t compare them to each other – otherwise people will do that to see if they refer to the same interpreter, no matter how much we call that an implementation detail.
Of course, this has the disadvantage of not allowing comparisons to 0, which requires the hoops.
But, if you’ve been convinced that allowing comparisons is better, I won’t veto the PEP over it.

Same here. Why not PyUnstable_InterpreterRef_FromMainInterpreter? It should be OK to tie unstable API to implementation quirks we’d like to phase out.

The rest does feel like improvements, or bikeshedding :‍)

1 Like

Maybe not, I can’t think of that many places where comparing a PyInterpreterRef would be useful, even with the knowledge that interpreters can compare equal. What did you have in mind?

I think that == 0 does make usage a lot simpler. We could also add a PyInterpreterRef_NULL or PyInterpreterRef_IsNull to hide it:

PyInterpreterRef ref = PyInterpreterRef_FromCurrent();
if (ref == PyInterpreterRef_NULL) {
    goto fail;
}
PyInterpreterRef ref2 = PyInterpreterRef_FromCurrent();
if (PyInterpreterRef_IsNull(ref2)) {
    goto fail;
}

I like that too, but it’s a little long. In my update PR, I named this PyUnstable_InterpreterRef_GetMain.

1 Like

A function sounds good to me. Perhaps PyInterpreterRef_IsValid?

The thinking here is that while we currently have the concept of a “main” interpreter, we don’t need it, and hope to one day no longer have to treat one as special.[1]

However, situations like the ones you’ve described do mean that sometimes users want “any” interpreter. Similarly, things like handling signals need somewhere to run.

So rather than embedding the concept of “the main interpreter” (which is not a concept we want to entrench), I suggested a “default” interpreter. It’s more neutral, but fundamentally means the same thing (though you can safely change a default interpreter, whereas a main interpreter cannot be changed).

And since the object is the interpreter state, then the main/default is an interpreter state, not a reference (we don’t have a “main reference”). So semantically, you’re getting a reference to the main/default interpreter state, and as such, it’s an interpreter state API.


  1. One very concrete example of why: currently the “main” interpreter has to be first, but we might like to spin up a constrained interpreter first to evaluate the settings for the actual first interpreter (e.g. to run getpath.py). We briefly want the constrained one to be the default, but then want to switch it to the “real” interpreter. Currently, our implementation prevents it, and if we further entrench the current concept of the main interpreter it becomes even more prevented. ↩︎

2 Likes

I’d prefer IsNull. IsValid reads to me as checking if the interpreter is valid, not the reference.

The problem is that PyInterpreterRef_Main only works because it’s the “main” interpreter, which is statically allocated on the runtime state. A heap-allocated default interpreter wouldn’t work well for this API because it could be deallocated concurrently.

1 Like

This is why we want to be rid of it - we don’t want any statically allocated (global) state. So if this is your reason why we need to use it, I’m now a big -1 on this particular API.

The point of adding an API is to retrieve it safely, at which point you have a strong reference and so it won’t be deallocated. If we can’t do that inside our own API, then users have no hope (so we’d better make sure we can do it).

2 Likes

I totally agree! People should absolutely be using PyInterpreterRef_Get instead of blindly grabbing the main/default interpreter. But, I heard complaints about how it was too much maintenance to get interpreter references into every single PyGILState_Ensure call in some codebases. PyInterpreterRef_Main is solely a compatibility shim for those people.

I think you’re right, my bad. I was imagining we’d just have a global PyInterpreterState * on the runtime (which wouldn’t be thread safe), but it’s essentially a global weak reference, so I guess we could just expose something like PyInterpreterWeakRef_GetMain instead. I’d imagine it would have to be immutable, though. It couldn’t switch to some arbitrary “default” interpreter, because an application should be able to expect that objects created after one call to GetMain are still valid after a later call to GetMain.

Why?

They can assume that objects created after activating a certain interpreter state will continue to be valid in that interpreter state, but there are so many ways you already can’t actually make the quoted statement true (most obvious, if the runtime finalises and restarts) that we probably should just not allow it at all.

My expectation is that the most likely reason people will want to grab any arbitrary interpreter state would be to do an action (e.g. log something), not to create a value. And if they’re at all multi-interpreter aware then that’ll involve some kind of serialization out of necessity (because the only way you can assume you’re getting a particular interpreter is to not use _Main, since you don’t even necessarily know which interpreter started your code running).

The most we can really promise is that python.exe (or equivalent) won’t change the default/main interpreter, but embedded runtimes might. I have no problem explicitly discouraging the use of _Main (or whatever we name it) by making it sound unreliable - it really shouldn’t be that hard for libraries to capture the interpreter that was active when they were invoked and to keep that reference around until they’re done. Provided we can decref/dealloc an interpreter from any (native) thread, making _Main a risky hack is fine by me.

Ok, that all makes sense to me. I’m fine with naming it GetDefault, but I’m still not convinced about the PyInterpreterState namespace. I feel like it’s really stretching the definition of what PyInterpreterState should be for. Everything else regarding interpreter references will be in PyInterpreterRef, so why not this one?

Is that something that people actually do in practice? To my knowledge, there’s a bunch of memory that’s leaked on the runtime, so I’d imagine that most embedders just want to do it once for the lifetime of the entire process

1 Like

You seem to think that the interpreter reference is the concrete object, rather than being a reference to the concrete object. But that doesn’t follow - I can’t create a “new reference”, I can only create a new reference to an existing interpreter state. The object is the interpreter state, and the reference is the reference. (I know that phrasing seems a bit patronising, but it literally is in the name :wink: If you’d called them “interpreter handles” then there’d be a mix of “handle” and “reference” in all that.)

We don’t have a default/main reference; we have a default/main interpreter state. I can have multiple references to the default/main interpreter, and I can close any of them at any time without affecting it (apart from the last one). If I get a reference to the default and then close the reference, the default still exists and I can get it again (but it’ll be a new reference). That’s how you can tell that the thing that is the default is not the reference, and why getting the current default is a (static) operation on the state (or even more appropriately, as an instance method of the runtime, which is what “knows” the default, but as you’ve stated we don’t have any of those yet and you don’t want to introduce it).

It’s one of the major reasons that CPython isn’t really suitable for embedding in some kinds of apps (e.g. those with heavyweight project context, like most video editors, where ~everything shuts down between closing one project and opening the next).

So yes, we should continue to aim for the day that we can safely and completely shut down and restart the runtime, rather than actively doing things that will make it harder.

2 Likes

Hm, ok, I think I can get behind that. But I don’t think it should be done under PyInterpreterState, as this seems more like a runtime operation rather than an interpreter operation. On the C API WG thread, you suggested the PyRuntime namespace, but it seems like we put general lifecycle/runtime APIs in Py. Would you be OK with something like Py_GetDefaultInterpreterRef (or I guess PyUnstable_GetDefaultInterpreterRef)?

Ok, good to know. Thanks for the context.

1 Like

Either of those is fine by me. I don’t think we’d need to make it unstable (I suggested that for “main” because we want to lose “main”, but default is probably okay), but maybe @eric.snow has more thoughts about which direction this should go?

I think it’d be good for it to be unstable, regardless of whether it’s named “main” or “default”. People should be using the correct interpreter based on a call to PyInterpreterRef_FromCurrent, not the arbitrary “default” one they got from the runtime.

1 Like

PyUnstable_GetDefaultInterpreterRef sounds like a good spelling to me.

The unstable element would be less the mechanical aspect of the API, and more the semantic aspect: in the initial iteration, the default interpreter state is fixed for an entire initialise/finalise cycle, but that may not be the case for all future versions.

1 Like

Ok, seems like we have some consensus on that.

Going back to failure results for the APIs, is everyone good with PyInterpreterRef_IsNull?

1 Like

Changing semanics like that is bad for API, including unstable API.

Whatever we name this, I bet people will use it to get a persistent interpreter.
I’d prefer naming this PyUnstable_GetMainInterpreterRef, and when we no longer have a main interpreter, removing it and adding PyUnstable_GetDefaultInterpreterRef.

One late brainstorming idea: PyInterpreterRef_IsError

1 Like