I have nothing else to say, except that we’d really like to use this in PyArrow where we’ve had several reports of crashes or hangs due to calling into Python too late at process shutdown.
Belatedly catching up on this PEP after @encukou mentioned it in an old issue about PyGILState_Ensure()
not working in subinterpreters.
The proposed API looks good to me, and the fact that any code using PyGILState_Ensure()
would need to be updated to handle it potentially failing is a solid rationale for deprecating that API entirely in favour of the new one rather than trying to make the old API more flexible.
Adding a cross-reference to the C-API working group issue, where there has been some further discussion in the past week (cc @ZeroIntensity):
A
Just combining threads, the C API WG seems to want the following changes:
- Directly return
PyInterpreterRef
s and use0
as the error value, rather than the extra hoops with passing local variable addresses. - For future-proofing,
PyInterpreter[Weak]Ref_Dup
should be failable. - Add clarification to when
PyThreadState_Ensure
can fail. - Rename
PyInterpreter[Weak]Ref_Get
toPyInterpreter[Weak]Ref_FromCurrent
. - Move
PyInterpreterRef_Main
to the unstable API, and also rename it. I haven’t seen consensus from the WG on what to rename it to, but they seem to be leaning towardsPyUnstable_InterpreterState_GetDefaultRef
. I’m personally not a big fan of putting this under thePyInterpreterState
namespace, but I won’t argue too much. - Rename
PyInterpreterWeakRef_AsStrong
to something else, but hasn’t been agreed upon what yet. I’d personally likePyInterpreterWeakRef_Promote
.
Given that most of this is bikeshedding, I feel that this is a good sign that the general approach of the proposal is liked by the WG.
cc @steve.dower, in case you wanted to share any of your thoughts here.
Well, mostly Steve rather than the whole WG :)
I still think it would be better if references are opaque, so you can’t compare them to each other – otherwise people will do that to see if they refer to the same interpreter, no matter how much we call that an implementation detail.
Of course, this has the disadvantage of not allowing comparisons to 0, which requires the hoops.
But, if you’ve been convinced that allowing comparisons is better, I won’t veto the PEP over it.
Same here. Why not PyUnstable_InterpreterRef_FromMainInterpreter
? It should be OK to tie unstable API to implementation quirks we’d like to phase out.
The rest does feel like improvements, or bikeshedding :)
Maybe not, I can’t think of that many places where comparing a PyInterpreterRef
would be useful, even with the knowledge that interpreters can compare equal. What did you have in mind?
I think that == 0
does make usage a lot simpler. We could also add a PyInterpreterRef_NULL
or PyInterpreterRef_IsNull
to hide it:
PyInterpreterRef ref = PyInterpreterRef_FromCurrent();
if (ref == PyInterpreterRef_NULL) {
goto fail;
}
PyInterpreterRef ref2 = PyInterpreterRef_FromCurrent();
if (PyInterpreterRef_IsNull(ref2)) {
goto fail;
}
I like that too, but it’s a little long. In my update PR, I named this PyUnstable_InterpreterRef_GetMain
.
A function sounds good to me. Perhaps PyInterpreterRef_IsValid
?
The thinking here is that while we currently have the concept of a “main” interpreter, we don’t need it, and hope to one day no longer have to treat one as special.[1]
However, situations like the ones you’ve described do mean that sometimes users want “any” interpreter. Similarly, things like handling signals need somewhere to run.
So rather than embedding the concept of “the main interpreter” (which is not a concept we want to entrench), I suggested a “default” interpreter. It’s more neutral, but fundamentally means the same thing (though you can safely change a default interpreter, whereas a main interpreter cannot be changed).
And since the object is the interpreter state, then the main/default is an interpreter state, not a reference (we don’t have a “main reference”). So semantically, you’re getting a reference to the main/default interpreter state, and as such, it’s an interpreter state API.
One very concrete example of why: currently the “main” interpreter has to be first, but we might like to spin up a constrained interpreter first to evaluate the settings for the actual first interpreter (e.g. to run
getpath.py
). We briefly want the constrained one to be the default, but then want to switch it to the “real” interpreter. Currently, our implementation prevents it, and if we further entrench the current concept of the main interpreter it becomes even more prevented. ↩︎
I’d prefer IsNull
. IsValid
reads to me as checking if the interpreter is valid, not the reference.
The problem is that PyInterpreterRef_Main
only works because it’s the “main” interpreter, which is statically allocated on the runtime state. A heap-allocated default interpreter wouldn’t work well for this API because it could be deallocated concurrently.
This is why we want to be rid of it - we don’t want any statically allocated (global) state. So if this is your reason why we need to use it, I’m now a big -1 on this particular API.
The point of adding an API is to retrieve it safely, at which point you have a strong reference and so it won’t be deallocated. If we can’t do that inside our own API, then users have no hope (so we’d better make sure we can do it).
I totally agree! People should absolutely be using PyInterpreterRef_Get
instead of blindly grabbing the main/default interpreter. But, I heard complaints about how it was too much maintenance to get interpreter references into every single PyGILState_Ensure
call in some codebases. PyInterpreterRef_Main
is solely a compatibility shim for those people.
I think you’re right, my bad. I was imagining we’d just have a global PyInterpreterState *
on the runtime (which wouldn’t be thread safe), but it’s essentially a global weak reference, so I guess we could just expose something like PyInterpreterWeakRef_GetMain
instead. I’d imagine it would have to be immutable, though. It couldn’t switch to some arbitrary “default” interpreter, because an application should be able to expect that objects created after one call to GetMain
are still valid after a later call to GetMain
.
Why?
They can assume that objects created after activating a certain interpreter state will continue to be valid in that interpreter state, but there are so many ways you already can’t actually make the quoted statement true (most obvious, if the runtime finalises and restarts) that we probably should just not allow it at all.
My expectation is that the most likely reason people will want to grab any arbitrary interpreter state would be to do an action (e.g. log something), not to create a value. And if they’re at all multi-interpreter aware then that’ll involve some kind of serialization out of necessity (because the only way you can assume you’re getting a particular interpreter is to not use _Main
, since you don’t even necessarily know which interpreter started your code running).
The most we can really promise is that python.exe
(or equivalent) won’t change the default/main interpreter, but embedded runtimes might. I have no problem explicitly discouraging the use of _Main
(or whatever we name it) by making it sound unreliable - it really shouldn’t be that hard for libraries to capture the interpreter that was active when they were invoked and to keep that reference around until they’re done. Provided we can decref/dealloc an interpreter from any (native) thread, making _Main
a risky hack is fine by me.
Ok, that all makes sense to me. I’m fine with naming it GetDefault
, but I’m still not convinced about the PyInterpreterState
namespace. I feel like it’s really stretching the definition of what PyInterpreterState
should be for. Everything else regarding interpreter references will be in PyInterpreterRef
, so why not this one?
Is that something that people actually do in practice? To my knowledge, there’s a bunch of memory that’s leaked on the runtime, so I’d imagine that most embedders just want to do it once for the lifetime of the entire process
You seem to think that the interpreter reference is the concrete object, rather than being a reference to the concrete object. But that doesn’t follow - I can’t create a “new reference”, I can only create a new reference to an existing interpreter state. The object is the interpreter state, and the reference is the reference. (I know that phrasing seems a bit patronising, but it literally is in the name If you’d called them “interpreter handles” then there’d be a mix of “handle” and “reference” in all that.)
We don’t have a default/main reference; we have a default/main interpreter state. I can have multiple references to the default/main interpreter, and I can close any of them at any time without affecting it (apart from the last one). If I get a reference to the default and then close the reference, the default still exists and I can get it again (but it’ll be a new reference). That’s how you can tell that the thing that is the default is not the reference, and why getting the current default is a (static) operation on the state (or even more appropriately, as an instance method of the runtime, which is what “knows” the default, but as you’ve stated we don’t have any of those yet and you don’t want to introduce it).
It’s one of the major reasons that CPython isn’t really suitable for embedding in some kinds of apps (e.g. those with heavyweight project context, like most video editors, where ~everything shuts down between closing one project and opening the next).
So yes, we should continue to aim for the day that we can safely and completely shut down and restart the runtime, rather than actively doing things that will make it harder.
Hm, ok, I think I can get behind that. But I don’t think it should be done under PyInterpreterState
, as this seems more like a runtime operation rather than an interpreter operation. On the C API WG thread, you suggested the PyRuntime
namespace, but it seems like we put general lifecycle/runtime APIs in Py
. Would you be OK with something like Py_GetDefaultInterpreterRef
(or I guess PyUnstable_GetDefaultInterpreterRef)
?
Ok, good to know. Thanks for the context.
Either of those is fine by me. I don’t think we’d need to make it unstable (I suggested that for “main” because we want to lose “main”, but default is probably okay), but maybe @eric.snow has more thoughts about which direction this should go?
I think it’d be good for it to be unstable, regardless of whether it’s named “main” or “default”. People should be using the correct interpreter based on a call to PyInterpreterRef_FromCurrent
, not the arbitrary “default” one they got from the runtime.
PyUnstable_GetDefaultInterpreterRef
sounds like a good spelling to me.
The unstable element would be less the mechanical aspect of the API, and more the semantic aspect: in the initial iteration, the default interpreter state is fixed for an entire initialise/finalise cycle, but that may not be the case for all future versions.
Ok, seems like we have some consensus on that.
Going back to failure results for the APIs, is everyone good with PyInterpreterRef_IsNull
?
Changing semanics like that is bad for API, including unstable API.
Whatever we name this, I bet people will use it to get a persistent interpreter.
I’d prefer naming this PyUnstable_GetMainInterpreterRef
, and when we no longer have a main interpreter, removing it and adding PyUnstable_GetDefaultInterpreterRef
.
One late brainstorming idea: PyInterpreterRef_IsError