PEP 788: PyInterpreterRef: Interpreter References in the C API

I think Steve made a good point that we should consider supporting runtime reinitialization someday, so persistence isn’t a property that holds up well, even if we specify that it should return the “main” interpreter.

Perhaps we should expose an API that returns a weak reference for the main/default interpreter instead? That would allow things to fail once the (current) runtime has finalized.

I like that too!

2 Likes

People will use what works for them, even if the behaviour isn’t sound in all cases.

That’s the same as a fallible function returning a strong reference.

1 Like

I meant that they would store the weak reference instead of calling PyUnstable_GetDefaultInterpreterRef every time they wanted to call the main interpreter. But maybe that’s asking too much of users to be practical.

Can we invert it: PyInterpreterRef_IsValid?

1 Like

That was brought up before; I like IsNull or IsError better.

An IsValid function also ever-so-slightly complicates the usage, because you have to write !IsValid instead of IsNull.

If interpreter references need to be verified opaquely, I think the original design (e.g. int PyInterpreterRef_FromCurrent(&ref)) will be more user-friendly and efficient.

I dislike this API. I prefer to return an int and never create invalid/null references. Something like int PyInterpreterRef_Get(PyInterpreterRef *ref). It avoids having to add an IsNull/IsError/IsValid function and having to bother with invalid/uninitialized references.

I would prefer to keep the InterpreterRef domain first in the name: PyUnstable_InterpreterRef_GetMain(). That makes it easier to guess the return type, and functions are grouped by InterpreterRef.

I have no preference between GetMain(), GetDefault() or whatever else :slight_smile:

1 Like

We’re going in circles now :frowning:

I’d also be fine with == 0 or == PyInterpreterRef_NULL instead of an IsNull function. It’s easy to use and probably won’t cause problems with people comparing references.

I think Victor’s point relates to functions that accept PyInterpreterRef parameters: do they need to check their parameter for validity?

The counterpoint is whether a PyInterpreterRef struct field can be safely zero-initialised (by providing a way to detect that such references are invalid).

Providing the comparison guarantee that interpreter refs are unique while alive, but may be reused, also doesn’t seem that bad to me (since those are the semantics of object IDs in general).

Edit: that was a bit cryptic. To be clear, I’m also in favour of allowing comparisons in order to be able to say that zero initialising ref fields is OK.

1 Like

Same here. I’d love to avoid creating invalid references.
@steve.dower, how strongly do you feel about this?

1 Like

Don’t we also have a rule about always initializing output parameters? What does it get initialized to on failure if not an invalid reference?

While it’s quaint to pretend that the type is actually fully opaque, we’re using C. It’s going to be a pattern of bits, which means “all zeros” is going to be a possible value. Allowing native NULL checks is a really easy way to make everything work very naturally, especially across different languages.

Define it as typedef void *PyInterpreterRef and cast it to a private struct in the couple of functions that know its contents. It’s totally okay for users to know it’s a pointer-sized int and that zero is not valid, but not have any way to interpret the value other than to pass it back into the relevant functions. That’s incredibly normal for handle APIs.

My opposition to the int f(*ref) style is that it doesn’t solve any of the problems you seem to be saying it solves, and it adds complexity that we can very easily just avoid by using existing idioms.

2 Likes

I think the alternative designs, and their pros & cons, have been repeated enough times that they should get a Rejected Ideas (or Open Questions) section in the PEP.

1 Like

Right, we try to initialize output parameters on error to avoid undefined behavior, but we don’t document their values. From the API point of view, the output parameter (ref here) remains undefined on error.

1 Like

Sure, no need to document the behaviour, but if we don’t have any kind of sentinel value at all then we can’t guarantee anything about undefined behaviours.

Alternatively, we make 0/NULL a defined value for a PyInterpreterRef and passing it to anything is an error, and if (!ref) is how you check whether you’ve got an invalid (but defined) result.

2 Likes

If it helps, the reference implementation already uses 0 as a debug value. If you call PyInterpreterRef_Close, the passed reference is actually assigned to 0, so then trying to use it with any PyInterpreterRef APIs after that will emit a fatal error.

If PyInterpreterRef_Get(&ref) fails, ref is undefined and must not be used. We can say it like that in the doc if you prefer. It’s a common pattern in the Python C API.

The PyInterpreterRef_Close() argument is passed by value, not by reference (it’s not a pointer to a reference). How can it clear the reference in the caller? Does it use a magic macro?

Yeah, is that too much magic?

Ah, I found the code:

#define PyInterpreterRef_Close(ref) do {    \
    PyInterpreterRef_Close(ref);            \
    ref = 0;                                \
} while (0)

In the past, we had issues with macros which use their argument more than once when the argument has side effects (e.g. *ptr++). I would prefer to avoid such a macro. Also, the macro cannot be used from Rust and other programming languages, which leads to different behavior depending on the caller.

3 Likes

Yeah, too magic. Close APIs are pretty standard, and the value will be ignored by the user afterwards.

We’re talking about different things because of a disagreement, but not talking about the disagreement itself. I don’t disagree with anything that you’re saying, but it’s like you keep changing the subject.

There’s no need or benefit to using an output parameter here at all. If the returned type is zero/false/NULL (all the same in C), then it failed, otherwise the return value is the reference. Just like how we do everywhere else. The advantage of this is we implicitly get defined behaviour when a user ignores the return value:

// With output parameter
PyInterpreterRef ref;
PyInterpreterRef_Get(&ref); // ignored return value - `ref` is uninitialized/undefined
PyInterpreterRef_Close(ref); // no defined value of `ref` here, so we assume it's valid and then crash

// With return value
PyInterpreterRef ref = PyInterpreterRef_Get();
// didn't check `if (!ref) goto error;`
PyInterpreterRef_Close(ref); // `ref` is NULL, so we can ignore it/fail appropriately

Obviously we can make the first one safe by defining NULL/0 as a defined (but invalid) value, and then assigning it in the error case (without documenting that it’s part of the API, as you suggested).

Or we can just make it the error case and then everything else works naturally. Simpler for all.


To expand on my preference for the return value: the deciding factor is how frequently the reference will be allocated and immediately assigned. If that’s the common case (as in my examples above), then initializing followed by a test (my second example) is better. If it’s more common to allocate the reference separately from where it’s initialized (e.g. in a malloc’d struct), and especially if getting its value is likely to be part of a chain of operations where boolean results can be combined and short-circuiting used, then a boolean result is likely to be equivalent or better.

In this case, it seems most likely that interpreter references will be initialized at definition time. Failure likely means that Python exceptions aren’t available at all, while the subsequent operation is probably going to produce a Python exception, so chaining operations is unlikely. I therefore think we should go with the nullable PyInterpreterRef return value rather than output arguments.

2 Likes

Could someone lock this? The new thread is at PEP 788: Protecting the C API from Interpreter Finalization.

2 Likes