Adding C API for use in tp_traverse

Unless we’re going to make the intended API the one that doesn’t use errors or reference counting,[1] then these APIs are inevitably going to be unclear. They behave differently in a purely implementation-specific way.

It’s a shame that IronPython isn’t under development, because it’s a great implementation to use for “how would other implementations do this?” And I’m 99% sure the answer in this case is that they wouldn’t, which means this is effectively CPython-specific. At that point, being unclear or referencing implementation details is precisely what we should do.

So if it’s not because something is already locked that you can use these functions, and it’s not the world being stopped that allows you to use them… what is the condition?


  1. Hopefully this is obviously sarcastic.


The problem is that the fact that this is called within a stop-the-world pause (or locking) is mostly an unrelated implementation detail:

Right, so there’s a need for some APIs that don’t break internal CPython assumptions, and those APIs are only safe to use while those assumptions are in place (because you shouldn’t be borrowing references when other threads might release them, etc.).

So we need to give “those assumptions are in place” a name - “during GC” might be good enough - and then we need to name those APIs in a way that makes clear they are exclusively for that time, without giving the impression that they’re good all the time (which I think _GCSafe and arguably _GC do).

Perhaps _DuringGC is what it should be? Or if there’s a better name for “those assumptions”, then I’m open to it.[1]


  1. I’m designing from ignorance here, so looking to those who understand it deeper than me for actual answers. I’m just testing my own ignorant assumptions with you and looking for better questions to ask.


Let’s say that the requirement is that the functions don’t have any side effects. They can’t write to memory (outside their stack & return value). That’s a bit more strict than tp_traverse needs, but it’s what you need for tp_traverse, and it’s relatively easy to explain.
Then, _Pure seems appropriate.
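To make the “no side effects” requirement concrete, here is a minimal toy sketch in plain C. Nothing in it is CPython API: `ToyObject`, `toy_visitproc`, `toy_traverse`, `toy_count_visit` and `toy_demo_refcounts_unchanged` are made-up stand-ins, assumed only for illustration. The point is that the traverse function reads fields and forwards them to the visitor, and never allocates, raises, or touches a reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT CPython's real types. */
typedef struct ToyObject {
    long refcount;              /* must stay unchanged during traversal */
    struct ToyObject *child;    /* one owned reference, may be NULL */
} ToyObject;

typedef int (*toy_visitproc)(ToyObject *obj, void *arg);

/* Side-effect-free traverse: it only reads self's fields and hands each
 * referenced object to the visitor. No allocation, no refcount changes. */
int toy_traverse(ToyObject *self, toy_visitproc visit, void *arg)
{
    if (self->child != NULL) {
        int rc = visit(self->child, arg);
        if (rc != 0) {
            return rc;          /* propagate the visitor's result unchanged */
        }
    }
    return 0;
}

/* Example visitor: counts visited objects in caller-owned storage. */
int toy_count_visit(ToyObject *obj, void *arg)
{
    (void)obj;
    *(int *)arg += 1;
    return 0;
}

/* Hypothetical demo: returns 1 if exactly one object was visited and no
 * reference count was modified along the way. */
int toy_demo_refcounts_unchanged(void)
{
    ToyObject leaf = {1, NULL};
    ToyObject root = {1, &leaf};
    int seen = 0;
    int rc = toy_traverse(&root, toy_count_visit, &seen);
    assert(rc == 0);
    return seen == 1 && leaf.refcount == 1 && root.refcount == 1;
}
```

Any helper called from inside `toy_traverse` would have to obey the same rule, which is the property the proposed suffix is trying to name.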

Again, _Pure sounds safe to use all the time, which isn’t what we want to imply. I like it as a definition though, we just have to be clear that the side effects are essential to correct operation most of the time.

Perhaps _StackOnly or _NoHeap or _StaticMemory are ideas that might prompt a better one? (I’d happily have _ReadTheDocsBeforeUse in the name if there wasn’t always a risk that we’ll one day need another variant here and will need to distinguish them.)

Have we ruled out _UnmanagedRefs or just _Unmanaged already? Or _ThreadUnsafe?

OTOH, using a pure function is one of the more benign mistakes you can make in C. AFAICS it’s even a bit safer to use these than to skip error checking with the regular functions.

Well, any suffix like this should be a signal that something is different, and send you to the docs.

I’m fine with _GCSafe suffix. IMO it’s explicit enough. Details can be written in the documentation.

It returns a borrowed reference - we’ve been deliberately deprecating functions that do this even from the limited API because it’s apparently so bad (Py_TYPE in particular raised a lot of concerns). Why is it suddenly good enough in this case that we don’t want to discourage people from doing it whenever?

Should, but won’t. We know that people regularly confuse APIs based on their name (maybe they read the docs in the past and are misremembering now, or are just guessing based on the name). It’s worth the effort to name something well so that the most likely guess of what it does or why it’s being used is close to the intention. And we’re inventing a new concept here, so it’s even more important that we name it well.

All I’m asking for is a negative connotation on the suffix, not a positive one. Both _Pure and _GCSafe have positive connotations.


_GCRestricted?

I do think _GC should be in the suffix somehow because that feels like the most important detail. So it seems like a case of finding a second word that’s negative enough.


I think this is the most self-evident option (with reasonable length) of “don’t use outside of very specific circumstances”


+1. Side effects are a concept that is easy to explain and reason about.

Since we’re bikeshedding, how about `_NoSideEffects`?

Returning a borrowed reference isn’t necessarily bad. We’re not deprecating PyTuple_GetItem (whose only sin is not having Borrow in the name). We’ve added Py_GetConstantBorrowed.

Py_TYPE raised concerns because it can’t be deprecated – it’s extremely widely used. Instead, for free-threading there’s a rather elaborate workaround that makes it (practically) safe: assigning to the type is a stop-the-world operation. And so is MRO assignment.
So, PyType_GetBaseByToken_GCSafe is as safe as Py_TYPE; we need to be able to support it as long as we have Py_TYPE (i.e., forever).

The other ones that return a borrowed reference (PyType_GetModule_GCSafe, PyType_GetModuleByToken_GCSafe) do the GetBaseByToken operation implicitly, but then return a type’s associated module, which is an immutable field (so like PyTuple_GetItem, there’s no issue there).
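A rough sketch of why that borrowed return can be fine, again as a toy model in plain C and not the real implementation. `ToyType`, `toy_get_base_by_token_borrowed`, `toy_get_module_by_token_borrowed` and the demo function are hypothetical names, and the single `base` chain is a simplifying assumption standing in for the real MRO walk. The lookup only reads pointers, and the module field is assumed to be set once at type creation, so no reference count needs to change.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a type object; NOT CPython's PyTypeObject. */
typedef struct ToyType {
    long refcount;
    const void *token;        /* identity used for lookup */
    struct ToyType *base;     /* simplified: single base chain, not an MRO */
    void *module;             /* assumed immutable after type creation */
} ToyType;

/* Walk the base chain and return a BORROWED pointer to the matching type,
 * or NULL if no type in the chain carries the token. Reads only. */
ToyType *toy_get_base_by_token_borrowed(ToyType *type, const void *token)
{
    assert(token != NULL);
    for (ToyType *t = type; t != NULL; t = t->base) {
        if (t->token == token) {
            return t;
        }
    }
    return NULL;
}

/* Do the base lookup implicitly, then return the found type's module as a
 * BORROWED pointer. Safe in this model only because module never changes. */
void *toy_get_module_by_token_borrowed(ToyType *type, const void *token)
{
    ToyType *found = toy_get_base_by_token_borrowed(type, token);
    return found != NULL ? found->module : NULL;
}

/* Hypothetical demo: returns 1 if lookups succeed, an unknown token yields
 * NULL, and no reference count was modified. */
int toy_demo_borrowed_lookup(void)
{
    static const int token_a = 0, token_b = 0, token_c = 0;
    int module_a = 0;
    ToyType base = {1, &token_a, NULL, &module_a};
    ToyType derived = {1, &token_b, &base, NULL};

    ToyType *found = toy_get_base_by_token_borrowed(&derived, &token_a);
    void *mod = toy_get_module_by_token_borrowed(&derived, &token_a);
    ToyType *missing = toy_get_base_by_token_borrowed(&derived, &token_c);

    return found == &base && mod == &module_a && missing == NULL
        && base.refcount == 1 && derived.refcount == 1;
}
```

If `module` or the base chain could be reassigned concurrently, the borrowed pointers would no longer be trustworthy, which is where the stop-the-world guarantees mentioned above come in.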

Do you have a favourite, from the alternatives floated here?

That has the same issue as “_Pure”: it looks like you want to use this variant.

I think I like _DuringGC best, and it strikes me as being at least as accurate as any of the others (the more you explain it, the more confident I feel to say it sounds accurate and informative enough). It certainly doesn’t make it easy to guess what it means, but I think once you’ve read up on it, it’s specific enough to be memorable.


OK, anyone against _DuringGC?

_DuringGC suffix sounds good to me.