Add a #define UNSTABLE_ABI for tools and extensions like Cython and NumPy

Cython and NumPy like to access the internals of CPython data structures and make assumptions about the semantics of struct fields.
This is problematic for (at least) two reasons:

  • We can’t change anything for fear of breaking 3rd party code
  • We break 3rd party code, despite being careful, because our understanding of the semantics of C structs differs from Cython’s/NumPy’s

Therefore, I propose adding a set of static inline C functions to control access to these structs.
These would be guarded by #ifdef UNSTABLE_ABI.

E.g.

#ifdef UNSTABLE_ABI
static inline int _PyLong_IsNegative(PyLongObject *l)
{
    /* the sign lives in the size field: negative ints have ob_size < 0 */
    return l->long_value.ob_size < 0;
}
#endif
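
For example (a sketch; obj and the surrounding code are illustrative), extension code would then call the accessor instead of reaching into the struct itself:

if (_PyLong_IsNegative((PyLongObject *)obj)) {
    /* handle the negative int */
}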

Initially, most (or all) of these functions won’t have normal API equivalents, to avoid making the C API even larger than it already is.
We could add slower, stable versions if there is demand for them.
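
A sketch of what such a slower, stable version could look like (the exact declaration and its placement in Objects/longobject.c are assumptions): the body is compiled into CPython itself, so extensions see only an ordinary function call whose ABI survives changes to the struct layout:

/* header, when UNSTABLE_ABI is not defined */
PyAPI_FUNC(int) _PyLong_IsNegative(PyLongObject *l);

/* Objects/longobject.c */
int
_PyLong_IsNegative(PyLongObject *l)
{
    return l->long_value.ob_size < 0;
}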

When we change the internals of PyLongObject again, there would be no need for Cython/NumPy to change their code, although their extensions would need to be recompiled.
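
To illustrate why (the tag field and mask names below are invented for this sketch, not an actual layout): if the sign later moved out of ob_size into a tag word, only the accessor’s body would change. Extension source stays the same, but the inline body already baked into each compiled extension would be stale, hence the recompile:

#ifdef UNSTABLE_ABI
static inline int _PyLong_IsNegative(PyLongObject *l)
{
    /* hypothetical new representation: sign bits stored in a tag word */
    return (l->long_value.lv_tag & _PyLong_SIGN_MASK) == _PyLong_SIGN_NEGATIVE;
}
#endif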

This is a limited solution to a limited problem. For a more general solution, see PEP 689 -- Unstable C API tier

Could we add these as a new header file that has to be included explicitly?

How is this different from the unstable API tier?


We already #include longintrepr.h in Python.h, so all code already has access to the internals.
Adding a new file makes the new functions less discoverable.

This is for a limited use case that we need to fix really soon.

So, the only difference is timing?
The unstable API PR is up for review: gh-101101: Unstable C API tier (PEP 689) by encukou · Pull Request #101102 · python/cpython · GitHub
It’s missing the Devguide docs, but that should be fine here.

TBH I think having different APIs is, in general, a bad idea.
Multiple ABIs, sure; that trades performance for portability.
But there should only be one API, IMO.


Maybe we should narrow the scope here.
Let’s change UNSTABLE_ABI to UNSTABLE_PYLONG_ABI, as it is access to PyLongObject that we need to fix.

And yet, here you’re suggesting a new, exclusive API. What am I missing?


Why make these internal APIs (i.e., with a leading underscore) instead of public ones?

IMO, having a rich Python C API solves most of the “hiding away internals” problem in the most effective way. If extensions don’t need to reach for internal struct fields to have a fast way to determine, e.g., whether a Python int is negative, we’d come closer to having an abstraction layer between core Python and the huge set of Python extensions, allowing the internals to move forward independently of the extensions’ use of the API.
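
For instance, a public, stable function along these lines (the name and error convention here are hypothetical, not an existing CPython API) would let extensions ask the question without knowing the layout:

/* 1 if op is a negative int, 0 if not,
   -1 with an exception set if op is not an int */
PyAPI_FUNC(int) PyLong_IsNegative(PyObject *op);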

Long term, there is probably no reason not to have these functions as part of the API.
But, for 3.12 at least, we want to keep these semi-private until we are sure that we have the API right.

That’s what the unstable tier is for. PEP 689 even has this example:

if PEP 590’s “provisional” _PyObject_Vectorcall was added today, it would be initially named PyUnstable_Object_Vectorcall

That’s an unstable API, which is different from this use case (an unstable ABI).

We can have a stable API built on a stable or an unstable ABI, depending on compile-time flags.
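
A sketch of how that could look in one header (assuming a stable out-of-line variant like the one above exists): the same name resolves either to an inline struct accessor or to an exported function, selected at compile time:

#ifdef UNSTABLE_ABI
/* fast: the field access is inlined into the extension, so it is ABI-sensitive */
static inline int _PyLong_IsNegative(PyLongObject *l)
{
    return l->long_value.ob_size < 0;
}
#else
/* slower: an ordinary call into libpython, ABI-stable across layout changes */
PyAPI_FUNC(int) _PyLong_IsNegative(PyLongObject *l);
#endif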

I was referring to this kind of “semi-private”, provisional API.

I guess there was a misunderstanding somewhere. What are the stability expectations you need?
I thought it’s:

  • ABI can change up to the first release candidate, then is stable in patch releases
  • API can change up to the first release candidate, then is stable in patch releases

That’s the regular ABI and unstable API. What should be different here?