Python ABIs and PEP 703

For example to allow for mutation or not, or perhaps for copy-on-write optimizations.
I’m not saying it’s good practice to rely on Python refcounting for that, btw.

Right. Checking whether an object is shared or not, by comparing the refcount to some K, sounds like you’re deep enough in the internals to want the version-specific ABI, at least.
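
To make the pattern concrete, here is a minimal sketch of a refcount-gated copy-on-write check. It uses a hypothetical `Buf` type rather than CPython's `PyObject`, and every name in it (`buf_new`, `buf_share`, `buf_make_writable`, ...) is made up for illustration. Here K is 1; in CPython the right K can depend on how many references the interpreter itself holds at the call site, which is the version-specific part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer; NOT CPython's PyObject. */
typedef struct {
    size_t refcount;   /* number of owners */
    size_t len;
    char data[16];
} Buf;

/* Create a buffer with one owner, truncating the text to fit. */
Buf *buf_new(const char *s) {
    Buf *b = calloc(1, sizeof *b);
    if (b == NULL) return NULL;
    size_t n = strlen(s);
    b->refcount = 1;
    b->len = n < sizeof b->data - 1 ? n : sizeof b->data - 1;
    memcpy(b->data, s, b->len);
    return b;
}

/* Add an owner. */
Buf *buf_share(Buf *b) { b->refcount++; return b; }

/* Drop an owner, freeing the buffer when the last one goes away. */
void buf_release(Buf *b) { if (--b->refcount == 0) free(b); }

/* Return a buffer that is safe to mutate: the same one if the caller is
   the only owner (refcount == K == 1), otherwise a private copy. */
Buf *buf_make_writable(Buf *b) {
    if (b->refcount == 1) return b;
    Buf *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    *copy = *b;
    copy->refcount = 1;
    b->refcount--;     /* the caller's reference moves to the copy */
    return copy;
}
```

The whole scheme hinges on the constant in `b->refcount == 1` being right for every caller, which is exactly what is hard to promise across interpreter versions.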

OK. Starting from:

  • If PyObject becomes opaque, PyModuleDef_Base must lose its PyObject_HEAD so that users can define it.
    It follows that PyModuleDef must not be cast to PyObject (and we can bikeshed on how to document/enforce that for users).
    FWIW, in CPython, the only use for it being a PyObject is determining whether the PyInit_* function returned a def (multi-phase init) or a complete module object (single-phase init).

The design space here is pretty big and needs more thought than I can give it right now, but a solution exists:

  • PyModuleDef_Init can return a newly allocated PyObject that wraps the provided PyModuleDef. (This object will leak, but possibly only once per PyModuleDef per process, if it’s stored & reused.)
    To allow users to avoid that, we could make CPython look for a symbol named PyModDef_<name> before it looks for PyInit_<name>. PyModDef_<name> would contain the module definition rather than a function that returns it. (And so users get to write less boilerplate, not just avoid a small leak!)

There might be better ways with fewer downsides.

Please let’s stick to functions in cross-module APIs. They’re far more reliable and make forward-compatibility easier.

Also, let’s either put the structure size in the structure itself, or a version number. We’ve already run into issues with the initialisation structs changing over time in a way that we now can’t detect, and either of these would’ve covered it. (Even in the wrapped PyModuleDef case I suspect we’ll end up owning the wrapper, and so this matters, but I don’t think it gains us anything. Might as well just return a dict with magic names in it, if we want complete independence from C structs.)
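
A rough sketch of the "size in the structure" idea, with entirely hypothetical struct and function names (`DefV1`, `DefV2`, `def_get_doc`); this is not an existing CPython API. The extension stores `sizeof` of the struct it was compiled against, and the loader only reads fields that fit within that size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical definition struct, version 1 layout. */
typedef struct {
    size_t struct_size;   /* set by the extension to sizeof(its struct) */
    const char *name;
} DefV1;

/* Version 2 appends a field; the common prefix is unchanged. */
typedef struct {
    size_t struct_size;
    const char *name;
    const char *doc;      /* new in version 2 */
} DefV2;

/* Loader built against version 2: it reads 'doc' only when the caller's
   struct is large enough to contain that field. */
const char *def_get_doc(const void *def) {
    size_t size = *(const size_t *)def;   /* first member in every version */
    if (size >= offsetof(DefV2, doc) + sizeof(const char *)) {
        return ((const DefV2 *)def)->doc;
    }
    return NULL;          /* older layout: the field does not exist */
}
```

A version number would work the same way, with the loader mapping each version to the set of fields it may read.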

Ah, I was wondering why the dynamic loading code is so tied to functions :‍)

Well, another idea is to have PyModuleDef start with a bit-pattern that can’t ever be a valid PyObject. The hard part is, again, making sure the user won’t cast it to PyObject.

Struct versions wouldn’t save us here. They’re still a good idea though.

Yeah, that was more of a “lessons learned from last time we changed a struct and got it wrong”, since we may well end up changing a struct here.

I don’t think that gives us ABI level compatibility with other runtimes, which seems to be where the idea for this change traces back to? We could always make up a fake PyTypeObject for it to refer to, but there’s only so many tricks we ought to use to maintain abi3 before it’s worth a fresh start.

Probably we’re best off fleshing out the slots API if we wanted to drastically change module definition. That gives us the most freedom in terms of adding new slots later without breaking existing users (assuming a stable API).

Building on @encukou’s comments, here’s a modification of the original proposal. (Don’t focus too much on naming, we can work out those details later.)

Here’s a prototype: Prototype ABI4 with Python 3.7+ compatibility · colesbury/cpython@2c1d02a · GitHub

  • We add a new “abi4” with targets “cp37-abi4”, …, “cp313-abi4”.
  • C extensions can target “abi4” by defining Py_ABI_VERSION_4 along with the usual Py_LIMITED_API=0x30...
  • The --disable-gil builds can load “abi4” wheels, but not “abi3” wheels. So “cp37-abi4” wheels are compatible with Python 3.7+, including --disable-gil builds.
  • PyObject and PyVarObject become opaque structs in abi4.
  • PyObject_HEAD is still needed for extensions to define their own types. In abi4, it reserves enough space for both abi3 PyObject headers and --disable-gil PyObject headers. This means 24 or 32 bytes on 32-bit or 64-bit platforms. Note that extra space in the header is fine.
  • PyModuleDef is a bit different. It reserves exactly the space for an abi3 PyObject header. (It can’t be bigger because older versions of CPython still need to access the fields.) At the API level, PyModuleDef_Init no longer guarantees that it returns “def cast to PyObject*”. Old versions of CPython, of course, still do this as does the default build of CPython 3.13. The --disable-gil builds return a simple wrapper.

The implementations of Py_TYPE and similar require special care when targeting pre-3.13 versions of Python and ABI 4. (See this file).

  • Extensions must call the Py_Type function in 3.13, but that won’t be available in 3.12 and older. When loaded on those versions, the extension must access the field directly.
  • On most platforms, we handle this by making Py_Type a weak symbol. Windows doesn’t support weak symbols, so we use GetProcAddress instead to determine if Py_Type is available.
  • There’s a dispatcher mechanism so that we only have to look up the symbol on the first invocation.
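
The dispatcher in the last bullet could look roughly like the sketch below. It is not the prototype's code: `lookup_symbol` merely simulates a weak symbol or a GetProcAddress call, and all names (`Obj`, `obj_get_type`, `get_type_impl`, ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const void *type; } Obj;   /* hypothetical object with a type field */
typedef const void *(*get_type_fn)(const Obj *);

int lookup_count = 0;          /* how many times the symbol lookup ran */
int runtime_has_accessor = 0;  /* simulates whether the host exports the function */

/* Fallback for older runtimes: read the field directly. */
const void *get_type_direct(const Obj *o) { return o->type; }

/* Stands in for the accessor exported by a newer runtime. */
const void *get_type_exported(const Obj *o) { return o->type; }

/* Simulated lookup, in place of a weak symbol or GetProcAddress. */
get_type_fn lookup_symbol(const char *name) {
    lookup_count++;
    if (runtime_has_accessor && strcmp(name, "get_type") == 0) {
        return get_type_exported;
    }
    return NULL;
}

const void *get_type_resolve(const Obj *o);

/* Starts out pointing at the resolver, which replaces it on first use. */
get_type_fn get_type_impl = get_type_resolve;

const void *get_type_resolve(const Obj *o) {
    get_type_fn fn = lookup_symbol("get_type");
    get_type_impl = fn != NULL ? fn : get_type_direct;
    return get_type_impl(o);
}

/* What extension code calls; only the first call pays for the lookup. */
const void *obj_get_type(const Obj *o) { return get_type_impl(o); }
```

After the first call, `get_type_impl` points straight at the chosen implementation, so later calls cost one indirect call and no lookup.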

Risks and weird stuff

  • Wheels can be named like cryptography-42.0.0-cp37-abi4-[platform].whl but the actual shared libraries will still need to be named with the .abi3.so extension to be loadable by older versions of Python.
  • pip (or more specifically pypa/packaging) will need to disallow abi3 wheels from --disable-gil builds and allow abi4 wheels for Python 3.7+.

While I did my best to avoid accessing PyObject members in the Public C API, it is non-trivial to make PyObject empty and store ob_refcnt and ob_type outside PyObject, for example “before the PyObject* pointer” (as done for the PyGC_Head structure for example).

One problem is that sizeof(PyObject) is part of every structure which inherits from PyObject (PyListObject, PyDictObject, etc.), and this size is stored as “item size” in heap types, in the stable ABI. sizeof(PyObject) is used in a few other places.

Details: “Python C API: Add functions to access PyObject” (Victor Stinner’s blog)

What can be done to make PyObject opaque is to replace members with an opaque byte array of the exact same size. The public C API knows nothing about members, and casts and other tricks are used to access the hidden members in the internal API (just use a _PyObject structure with real members).
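
A minimal sketch of the opaque-byte-array approach, assuming a hypothetical two-field internal header. `Obj`, `ObjInternal` and the accessor names are illustrative only and are not CPython's definitions. The accessors use memcpy where real internals would more likely cast; that keeps the example clear of aliasing questions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal layout, visible only to the runtime (hypothetical fields). */
typedef struct {
    ptrdiff_t refcnt;
    const void *type;
} ObjInternal;

/* Public layout: same size and alignment, but no usable members. */
typedef union {
    unsigned char opaque[sizeof(ObjInternal)];
    void *align_;          /* only forces pointer alignment */
} Obj;

/* The swap is only safe if the public size stays identical. */
_Static_assert(sizeof(Obj) == sizeof(ObjInternal), "public size must not change");

/* Runtime-side initialiser: one reference, given type. */
void obj_init(Obj *o, const void *type) {
    ObjInternal in = { 1, type };
    memcpy(o->opaque, &in, sizeof in);
}

/* Runtime-side accessors that decode the hidden members. */
ptrdiff_t obj_refcnt(const Obj *o) {
    ObjInternal in;
    memcpy(&in, o->opaque, sizeof in);
    return in.refcnt;
}

const void *obj_type(const Obj *o) {
    ObjInternal in;
    memcpy(&in, o->opaque, sizeof in);
    return in.type;
}
```

Extension code that embeds `Obj` keeps the same field offsets as before, while `o->refcnt` no longer compiles.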

For NoGIL, the constraint is that sizeof(PyObject) should not change.


We cannot remove the PyObject type. It’s used everywhere. But we might be able at some point to make it empty, if we manage to store data before the PyObject* pointer.

sizeof(PyObject) would then just return zero. Members could not be accessed. Now the new problem is the stable ABI breakage. Maybe it would be acceptable to break the ABI for NoGIL.

Sounds good for PEP 703. We can tweak the versioning & defines in another PEP.

Makes sense.
I’d say that PyObject_HEAD is still available, rather than needed. PEP 697 makes it possible to avoid it, but that’s only available in Python 3.12+, which is too late if we want abi4 in 3.13.
But that gives us a clearer way to remove PyObject_HEAD eventually – hopefully by the time we need to increase the object head size. (We’ll probably want a gradual deprecation mechanism instead of planning for abi5).
If we can get in an even better way, like Grand Unified Python Object Layout, all the better :‍)

Use PyObject_Type instead of Py_Type :‍)

For the others, could we get away with removing them from the limited API entirely? They’re questionable, anyway:

  • set type (dangerous; in the safe cases it can be replaced by Python API – assignment to __class__)
  • get/set refcount (CPython-specific)
  • get/set size (ill-defined for the general PyVarObject; if individual types like PyTuple expose length information they should do so by other API – like PyObject_Size a.k.a. __len__)

Forward API compatibility (i.e. older extensions working on abi4 with just recompilation) is nice, but it might not be worth requiring weak symbols. Especially since…

This looks like a more serious break of forward API compatibility.
To define a type with no extra C instance state, you currently set PyType_Spec.basicsize to sizeof(PyObject). I haven’t checked, but I expect a lot of extensions do it.
We got lucky here: you instead can set basicsize to 0, meaning “inherit”, which was only documented and tested in 3.12 but happens to work on older versions as well.

Why is it too late? 3.13 isn’t in beta yet.

I would also say we don’t need to rush things here. If things wait until 3.14 it isn’t the end of the world.

The Stable ABI sounds to be not well understood and people have different opinions and expectations about it. If we decide to design an “abi4”, please write down a PEP to list all changes and their rationale.

Yup. This is a pre-PEP discussion.
I’m just now trying to put all the thoughts together, in a way that would make sense for my presentation at the sprint :‍)

Let me back up a bit before answering Guido’s question:

We can’t keep the letter of the promise to support abi3 until Python 4.0. C’est la vie.
When the promise was made, 4.0 was expected to come after 3.9, so we’re keeping the spirit of the promise, for all that’s worth.

Anyway we need to decide a better policy for abi4. We probably want rolling deprecations, rather than planning for a(nother) big break.
That policy could be “CPython versions supported when rc1 is released”, so cp313-abi4 would work with 3.8+, cp314-abi4 with 3.9+ and so on.
I can see reasons to try supporting more than that, but I don’t think it would make much sense to support less.

It would limit abi4 to 3.12+. That is an option, but I think we can do better :‍)

It isn’t, but I expect we’ll keep finding more things to (not) include in the big break.
IMO it’s OK to put in things we intend to deprecate/remove down the road. We’ll need to plan how we’ll do deprecations, and we should start that now.

I think Victor may be right that “[t]he Stable ABI sounds to be not well understood”! I had always assumed that if you compile with a certain abi (say, abi3.10, if that exists) you are compatible with later versions (3.11 etc.) but not with previous versions (3.9 or before). But here you’re claiming that cp313-abi4, which I understand to mean “compiled with 3.13 headers, selecting ABI4 through a #define”, would be compatible with 3.8 and later? Is this new, or did I just misunderstand how the ABI is supposed to work?

Separately, you propose “rolling deprecations” (presumably for ABI versions), but you don’t give an example. Presumably the idea is that e.g. cp313-abi4 would be supported at least through 3.17? Or am I misunderstanding this too? (I’ve never been able to truly follow the PEPs on the subject.)

That’s the current promise of abi3; its compatibility is a combination of interpreter version and the stable ABI (there is no abi3.10 beyond cp310, which is tied to the CPython 3.10 ABI).

It’s a proposal Petr is making. I think his idea is that the stable ABI acts like an LTS in that you have a certain number of years of guaranteed compatibility. In this case, though, he’s proposed to work backwards (I assume so you can use the newest Python while getting backwards-compatibility).

BTW we do have compressed wheel tags, so you could actually list every ABI version one is compatible with, e.g. abi38.abi39.abi310.abi311.abi312.abi313.abi314. That could let us generate accurate tags based on any removals, incompatible changes, or usage of the ABI if we could somehow expose these details to the tools creating wheels.

I meant this as a shorthand for defining

#define Py_LIMITED_API 0x030A0000

which several header files test for, e.g.

#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030A0000
PyAPI_FUNC(PyObject *) PyObject_GenericGetDict(PyObject *, void *);
#endif
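
For anyone else decoding these values: the hex packs the major version into the top byte and the minor version into the next one, so 0x030A0000 is 3.10. A small sketch of the decoding; the helper names are made up, and the `cpXY` string is just the tag prefix as written in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Major version: top byte of the hex version (0x03 in 0x030A0000). */
unsigned hex_major(unsigned long hex) { return (unsigned)((hex >> 24) & 0xFF); }

/* Minor version: second byte (0x0A, i.e. 10, in 0x030A0000). */
unsigned hex_minor(unsigned long hex) { return (unsigned)((hex >> 16) & 0xFF); }

/* Write a "cpXY"-style prefix such as "cp310" into buf. */
int format_cp_tag(char *buf, size_t n, unsigned long hex) {
    return snprintf(buf, n, "cp%u%u", hex_major(hex), hex_minor(hex));
}
```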

I’m guessing you mean this is not expressible in a wheel identifier? I’m beginning to feel that I am getting confused between wheel version tags and ABI versions.

I’ll let @Petr expand on his proposal – I still don’t see how it can work without also specifying an “expiration date” for each ABI (sub)version.

I’m suggesting that with Python 3.13 you should be able to generate cp38-abi4 extensions by compiling with Py_LIMITED_API=0x0308... (or its new equivalent). These extensions would be compatible with 3.8+, and so they’d need PyObject_HEAD.

That’s much less of a proposal, more of a general idea that we shouldn’t plan for an abi5 which would break everything, but rather deprecate and remove features gradually.
Yes, one option is that e.g. Python 3.18 would start rejecting cp313-abi4.

So we can do something like:

  • 3.13: deprecate PyObject_HEAD and related stuff in the limited API
  • 3.15: remove it from the limited API
  • 3.20: drop support for cp314-abi4, allowing us to change the size of PyObject

That might be useful!
Sounds like in practice the low end of the range would be the defined Py_LIMITED_API, and the high end would be the Python you’re compiling with. That doesn’t look hard to expose.

It’s the 310 in cp310-abi4.

I’m guessing I was confused by the appearance of “cp313-abi4” in combination with mention of 3.8+. With your clarification I understand the proposal better, although I’m not sure about its motivation.

Separately, could you go over the issues around PyObject_HEAD? Again, it’s probably been explained in one or more of the previous 36 messages in this thread, but I’m trying to catch up here, and out of context I’m not sure what exactly is going on here.

I’m also still confused by the “it’s too late for 3.12+” – could you unpack that some more?

Motivation for a stable ABI in general, right?
So far this topic has been about brainstorming technical solutions – “is it possible” rather than “is it worth it”. We can start on the latter, too. Let me try to focus my thoughts into a paragraph:

IMO the most relevant benefit here is the ability to build your extension once to target all supported versions of Python.
For Python-first projects on GitHub Actions & PyPI, this might be just about saving build times and hosting/bandwidth costs. But for projects where the Python bindings are just a convenience, you get a choice between adding loops (and support for multiple Python configurations) to a buildsystem like CMake or Cargo, distributing a specific vendored version of CPython, or dropping the Python bindings.
We have a strategic decision. 1) Do we want Python to expand in that area? 2) Should this be our concern, or do we leave it to an external project like HPy?
IMO: 1) yes, we shouldn’t focus only on PyPI cp wheels, and 2) it’s better to build this in CPython, onboarding interested devs as CPython contributors.

Currently extensions define a custom type like this:

typedef struct {
    PyObject_HEAD
    int my_data;
} MyObject;
static PyType_Spec MyType_spec = {
    ...
    .basicsize = sizeof(MyObject),
};
static void some_method(MyObject *self) {
    use(self->my_data);
}

That means the size of PyObject_HEAD (and thus PyObject) is part of the ABI. We can’t change the size in abi3 (i.e. Python 3.12 and lower).

Hence Sam’s solution: PyObject_HEAD stays, in abi3 it does the same thing as before, and in abi4 it’s big enough to fit either PyObject (used in 3.12- and default builds of 3.13+) or the new header (used in --disable-gil builds of 3.13+).
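
To illustrate why the header size leaks into the ABI, and why reserving the larger size works, here is a sketch with made-up headers. `SmallHead` and `LargeHead` are hypothetical and their sizes are illustrative; they are not the real abi3 or --disable-gil layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical object headers of different sizes. */
typedef struct { ptrdiff_t refcnt; const void *type; } SmallHead;
typedef struct { ptrdiff_t a; ptrdiff_t b; ptrdiff_t c; const void *type; } LargeHead;

/* The same extension type compiled against each header. */
typedef struct { SmallHead head; int my_data; } ObjSmall;
typedef struct { LargeHead head; int my_data; } ObjLarge;

/* Reserving room for whichever header is larger fixes the layout. */
typedef union { SmallHead small_head; LargeHead large_head; } ReservedHead;
typedef struct { ReservedHead head; int my_data; } ObjReserved;

/* The offset of my_data is baked into the compiled extension. */
size_t offset_small(void)    { return offsetof(ObjSmall, my_data); }
size_t offset_large(void)    { return offsetof(ObjLarge, my_data); }
size_t offset_reserved(void) { return offsetof(ObjReserved, my_data); }
```

With the reserved header, the extension's field offset no longer depends on which header the running interpreter uses; a runtime with the smaller header simply leaves the tail of the reserved space unused.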

This just defers the problem: eventually we’ll need to change the object header size again. So the long-term goal should be to remove PyObject_HEAD from the limited API/ABI – to have it managed entirely by the interpreter.
IMO, the best way to do that is along the lines of Mark’s Grand Unified Python Object Layout – but after we implement that, we’ll need a deprecation period before we can drop the current way of doing things.


Technically, there is a way to avoid PyObject_HEAD and sizeof(PyObject) in 3.12+, although it’s quite cumbersome to use – with API added in PEP-697:

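typedef struct {
    int my_data;
} MyObject;
static PyType_Spec MyType_spec = {
    ...
    .basicsize = -(int)sizeof(MyObject),  /* negative: extra space on top of the base class */
};
static void some_method(PyObject *self) {
    ...
    /* my_type: the PyTypeObject* created from MyType_spec */
    MyObject *my_obj = PyObject_GetTypeData(self, my_type);
    use(my_obj->my_data);
}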

This isn’t very ergonomic – it was designed for much more limited use cases than all custom objects – so it’s mostly interesting to code generators (Cython, pybind11, pyo3, etc.), or perhaps for a shim to make future API target 3.12+ ABI. (AFAIK it’s compatible with Mark’s Grand Unified Python Object Layout concept.)

Do you mean this?

I wanted to correct a technicality – PyObject_HEAD isn’t strictly required to define a type.
If we did remove PyObject_HEAD in abi4, it would make abi4 incompatible with 3.11 and lower (for extensions that define custom types without absurd hoop-jumping). At that point we might as well drop 3.12 too, and go for a hard ABI break.
(But that might be a moot point now: we don’t want to remove PyObject_HEAD because the current alternative uses inconvenient API, and I don’t think designing a better API is on the table for 3.13.)

FTR, I firmly believe any project that could conceivably vendor their own copy of CPython probably should vendor their own copy (and we should work on making embedding/vendoring CPython easier on all platforms).

But even (especially?) in this case, the benefit of a universal ABI is that third-party releases of packages can be more easily integrated into the overall app.[1] So +1 to the whole motivation.


  1. I weakly believe that any project that could vendor CPython should also rebuild all their own dependencies, but there are definitely situations where that’s infeasible and/or they intend for users to bring their own libraries, and so can’t enforce a build environment. ↩︎
