Use the limited C API for some of our stdlib C extensions

While this is a fair point, one thing it doesn’t consider is that tooling to allow extensions to use the limited ABI is, I believe, far from ideal.

If the common build backends that allow building of binary extensions (which basically means setuptools, but this may also apply to tools like Cython) made it the default to build extensions that used only the stable ABI, how would that change things? At the moment, the simplest way to build an extension uses the full API, so that’s what everyone will do. The only people using the stable ABI will be people for whom building extensions for every new release of Python is a significant burden, which is likely to already be a very small proportion of projects.

IMO, there is a lot of benefit in the limited ABI, but only if:

  1. It satisfies the majority of extensions’ needs.
  2. Tools target the limited ABI by default.

The first of these is what this discussion has focused on, but as far as I know, no one is looking at what would be involved in the second. IMO, offering help to 3rd party build tools would be a better investment, if we want to support the limited ABI, than simply using it internally. If only because stdlib extensions don’t use the tools (setuptools, Cython, etc.) that 3rd party extensions use, so we won’t actually identify the real pain points this way.
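For reference, the extension-side opt-in is already just a couple of lines. A minimal sketch (module name and version floor are illustrative); the missing piece is that the build backend also has to be told to tag the wheel abi3 (e.g. setuptools’ py_limited_api option), which none of them do by default:

#define Py_LIMITED_API 0x030B0000   /* opt in to the stable ABI, floor = 3.11 */
#include <Python.h>

static struct PyModuleDef spam_module = {
    PyModuleDef_HEAD_INIT,
    .m_name = "spam",                /* illustrative module name */
    .m_doc = "Example extension restricted to the limited C API.",
    .m_size = 0,
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    /* The resulting binary can be tagged abi3 and reused on 3.11, 3.12, ... */
    return PyModule_Create(&spam_module);
}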

2 Likes

It hasn’t been widely advertised, and it’s been quite incomplete until recently. Notice the recent uptick in usage :‍)

I know of one issue that makes this hard – exposed PyObject fields. That is solvable with a one-time ABI break, i.e. abi4.
If you know of any more issues where maintaining the stable ABI is more painful than API compatibility (PEP-387) or a frozen per-version ABI, I’d love to hear them; ideally please comment in issue#4.
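(To make “exposed PyObject fields” concrete, a simplified illustration rather than actual CPython source: reference counting used to be a macro poking at the object header, and that is the kind of layout detail an abi4 break could finally hide behind opaque calls.)

#include <Python.h>

/* Roughly how Py_INCREF used to expand (simplified): the offset and width of
 * ob_refcnt get compiled into the extension, so the PyObject layout can never
 * change without breaking every abi3 binary out there. */
#define OLD_STYLE_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

/* The opaque alternative: route the operation through an exported function
 * (Py_IncRef() is a real function in the stable ABI), keeping the layout
 * private to the interpreter. */
static inline void
new_style_incref(PyObject *op)
{
    Py_IncRef(op);
}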

Sure, cibuildwheel works for projects on PyPI.
How would this work for, say, vim, which is now able to use whatever Python3 you have installed? (The commit message for that has a great summary, btw)

2 Likes

The numbers I quoted are only a week or so old (perhaps even less – I don’t know how often the ClickHouse database site is updated with more current numbers from PyPI).

Yes, there are a few more packages shipping abi3 wheels nowadays than, say, a year ago, but not enough more to make a significant difference.

Incidentally, a lot of those packages are Rust based. Perhaps PyO3 defaults to abi3 where possible. A few others started shipping abi3 wheels for more recent Python versions, but not for older ones (I guess some APIs they needed were missing from the limited API and were only added recently).

The main difference is that you cannot change the APIs in the limited API at all, without breaking the ABI. Without this limitation, the APIs could be changed subject to the normal deprecation procedures and participate in the evolution of the APIs.

And because the stable ABI doesn’t even specify an expected lifetime in years or number of releases, it means that no changes are possible until we move to 4.x.

Lifting the requirement to be stable across all 3.x versions would help with this problem, of course, but then I don’t think we’re that far off from the regular deprecation process, which also supports compatibility for at least 3 releases.

I’ll check the issue you mentioned to see whether this has already been reported.

BTW: it’s not only the PyObject structure that’s affected; the corresponding type objects are affected as well, and with them the way slots are used, found, managed, etc. A type defined based on the stable ABI in, say, Python 3.7 would still be expected to work in a future Python 3.20, unless I’m missing something. This hinders evolving the internals.
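(For context, this is roughly what a stable-ABI type looks like: it is described via PyType_Spec/PyType_FromSpec slot IDs instead of a static PyTypeObject, so the extension never sees the type object’s layout. A minimal sketch, names illustrative:)

#define Py_LIMITED_API 0x030B0000
#include <Python.h>

typedef struct {
    PyObject_HEAD
    long value;
} CounterObject;

static PyObject *
Counter_increment(PyObject *self, PyObject *Py_UNUSED(args))
{
    ((CounterObject *)self)->value += 1;
    Py_RETURN_NONE;
}

static PyMethodDef counter_methods[] = {
    {"increment", Counter_increment, METH_NOARGS, "Add one to the counter."},
    {NULL, NULL, 0, NULL},
};

static PyType_Slot counter_slots[] = {
    {Py_tp_methods, counter_methods},
    {0, NULL},
};

static PyType_Spec counter_spec = {
    .name = "examplemod.Counter",       /* illustrative name */
    .basicsize = sizeof(CounterObject),
    .flags = Py_TPFLAGS_DEFAULT,
    .slots = counter_slots,
};

/* In the module init function: PyObject *tp = PyType_FromSpec(&counter_spec); */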

I don’t think we should be supporting such an unusual use case.

They appear to allow use of Python in plugins, but then let the user decide which Python version to link against. IMO, the version should be fixed by the application embedding Python and thus managed by the application maintainer, not left to the user.

As Guido pointed out, apparently I’m really bad at sharing my agenda and the rationale behind this work. I’m sorry about that :frowning: Let me explain it from a user’s point of view.

There are multiple problems:

  1. When Python 3.12 ships in one month (2023-10-02), many C extensions will not be available on day 1.
  2. It’s expensive (complicated) to maintain more than one Python version in a Linux distribution. Most Linux distributions don’t; users are on their own if they want a Python version other than the chosen one.

Python 3.12 vs C extensions

Python 3.12 is going to be shipped in one month: 3.12.0 final: Monday, 2023-10-02. Can you expect 100% of C extensions to be usable smoothly on day 1 of the release? Judging by previous releases, I can say: well… no!

At Red Hat, we are actively (but gently) asking maintainers of popular C extensions to ship Python 3.12 binary wheel packages as soon as possible: see the issue “Encourage upstreams to release Python 3.12 manylinux wheels before Fedora 39 final release”. We did the same in the past for Python 3.9, 3.10 and 3.11.

By the way, I would like to say that it’s unpleasant that we are paid to ask volunteers to work for us to please our customers. But that’s how the open source community usually works: we pass user requests on to upstream projects. We are not the maintainers of these projects; we contribute changes to support new Python releases (we helped fix Cython and many other projects to support Python 3.12), but we cannot do the last part: actually releasing a new version containing the fixes and publishing wheel packages.

For example, last week, python3.12 -m pip install numpy still failed with: ModuleNotFoundError: No module named 'distutils' (I don’t think it has been fixed in the meantime). Same error when trying to install matplotlib. Fedora 39 is scheduled to be released next month with Python 3.12. We do ship Fedora packages (RPM) for numpy and matplotlib, but users love to run pip install anyway (directly or indirectly, for good or bad reasons), and then open bug reports: “installing numpy doesn’t work!!!”.

Can users consider using Python 3.12 if numpy and matplotlib are missing? Maybe yes, maybe not. From the Fedora point of view, it’s a big issue. In the past, we always got bug reports; users don’t understand why things aren’t working as usual, when “pip install just worked” before. Sometimes the bug reports are about missing build dependencies, slower installation, or something else.

Note: I have hope in the port of numpy to HPy :slight_smile: HPy serves similar use cases, especially with its universal ABI.

Process or tooling to make things better?

I only named 2 popular C extensions, but the Python ecosystem is way wider than these two. Also, numpy is a healthy project with strong funding. If a project with strong funding and active development cannot be ready for Python 3.12 one month before the final release, how can you expect 100% of C extensions to be ready for the Python 3.12 final release?

In the past, I proposed PEP 608 – Coordinated Python release, which would have held a new Python release until the bare minimum of the ecosystem, like pip and numpy, was functional. It was rejected because Python core devs cannot control the releases of pip or numpy, and apparently it’s better to ship a Python without these tools than not to ship Python at all. An optimistic person would say it’s a chicken-and-egg problem: Python 3.12 has to be released before numpy will consider supporting it, right?

I also proposed PEP 606 – Python Compatibility Version, a kind of “stable API” at the Python level, but it also got rejected. It’s common that pure Python projects are also unusable on a newer Python because of a bunch of minor changes: minor, but enough to prevent using these projects.

Supporting more than one Python version

Few Linux distributions support more than one Python version because providing binary packages for C extensions is a huge maintenance burden. For example, if you support Python 3.6, 3.9 and 3.11 and you have packages for lxml, then when there is yet another security vulnerability in the lxml “clean html” feature, you don’t build 1 new package but 3 new packages. Even if technically it should be the same “source package”, building 3 packages requires more work than building only one. It’s perfectly reasonable for Debian to say “we only support one Python version”.

RHEL8 distribution provides 4 Python versions: 3.6, 3.8, 3.9 and 3.11. Only Python 3.6 is fully supported (most packages), the other versions have the bare minimum packages that we can support.

The RHEL8 lifecycle is about 10 to 15 years, so it has different concerns than Linux distributions with shorter lifecycles, like Fedora releases, which are only supported for one year. Debian has a similar time scale, but it only supports one Python version.

Here comes the stable ABI!

If we manage to make the limited C API and the stable ABI look more appealing, we can use them as a technical solution to this problem. They will not solve all the problems I listed (e.g. they will not solve social issues), but they should solve some of them.

For example, PySide and cryptography are shipped as stable ABI binaries. They don’t need any change to be usable on Python 3.12. Yeah, I know, it’s hard to believe; Python got me used to suffering at each major release, right? :rofl: I’m not a believer, I want to see it working for real:

$ python3.12 -m venv env
$ env/bin/python -m pip install cryptography
...
  Using cached cryptography-41.0.3-cp37-abi3-manylinux_2_28_x86_64.whl.metadata (5.2 kB)
...
Successfully installed cffi-1.15.1 cryptography-41.0.3 pycparser-2.21

$ env/bin/python -c 'import cryptography.hazmat.bindings._rust as mod; print(mod)'
<module 'cryptography.hazmat.bindings._rust' from '/home/vstinner/env/lib64/python3.12/site-packages/cryptography/hazmat/bindings/_rust.abi3.so'>

On Python 3.12, pip installs pure Python code but also a Rust extension built for the Python stable ABI, known as abi3.

Note: it’s not perfect; cffi and pycparser don’t support the stable ABI, so installing cryptography can fail if one of these two dependencies fails to build.

If we can help maintainers move towards the limited C API, you can expect more C extensions to be usable on day 1 of the Python 3.13 release. Or it will be done the usual way: users harassing maintainers to “just make it work” (without providing any technical or financial help, obviously), maintainer burnout, and other similarly cool stuff :love_letter:

Python core devs can look at the limited C API, add missing functions, enhance the documentation, enhance it in general, and, as I wrote, consider using it ourselves to see how users struggle with it. If we, developers of Python itself, don’t want to or cannot use the limited C API, why would other developers even give it a try?

There is hope!

6 Likes

This is one of the major confusing points though. You state that internal and private are pretty much the same. Victor OTOH seems to consider them completely different, and seems to be using specific definitions (which he’s not stated explicitly, that I recall) which I think rely on the distinctions that I listed in my summary.

I think “if his plan works” is not sufficient – we also need to have clear guidelines or rules that prevent new “private” APIs from being introduced by accident. I don’t think many core devs have internalized the new rules yet, and continue to think of “private” and “internal” as referring to the same thing. The same is likely true for the majority of non-core contributors. I don’t know if we need new tooling, or a PEP clearly stating the rules, or just Victor policing API changes. But I think stating that they are pretty much the same is just perpetuating the confusion about where private/internal APIs should live and how they should be named.

(OT: I also find the distinction between “Include/cpython” and “Include/internal” very confusing. This has bugged me from the start when these were introduced – at the time I didn’t feel it was my decision. But who did decide, besides Victor?)

I personally believe the problem is with the requirement that once 3.x.0 is released all 3rd party packages should be instantly available. This has never been the case and will continue to be an unreachable goal.

It’s too bad that your users expect this, maybe you should set different expectations when the next Fedora is released. If your users report spurious bugs, well, there are other ways of dealing with those than forcing upstream CPython into an awkward transition to the Limited API.

3 Likes

The latest stable release is NumPy 1.25.2, which doesn’t support Python 3.12, but they’ve uploaded 3.12 support as a pre-release: 1.26.0b1. So you need to do something like python3.12 -m pip install numpy --pre or python3.12 -m pip install "numpy>=1.26.0b1".

Or:

'numpy; python_version<"3.12"',
'numpy>=1.26.0b1; python_version>="3.12"',

Or PIP_PRE=1 in the environment, or PIP_ONLY_BINARY=:all: / PIP_ONLY_BINARY=numpy (source).

So the good news is that it’s available, but this pre-release method is less obvious and makes integration harder.

2 Likes

Yup, it’s fiiiinally usable for non-trivial extensions, and the word is getting out :‍)

Not really. Functions in the stable ABI can become no-ops, become very slow, start raising runtime deprecation warnings, start always raising exceptions, or even leak resources in extreme cases. The ABI guarantee is along the lines of: the symbol stays around, so you don’t get linker errors, and the argument types and such don’t change, so you don’t get data corruption.
Breaking as little as possible would be very nice of course, and PEP 387 applies. But stable ABI guarantees themselves are surprisingly weak.

It could start raising deprecation warnings, and then fail at runtime, just like Python code.
That said, as far as I know, there are ways to make most uses of existing slots work even with future redesigns.
And, many of the issues with slots – e.g. we can’t change their signatures – are shared with the non-limited API.

Well, we disagree there.
Anyway, let me know when the stable ABI actually hinders your work (aside from the exposed PyObject fields). In the cases I’ve seen, the general backward compatibility policy (PEP-387) is where the pain comes from.

Yes. I’m not claiming to understand his work. But I don’t have the bandwidth to check it, so I’m left with trusting that it’s well thought out.
I did write some recent PEPs and documentation around this area, so I’m clarifying the intent and terminology used there. I’m not aware of a documented distinction between internal and private; they are used as synonyms.

No one? AFAIK the PEP that mentions it is 620, which is Withdrawn, even though it’s mostly completed.

What do you think is the proper amount of time before a new 3.x becomes a good default? A year or so?

The goal might be unreachable, but IMO it is something to strive for. And at least enable from our side.

2 Likes

That’s an interesting interpretation of a stable ABI :slight_smile:

If all the stable ABI provides is a guarantee that entry points continue to exist, but the functionality behind them may break at any time (subject to PEP 387), then it could just as well be implemented as a shim library interfacing to the full public C API and shipped as a separate extension on PyPI that you load when needed.

Needless to say, such an interpretation pretty much goes against what regular users and authors of Python extensions using the stable ABI would expect, namely that these extensions “can be compiled once and work with multiple versions of Python.”

Where’s the benefit of saying “oh, you will be able to import this module in Python 3.20 without problems, but some parts may not work anymore” (and it’s not even clear which parts those are)?

If we want to keep a stable ABI as a maintained feature in Python, we need to give more robust guarantees and also take the hit of making it harder to evolve the internals. I don’t think we can have both with the current design of the stable ABI.

A new and different approach that separates the ABI from the internals would be needed, something like the Windows COM interface, or an indirection of most of the ABI via a per-interpreter struct (similar to what Capsules provide for C extensions, PyCObjects in the old days). With such a solution we could have multiple stable ABIs together with support guarantees for a certain number of releases, but that’s a different story… :slight_smile:
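(A rough sketch of that kind of indirection, purely hypothetical; none of these names exist in CPython today:)

/* Hypothetical sketch only. The extension links against a single entry point
 * and receives a versioned table of function pointers, so the interpreter's
 * internals (object layout, runtime state, ...) stay completely hidden. */
typedef struct {
    unsigned int abi_version;                     /* bumped on incompatible changes */
    void *(*object_new)(void *type);
    void  (*incref)(void *obj);
    void  (*decref)(void *obj);
    void *(*str_from_utf8)(const char *data, long size);
} PyStableApiTable;

/* Provided by the interpreter (or a shim library) at import time: */
extern const PyStableApiTable *PyStableApi_GetTable(unsigned int requested_version);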

1 Like

For Include/cpython, there was a poll here on Discourse[1]:

  1. it only received 3 votes, though, so it wasn’t exactly representative ↩︎

Historically it’s been months for many common 3rd party packages to become available. People have to decide when to adopt the new release based on their own requirements and dependencies. Usually few people running applications in production are in a hurry, and dependencies are a big part of the reason. (Another big part is the resources needed to test and migrate a production app – and the worry about bugs in any “point oh release”, which everybody would rather have someone else find first.)

Packages are a different matter. I’m all for encouraging package maintainers to be prepared and start working on wheels once rc1 is out. I’m unhappy about the pressure I am currently feeling to make it our fault if not every 3rd party package works on day one. I still feel that the Stable ABI is a solution largely in search of a problem, and too much of the argument in favor of it feels based in wishful thinking.

I wish we could look for a solution in the area of making the building of binary wheels easier, or faster, or cheaper. This could take the form of a build farm to which one can submit build requests, or simpler GitHub Actions recipes, or something else. (I’m not sure about the state of the art here, but I’m sure some folks will claim that conda-forge has solved this – however that probably doesn’t help the pip install foo crowd.)

3 Likes

Remarks on the Internal C API.

Are you talking about the directory name? Or do you mean that it’s unclear to you who uses which API for what purpose?

I think that it was Eric Snow who created the Include/internal/ directory to be able to move the PyInterpreterState structure there. I don’t recall the details.

By the way, the internal C API was created to solve an implementation problem. For example, the _Py_ID() API, which replaces the _Py_Identifier API, goes deeper into the internals. _Py_Identifier is a simple structure with 2 members: typedef struct _Py_Identifier { const char* string; Py_ssize_t index;} _Py_Identifier;. It can be inspected by its user. But _Py_ID(NAME) is another kind of beast: it refers to _PyRuntime.static_objects.singletons.strings.identifiers._py_NAME, which lives inside the new giant _PyRuntimeState structure, which has… many members (most of them generated by complex tools). See for example the pycore_global_strings.h header file. I don’t think that we want to expose such implementation details to users, nor that users should rely on them. The list of “static strings” changes frequently and the structure member offsets change often, so there is no such thing as ABI stability there (which could be an issue even without the stable ABI, for regular C extensions).
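(Roughly, the contrast looks like this; simplified, both are private/internal APIs whose exact definitions vary by version, and the old-style code is how it looked around Python 3.10:)

#include <Python.h>

static PyObject *
get_upper_old_style(PyObject *obj)
{
    /* Old private API (~3.10): declares a small static struct {"upper", -1}
     * that the extension itself can see and inspect. */
    _Py_IDENTIFIER(upper);
    return _PyObject_GetAttrId(obj, &PyId_upper);
}

/* The internal replacement resolves to a statically allocated string
 * singleton generated by build tooling; inside CPython it is used as
 *
 *     PyObject_GetAttr(obj, &_Py_ID(upper));
 *
 * i.e. a direct reference into _PyRuntime, exactly the kind of layout detail
 * that cannot be exposed to (or relied upon by) third-party code. */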

Another problem was the usage of atomic variables (pyatomic.h) and condition variables (pycore_condvar.h). These header files caused many compiler errors when they were part of the public C API (e.g. they were not compatible with C++). Having the ability to exclude them from the public C API is a nice step towards a cleaner public C API.

Python 3.13 has evolved a lot since Python 2.7! For me, having a separate internal C API is what made such work possible.

There are still cases where adding a private _Py function is fine, for example if it’s the implementation of a public C API, called by a macro or static inline function. But yeah, it should be the exception, not the rule :slight_smile: Some tooling may help. So far, I have used grep manually to discover these APIs. That’s how I landed on _PyObject_VisitManagedDict(), added in Python 3.12.
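(The pattern in question, with hypothetical names: a documented public entry point whose body is just a static inline wrapper around a private function.)

#include <Python.h>

/* Hypothetical names, for illustration only. */
PyAPI_FUNC(int) _PyThing_CheckImpl(PyObject *op);   /* private, undocumented */

static inline int
PyThing_Check(PyObject *op)                         /* public, documented */
{
    return _PyThing_CheckImpl(op);
}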

I don’t pretend to have a silver bullet that solves all problems at once. I’m saying that converting more 3rd party extensions to the stable ABI will increase the number of C extensions usable in the early days of a new Python version.

A large part of that is already automated by tooling like cibuildwheel. I don’t think that this part is the bottleneck. IMO the bottleneck is more that it’s thankless work, it’s not exciting, and maintainers prefer to work on new features or fix their own bugs rather than follow Python C API changes.

I don’t think that it’s the right move. Maintainers are doing their best and may be unavailable for various reasons. It’s common for a maintainer of a critical dependency to be away for months. I don’t think that putting more pressure on them (“you have to fix my use case”, e.g. support the new Python) helps. By the way, as I wrote, my Fedora team is already doing exactly that, gently asking projects to ship binary wheel packages as soon as possible, with mixed results. And we only did that for a small number of projects (the ones we care about the most).

Currently, each Python release introduces a varying number of incompatible C API changes. Maintainers have to dedicate time at each Python release to make their code compatible. Sometimes it takes a year and… then a new Python release introduces more incompatible C API changes. It doesn’t sound like pleasant work.

The deal here is that if you restrict your C extension to the limited C API and build it in a way that produces a stable ABI binary wheel package, you no longer have to do this maintenance work, and you can spend your time on more fun tasks. Maintaining an open source dependency has to remain fun! Otherwise people just move on and abandon their project.


Well, I suppose that in practice there are still some issues from time to time with the limited C API. But I expect far fewer of them, since we have much stricter rules about ABI compatibility.

Maybe what I say is just wrong and it doesn’t work as planned. But if we make it happen, it will be more pleasant for everybody.

I don’t think that all C extensions need bleeding edge performance. Many of them are just thin wrappers around another library: glue between Python and that library. The hot code is not in the C API glue code.

At least with the limited C API you have the choice: either you use the limited C API, ship a stable ABI package and forget about it, or you follow every C API change and frequently adapt your code to it.
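(Illustration of the two options: tracking the full C API usually means carrying small compat shims like the one below and revisiting them at every release; this particular shim mirrors a real change around Python 3.9. With the limited C API you instead set a floor version once.)

#include <Python.h>

/* Compat shim of the kind extensions following the full C API accumulate
 * (this one provides Py_SET_SIZE on Python versions that predate it): */
#if PY_VERSION_HEX < 0x030900A4 && !defined(Py_SET_SIZE)
#  define Py_SET_SIZE(obj, size) (Py_SIZE(obj) = (size))
#endif

/* The limited-API alternative: pin a floor once, before including Python.h,
 * and the same binary keeps working on later 3.x releases.
 *     #define Py_LIMITED_API 0x030B0000
 */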

1 Like

I am tired of this discussion. It doesn’t seem we’re reaching any kind of agreement. Let’s talk in person at Brno. Until then I will stop arguing.

Speaking from the perspective of a maintainer of downstream libraries, there is one unnecessary thing that core Python does that contributes to the problem of users expecting all packages to be available from day one. The Python download page always defaults to suggesting the latest release of CPython, even if that is only one day old:

Users who want to use Python along with various other packages will find it disappointing if they install the “default” version of Python but then many important packages are not available. Those users are not well served by being guided to install day-old releases of Python.

6 Likes

As @hugovk noted, numpy 1.26.0b1 has supported 3.12 for about 3 weeks now. Note that this is not an “average” new CPython release, because the removal of distutils has a large blast radius for libraries like numpy, scipy, etc. This is by way of explanation of why things are taking a while, despite people working on this with very high urgency.

4 Likes

Behaviour is covered by the same policy we have for Python code – PEP-387. It works pretty well: you can generally assume your code will continue working. It’s rare and discouraged for packagers to set “defensive” upper limits on the Python version.
I’d be all in for a stricter PEP-387. But I think ABI stability guarantees should stick to ABI.

Indeed, that’s a direction the stable ABI could evolve in. In fact, HPy essentially does this today.
But, I’d like to support stable ABI in the reference implementation of Python, and as far as I can see, it doesn’t hinder development much more than API stability guarantees.


Going back to vim: I’d love to hear your thoughts on how desktop applications should handle Python scripting/plugins. IMO, we should have a lighter-weight way to do that than each such project becoming a redistributor of Python, and e.g. re-releasing each new Python security fix. (Ask the release managers how painful it is for CPython to bundle OpenSSL!)
Allowing users to use a Python they already have sounds like a good way to go for me. It’s not perfect yet of course, but it’s a good direction.


Oh. Thanks!
If it were me, I’d write a PEP. I don’t really understand why one is not needed for such massive C-API reorganizations.

1 Like

Sorry that you feel the pressure, but AFAIK others want to improve this situation. A new release of Python should just work. Sure, it never was that way and maybe we’ll never get there, but that doesn’t mean we shouldn’t try.
Perhaps there’s something we can do to reduce the pressure on people who care about other things?

Yet, it has users that are very happy about it.

Neither does it help the homebrew install crowd, nor the [click here to Download Blender] crowd.
It’s not just building, and it’s not just wheels.

3 Likes

I agree with Victor here. It may not be practically possible to start working on wheels before the final Python release. One reason can be that dependencies (such as Numpy) are not ready. Another reason is that Python RCs are not available in most distribution channels (such as conda-forge, etc.). Actually, even the final Python may take several weeks to be packaged in those distribution channels.

You cannot really ask package maintainers to go out of their way and implement a different build or testing procedure for Python RCs (or betas) than the one they use for released Pythons.

And of course if the new Python version requires changes to the package’s source code to maintain compatibility, then the new wheels will lag even more than if a mere rebuild had been sufficient.

1 Like