Idea: Tracking ABI additions and changes for better pre-release wheel management

I would like to initiate a discussion on the idea of tracking ABI additions and changes in CPython, as suggested by Alyssa Coghlan (@ncoghlan) in a previous thread about recognizing, managing, and installing non-ABI-stable Python wheels created with alpha/beta versions of Python.

Background
The main issue is that wheels for new Python versions are often not immediately available, causing delays in the adoption of new Python features and improvements. This is due to the ABI’s potential to change during the alpha/beta phase. To address this problem, I would like to discuss a system to manage wheels on PyPI created with non-ABI-stable Python versions, extending the wheel-building period from the current 2-month release candidate phase to the full 12-month development cycle.

Proposal
The proposal involves tracking two new numbers in CPython:

  1. Last ABI addition version
  2. Last ABI change version

Wheel building could then capture the first number (covering all the APIs that might have been used when building the wheel), and wheel consumers could check it against the second number: if the wheel was built against an ABI that's newer than the last breaking change, then it isn't at risk of ABI-break-induced crashes.
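As a rough sketch of the check an installer could perform under this proposal (every name and value below is hypothetical; CPython publishes no such numbers today), assuming the interpreter exposed both values and the wheel recorded the first at build time:

```python
# Hypothetical sketch: none of these values exist in CPython today.
# An interpreter would publish:
#   last_abi_addition - release in which the C ABI last gained symbols
#   last_abi_change   - release in which the C ABI last changed incompatibly
# A wheel built against a pre-release interpreter would record the
# interpreter's last_abi_addition value at build time.

def wheel_is_safe(wheel_built_abi: tuple, interp_last_abi_change: tuple) -> bool:
    """A wheel is safe to load if it was built against an ABI at least as
    new as the interpreter's last breaking ABI change: every symbol the
    wheel may reference still has the layout/signature it was built for."""
    return wheel_built_abi >= interp_last_abi_change

# Example: a wheel whose recorded ABI-addition version is 3.13.0b1 is safe
# on an interpreter whose last breaking change was also in 3.13.0b1, but a
# wheel built against 3.13.0a6 is not.
print(wheel_is_safe((3, 13, 0, "b", 1), (3, 13, 0, "b", 1)))  # True
print(wheel_is_safe((3, 13, 0, "a", 6), (3, 13, 0, "b", 1)))  # False
```

Tuples are used here only because they compare lexicographically; a real scheme would need a proper pre-release-aware version ordering.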

Benefits
By tracking ABI additions and changes, pre-release wheels can be useful in development without depending on full test coverage. This will allow for better management of pre-release wheels and speed up the adoption of new Python features and improvements.

Next steps

  1. Continue discussing the idea and gather feedback from the community.
  2. In the case of general support, write a PEP outlining the proposed changes.
  3. Implement the ABI number change on the CPython side.
  4. Implement a separate change on the packaging side for wheels to record the appropriate number.

I would love to hear everyone’s thoughts and opinions on this idea!


I can, and do, test against alphas and betas so that my projects are ready for a final build.
But until CPython is released I cannot build final kits and trust that my testing is reliable.

How does the ABI tracking help?

For many projects with light dependencies it’s relatively easy to build them all from source. But building NumPy, Cython, or wheels in the Geospatial field can sometimes be more complex, or just time consuming, especially in CI.

By allowing pre-release wheels to be uploaded earlier, downstream dependencies of those projects can start testing new Python versions without having to build all dependencies from source.

In this thread I go into it a lot deeper: Create system to recognize, manage and install non-ABI-stable Python wheels (created with alpha/beta versions of Python)


So you have wheels built against beta builds that may or may not work correctly against the final CPython release. That is risky in my view.

If you could delay the general release of the final CPython build, but provide that build for making wheels, that could work.

A big issue would be formalizing what exactly counts as an ABI change.
There’s abidump, but it’s Linux-only, and it often reports false positives so the Release Manager reviews it manually (when the ABI is frozen).

If the changes are too frequent – close to every Alpha/Beta – the system would bring no benefit.

IMO, porting NumPy/SciPy to the stable ABI or HPy, and using version-specific API only for optional version-specific speedups (released as wheels in addition to the more universal ones), might very well end up being less work overall.

That’s what the Release Candidates are for. Those are ABI-compatible with the final release, so you have a delay of 2 months.


I have always thought of the release candidates as the last chance to find bugs and have them fixed, and that only bugs will be fixed; nothing feature-related changes.

I’m experienced enough to know that even seemingly trivial fixes can cause issues.
That’s why I wait for the final release.

Do I have 2 months from last RC until that last RC becomes the final release?

No. But a 3.x.0 release typically has fewer changes than the 3.x.1 after it.
If no one tested with the RCs, bugs would only get fixed after the final release. A delay wouldn’t help.


I agree. Plus wheels for betas can be overridden via build numbers later on if a breaking change does occur. I think trying to apply accurate SemVer to the C API is going to be a losing battle.


Folks that need the higher performance versions would still face the same problems they do today, though.

As far as the two proposed numbers go, I think even the “ABI addition” number would increment more slowly than the Python version number, since adding new C API functions usually meets objections: it is preferred to call into Python via the existing general purpose APIs unless there’s a clear performance or usability benefit in expanding the C API.

The “ABI change” number should be updated even less frequently, since we actively avoid changing public struct layouts and function signatures. As more structs become opaque, future updates to this value should become even rarer.

From an ongoing maintenance point of view, the two numbers shouldn’t be that much harder to maintain than the opcode magic number (easier in some ways, since they would be bumped at most once per release, and if they do get bumped, it would just be to the encoded release number rather than to an arbitrary value).

Exposing the numbers in the sys module would only need to happen once, and even exposing them in the docs could probably be automated.
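For comparison, the bytecode magic number mentioned above is already exposed at the Python level, and the proposed values could presumably be exposed along similar lines (the `sys` attribute names sketched in the comments are purely hypothetical):

```python
# The .pyc magic number the post compares against: a small
# version-specific value bumped whenever the bytecode format changes.
import importlib.util

assert isinstance(importlib.util.MAGIC_NUMBER, bytes)
print(len(importlib.util.MAGIC_NUMBER))  # 4 (2-byte magic + b'\r\n')

# The proposed numbers could be published similarly, e.g. as hypothetical
# sys attributes (these do NOT exist today):
#   sys.last_abi_addition -> version that last added C ABI symbols
#   sys.last_abi_change   -> version of the last incompatible C ABI change
```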

Making use of the new numbers in wheel compatibility metadata would be a separate follow-up project that could only be considered if CPython decides to track and publish the values in the first place.

With the first beta release of Python 3.13 we see the first libraries preparing Python 3.13 support and wheels, like pandas. However, it currently isn’t possible for any library to publish wheels in a useful way that both acknowledges the possibility of changes in the Python ABI and makes it easy for downstream packages to start testing against those wheels. For that we need to wait until the first release candidate.

So I would like to reopen the discussion on this issue: how can we lengthen the timeframe maintainers have to prepare support for the next Python version, including testing and deploying wheels, given how heavily downstream libraries depend on significant upstream packages?


I’ve already started publishing wheels for Python 3.13 to PyPI, and noting they’re experimental/for early testing in release notes. This is very helpful for people who depend on these libraries (and people who depend on theirs) to also test and prepare.

The sooner bugs can be found and reported upstream to CPython, the sooner they can be fixed and included in the next beta for verification and further testing.

If there’s an ABI change in a later beta, that’s fine, we’ll create a new (experimental) release. Modern tools like cibuildwheel and Trusted Publishers make releasing much, much easier.


@hugovk Thanks a lot for that effort. One issue is that they are not (simply) installable from PyPI, as far as I understand.

Maybe this is something that should be solved at the PyPI level. Many packages do have wheels available but don’t upload them to PyPI, because it isn’t recommended to do so. And it isn’t recommended because they are not ABI-stable, and because there is no way to signal that a wheel was built against a pre-stable Python ABI.

Installers could offer a CLI option like --allow-nonstable-abi that allows installing wheels with non-stable ABIs. By default it would of course be off, but users could enable it to test wheels built against a non-stable ABI, which would be highly beneficial in CI setups.

Do the wheels contain the Python version with which they were built? If so, everything built with an alpha or beta could be marked non-stable ABI, and rc and release builds as stable.
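For reference, wheel filenames encode interpreter/ABI tags, but those tags carry only the minor version, so they cannot distinguish a pre-release interpreter from the final one. A minimal sketch of pulling the tags out of a (hypothetical) filename:

```python
# A wheel filename has the form:
#   {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
# The tags only carry the minor version (cp313), so a wheel built against
# 3.13.0b1 is indistinguishable by filename from one built against
# 3.13.0 final.

def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename.removesuffix(".whl")
    # The last three dash-separated fields are always the tags.
    *_, python_tag, abi_tag, platform_tag = stem.split("-")
    return python_tag, abi_tag, platform_tag

# Hypothetical wheels, one built on a 3.13 beta and one on the final release:
print(wheel_tags("example_pkg-1.0-cp313-cp313-manylinux_2_28_x86_64.whl"))
print(wheel_tags("example_pkg-1.0-1-cp313-cp313-manylinux_2_28_x86_64.whl"))
# Both yield ('cp313', 'cp313', 'manylinux_2_28_x86_64'); nothing in the
# filename marks either one as a beta build.
```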


Uploading to PyPI is the best way for you to make your project available for people to test, and to test their projects. Especially if you’re a dependency of a dependency.

I recommend it! :slight_smile:

If you’re using a preview Python version, it’s already not recommended for production, so I think the same applies to wheels for preview Python versions: please test, but don’t use in production. As the 3.13 release manager said:

We strongly encourage maintainers of third-party Python projects to test with 3.13 during the beta phase and report issues found to the Python bug tracker as soon as possible. While the release is planned to be feature complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (Tuesday 2024-07-30). Our goal is to have no ABI changes after beta 4 and as few code changes as possible after 3.13.0rc1, the first release candidate. To achieve that, it will be extremely important to get as much exposure for 3.13 as possible during the beta phase.

I think it’s rare for ABI changes during beta. If it does happen, you can make a new release, or upload a new wheel with a build tag.
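The build-tag mechanism mentioned here comes from the wheel filename convention: an optional numeric build tag sits between the version and the interpreter tag, and installers prefer the highest build number among otherwise identical wheels. A sketch of that tie-break (filenames hypothetical):

```python
# If the ABI changes during beta, a rebuilt wheel can be re-uploaded with
# a build tag; among otherwise identical wheels, installers prefer the
# highest build number.

def build_number(filename: str) -> int:
    """Extract the numeric build tag from a wheel filename (0 if absent).
    Filename form: {name}-{version}(-{build})?-{py}-{abi}-{plat}.whl
    (Simplified: real build tags may have a trailing string part.)"""
    fields = filename.removesuffix(".whl").split("-")
    # With a build tag there are 6 fields, without it 5; a build tag
    # must start with a digit.
    if len(fields) == 6 and fields[2][:1].isdigit():
        return int(fields[2])
    return 0

wheels = [
    "example_pkg-1.0-cp313-cp313-manylinux_2_28_x86_64.whl",    # built on b1
    "example_pkg-1.0-1-cp313-cp313-manylinux_2_28_x86_64.whl",  # rebuilt after an ABI fix
]
print(max(wheels, key=build_number))  # picks the build-1 rebuild
```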


Okay, that’s interesting to know and - for me - new information. Can I communicate this as the official stance of the PSF / the steering council / PyPA?

I swear I can recall that somewhere official docs advised against uploading wheels built with beta Python versions. I can’t find it anymore, frustratingly.


Found a few things.

In the cibuildwheel release notes from PyPA they mention:

While CPython is in beta, the ABI can change, so your wheels might not be compatible with the final release. For this reason, we don’t recommend distributing wheels until RC1, at which point 3.13 will be available in cibuildwheel without the flag. (#1815)

In the docs they also mention this explicitly:

This option is provided for testing purposes only. It is not recommended to distribute wheels built when CIBW_PRERELEASE_PYTHONS is set, such as uploading to PyPI. Please do not upload these wheels to PyPI, as they are not guaranteed to work with the final Python release. Once Python is ABI stable and enters the release candidate phase, that version of Python will become available without this flag.

(written by @mayeut, @henryiii and @joerick)

Then, the Python pre-release notes also include this implicitly. 3.12.0b4 doesn’t mention wheels or PyPI, while 3.12.0rc1 does:

We strongly encourage maintainers of third-party Python projects to prepare their projects for 3.12 compatibilities during this phase, and where necessary publish Python 3.12 wheels on PyPI to be ready for the final release of 3.12.0. Any binary wheels built against Python 3.12.0rc1 will work with future versions of Python 3.12. As always, report any issues to the Python bug tracker.

I understand there might be differences between how it’s communicated in official docs and to maintainers of highly involved libraries. I also understand that flipping a default switch in cibuildwheel would affect a huge number of projects, with many potential edge cases and issues. But it would be nice to have some general best practice on when to upload beta wheels to PyPI, and when not to.

Should we split this off into a separate discussion?

If you expect you will be able to do another upload with a higher build tag should the ABI change in a way that won’t work with the final release, then building against beta releases of CPython is fine. The guidance from cibuildwheel is for folks who will only upload the files for a release once.


Thanks for getting back Brett.

Let’s take a step back here. The goal is to provide the long dependency chain of libraries with more time to get their affairs in order during the pre-release phase of a new Python version. Many libraries can’t start testing because their testing and CI workflows require certain dependencies. However, these dependencies are often not built from scratch but installed from wheels, to keep CI time and complexity in check. Scientific and geospatial packages especially have complex toolchain requirements for building wheels, which would be madness for each downstream project to manage.

The practical effect of this is that many projects only start testing a new Python version when all the wheels of their upstream dependencies are available. This requires a large coordinated effort to track down the layers and layers of upstream dependencies that don’t have Python 3.13 wheels uploaded yet.

Worse, many projects have robust but strict version-control and release protocols. Most often new wheels only get uploaded with a new release, which means there can be some delay between wheels compiling and them being uploaded to PyPI. Backports are not always possible, and often also involve some delay.

Currently this whole process is crammed into a mere two months, this year between 2024-07-30 and 2024-10-01, of which the first month falls in a popular vacation period.

So this is the reality we have to deal with, at least from my perspective. Now the problem with the solution you’re proposing is that I can’t ask: “hey dear small upstream dependency, running fully on volunteers, could you please upload manually built non-stable-ABI wheels to PyPI, because we need to start testing. Oh, and by the way, please track any CPython ABI changes and be ready to re-upload on a change. Also, PyPI doesn’t allow marking anything as pre-stable-ABI, so be ready to explain all this to your regular users.”

And that to dozens of projects.

So, what we’re missing, is a system to deal with ABI changes or with beta wheels.

This problem could be solved at many different levels. I think by now I’ve suggested just about all of them, in this thread and the previous one.

If we can get some system to properly manage pre-release / nonstable ABI wheels, that would increase the test time from a mere two months to close to half a year.

So at what level do we need to discuss this to move it forward? Do we have consensus about what the problem is, so that we should discuss potential solutions? Or do we first need to discuss the problem itself?


Well, you can ask, but they also have every right to say “no”.

You probably need to write a PEP.

I think there’s agreement this is an issue, but without doing some research it’s unknown how widespread it is and which solution is going to make the biggest difference (e.g. getting most of the community to make a new release within 6 months of a new Python release is still difficult, as is getting nearly all projects to release wheels, etc.).


As a data point here: for NumPy 2.0 we just broke ABI for the first time in 10+ years (on purpose at least). When we actually had to merge ABI-breaking changes in the main branch, this was very disruptive for CI setups of downstream packages which were testing with numpy nightly wheels, and a major hassle to deal with.

CPython ABI changes would be even more of a hassle to deal with. In practice, I think the proposed idea here, to allow or encourage uploading wheels built against pre-release Pythons to PyPI, is not going to be that useful as a result, and I don’t think we’d consider doing that for NumPy or any other package with a heavy release process.

The most high-impact work/solution is still improving the stable ABI and encouraging packages to move over to that. And if more time is needed for rollout after the ABI is declared stable, then increasing the time between that point and the final release would be way better than anything done to encourage wheels targeting a not-yet-stable ABI. The change from zero stability pre-.0 to a 2-month window was quite nice, but 2 months is still a pretty short window.


I do wonder if there might be something that could be done in terms of the way we promote CPython releases. At the moment, docs.python.org and the main download links on python.org switch the instant we publish a new feature release. Folks that don’t know any better are actively pushed into using the latest and greatest version, even though we know the wider packaging ecosystem won’t have fully caught up yet.

Perhaps we could give releases a formal “ecosystem update” period after the October release date. The release would be promoted out of its pre-release status (so all the usual backwards compatibility guarantees apply, the support lifecycle timer starts ticking, and all the downstreams doing their own source builds anyway can get their respective release processes started), but the default download links wouldn’t switch yet, and the release download page would carry a caveat explaining the nature of the ecosystem update period and the potential impact of adopting a release that is still in that phase (i.e. if you build all your own packages from source, you’re fine, but if you rely on projects publishing pre-built binary artifacts you may need to wait a while).

For example, perhaps the ecosystem update window could run from October to December, with the default download links only switching the following January. There would still be libraries without artifacts published after that date, but it would mean the actively engaged folks aren’t trying to get the entire stack updated in the time between rc1 and the final release.

Right now, that ecosystem update window exists in practice, but it’s entirely implicit.


Note: This post had started down a more technical path before the above idea occurred to me. I think this is still relevant to the topic as background, so I’ve included it, but it doesn’t provide anything that could potentially improve matters in the near term, so I moved it to the end rather than keeping the post in the order I originally wrote it.

Part of the challenge here is that the scale of the problem varies a lot depending on what domain people are working in (it’s the usual refrain of binary dependency chains being shallow in most domains, but spectacularly deep in data science and machine learning).

I recently dropped a project from targeting Python 3.12 to targeting Python 3.11, since one of the dependencies involved didn’t publish Python 3.12 wheels yet. That’s the nature of package distribution having a long tail, though: in many cases, rebuilds to support a new version are reactive in response to demand rather than proactive, and that demand may not eventuate until the project’s main authors want to upgrade their Python version (with users treating the problem as a version constraint rather than pestering the maintainer about publishing new artifacts), or until the binary artifacts start falling more than a single release behind.

Projects switching to the stable ABI genuinely fixes this problem, as the existing binary artifacts continue working even on new CPython releases.

I’m also still mulling over some ideas which came up in the CPython CalVer thread about potentially splitting the way we version the CPython ABI from the way we version CPython as a whole, such that it would theoretically be possible to make a CPython release that remained backwards compatible with the previous release’s ABI (nothing coherent enough to even post an Ideas thread about yet, but I do think there’s potential in the concept).

Increasing the amount of time between the “ABI freeze” date and the “general release” date is unlikely to happen though, as there are genuine technical reasons we picked rc1 as the freeze date for the current scheme where the Python version and the CPython ABI version are tightly coupled, and even with only 2 months, there’s a large portion of the user base that has all the binary dependencies they need available on day 1 (even data science users, thanks to the tremendous efforts on that front from the maintainers of the core data science stack).
