Adoption of new Python in PyPI packages, longer RC periods?

Today I polished up an old recipe of mine for checking whether your dependencies support a given version of Python. Sharing now, in case it’s helpful to anyone in advance of Python 3.13. It determines the level of support by looking at a combination of version-specific wheels and classifiers.
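For context, a minimal sketch of that kind of check against the PyPI JSON API might look like the following. This is illustrative, not the actual recipe; the function name and heuristics are assumptions:

```python
import json
import urllib.request

def support_level(package: str, python_version: str = "3.13") -> str:
    """Rough check: does the latest release advertise this Python version?"""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # Trove classifier, e.g. "Programming Language :: Python :: 3.13"
    classifier = f"Programming Language :: Python :: {python_version}"
    has_classifier = classifier in data["info"]["classifiers"]

    # Wheel whose filename tags mention the version, e.g. "cp313" in
    # "numpy-2.1.1-cp313-cp313-manylinux_2_17_x86_64.whl"
    tag = "cp" + python_version.replace(".", "")
    has_wheel = any(
        f["packagetype"] == "bdist_wheel" and tag in f["filename"]
        for f in data["urls"]
    )

    if has_wheel:
        return "explicit wheel"
    if has_classifier:
        return "classifier only"
    return "no explicit support"

print(support_level("numpy"))
```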

I wanted to start a conversation about whether we should allocate our time within the release lifecycle differently across alpha → beta → release candidate.

Currently, testing large applications on prerelease Python is quite cumbersome until your dependencies support the new Python, especially your extension-module dependencies. This typically only starts happening in earnest during the release candidate phase, when the ABI is frozen.

I also bring some data! I took a collection of 1312 PyPI packages my workplace uses and used the above code to determine when they appeared to explicitly support a new Python version. At work, we’re on Python 3.11 (upgraded in 2023/10, about a year after release).

Here are graphs of when packages that eventually added a classifier or explicit wheel for a given Python version did so:

Here you can see them overlaid:

Usually pure Python dependencies work pretty well on new Python versions. Sure, it’s nice to know via classifier that upstream is testing on a given version, but it’s much less of a blocker than extension module support. So here’s the same chart, but filtered to packages where we observe an upload of a wheel that explicitly supports the given Python version:

…which is neat: on that last one, Python 3.13 is currently running about a month ahead of Python 3.11.

Here are some thoughts:

  • Every time these lines move, it’s because someone somewhere did something in response to a new Python version, then made it freely available on the internet. Open source is so cool
  • While spot checking, I noticed a number of these lines moved specifically in response to @hugovk doing things on the internet. @hugovk is so cool
  • This isn’t visible in the graphs I shared, but there are extension modules supporting prerelease Pythons during the beta phase. I asked Hugo about this and he said that ABI breakage during beta hasn’t been an issue, and if it were ever an issue, you can reupload with a different build tag (see the sketch after this list).
    • Is this something we (or tools like cibuildwheel) should encourage?
  • Sphinx declared support for Python 3.13 via classifier in August 2023, well before 3.13a1 was even released, the earliest amongst packages in my sample. Recent events make this especially amusing to me
  • We could consider changing the lifecycle from (7 months alpha, 3 months beta, 2 months rc) to (6 months alpha, 2 months beta, 4 months rc). Eyeballing it, but also disregarding publicity effects, that could potentially triple the number of packages that test / build on new Python versions.
  • How surprised are folks that we had an issue surface a week before the 3.13 release and we had to move the date back?
  • Are there any graphs people would find interesting to see?
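On the build tag point above: a wheel filename carries an optional build number right after the version, and when two candidate wheels have the same package version, installers prefer the higher build number. A small illustration using the packaging library (the package name and filenames are made up):

```python
from packaging.utils import parse_wheel_filename

# Hypothetical original upload, built during an early 3.13 beta
orig = "somepkg-1.2.3-cp313-cp313-manylinux_2_17_x86_64.whl"
# Hypothetical rebuild after an ABI change: same version, build number 1 added
fixed = "somepkg-1.2.3-1-cp313-cp313-manylinux_2_17_x86_64.whl"

for filename in (orig, fixed):
    name, version, build, tags = parse_wheel_filename(filename)
    print(name, version, build or "(no build tag)")

# With equal versions, pip picks the wheel with the higher build tag,
# so the rebuilt wheel wins without bumping the package version.
```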
34 Likes

Pretty surprised. I don’t know if we could have gotten signal earlier (e.g. during the betas), but it definitely makes me nervous to be making such a big change this close to the release. We discussed it, @thomas (RM) is on board, and I agree, but still, I’m :crossed_fingers: .

4 Likes

Thanks for the graphs! Good to see data verifying my feeling that more projects are adopting sooner.

Thank you! :blush:

Yes, I think so – it helps people test their own projects when their dependencies (and dependencies of dependencies, etc.) can be installed easily. We can call it experimental support. Part of the deal of running pre-releases is not to use them in production. (See “ABI breaks?” at Help test Python 3.13! - DEV Community)

Yes indeed :slight_smile: Sphinx is well-tested against pre-release Pythons because we run doctest on the CPython CI. Recent events suggest we need more benchmarking or performance testing.

Each phase can be characterised:

  • alpha – add your new features, add your new bugs!
  • beta – no new features, fix your bugs!
  • RC – nothing but docs and a few stability fixes

The suggested change means less time for adding new features and fixing their bugs, and a longer period of branch stabilisation.

It’s a question for the core team – would we be happy with an RC phase twice as long, with minimal activity and potentially a locked branch, with more bugfix PRs piling up for 3.xx.1?

And a new feature added just after beta goes into the next feature release, which will be out in 18 months instead of 17 months?

1 Like

We could consider changing the lifecycle from (7 months alpha, 3 months beta, 2 months rc) to (6 months alpha, 2 months beta, 4 months rc)

Each phase can be characterised […] RC – nothing but docs and a few stability fixes

By extending the RC phase, my primary intention is to have a longer period of time where ABI is frozen, so the community has more opportunity to ship wheels. In my mind, the proposed 4 months RC would be equivalently beneficial to “2 months gamma (ABI freeze) → 2 months RC (minimal changes)” or something like that. Encouraging “experimental support” wheels would be an alternative way of addressing this.

In general, I never quite got the logic of locking the branch and avoiding even landing bugfixes for extended periods of time until 3.xx.1. I’m happy to do anything that makes RMs feel more confident, happier, and saner, but from a user’s perspective it never quite made sense to me.

2 Likes

FWIW, as RM, I find the RC phase is currently a mess. ABI stability should not be the main concern for the RC phase; ideally the whole RC phase would last no more than a month. ABI changes in the beta phase are quite rare, and also relatively easy to work around. But given that this is the expectation the community has, and how reluctant people are to build release artifacts for the betas, maybe we should lean into it. We can have an extra month in the RC phase and have another RC release, where only the last (planned) one is a “real” release candidate in the traditional sense.

4 Likes

I can explain this :slight_smile: Any change we make in the RC phase runs the risk of invalidating the testing people have already done. It’s not just about ABI changes, it’s about all the subtle semantics across all of the standard library. A change in rc2 is much riskier than a change in, say, 3.xx.1, because there is no 3.xx version for users to fall back to if the change ends up breaking them. A change in 3.xx.yy that has unforeseen negative impact means users are stuck at 3.xx.yy-1. The same change in 3.xx.0 means they can’t use 3.xx at all, and making that change after we’ve given them a release candidate and said “this is as close to the final release as we can get” is disruptive and demotivating.

14 Likes

This type of data is fascinating, thanks!

The main thing I see in the graphs is that adoption shoots up most noticeably after the final release. This makes sense to me as many packages will not update until they know the ship has sailed. But I’m curious then why you say that extending the RC period will “potentially triple the number of packages that test / build on new Python versions”. Do you mean triple the number of packages that test/build on a new version before its final release?

Given that, what I would tentatively conclude is that extending the actual lifespan of a single release is what would be most beneficial, as it gives packages more time to update before their target shifts (i.e., before a new release comes out).

What I’m most curious about though is the “network topology” of these updates: as in, which updates are blocked on which others, which may to some extent be reflected in the pace at which dependents release after their dependencies. I’m not sure if you can get this with your data, but I always have this hunch that there is a bottleneck where a large number of packages are waiting on a small number of dependencies before they can move forward.[1] It seems that many of these “keystone” packages are ones that make the strongest effort to test on pre-release versions. If that is the case, then if extending the RC helps these keystone packages, it could have a multiplier effect where more packages are able to follow up because the keystone packages can unblock.

I guess more generally, the way my thoughts go when I see this is: is there something we can do that is targeted at, not necessarily the most packages, but the “most important” packages (in terms of dependency load-bearing)? And then will benefiting them have automatic benefits for other packages (due to unblocking the dependency chain)? If so, it might be useful to hear from the teams that maintain some of these keystone packages to see if extending the RC phase would make things any easier on them?


  1. The main example I encounter is the scipy stack, where nothing can move until numpy moves. ↩︎

Yes, it shoots up after final release, but RC also has a noticeable increase compared to beta. Extending the RC means this increase could apply to more packages, so more are ready before final.

One big change this year vs. 3.12 is that NumPy support happened earlier. Last year’s 3.12 removed distutils, so NumPy and other packages were delayed. I think support for betas and RCs has been continually getting better, but the removal of distutils caused 3.12 to look more like 3.11 instead of being right in between 3.11 and 3.13. Not sure what effect free-threading might have had; NumPy got that too at the same time and still managed a pretty early release.

The ability to get an RC of CPython is getting even better. cibuildwheel and GHA have been fantastic in helping people try out betas and RCs, but this year conda-forge has been building with the RCs earlier than I remember them ever doing (I’m always shocked by how many students use Conda literally just to get Python, then use pip from then on - in my Software Engineering for Science class, I’d guess 50%-90% of them do this!). We’ve also got the Python distributions that uv/hatch/pdm use supporting 3.13.

As for cibuildwheel, the problem with shipping a binary across an ABI change is that it’s really hard to know you need to fix it, and then to actually fix it. Rebuilding a complex package, adding a build number, and uploading it isn’t trivial for many workflows. And one or more versions of your package will just segfault if you don’t. It would be interesting to know how often ABI changes are made in betas, and maybe add a small discussion of it in the docs. We start supporting opt-in builds during the betas, and enable builds by default for the RCs. I remember from somewhere that the last beta (beta 4) is also promised to be ABI stable, though we’ve always done build-by-default starting with the first RC.
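For reference, opting in during the betas is roughly a one-liner in a GitHub Actions workflow. This is a sketch; the action version is illustrative and the option name should be checked against the current cibuildwheel docs:

```yaml
# Build wheels for prerelease CPython (opt-in during the beta phase)
- uses: pypa/cibuildwheel@v2.21
  env:
    CIBW_PRERELEASE_PYTHONS: "true"
```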

Did the 3.13 release get moved? Is there somewhere to see the expected date? I see the Sphinx & incremental GC issue, but no dates or clear resolution.

I should note, since people will sometimes look at the absolute values on this plot and say that 3.x isn’t ready to be used: a lot of packages support 3.13 but either don’t have explicit classifiers (I really like explicit classifiers, but some people don’t at all) or just haven’t bothered to update the classifier. I was just looking through some of the packages I’m an active maintainer of: 11 have a wheel or classifier, and 30 don’t. Most of those, though, like build, nox, etc., do support 3.13 and are testing on it; there just hasn’t been a release to update the metadata. 2-3 need a release to get wheels, and I’m not aware of any that actually don’t work with 3.13. FYI, cibuildwheel itself doesn’t have a 3.13 classifier yet: while it builds 3.13 wheels, we haven’t actually tested running it from 3.13 yet, which is what we base the classifier on. :slight_smile:

4 Likes

Thanks! I must not have read the initial post fully. The discussion afterwards wasn’t clear. :person_facepalming: Looks like 3.13rc3 is out: Python 3.12.7 and 3.13.0rc3 released

1 Like

Slightly related: Python 3.13 Wheels Readiness - Python 3.13 support table for most popular Python packages

3 Likes

This is largely thanks to Meta sponsoring the team I’m on at Quansight Labs to do the work.

In the counter-factual universe where Meta didn’t commit to help out when PEP 703 was accepted, free-threading likely would not have been ready unless a community member happened to take that on. But of course free-threading was merged contingent on that support, and that probably affected the willingness of any community members to take on what many thought of as an impossibly big task[1].


  1. Making NumPy truly thread safe is a very big task that we need to work as a community to figure out and accomplish, but it turned out to be not so bad to fix all of the hopelessly thread-unsafe global variable use in NumPy and get the free-threading build in a state with similar guarantees to the GIL-enabled build. ↩︎

11 Likes

I think a longer RC phase would be better. Python has quite a low-level and rich extension and embedding API, so it’s difficult to totally avoid ABI/API changes in the beta phase. Third-party extensions are reluctant to spend a lot of time supporting betas when they later have to fix things because of last-minute changes. Things have gotten a lot better recently in terms of packages supporting beta and RC versions; the nightly scientific Python wheels that are on Anaconda are really useful to me. Still, more time in the RC phase would be helpful.

A more concrete example: imagine a software application that requires all of the following packages: matplotlib, numpy, openpyxl, pandas, pillow, pyarrow, reportlab, scikit-learn, statsmodels, and more. It cannot be tested with pre-releases until all those dependencies are updated, and that takes quite a bit of time.

3 Likes

The reason there are last-minute ABI/API changes that affect third parties is that we aren’t getting feedback about issues earlier. We already shouldn’t be making deliberate/unnecessary changes here during beta anyway.

Changes to our API that occur late in the process really are for the best. The alternative is that we’re stuck with a problem for an entire release cycle, and probably for an entire deprecation period, only because we promised not to fix it during RC and we didn’t find out about it during beta. A break during RC is better than being broken for 2-5 years.

Adjusting the labelling of the releases is pure marketing. We can call them all RCs if that’s what it’ll take for people to test more than a month before the stable release, but ultimately, the point of all pre-stable releases is that we can make changes if it’s better for the change to be made than for things to remain broken.

11 Likes

Speaking for the packages I maintain, there is always CI that tests against prereleases of CPython, since it is now easy with actions/setup-python and e.g. 3.13-dev as the version. That picks up every prerelease from alpha onwards. However, I still want to wait for the 3.13 final release before releasing a version/wheel that is supposed to be compatible with 3.13.
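For example, a matrix along these lines picks up the in-development interpreter (the exact versions here are illustrative):

```yaml
strategy:
  matrix:
    python-version: ["3.12", "3.13-dev"]
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: ${{ matrix.python-version }}
```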

The problem then is how this plays out further down the dependency stack, because downstream either needs to build their whole stack from source in their 3.13-dev CI job or they can’t test until upstream uploads wheels.
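(Building from source is at least mechanically simple, e.g. with pip’s --no-binary flag, but for a big compiled stack it is slow and requires all the build toolchains:)

```shell
# Force every dependency to build from sdist instead of using wheels
pip install --no-binary :all: -r requirements.txt
```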

Similar reasoning applies for package authors as well which is why they may not want to put out final releases that claim compatibility with 3.13 before 3.13 is finalised.

3 Likes

As someone who never thought to read the meaning behind each of Python’s pre-release stages, I never realised just how similar the beta and RC releases are intended to be to the real release. I would therefore not declare support for a new Python version until the true release, on the grounds that it felt presumptuous to assume that some last-minute change wouldn’t come along and break everything. I see now that that’s wrong, but I would be surprised if there aren’t others in that boat.

5 Likes

Maybe there could be an alternative version of PyPI, like beta.pypi.org (TestPyPI is not suitable), where package authors can upload their nightly wheels so that downstream projects can test them. Basically like the numpy/scipy nightly wheels, but open to everyone who has PyPI packages rather than a small curated set:
https://anaconda.org/scientific-python-nightly-wheels/repo?type=pypi&label=main

Then it would be easy for everyone to have a bleeding edge CI job that uses 3.13-dev plus nightly wheels of all dependencies.
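The install incantation for that kind of channel is already simple; this is what the scientific-python nightly channel documents today (package names are illustrative):

```shell
# Prefer nightly wheels where available, fall back to PyPI otherwise
pip install --pre --upgrade \
    --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
    numpy pandas
```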

3 Likes

Thank you for testing early!

I’m curious, why not? What sort of thing would you need to release earlier?

Indeed, some simply can’t test until dependencies have wheels, meaning some have to wait until after final release, which could be too late to give feedback to CPython. Dependencies releasing a wheel during prerelease would help this.

2 Likes

At the cost of some additional infrastructure, this seems like the right solution[1]. If it were possible for automation to test against whatever is currently available on dev-pypi, that would lower the friction for package maintainers who need to wait on their dependencies.


  1. although I would bikeshed the name to dev.pypi.org or something else besides beta ↩︎