Adoption of new Python in PyPI packages, longer RC periods?

My goal is to see if there are ways we can increase our confidence in the 3.x.0 release.

I think the way to accomplish this is to empower people who could encounter and triage regressions to test their use cases against CPython prereleases.

Currently, if you have a substantial app or a package with a lot of dependencies, it’s very cumbersome to test a prerelease until your dependencies support Python 3.13 — in particular, your extension module dependencies. (I have several applications at work where it’s effectively impossible for me to test prereleases). It becomes dramatically easier to test prereleases once your extension module dependencies ship wheels that support 3.x.

Crucially, this only really starts happening during the RC phase, because that’s when we freeze ABI, and so is when extension modules start uploading 3.x wheels to PyPI. This basically means that large applications will only start testing during the RC period, where we’d like to be conservative and have minimal changes till release. It’s not ideal that we’re discovering large regressions just a few days before the final planned release.

There seems to be rough consensus in this thread about a) trying to increase confidence in 3.x.0 release, b) extending the period of time prior to release when it’s easy to install dependencies as a way to do this.

There seems to be some divergence in the thread on how to accomplish this. Solutions include:

  • Moving the ABI freeze earlier (my concrete proposal)
  • Normalising the sharing of experimental wheels prior to ABI freeze, implemented via a) a different package index, b) detecting ABI breaks and having maintainers yank or upload a build-tag shadow (see the sketch after this list), or c) hoping there won’t be ABI breaks
  • Evangelising the Stable ABI
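
To make the “build tag shadow” option concrete, here’s a minimal sketch (the `demo` filenames are hypothetical) using the parser from the packaging library. A wheel filename may carry an optional numeric build tag between the version and the Python tag, and installers such as pip prefer the highest build tag for a given version, so re-uploading with a build tag can shadow a wheel that was built against a pre-freeze ABI:

```python
# Minimal sketch: how a build-tag "shadow" wheel supersedes an earlier
# upload. The filenames below are hypothetical examples.
from packaging.utils import parse_wheel_filename

# Built against a beta-period ABI that later broke:
broken = "demo-1.0-cp313-cp313-manylinux_2_17_x86_64.whl"
# Rebuilt against the frozen RC ABI, with build tag 1 so installers
# prefer it over the untagged wheel for the same version:
shadow = "demo-1.0-1-cp313-cp313-manylinux_2_17_x86_64.whl"

for filename in (broken, shadow):
    name, version, build, tags = parse_wheel_filename(filename)
    print(filename, "-> build tag:", build or "(none)")
```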

We can push on all three of those things. I’m less bullish on experimental wheels, for reasons discussed previously, but please make it happen and make me eat those words :wink:

To be explicit, my one line tldr is: “get packages to provide easy-to-install versions that support 3.x ASAP (prior to 3.x.0), so it’s easy for anyone to test 3.x prereleases on their workloads”

The graphs are curiosity candy. I think it’s genuinely really cool to see the whole ecosystem coordinate — Python 3.13 was probably the easiest Python ever to test in prerelease. It’s also informative to e.g. see the hockey stick in version-specific wheels at the start of the RC period.

4 Likes

As a library maintainer, I didn’t care about Trove classifiers. I don’t want to create a new release just to add a classifier for a new Python version. Should I?

I’d like some notification when an RC is released, when it’s available in actions/setup-python and cibuildwheel, and when a Cython version supporting it is available.

4 Likes

Thanks, @hauntsaninja, for clarifying your thoughts.

I’m okay with moving the ABI freeze earlier if it doesn’t extend the release cycle. Projects benefit from having predictable release schedules. We’ve set a good cadence over the past few releases, making release dates more predictable.

Personally, I find value in the CPython version being testable in CI on my project’s PR merges. Like @methane, I find Cython support important. Overall, I’m less concerned with releasing my project at 3.x.0 and more interested in releasing once it’s well tested after the 3.x.0 release, even if that means 3.x.1.

2 Likes

Personally, I’d add the classifier as soon as it’s green in CI, but only make a release whenever I usually would. There’s some more discussion in this thread. I would still make a release ASAP when adding version-specific wheels (and thank you for adding 3.13 wheels to msgpack!).

I like your idea of opt-in notifications when your build / CI dependencies support the RC!

2 Likes

I feel like if we just say, “doesn’t @hugovk already have a GitHub app to open an issue on any registered repo when a new release at the chosen release level comes out?”, it will magically appear. :wink:

8 Likes

If you’re lucky, you’ll even get a PR :slight_smile:

I think adding it is enough; release as you normally would, on your usual schedule. It’s a useful signal to your users that you have tested and believe you have support. We can check the repo if there’s no classifier on PyPI yet. (Tip: add it already during your pre-release testing.)
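
As a rough illustration of that kind of check, here’s a small sketch using PyPI’s public JSON API (https://pypi.org/pypi/<name>/json) to see whether a project’s latest release already declares the new version’s classifier; the package name is just an example:

```python
# Check whether a project's latest PyPI release declares the Trove
# classifier for a given Python version.
import json
from urllib.request import urlopen

def declares_python(package: str, version: str = "3.13") -> bool:
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        info = json.load(resp)["info"]
    return f"Programming Language :: Python :: {version}" in info["classifiers"]

print(declares_python("msgpack"))
```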

Good news on all fronts!

There’s a new RSS feed you can follow that will notify for each release, including RCs: Python Releases

I’ve hooked it up to this Mastodon bot: Feed for python_releases
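
If you’d rather script your own notifications than follow the bot, here’s a rough sketch of polling such a feed, assuming it’s standard RSS 2.0; `FEED_URL` is a placeholder, substitute the feed linked above:

```python
# Poll an RSS feed and print release-candidate entries.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "https://example.org/python-releases.rss"  # placeholder URL

def release_entries(url: str):
    """Yield (title, link) pairs from an RSS 2.0 feed."""
    with urlopen(url) as resp:
        tree = ET.parse(resp)
    for item in tree.iter("item"):
        yield item.findtext("title", ""), item.findtext("link", "")

for title, link in release_entries(FEED_URL):
    if "rc" in title.lower():  # e.g. "Python 3.13.0rc1"
        print(title, link)
```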

3.13 has been available on actions/setup-python since alpha 1: Add Python 3.13.0 alpha 1 · Issue #742 · actions/setup-python · GitHub

And for cibuildwheel since beta 1: Release v2.18.0 · pypa/cibuildwheel · GitHub

Cython is more challenging, as it’s more sensitive to C API changes, but they’ve also had preliminary support for 3.13 since alpha 1 to allow early testing: Cython Changelog — Cython 3.1.0a0 documentation

In fact, we also reverted some C API changes at the start of the alpha for the benefit of early testing with Cython: Python 3.13 alpha 1 contains breaking changes, what's the plan?

That means it’s been possible for many to test for a year before the big final release, and to create wheels to help others test for six months. I think the main blocker has generally been a lack of pre-release wheels from dependencies.

3 Likes

I disagree on the “pure marketing”. If the RCs didn’t promise ABI-stability with the final release, we wouldn’t be able to do the builds in conda-forge ahead of time (and same for many other packages pushing wheels early to unblock the chain of dependencies).

That said, I do agree that if it’s warranted, breaking during the RC phase is better than having a broken state for several years. It would just mean that everyone would have to throw away (yank, mark broken, whatever) the artefacts that were based on an rc1 that ended up not ABI-compatible with the final release. So it can be done if a strong enough reason is discovered late in the process, but it has a cost that should be weighed carefully.

I’ve had the same question, and the outcome was that Cython 3.0.11 does support regular Python 3.13 (with the GIL) – for free-threading, though, Cython 3.1.0 (no tag of any kind yet) is needed.

2 Likes

This seems sensible for most scenarios. However, given the absence of classifiers on some packages that are essentially “done” and stable for long periods of time, might it be feasible to let maintainers add a new Trove classifier (for the latest Python release) to an existing package release on PyPI?

That way, people browsing packages get a bit more reassurance than if there’s no classifier at all, and the maintainer isn’t pressured to cut releases purely to signal that the package works on the latest version of Python.

2 Likes

A side comment on PyPI stats.

I help maintain one library, and when a new Python version comes around, we test it once, usually manually and rarely in CI, and let it be. Our library is pure Python, so we publish a single wheel. We don’t feel like making a release just to add a classifier.

In other words, there are packages like ours where RCs are tested, but you can’t see that in statistics.

7 Likes

I agree! (and thank you for manually testing against RC)

For that reason, I don’t report the denominator of 1312 packages in the graphs (the y-axis is not “percent of sample”). The comparison between Python versions still holds and is still informative.

It’s also why I break out wheels with explicit versions into their own graph. Version-specific wheels affect the user experience on a new Python much more strongly than classifiers do; this also motivates the discussion of an earlier ABI freeze in this thread.
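
For the curious, here’s my own rough reconstruction (not the actual analysis code) of that version-specific-wheel check, again via PyPI’s JSON API, whose “urls” key lists the files of a project’s latest release:

```python
# Does a package's latest release ship a wheel built specifically
# for CPython 3.13? (Rough reconstruction, not the analysis code.)
import json
from urllib.request import urlopen

def has_cp313_wheel(package: str) -> bool:
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        data = json.load(resp)
    return any(
        f["filename"].endswith(".whl") and "cp313" in f["filename"]
        for f in data["urls"]  # files of the latest release
    )

print(has_cp313_wheel("numpy"))
```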

One of the issues that I don’t think has been discussed so far is that the ecosystem depends on some pretty old packages that are no longer receiving the love they used to get. This is often for good and valid reasons; a maintainer might simply have moved on, which is fine, but it does mean that testing is difficult.
Sometimes dependencies are not even needed, but as a downstream user I can’t do much about incorrectly declared dependencies if the package declaring them isn’t maintained.

An example for Python 3.13 would be eventlet, which is clearly struggling to support 3.13 even though issues were already created in June, well before the RC period.
I would have loved to help test the RC (and current versions) at my day job, and I have the perfect microservice, limited in scope, that I could do that with. However, due to dependencies on fastapi and therefore eventlet, I can’t even get a relatively simple webserver to build.

What could I, as a downstream user, do to help out in this case? I lack the knowledge of threading and event loops to really help out in eventlet, and I don’t want to start posting “Bump” in these issues, as that doesn’t contribute at all. I also don’t want to take any help from Core Developers for granted, but it feels like in cases such as these it might actually be beneficial if there were a way to flag these critical, broken dependencies during the development cycle and see if Core Devs could help out. They often have the required knowledge of the Python internals (and the changes in the new version) that would help unblock these packages.

Perhaps this is all just too much rambling, but I’d like to improve the situation I’m noticing, and I don’t really know how to.

8 Likes

I think that’s a different issue. This issue is about how we give project maintainers as much time as possible to test on new releases. In the case of eventlet, it sounds like they had plenty of time (they were working on the issue in the beta phase of 3.13). The problem there is more about how much maintainer time is available. And while that’s a concern, it’s not something that can be helped by changing the schedule for beta and RC releases.

Open source sustainability, and in particular managing the fact that there are no guarantees of any particular level of support from open source maintainers, is a different, and far larger, problem.

And yes, your point that the amount of testing time available depends on how quickly your dependencies provide at least minimally workable releases for the new version is important. But IMO the starting point has to be doing what we can for projects that are not constrained by their dependencies. Because that’s the place we can have the most direct impact.

1 Like

I agree with your last sentence; it is indeed the place where the Core team can have the most direct impact. However, from my experience at my day job and as a pylint maintainer, I wonder how big that impact will actually be. Although the impact is direct, I personally believe it will be very minimal.
With 3.11, pylint (and therefore those that depend on us in their CI, for better or worse) was blocked on dill, which struggled to release a 3.11-compatible version. This eventually led to dill-pylint, just so we could provide a 3.11-compatible version. I don’t think a longer RC period would have improved this situation, as dill was simply waiting for the 3.11 release to be final. Similarly, a recent example in jedi (also a critical dependency, probably unbeknownst to many) shows that extending the RC period would only prolong the period in which we need to wait for critical dependencies to release a new version.
I’d like to stress that I neither agree nor disagree with the stance the jedi maintainers are taking here; I just want to provide the example as evidence that prolonging the RC period can actually have negative effects.

Personally, I believe the point I’m addressing falls under this concern as well. The discussion in this thread has focused on tweaking the length of certain periods in the release schedule, in the hope that this would give more time to test. From my experience as an open source maintainer and at my day job, I have seen two main blockers preventing me from testing properly:

  1. Projects that simply lacked the capacity to keep up with Python development and clearly needed some help/guidance on how to become compatible again.
  2. Projects that were waiting for the RC period to be over before finally releasing a new version.

Projects that fall in category 1 wouldn’t be helped by prolonging or shortening any period. The (direct) impact the Core team could have there is to offer assistance with the updates.
Projects that fall in category 2 clearly don’t benefit from prolonging the RC period.

I’m aware that this is all based on my personal experience, which covers only a subset of the ecosystem (I don’t think I have ever installed numpy unless it was a dependency), but these two categories seem to recur every year. There are probably projects that would benefit from a longer RC or beta period, but I wonder whether they outweigh the projects in categories 1 and 2, which seem much more problematic to me and where our/your efforts should be focused.

Edited: I noticed I’d made a claim that was clearly just my opinion, so I changed it.

4 Likes