Restricting "open ended" releases on PyPI?

I’ve tried multiple times to rewrite MarkupSafe as abi3, both in C and in Rust. I’m not super comfortable in either of those languages, and for speed MarkupSafe relies on some Python string information that is not available through abi3. So I haven’t been successful. If anyone came along and contributed a Rust abi3 PR that was reasonably close in speed, I would be eager to accept it.

With the GitHub workflow I’ve set up, it’s approximately as easy to trigger a completely new version as it is to trigger a new wheel for an existing version. So I wouldn’t really mind. But it’s not necessary, since the new release wouldn’t represent any actual changes. It’s similar to the objection I have to keeping the Python version trove classifiers up to date: it requires a new version just to say “yep, still compatible”.

4 Likes

This would also probably require specifying tag precedence/ordering and calculation.

2 Likes

I think draft releases are a somewhat separate concern from this. The reasons draft releases are useful are unrelated to the reasons the flagged situation is problematic.

I don’t think so.

Speaking for pip, most of the cost around dependency resolution is in fetching metadata for a single package release, and not in fetching the listing of files from the index. It’s also not really feasible to have different answers for same-package-name in a single resolve since we hold all the parsed page information in memory IIRC.

I don’t expect that changing how many files are in a release changes much for resolvers practically, especially since file listings are updated whenever any new file is uploaded.

At best, it means that the HTTP cache is less likely to be invalidated but those have a short-enough TTL already IIRC.

How exactly does this work?

You can’t upload a distribution file with the same filename and, assuming that the package metadata has not been changed, ~all backends should be generating the exact same output file(name).
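To make the “same output file(name)” point concrete, here’s a sketch of the PEP 427 wheel naming convention. `wheel_filename` is a toy helper for illustration, not a real packaging API: the filename is derived entirely from the package metadata (name, version, optional build number, and compatibility tags), so unchanged metadata produces an identical filename, which PyPI rejects as a duplicate.

```python
def wheel_filename(name, version, python_tag, abi_tag, platform_tag, build=None):
    """Build a PEP 427-style wheel filename:
    {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    parts = [name, version]
    if build:  # an optional build number is what distinguishes rebuilt wheels
        parts.append(build)
    parts += [python_tag, abi_tag, platform_tag]
    return "-".join(parts) + ".whl"

# Same metadata -> same filename, every time:
print(wheel_filename("markupsafe", "2.1.5", "cp312", "cp312", "manylinux2014_x86_64"))
# markupsafe-2.1.5-cp312-cp312-manylinux2014_x86_64.whl

# A rebuilt wheel only gets a distinct filename if a build number is added:
print(wheel_filename("markupsafe", "2.1.5", "cp312", "cp312", "manylinux2014_x86_64", build="1"))
# markupsafe-2.1.5-1-cp312-cp312-manylinux2014_x86_64.whl
```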

I genuinely don’t follow when this can be an issue, outside of one somewhat convoluted scenario (below) that I think is unlikely enough that we don’t need to care.

The only case I can think of where this can happen is a package moving from a pure-Python wheel to per-ABI wheels, but those are the sorts of situations where most projects also bump the major version. And if they’re authoring code that builds a Python extension, I’d expect such users to have more familiarity with packaging systems, making it less likely that they’d make this mistake.

And, to add a bit more context: pinning with hashes is the recommendation for when you want to be secure with pip (we don’t have secure defaults, which isn’t great, but there is a documented approach for being fairly secure).

TBH, this is IMO the main benefit of doing this: it’s a (small-surface, yay 2FA) attack vector we’d be defending against here.


I think the decision to make here is how much effort/disruption removing this “quirk” is worth.

I think the benefits of the existing behaviour are worthwhile enough that we shouldn’t just yank this out without thought. It’s a good feature to not need a new release to advertise that you’ve added support for a new Python version (or to have time to resolve issues if you aren’t able to build a Windows release due to an automation issue that took more than a week to resolve).

TBH, I don’t think it would matter much either way if PyPI removed this capability – package maintainers will adapt[1], since “cut a new release” is a straightforward solution here (if a bit heavy-handed), and most of the cost of doing this falls on whoever does the work on the PyPI implementation and any surrounding communication work for it.


  1. Some might also be grumpy about it. ↩︎

1 Like

Thanks for sharing this; given that, it sounds like the “pessimization” argument is pretty weak.

The scenario below is one I had in mind, but there’s another (also arguably very convoluted) I was thinking of:

  1. foo=1.2.3 already exists on the index
  2. The maintainer of foo uses a matrix of CI runners to build and publish wheels in parallel for different (OS, arch, etc.) configurations. This matrix may be dynamic (GHA supports this) and/or grow additional configurations between versions.
  3. The maintainer makes changes in prep for foo==2.0.0 but forgets to bump the actual version
  4. The build/publish workflows run in parallel, meaning that there’s a race condition: a version filename for a wheel configuration that hasn’t appeared before can be successfully uploaded before a conflicting filename fails.
  5. The maintainer sees the eventual failure and assumes that all wheel uploads failed, when really one or more succeeded (and are now being served with an unexpected version)
  6. ???
  7. Extreme user sadness at some indeterminate point in the future

This is arguably pretty unlikely, but not impossible: many projects do use parallel build/publish matrices on CI systems like GHA, and it isn’t inconceivable that the matrix changes or specializes in a way that allows a new wheel to be uploaded to an existing version by accident.
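For concreteness, here’s a minimal sketch (a hypothetical workflow, assuming GitHub Actions with cibuildwheel and pypa/gh-action-pypi-publish, none of which are taken from any specific project) of the kind of parallel publish matrix where this race can occur. Each matrix leg builds and uploads its own wheels independently, so one leg can succeed before another fails on a filename conflict:

```yaml
# Hypothetical workflow: each matrix leg builds and publishes its own
# wheels, so uploads from different legs race against each other.
jobs:
  publish:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: pipx run cibuildwheel --output-dir wheelhouse
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: wheelhouse/
```

If the version was accidentally left at `1.2.3` and the `windows-latest` leg produces a tag combination that `1.2.3` never had before, that upload succeeds while the other legs fail as duplicates.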

Fully agreed!

Slight user sadness. Either an install will fail (not good, but not the end of the world) or it will work (so, good, maybe? or are you considering it bad that the user got a later version than expected and it worked fine?)

Either way, user reports the issue, maintainer yanks the offending file, no real harm done.

It’s not good, but it doesn’t seem like it’s any worse than the (far more likely) “maintainer ships new release with a nasty bug in it” situation…

1 Like

It’s worse than that, I think: PyPI doesn’t allow single files in a release to be yanked, so a release that gets “tainted” in this way is either permanently broken or has to be entirely yanked (eliminating the value of being able to roll forward additional wheels for the same release).

(The assumption is that the new release is not compatible with the old release, i.e. foo==2.0.0 is intentionally a breaking version but is now being served to some percentage of foo==1.2.3 wheel users for whom the new wheels are more specific than the old ones.)

Similarly, debugging this is obvious from our bird’s-eye view, but I think it’d be initially pretty confusing: the user report would be something like “a version that worked for years is suddenly broken,” since user reports generally occur at the layer above wheel/sdist specificity. That’s not to say it’s impossible to triage, but I do think the root cause here is ultimately pretty opaque (race condition in CI build matrices → partially valid wheel uploads that look like they didn’t actually succeed → dependency resolution includes wheel specificity checks that (reasonably!) aren’t part of the public interface).

With all that, I’ll reiterate that this is speculative and I have no actual quantitative evidence that any package has ended up in this state. It’s just an example I imagined as a “pathological” case for how the current yank/wheel tagging/open-ended behaviors interact :slightly_smiling_face:

That’s a problem with PyPI; PEP 592 allows for yanking at the file level.

So OK, I guess we could add a second restriction to PyPI to mitigate the impact of the first restriction. And maybe that’s even the right thing to do for practical purposes. But it seems a bit off to me, TBH.

It was pointed out to me that there’s something wrong with these queries and results, so I’ve edited the post to hide them behind a details expand, so as to not confuse the conversation.

Summary

On the topic of open endedness, I was curious to understand how prevalent the issue was, so I ran a few queries to answer some questions.

  1. What’s the longest an open-ended release has gone between its first file and its most recent file, and, for comparison, what are the average and median?
warehouse=> SELECT
    AVG(DATE_PART('day', f.upload_time) - DATE_PART('day', r.created)) AS avg_days_apart,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY DATE_PART('day', f.upload_time) - DATE_PART('day', r.created)) AS median_days_apart,
    MAX(DATE_PART('day', f.upload_time) - DATE_PART('day', r.created)) AS max_days_apart
FROM
    projects p
    INNER JOIN releases r ON r.project_id = p.id
    INNER JOIN release_files f ON f.release_id = r.id
WHERE
    DATE_PART('day', f.upload_time) - DATE_PART('day', r.created) > 2;
  avg_days_apart   | median_days_apart | max_days_apart
-------------------+-------------------+----------------
 9.453471673727293 |                 7 |             30

Note: I use a date difference of 2, since 1 surfaced a fair number of releases that happen around midnight and so span two days. I didn’t want to change my queries to do second-precision date math, since it wasn’t relevant to my questions and I was a little lazy.

So while open-endedness is possible, the worst offenders I could find were at most 30 days.

  2. In recent times, what are some of the projects that exhibit adding uploads after the initial release?
warehouse=> SELECT
    MAX(f.upload_time) AS most_recent_file_date,
    p.name AS project_name,
    r.version AS release_version,
    DATE_PART('day', f.upload_time) - DATE_PART('day', r.created) AS days_apart
FROM
    projects p
    INNER JOIN releases r ON r.project_id = p.id
    INNER JOIN release_files f ON f.release_id = r.id
WHERE
    DATE_PART('day', f.upload_time) - DATE_PART('day', r.created) > 2
GROUP BY 2, 3, 4
ORDER BY 1 DESC, 4 DESC, 2, 3
LIMIT 10;
   most_recent_file_date    |     project_name     | release_version | days_apart
----------------------------+----------------------+-----------------+------------
 2024-01-31 22:17:17.33082  | cybotrade-indicators | 0.0.7           |         22
 2024-01-31 22:11:14.91034  | crosspy              | 0.0.0a3.dev74   |         26
 2024-01-31 22:01:20.637333 | libroadrunner        | 2.5.0           |         17
 2024-01-31 19:58:18.50832  | temporian            | 0.7.0           |         19
 2024-01-31 17:14:49.46431  | topoly               | 1.0.3           |         12
 2024-01-31 15:48:17.348508 | numexpr              | 2.9.0           |          5
 2024-01-31 10:02:09.885435 | autoai-libs          | 1.16.2          |          9
 2024-01-31 07:03:12.093301 | rqfactor             | 1.3.16          |         19
 2024-01-31 06:23:15.471182 | clika-inference      | 0.0.0           |          7
 2024-01-31 06:23:13.950191 | clika-compression    | 0.0.0           |          7

Feel free to dig into why some of these have these behaviors.

  3. How many projects overall exhibit this pattern?
warehouse=> SELECT
    COUNT(DISTINCT p.id) AS num_projects
FROM
    projects p
    INNER JOIN releases r ON r.project_id = p.id
    INNER JOIN release_files f ON f.release_id = r.id
WHERE
    DATE_PART('day', f.upload_time) - DATE_PART('day', r.created) > 2;
 num_projects
--------------
         8715

Anyhow, I’m not weighing in on the merits or risks of open-ended releases, but I often like to have some data handy when trying to understand the scope.

11 Likes

I couldn’t say what is wrong with the query, but the first one at least is certainly giving the wrong answer.

e.g. PyYAML · PyPI shows uploads ranging from Jul 18 to Jan 18, which is much more than 30 days.

2 Likes

For the Pillow project, we have at least once had an occasion when the CI failed to build a subset of wheels. Rather than delaying the release, we released the majority of the wheels that built successfully, then figured out and fixed the rest, and uploaded those later.

A bit more common: we’ve sometimes found a problem with a badly compiled wheel after release, and then uploaded a fixed version with a build number.

Why not a patch release? For one, we might not want to ping everyone with a “new” release when there are no functional changes and, for most people, the binaries haven’t changed either.

Also, our release process previously had some manual steps. At one point, we had one person who built the Windows wheels, so we had to pause the release to wait for them. All this made new releases quite a slow and hefty process compared to uploading new files to an existing release.

Our next release will be fully automated, using the excellent cibuildwheel and Trusted Publishing to deploy from CI to PyPI, so new patch releases should be much easier. But I still have some concern about using 34 hours of CI time to build, uploading 50 wheels (22 MB) to PyPI, and pinging everyone with a new release.

8 Likes

You’re right, and thanks for poking through my assumptions! I’ll have to fix the query and run through them again.
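For what it’s worth, my best guess at the bug (which I haven’t fully verified): PostgreSQL’s `DATE_PART('day', timestamp)` extracts the day-of-month component rather than computing elapsed days, so subtracting two of them compares calendar days within a month and can never exceed ~30. A quick Python analogue, with example dates chosen only for illustration:

```python
from datetime import datetime

created = datetime(2023, 7, 18)  # first file uploaded (illustrative date)
latest = datetime(2024, 1, 18)   # most recent file uploaded (illustrative date)

# What subtracting DATE_PART('day', ...) values effectively computes:
# day-of-month minus day-of-month. Here the six-month gap vanishes.
naive_days = latest.day - created.day
print(naive_days)  # 0

# What the query intended: the true number of elapsed days.
true_days = (latest - created).days
print(true_days)   # 184
```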

1 Like

I spent some time trying to use the distribution_metadata table in the public BigQuery set so that anyone else could query/verify, but there are some curiosities in there, likely due to project name transfers over time, so I abandoned that approach.

I redid one of the queries to get the “meat” of it - and here’s a table (behind the details).

Summary
warehouse=> SELECT
    p.name AS project_name,
    r.version AS release_version,
    ROUND(EXTRACT(EPOCH FROM MAX(f.upload_time) - MIN(f.upload_time)) / 86400) AS days_apart
FROM
    projects p
    INNER JOIN releases r ON r.project_id = p.id
    INNER JOIN release_files f ON f.release_id = r.id
GROUP BY 1, 2
ORDER BY 3 DESC, 1, 2 DESC
LIMIT 30;
      project_name      | release_version | days_apart
------------------------+-----------------+------------
 ioLabs                 | 3.2             |       4067
 digest                 | 1.0.2           |       4041
 django-image-sitemaps  | 1.02            |       3760
 HeapDict               | 1.0.0           |       3615
 waipy                  | 0.0.1           |       3602
 waipy                  | 0.0.6           |       3597
 waipy                  | 0.0.5           |       3597
 waipy                  | 0.0.4           |       3597
 waipy                  | 0.0.3           |       3597
 waipy                  | 0.0.7           |       3595
 python-xmp-toolkit     | 2.0.1           |       3592
 waipy                  | 0.0.8.1         |       3577
 waipy                  | 0.0.8           |       3577
 waipy                  | 0.0.8.6         |       3525
 setuptools             | 0.6c3           |       3405
 vim-bridge             | 0.5             |       3339
 setuptools             | 0.6c5           |       3294
 setuptools             | 0.6c4           |       3294
 geopy                  | 0.93            |       3273
 multipart              | 0.1             |       3253
 jaraco.input           | 1.0.1           |       3223
 setuptools             | 0.6c6           |       3152
 visibility-graph       | 0.4             |       3084
 graphistry             | 0.9.9           |       3068
 export                 | 0.1.0           |       3067
 setuptools             | 0.6c7           |       3056
 metasyntactic          | 0.99            |       3045
 AddOns                 | 0.7             |       2999
 hr                     | 0.1             |       2982
 sphinxcontrib-spelling | 1.1             |       2938
(30 rows)

The top one - ioLabs - looks like they uploaded a wheel 11 years after the initial release.
The waipy ones appear to be a slew of .egg files added to older releases.

Anyhow, if there’s other data that would prove interesting that I can get at, let me know!

2 Likes

To add a data point from the other side (a dependency management tool):

In Poetry, we sort of ignore that wheels can be added to an existing release later - for performance reasons. Of course, that has the disadvantage that you have to clear your cache to get wheels that have been added later. We also receive issue reports about missing wheels in the lockfile from time to time (which we respond to with “Please clear your cache”). But in the end, I think the performance benefit might still be worth it - at least for Poetry.

6 Likes