An index for deleted PyPI packages / versions

PyPI allows arbitrary deletion of packages and versions, which can be super disruptive for anyone depending on these packages (see a long earlier discussion on whether this should be possible in “Stop Allowing deleting things from PyPI?”). You can pin your complete dependency tree with Poetry (like we do at my day job), and still wake up one day to find your project won’t build because Google has pulled a jaxlib version (which has happened multiple times). Very fun.

Anyway, a few months ago I helped fund a startup that - among other things - creates daily snapshots of the PyPI registry. Based on these snapshots we’ve set up a Deleted Python package registry: a registry that contains all packages and versions that have been deleted from PyPI (since Sept. 2023, updated daily). It has links (no registration required) to download any wheels or files from deleted packages. E.g. here’s jaxlib 0.4.4 (deleted a few days ago):

https://dashboard.stablebuild.com/pypi-deleted-packages/pkg/jaxlib/0.4.4

Hope this is helpful for others (although I’d much rather disallow deletion from PyPI, but OK). :partying_face:


Just a heads up to any users that this likely contains a lot of malware and typosquats that PyPI admins have removed (e.g. https://dashboard.stablebuild.com/pypi-deleted-packages/pkg/yocolor).

Also: in case anyone reading would like PyPI to disallow deletion and has resources to apply towards development work, the PyPI team is interested in implementing https://github.com/pypi/warehouse/issues/11798 based on the proposal in https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/. You can email sponsors@python.org to get started.


Hi @dustin, thanks for adding to this. Is the deletion reason of packages logged somewhere by any chance? Would be a good thing to add here. Team has been looking at integrating a CVE database here to flag stuff that was pulled for having CVEs.

It’s not, that would be https://github.com/pypi/warehouse/issues/4703, which we’ve also been looking for resources to implement for quite a while.

FWIW, these two things are pretty much mutually exclusive: projects that have CVEs don’t usually delete their releases because of a CVE, and projects that are malware pretty much never have associated CVEs.


@dustin Thanks for the thread and for your quick replies. I’ll get at least the flagged packages from ossf/malicious-packages (a repository of reports of malicious packages identified in open source package repositories, consumable via the Open Source Vulnerability (OSV) format) imported, add a warning to those pages, and monitor the GH issue.

FWIW, these two things are pretty much mutually exclusive: projects that have CVEs don’t usually delete their releases because of a CVE, and projects that are malware pretty much never have associated CVEs.

Yes, well aware (although I’d expect versions might be pulled or yanked) - but they’re two sides of the same coin: how do you archive old versions without exposing users to large security risks?

One nice thing about integrating a malicious package list is that we can see exactly which packages get installed from the StableBuild PyPI snapshots and by whom, so if a user is pulling packages flagged as malicious we can tell them.

update: This now shows malicious package warnings as long as they’re in ossf/malicious-packages. It’s a start :slight_smile:
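For anyone curious what such a check could look like, here is a minimal sketch, assuming the reports have already been loaded as OSV-format dictionaries. The function name `is_flagged` and the sample record (including its id) are made up for illustration; this is not necessarily how StableBuild does it. It also only looks at explicit `versions` lists and treats a record without one as flagging the whole package, which is a simplification.

```python
import re


def _normalize(name):
    """Normalize a project name roughly as PEP 503 does: lowercase, runs of -_. become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_flagged(name, version, osv_records):
    """Return the ids of OSV records that flag this PyPI package/version.

    Hypothetical helper: `osv_records` is a list of dicts in OSV format.
    """
    hits = []
    for record in osv_records:
        for affected in record.get("affected", []):
            pkg = affected.get("package", {})
            if pkg.get("ecosystem") != "PyPI":
                continue
            if _normalize(pkg.get("name", "")) != _normalize(name):
                continue
            versions = affected.get("versions")
            # Assumption: no explicit version list means the whole package is flagged.
            if not versions or version in versions:
                hits.append(record.get("id", "<no id>"))
                break
    return hits


# Made-up record, only to show the shape of the data.
sample_records = [
    {
        "id": "MAL-0000-example",
        "affected": [
            {"package": {"ecosystem": "PyPI", "name": "Some_Pkg"}, "versions": ["1.0.0"]}
        ],
    }
]

print(is_flagged("some-pkg", "1.0.0", sample_records))
```

A page for a deleted version could then show a warning whenever the returned list is non-empty.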


You may want to distinguish between projects where every version has been deleted rather than those where the project still exists but some versions have been deleted.

The former are likely to have been removed due to being typosquatting attempts, and provide no value to anyone (besides other malware authors looking for ideas).

The latter are probably genuine projects that are either trying to stay under PyPI’s size quota or are overzealous in removing vulnerable versions (there’s no need to delete them - publish a fixed release or yank it).
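As a rough sketch of that split, assuming each daily snapshot can be reduced to a mapping from project name to the set of versions present that day (the function name, snapshot shape, and project names below are all hypothetical):

```python
def classify_deletions(previous, current):
    """Compare two snapshots mapping project name -> set of version strings.

    Returns (fully_deleted, partially_deleted), each mapping a project name
    to the set of versions that disappeared between the two snapshots.
    """
    fully_deleted = {}
    partially_deleted = {}
    for project, old_versions in previous.items():
        remaining = current.get(project, set())
        gone = old_versions - remaining
        if not gone:
            continue  # nothing was deleted for this project
        if remaining:
            # The project still exists, only some versions were removed.
            partially_deleted[project] = gone
        else:
            # Every version is gone, so the whole project was removed.
            fully_deleted[project] = gone
    return fully_deleted, partially_deleted


# Made-up snapshots for illustration.
yesterday = {"real-project": {"1.0", "1.1"}, "typo-squat": {"0.1"}}
today = {"real-project": {"1.1"}}

full, partial = classify_deletions(yesterday, today)
print("fully deleted:", full)
print("partially deleted:", partial)
```

The two groups could then be listed separately, with the fully deleted ones getting a more prominent warning.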

It would also be nice to see some clear way to contact StableBuild regarding removal of a package from your cache. Sometimes they will be removed for security, copyright, patent infringement, or trademark reasons, or simply for being a mistaken publish. While strictly speaking you guys may be in the clear to do what you’re doing (provided you’re checking licenses, since being on PyPI doesn’t necessarily grant you the rights to also redistribute licensed content), you’ll want to be able to handle concerns from the original publishers directly. Right now, the only contact form I can find appears to go to your marketing and/or sales teams, and if one day I have to help our lawyers get in touch with you, that’s not where you want them going first :wink:


It’s hard for me to see how to really get this right. Some people withdraw because they made a mistake and don’t want the broken package floating around. Some remove because they just don’t want people using earlier versions (I’d say this is where archiving is most justified). Some get killed for legal/security reasons - takedown due to copyright/trademark etc. and malware removals. AFAICT, the PyPI terms of service give PyPI the rights to host forever (even if they don’t necessarily do so), but the uploader has not granted someone else those rights. Seems pretty murky… [Edit]: also, there have been cases where an interface package talks to a service no longer being offered, and so they yanked the pkgs.

Before we go too FUD on the PyPI ToS, here’s the relevant part (points 2&3 are sub-bullets under the first, despite Discourse’s formatting):

  • I retain all right, title, and interest in the Content (to the same extent I possessed such right, title and interest prior to uploading the Content), but by uploading I grant or warrant (as further set forth below) that the PSF is free to disseminate the Content, in the form provided to the PSF. Specifically, that means:
  • If I upload Content covered by a royalty-free license included with such Content, giving the PSF the right to copy and redistribute such Content unmodified on PyPI as I have uploaded it, with no further action required by the PSF (an “Included License”), I represent and warrant that the uploaded Content meets all the requirements necessary for free redistribution by the PSF and any mirroring facility, public or private, under the Included License.
  • If I upload Content other than under an Included License, then I grant the PSF and all other users of the web site an irrevocable, worldwide, royalty-free, nonexclusive license to reproduce, distribute, transmit, display, perform, and publish the Content, including in digital form.

So a license that allows the PSF to redistribute, which would include most OSS licenses (those allow all recipients to redistribute), is used as it stands. If no license is included, or the license does not explicitly allow the PSF to redistribute, then the ToS (attempts to[1]) overrule and make the content redistributable by the PSF and anyone who uses PyPI to get the content. Otherwise, only the PSF and “mirroring facilities” have the right.

I’m sure there’s an interesting legal argument about what constitutes a “mirroring facility” (which is undefined in this text, but could reasonably be argued should not provide content that has been removed from upstream/PyPI), and I’m also sure that StableBuild doesn’t want to have to pay lawyers to have that argument :wink: A genuine effort to allow packages to be removed from their mirror manually is very likely to avoid the entire discussion.


After a bit more searching I located the DMCA contact for StableBuild deep in their ToS, and since it’s just a support@ email I would assume that’s a perfectly good contact for non-DMCA takedown requests (nobody really wants to start by invoking DMCA). Would be nice for it not to be so buried, but it’s not totally absent.


  1. I am not a lawyer, so I’ll make no statement about how successful it is.


@steve.dower Thanks for your perspective; we did get a legal opinion regarding the mirroring of these repos (incl. PyPI) before starting StableBuild - and we believe our current usage is in accordance with the ToS.

(An interesting factoid is that PyPI very often does not remove wheels from CDN when packages are deleted either.)

@dustin, I have access to the preview PyPI Malware Reports repo now (thanks for the tip), and will monitor the progress there - I’ve reached out to some security vendors as well, to see if they’d be interested in allowing reuse of their vulnerability database for the project here.

Some people withdraw because they made a mistake and don’t want the broken package floating around. Some remove because they just don’t want people using earlier versions (I’d say this is where archiving is most justified).

Yeah, the problem is you have no idea which of those reasons applies, and the author and the users might not be aligned. What if I believe bleep 1.1.1 was a mistake and I remove it; now I might break someone else’s project downstream who was happily using that version w/o issue. My personal conviction is that once a package is in a public package repo the original author should not have full control over it anymore: fine to mark it as yanked, but not to delete it - you never know what you break for whom.


I agree. The problem would be if someone publishes a package with a license specifically granting the PSF permission to redistribute but nobody else. That’s the kind of thing I could see… oh, say a processor or GPU company doing :wink:

True, but removing the user-visible links probably constitutes “no longer distributing” for the purposes of these terms. Another under-defined concept that would require lawyers and a judge to argue (and probably hilarious analogies involving phone books or warehouse keys :smiley: ).

What if I know bleep 1.1.1 was a mistake within five minutes of publishing but now my legally-protected secrets that were accidentally included are public forever thanks to an efficient mirror? This is the kind of scenario where I suggest you want an efficient, direct and private way to deal with it, because if a company cares enough to make it a big legal fight, they will.


Slight correction: PyPI never removes any distribution file (not just wheels) from our object storage backend (not just CDN) unless we are somehow legally obligated to do so (very rare).
