Stop allowing deletion of things from PyPI?

Background

Today PyPI announced a plan to eventually require maintainers of the top 1% of projects to have 2FA enabled in order to interact with those projects, and as part of that announcement it is giving away free hardware security tokens to eligible maintainers of those projects.

A handful of people took umbrage at this new requirement, and in one case the author deleted the project (and then immediately re-registered it, though that was hardly a requirement).

This got me thinking about deletions on PyPI again, and led me to open this discussion.

Currently, PyPI allows projects to arbitrarily delete files, releases, or entire projects, and historically this was the only way to deal with a “brownbag” release. However, we now have yanking support, which offers a way to deal with a brownbag release without deletion.

Deletion can cause problems, as in the now-famous left-pad incident, where a user deleted a heavily depended-on package on npm, causing many projects to break and also making it possible for anyone to come along and register a malicious left-pad.

Of course, with PyPI we have two different sets of users: authors, who generally want the power to do anything they want with their projects, and users, who generally want authors to be as constrained as possible so there are no surprises in what they can expect.

Currently our deletion policy skews entirely towards the needs of authors and away from the needs of users, and I want to challenge that assumption.

I think it does users a disservice if a maintainer can delete their project out from under them and someone new can come along and claim that project, with no warning unless they’ve gone to great lengths to protect themselves using something like pip’s --require-hashes.
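As a rough illustration of the kind of protection mentioned above: pip’s hash-checking mode has users pin each requirement to an expected digest, e.g. a requirements line of the form `somepkg==1.0 --hash=sha256:<hex digest>`, and refuses any artifact whose digest differs, so a file served under a re-registered name would be rejected. The following is a minimal standard-library sketch of that check, not pip’s implementation; `verify_artifact` and `check_download` are hypothetical names.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if `data` hashes to the pinned SHA-256 hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest gives a constant-time comparison; plain == would also work here
    return hmac.compare_digest(actual, expected_sha256.lower())


def check_download(data: bytes, expected_sha256: str) -> bytes:
    """Refuse (raise) instead of installing an artifact that doesn't match its pin."""
    if not verify_artifact(data, expected_sha256):
        raise ValueError("hash mismatch: refusing to install artifact")
    return data
```

The catch, and the reason the post calls this "great lengths", is that every user has to record and maintain those digests themselves.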

So to that end, I propose that we end deletion of projects on PyPI, at least for projects deemed “critical”, but I’d like to push to remove it for all projects.

I think that we can have some exceptions here though, something along the lines of:

  • Projects that are < 7 days old.
  • Projects that only ever had a release versioned 0.0

This does mean that an author can’t delete everything and walk away from a project; it either has to sit on their project list or be handed off to someone else. I think that is a reasonable outcome.

That does raise the question of whether we should allow the deletion of files/releases. I’m less sure of myself here. There’s no security reason that we need to disallow deletions of versions, since if we disallow project deletion, then we know that the project is still owned by the owner or someone that they’ve handed the keys to the project over to.

There is a practical reason to disallow it, though: in the case of an accidental or malicious deletion (say someone’s account got compromised), PyPI does not offer any way to restore those files, so they are lost forever unless a PyPI admin manually fixes things (which I just did for atomicwrites, and it took about an hour to restore 35 files).

However, PyPI tends to shy away from constraining authors unless we have a particularly good reason to do so, and the case for preventing file deletion is sketchier, but I personally lean towards preventing it as well (with similar time-gated exceptions).

Overall, I’m not sure if it makes sense to allow deletion of files/releases anymore, but I’m pretty sure that we should kill the ability to delete projects.

What do folks think?

I think we should do this for all projects: disallow deleting a project once it has releases and is more than 7 days old. I wouldn’t treat 0.0 specially. For deleting files, disallowing it after 7 days seems equally reasonable IMO.

FWIW, we can still have a left-pad-like incident, where someone yanks all but a non-functional version - but that wouldn’t break any workflows with pins.

By Pradyun Gedam via Discussions on Python.org at 09Jul2022 08:45:

[…] I wouldn’t treat 0.0 specially. […]

I took Donald to mean projects which did a single release and never came back, rather than a specific revision.

Cheers,
Cameron Simpson cs@cskk.id.au

I have some concerns (there’s a lot of genuine junk on PyPI that we should retain the right to clean up in the future should we ever want to), but I’m a strong -1 on having this only for “critical” projects.

I have some deep reservations about the whole process of designating certain projects as critical, because authors have absolutely no control over what counts as critical[1]. In particular, I do not want to see a gradual increase in restrictions on packages or authors based on whether they’re critical or not.

If preventing deletion is a good thing, it should be imposed for all projects. If it’s not, then the reasons it isn’t apply equally to all projects.


  1. I can imagine a DDoS attack on an author: someone sets up a bot swarm to download one of their packages, thus locking the author out until they get 2FA set up…

I have had at least two good reasons to delete files or releases in the past:

  1. Pip & co can still pick up yanked files/releases if there’s no better match. For PyTorch we had a very concrete problem with that: the only sdist left on PyPI was a very old 0.1.2 one, and pip install torch picked that up even after yanking for platforms without wheels like 32-bit Windows (PEP 592 explicitly allows this). This gave a steady stream of bug reports. It could of course be worked around by handcrafting a dummy sdist with a newer version number that errors out quickly - but deleting seems better.
  2. Space constraints: for NumPy and SciPy we’ve put old beta/rc versions on PyPI and then cleaned them up after the discussions on PyPI size constraints. I’d find it strange if on the one hand we deem file sizes so critical that uploading very large files is disallowed (xref “What to do about GPUs? (and the built distributions that support them)”) and cumulative size for a package is capped fairly strictly, but on the other hand we allow uploading nightlies and experimental/temporary releases and then forbid deleting them.
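The yanking behaviour discussed in this thread (pins still resolving to yanked releases, and an installer falling back to a yanked sdist when nothing else matches) can be sketched as a simplified candidate-selection rule in the spirit of PEP 592: ignore yanked files whenever a non-yanked file satisfies the request, otherwise fall back to the yanked ones. This is an illustrative model, not pip’s actual resolver; the `Candidate` type, its `compatible` flag (a stand-in for wheel-tag/platform matching) and `select` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    version: tuple[int, ...]  # simplified: release segment only, e.g. (0, 1, 2)
    filename: str
    yanked: bool
    compatible: bool  # hypothetical stand-in for "usable on this platform"


def select(candidates: list[Candidate],
           pinned: Optional[tuple[int, ...]] = None) -> Optional[Candidate]:
    """Pick the newest usable file, preferring non-yanked ones."""
    usable = [c for c in candidates if c.compatible]
    if pinned is not None:
        # An exact (==) pin narrows the choice to that one version.
        usable = [c for c in usable if c.version == pinned]
    not_yanked = [c for c in usable if not c.yanked]
    # Fall back to yanked files only when nothing non-yanked satisfies the request.
    pool = not_yanked or usable
    return max(pool, key=lambda c: c.version, default=None)
```

Under these assumptions, a platform with no compatible wheel and only a yanked old sdist still gets that sdist, which matches the PyTorch reports above, while an exact pin to a yanked version keeps resolving, which is why yanking alone doesn't break pinned workflows.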

And of course there are other reasons one can think of for deleting files/releases, like accidentally including credentials in a file or a missing license which breaks GPL redistribution requirements.

This sounds reasonable. There may still be occasional cases where developers would like to remove a package, such as a package name that is a copyright infringement, but those can be handled by filing a support issue requesting removal.

Personally I see a valid point on both sides. As a user of PyPI I would appreciate it if a package owner were no longer able to pull the rug out from under my feet. Donald’s proposal to stop allowing deletion of packages and releases would help.

As a maintainer and owner of several critical packages I also sympathize with other package owners. I have been on the receiving side of abusive customers who think that they are entitled to free 24/7 support for any package on PyPI. Two of my critical packages are deprecated and obsolete since Python 3.4 and 3.6.

I’m also worried that this proposal may send the wrong signal to package maintainers and OSS developers. Some may see it as an attempt to take power away from unpaid maintainers for the benefit of big corporations that do not pay them. The author of atomicwrites was annoyed by the fact that they were forced to enable 2FA (which IMHO everybody should always use). If you take more power away from maintainers, more of them may get annoyed and find ways to sabotage their packages.

I propose implementing a different mechanism instead. It’s going to be more work for the PyPI and PyPA teams, but IMHO it’s going to improve the lives of both package maintainers and users. Instead of preventing the deletion of a project completely, PyPI should offer a way to mark a package/version as deprecated/unsupported, then, after a grace period, as pending removal, and eventually allow the owner to remove the package/version.

In the deprecated/unsupported stage, tools like pip and pip-audit would show a warning that one or more packages are deprecated. This allows users to detect deprecated packages early.

In the pending-removal stage, PyPI would no longer offer the package for download unless pip or another client indicates that it wants packages in that stage. This allows users who have ignored the warnings to unbreak their software for a short time.
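The staged lifecycle proposed above could be modelled as a small one-way state machine. A minimal sketch, assuming hypothetical names (`Stage`, `advance`, `is_served`, an `opt_in_pending` client flag) and an arbitrary 30-day grace period; none of this is an existing PyPI or pip feature.

```python
from datetime import datetime, timedelta
from enum import Enum


class Stage(Enum):
    ACTIVE = 1
    DEPRECATED = 2        # still served; tools such as pip/pip-audit would warn
    PENDING_REMOVAL = 3   # hidden unless the client explicitly opts in
    REMOVED = 4           # the owner may now remove the package/version


GRACE = timedelta(days=30)  # assumed grace period, purely illustrative

_NEXT = {
    Stage.ACTIVE: Stage.DEPRECATED,
    Stage.DEPRECATED: Stage.PENDING_REMOVAL,
    Stage.PENDING_REMOVAL: Stage.REMOVED,
}


def advance(stage: Stage, entered: datetime, now: datetime) -> Stage:
    """Move one stage forward; leaving DEPRECATED or PENDING_REMOVAL requires the grace period."""
    if stage is Stage.REMOVED:
        raise ValueError("already removed")
    if stage is not Stage.ACTIVE and now - entered < GRACE:
        raise ValueError(f"{stage.name} grace period has not elapsed")
    return _NEXT[stage]


def is_served(stage: Stage, opt_in_pending: bool = False) -> bool:
    """Would the index offer this package/version for download?"""
    if stage in (Stage.ACTIVE, Stage.DEPRECATED):
        return True
    return stage is Stage.PENDING_REMOVAL and opt_in_pending
```

Making the transitions strictly forward and time-gated is what gives downstream users the warning window the proposal is after.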

When you’re a volunteer, you provide the community with value. It’s an act of generosity. You trade your free time for responsibility, often with no money or recognition. It’s a labor of love.

In fact, this love is so great that most of our software is published under radically free licenses like BSD/MIT/Apache/etc. They essentially say “do with this as you please; just credit me and don’t sue me”. Even the more restrictive copyleft licenses like GPL allow a person who received a copy to redistribute it. So essentially, the author waives their rights to control how the software gets distributed. It’s a gift to the community.

Moreover, authors of libraries under any license who choose PyPI for distribution agree to PyPI’s Terms of Use, which state clearly that:

by uploading I grant (…) that the PSF is free to disseminate the Content, in the form provided to the PSF. Specifically, that means: (…) I grant the PSF and all other users of the website an irrevocable, worldwide, royalty-free, nonexclusive license to reproduce, distribute, transmit, display, perform, and publish the Content, including in digital form.

(Emphasis and abbreviations mine, made to increase clarity.)

In light of all this, I claim that it’s already illegal for authors to remove access to code they themselves made available on PyPI.

Sure, the author is free to stop maintaining OSS code. They are free to change licensing on future releases of code they authored on their own. But their hands are tied when it comes to code that is already out there.

In other words, I believe that no maintainer should have the autonomy to delete a package from PyPI. It’s a question of the Terms of Use and of their own chosen license, and it also betrays the labor-of-love aspect of their work by putting the entire ecosystem in peril, both through the risk of a malicious actor snatching the name and through the sheer unavailability of the dependency. To put it bluntly, I believe it’s unethical to pull the rug out from under your users.

Instead, I think package deletion should be replaced with an equivalent of GitHub’s project archiving.

It’s a bad thing to do, I agree. Are you sure it’s truly illegal, though? The license just gives anyone who receives the code the right to redistribute it, including modified copies. The PSF terms and conditions mean that the PSF has the right to keep the content. It can choose to exercise this right or not. It can choose to let PyPI users control whether this right is exercised or not. Can’t it?

Sorry for being picky, but I think we have to be careful with such statements, because package deletions have already happened and I don’t think their authors broke the law, even though we’re not happy that they did this.

I meant 0.0 specifically, but a single release would be OK too.

One thing to keep in mind is that it’s currently not possible to have a name on PyPI without uploading a package at least once (unless you’ve had it so long that it predates PyPI requiring that, which is like 7+ years now at this point). So the idea was to allow deleting placeholder packages. If we start allowing project creation without upload, then we could restrict that further.

I’m a fan of turning it on for all projects, because I think a lot of the benefits do apply equally to all projects. However I think that it’s wrong to say that all the benefits apply equally to all projects. Preventing say, pip from being deleted is a lot more beneficial than preventing say, dstufft.testpkg2 from being deleted. Limiting it to “critical” projects (which are really just the projects that are being downloaded a lot) limits the cost of preventing deletion to where the reward is the highest.

That being said, I think that we should take away project deletion across the board (sans some exemptions like new projects, etc). If someone really needs to delete a project, PyPI admins can retain the ability to do so.

Thanks. I had forgotten about the file size issue TBH, which is probably, on its own, a big enough concern to still allow deleting files. For oddball cases like PyTorch’s old release we could punt to the PyPI admins, but the long tail of such cases might turn out to be pretty long.

PyPI always has to play this balancing act between giving power to maintainers and constraining maintainers to provide a more consistent and secure ecosystem. There’s a long list of things we’ve done to remove power from maintainers that have made PyPI better in the long run:

  • Remove the ability to host files externally
  • Remove the ability to delete and re-upload files
  • Restrict what artifacts people are allowed to upload to PyPI
  • Restrict what names/versions are considered valid on PyPI

There’s more, but each of those were a place where we took some power away from the maintainer, but in doing so provided a safer, more consistent experience for all. So I’m opposed to the idea that we can’t restrict authors at all, because we have, and should continue to do so where the reward is worth the cost.

I feel pretty strongly that some form of preventing deleting the project should be implemented. If we allow deletion of releases/files still then authors can still pull their packages from PyPI if they want by just deleting all of the releases, they just can’t release the name to allow anyone to grab it again. The name is theirs until they hand it off to someone.

The other option I could see is to allow project deletion, but have deleting a name on PyPI mark the name as prohibited for reuse. I suspect that would lead to a lot more support requests for PyPI, though, as people try to delete projects to give up some name they squatted rather than do the dance of adding someone as an owner to the name they already have.
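The alternative above, where deleting a project permanently retires its name, amounts to keeping a set of tombstoned names and checking new registrations against it. A minimal sketch, assuming a hypothetical in-memory `NameRegistry`; it normalizes names with the PEP 503 rule (runs of `-`, `_` and `.` collapse to a single `-`, then lowercase) so that `Left_Pad` cannot sneak past a retired `left-pad`.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 project name normalization
    return re.sub(r"[-_.]+", "-", name).lower()


class NameRegistry:
    """Hypothetical registry where deleted names can never be re-registered."""

    def __init__(self) -> None:
        self.active: set[str] = set()
        self.retired: set[str] = set()

    def register(self, name: str) -> None:
        key = normalize(name)
        if key in self.active or key in self.retired:
            raise ValueError(f"name {name!r} is not available")
        self.active.add(key)

    def delete(self, name: str) -> None:
        key = normalize(name)
        self.active.remove(key)  # KeyError if the project doesn't exist
        self.retired.add(key)    # tombstone: blocks any future registration
```

In this model a name can only change hands by adding a new owner; deleting it retires it for good, which is where the expected extra support requests would come from.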

I think some way to mark a project as deprecated would be a good feature, something similar to GitHub’s archival. I don’t think we should tie it to deletion.

It is not illegal for authors to delete things from PyPI. The ToS gives PyPI and the users of PyPI an irrevocable license to distribute, it doesn’t create a requirement to distribute.

A separate post to keep it distinct from my replies to other people.

I think if I’m reading the room correctly, unless a groundswell of support comes out for preventing deletion of releases/files, then we can scope this proposal to just preventing deletion of projects, but authors would still be able to delete releases/files under that project.

So to spell it out:

  • Files are delete-able at any time by the author.
  • Releases are delete-able at any time by the author.
  • Projects are delete-able if they’re less than 7 days old OR they’re only a 0 version (following normalization rules, so 0.0.0 etc).
  • Give Owners a one way flag to disable deletions of releases/files on their project.
    • One problem with allowing deleting releases/files, if your account gets popped an attacker can delete all the old releases, and you have no way to restore them without a PyPI admin doing it manually. This flag gives authors the ability to protect their projects from that case.
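The rules above can be expressed as a small permission check. A minimal sketch under stated assumptions: the `Project` record and the function names are hypothetical, version handling is simplified (only plain release segments like `0`, `0.0` or `0.0.0` count as a 0 version; a real implementation would compare `packaging.version.Version` objects), and `ever_released` is assumed to keep versions whose files were later deleted, since otherwise deleting every release would re-open project deletion.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta

NEW_PROJECT_WINDOW = timedelta(days=7)
_PLAIN_RELEASE = re.compile(r"v?(\d+(?:\.\d+)*)")  # simplified, not full PEP 440


def is_zero_version(version: str) -> bool:
    """True for '0', '0.0', '0.0.0', ..., which PEP 440 treats as equal."""
    m = _PLAIN_RELEASE.fullmatch(version.strip().lower())
    return m is not None and all(int(part) == 0 for part in m.group(1).split("."))


@dataclass
class Project:
    created: datetime
    ever_released: list[str] = field(default_factory=list)  # includes deleted releases
    file_deletion_locked: bool = False  # the proposed one-way owner flag

    def lock_file_deletion(self) -> None:
        # One-way by design: there is deliberately no unlock method.
        self.file_deletion_locked = True


def can_delete_project(p: Project, now: datetime, is_admin: bool = False) -> bool:
    if is_admin:  # admins keep the power to delete anything
        return True
    if now - p.created < NEW_PROJECT_WINDOW:
        return True
    return bool(p.ever_released) and all(is_zero_version(v) for v in p.ever_released)


def can_delete_release_or_file(p: Project, is_admin: bool = False) -> bool:
    return is_admin or not p.file_deletion_locked
```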

At any time, if a project needs to actually be deleted and the rules above don’t allow it, the PyPI admins would retain the right/power to do deletions, so an author could reach out to them.

I think that this represents a reasonable tradeoff between giving authors the power and autonomy to manage their projects how they see fit and protecting the health of the overall ecosystem.

It doesn’t mean that an author can’t hurt their users, I could for instance go and delete pip’s releases/files and screw over all of pip’s users since I have permissions on pip. Pip’s users gave me (and others!) that trust, and it’s up to me to manage that trust, and if I do something harmful, I’ll lose it.

In the long tail, we can’t prevent an author from hurting their users, because I can always just release a non-functional new version and yank all the previous versions or something like that.

It doesn’t mean that an author can’t purposely hand over the keys to someone malicious, again people have put trust in that author, which includes trusting they won’t do that.

It does mean that an author can’t delete their project and just leave it up to the fates what happens to it. It requires them to explicitly hand the project off to someone else.

Right. I should have prefaced my opinion above with a disclaimer that I’m neither a lawyer nor a native speaker. In any case, let me rephrase:

It’s illegal for authors to demand PyPI remove access to code the authors made available through PyPI.

Sure, currently, PyPI allows for package maintainers to delete their packages. This isn’t in the community’s best interest as explained by me above, and arguably it effectively takes away the ability of a user to continue using code that was already licensed to them.

Therefore, PyPI should remove the ability for package maintainers to truly delete the package. Instead, something akin to GitHub’s archiving of a repository could be provided to mark packages as deprecated/unmaintained/insecure/etc. Since the published artifact is already licensed perpetually to PyPI per Terms of Use, this is fully within PyPI’s power and mandate.

One last thought. Of course, without package authors, the package index would be empty. Authors are a very important user demographic. My intent when writing here isn’t to antagonize authors, I’m one of them myself.

However, there are far more consumers than publishers to PyPI. In fact, every single PyPI package publisher is also a consumer of PyPI either through their package dependencies or through usage of packaging tools that are ultimately packaged on PyPI.

Therefore, I think it’s reasonable to expect PyPI to protect package consumers from disruption (and security incidents!) by fully exercising its Terms of Use.

To be clear, I personally agree with @ambv’s “labor of love” sentiment, but I would like to make sure we’re clear on the legal points. I am not a copyright attorney myself, and this should not be taken as legal advice.

Not quite. Rather, authors may still demand removal from PyPI, but it is PyPI’s choice whether to comply with such a demand (or, by extension, to offer a feature for authors to do so themselves). The exceptions are cases where the package contains content that the package owner and contributors have not released under a license giving PyPI the worldwide, perpetual and unlimited ability to distribute it, and that the package owner is not able (i.e. as the author) to release under such a license; or content that is otherwise illegal to host/distribute in PyPI’s jurisdiction, or otherwise enjoined by law.

So, to sum up, PyPI does appear to have the right to prevent package owners from deleting their packages and releases in most cases [1]. However, it has no obligation whatsoever to do so; that is entirely an ethical and practical cost/benefit question, not a legal one.


  1. Except for releases containing content that either the package uploader or PyPI cannot legally redistribute. It is also theoretically possible that the EU GDPR “right to be forgotten” may apply to at least some artifacts, but that could be handled via a special admin-involved process, as mentioned.

What about deleting things for legal reasons (which the prompt when deleting calls out as one of the few reasons you should delete something)?

PyPI admins can still delete things, and deletions for legal reasons should be infrequent enough that it shouldn’t be a large burden.

That’s a fair point, but if we take that position, it becomes necessary to define a method of quantifying when deletion is “sufficiently bad”. I’m not convinced that number of downloads[1] is a reasonable measure here, so unless someone can define a good measure, I don’t think we have an actionable way of limiting the prohibition on deletion to “some” projects.

But given that the consensus seems to be heading towards “preventing deletion is acceptable” the question is moot, I guess.


  1. Cumulative total over all time, number in the last week/month, average weekly count over the last year, …?

Regardless of what you choose, I want to warn PyPI/the PSF not to treat maintainers as disposable, i.e. the attitude that if a maintainer deletes stuff, we can just keep hosting their package anyway. Prioritize addressing why such folks are deleting packages, and in this case find ways to make transitions smooth, rather than taking actions like “you won’t delete your package”. This is a more sustainable approach if we want PyPI/the PSF to stay relevant and work well with the community.

I think download count is a fairly reasonable measure, the reason deleting pip would be bad is it would impact… well almost every Python developer, and the reason deleting dstufft.testpkg2 would be fine is it would impact nobody.

Download count is a pretty reasonable way to gauge impact, since you can only be impacted by something you actually download. It is possible to overcount if someone downloads something without using it, but that’s rare (and even then, whatever is doing the downloading is likely to break).

I think the hardest part about download counts is deciding where to draw the line. Obviously something that gets 0 downloads is safe to delete and something that gets 116M/mo (like pip) is not. In between, it’s a grey area: what about 5 downloads? 10? 100? 100,000?
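One way to avoid picking an absolute number is a relative cutoff like the “top 1%” used for the 2FA requirement. A minimal sketch, with a hypothetical `critical_projects` helper that takes a mapping of project name to download count for some period; how that period is chosen, and how ties at the cutoff are broken, are assumptions left open here.

```python
import math


def critical_projects(downloads: dict[str, int], fraction: float = 0.01) -> set[str]:
    """Return the names in the top `fraction` of projects by download count."""
    if not downloads:
        return set()
    # Round up so that a small index still gets at least one "critical" project.
    k = math.ceil(len(downloads) * fraction)
    # Ties at the cutoff are broken by insertion order (sorted() is stable).
    ranked = sorted(downloads, key=lambda name: downloads[name], reverse=True)
    return set(ranked[:k])
```

Note the trade-off this encodes: the line moves as the index grows, and a project’s status can change without any action by its author, which is exactly the concern raised earlier about authors having no control over what counts as critical.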

One benefit of a blanket policy is that we don’t have to make that decision, but of course the downside is that projects like dstufft.testpkg2 (which, by the way, had 6 downloads in the last month according to pypistats.org) can’t be deleted without admin intervention either, and just get left lying around as cruft.

FWIW, I don’t think anyone is treating maintainers as disposable.

Nobody was or is forced to distribute their software on PyPI, and as part of their decision to do so, they granted PyPI the legal right to distribute their software forever. As part of running PyPI we set the rules of what is or isn’t allowed.

Of course, there are strong incentives to share your software on PyPI, and it holds a unique position within our ecosystem, so we’re not free to impose whatever draconian requirements we want, the way Apple can with its App Store on iOS.

PyPI is not and will never be a wild west where maintainers are free to do absolutely anything they want. We place constraints on them, for the good of the wider ecosystem, but we also give them broad powers for the good of their specific project (which ends up translating to being good for the wider ecosystem).

Trying to find the balance between what powers we give authors, and what constraints we place on them is probably the hardest part of running PyPI, because there truly is no correct answer.

For most projects, authors have freely given their labor[1], and it’s not great to burden them with additional effort. On the other hand, by publishing your project and asking for users, you’ve also taken on some level of responsibility. Those are opposing views, and they’re both true at the same time, and this is a fundamental problem that OSS communities have to grapple with.


  1. Not all OSS projects on PyPI are run by volunteers; some have compensated maintainers. Also, PyPI has no requirement that projects be OSS to be hosted on PyPI, though obviously most software on PyPI is OSS.

I didn’t say that you are, either. I am just saying that some actions you take may send a negative signal to the community, which is why I framed it as a warning, whatever decision you make. This is one of those actions, and I have seen some folks already viewing it that way.

Anyway, this is my last contribution to this discussion. I am not a PyPI expert, I agree that rules are necessary, and I understand the tradeoff, but again, maybe first find out why folks are deleting stuff, especially now. The end.

Personally, I’m honored and humbled to have a package I maintain recognized as “Critical”, and feel with great power comes great responsibility. If the former would ever, at some point in the future, go to my head, I would want the structures in place to limit the harm I could do to others, which is one reason I release my software under FOSS licenses in the first place—to ensure it isn’t solely dependent upon me to continue, if others find it useful and are motivated to pick up the torch.

But isn’t that exactly what choosing to release your project under a FOSS license does, from the moment you publish it anywhere (PyPI or not)?

The whole purpose of a FOSS license is to grant anyone, without discrimination, the freedom to use, share and modify your software, in a way that cannot later be arbitrarily taken away should any of the (possibly many) contributors to the software decide they don’t like someone who’s using it, have a breakdown, assign copyright to someone else or a corporation, pass away, etc. It does not force anyone to share your software if they don’t want to, but neither can you force them not to, so long as they follow the terms of the FOSS license you chose.

In fact, replaceability and repairability (right to repair), the ability of users and distributors to continue to use, share and maintain the software without being solely dependent on an original author/company or current nominal maintainer, is one of the most commonly cited motivations for FOSS licenses in the first place.

No one is disputing this fact, but for better or worse, and for the sake of good synergy, it’s pointless, at least ethically, to use it as a justification to treat anyone as disposable just because they use your platform.