I want to throw out an idea for a proposal that would restrict all forms of deletions from PyPI.
I’m not entirely sure how I feel about this yet. I was already feeling somewhat sketchy about restricting only project deletions as a weird sort of half measure, and looking at what other repositories are doing has me even more worried that a half measure might end up being the worst of both worlds rather than a solid compromise.
Please take this proposal with an appropriately sized grain of salt, particularly where the numbers are concerned. I’m just pulling some numbers out of thin air for now and trying to put pen to paper on what I think a reasonable policy that restricted all kinds of deletions would be, to see how people (including myself) feel about it.
This policy and the related proposed features are largely modeled on the parts I liked from what I found in all of the other repositories, combined with the concerns and use cases people have raised in here. I’m going to lump releases and files together, as I think that there isn’t a useful distinction between them for this discussion.
The policy would then be:
- Projects may not be deleted if they have any releases/files associated with them [1].
- Releases/Files that are a PEP 440 pre-release may be deleted at any time without restriction.
- Releases/Files may be deleted within the first 72 hours of release without any other restriction [2].
- Releases/Files that are older than 72 hours, and that have been downloaded by a known installer (i.e., not mirroring tools or browsers) fewer than 1,000 [3] times in the past month, may be deleted without any other restriction [2].
- Otherwise, deletions are not available without contacting the PyPI admins [4].
- Any deletions that are allowed follow the same caveats as they do today:
  - Irrevocable
  - Projects are released back to the overall pool to be registered by anyone
  - Files, once deleted, may never be re-uploaded.
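To make the rules above a bit more concrete, here is a minimal sketch of how the checks might compose. Everything in it is hypothetical: the `ReleaseFile` fields, the function names, and the constants are made up for illustration and are not Warehouse code, and the 72 hour and 1,000 download numbers are just the placeholder values from the list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Placeholder values taken from the proposal; both are open to change.
GRACE_PERIOD = timedelta(hours=72)
DOWNLOAD_THRESHOLD = 1_000


@dataclass
class ReleaseFile:
    """Hypothetical view of a release/file; not a real Warehouse model."""
    uploaded_at: datetime
    is_prerelease: bool  # as determined by PEP 440 version parsing
    installer_downloads_last_month: int  # known installers only


def can_delete_file(f: ReleaseFile, now: datetime) -> bool:
    """Return True if the maintainer may delete without admin help."""
    if f.is_prerelease:
        return True  # pre-releases: no restriction
    if now - f.uploaded_at <= GRACE_PERIOD:
        return True  # still inside the first 72 hours
    if f.installer_downloads_last_month < DOWNLOAD_THRESHOLD:
        return True  # old, but effectively unused
    return False  # otherwise: contact the PyPI admins


def can_delete_project(files: list[ReleaseFile]) -> bool:
    """A project is deletable only once it has no releases/files left."""
    return len(files) == 0
```

Under this sketch, a project only becomes deletable after each of its files has individually passed `can_delete_file` and been removed.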
To support some of the use cases for deletion today, we would also add some form of the following features:
- The ability to “yank” at the project level, which would mark all files as yanked in the simple API, and would remove the project from the web UI [5], search results, JSON API, etc.
- The ability to apply a “notice”, at least at the project level, to provide some feedback to people finding this project in the UI or installing it [6].
- Alternatively (or maybe in addition to) we can provide a way to “archive” a project, similar to what GitHub does, that marks the project read only. Maybe this would just be rolled into the “notice” feature, or an option when you apply a notice, or maybe it would be its own distinct thing.
- The ability to register a name on PyPI, without uploading a release/file to go along with it [7].
These things I don’t think are blockers, but I think it might be good to think about them:
- Re-evaluate whether the quota system is actually giving us the results we wanted when we first put it in place. I see that tensorflow nightlies are still a massive chunk of storage, which suggests it might not be, but it’s hard to say what PyPI would look like without it, and I haven’t been involved in those requests lately, so I lack the context to say for sure [8].
- Consider whether there are any situations where we want to automatically garbage collect uploaded files. In particular, I’m wondering if it would make sense to automatically reap old development releases, likely with some stipulations, as I think having a historical record of every development release is less useful than every final release.
If we were to restrict deletions like the above (which again, is just me thinking through what it might look like if we did), I think that the above might represent a pretty reasonable balance? It would be hard to argue that this is out of line with people’s expectations, since the majority of other language repositories have something resembling the above and PyPI is one of the few that allow unrestricted deletion at all [9].
It is obviously a restriction on what maintainers are able to do today, but I think that I’ve carved out exemptions that generally match what most people would agree are the situations when it’s “safe” to delete a file, and for the rare edge cases that aren’t covered, we still have admin intervention available to us.
Just to go back through the thread and match the proposal to the concerns people had, or the situations they brought up where they felt deletion was justifiable, I see the following:
- Rules apply universally [10], and don’t attach any labels to projects which avoids PyPI making any sort of statement, real or otherwise, about the projects.
- A “bad” [11] release can still generally be removed if the problem is discovered quickly. If it isn’t discovered quickly, then we’re prioritizing artifact stability unless the problem is serious enough to warrant admin intervention.
- The cases of “cruft” or placeholder packages are still able to be removed, tempered by the fact that if people are actually using or downloading this “cruft”, then it might not actually be cruft as the author assumed as someone is obviously using it for something.
- Development/Nightly releases are still able to be removed, as their version numbers should communicate to end users that those files are not stable things to depend on [12].
- We don’t handle edge cases like PyTorch’s really old sdist, but I think that’s fine since PEP 592 should have handled that (and pip 22.0+ fixed that), and even if it didn’t, that feels like a sufficiently weird edge case that admins could handle it.
- Strikes can still be implemented by using the notice feature and/or the yank project feature. It’s still trivially able to be worked around using `==`, but as mentioned above, the current situation is also trivially routed around, and since you can unyank/unnotice, cleaning up after the strike is over is a much smoother process.
- Quotas are still possibly a problem, but that’s a service availability concern, not an API / feature concern for PyPI, so that will require the PyPI admins to talk and figure things out.
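For anyone wondering why `==` routes around a yank: as I understand the policy that PEP 592 suggests for installers, yanked files are ignored unless they are the only files matching an exact pin. Here is a rough sketch of that selection rule. The `Candidate` class and `visible_candidates` function are hypothetical names for illustration, the version comparison is a naive string match with no PEP 440 normalization, and only two request shapes are modeled (no constraint, or an exact pin).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for one file listed in the simple API."""
    version: str
    yanked: bool = False


def visible_candidates(
    candidates: list[Candidate], pinned_version: Optional[str] = None
) -> list[Candidate]:
    """Return the files an installer would consider.

    pinned_version=None models an unconstrained request;
    a string models an exact `==<version>` pin.
    """
    if pinned_version is not None:
        # Naive string match; real installers normalize versions first.
        matching = [c for c in candidates if c.version == pinned_version]
    else:
        matching = list(candidates)

    not_yanked = [c for c in matching if not c.yanked]
    if not_yanked or pinned_version is None:
        # Prefer non-yanked files; without a pin, yanked files never count.
        return not_yanked
    # Everything that matches is yanked, but the user pinned exactly,
    # so the yanked files are still installable.
    return matching
```

So a fully yanked project disappears for someone running an unconstrained install, while anyone who has already pinned an exact version keeps working.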
Anyways, that’s what I would envision implementing if we brought PyPI in line with the bulk of the other language repositories, and restricted deletions to provide better stability to the ecosystem.
I’d be interested to hear if people feel really strongly one way or the other about the above. I’m not sure how I feel about it yet. My instinct is that it would be a positive change and that it does a good job of balancing things, but I haven’t had enough time to roll it around in my head to decide whether that instinct is right.
[1] Maybe it would be useful to allow full project deletion if the project itself is less than some number of days old. For example, if the project is less than a week old, then you could delete the project and all files associated with it.
[2] Once we have reliable dependency metadata, it may be a good idea to further restrict this to say that deletions would also be prevented if anything in PyPI depends on this and isn’t satisfied by some other available version.
[3] This is pulled completely out of thin air. If we went this direction, we’d want to do some information gathering to see what various cutoffs would allow to be deleted from PyPI.
[4] We’d maybe want some specific policy on when PyPI admins would do a deletion, or at least general guidelines. Obviously, content that wasn’t legally distributed, malware, placeholder projects, etc. would likely fall under those guidelines.
[5] Obviously it would still show up in the maintainer’s UI, to allow them to un-yank the project in the future.
[6] This might be restricted to just deprecation notices? Or maybe it would be best to leave it open ended for authors to put any kind of notice they want. Or maybe we’d have some notice categories, but not fully anything goes. In any case, this isn’t meant to be a fully fleshed out design, just a rough idea.
[7] In the past we were hesitant to do this because we were worried about making it too easy to squat names, but with PEP 541 I think that we have a reasonable process for dealing with that now. There’s also an open issue (pypi/warehouse#11296) that asks for this for other reasons.
[8] Certainly, the quota system as it exists today represents a non-zero amount of effort for both the PyPI team (who have to handle quota requests) and maintainers (who have to either ask for more quota or find ways to work around the quota).
[9] In fact, the author who deleted atomicwrites and spurred this whole discussion to happen right now assumed that PyPI was similar to these other language repositories, and that deletion didn’t mean that previous artifacts would be removed.
[10] We are using download counts to decide if a file can be deleted outside of the specific outlined scenarios, which is slightly not universal. However, that trade off exists to allow “cruft” and placeholder packages to still get deleted without admin intervention.
[11] Completely broken, has credentials leaked inside of it, contains files that aren’t legal to distribute, or whatever other reason authors might have.
[12] We maybe don’t need this rule, since PEP 440 recommends that installers exclude pre-releases by default, so they may just fall under the 1000/mo download threshold naturally. On the other hand, it may be worth adding this explicitly anyway just to make it clearer. It’s also possible that the 1000/mo download threshold is a bad idea on its own, or maybe we would decide the 1000/mo threshold is a one-way switch, and once a file crosses it, it can’t be deleted.