I just want to point out a couple of other reasons why releases might be better deleted than yanked.
A critical security issue was discovered in an old release
PII, credentials, or other sensitive information was accidentally included in a release.
Given that the PyPI admins will still be able to delete such packages, and with the assumption that the quota issue will be resolved, then I think limiting deletions by owners could be acceptable.
One practical suggestion: The PyPI support templates don’t quite cover this situation. There’s an external link to report a security issue, which leads you to the PyPI security page, but this only covers:
What to do if you discover a security issue in PyPI itself
Identify a security issue (“Report project as malware”) in some other package.
We need a template that allows us to report security issues or deletion requests in our own packages, and such reports should remain confidential.
Feel free to send language to either the support tracker or to warehouse itself - but generally emailing security at pypi.org is currently the way to go for that kind of notice.
If you’d like to help build out a more interactive version of the “Report” button, especially for an owner in their project’s manage page that goes to PyPI Admins for review, that’d be great too.
Disagree, and I’ve been responsible for assigning some “critical” rated CVEs that I still wouldn’t consider bad enough to forcibly prevent people from using that version if they wanted to. Yank it (if there’s no newer version), but no need to delete it.
The files are still present and accessible. If you want to delete for this reason, you need admin intervention. All deletion will do (more than yanking) is prevent pip install package==<specific version> from working, and reduce your quota.
I also don’t think that it should have been moved. Many projects are pushing things to PyPI that should ultimately be deleted and if we are going to limit deletions on PyPI then there needs to be an alternative solution for those projects. That seems like a relevant part of this discussion.
Oh? So if I still have the “download” link (i.e. the pythonhosted.org URL), I can still download even deleted files? I didn’t know that and I wonder how many other people don’t know that. Probably good to add to the FAQ or docs.pypi.org.
I’m sure a lot of people don’t know that but I do from these threads (although I have no detailed knowledge of the internals). What it means is that regardless of deletions, quotas etc every upload to PyPI creates a permanent growth in the ongoing consumption of resources. We need to reduce unnecessary uploads in the first place which is why it is important to provide alternative channels for things that should not be stored permanently.
I don’t like any solution that relies on the organisation feature. For me, that seems both opaque (I have no idea how I’d go about requesting an org account, what criteria are required to qualify, etc.) and exclusionary (I really don’t like the fact that you can pay for an org account if that gives privileges not available to non-org users).
Furthermore, do we have any actual evidence that the projects needing quota exemptions overlap with projects that we can reasonably expect to have organisation accounts?
I think the key thing here is that limiting deletions removes an activity that projects can do themselves, and replaces it with an activity that requires admin intervention. It doesn’t matter whether the intervention is increasing a quota, or deleting a file, this proposal directly increases the amount of work the PyPI admins have to do in those cases. Given the current state of the PyPI admin backlog, proposing this at the present time seems at the very least ill-timed, and potentially even harmful to the work being done to improve the admin situation. I would suggest that this PEP gets deferred until the PyPI admin workload is no longer an issue (by which I mean that it’s clear to the community that there’s no longer a problem, not simply that someone can quote figures demonstrating that turnaround times are acceptable).
However, one thing that is not clear to me - if a project yanks a release, is that release still counted against the project’s quota? Because one potential way forward here would be to limit deletions, but allow yanking to free up quota. There’s nothing in PEP 592 or in the current version of this PEP about quotas, but if PEP 763 added a statement that yanked releases will no longer count towards a project’s space quota, that would address the quota question for me. I do appreciate that this could have the potential to be abused, and so might not be acceptable, but that would simply leave us back where we are at the moment.
You can also get the pythonhosted.org URL using web.archive.org on the PyPI #files subpage. e.g. here you can access the last meshzoo artifacts before the author went and encrypted everything.
I think I could be convinced of this, but in practice I think we have more public examples of widespread damage from deletions than from people being unable to publish a new version of a package.
But I also could easily see that being an observation/sampling bias problem (more downstreams than upstreams, downstreams tend to be louder when things break, “supply chain security” frenzy, etc.).
To the best of my knowledge, growth in PyPI’s underlying object storage is not currently a major architectural constraint. It’s not ideal that deletion doesn’t also fully prune the object storage, but it also has no impact on the actual observed “size” of PyPI from the perspective of a mirror or downstream user, since mirrors and downstreams track the index listings, not the object storage itself.
In other words: permanent growth in the object storage isn’t good, but it’s also a red herring in this context.
(I acknowledge that I could be wrong about this, but to my understanding the index size is a much bigger general concern than PyPI’s object storage size, since the former has immediate implications for mirrors, performant downloads, CDN cache busting, etc.)
PyPI’s organizations are documented and the steps for requesting a new organization are linked in the first bullet of the FAQ page. I can understand if you hadn’t seen these before (since they’re PyPI specific, not PyPA), but these aspects are documented and (IMO) not opaque.
Anybody can request an organization, and to the best of my knowledge there’s no plan to make organizations “two-tiered” in terms of features: the split between “community” and “corporate” organizations is intended primarily to give PyPI the ability to eventually set pricing on corporations that make use of the community’s resources.
I think I see it the opposite way: limiting deletions means that users perform yanks, which they’re already empowered to perform. Users should not be performing deletions, full stop, because when users perform deletions the admins frequently have to go into triage mode and undo the ecosystem breakage caused by the deletion. Admins on the other hand should do deletions for things like malware and spam, and both have processes already that are largely (entirely?) independent of user behavior, since malware uploaders/spammers aren’t typically reporting their own packages.
In other words: from my vantage point, limiting deletions will strictly reduce the amount of administrator overhead, even if it’s done before solving quotas more generally (which I am not advocating for). Fixing quotas will have a much larger impact on admin overhead, but removing the “break the ecosystem” button from users in favor of yanking (which they should always be doing instead, anyways) reduces the amount of one-off response the admins need to do.
At the moment yeah, yanked projects are counted against the quota. My understanding of the quota measurement is that it’s entirely based on index presence: if it’s listed in the index, then it’s part of the quota. If it’s not (e.g. if it’s been deleted from the index, but is still present in object storage) then it’s not part of the quota.
I would have no objection to removing yanked projects from the quota measurement, as a solution to the quota question! But re: abuse, I’m curious what the admins/maintainers of PyPI think.
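To make the quota idea concrete, here is a minimal sketch of "quota is based on index presence", with a switch for the suggestion that yanked releases stop counting. The `IndexedFile` type and `quota_usage` function are hypothetical names for illustration only; this is not how Warehouse actually computes project size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    """Hypothetical model of one file that is listed in the index."""
    filename: str
    size: int            # size in bytes
    yanked: bool = False


def quota_usage(listed_files, count_yanked=True):
    """Sum the sizes of files listed in the index.

    Files deleted from the index are simply absent from `listed_files`,
    so they never count, even if object storage still holds them.
    """
    return sum(f.size for f in listed_files if count_yanked or not f.yanked)


files = [
    IndexedFile("demo-1.0.tar.gz", 40_000_000, yanked=True),
    IndexedFile("demo-1.1.tar.gz", 45_000_000),
]

current = quota_usage(files)                       # yanked files count, as described above
proposed = quota_usage(files, count_yanked=False)  # the "yanks free up quota" idea
```

Under these assumptions, yanking `demo-1.0` would free up its 40 MB without removing it from the index, which is where the abuse question comes in.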
Ah, I see what you mean. Yeah, I take that back then: I think limiting deletions would reduce the amount of admin triage work on undoing deletions, but the net admin work would still increase because of projects that need to delete to free up quota. Sorry for the misunderstanding!
FYI I don’t plan to try and undo the split, so there’s no need to comment that you didn’t think the split was the right thing for me to do (after it was requested that a split happen).
I was going to say, it’s not a solution, but then thought, maybe it is. Yanked releases can still be installed if e.g. you have a version pin against a yanked release (kind of the whole point?), but you don’t get them by default. So if yanked releases did not count toward your project quota, but also didn’t break the ecosystem, maybe it would effectively alleviate much of the quota pain.
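As a rough illustration of that behaviour, the sketch below follows the approach PEP 592 suggests for installers: yanked files are ignored unless an exact pin selects them. It is deliberately simplified (plain version strings, no specifier parsing or ordering), and `pick_versions` is a made-up helper, not pip's resolver.

```python
def pick_versions(releases, pinned=None):
    """Return the versions an installer might consider.

    `releases` maps a version string to its yanked flag. Versions that
    were deleted are simply missing from the mapping.
    """
    if pinned is not None:
        # An exact pin may still select a yanked release, but a deleted
        # release is gone from the index and cannot be selected at all.
        return [pinned] if pinned in releases else []
    # Without a pin, yanked releases are skipped.
    return [version for version, yanked in releases.items() if not yanked]


releases = {"1.0": False, "1.1": True, "1.2": False}  # 1.1 is yanked

unpinned = pick_versions(releases)                  # yanked 1.1 is skipped
pinned_yanked = pick_versions(releases, "1.1")      # pin still resolves
pinned_deleted = pick_versions(releases, "0.9")     # deleted: nothing to install
```

The last case is the difference that matters here: a yank keeps existing pins working, while a deletion breaks them.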
(hoping this hasn’t stalled out to the point of not being a relevant comment).
I’m in favor of this proposal right now, but I do have a couple of concerns.
This proposal is very PyPI-centric right now. While PyPI is the primary way most Python users get packages with current standards (even some indirectly with distribution repackaging of sdists from PyPI), this proposal would benefit from a standardized way for other indexes to be able to advertise that they also only delete for administrative reasons. This allows tools to make different decisions on caching than they can currently make, even if they are pointed at an index that isn’t PyPI.
While not directly required by this proposal, a standardized index page that advertises all yanked versions of a package and, if the package was administratively modified, a timestamp of the most recent administrative modification would let tooling invalidate its assumptions correctly and efficiently, and so make use of the “no deletions” knowledge.
There might be a reasonable middle ground here, for indexes that want to support this, but do not have the same level of donated resources that PyPI currently has. Being able to advertise how long uploads are kept for at an index level may be a reasonable restriction for other indexes to advertise without having to commit to as much as PyPI. (for example, saying uploads are not deleted for at least 2 years from upload date might be reasonable for an index serving most recent package builds for architectures/platforms that PyPI does not support).
PyPI itself still needs an answer to the quota question. Maybe the prior point is even something PyPI would want to use at first until solving it, with something like a 5-year window. This wouldn’t provide the same level of “forever” that other ecosystems that have immutable uploads do, but it could be a start in that direction without having to commit as many resources and cover quite a bit.
This proposal is significantly more reasonable/appealing if we also make progress on append-only metadata (that can only restrict valid environment solutions, not expand them, for the security stance to remain reasonable) for allowing things like marking incompatibilities as they happen rather than needing to yank a perfectly okay piece of code that only has a (at time of upload unknown) incompatibility with another library version.
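To illustrate the "indexes advertise their deletion policy" idea from the points above, here is a small sketch of how a tool might use such an advertisement when deciding what it can assume about availability. Nothing like this is standardized today: the `deletion` and `min_retention_days` keys and the `can_assume_available` helper are assumptions invented for this example.

```python
from datetime import datetime, timedelta, timezone


def can_assume_available(policy, uploaded_at, now):
    """Decide whether a tool may assume an uploaded file still exists.

    `policy` is a hypothetical per-index advertisement, e.g.
    {"deletion": "admin-only"} or {"min_retention_days": 730}.
    """
    if policy.get("deletion") == "admin-only":
        # Only administrative deletions happen, so assume availability.
        return True
    days = policy.get("min_retention_days")
    if days is not None:
        # The index promises to keep uploads for at least this long.
        return now < uploaded_at + timedelta(days=days)
    # No advertised policy: make no assumption.
    return False


uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)

admin_only = can_assume_available({"deletion": "admin-only"}, uploaded, now)
two_years = can_assume_available({"min_retention_days": 730}, uploaded, now)
unknown = can_assume_available({}, uploaded, now)
```

A real design would also need the administrative-modification timestamp mentioned above, so that tools can invalidate what they cached when an admin does remove something.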
It’s relevant, but I think I might have to preclude it – I think I’m increasingly convinced that this PEP is not the right approach: PyPI’s deletion policy arguably falls entirely under PyPI-specific policy (similar to malware/AUP/etc.), and I’m no longer confident that it makes sense to have a PEP that defines the minutiae of PyPI’s policies like this one does.
I think the conversation around quotas makes this clear: I don’t think this PEP can assert a normative solution to the quota “problem”, since (1) it’s entirely a policy concern for PyPI, and (2) the admins (IIUC) see existing solutions as largely acceptable (those being the support system and organizations of both types). Given that, I’m inclined to withdraw this PEP and treat this as a PyPI-specific problem for future efforts.
Sorry if this is not a super satisfying response! I agree with your comments about being able to express deletion across multiple indices, but I think that kind of policy communication might be better suited for an entirely new discussion and/or PEP that focuses on letting indices communicate what they do rather than attempting to assert their policies through the standards process.
A couple of other potential index[1]-specific behaviors that will likely fall under a similar disposition include: PEP 694[2] index-specific upload mechanisms, and new-project registrations owned by the org. There’s likely other policies that PyPI will adopt in the future that may or may not make sense for other indices.
+1
Generally, it’s interoperability that needs to be standardized. Both between indexes, and between indexes and tools. That might lead to tools having different code paths for different indexes, but if they can discover what the policies are for a particular index, they might be able to manage that.
@woodruffw have you thought more about how to express index-specific policies (and behaviors?).
I was going to make exactly this point, although I may have a biased view as PEP delegate for interoperability specs. @dstufft is PEP delegate for index specs, and may have a different view on what constitutes a reasonable “Index PEP”.
IMO the upload API is potentially an interoperability spec[1], if the intention is that there’s a common API that tools like twine can use to upload to arbitrary indexes. And I think that a common upload API is a worthwhile goal to pursue.
I can sympathise with the view that PEP 763 isn’t about interoperability or common index behaviours, but is purely about PyPI policy. After all, other index implementations like devpi have very different rules on deletions, so there’s not much common ground in the first place.
Although I’m not proposing to take over the delegate role - Donald is far better qualified to be PEP delegate on that than I am ↩︎
Not concretely, but it’s something I’m interested in! I agree with @mikeshardmind’s points above about this being valuable as common metadata across multiple indices, so it’s something I’ll be thinking about as a potential future PEP.
Yeah, I think I wholly agree with this – PEP 694 has IMO an interop dimension that makes it suitable for PEPing (PEPification?) that deletion (as policy instead of communication) doesn’t!
FWIW, not specific to this PEP at all, but I think it’s wholly reasonable to use the PEP process to define policy for PyPI in some cases, and when we do that I think it should be clear that these are just for PyPI and not intended to apply to all indexes in general.
Primarily I think it’s good for things where the impact of a decision impacts more than just PyPI and where there isn’t a good objective criteria for what the right answer is. Largely it would be using the PEP process as a mechanism for discussion and final decision making on something.
Real examples of PEPs and my thoughts with regard to each one:
PEP 527 - Putting this in a PEP allowed us to collect feedback about what the impact of removing these file types would have and make sure the community was OK with us removing these, but it would be silly to have a PEP to tell any other repository what file types they’re allowed to support.
PEP 470 - Technically this wasn’t a PyPI only change, as it removed the mechanisms from PEP 438 that affected the (at the time non-standard) simple API and the installer tool flags from PEP 438. However, the bulk of this PEP is largely just whether or not PyPI was going to support linking to files that aren’t on PyPI. Other indexes are free to do whatever they want (and I believe some of them implement mirroring PyPI by linking to the files from PyPI). Those mechanisms could have been left in place and PyPI could have made this change on its own anyways.
But again, having this be a PEP allowed us to get feedback from the community, and effectively let the community decide whether this was a good pattern or not.
PEP 541 - Wholly a PyPI specific policy. This one is somewhat unique as it was not written by a PyPI admin, but who “owns” a name on PyPI is something that you can make reasonable arguments for in many different ways, and ultimately they’re effectively community resources, so the PEP process allowed the community to decide what the policy should be on name retention.
PEP 755 - This PEP is paired with another PEP that actually defined the mechanisms. It’s similar to PEP 541 in that it is wholly a PyPI specific policy. This one is one that I feel is probably the least strong case here, since it basically boils down to “namespaces on PyPI are limited to orgs and PyPI will limit the proliferation of namespaces”, neither of which I think are big enough questions to really need a PEP attached to them.
PEP 763 - (Hi there) I think this one is similar to PEP 541 as well. There’s no real “correct” answer here, it just comes down to which trade offs we as a community want to make. This again has a large impact on the wider community even though it’s “just” a PyPI decision, but again would be something that we wouldn’t expect to make decisions for other indexes on. I think it’s something that absolutely makes sense as a PEP.
Obviously every decision on PyPI can’t (and shouldn’t) run through the PEP process, for a lot of reasons:
The decision is just too small to invoke the whole PEP process for.
The decision really only affects PyPI itself and doesn’t really have any larger ramifications in the wider community.
The decision isn’t one that can be decided based on consensus, but rather other concerns (legal, operational, etc).
If we decide we’re not going to use the PEP process for PyPI only decisions that still have larger ramifications, the alternatives all seem generally worse to me? Taking the question of deletions as an example:
We can decide it in an issue on warehouse and/or back channel discussion between the PyPI admins.
A lot less discoverability that the discussion is even happening (and in the latter case, people would be unable to participate at all).
A lot harder for the PyPI admins to actually collect feedback about how the community feels about whether files should be allowed to be deleted or not [1].
No clear final decision maker, as PyPI issues generally get resolved by… whoever happens to merge the PR solving the issue (or who closed the issue), or with discussion among the PyPI admins for larger issues. This typically works well, but “controversial” changes can stall out just because there’s no clear consensus. [2]
We can create a “shadow PEP” process for PyPI that roughly tracks the PEP process (probably with some tweaks to make it a little more lightweight).
Seems silly to me to have a second not quite PEP process just for PyPI.
Still has a lot of the drawbacks to the first option, just reduced somewhat.
Something else? I’m out of ideas actually lol.
I don’t think using the PEP process like this for PyPI is that weird either, at least not any more weird than PEPs like PEP 430, PEP 588, PEP 374, PEP 385, and PEP 512. I’d argue that something like “should we allow deletions on PyPI” is going to be of greater impact to the wider community than “where should CPython host its source code”.
I will temper all of this with the statement that you should generally expect PyPI only decisions to go through a “normal” decision making process (e.g. open an issue, discussion, etc) and let the PyPI admins indicate that they think the issue is too big or impactful and should be sent to the PEP process.
For instance, in the original thread I posted, I probably wouldn’t have thought about the quota issues at all and unless one of the other PyPI admins thought about it, would have just moved full steam ahead. ↩︎
Which should be no surprise, that’s how the vast bulk of OSS projects work after all. ↩︎
To clear up one possibly confusing thing in my statement.
I think the question of “what should PyPI’s deletion policy be” is PEP-worthy.
I also think that quotas fall under the category of operational concerns where I don’t think adjustments to or removal of the quota system would be a suitable topic for a PEP.
Where PEP 763 struggles is that I think the problem of projects needing to delete files to manage their quota is a large enough issue that it would likely cause this PEP to be rejected if it got put up for pronouncement [1].
This is kind of an awkward situation, since it means that a PEP-worthy topic is blocked behind something that the PEP process can’t really control, but such is the world we live in.
If someone wanted to push this forward, they’d probably have to do one (or more) of:
Work with PyPI (on the issue tracker, etc) to come up with changes to the quota system where people don’t generally feel the need to delete things to manage their project’s size.
Introduce new features that would limit the impact of the quota system [2].
Come up with a really strong justification that outweighs the need for projects to manage their project size and get consensus around that justification.
Something else?
With my PEP-Delegate hat on, without something to solve that issue, I can’t imagine accepting a PEP that limits deletions [3], and I’m generally in favor of limiting deletions in the abstract.
Of course, the quotas don’t inherently preclude limiting deletions, but rather that a number of projects felt that they needed to use deletions to manage their quotas. If they hadn’t come forward to say that they needed deletion to manage that, then it wouldn’t have been an issue. ↩︎
Something like PEP 759 is the only idea that comes to mind for me, but that was withdrawn and has its own problems/trade offs. ↩︎
Of course that only applies as long as I’m the PEP-Delegate ↩︎