Stop Allowing deleting things from PyPI?

And I agree with you. A removed project name should not be available for immediate grab. I would even go so far as to disallow instantaneous deletion of projects and release files that are older than 7 days. If a project or release has been around for a while, chances are high that somebody else depends on it. (The threshold could be one day or three days, too.)

If I understand you correctly, then your primary concern is project deletion and the danger of a malicious person grabbing the freed name for nefarious purposes. I share the sentiment and go even one step further: I would like to give maintainers a tool that allows them to deprecate and shut down a project gracefully and at the same time prevents them from rage-quit-deleting a project without giving users fair warning.

My idea is to remove a feature that can break stuff and replace it with better tools.

4 Likes

I don’t see it as “useless”: ensuring that any single author or maintainer is, ultimately, able to be replaced and cannot arbitrarily deny others the ability to use and share FOSS software is a key part of FOSS and its spirit of preserving individual freedoms for collective benefit.

Ethically (at least from the utilitarian consequentialist perspective to which I primarily but not exclusively subscribe), it comes down to a question of balancing the benefit and harm to maintainers and users (and recall that maintainers are not, often, the sole authors of the project’s code). The optimal answer requires a value judgement, which ultimately will vary from person to person.

However, for me, the focus of FOSS’s core principles on freedoms for users (relative to the traditional position of power of software owners), the far greater number of users than maintainers, and the collective practical benefit to the community of preventing one rogue, malicious, hacked or unsound maintainer from causing tremendous potential harm (which they may not intend) to the global community outweigh the relatively small burden on maintainers who may have a legitimate reason to delete a project/release.

I already gave four reasons for wanting or needing to delete individual files or releases. Let me expand on one. Sometimes you hit the cumulative size constraint on your package. You can ask for an increase via a support ticket, but (a) it can take weeks or even months to get an answer since PyPI maintainers are overstretched, and (b) you may not get that increase. In either case, deleting very old releases or pre-releases may be your best option. So preventing deletion of files is not a good idea.

This cumulative size issue is again not made up: I have had multiple maintainers of popular packages ask me to please poke PyPI maintainers in private or on the support issue, because they were not getting a response and were blocked from doing a new release.

You are only focusing on one side of this. In the end, you’re going to have to trust maintainers to know what is best for their users to some extent. Maintainers of popular packages know reasonably well how to make these trade-offs, and won’t delete or yank new releases without a very good reason. And your argument here applies equally to yanking as to deleting.

4 Likes

I agree with this

2 Likes

I do share the concern that this change could further stress the already overtaxed PyPI admins and increase reliance on an overly limited pool of human maintainer bandwidth, and I have seen instances of this as well. Perhaps this could be addressed by adopting automated features that would allow deletion in certain circumstances instead of a hard block, as some others have already proposed? For example, marking projects or releases as deprecated for eventual deletion after some time, or (to address this case) allowing deletion of pre-releases a certain period of time after their release or that of their corresponding full release.

Yes, but it’s a balance between giving maintainers as much freedom in as many respects as practical and limiting potentially highly destructive actions that could have ecosystem-wide consequences that the maintainers themselves may not intend, which we’ve seen happen in the infamous left-pad incident, the auditwheel situation and a growing number of others.

Maintainers of popular packages may (and probably do) know what’s best 99% of the time, but without some limits, it only takes one such maintainer of one such package making one mistake or misjudgement, having one bad day, getting their account compromised/leaking their credentials, etc., to harm a wide swath of not only users, but maintainers of many other packages as well.

I’m a little confused here—if a release is yanked, as you mentioned previously, it will still be resolved if it is directly depended upon or if no non-yanked releases could satisfy the dependency, so I don’t see how this breaks dependents or users. Or is that not what you’re saying? I’m sure I’m misunderstanding something here, sorry.

I have to admit I’m kind of confused. I have a pretty much defunct project (lockfile) which PyPI notified me the other day was “critical.” When I asked about this @dstufft told me it was because of the number of downloads:

I believe it’s flagged because it’s one of the top N projects by download on PyPI, looks like it’s getting something like 10M downloads a month.

To my mind, number of downloads is kind of meaningless. What’s more important is the number of dependencies other packages have on such critical packages. @dstufft also referred me to the PyPI stats page for lockfile. I had no idea such a thing existed. (Thanks for that, Donald.) Looking through the plots, it seems systems using Python 3.8 are particularly interested in lockfile.

I find myself wondering about some of the same things as Dave Beazley, who posted a thread on Twitter this morning about his use of PyPI and how things have become much less interesting. I was especially sympathetic to his 4/n tweet:

And now with everybody all worked up about “supply chain” nonsense, it’s just further emphasizing the point that I didn’t release code to be “important”, but simply because I thought it was kind of cool (and maybe secondarily useful). 4/n

This was exactly my intention with lockfile. It seemed like something small and interesting at the time. I had no need for it (during my career I did essentially no cross-platform Python programming), but I wondered if a useful API could be created which hid the details of advisory locking on various platforms. As you can see, I haven’t done much with it in several years. I think the OpenStack folks did the 2015 release. I don’t recall ever porting it to Python 3. I was still only using Python 2 in 2015.

So, here I am with a casual project that I still “own” which somehow became critical to someone without my knowledge or involvement, left to wonder if I need to do anything, or if some critical software infrastructure somewhere will crumble because of some latent bug in the code. This isn’t anything I’d contemplated having to worry about in my dotage.

6 Likes

Personally, I find that the author’s summary dismissal of what is a serious and growing problem in the software distribution field makes it hard to empathize with the rest of what he says. It may not be the Reapers returning to wipe out the galaxy, but if this issue is ignored and dismissed, it could end up posing a severe and perhaps even existential threat to the global commons that is the modern open source ecosystem, upon which we all rely and from which we all benefit, whether hobbyists looking to show off cool code, scientists looking to share their research algorithms, maintainers motivated to benefit others, or companies building tools on and (hopefully) giving back to the FOSS community.

Of course, we all have different motivations, values and interests, which colors my take on this issue. Personally, as an academic and NASA-funded researcher who spends a large amount of both my free and paid time working on open source projects, what motivates me is not sharing cool code (I doubt I’m good enough that anything I’ve written qualifies), but rather benefiting people all around the world with code that is perpetually free, libre and open, the more the merrier. Conversely, I simply wouldn’t feel motivated to release something that wouldn’t benefit others in some respect, whereas the more a project is used and the more it is useful, the more motivated I am to work on it.

To be clear, this is not out of pure altruism, but simply because I derive personal enjoyment from seeing others benefit from my work, just like others might enjoy sharing an elegant design, a cool concept or really any other hobby.

Indeed, a great (and very compellingly written) illustration of the infamous xkcd 2347 problem in a nutshell…to which there are no easy solutions, unfortunately (though it seems you are doing what you can).

Is this really a downside? It seems like that kind of cruft will continue to be around regardless of what’s decided here, it’s around right now, and it’s… fine? I don’t think the existence of dstufft.testpkg2 has caused anyone any problems…?

Re: deleting artifacts: I feel like this is blocked right now for several valid reasons, but those reasons themselves should be considered bugs.

Is there an issue on pip for this? Picking a yanked sdist does sound pretty annoying, and it would be great if pip had just… not done the annoying thing in the first place.

Yeah, as a separate issue it is extremely unfortunate that PyPI policy currently ends up forcing large active projects to delete artifacts. Or for another related issue, I was also talking to the spaCy folks about providing more wheels recently, like musl/pypy/etc., and they said that they were avoiding providing wheels for those platforms because of concerns about their quota, which again – probably not the outcome the PyPI admins were aiming for!

Maybe this is also something that could be improved on the PyPI side? idk what that would look like b/c I don’t know what constraints are informing the current design, but e.g. maybe it would be fine for established, well-behaved projects to get granted “unlimited” quotas, with some guidance about how to avoid abusing it and the understanding that if they start causing problems the admins will get in touch to figure out a better solution?

2 Likes

For what it’s worth, I’m sympathetic with this. I have many things on my Github that I just threw together because I could, and even some things in my PyPI account, none of which were ever intended to be someone’s critical dependency.

The flip side of that is that my intent doesn’t change the fact that people may start relying on it, maybe even a lot of people, and that by putting that thing out in the world, I’m implicitly responsible to some degree for it.

What “responsible” means in this case is not clearly defined either. Part of what’s happening over these last years is a shift in feelings about what responsible actually means here. 20 years ago, it meant very little in regard to OSS, but today it’s meaning more as ecosystems mature and evolve.

Ultimately, I think that the choice of the word “critical” was perhaps a little overzealous here. In reality these are just the projects that get enough downloads that it was felt that a compromise in them would have a large impact on the ecosystem. It wasn’t intended to say that these are cornerstone projects holding the world up; it could be as simple as an art project that a lot of people just happen to download to experience.

I think a lot of folks are getting hung up on the “critical” tag and are putting more meaning to that than I believe (and I wasn’t the one who chose it, so I may be wrong) was intended to mean. If it makes folks feel better about it, we don’t even model this as a “critical” project internally in PyPI, it’s just Project.pypi_mandates_2fa: bool = True or False, and we use that to display the critical badge in PyPI’s UI.

2 Likes

The quotas exist almost entirely because a handful of projects (across multiple projects on PyPI) were consuming large portions of the total storage on PyPI, and the cumulative effect was that a full mirror of PyPI currently takes over 12TB of space.

Taking a quick look at Statistics · PyPI shows me that some variant of tensorflow accounts for 12% of PyPI’s total storage, just for its nightly releases.

I haven’t been involved in granting increased quotas to projects and I see that tf-nightly for instance has a 1TB quota, so certainly some level of that is on us since we allowed them to grow that large.

I’m not really sure what the right answer is regarding quotas either! It’s possible we should remove them, it’s possible we should let some projects just have unlimited. It’s possible we should do that, but also disallow nightly releases on PyPI. I’m not sure!

I think that whether people like it or not, there’s also an impact from corporate users where issues around compliance and governance require certain minimum levels of “good practice”. While doing things right isn’t a bad thing, corporate culture is still very much based around “supplier-consumer” models, and as such the consumer (the company using open source software) expects to demand compliance assurances from the supplier (the open source developer), who in the case of open source is typically in no way interested in providing such assurances, and probably doesn’t even have the means to do so. Add to that developers under business deadlines dealing with problems caused by open source code, and dumping the pressure from those business deadlines onto the open source developer, and the relationship no longer feels like one of “giving things away freely” but more one of “being expected to work for nothing”.

Corporations are improving how they work with open source, but that’s still the reality that a lot of open source developers working on high profile projects see day to day. Having a project designated as “critical” feels to such developers less like an accolade, and more like “here’s another one you can get dumped on for”…

All of which is simply saying that yes, views and expectations have changed over the years[1] but IMO in a way that makes it even more important now to ensure that developers still feel that they have a level of control over their project, because at the end of the day that’s really the only reward they get, the feeling that “this is my creation”. Imposing constraints and demanding actions takes away that feeling of control - and it’s especially galling when that loss of control is presented as being directly linked to the idea that the project is useful to people.

Someone pointed out that by releasing software under an open source license, you’re explicitly letting people do what they want with your code and you can’t then simply refuse to let people use your project when support becomes a problem. That’s true, but I view it the other way around - the whole point about releasing the code (the open source contract, if you like) is that users are able to do things like audit the code, fix issues, track changes, etc. And that should free the developer from the sense of being responsible for all of that. Of course they may still do some or all of it, but it’s for their own reasons, not because someone demands it of them on the basis that “no-one else can do it”.


  1. And maybe I’m just a grumpy old git yearning for the good old days :slightly_smiling_face:

2 Likes

At least from my viewpoint as a volunteer FOSS contributor and maintainer from the academic/research world who’s never worked in a corporate environment and would likely have a hard time doing so, I’ve always had a hard time understanding this assertion—particularly when it is directed my way over things like PEP 639 (not by you, to be clear). Your explanation helps me understand some of where people are coming from with that—thanks.

My perspective

To me, following, encouraging, and (occasionally, when very much necessary to avoid serious harms to other projects and the ecosystem) enforcing responsible and sustainable development, maintenance and packaging practices benefits the continued growth, health and long-term viability of the individual projects involved, the open source ecosystem and society at large.

Also, while there are some who still consider it a spare time hobby for fun, at least anecdotally I’ve noticed a significant shift in the maintainers for the more successful, community-developed, FOSS projects over that same time span (including many if not most of the core projects in the scientific stack), who have turned FOSS into careers of one sort or another and work on their projects full time.

Some are independent, while others do so as academics/researchers, are paid by open source companies, are funded by different employers or work at nonprofits and public benefit companies and organizations. However, they retain a key distinction from traditional “corporate” projects: the project is created, owned and developed by and for the community, rather than by whichever entities, if any, happen to be funding one or another of the particular developers involved. In the meantime, corporations have grown far more tolerant of using, contributing to and creating open source projects.

As such, I’m not sure the old, somewhat dichotomous paradigm of open source hackers versus the corporate world still applies (and not to say that your post is necessarily tied to it, as opposed to the general mood of some), as there’s much more of a spectrum and intermingling, which seems, on the whole (with many exceptions), to be of mutual benefit (so far…).

As much as I’m naturally averse to corporate influence, I also see the benefits of a more sustainable open source software world where a single rogue maintainer doesn’t have the power to push a big red button causing a ripple effect throughout the ecosystem, not only doing direct damage to other projects but also harming the reputation of open source maintainers everywhere and, ironically, perhaps serving as the impetus for governments, corporations and other entities external to open source to mandate “controls”, limitations and policies much more restrictive and detrimental than what we could have done ourselves to avoid this in the first place.

These are all very real issues and some I’ve grappled with myself, but I’m not sure the solution is offering a self-destruct button, which just creates an even bigger problem with far more user backlash. If someone no longer wants to maintain the project, there are other, better options than nuking it; a PyPI feature to formally “archive” projects, as has been discussed here, could help with that, and I’d rather make a happier path easier than preserve a darker one.

At least for me personally, I don’t see a few limitations on the ability of my account to unilaterally execute, purposely or not, an action with a high potential to irreparably harm my project and its users as taking away too much control.

IMO, I’d rather see a project I created used, contributed to and benefit more people around the world, and ensure it will continue to be able to do so long as it is useful to others, than have absolute control over it—that’s why I released it on a public index under a FOSS license in the first place. And for me, it doesn’t take away any of the pride I feel for having created it—in fact, it only increases it, to see my creation has grown to become something much bigger than just myself, and even more so if I’ve created something lasting and meaningful enough to outlast my own participation.

That’s what I want to leave behind—not my name engraved on a stone monument, but my work woven into the lives of others.

1 Like

I like this proposal, but I have a suggestion to prevent effective deletion of a project by deleting all releases and files:

  • The latest (non-pre-release version) release can’t be deleted if it’s more than 7 days old
  • At least an sdist or none-any wheel should remain in said release, if originally uploaded; otherwise files can still be deleted

As above, releases and files can still be deleted by PyPI admins via a support request with a valid reason.

This won’t stop bad actors, but it raises the barrier to entry by requiring a new upload of a malicious nature (e.g., uploading a distribution of an empty module with a higher version number).
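The two rules in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything PyPI implements: `ReleaseInfo`, `can_delete_release` and `can_delete_file` are hypothetical names, "latest" is assumed to mean the most recently uploaded non-pre-release, and an sdist is recognized purely by its file extension.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=7)  # the 7-day threshold from the proposal


@dataclass(frozen=True)
class ReleaseInfo:
    """Hypothetical, simplified view of one release of a project."""
    version: str
    uploaded: datetime
    is_prerelease: bool = False


def can_delete_release(release, all_releases, now):
    """Rule 1: the latest final release is protected once it is over 7 days old."""
    final_releases = [r for r in all_releases if not r.is_prerelease]
    if release.is_prerelease or not final_releases:
        return True
    # Assumption: "latest" means most recently uploaded, not highest version.
    latest = max(final_releases, key=lambda r: r.uploaded)
    if release != latest:
        return True
    return now - release.uploaded <= MAX_AGE


def is_portable(filename):
    """True for an sdist or a pure-Python (none-any) wheel, judged by name only."""
    return filename.endswith((".tar.gz", ".zip", "-none-any.whl"))


def can_delete_file(filename, release_files):
    """Rule 2: within the protected release, keep at least one portable file."""
    if not is_portable(filename):
        return True  # platform-specific wheels can always go
    others = [f for f in release_files if f != filename]
    return any(is_portable(f) for f in others)
```

Everything outside these two checks, such as deleting older releases or platform wheels, stays available to the maintainer, which is how the proposal keeps room for the quota-driven cleanups mentioned earlier in the thread.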


Having such a project setting doesn’t sit right with me; I wouldn’t want to lose the convenience. I think I would prefer if deletions require multi-factor authentication (MFA).

Irrespective of the outcome of this discussion, I’d propose that @ambv has hit on something here that probably wants a clarification: the ToU doesn’t have anything about what we might call negative cases. It says what every hosting site in the world does - by uploading, you are granting it the right to serve the uploaded content, and you can’t suddenly say “no, you can’t”. It doesn’t cover the case of unpublishing, whether that is involuntary (“oops, turns out I didn’t have the rights to distribute this code after all”) or voluntary (“I no longer want these packages here for some reason”). FWIW, it also doesn’t say anything about what happens in the case of legal demands made directly to the PyPI organization (that is, the PSF, as the legal entity), and probably should at least mention that. IANAL, but I could not work out what the term “irrevocable” applies to in this context. Other (non-legal) text like Help · PyPI does indicate that in the past unpublishing (“delete…the entire project”) was an anticipated right of the project owner - obviously this discussion might end up changing that, in which case the help text will also need updating.

3 Likes

Not too important but just fyi I think this should be coupled with a clean/purge transfer mechanism. My 3 largest projects (Hatch included) involved thinking of a name, realizing it’s taken, verifying near-zero downloads, emailing owner to ask, then scheduling a time to delete or rushing to upload if they responded “just did it”.

5 Likes

Just to weigh in too, I am also in the “maintainers can yank, only admins can delete” camp. If we want to make exceptions for deleting very new/unused projects, that seems fine, but overall I think the correct verb for maintainers is almost always “yank”. In the rare cases where a true delete is needed, I think that should go through admin vetting, not least to make sure they are aware of any bugs the delete is being used to work around (which has happened before).

2 Likes

“satisfy the dependency” does not mean things are going to work. You’ll get an older version that may not build or may be incompatible. Yes, it does not affect users who have an == requirement, but that does not make a material difference to the argument here.

IANAL, but it is typical of such ToUs; it makes clear that the author cannot later revoke the license. In other words, “no takebacks”.

Indeed, I’m facing a similar situation with one of my projects right now, replacing a project that is only a standard cookiecutter skeleton with no actual content, essentially no non-mirror downloads and only a couple test releases a year ago.

That can be true, yes, but if that does occur, the issue is really that the downstream project has not appropriately constrained its requirements to what it is actually compatible with (though that admittedly is not always easy to test), and it can be fixed user-side or package-side by doing so (manually or in metadata/requirements.txt/etc. as appropriate). And presumably, there was some good reason for the project to yank its releases, such that downstreams should take appropriate action anyway, and unlike with deletion that action is far less impactful than replacing an entire project.

I do want to re-iterate that you bring up some important real-world use cases above that should, if possible, be addressed before or during implementation of any prevention of package authors from deleting files.

The intention of yanking was that a yanked release would not be selected for use unless that version was pinned, though the PEP leaves specifics up to the installer.
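A minimal sketch of that intended selection behaviour, assuming a toy model of releases: `Release` and `select_release` are hypothetical names, versions are plain integer tuples, and no real installer is implemented this way. Since the PEP leaves specifics to the installer, actual tools may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Release:
    """Hypothetical, simplified release record."""
    version: Tuple[int, ...]  # e.g. (1, 2, 0)
    yanked: bool = False


def select_release(releases, minimum=None, pinned=None):
    """Pick a release, ignoring yanked ones unless an exact version is pinned."""
    if pinned is not None:
        # An exact pin (==) may still resolve to a yanked release.
        for release in releases:
            if release.version == pinned:
                return release
        return None

    # For range requirements, yanked releases are skipped entirely.
    candidates = [
        r for r in releases
        if not r.yanked and (minimum is None or r.version >= minimum)
    ]
    return max(candidates, key=lambda r: r.version, default=None)


releases = [Release((1, 0)), Release((1, 1)), Release((2, 0), yanked=True)]
unpinned = select_release(releases, minimum=(1, 0))  # falls back to 1.1
pinned = select_release(releases, pinned=(2, 0))     # still gets yanked 2.0
```

The unpinned case is the one debated above: the requirement is technically satisfied, but by an older release that the dependent project may or may not actually work with.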

2 Likes

FYI there were lots of cases in pip that didn’t have this behavior until this year: https://github.com/pypa/pip/pull/10625 (landed for pip 22.0).

I’m not sure if anyone has any idea what versions of pip are being used in the wild, but I would be surprised if the majority of users were on 22.0+.

I don’t have any opinion on this specific topic, just thought it worth pointing out the behavior of yank is very sensitive to both the installer and the version of the installer you are using.
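For anyone who wants to check which side of that change a given environment is on, here is a small sketch. It relies only on the statement above that the linked PR landed for pip 22.0; `pip_includes_yank_fix` is a hypothetical helper, and it compares nothing more than the leading major and minor version numbers.

```python
import re


def pip_includes_yank_fix(pip_version):
    """Return True if the version string is 22.0 or later (major.minor only)."""
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {pip_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (22, 0)
```

On Python 3.8 or newer, the installed version string can be obtained with `importlib.metadata.version("pip")` and passed to this function.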