Stop Allowing deleting things from PyPI?

Just looking at PyPI download numbers (so ignoring CPython bundled pip), pip 22.0+ is in the majority.

1 Like

That is an interesting data point, but I think it only implies that users who download a lot from PyPI are on 22.0+, not that the majority of individual users are on 22.0+.

I have worked in an enterprise environment that used something like Artifactory. We had a lot of Python teams, but the entire company would only ever download a given file from PyPI once, as it would then be cached internally.

Also, users who are not on ephemeral storage devices or randomly created VMs will be taking advantage of pip’s cache and also not downloading nearly as much.
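To make the difference between download share and user share concrete, here is a rough sketch. Every number and group name below is invented purely for illustration; none of it comes from PyPI data.

```python
from collections import defaultdict

# Hypothetical populations; all figures are made up for illustration only.
populations = [
    # CI jobs on ephemeral VMs: few "users", but every run re-downloads everything.
    {"group": "ephemeral CI", "users": 100, "downloads_per_user": 5000, "pip_major": 22},
    # Developers with a warm local pip cache.
    {"group": "cached laptops", "users": 5000, "downloads_per_user": 20, "pip_major": 21},
    # One enterprise behind a caching proxy: many users, very few PyPI hits.
    {"group": "enterprise proxy", "users": 3000, "downloads_per_user": 0.1, "pip_major": 20},
]


def shares(populations, weight):
    """Return the percentage share per pip major version under a given weighting."""
    totals = defaultdict(float)
    for p in populations:
        totals[p["pip_major"]] += weight(p)
    grand_total = sum(totals.values())
    return {major: 100 * value / grand_total for major, value in totals.items()}


# Share of downloads (what PyPI statistics can see).
download_share = shares(populations, lambda p: p["users"] * p["downloads_per_user"])
# Share of users (what we would actually like to know).
user_share = shares(populations, lambda p: p["users"])
```

With these made-up inputs, pip 22 dominates `download_share` while being a tiny slice of `user_share`, which is the gap described above.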

2 Likes

Ah great. So that suggests that the case of pip using an old yanked release when it’s the only thing that satisfies an open-ended constraint is less important to worry about, since that behavior should go away now[1] as 22.0 works its way through the ecosystem.


  1. “Now”, as in starting now, but it will likely take a year or more before it can be assumed as the baseline.
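For anyone unfamiliar with the behavior being discussed, here is a simplified model of it. This is not pip’s resolver code: `Release`, `select` and the `allow_yanked_fallback` flag are hypothetical names, and versions are plain tuples rather than real version objects. The flag stands in for the older behavior of falling back to a yanked release when nothing else matches an open-ended constraint; an exact pin can still select a yanked release either way, in line with the PEP 592 recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: tuple  # e.g. (1, 2, 0); a stand-in for real version parsing
    yanked: bool = False


def select(releases, minimum=None, pinned=None, allow_yanked_fallback=False):
    """Pick the newest acceptable release, or None if nothing is acceptable."""
    if pinned is not None:
        matches = [r for r in releases if r.version == pinned]
    else:
        matches = [r for r in releases if minimum is None or r.version >= minimum]

    # Non-yanked releases always win when any of them match.
    usable = [r for r in matches if not r.yanked]
    if usable:
        return max(usable, key=lambda r: r.version)

    # Only yanked releases match: accept them for an exact pin, or when the
    # (hypothetical) legacy fallback is switched on.
    if matches and (pinned is not None or allow_yanked_fallback):
        return max(matches, key=lambda r: r.version)
    return None
```

With `allow_yanked_fallback=False`, an open-ended constraint that only a yanked release satisfies returns `None` instead of silently installing the yanked file.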

As others have mentioned, it stands to reason (and can indeed be seen in the data) that most of the time, when pip is installed/updated from PyPI, it is just the latest version by default; i.e., this is P(pip >= 22 | update). However, this by no means implies much about the unconditional proportion of users on pip >= 22, since that depends on P(update), and many users only update infrequently, rely on very slow-moving distro repackaging, or never update it at all. In fact, even the Anaconda defaults channel is still stuck on pip 21.x.
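As a quick worked example of the conditional vs. unconditional point, here is the law of total probability in a few lines. The function name and the input values are hypothetical; nobody in this thread has measured P(update).

```python
def unconditional_share(p_new_given_update, p_update, p_new_given_no_update=0.0):
    """P(pip >= 22) via the law of total probability.

    P(new) = P(new | update) * P(update) + P(new | no update) * P(no update)
    """
    return (p_new_given_update * p_update
            + p_new_given_no_update * (1 - p_update))


# Illustrative inputs only: 95% of updates land on pip >= 22, but only 30% of
# users updated at all, and none of the rest already have pip >= 22.
share = unconditional_share(p_new_given_update=0.95, p_update=0.30)
```

Under those assumed inputs the overall share comes out well under half, even though nearly every update lands on the latest version.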

Were you referring to me when you wrote “Skip” or some other Skip-like person? I know it’s not a super common name, but it would be best if you can refer to folks using @ notation when possible.

Edit: Oh, never mind. “Skip” is a verb here… I need to do more :thinking: …

We can actually do better than looking at pip’s downloads: we can look at what versions of pip are actually downloading from PyPI.

That’s which versions of which tools downloaded any file from PyPI over the last 30 days.

5 Likes

Ah, great, thanks. I figured you’d have that data somewhere via user agent or what have you, but I couldn’t find it anywhere public, at least without messing with the BigQuery dataset.

Performing some quick and dirty analysis in pandas, we see that around 90% of downloads are performed by pip, with most of the rest by requests without a specialized user agent (I presume that’s generally various mirroring and custom tooling):

| Name | Download % |
|---|---|
| pip | 89.06 |
| requests | 8.27 |
| bandersnatch | 1.07 |
| Browser | 0.57 |
| setuptools | 0.36 |
| Bazel | 0.21 |
| Nexus | 0.11 |
| pex | 0.10 |
| Artifactory | 0.03 |

Considering only the downloads by pip, we can break it down by major version:

| Major | Download % |
|---|---|
| 22 | 35.22 |
| 21 | 33.45 |
| 20 | 23.48 |
| 9 | 3.77 |
| 19 | 2.93 |
| 18 | 0.68 |
| 1 | 0.16 |
| 10 | 0.15 |
| 8 | 0.13 |
| 6 | 0.02 |

Plotted by major version in a manager-friendly pie chart, we have

The big gap between <=19 and >=20 is somewhat surprising, at least relative to the lack of such a gap among 20.x, 21.x and 22.x: the latter have similar proportions, while the former have far fewer downloads combined than any of the latter three individually.
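As a sanity check on that comparison, the combined share of the older versions can be added up directly from the per-major figures in the table (the variable names here are just for illustration):

```python
# Per-major-version share of pip downloads, copied from the table above.
pip_share_by_major = {
    22: 35.22, 21: 33.45, 20: 23.48, 9: 3.77, 19: 2.93,
    18: 0.68, 1: 0.16, 10: 0.15, 8: 0.13, 6: 0.02,
}

# Everything at or below pip 19, combined.
legacy_total = round(
    sum(share for major, share in pip_share_by_major.items() if major <= 19), 2
)

# The smallest individual share among 20.x, 21.x and 22.x.
smallest_modern = min(
    share for major, share in pip_share_by_major.items() if major >= 20
)
```

Here `legacy_total` works out to 7.84 against a `smallest_modern` of 23.48, so all versions up to 19 together account for about a third of the smallest of the three recent major versions.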

Quick n' dirty analysis script
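import pandas as pd  # .to_markdown() also needs the optional "tabulate" package

# Expects a CSV with "name", "version" and "downloads" columns
DOWNLOADS_FILE_PATH = "~/Downloads/pip_downloads.csv"

df = pd.read_csv(DOWNLOADS_FILE_PATH)

# Percentage of all downloads attributed to each tool
by_tool_name = (df["downloads"].groupby(df["name"]).sum()
                / df["downloads"].sum() * 100)
print(by_tool_name.round(2).sort_values(ascending=False).to_markdown())

# Keep only pip's rows; copy so the new column is not added to a view
df = df.loc[df["name"] == "pip", :].copy()

# Percentage of pip downloads per major version
df["major"] = df["version"].str.split(".").str[0].astype("int64")
by_major_version = (df["downloads"].groupby(df["major"]).sum()
                    / df["downloads"].sum() * 100)
print(by_major_version.round(2).sort_values(ascending=False).to_markdown())
by_major_version.sort_index().plot(kind="bar")

2 Likes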

It looks like some of them may be Poetry? At least if I’m reading the Poetry code correctly, it doesn’t look like Poetry sets a user agent[1].


  1. Which reminds me that we should really formalize the user agent format at some point.
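Until there is a formal format, about the best a statistics pipeline can do is read the leading `name/version` product token, which is the conventional shape of an HTTP User-Agent header. The sketch below assumes that shape; `parse_user_agent` is a hypothetical helper, not how PyPI’s own tooling does it, and the sample strings in the comment are illustrative.

```python
import re

# Leading "name/version" product token, e.g. "pip/22.0.4" or "python-requests/2.27.1".
_PRODUCT = re.compile(r"^(?P<name>[^/\s]+)/(?P<version>\S+)")


def parse_user_agent(user_agent):
    """Return (tool name, version) from a User-Agent string, or ("unknown", None)."""
    match = _PRODUCT.match(user_agent.strip())
    if match is None:
        return ("unknown", None)
    return (match.group("name"), match.group("version"))
```

A client that sends only a generic HTTP library token (or nothing at all) is indistinguishable from any other script built on that library, which is presumably how unidentified tools end up in the requests bucket.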

2 Likes

Ah, of course. Are there any others besides custom/one-off tooling? As far as I understand, PDM and Flit use pip under the hood and I assume pipenv does too, Hatch doesn’t have installation functionality itself (though I thought it did), PBR is just a Setuptools plugin and Setuptools is already there (I didn’t even know it could install from PyPI itself). I’m not sure about other tooling like Enscons, Maturin, MesonPy or Scikit-Build, but AFAIK the numbers are either small or they don’t have install functionality either.

As in a dep add command, correct

Hi everyone,

As a maintainer, I think it’s clear that by using the PyPI index, I’m bound by its rules. If delete becomes forbidden, it’s not really my place, as an author, to voice a strong opinion.

I would simply suggest that these rules be clearly stated when registering on PyPI, not obfuscated in some ToS. Communication is key for a healthy relationship between authors and this service. For instance, while I was quite intrigued that one of my projects was considered critical, I would have appreciated a bit more context in the email we received. Unfortunately, I lack time for monitoring this space or your GH org, so I’m likely missing out on a lot of discussions, but as a user I hope for the relevant bits to come to me in due time.

So, by making clear that you can’t delete a project when creating it, at least we make sure people can make a decision if they want to continue or not.

The only thing I wonder is: if a project I own is no longer maintained but is still served by PyPI, how does PyPI communicate to end users that a project may be served but no longer active? Will users carry their expectations to the author or to PyPI?

It seems to me that availability would likely translate to “fresh” in users’ minds.

Could installers at least warn users when a project they install has the Development Status :: 7 - Inactive classifier set? Could PyPI make it clearer as well in that case?
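To sketch what such a warning could look like: an installer already has the project’s classifiers from its metadata, so the check itself is tiny. `inactive_warning` is a hypothetical helper and the message wording is invented; as far as I know no installer does this today.

```python
INACTIVE_CLASSIFIER = "Development Status :: 7 - Inactive"


def inactive_warning(project_name, classifiers):
    """Return a warning string if the project declares itself inactive, else None."""
    if INACTIVE_CLASSIFIER in classifiers:
        return (
            f"WARNING: {project_name} is marked '{INACTIVE_CLASSIFIER}' "
            "by its maintainers and may no longer receive fixes."
        )
    return None
```

This relies entirely on maintainers setting the classifier in a final release, so it only helps for projects that were deliberately retired rather than silently abandoned.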

PyPI and the ecosystem used to be a bit rocky for authors, but in the last couple of years (or so) they’ve become much more powerful and stable, and it’s been great, so thank you all for the work done.

Cheers,

5 Likes

We are, in this discussion, part of the ongoing effort to define what “responsible” means, what a maintainer’s rights and responsibilities are. And so I write to defend the maintainer’s right to withdraw their labor, including withdrawing the publication of their past work.

(I appreciate @ambv pointing out the language of the Terms of Use, and I thus acknowledge that currently, PyPI has license to retain and continue to distribute work that has been previously uploaded to the platform. As a platform, PyPI might still choose not to do so, perhaps under particular circumstances.)

If a maintainer wants to withdraw their labor, or threaten to do so, in order to negotiate better working conditions or pay, then one option currently available to them is to remove their past work from their own website as well as from public platforms such as GitHub and PyPI. If we disallow this option, then we remove part of a maintainer’s ability to strike and thus their leverage in negotiations with their users.

There are other reasons that maintainers might want to delete releases and projects, some of which garner more sympathy than others. The developer who removed left-pad from npm did so to protest the platform’s decision to remove his control over a different package name and give it to a for-profit company. Seth Vargo took down packages to protest their use by a government agency he disapproves of. A maintainer might be completely done with supporting a rude and over-entitled user base. A developer might want to indicate to users that the code is no longer supported, and/or force them to use a fork someone else maintains, or to use a specific index that is not PyPI.

But, to me, the biggest argument for PyPI as a platform continuing to allow project owners to delete projects and releases is that, if we don’t do that, we remove a crucial part of maintainers’ leverage in negotiating with all users for better pay and better working conditions. Right now, every project has a piece of negotiating power: if we decide that the current situation is untenable, we can take the build artifacts away, and possibly take down the source code repo itself as well, and then users will have to find some archives and fork the project, or switch to a competitor. The threat of that inconvenience is part of the power that independent maintainers retain in an environment where they are frequently exploited.

Perhaps some kind of compromise is in order, where project owners and maintainers can still delete stuff as they can now, but there’s also a structured appeals process where users can appeal to PyPI to re-publish particular releases. I recognize that this would be a huge headache all its own, in comparison to a no-exceptions prohibition or decision not to prohibit.

That’s a real and compelling problem and I’d love to better understand whether there are some other ways to address it.

7 Likes

I’m still considering your whole post, and I want to do a longer form reply to it specifically when I sort out how I feel about the content, but I just wanted to clarify that my most recently suggested idea here doesn’t remove a maintainer’s ability to withdraw their work from PyPI[1].

Here’s my most recent suggestion:

Unless I’m missing something, that doesn’t prevent authors from withdrawing their labor, it just prevents them from being able to free the name up for anyone to register unless they’ve explicitly opted into losing that power.


  1. Although the one-way-ness of our deletes makes this feel like a particularly footgun-ish way to implement that.

5 Likes

I agree with allowing more control (basically Donald’s current suggestion), though please let’s not use loaded terms. I won’t respond fully here because it would be long and philosophical in nature, but I think it’s incredibly important to use language precisely.

Even in the extreme hypothetical where I work on something for years for free under the MIT license and I have no help, only a maelstrom of bug reports and feature requests from rude folks, and some Big Corp Foo ends up monetizing it for billions, and let’s even say I’m impoverished; exactly no one in that situation is being exploited. Perhaps in a Marxian sense sure, but in reality, and how most people view that word, no.

One might claim that the users should open pull requests or that it’s morally repugnant for Big Corp Foo to not support me, and we might agree on that. However, they aren’t breaking a social contract.

People who participate in open source are volunteers and all volunteers implicitly enter a social contract. Namely, that one is providing value and the beneficiaries owe the volunteer nothing for that value.

A volunteer who takes away what they’ve already provided is breaking that contract. We wouldn’t tolerate a medical professional with Doctors Without Borders removing sutures or a builder with Habitat for Humanity removing bricks because they decided they wanted to improve their conditions or because a beneficiary suddenly came into wealth. So, some limits exist and are justified.

11 Likes

Sorry, what? Your view of open source is not mine at all. The whole debate about value and ethics in PyPI recently has left me very uncomfortable about the community’s direction.

As a developer, I don’t decide to provide value to others. I release code that people decide to use and rely on. I then choose to help users of my projects because I want to, not because of some social contract decided by some force out there.

PyPI, as a service, is more than entitled to set limits and constraints. I chose to rely on you, so I respect your decisions. However, I’m not comfortable with the philosophy behind PyPI making a decision about what I, as an author, want to do. A project of mine is marked critical, but I haven’t worked on it for years and clearly marked it deprecated (I even tried to pass the torch, but nobody took it). I respect PyPI’s decision to signal that it will protect end users from the loss of projects, but has PyPI considered the signal this sends to end users about expectations that projects are maintained because they are critical? I’m sorry, but if this stems from “we are in a social contract, so we owe beneficiaries”, then I think PyPI needs to get clearer about its goals.

4 Likes

I think in this discussion there is a need to distinguish between two different things:

  1. As a contributor to an open source project your work is released under whatever license and the license typically means that others can use the product of that work even if you later decide that you do not want them to.
  2. PyPI hosts released files for a project under a particular person’s name (or a group’s name) and that person/group should be able to decide that they do not want those files hosted on PyPI under their name any more.

If I decide to withdraw my BSD-licensed release files from PyPI then the license permits anyone else to make their own version and upload new files under a different name provided they preserve the copyright notice and respect the other terms of the license. I should still be able to withdraw the files from PyPI that are currently hosted under my name with my email address etc: nothing in the BSD license requires that the files be downloadable from PyPI under any particular name. Similarly I can delete any repos from GitHub etc without breaching any terms of the BSD license but the license permits anyone else to keep their own fork if they want to.

Technically the license does not prevent someone else from putting the files up to PyPI under the same name but there is a separate question about PyPI ensuring trust to users who install or depend on some project. Anyone who uses my project has (at least implicitly) decided that they trust me or a group that includes me to maintain that project responsibly and not sneak in malware etc. For that reason PyPI needs to not allow others to take over a particular name so it can provide trust in the continuity of the contributors behind any particular series of release files that it hosts.

3 Likes

Yes, good point. I think it is not necessary that PyPI withdraws all the files on request but I think that on request PyPI should:

  1. Dissociate all the data from the personal information of the author (like name, username, contact…)
  2. Mark such a PyPI package as orphaned/abandoned. This is meant to protect PyPI users from unexpected results and possibly to help find an adopter for the orphaned package.
  3. Remove any personal information from all the files. This request could be independent of request 1.

I am not sure what to do when points 1 or 3 conflict with the package’s license (attribution).

3 Likes

I haven’t read every post in this rather long thread, so here’s my two cents:

A while ago (during the last few years) the author of pyatom removed their package from GitHub. This broke my package for anyone trying to install it and I suddenly had to make new releases switching to a new library (there was no drop-in replacement) and for those who were stuck on an older version (for which I did NOT want to spend that effort), I had to dig out some place that still hosted the package (luckily I found a public Nexus instance of a trustworthy company that still had it).

Should this be prevented? IMHO that’s a big YES. We’re talking about a package that was released years ago, but still worked fine. It should not suddenly become inaccessible just because someone decided they are no longer interested in maintaining it.

So linking it to age (both for files and project) and/or downloads (or incoming dependencies from active projects) seems like a good approach. Let someone delete a release they made by accident… 7 days is a lot; I’d go for something like 24h max, but then again there’s very little harm if something gets deleted after a week. Beyond that? Better have a very good reason and contact PyPI staff.

Also, maybe show package owners stats about their package’s downloads etc.; setting up pypinfo is a pain due to the Google API stuff. That may make people think twice about taking down an existing, older release (which they could no longer do anyway). But if there’s a mention that well-justified cases can be sent to the PyPI maintainers, then such information may discourage someone from doing so “just because”.

3 Likes

I was curious what other, similar, package repositories did here, so I tried to investigate as many as I could think of.

These are all language specific repositories that allow just anyone to upload software to them, not anything where there is a closed set of trusted users (ala a Linux system).

I also ignored languages like Go that don’t have a central repository at all.

This is what I came up with:

| Repository | Delete Project | Delete Releases/Files |
|---|---|---|
| PyPI | :white_check_mark: | :white_check_mark: |
| crates.io | :x: | :x: |
| npm [1] | :x: | :x: |
| RubyGems [2] | :white_check_mark: | :white_check_mark: |
| Maven Central | :x: | :x: |
| Packagist [3] | :x: | :x: |
| Nuget.org [4] | :x: | :x: |
| Hex (Elixir) [5] | :x: | :x: |
| CRAN | :x: | :x: |
| CPAN [6] | :white_check_mark: | :white_check_mark: |
| LuaRocks [7] | :white_check_mark: | :white_check_mark: |

An interesting observation here is that, from what I can tell, none of the repositories implement things such that it allows deleting releases/files, but not whole projects.

Among the projects that disallow deletion, most of them have some sort of grace period or exception clauses to allow deletions in cases where the deletion target is brand new and/or is not being used, presumably to balance between being able to remove brown bag releases or releases with leaked credentials as well as “cruft” that builds up over time and the benefits of having the registry effectively be an append-only data structure.

Another interesting observation is that one of the few projects that allow deletion is RubyGems, which originally did not allow deletion, and just had “yank” support, which functioned similarly to our yank. However, in 2016 they switched their yank to not act like our yank, but instead act like our delete.

Finally, it appears the majority of repositories do not allow deletion, and the ones that do are all the much older ones, which generally come from a time when automated package management was less of a concern.


  1. npm models project-wide deletion as happening implicitly when you delete all of the files. It does allow you to delete a file if it’s < 72h old and nothing in npm depends on it, or > 72h old if nothing depends on it, it has < 300 downloads in the last week, and it has a single owner/maintainer.

  2. RubyGems calls this “yanking”, and it used to work like yanking does on PyPI, but they switched to making it a delete option in 2016.

  3. Deletion is possible if there have been sufficiently few downloads of the package (someone said around 50 to 100).

  4. Nuget supports a feature like our yank, and interestingly they allow it at the project level as well, which removes the project from the public UI in addition to yanking all the files.

  5. Hex does not allow deleting projects or packages unless it’s been < 60 minutes since a new release, or < 24h since the initial release. It supports a yank-like feature.

  6. CPAN has “BACKPAN”, which is a mirror of CPAN that does not delete any files ever, from what I can tell.

  7. I was corrected in Stop Allowing deleting things from PyPI? - #60 by layday that LuaRocks allows deletion.
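For comparison, the npm and Hex grace-period rules from footnotes 1 and 5 can be written down as small predicates. This is only a sketch of the rules as summarized in those footnotes, not either registry’s actual implementation; the function and parameter names are hypothetical, and the footnote does not say what happens at exactly 72h, so that boundary is an assumption.

```python
from datetime import timedelta


def npm_allows_unpublish(age, dependents, weekly_downloads, owners):
    """npm rule as summarized in footnote 1."""
    if dependents > 0:
        return False  # anything depended upon can never be removed
    if age < timedelta(hours=72):
        return True  # brand new and unused
    # Older than the grace period (exactly 72h is treated as "older" here).
    return weekly_downloads < 300 and owners == 1


def hex_allows_delete(time_since_release, time_since_initial_release):
    """Hex rule as summarized in footnote 5."""
    return (
        time_since_release < timedelta(minutes=60)
        or time_since_initial_release < timedelta(hours=24)
    )
```

Both rules share the same shape: deletion is unrestricted only while a release is too new for anyone to have come to rely on it.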

13 Likes

You can delete both projects and releases from the LuaRocks website.

2 Likes