Stop Allowing deleting things from PyPI?

dalcinl · July 18, 2022, 7:21pm

I’m one of those “blessed” with that “Congratulation!” email notifying me that my project had been declared “critical”. I got really pissed off, but it is not my style to go public, rant about things, or expose my opinions to the world (to the point that I’m not a member of ANY social media platform so far). I got so pissed off that I was considering silently stopping pushing new releases of my project to PyPI, and asking my users to grab tarballs from GitHub instead (with all the annoyance such an approach would mean for end users).

But now that I see that two folks I utterly respect, S. Montanaro and D. Beazley, are publicly vocal about issues with these moves, I no longer feel like a freak, and I’ll break my usual self-imposed silence. On top of what Skip and Dave said, I would just add the following concerns:

I’m worried that after my project was designated as “critical”, my PyPI account may just become a sweet spot for hacking and takeover attempts.
If my PyPI account ever gets hacked, and my project is used for a supply-chain attack, my reputation is at risk. No matter how much extra security PyPI implements, I may still fall victim to phishing or a social engineering attack. People that do not know me may genuinely suspect that I was involved in the compromise. Fingers will point at me. I may be accused of negligence. I may even get secretly investigated. Is being a PyPI maintainer worth the newly added risk?
TFA and its extra annoyance for maintainers is being enforced for the most downloaded projects. What a beautiful prize I just won for the success of my project after 20 years of working on it for free! If the extra annoyance were instead imposed on absolutely everyone, then the pill would be much easier to swallow. In line with what D. Beazley said, I don’t publish my code as open source out of “love” or “generosity”; that’s not my motivation at all. I don’t want to get ideological, but if PyPI chooses to penalize the successful for the collective benefit of the community while giving nothing in return, then people may silently vote against these policies the usual way: with their feet, by leaving.

Finally, related to the discussion about project deprecation and deletions and maintainer accountability, perhaps an additional measure to consider would be project orphaning, that is, allowing sole owners to leave their own project and be done with it. Then PyPI could automatically manage the deprecation/pending-removal/removal policies. I don’t want to harm my users by deleting my project and I would never do that, but I absolutely don’t want to be held accountable in any way for end-user security, much less for supply-chain security at companies (I don’t charge you for my software, so at least take responsibility of implementing your own supply chain via auditing and whitelisting projects and releases).

tiran · July 18, 2022, 8:55pm

The “critical” flag is only visible to project owners. There is neither an indication on a project’s page nor on your public account page. You can easily verify this: log out of PyPI and check your projects’ and user page.

A malicious entity can get stats by other means. PyPI publishes its download data as a public BigQuery dataset. Sites like https://pypistats.org/ use the data from BigQuery to provide stats.
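As a rough illustration of how public this information is, here is a minimal sketch that builds a download-count query for that dataset. The table name `bigquery-public-data.pypi.file_downloads` and the `file.project` and `timestamp` columns follow the Python Packaging User Guide; the helper name `recent_downloads_query` and the 30-day default window are assumptions made for this example. The code only builds the SQL text, it does not run it.

```python
import re

# Public dataset described in the Python Packaging User Guide.
PYPI_DOWNLOADS_TABLE = "bigquery-public-data.pypi.file_downloads"

# Conservative pattern for project names, so the name is safe to embed in SQL.
_PROJECT_NAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def recent_downloads_query(project: str, days: int = 30) -> str:
    """Build (but do not run) a query counting recent downloads of a project.

    Hypothetical helper for illustration; not part of any PyPI tooling.
    """
    if not _PROJECT_NAME.fullmatch(project):
        raise ValueError(f"not a valid project name: {project!r}")
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT COUNT(*) AS num_downloads\n"
        f"FROM `{PYPI_DOWNLOADS_TABLE}`\n"
        f"WHERE file.project = '{project}'\n"
        f"  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)"
    )


print(recent_downloads_query("codecov"))
```

Anyone with a Google Cloud account could paste the resulting text into the BigQuery console, which is the point being made here: the download ranking is not something PyPI keeps private.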

That reminded me that my PyPI ticket “Document PyPI security policy in FAQ and security page” (Issue #7970, pypi/warehouse on GitHub) is still open. In my opinion PyPI should make it more obvious that it is neither the responsibility of package uploaders nor PyPI to guarantee that software on PyPI a) works, and b) is secure.

dstufft · July 18, 2022, 10:52pm

This statement confuses me somewhat. The information PyPI uses to mark a project as “critical” is wholly public information; anyone could look up the same download information that we did, and there are even guides for doing so. An attacker likely doesn’t need PyPI to tell it that projects that are downloaded often are higher value targets than projects that are downloaded less often, since that is a pretty common-sense conclusion.

So, I’m missing how this change makes your target a “sweet spot”, since your account is a sweet spot because of how many people have downloaded your project, not because PyPI has some internal label that says it’s important.

I’m also confused by this.

It seems to be saying that if your account had been compromised 6 months ago, before the 2FA requirement program was implemented, your reputation wouldn’t have been at risk. I don’t see how that is the case though?

From what I can see, all of the things that you’re worried about happening in case of an account compromise, are things that happened because your account got compromised, not because PyPI is making it harder for an attacker to compromise your account. None of these risks are new, in fact this change requires you to make it harder for these kinds of attacks to happen, to reduce the risk of these things ^[1].

I agree that 2FA is an additional step that you must take prior to using PyPI and another “thing” to keep track of, and that can be an annoyance if that isn’t part of your workflow already.

What I don’t understand is how having it applied to everyone affects you? In the two hypotheticals, it’s limited to the top 1% like it is now, or it’s applied across the board, you’re still required to do the exact same amount of work.

What does change is that the more people we have onboarded onto 2FA on PyPI, the greater the amount of work the PyPI team will have to do to cope with account recovery issues. This is particularly a problem in the “long tail” of PyPI projects/users, who may not actually use PyPI that often at all and so are more likely to have lost access to their account, whereas major projects tend to publish releases more often and so are more likely to keep track of being able to log in.

So, while I agree that absent anything else, that you can make an argument that it’s “fairer” if everyone has mandatory 2FA applied to them ^[2], the flip side of this is that this argument is basically saying that you believe the PyPI developers (who also provide this service for free, and are mostly volunteers) should have to do extra work with their free time, without it actually changing how much extra effort the 2FA requirement would be for you, solely just to make you feel better about it.

I think some mechanism around abandonment could be a useful feature, I called it out in the (4) bullet point in my post 2 posts up from yours. I don’t know if we’ll add it, or what exactly it would look like, but I do think it could be a useful feature.

Another way of looking at this whole thing, is that the PyPI admins don’t want to harm our users, which include you, but also include people downloading software from us. We don’t charge you for this service, so at least take responsibility of implementing your own account security via strong passwords and two factor auth.

^[1] As an aside, if you’re using a security key like the ones that are being offered for free, you can’t be phished because those security keys are phishing resistant. That’s one of the ways they’re better than the application-based authenticators.
^[2] And we may get there! We need to see how this rollout goes.

sumanah · July 29, 2022, 4:07pm

Thanks for the in-depth response; I appreciate your perspective and it sounds like we (speaking broadly, not just within Python) need to have a larger discussion to develop more effective tactics for maintainers who wish to strike.

(I’m not necessarily opposed to workers’ strikes that primarily affect people with the least power to deal with the disruptions, such as railway workers’ or teachers’ strikes.)

On the fanfiction site Archive Of Our Own, this is known as orphaning a work.

Orphaning is an alternative to deleting a work which you no longer want associated with your account.

Orphaning will permanently eliminate all your identifying data from the selected work(s). Data is eliminated from the work(s) themselves, and also their chapters, associated series, and any feedback replies you may have left on them, transferring it to the Archive’s specially created orphan_account. Please note that this is permanent and irreversible—you are giving up control over the work, including the ability to edit or delete it, and you are unable to reclaim it.

Orphaning is a way to remove some or all of your works from your account without taking them away from fandom. We hope you’ll use the orphan_account to allow your works to remain in the Archive even if you no longer wish to be associated with them, or have them connected to your account. Orphaned works will be maintained by the Archive to be enjoyed by future fans; existing bookmarks and links will not break.

ofek · July 30, 2022, 1:05am

I think this strays far too much away from the telos of PyPI, and therefore also our purpose as responsible stewards of it.

CAM-Gerlach · July 30, 2022, 5:36am

Just to note, given:

The degree of centralized power a single or small number of maintainers already have and further would gain over their projects and the larger ecosystem
The fact that many projects (including some of the most widely depended upon ones, such as botocore) are controlled directly by large corporations, while in many others the maintainers are merely the top managers of a much larger group of volunteers who do the majority of day to day labor
The action involved is not withdrawal of continuing labor (as both maintainer and contributors could easily do by simply withholding further work on the package) but rather a sudden denial of access to the published, FOSS-licensed final product

…It seems at least arguable that this action may more closely resemble, at least in many cases, a lockout by management than a mass strike by laborers.

Also, privileging the relatively small number of maintainers (including myself, as a volunteer maintainer of a PyPI top-200 FOSS library), and in many cases corporations, whose packages happen to be widely depended upon, and who thus already automatically hold disproportionate power and influence, at the expense of a much greater number of workers who are much less empowered (including many other volunteer developers who may contribute their labor to those projects, and the volunteer maintainers whose own FOSS libraries and applications happen to depend upon such packages) may not be viewed as particularly equitable or just.

Furthermore, implementing functionality explicitly with the purpose of enabling “strike” actions by maintainers requires either community consensus that this is desirable (which from this discussion appears rather unlikely given the diverse range of views on this topic), or a unilateral decision by the relatively small number of people with management authority over PyPI to disproportionately empower a relatively selective echelon of top maintainers of highly depended-upon projects.

Finally, it seems at least plausible that explicitly doing so could open up maintainers, the PyPI admins and perhaps the PSF and others to the specter of legal liability in the event a maintainer taking such action had damaging effects on a suitably well-resourced and aggressive corporation, and I’m sure I need not explain how disadvantaged those without wealth or corporate backing are in the legal system, even when the claims against them are without merit.

sumanah · August 1, 2022, 2:34pm

Sorry that I was unclear here – I did not mean to say that PyPI or its stewards would in any way be involved in that discussion. I meant to indicate, and should have explicitly said, that I would be taking these thoughts to a completely separate conversation (not in this thread and not on discuss.python.org) among open source workers. Apologies!

mwichmann · August 1, 2022, 5:01pm

The liability situation should be well covered by almost any competent OSS license, which disclaims any and all warranties and liabilities, right?

nanjekyejoannah · August 2, 2022, 10:17am

This is a discussion worth having. I have no ideas on practicality, but it’s humane, that is, if the PyPI authority doesn’t think they are always right.

None of us want folks to think we need an alternative to PyPI, but I can see how the tone of some views here, especially from respected folks, may prompt this.

CAM-Gerlach · August 5, 2022, 6:14pm

In theory, but the reality can be more complicated, at least in the US legal system (I can’t speak for other jurisdictions), and if a liability waiver were a perfect defense against legal action, lawsuits (or threats thereof) wouldn’t be as prevalent as they are here. Furthermore, I was referring primarily to PyPI’s liability, which is governed by the PyPI ToS rather than individual project licenses; the ToS doesn’t contain an explicit warranty/liability disclaimer, and even if it did, courts have not always held ToSes to be enforceable.

It is a rather secondary concern relative to the others, and could potentially be resolved after consulting a licensed attorney (which I am not) for legal advice (which this isn’t), but does offer some potential for legal hazard. But of course, it seems to be a moot point in any case since it seems that wasn’t what was being asked for.

tjreedy · August 6, 2022, 12:34am

Two quick points.

I don’t think that the economics of scarcity applies, or should be applied, to digital copies, which are about as free as, or more so than, breathable air and drinkable water.

Ownership of copies on owned hardware is still a thing. If I upload a file to PyPI, is the copy on PyPI hardware a gift (subject to its license) or a callable loan, possibly with automatic conversion to a gift? This is the nub of the discussion. I presume that copies on user hardware can be treated as gifts, subject to license.

CAM-Gerlach · August 6, 2022, 12:39am

An interesting perspective, for sure…

At least per the PyPI ToS and also any FOSS license, it is as much a gift on PyPI (or any mirror) as it would be on a user’s machine. I assume the only question is if we can find some overriding justification for treating it otherwise.

rbtcollins · August 6, 2022, 4:09pm

Sorry if I missed it in this large thread, but I think a key right to preserve is the ability to walk away from something. If we remove project deletion, which is just one way to prevent name reuse, we should permit orphanage or something similar.

I agree with the security risks of name reuse and think we should solve that

An alternative to deletion blocking is just rejection of the use of a previous name.

Rob

dstufft · August 6, 2022, 7:26pm

Yea, I raised that earlier, which has a lot more context, but I roughly collapsed down to 4 specific ideas:

Donald Stufft:

Off the top of my head, I see a few possible options:

Do nothing, as there are at least three possible work arounds:

Users on PyPI can change their name, email, avatar, everything except for their actual username, allowing them to replace it with some form of anonymization.

They could create a new user account with an anonymous username, transfer all of their projects to that, and remove their “real” account from those projects, which would then allow them to delete themselves.

From an end user point of view, project level yank is pretty close to deletion since it removes the project from all user facing content, and the simple API is basically just a list of files with some related metadata (hashes, python-requires, etc). That doesn’t let the person get rid of their PyPI account, but if they just want to distance themselves from a project, it’s pretty close.

Add the ability to change your username on PyPI, but otherwise do nothing [3].

Allow users to hide their association with a project in the public UI/API [4].

Enable some way for a user to “abandon” a project, removing it from their account but not deleting it [5][6].

Jacob-Stevens-Haas · August 8, 2022, 4:16pm

Generally, it feels there’s a Pareto-like threshold of how many downloads vs how much content a project would need to be considered a non-squat.

In particular, I’m wondering how many namespace squats match “zero releases/files” and how many have a placeholder file. The latter seems like it would be a common type of “hello pypi/twine” mistake for people who are unaware of the rules about invalid projects, but also might be common for abandoned/indefinitely-sidetracked projects.

I had a pleasant experience today reaching out to a package owner whose company had squatted for a year on a common, single-word package name. They were happy to delete the project to free up the name for me. I know there’s a PEP 541 process for changing ownership, but maybe the following bullet is a nice addition to facilitate the cordial resolution of namespace squats without involving PyPA maintainers:

Projects that are older than 72 hours, that have been downloaded by a known installer (e.g., not mirroring tools or browsers) less than 100 times across all releases and project life, may be deleted without any other restriction.

I don’t know how accurate the “known mirroring tools/scrapers” list is, of course.

dstufft · August 8, 2022, 6:20pm

So, the above rules in my post allow deleting files if they match specific criteria, and if you’re able to delete all of the files, then you’re obviously able to delete the project itself. Those hypothetical rules are:

Donald Stufft:

Projects may not be deleted if they have any releases/files associated with them [1].

Releases/Files that are a PEP 440 pre-release may be deleted at any time without restriction.

Releases/Files may be deleted within the first 72 hours of release without any other restriction [2].

Releases/Files that are older than 72 hours, that have been downloaded by a known installer (e.g., not mirroring tools or browsers) less than 1,000 [3] times in the past month, may be deleted without any other restriction [2:1].

Otherwise, deletions are not available without contacting the PyPI admins [4].

Any deletions that are allowed, follow the same caveats as they do today:

Irrevocable

Projects are released back to the overall pool to be registered by anyone

Files, once deleted, may never be re-uploaded.

So, I think the “current” suggested rules are actually more lenient than what you’ve suggested, but maybe the wording is confusing. Internally in PyPI we can expose actions that let you roll up multiple deletes into a single action, it’s just broken apart in the proposal above to make the lines clearer.
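To make those lines concrete, here is a minimal sketch of the hypothetical rules as a pair of checks. Nothing here is PyPI code: the `ReleaseFile` class, its field names, and the two functions are assumptions made up for this illustration, and the 72-hour and 1,000-download numbers are simply the ones quoted in the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=72)   # "within the first 72 hours of release"
DOWNLOAD_THRESHOLD = 1_000           # installer downloads in the past month


@dataclass
class ReleaseFile:
    """Hypothetical record for one uploaded file."""
    is_prerelease: bool                  # True for a PEP 440 pre-release
    uploaded_at: datetime
    installer_downloads_last_month: int  # mirrors and browsers excluded


def may_delete_file(f: ReleaseFile, now: datetime) -> bool:
    """Apply the proposed per-file rules, most permissive first."""
    if f.is_prerelease:
        return True                      # pre-releases: no restriction
    if now - f.uploaded_at <= GRACE_PERIOD:
        return True                      # still inside the 72-hour window
    # Older files: only if rarely installed; otherwise contact the admins.
    return f.installer_downloads_last_month < DOWNLOAD_THRESHOLD


def may_delete_project(files: list, now: datetime) -> bool:
    """A project is deletable once every one of its files is deletable."""
    return all(may_delete_file(f, now) for f in files)


now = datetime(2022, 8, 8, tzinfo=timezone.utc)
placeholder = ReleaseFile(False, now - timedelta(days=365), 12)
popular = ReleaseFile(False, now - timedelta(days=365), 50_000)
print(may_delete_project([placeholder], now))           # True
print(may_delete_project([placeholder, popular], now))  # False
```

Under this reading, a year-old squat with a placeholder file and a handful of installer downloads is already deletable by its owner, which is why the proposed rules come out more lenient than a 100-downloads-over-project-life bar.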

mwichmann · April 14, 2023, 7:22pm

So we just had a removal that broke some things rather impressively. The situation is described here, including an added postmortem that admits “we screwed up” (of course it took some prospecting to find this, since there is NO entry left on PyPI, that could point you to web pages, provide a description, etc. Even download stats are gone since the package was nuked):

People who were using this, like one of my projects, had a high probability of having the usage embedded in a CI setup - provisioning a test image does a pip install codecov and so it’s unlikely you see any kind of deprecation notice, even if one was somehow embedded in the package, as long as the coverage build kept running. I’m not interested in blaming Sentry - it’s done, that wouldn’t help anything.

Do we have guidance on how to politely withdraw a package, if you actually concluded you must? (I followed this thread at the time, though now I’m too lazy to reread it in full). The only thing that came quickly to hand in a search was this, which is obviously just one developer’s suggestions:

uranusjr · April 14, 2023, 9:34pm

Yanking (PEP 592) would be a better option in most situations. (It was only implemented by PyPI in late April 2020 so it’s likely the author of your linked post wasn’t aware of it at the time of writing.) If you yank all versions of your project, all new install requests would fail, unless they specify the exact version they want to install. With an appropriate yank message attached, this should practically signal your intention that people avoid using the package, while still providing a relatively low effort opt-out for transition (by adding a codecov==2.1.12 line to the build process). Since yanking does not free up the package name, it would also stop the follow-up issue of a potential malicious third-party taking control over the name.
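The behaviour described above can be sketched as a toy version picker. This is a simplified model of the PEP 592 recommendation (skip yanked releases unless the request pins an exact version), not how pip is actually implemented; `pick_release`, `version_key`, and the 2.1.11 entry are made up for the example, and only plain dotted numeric versions are handled.

```python
from typing import Dict, Optional


def version_key(version: str) -> tuple:
    # Simplification: only plain dotted numeric versions such as "2.1.12".
    return tuple(int(part) for part in version.split("."))


def pick_release(releases: Dict[str, bool],
                 pinned: Optional[str] = None) -> Optional[str]:
    """Pick a version to install; `releases` maps version -> is_yanked.

    Hypothetical helper modelling the PEP 592 recommendation.
    """
    if pinned is not None:
        # An exact pin (name==version) may still resolve to a yanked release.
        return pinned if pinned in releases else None
    # Without a pin, yanked releases are ignored entirely.
    candidates = [v for v, yanked in releases.items() if not yanked]
    return max(candidates, key=version_key) if candidates else None


# Every release yanked (2.1.11 is an illustrative entry only).
all_yanked = {"2.1.11": True, "2.1.12": True}
print(pick_release(all_yanked))                   # None: unpinned install fails
print(pick_release(all_yanked, pinned="2.1.12"))  # 2.1.12: exact pin still works
```

Contrast this with outright deletion, where the `releases` mapping would be empty and even the pinned request has nothing to resolve to.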

By the way, this post from the Codecov Discourse uses the term “yank”, but I’m assuming (based on the observable side effects and wordings from first-party messages) you used the more destructive removal mechanism instead of yanking? Or did you actually use yanking and it still resulted in more than desired disruption?

dustin · April 14, 2023, 9:48pm

Correct, the entire project was removed, nothing was “yanked” in the PEP 592 sense.

(BTW, I don’t think OP was the person that removed the codecov project, just someone that was affected by its removal)

mwichmann · April 14, 2023, 10:42pm

Indeed, affected. I’m bemused by this bit:

Our intent was to remove a deprecated, rarely-used package from active support.
As this package was unmaintained and limited in use, there are no plans for the codecov package in PyPi to be made functional again.

The codecov package wasn’t a “rarely-used package” or “limited in use”; it was downloaded 1.3 million times according to https://pepy.tech/project/Codecov , which places it in the top 1% (maybe top 0.5%) and which I expect is enough for PyPI to designate it as a “critical project” requiring 2FA for maintainers.

Like others in the Discourse @dustin pointed to, we were not using this directly in a GitHub action (apparently the assumption is it was only used in one way); we were using it in AppVeyor (launched through GitHub). Never saw anything suggesting this was going away. As I said before, I don’t want to gripe about this specific event, despite being irritated by it. This thread started out proposing that something (later modified to “something heavily used”) shouldn’t be able to be completely removed; obviously we’re not there.

Clearly this one was handled badly. So the question again is: should the owner of a package decide they want to remove it, for reasons that are sure to seem very compelling to them, do we have guidance (and a way to get that guidance in front of people) on how to proceed in going away in a manner that isn’t quite as disruptive?