When a malicious package is identified, I strongly believe we have a moral obligation to notify people that they have been hacked. I work in the industry, and cleaning up from a breach is never fun - but one thing is universal: the longer the attacker has access, the more expensive it is to clean up. Additionally, failing to notify people that they have been breached may even be against California and EU law.
Malware distributed by pypi needs to be marked as malicious, and pip needs to notify users that they have been hacked. One option here is that pip install --upgrade should throw an error, and there should be a way to remove the malware package from the system.
Without notification, the criminals who are profiting from pypi’s lack of security will continue to do so, because there is no evidence to bring them to justice. I would like to see pypi work more closely with law enforcement to share details about these attacks: “if you see something, say something.”
I’d also like to see the stats of how many people are hacked each month. I think that hiding this information is bad optics. This lack of transparency is ultimately bad for the community, because the squeaky wheel gets the oil. Hiding how many people are hacked allows some people in the community to dismiss pypi’s continued problems with distributing malicious code. This is a kind of “wolf in sheep’s clothing,” and I have seen very talented senior engineers get hacked because of a typo.
We provide download statistics for all packages, but inferring “how many people get hacked” from that is challenging, because “one download” doesn’t mean “one install” (PyPI has a large number of static mirrors) and “one install” doesn’t necessarily mean “one hack”.
I’d agree with @pf_moore (in the linked issue) that silently removing something isn’t a great way to go. Alert? By all means - though I’m not sure what form that should take. Remove? Maybe… there’s rather too much of people doing things to other people’s computers for their own good. But removing silently is not only bad form - the owner of a computing device has a right to know what has been done to it - but also risks not being enough: a well-crafted piece of malware, once activated, could well replicate itself into another form that won’t be eliminated by pip uninstalling the original package. So “silent” is really bad here.
Are you implying that PyPI has a moral obligation to retain a full list of every package you have downloaded? That sounds like a bit of a privacy issue to me.
OTOH if you just mean that there needs to be a list of “most recently removed malicious packages”, that’s something that seems a lot more reasonable. It isn’t going to notify you though - you would still be responsible for monitoring the list to see if there are any problems.
Do you mean the number of people who download those packages, or do you need them to be more thoroughly tracked in order to find out if any compromise happened? Again, this is definitely a level of tracking that I would not want to see happen.
To add to this, I agree that PyPI should provide an API to report what malicious packages have been removed, and installer tools (pip, pdm, poetry, etc.) should alert if those packages are present in a user’s environment. The API is likely to ultimately be more beneficial than the alert, since, presumably, any well-crafted malicious package, once installed, will remove the part of the installer tool’s code that alerts the user.
An API, by contrast, could be used for many different use cases, such as AV tools, firewalls, etc.
I will gently remind everyone that no matter “moral obligations”, improvements don’t get done if nobody works on them, and PyPI is running short-handed (plus, if you read https://blog.pypi.org, you can’t deny that they are already putting a lot of effort into security with the budget they have). The same holds for pip. The pip and Warehouse issues are open and awaiting implementations.
As a gentle reminder of context here, the only pip command that scans the whole user’s environment (every package that is present in the environment) is pip check, which can be run manually by the user to validate their environment.
While it would be possible to use a PyPI API like the one described to determine if there are malicious packages present, pip would likely implement that under pip check, so the user is still responsible for initiating the check. And of course, it’s not actually that hard to write a standalone checker that lists all the installed packages and checks that against a PyPI list - so a manual check (such as might be run as part of an audit) doesn’t need to be built into pip.
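To illustrate the "standalone checker" idea, here is a minimal sketch using only the standard library. It assumes you already have a list of flagged project names from somewhere; PyPI does not currently publish such a list in the form discussed here, so the names in `known_bad` are made-up placeholders, and `normalize`, `installed_packages`, and `find_flagged` are hypothetical helper names, not anything pip provides. Note that it only sees distributions visible to the interpreter running it.

```python
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_packages() -> dict[str, str]:
    """Map normalized project name -> version for the current environment."""
    found = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            found[normalize(name)] = dist.version
    return found


def find_flagged(installed: dict[str, str], flagged_names) -> dict[str, str]:
    """Return the installed packages whose names appear in flagged_names."""
    flagged = {normalize(n) for n in flagged_names}
    return {name: ver for name, ver in installed.items() if name in flagged}


if __name__ == "__main__":
    # Placeholder names for illustration only; a real checker would load
    # these from whatever list or API ends up being published.
    known_bad = ["examplle-malware", "Another_Bad.Package"]
    hits = find_flagged(installed_packages(), known_bad)
    for name, version in sorted(hits.items()):
        print(f"WARNING: {name}=={version} is on the flagged list")
    if not hits:
        print("No flagged packages found in this environment.")
```

As noted earlier in the thread, a clean result from a check like this proves little on a machine that has already run malicious code, so it is better suited to audits than to being treated as a guarantee.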
However, pip does run a pip check after completing an install, so having the scan included as part of pip check would offer a certain amount of additional security. But note that we don’t guarantee that we’ll always run pip check on every install - it’s costly, and we do it to ensure that newly installed packages are compatible with installed ones, so if we ever find a way to achieve that compatibility without running a full pip check, we might choose to do so.
Hasn’t Pip, since the new resolver was added, run a check at the end of each install?
I’m pretty confident that if you install something that conflicts with your existing environment Pip tells you at the end of the install without the need to manually run pip check.
Sorry, I had just got off a red-eye flight when I wrote that. I clearly misunderstood the sentence “the only pip command that scans the whole user’s environment … is pip check”.