When a malicious package is identified, I strongly believe we have a moral obligation to notify people that they have been hacked. I work in the industry, and cleaning up from a breach is never fun - but one thing is universal: the longer the attacker has access, the more expensive it is to clean up. Additionally, failing to notify people that they have been breached may even be against California and EU law.
Malware distributed by pypi needs to be marked as malicious, and pip needs to notify users that they have been hacked. One option here is that pip upgrade should throw an error, and there should be a way to remove the malware package from the system.
Without notification, the criminals who are profiting from pypi’s lack of security will continue to do so, because of the lack of evidence to bring them to justice. I would like to see pypi work closer with law enforcement to share details about these attacks, “if you see something say something.”
I’d also like to see the stats of how many people are hacked each month. I think that hiding this information is bad optics. This lack of transparency is ultimately bad for the community, because the squeaky wheel gets the oil. Hiding how many people are hacked allows some people in the community to dismiss pypi’s continued problems with distributing malicious code. This is a kind of “wolf in sheep’s clothing,” and I have seen very talented senior engineers get hacked because of a typo.
We provide download statistics for all packages, but inferring “how many people get hacked” from that is challenging, because “one download” doesn’t mean “one install” (PyPI has a large number of static mirrors) and “one install” doesn’t necessarily mean “one hack”.
I’d agree with @pf_moore (in the linked issue) that silently removing something isn’t a great way to go. Alert? By all means - though not sure what form that should take. Remove? Maybe… there’s rather too much of people doing things to other people’s computers for their own good. But removing silently is not only bad form - the owner of a computing device has a right to know what has been done to it - but also risks not being enough, a well crafted piece of malware, once activated, could well replicate itself into another form that won’t be eliminated by pip uninstalling the original package, so “silent” is really bad here.
Are you implying that PyPI has a moral obligation to retain a full list of every package you have downloaded? That sounds like a bit of a privacy issue to me.
OTOH if you just mean that there needs to be a list of “most recently removed malicious packages”, that’s something that seems a lot more reasonable. It isn’t going to notify you though - you would still be responsible for monitoring the list to see if there are any problems.
Do you mean the number of people who download those packages, or do you need them to be more thoroughly tracked in order to find out if any compromise happened? Again, this is definitely a level of tracking that I would not want to see happen.
To add to this, I agree that PyPI should provide an API to report what malicious packages have been removed, and installer tools (pip, pdm, poetry, etc.) should alert if those packages are present in a user’s environment. That said, the API is likely to ultimately be more beneficial than the alert.
Presumably, any well crafted malicious package, once installed, will remove the part of the installer tool’s code that alerts the user.
Whereas an API could be used for many different use cases, such as AV tools, firewalls, etc.
I will gently remind everyone that no matter “moral obligations”, improvements don’t get done if nobody works on them, and PyPI is running short-handed (plus, if you read https://blog.pypi.org, you can’t deny that they are already putting a lot of effort into security with the budget they have). The same holds for pip. The pip and Warehouse issues are open and awaiting implementations.
As a gentle reminder of context here, the only pip command that scans the whole user’s environment (every package that is present in the environment) is pip check, which can be run manually by the user to validate their environment.
While it would be possible to use a PyPI API like the one described to determine if there are malicious packages present, pip would likely implement that under pip check, so the user is still responsible for initiating the check. And of course, it’s not actually that hard to write a standalone checker that lists all the installed packages and checks that against a PyPI list - so a manual check (such as might be run as part of an audit) doesn’t need to be built into pip.
However, pip does run a pip check after completing an install, so having the scan included as part of pip check would offer a certain amount of additional security. But note that we don’t guarantee that we’ll always run pip check every install - it’s costly, and we do it to ensure that newly installed packages are compatible with installed ones, so if we ever find a way to achieve that compatibility without running a full pip check, we might choose to do so.
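As a rough illustration of the standalone checker mentioned above, here is a minimal sketch that lists the installed packages and compares them against a list of flagged names. The `FLAGGED` list and its entries are made up for the example; PyPI does not publish such a list in this form as far as this thread establishes, so where the names come from is left as an assumption. The helper names (`normalize`, `find_flagged`, `installed_names`) are also just illustrative.

```python
import re
from importlib.metadata import distributions


def normalize(name):
    """Normalize a project name the way PEP 503 describes (Foo_Bar -> foo-bar)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_flagged(installed, flagged):
    """Return the sorted, normalized installed names that appear in the flagged list."""
    flagged_names = {normalize(n) for n in flagged}
    return sorted({normalize(n) for n in installed} & flagged_names)


def installed_names():
    """List the names of the distributions visible in the current environment."""
    names = []
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that have no usable metadata
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical flagged names, purely for illustration.
    FLAGGED = ["example-flagged-package", "another_flagged.package"]
    for name in find_flagged(installed_names(), FLAGGED):
        print(f"WARNING: {name} is on the flagged list")
```

Note that this only inspects the environment it runs in, and, as pointed out earlier in the thread, malware that has already run could tamper with a check like this, so it is better suited to an audit than to a guarantee.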
Hasn’t Pip, since the new resolver was added, run a check at the end of each install?
I’m pretty confident that if you install something that conflicts with your existing environment Pip tells you at the end of the install without the need to manually run pip check.
Sorry, I had just got off a red-eye flight when I wrote that. I clearly misunderstood the sentence “the only pip command that scans the whole user’s environment … is pip check”.
What I fail to understand is why things like MAT (Malware Analysis Tool), or antivirus scanners aren’t run against code placed in a quarantine environment before being publicly available to download. Why isn’t there a policy in place for all installers with package dependencies… be it pip, npm, nuget etc?
I suppose a solution like that might help increase the variety of
new malware, but that’s about it. Any determined attacker targeting
distribution on PyPI could write their malware and locally test it
against MAT until they have it in a state that the tool currently
doesn’t detect as suspect, then upload it. It raises the bar, but
probably not by as much as you’d hope.
Unskilled attackers will get their naive attempts to upload known
malware blocked, sure, but is the volunteer interest or funding
priority there to build and maintain such a system, and is it going
to have enough of an impact to outweigh that investment?
Security decisions pretty much always come down to a cost/benefit
analysis.
I was thinking that there are already websites out there, such as VirusTotal, that test code against a slew of malware detection suites at once. All I am saying is that isolating something prior to installation is more preventative. I understand the complexities of this: say you took something like Flask and installed it, the issue may be in one specific version. But knowing those versions and being able to blacklist them seems a step in the right direction.
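To make the "one specific version" point concrete, here is a small sketch of a version-aware blocklist lookup. Everything in it is hypothetical: the `BLOCKLIST` structure, the project names, and the versions are invented for illustration and do not describe any real package or any existing PyPI feature.

```python
# Hypothetical blocklist: project name -> set of blocked versions.
# "*" means every version of that project is blocked.
BLOCKLIST = {
    "example-web-framework": {"1.2.3"},
    "example-typosquat": {"*"},
}


def is_blocked(name, version, blocklist=BLOCKLIST):
    """Return True if this exact name/version pair is on the blocklist."""
    versions = blocklist.get(name.lower())
    if versions is None:
        return False
    return "*" in versions or version in versions


print(is_blocked("example-web-framework", "1.2.3"))
print(is_blocked("example-web-framework", "1.2.4"))
```

Matching on exact version strings keeps the sketch simple; a real tool would probably want proper version parsing so that equivalent spellings of the same version compare equal.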
There are a number of groups who already download and scan new packages as they are published, and report findings back to PyPI. Anything that’s detected shouldn’t last long. As bonuses, it’s cheap for PyPI to run, and doesn’t impinge on the open-source-ness of the public index.