These questions do have good answers, but I feel like we are digressing here. These are best answered in a thread about PEP 458 / 480, not here.
I feel like VirusTotal could be useful here. Given that it’s probably the most comprehensive collection of malware in existence, it’s worth uploading source files to run them against multiple AVs to detect potentially malicious content. This would still require manual review, however, since VirusTotal only returns the number of detections, not a definitive yes/no.
Also, the VirusTotal public API is free but rate-limited, which might be a deal-breaker.
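To make the triage idea concrete, here is a minimal sketch of what that step could look like against the VirusTotal v3 REST API. The threshold, function names, and review policy are all illustrative assumptions, not a proposal; the point is that the API hands back per-engine counts, so a human still has to make the final call.

```python
"""Sketch of a VirusTotal-based triage step (assumes the v3 REST API;
the threshold and helper names are illustrative, not prescriptive)."""
import hashlib


def needs_review(stats: dict, threshold: int = 1) -> bool:
    """Flag a file for manual review when at least `threshold` engines
    report it as malicious or suspicious. VirusTotal returns per-engine
    counts, not a single verdict, so this is triage, not a decision."""
    return (stats.get("malicious", 0) + stats.get("suspicious", 0)) >= threshold


def lookup_stats(path: str, api_key: str) -> dict:
    """Look up a file by SHA-256 via the public API (free but rate-limited).
    Requires the third-party `requests` package."""
    import requests

    sha256 = hashlib.sha256(open(path, "rb").read()).hexdigest()
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["attributes"]["last_analysis_stats"]
```

Note that the rate limit on the free tier means a lookup like this couldn't keep up with PyPI's upload volume without a commercial arrangement.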
On the topic of scope for this task, I think we want to distinguish it from these other goals (which are fine goals, just, I think they are out of scope):
- detecting security vulnerabilities in the Python code being published (although there is some prior art on this, and there are services that will scan an open-source project for free)
- scoring the trustworthiness or risk factors of the Python code being published (e.g., whether it carries an unacceptable number of dependencies, makes ill-advised system configuration changes, etc.)
- any issues related to the hijacking of existing packages, and typosquatting existing package names
What I think is in scope, related to the last point, is discovering that a package author’s account has been hijacked because we detected something suspicious about a package the attacker attempted to publish. There are other ways to prevent and detect account compromise, and there are great suggestions for those in this thread, but I think only content analysis should be in scope for this Q4 RFP.
Unfortunately, even when reduced to the goal of analyzing content, I think this problem is too big to be solved categorically. We advise a heuristic approach to defining malice and detecting it in Python packages. We’ll have to think about which analysis heuristics would be most impactful and still implementable within the timeframe and budget of this effort.
There are a bunch of strategies to choose from, and they each have pros and cons: a statistical classifier approach, a pattern/signature approach, a scoring-and-threshold approach, etc. There is an unavoidable maintenance cost in running a system for adversarial detection, because the adversary will keep adapting to evade it. I think proposers should be asked to honestly estimate the maintenance burden of their proposed solution.
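As a concrete illustration of the maintenance cost, here is a toy version of the scoring-and-threshold strategy. The signals, weights, and threshold below are invented for the example; a real system would have to tune them continuously, and re-tune them as attackers adapt, which is exactly the burden proposers should estimate.

```python
"""Toy scoring-and-threshold detector. Every signal, weight, and the
threshold itself is a made-up example, not a recommended rule set."""
import re

# Hypothetical signals: (pattern over package source, weight) pairs.
SIGNALS = [
    (re.compile(r"\bexec\s*\("), 3),                   # dynamic code execution
    (re.compile(r"base64\.b64decode"), 2),             # common obfuscation step
    (re.compile(r"https?://\d+\.\d+\.\d+\.\d+"), 2),   # raw-IP URLs
    (re.compile(r"subprocess|os\.system"), 1),         # shelling out
]
THRESHOLD = 4  # illustrative cutoff; scores at or above this get flagged


def score(source: str) -> int:
    """Sum the weights of all signals present in the source text."""
    return sum(weight for pattern, weight in SIGNALS if pattern.search(source))


def is_suspicious(source: str) -> bool:
    return score(source) >= THRESHOLD
```

The appeal of this approach is that each alert comes with an explanation (which signals fired), which helps with false-positive review; the downside is that the rule list rots quickly once adversaries learn it.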
I conferred with @woodruffw about what other package managers have done for malicious content detection, and the answer was that we were not aware of much. What we are aware of is that the Google and Apple app stores have both invested heavily in runtime analysis sandboxes and static analysis approaches for detecting malice in submitted apps. The difference is that they can run their detections in secret, so adversaries can’t develop an evasion in advance without disclosing it in a submission. So that’s another question for any proposed solution: will it be effective even if the methods are open source, or does it require partial secrecy? Will an attacker be able to lab-test their evasions in secret, or will they have to risk detection by submitting them to a live service?
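The secrecy trade-off is easy to demonstrate with a toy static check in the spirit of those app-store analyses, using Python's `ast` module. The list of "suspicious" builtins here is a made-up example; notice that once the rule is public, an attacker can trivially evade it (e.g., by reaching `exec` through `getattr` indirection), which is exactly the concern above.

```python
"""Toy static check: flag direct calls to certain builtins.
The SUSPICIOUS_CALLS set is illustrative only."""
import ast

SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def flag_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for each direct call to a suspicious builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SUSPICIOUS_CALLS):
            findings.append((node.lineno, node.func.id))
    return findings
```

An attacker who can read this rule simply writes `getattr(builtins, "ex" + "ec")(payload)` and the check above sees nothing, while a secret, server-side analysis would at least force them to burn the evasion on a live submission.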
Lastly, we think a good question for proposed solutions concerns the cost of handling false positives. Who reviews the alerts and approves packages in the case of a false positive? Can the burden be put on the package author to explain why their package is benign, or to tell them what to change so it is no longer regarded as malicious?
Thanks to everyone for participating in this discussion! The RFI period has closed, and replies in this category have been disabled.
Based on the feedback, we’ll be updating our scope before opening the Request for Proposals period next week along with a new discussion category.
If you’re interested in participating in the RFP, sign up at https://forms.gle/redWdNhwMqzRG1jC8 to be notified when it launches.