Using SourceRank score to warn or limit packages

Feedback…
I have been thinking about solutions to the malicious package problem. I am wondering if there is an option in the pip configuration to define checks.
A few simple examples:

  • Warn if a package with a SourceRank score less than some value is being installed.
  • Warn if the package age is less than 7 days
  • Raise an error if the SourceRank < 2 and the package is 1 day old
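As a rough sketch of how the three example checks could be expressed: nothing like this exists in pip today, so `PackageInfo`, `Verdict`, `check_package` and the default thresholds (`min_rank=10`, `min_age_days=7`) are all hypothetical names and values chosen for illustration. It also assumes the score and the first upload time have already been fetched from somewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    ERROR = "error"


@dataclass
class PackageInfo:
    """Hypothetical metadata record; pip exposes nothing like this."""
    name: str
    source_rank: int           # score reported by some scoring service
    first_published: datetime  # timezone-aware time of the first upload


def check_package(pkg: PackageInfo, now: datetime,
                  min_rank: int = 10, min_age_days: int = 7) -> Verdict:
    """Apply the three example rules, strictest first."""
    age = now - pkg.first_published
    # Raise an error if SourceRank < 2 and the package is at most 1 day old.
    if pkg.source_rank < 2 and age <= timedelta(days=1):
        return Verdict.ERROR
    # Warn if the score is below the configured threshold.
    if pkg.source_rank < min_rank:
        return Verdict.WARN
    # Warn if the package is younger than the configured minimum age.
    if age < timedelta(days=min_age_days):
        return Verdict.WARN
    return Verdict.ALLOW


now = datetime(2023, 1, 10, tzinfo=timezone.utc)
fresh = PackageInfo("brand-new", 1, now - timedelta(hours=6))
print(check_package(fresh, now))
```

Returning a verdict instead of printing or raising keeps the decision separate from how an installer would present it.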

My bigger idea would be a feature to point pip at an external source (a web API) for this configuration, which might be defined by a company policy or a security screening provider.

I’ve been thinking about a similar solution that restricts which packages can be downloaded internally, requiring a package to be at least 90 days old and to have some minimum level of popularity, i.e. some number of downloads per month.

The way I was going to implement it, though, is to have something like a PyPI proxy that all users in the company must use to access PyPI. This avoids adding extra complexity to pip and makes it agnostic to any other tool a user might need (e.g. if users want to use Poetry or whatever).

It would be good to see an open source tool that does this, but I’ve just started thinking about it, so maybe it already exists?

1 Like

I wonder what the easiest/best way to implement this would be.

For a company a proxy might be OK, but for many users maybe not.

The rules could be added to something like pyproject.toml in a project repo, or to a Python .env.

I was thinking of making ‘saferpip’

If “we” design a configuration and APIs to verify the requirements, then it could be used with a proxy or a local setup.

Thoughts?

I believe there are commercial offerings for curated / vetted package repositories. In an open source context, I guess it is mostly bots that monitor source code repositories, check the usual places that list dependencies (pyproject.toml, requirements.txt, and so on), and post messages to warn about potential issues (Dependabot, Snyk, and so on). There is Assured Open Source Software; I am not sure exactly where it fits or what it does.

At the installer level (like pip, as you asked), I recall reading about an alternative installer that would do some checks before installing. I cannot find it anymore and I do not recall the details; maybe it was qip-installer.

I think maybe it is worth looking into the following as well:

You might be remembering the discussion about having pip-audit
inline with package installation in order to block downloads of
package versions with known security vulnerabilities:

1 Like

This is essentially what we use at work. None of it is open source (there’s a desire to see that happen, but it’s caught between the people who want to just get the thing working properly first and then worry about the public side of things :wink: )

Beware of strategies that are hard for legit people but easy for bad actors. These suggestions would create highly difficult barriers to entry for small packages, yet for the malicious packages, a “downloads per month” statistic can easily be achieved by just spamming fetches on a bunch of cheap rented servers.

A minimum age MIGHT be of value, but only if you can be 100% confident that any malicious package would be discovered and removed within that time period. And while I have great admiration for the PyPI admins and the cleanup work they do, there’s always the possibility that someone finds a way to slip something past them. Plus, it could be dodged by uploading a plausible package first, waiting out the time, and then uploading a new version that carries the malicious payload.

Before searching for solutions, it’s best to pin down exactly what problem you’re trying to solve. Is it typo squatting? Is it malware in general? (A hard problem to solve and probably needs to be broken into subproblems.) Is it dead packages getting compromised and replaced with completely different code? Solutions are different.

1 Like

I would argue it provides value if you believe that some percentage of malicious packages can be discovered within some reasonable length of time. The likelihood of critically requiring a completely new package is very small in a business context, and if one is critical it can be put on an allow list.

Sure, but this increases the cost to any bad actor: not only do they now have to evade any techniques being used to find malicious packages on PyPI, they must also spend money downloading those packages, which increases their visibility in PyPI download statistics.

If all bad actors are forced to do this it creates a new signal for potential malicious packages, ones that suddenly exploded in downloads.
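A minimal sketch of what such a signal might look like, assuming a list of daily download counts is available for the package. The function name, the factor of 10 and the 7-day minimum history are illustrative assumptions, not values anyone in this thread has validated.

```python
from statistics import median


def download_spike(daily_downloads: list[int], factor: float = 10.0,
                   min_history: int = 7) -> bool:
    """Return True if the latest day exceeds `factor` times the median of the prior days."""
    if len(daily_downloads) <= min_history:
        return False  # not enough history to judge
    *history, latest = daily_downloads
    baseline = max(median(history), 1)  # avoid a zero baseline
    return latest > factor * baseline


quiet_then_loud = [3, 5, 4, 6, 2, 5, 4, 3, 900]
print(download_spike(quiet_then_loud))
```

A legitimate package that suddenly becomes popular would trip this too, so it is at best a reason to look closer.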

And if required packages aren’t very popular it seems reasonable to put them under extra scrutiny before allow listing them.

Correct me if I’m wrong here but I believe 1 and 3 are just strategies of thinking about how bad actors might spread 2?

I think they’re all valuable questions but ultimately there’s a very limited set of information which you can actually verify to be true about a package without inspecting the package itself.

Edit: Apologies I clicked Post instead of Quote so I had to edit this post quite significantly.

Chris, I agree with everything you said.
The problem to solve, ideas.

  1. How do I define a set of rules I want to follow to screen packages before install or update?
  2. How can a project define a set of rules?
  3. How can a company define rules? I know the option of a proxy or other service exists, but I don’t think there is a standard set of rules.
  4. Can we call an external API to “approve” a package? Possibly a commercial service.

The software can warn about or reject an install.
Maybe we could sync our settings or frequently installed packages with PyPI :slight_smile:

There are many possible heuristics that could be used. The idea here is that each person, company or project could have their own set of rules.
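To make the idea of per-person, per-project and external rules concrete, here is a small sketch in which every rule source is just a callable returning a verdict, and the strictest verdict wins. The `Checker` type, `evaluate`, and both example checkers are hypothetical; `external_approval` is a local stand-in for what would be an HTTP call to some approval service.

```python
from typing import Callable, Iterable

# A checker takes (name, version) and returns "allow", "warn" or "reject".
Checker = Callable[[str, str], str]

SEVERITY = {"allow": 0, "warn": 1, "reject": 2}


def evaluate(name: str, version: str, checkers: Iterable[Checker]) -> str:
    """Run every checker and return the most severe verdict."""
    worst = "allow"
    for check in checkers:
        verdict = check(name, version)
        if SEVERITY[verdict] > SEVERITY[worst]:
            worst = verdict
    return worst


def project_rule(name: str, version: str) -> str:
    # Hypothetical project-level rule: warn on pre-1.0 versions.
    return "warn" if version.startswith("0.") else "allow"


def external_approval(name: str, version: str) -> str:
    # Stand-in for an external approval API; a real one would query
    # a service chosen by the company or project.
    blocked = {"definitely-not-malware"}
    return "reject" if name in blocked else "allow"


print(evaluate("tiny", "0.1", [project_rule, external_approval]))
```

The same list of checkers could run in a local wrapper or inside a proxy, which is the point of agreeing on the interface first.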

It might also be possible to experiment with the package maintainer signing the software and linking it to some identity, for example a GitHub account. If you can connect a package to a person, this cuts down on intentionally malicious projects.

Maybe over the weekend I’ll try to sketch out some details.

It’s worth noting, also, that you can easily create your own whitelist of packages. The simplest way to do this is to only ever pip install -r requirements.txt and then be very careful with editing that file (which for many people is a lot rarer than deploying a new test/dev instance of something). So all of these considerations should be in the light of that.
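For illustration, a requirements file can also be read as an explicit allow list and checked before anything is installed. This is only a sketch: `allowed_names` is a hypothetical helper that handles plain lines such as `name` or `name==1.0` and deliberately skips option lines and URLs. The name normalization follows the rule described in PEP 503.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def allowed_names(requirements_text: str) -> set[str]:
    """Collect project names from simple requirement lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blanks, option lines such as "-r other.txt", and URLs.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


reqs = "requests==2.31.0\n# comment\nFlask_SQLAlchemy>=3.0  # orm\n-r other.txt\n"
allow = allowed_names(reqs)
print(normalize("reqeusts") in allow)  # a typo is not on the list
```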

Not everyone will do that, of course, and a lot of people are going to be typoing pip commands and getting the wrong package; but unless these sorts of barriers-to-entry are made global (which would be HIGHLY punishing to new package authors, and a terrible blight on the Python community), they’re only going to apply to the paranoid anyway. So it’s a question of what strategies provide the best protection with the least unwanted problems.

1 Like

It would be interesting to have some stats. Suppose there were a set of rules that caught xx% of malicious or mistyped packages with a false positive rate of yy%, and that, when you typed pip install AwesomePackage, just warned the user and asked for confirmation. What would xx and yy have to be for this to be a supported default?
75%, 10%?
I would be happy to see a warning asking me to confirm.

I don’t know about xx, but yy would have to be extremely low. And that’s nearly impossible. If you could know for sure which packages are a problem, you’d just remove them; and if you block things you aren’t sure of, you’re blocking a huge number of legit packages.
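To put numbers on why yy has to be so low, here is a small worked calculation. The share of installs that actually hit a bad package is not known; the 1-in-1000 figure below is purely an assumed value to show the shape of the arithmetic.

```python
def warning_precision(catch_rate: float, false_positive_rate: float,
                      bad_fraction: float) -> float:
    """Fraction of warnings that point at a genuinely bad package."""
    true_warnings = bad_fraction * catch_rate
    false_warnings = (1 - bad_fraction) * false_positive_rate
    return true_warnings / (true_warnings + false_warnings)


# xx = 75%, yy = 10%, and an ASSUMED 0.1% of installs being bad packages.
precision = warning_precision(0.75, 0.10, 0.001)
print(f"{precision:.2%} of warnings would be real")
```

Under those assumptions fewer than 1 in 100 warnings would be about a real problem, which is the kind of ratio that trains people to click through.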

Remember that someone always has to be the first to download a brand new package.

Here is a project called pipctl which uses OSV’s vulnerabilities to constrain which packages can be installed. The main aim of the tool is to control the resolution process, hence the name. It was discussed here, but pip maintainers were not open to having such functionality. I can imagine the tool being extended to provide SourceRank or other information when resolving application dependencies (such as whether the given package is signed, quality aspects, …).

I think you are talking about Thoth - see this tutorial.

Just thinking out loud here, but is there a “typical” profile for package downloads? Suppose you have a nice new legit package. I suspect it would mostly attract the attention of web crawlers to start with. After that initial flurry, things would settle down unless and until it gains a following, or becomes a key dependency for some other PyPI package (think of many flask-whatever packages). I don’t think the download profile of a malicious package would look like that. There would likely be the initial flurry of crawler hits, but the rented server download profile would look different than the typical package, and any sort of package dependency graph would either show it as isolated, or only connected to other (potentially low quality/harmful) packages.

Sounds like a job for AI… :wink: In any case, packages flagged by such a tool could then be handed over to humans for further analysis.

1 Like

There are plenty of people who download and detonate/scan/monitor new packages as they are published, so most obvious malware gets detected directly within a day.

An index proxy that delays new releases by 1-7 days is going to prevent you getting most of the bad stuff. If you add in a manual approval step for packages you’ve never seen before, you’ll deal with almost all of it.
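The delay itself is simple to sketch, assuming the proxy knows the upload time of each version. `visible_versions` and its 7-day default are hypothetical; the manual approval step for never-seen packages is not modelled here.

```python
from datetime import datetime, timedelta, timezone


def visible_versions(upload_times: dict[str, datetime], now: datetime,
                     delay_days: int = 7) -> list[str]:
    """Return the versions a delaying proxy would serve, oldest upload first."""
    cutoff = now - timedelta(days=delay_days)
    shown = [v for v, uploaded in upload_times.items() if uploaded <= cutoff]
    return sorted(shown, key=lambda v: upload_times[v])


utc = timezone.utc
uploads = {
    "1.0": datetime(2023, 1, 1, tzinfo=utc),
    "1.1": datetime(2023, 6, 1, tzinfo=utc),
    "1.2": datetime(2023, 6, 14, tzinfo=utc),  # uploaded yesterday
}
print(visible_versions(uploads, now=datetime(2023, 6, 15, tzinfo=utc)))
```

Installers behind such a proxy would simply resolve to the newest version that has cleared the delay.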

All that’s left at that point is compromised credentials leading to a hijack of an existing package that goes undetected, which is exceedingly rare and almost impossible to do anything about on the consumer side (besides pinning to known good versions). The delay is still the best option for protecting yourself against these.

1 Like

I think this is the main point - once there’s a good starting point created by any form of automated analysis, handing the results over to humans for review is an important next step. After all, I’m assuming this idea is targeted at the sorts of user who don’t already review the packages they use in detail.

As @steve.dower pointed out in the thread linked above by @fridex, there’s a business opportunity[1] for someone who wants to set up such a service filtering PyPI. And if I understand it right, that’s sort of what Thoth (also linked by @fridex, one of the maintainers) does.

But I don’t think any such solution (especially a purely-automated one without any human review) is going to be accurate enough, or sufficiently universal, to warrant being an automatic feature of the base tools. It’s something that should be built on top of them.


  1. Or maybe open source fame and fortune :wink: ↩︎

3 Likes

Stepping in here to discuss the system we (Mantis) have crafted, as I believe it more or less satisfies the requirements and constraints you’re describing (and more importantly, we’re steadily stepping towards an open source codebase).

We assertively scan all new uploads to PyPI, and use an organizationally developed crib of yara rules to detect a variety of different malicious behaviors within packages that may be found here. In our experience thus far, false positives create an environment where such a system, if made official in its entirety without individual review prior to publishing, could sufficiently degrade the trust and integrity in numerous packages.

My team and I are in fairly constant communication with other third party package security providers, and false positives constantly present a continually evolving issue that, frankly, I’m not sure we can expect to get a sufficient hold on without requiring manual review in all circumstances.

From our perspective, I think tools that support third party solutions to this (REF: Malware detection and reporting infrastructure to support 3rd party reports · Issue #12612 · pypi/warehouse · GitHub) are the most practical solution to implement a scoring or warning system, whereby we can distribute verdicting/scoring of packages across multiple services to prevent the likelihood of false positives, while also distributing the tremendous workload of manually parsing packages before making any sort of verdict.

We tweaked our yara to pass on a ‘weighting’ parameter, which is evaluated based on the likelihood that a behavior is unique to malware. I would propose that this scoring and warning system might present a significantly more effective warning system with the caveat that yara itself is just regex at the end of the day, and suffers from many of the same syntactical drawbacks.

However, that being said, there are certain packages that flag on our service that are undoubtedly malicious based purely on the number of behaviors they exhibit, and I believe that at least on a base level, that level of functionality can be implemented with little to no impact on legitimate developers.
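As a rough illustration of weighting (not Mantis’s actual implementation), matched rule names can be mapped to weights and summed, with thresholds deciding between a warning and a block. The rule names, weights and thresholds below are all made up for the example.

```python
def weighted_score(matched_rules: list[str], weights: dict[str, float]) -> float:
    """Sum the weights of the distinct rules that matched; unknown rules count as 0."""
    return sum(weights.get(rule, 0.0) for rule in set(matched_rules))


def verdict(score: float, warn_at: float = 0.5, block_at: float = 1.5) -> str:
    """Map a score to a verdict using illustrative thresholds."""
    if score >= block_at:
        return "block"
    if score >= warn_at:
        return "warn"
    return "ok"


# Hypothetical weights: higher means the behavior is rarer in legitimate code.
WEIGHTS = {
    "base64_exec": 1.0,
    "setup_py_network_call": 0.5,
    "reads_env_vars": 0.25,
}

hits = ["base64_exec", "setup_py_network_call"]
print(verdict(weighted_score(hits, WEIGHTS)))
```

A single low-weight match stays below the warning threshold, while several strong matches together cross the blocking one, which mirrors the "number of behaviors" observation.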

TL;DR: I can’t imagine we ever get away from human review, but Issue #12612 more or less covers how we could craft an effective ‘warning’ for potentially malicious packages by “crowdsourcing” security intelligence.

2 Likes