Proposing a community maintained database of PyPI package vulnerabilities

oliverchang · April 23, 2021, 6:17pm

Hi!

I’m from Google and my team has been working on some efforts to improve vulnerability management for open source packages.

In particular we’ve started to build a database of vulnerabilities that affect PyPI packages. CVEs are notoriously difficult to match to open source packages and versions, so our goal is to define a standardized shared vulnerability interchange format with precise version/naming that makes them much easier to consume.

An example vulnerability entry would look something like this (more examples here)

id: PYSEC-UNDECIDED-2021-0001
package:
  name: httplib2
  ecosystem: PyPI
summary: Vulnerability in httplib2
details: httplib2 is a comprehensive HTTP client library for Python. In httplib2 before
  version 0.19.0, a malicious server which responds with long series of "\xa0" characters
  in the "www-authenticate" header may cause Denial of Service (CPU burn while parsing
  header) of the httplib2 client accessing said server. This is fixed in version 0.19.0
  which contains a new implementation of auth headers parsing using the pyparsing
  library.
severity: HIGH
affects:
  ranges:
  - type: GIT
    repo: https://github.com/httplib2/httplib2
    fixed: bd9ee252c8f099608019709e22c0d705e98d26bc
  - type: ECOSYSTEM
    fixed: 0.19.0
references:
- https://github.com/httplib2/httplib2/security/advisories/GHSA-93xj-8mrv-444m
aliases:
- CVE-2021-21240
modified: "2021-02-12T14:56:00Z"
published: "2021-02-08T20:15:00Z"

We’ve built out a proof of concept for a workflow that automates most of the work necessary to generate these entries from existing CVE feeds. Once this gets going it should result in very minimal ongoing human maintenance work, and we are happy to contribute time to bootstrap this.
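
As a rough illustration of why the precise `fixed` version in the httplib2 entry above is easier to consume than a free-text CVE description, here is a minimal sketch of a version check. It makes several assumptions: the entry is already loaded into a Python dict, `is_affected` and `parse_version` are hypothetical helper names, an optional `introduced` key is treated as a lower bound even though the example above does not show one, and versions are plain dotted integers. A real tool would need full PEP 440 version handling.

```python
# Hypothetical, trimmed-down copy of the httplib2 entry shown above.
ENTRY = {
    "id": "PYSEC-UNDECIDED-2021-0001",
    "package": {"name": "httplib2", "ecosystem": "PyPI"},
    "affects": {
        "ranges": [
            {
                "type": "GIT",
                "repo": "https://github.com/httplib2/httplib2",
                "fixed": "bd9ee252c8f099608019709e22c0d705e98d26bc",
            },
            {"type": "ECOSYSTEM", "fixed": "0.19.0"},
        ]
    },
}


def parse_version(text):
    """Parse a plain dotted version such as '0.19.0' into a tuple of ints.

    Assumption: no pre-release or local version segments.
    """
    return tuple(int(part) for part in text.split("."))


def is_affected(entry, name, version):
    """Return True if name==version falls inside any ECOSYSTEM range."""
    if entry["package"]["name"].lower() != name.lower():
        return False
    installed = parse_version(version)
    for rng in entry["affects"]["ranges"]:
        if rng["type"] != "ECOSYSTEM":
            continue  # GIT ranges refer to commits, not release versions
        introduced = rng.get("introduced")  # assumed optional lower bound
        fixed = rng.get("fixed")
        if introduced is not None and installed < parse_version(introduced):
            continue
        if fixed is not None and installed >= parse_version(fixed):
            continue
        return True
    return False
```

With this sketch, `is_affected(ENTRY, "httplib2", "0.18.1")` reports the package as affected, while version `0.19.0` is treated as fixed.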

Would you be open to having this database live under Python Software Foundation · GitHub as a community owned database of vulnerabilities?

Our ultimate wish is to see this community database flow into PyPI’s API/UI and eventually the pip command so users can tell if their dependencies are vulnerable. We’ve already started engaging with the PyPI team on this.
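
To make the idea of telling users whether their dependencies are vulnerable more concrete, the following is a small sketch of an audit over pinned requirements. It is not how pip or PyPI works: `ADVISORIES`, `parse_version`, and `audit_requirements` are hypothetical names, the advisory index is a hand-written stand-in for the proposed database, and only `name==version` pins with plain dotted versions are handled.

```python
# Hypothetical advisory index: package name -> [(advisory id, first fixed version)].
ADVISORIES = {
    "httplib2": [("PYSEC-UNDECIDED-2021-0001", "0.19.0")],
}


def parse_version(text):
    """Parse a plain dotted version into a tuple of ints (assumption: no suffixes)."""
    return tuple(int(part) for part in text.split("."))


def audit_requirements(lines):
    """Return (name, version, advisory id, fixed version) for each vulnerable pin."""
    findings = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        for advisory_id, fixed in ADVISORIES.get(name.lower(), []):
            if parse_version(version) < parse_version(fixed):
                findings.append((name, version, advisory_id, fixed))
    return findings


if __name__ == "__main__":
    pins = ["httplib2==0.18.1", "requests==2.25.1  # not in the index"]
    for name, version, advisory_id, fixed in audit_requirements(pins):
        print(f"{name} {version} is affected by {advisory_id}; fixed in {fixed}")
```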

Thanks,

Oliver

brettcannon · April 23, 2021, 6:45pm

Moved to the PSF category as the PyPA doesn’t control the “psf” org on GitHub.

nhatkhai · April 25, 2021, 1:15am

I like this idea. In my simple view, it would work as a whitelist server that pip install could support, so that by default it only installs packages that are on the list. Users should also be able to disable it or use a different server of their own, so even other companies could run their own whitelist server and tell pip to use it instead of the default when needed.

This would also allow different levels of security to be introduced.

nhatkhai · April 25, 2021, 1:17am

It would also mean that arbitrary code execution by legacy eggs during pip install or import is no longer a problem in terms of security.

ammaraskar · April 25, 2021, 1:44am

Just for inspiration/precedent, the Rust ecosystem has a similar system for packages on crates.io: Advisories › RustSec Advisory Database and GitHub - RustSec/cargo-audit: Audit Cargo.lock files for crates with security vulnerabilities acting as an auditing tool for dependencies.

oliverchang · April 26, 2021, 1:43am

Indeed!

Ruby also has a community maintained database of vulnerabilities: GitHub - rubysec/ruby-advisory-db: A database of vulnerable Ruby Gems

And there is a Go proposal/prototype to do the same: vulndb - Git at Google

tiran · April 26, 2021, 3:42pm

Thanks Oliver! I like the general idea. I’ll try to find some time to read your proposal and give feedback after I have dealt with the 3.10 feature freeze frenzy.

@brettcannon I think @oliverchang’s proposal belongs in the packaging category to draw in more attention from packaging folks.

There is already a commercial security database for Python packages at https://pyup.io/. They release their database once a month at GitHub - pyupio/safety-db: A curated database of insecure Python packages

brettcannon · April 26, 2021, 10:10pm

@tiran depends if this is more for the project on GitHub or coming to a general conclusion. I read:

as the key question. If it’s more for the idea then I can move it back.

oliverchang · April 27, 2021, 1:13am

@brettcannon Yes that’s the key question from me

For more general packaging/warehouse discussion, I opened an issue at github/pypa/warehouse/issues/9407 (I’m not able to post the full link), or should there be another discussion topic on here for that?

uranusjr · April 27, 2021, 2:57am

Just a note, the psf and pypa orgs are managed by different people and have different criteria to join. If joining psf does not end up making sense (I don’t personally have an opinion but don’t have a say anyway), it may make sense for you to join pypa instead if you are inclined to. See also: PyPA Members, And How To Join — PyPA documentation

Blackward · May 1, 2021, 2:41pm

Hi Oliver,

I love the idea of such a database! I hope either the PSF or the PyPI guys will love it too…

Thumbs up!

Cheers, Dominik

dustin · May 12, 2021, 7:41pm

I talked to @EWDurbin about this offline and we both agreed that this probably would be more appropriate under the pypa org instead.

Can a mod move this back to the Packaging category so we can give this a little more time for discussion with the members there?

Assuming this doesn’t get any significant opposition there, I’ll probably kick that off sometime next week.

brettcannon · May 13, 2021, 10:16pm

Moved over to Packaging

dustin · May 20, 2021, 4:27pm

Here’s an example of the repo @oliverchang is proposing to create: GitHub - oliverchang/python-vuln-examples

Here’s a sample of what an advisory would look like: python-vuln-examples/PYSEC-2021-63.yaml at main · oliverchang/python-vuln-examples · GitHub

I’ll be starting the pypa-committers vote to create this as a new repo, pypa/advisory-db on Monday!

Blackward · May 20, 2021, 6:07pm

Hi Dustin,

Cool! In particular, I like "The goal is to have the pip install (and an additional pip audit) command automatically report vulnerabilities out of the box." That would be super cool.

Cheers, Dominik

nhatkhai · May 21, 2021, 2:02am

And it should be flexible for those who don’t care about it at some point in time, too.

dustin · May 24, 2021, 4:28pm

Per PEP 609 I’ve started the PyPA vote to create https://github.com/pypa/advisory-db (borrowing the same name as https://github.com/RustSec/advisory-db)

ehashman · May 24, 2021, 10:43pm

The advisory repo looks like it’s being kept up to date by osv-robot · GitHub which as far as I can tell looks like a Google-run bot. Is the source for that bot available somewhere? Will a non-Googler be able to continue to maintain this repo if Google chooses to stop sponsoring the work?

alex_Gaynor · May 24, 2021, 10:59pm

Hi Python friends. I help maintain the advisory-db for RustSec, and also contributed a smidge to the OSV spec. If there’s anything I can share about our experience running a vulnerability DB, please just ask!

dustin · May 24, 2021, 11:02pm

@ehashman Great question. The source for the robot (and the rest of the OSV tooling) is here: https://github.com/google/osv (the robot specifically is here: https://github.com/google/osv/blob/master/docker/worker/worker.py)

The goal of the repo is that while it can be backfilled/bootstrapped by this robot, individuals can also submit advisories directly to it as well.

The project is in its infancy but Google is making an effort to ensure it is not entirely Google-driven, e.g. there is a proposal that helped define the vulnerability format that had many community contributors: https://tinyurl.com/vuln-json