Proposing a community-maintained database of PyPI package vulnerabilities

Hi!

I’m from Google and my team has been working on some efforts to improve vulnerability management for open source packages.

In particular, we’ve started to build a database of vulnerabilities that affect PyPI packages. CVEs are notoriously difficult to match to open source packages and versions, so our goal is to define a standardized, shared vulnerability interchange format with precise package names and affected versions, which makes entries much easier to consume.

An example vulnerability entry would look something like this (more examples here):

id: PYSEC-UNDECIDED-2021-0001
package:
  name: httplib2
  ecosystem: PyPI
summary: Vulnerability in httplib2
details: httplib2 is a comprehensive HTTP client library for Python. In httplib2 before
  version 0.19.0, a malicious server which responds with long series of "\xa0" characters
  in the "www-authenticate" header may cause Denial of Service (CPU burn while parsing
  header) of the httplib2 client accessing said server. This is fixed in version 0.19.0
  which contains a new implementation of auth headers parsing using the pyparsing
  library.
severity: HIGH
affects:
  ranges:
  - type: GIT
    repo: https://github.com/httplib2/httplib2
    fixed: bd9ee252c8f099608019709e22c0d705e98d26bc
  - type: ECOSYSTEM
    fixed: 0.19.0
references:
- https://github.com/httplib2/httplib2/security/advisories/GHSA-93xj-8mrv-444m
aliases:
- CVE-2021-21240
modified: "2021-02-12T14:56:00Z"
published: "2021-02-08T20:15:00Z"
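
To make the entry's semantics concrete, here is a minimal Python sketch of how a consumer might decide whether a given release is covered by the ECOSYSTEM range above. It assumes the YAML has already been loaded into a plain dict, that a range with only a "fixed" bound covers every earlier version, and that versions are plain dotted numbers; `parse_version` and `is_affected` are hypothetical names, and real tooling would need full PEP 440 version handling.

```python
from __future__ import annotations

# The httplib2 entry from above, trimmed to the fields this check uses and
# assumed to be already loaded from YAML into a plain dict.
ENTRY = {
    "id": "PYSEC-UNDECIDED-2021-0001",
    "package": {"name": "httplib2", "ecosystem": "PyPI"},
    "affects": {
        "ranges": [
            {
                "type": "GIT",
                "repo": "https://github.com/httplib2/httplib2",
                "fixed": "bd9ee252c8f099608019709e22c0d705e98d26bc",
            },
            {"type": "ECOSYSTEM", "fixed": "0.19.0"},
        ]
    },
}


def parse_version(text: str) -> tuple[int, ...]:
    # Simplification: plain dotted numeric versions like "0.18.1" only, and
    # "0.19" is not treated as equal to "0.19.0". Real tooling would need
    # full PEP 440 parsing (pre-releases, epochs, local versions).
    return tuple(int(part) for part in text.split("."))


def is_affected(entry: dict, name: str, version: str) -> bool:
    """Return True if this release of `name` is below a fixed ECOSYSTEM version.

    Assumes a range with only a "fixed" bound covers every earlier version.
    GIT ranges identify commits rather than releases, so they are skipped.
    """
    if entry["package"]["name"].lower() != name.lower():
        return False
    current = parse_version(version)
    for rng in entry["affects"]["ranges"]:
        if rng["type"] == "ECOSYSTEM" and current < parse_version(rng["fixed"]):
            return True
    return False


print(is_affected(ENTRY, "httplib2", "0.18.1"))  # True
print(is_affected(ENTRY, "httplib2", "0.19.0"))  # False
```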

We’ve built a proof of concept for a workflow that automates most of the work needed to generate these entries from existing CVE feeds. Once it is up and running, it should require very little ongoing human maintenance, and we are happy to contribute time to bootstrap it.

Would you be open to having this database live under the Python Software Foundation’s GitHub organization as a community-owned database of vulnerabilities?

Our ultimate wish is to see this community database flow into PyPI’s API/UI and eventually the pip command so users can tell if their dependencies are vulnerable. We’ve already started engaging with the PyPI team on this.

Thanks,

Oliver

Moved to the PSF category as the PyPA doesn’t control the “psf” org on GitHub.

I like this idea. In my simple understanding, it would act like a whitelist server that pip install could support, so that by default it only installs packages that are on the list. But users should also be able to disable it or use a different server of their own. That way, other companies could even run their own whitelist server and tell pip to use it instead of the default when needed.

This would also allow different levels of security to be introduced.
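
A rough Python sketch of the whitelist idea described above. Nothing like this exists in pip as described: the `AllowlistPolicy` class, its `name==version` text format, and the `enabled` switch are all hypothetical, and fetching the list from a server is left out. It only illustrates the two properties asked for: a list decides what may be installed, and a user or company can turn the check off or supply a list of their own.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    # PEP 503 name normalization, so "Requests" and "requests" match.
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass
class AllowlistPolicy:
    """Hypothetical install policy: only listed (name, version) pairs may be installed."""

    allowed: dict[str, set[str]] = field(default_factory=dict)
    enabled: bool = True  # users can switch the check off entirely

    @classmethod
    def from_text(cls, text: str, enabled: bool = True) -> AllowlistPolicy:
        """Build a policy from "name==version" lines, e.g. a company's own list."""
        allowed: dict[str, set[str]] = {}
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            name, sep, version = line.partition("==")
            if not sep:
                raise ValueError(f"expected name==version, got {line!r}")
            allowed.setdefault(normalize(name.strip()), set()).add(version.strip())
        return cls(allowed=allowed, enabled=enabled)

    def permits(self, name: str, version: str) -> bool:
        if not self.enabled:
            return True
        return version in self.allowed.get(normalize(name), set())


company_list = AllowlistPolicy.from_text("""
httplib2==0.19.0
Requests==2.25.1  # vetted by the security team
""")
print(company_list.permits("httplib2", "0.18.1"))  # False
print(company_list.permits("requests", "2.25.1"))  # True
print(AllowlistPolicy(enabled=False).permits("anything", "1.0"))  # True
```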

It would also mean that arbitrary code execution by legacy eggs during pip install or import would no longer be a problem in terms of security.

Just for inspiration/precedent, the Rust ecosystem has a similar system for packages on crates.io: the RustSec Advisory Database, with cargo-audit (GitHub - RustSec/cargo-audit: Audit Cargo.lock files for crates with security vulnerabilities) acting as an auditing tool for dependencies.
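
For comparison with cargo-audit, here is a hedged Python sketch of the same idea applied to a pinned requirements file: read `name==version` lines and report any pin that falls below a "fixed" version in an advisory entry shaped like the httplib2 example earlier in the thread. The helper names are hypothetical, only exact `==` pins written one per line (no hashes or line continuations) are handled, and the version comparison is a numeric simplification rather than full PEP 440.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    # PEP 503 name normalization, so "HTTPLib2" and "httplib2" match.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_version(text: str) -> tuple[int, ...]:
    # Simplification: plain dotted numeric versions only, not full PEP 440.
    return tuple(int(part) for part in text.split("."))


def read_pins(text: str) -> dict[str, str]:
    """Collect exact "name==version" pins; comments and other specifiers are skipped."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        name, sep, version = line.partition("==")
        if sep:
            pins[normalize(name.strip())] = version.strip()
    return pins


def audit(pins: dict[str, str], advisories: list[dict]) -> list[tuple[str, str, str]]:
    """Return (package, pinned version, advisory id) for every vulnerable pin."""
    findings = []
    for adv in advisories:
        name = normalize(adv["package"]["name"])
        if adv["package"].get("ecosystem") != "PyPI" or name not in pins:
            continue
        pinned = parse_version(pins[name])
        for rng in adv["affects"]["ranges"]:
            if rng["type"] == "ECOSYSTEM" and pinned < parse_version(rng["fixed"]):
                findings.append((name, pins[name], adv["id"]))
                break  # one finding per advisory is enough
    return findings


advisories = [
    {
        "id": "PYSEC-UNDECIDED-2021-0001",
        "package": {"name": "httplib2", "ecosystem": "PyPI"},
        "affects": {"ranges": [{"type": "ECOSYSTEM", "fixed": "0.19.0"}]},
    }
]
lock = "httplib2==0.18.1\nrequests==2.25.1\n"
for package, version, advisory_id in audit(read_pins(lock), advisories):
    print(f"{package} {version} is affected by {advisory_id}")
# prints: httplib2 0.18.1 is affected by PYSEC-UNDECIDED-2021-0001
```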

Indeed!

Ruby also has a community maintained database of vulnerabilities: GitHub - rubysec/ruby-advisory-db: A database of vulnerable Ruby Gems

And there is a Go proposal/prototype to do the same: vulndb - Git at Google

Thanks Oliver! I like the general idea. I’ll try to find some time to read your proposal and give feedback after I have dealt with the 3.10 feature freeze frenzy.

@brettcannon I think @oliverchang’s proposal belongs in the packaging category to draw in more attention from packaging folks.

There is already a commercial security database for Python packages at https://pyup.io/. They publish their database once a month on GitHub: pyupio/safety-db (A curated database of insecure Python packages).

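@tiran it depends on whether this is more about the project on GitHub or about reaching a general conclusion. I read:

“Would you be open to having this database live under the Python Software Foundation’s GitHub organization as a community-owned database of vulnerabilities?”

as the key question. If it’s more about the idea, then I can move it back.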

@brettcannon Yes that’s the key question from me :slight_smile:

For more general packaging/Warehouse discussion, I opened an issue at github/pypa/warehouse/issues/9407 (I’m not able to post the full link). Or should there be a separate discussion topic here for that?

Just a note, the psf and pypa orgs are managed by different people and have different criteria to join. If joining psf does not end up making sense (I don’t personally have an opinion but don’t have a say anyway), it may make sense for you to join pypa instead if you are inclined to. See also: PyPA Members, And How To Join — PyPA documentation

Hi Oliver,

I love the idea of such a database! I hope either the PSF or the PyPI guys will love it too…

Thumbs up!

Cheers, Dominik

I talked to @EWDurbin about this offline and we both agreed that this probably would be more appropriate under the pypa org instead.

Can a mod move this back to the Packaging category so we can give this a little more time for discussion with the members there?

Assuming this doesn’t get any significant opposition there, I’ll probably kick that off sometime next week.

Moved over to Packaging

Here’s an example of the repo @oliverchang is proposing to create: GitHub - oliverchang/python-vuln-examples

Here’s a sample of what an advisory would look like: python-vuln-examples/PYSEC-2021-63.yaml at main · oliverchang/python-vuln-examples · GitHub

I’ll be starting the pypa-committers vote to create this as a new repo, pypa/advisory-db on Monday!

Hi Dustin,

cool! I particularly like “The goal is to have the pip install (and an additional pip audit) command automatically report vulnerabilities out of the box.” That would be super cool :slight_smile: :crazy_face:

Cheers, Dominik

And it should be flexible enough for those who don’t care about it at some point in time, too.

Per PEP 609, I’ve started the PyPA vote to create https://github.com/pypa/advisory-db (borrowing the name from https://github.com/RustSec/advisory-db).

The advisory repo looks like it’s being kept up to date by osv-robot (GitHub), which, as far as I can tell, is a Google-run bot. Is the source for that bot available somewhere? Will a non-Googler be able to continue maintaining this repo if Google chooses to stop sponsoring the work?

:wave: python friends. I help maintain the advisory-db for RustSec, and also contributed a smidge to the OSV spec. If there’s any stuff I can share around our experience running a vulnerability DB, please just ask!

@ehashman Great question. The source for the robot (and the rest of the OSV tooling) is here: https://github.com/google/osv (the robot specifically is here: https://github.com/google/osv/blob/master/docker/worker/worker.py)

The goal of the repo is that while it can be backfilled/bootstrapped by this robot, individuals can also submit advisories directly to it as well.
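
If individuals submit advisories directly, the repo would likely want an automated check on each submission. The Python sketch below shows what a minimal one might look like for entries shaped like the httplib2 example earlier in the thread, already loaded from YAML into a dict with the quoted timestamps kept as strings. The required keys, known range types, and rules are assumptions inferred from that single example, not the actual OSV schema, and `validate_advisory` is a hypothetical name.

```python
from __future__ import annotations

from datetime import datetime

# Assumed minimum schema, inferred only from the httplib2 example entry.
REQUIRED_KEYS = ("id", "package", "affects", "published", "modified")
KNOWN_RANGE_TYPES = {"GIT", "ECOSYSTEM"}


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_advisory(entry: dict) -> list[str]:
    """Return human-readable problems with a submitted entry; [] means none found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    package = entry.get("package", {})
    if not package.get("name") or package.get("ecosystem") != "PyPI":
        problems.append("package needs a name and ecosystem: PyPI")

    ranges = entry.get("affects", {}).get("ranges", [])
    if not ranges:
        problems.append("affects.ranges is empty")
    for i, rng in enumerate(ranges):
        if rng.get("type") not in KNOWN_RANGE_TYPES:
            problems.append(f"range {i}: unknown type {rng.get('type')!r}")
        elif rng["type"] == "GIT" and "repo" not in rng:
            problems.append(f"range {i}: GIT range needs a repo")

    if "published" in entry and "modified" in entry:
        if parse_timestamp(entry["modified"]) < parse_timestamp(entry["published"]):
            problems.append("modified is earlier than published")
    return problems


example = {
    "id": "PYSEC-UNDECIDED-2021-0001",
    "package": {"name": "httplib2", "ecosystem": "PyPI"},
    "affects": {"ranges": [{"type": "ECOSYSTEM", "fixed": "0.19.0"}]},
    "published": "2021-02-08T20:15:00Z",
    "modified": "2021-02-12T14:56:00Z",
}
print(validate_advisory(example))  # []
```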

The project is in its infancy but Google is making an effort to ensure it is not entirely Google-driven, e.g. there is a proposal that helped define the vulnerability format that had many community contributors: https://tinyurl.com/vuln-json
