Towards a `pip audit` subcommand for vulnerability analysis & management

dustin · July 25, 2022, 8:53pm

Background
As part of my day job on Google’s open-source security team, I’ve been working for the past year with a team from Trail of Bits to author, implement and maintain https://pypi.org/p/pip-audit/, a standalone tool for scanning Python environments for packages with known vulnerabilities. This is the first such tool to integrate directly with PyPI for vulnerability data and to be intended as an entirely community-owned and operated project.

We’re at the point now where pip-audit is more or less feature-complete & bug-free, and there’s a potential path towards elevating the pip-audit project as a subcommand of pip itself. I’ve outlined a rough roadmap for this path here. We (the pip-audit maintainers) would like to welcome discussion on the roadmap as well as get buy-in from the wider community on the overall plan.

Our goal
Our ultimate goal is to make a) useful vulnerability tooling that is b) free to use, c) community owned and operated and d) canonical and easily available to every Python user. We’ve already achieved a) and b) and to some extent c) (the project is open-source, and we plan to request a transfer to the PyPA org) but we think the most effective way to achieve d) is by making pip-audit a subcommand of pip itself, due to pip’s wide user base.

Integration
Given that our goal has always been to become a subcommand of pip, we’ve taken a lot of care to design pip-audit in a way that should make integration as easy as possible:

We’ve designed pip-audit’s CLI in a way that is in line with pip’s existing CLI and UX and with commands that should feel natural to the average pip user: commands for auditing environments or requirements files with pip-audit should feel very similar to existing commands for installing with pip
We’ve taken great care to avoid taking on additional sub-dependencies that would unnecessarily increase pip’s footprint, aligning with pip’s existing sub-dependencies where possible
We’ve designed pip-audit itself in a way that should make it possible to mount it as a vendored subcommand of pip, rather than re-implementing it from scratch. This means that it should be a low maintenance burden on the pip maintainers, as ongoing development & maintenance could continue to happen in the pip-audit repository by pip-audit maintainers (with pip maintainers getting the final say in what makes it into pip itself, of course).

Our ask
The point of this discussion is to share this potential plan early on with the community and get feedback, specifically on the following questions:

Do you think support for known vulnerability auditing is something that would be valuable to have in pip?
Does the feature set of pip-audit align with what you’d find valuable?
Is the proposed roadmap the best path forward?

@woodruffw and I are also happy to answer any questions that aren’t covered by the pip-audit docs about how pip-audit works or about the goals/plan here.

Thanks in advance!

bernatgabor · July 26, 2022, 7:25pm

I personally feel that pip does one thing: discover, download and install packages. IMHO the audit feature sounds very useful but orthogonal to its goal. I don’t see why it would need to become part of pip. I’d prefer it to live alongside pip, perhaps as a package-audit package?

That way, anyone who wants an audit can install and run this new package, and it doesn’t need to live inside pip and burden the pip package with even more code. Is this package pip-related at all? Wouldn’t it work under any Python installation (regardless of whether it was installed by apt, yum, pip, installer, poetry, conda or any other package manager)?

pradyunsg · July 26, 2022, 7:39pm

I think so!

I’m not the right person to ask – but yes, I think this looks like the correct set of features.

TBH, I’m not too keen on this development model. I’d much rather that pip-audit’s code move into pip as pip audit, and then evolve like any other pip command beyond that point.

FWIW, I’d be happy to trust one or more of pip-audit’s maintainers with the commit bit on pip, to help with continued maintenance of this (and other pieces, if they have the bandwidth/interest) of pip.

fungi · July 26, 2022, 7:53pm

Looking at the trailofbits/pip-audit and pypa/advisory-database
documentation, it seems like the end goal is to have security
advisory information served from PyPI and reported to end users at
the time they’re trying to install things, but to also be able to
later re-audit those same environments for newly discovered
vulnerabilities. Having two separate codebases do the reporting
(one at install time, one after installation) could result in a lot
of code duplication or even more vendored libs in pip.

bernatgabor · July 26, 2022, 8:10pm

Guess this is the part I do not agree with then. If I ask an installer to install a package, I’d not like to get a security audit, just install it. I’ll handle security auditing in parallel when I need it. I’d rather not make pip even slower if that’s possible.

dstufft · July 26, 2022, 8:32pm

I think this makes sense as a pip sub command, and as part of the installation process (though I also agree with @bernatgabor that care must be taken to avoid paying steep performance penalties for it).

I also agree with @pradyunsg that I think the best path forward would be directly integrating it as part of pip’s code base, rather than a weird, embedded thing. Though if we want this to be available to more than just pip, maybe the right approach is a library for the “core” parts, then a pip command that uses that library ^[1]. As part of that I think using pip-api seems wrong as once it’s part of pip, it can just use pip’s internal APIs directly.

Adding to the repository API seems like a reasonable approach, devil is in the details for exactly how much information we want to put in there, but that’s a discussion for that PEP.

[1] I have not looked at the code at all to know if this makes sense or not.

pf_moore · July 26, 2022, 9:50pm

Just a minor point first - I’m not sure where the idea that there would be an audit when things get installed came from - that’s not mentioned as far as I can see in the original post. But I will say that I’d be strongly against it if anyone were to propose it (for basically the reasons @bernatgabor states).

Can I throw this back at you? What are your reasons for thinking this should be added to pip rather than living as a separate application? You seem to be asking whether we (the packaging community and the pip maintainers) think it would be worth adding, but on the assumption that unless we have reservations, it would be added. I’d frame it in the exact opposite direction - pip is large and complicated enough already, it’s up to you, if you’re suggesting we add this to pip, to explain why we should be willing to consider it.

As a separate pip audit subcommand, I’m not convinced it fits within the scope of pip. I see pip more as “managing packages on my machine” and this is more about “querying information from a package index”. It’s a fine distinction, and one that’s far from hard and fast. But if we do consider it as in scope, then I’d want to follow our usual principles, of using standardised interfaces and avoiding implementation-specific approaches. As I understand it, pip-audit uses the PyPI JSON API to get vulnerability information, so I think we should look very seriously at standardising the JSON API^[1] before we integrate this into pip. That way, other indexes have a clear target to work to (for example, I could imagine devpi wanting to mirror vulnerability data as well as packages from PyPI).
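For readers who have not seen the data in question, here is a minimal sketch of reading vulnerability records from a PyPI JSON API release payload. It assumes the response carries a top-level `vulnerabilities` list whose entries have `id` and `fixed_in` fields; the helper names and the sample advisory are made up for illustration, and this is not pip-audit’s own code.

```python
import json
import urllib.request


def fetch_release_json(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI's (non-standardised) JSON API."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize_vulnerabilities(release_json: dict) -> list:
    """Return (advisory id, fixed-in versions) pairs from a release payload."""
    return [
        (vuln["id"], list(vuln.get("fixed_in", [])))
        for vuln in release_json.get("vulnerabilities", [])
    ]


# Illustrative payload shaped like the assumed response; the advisory is invented.
sample = {
    "info": {"name": "example-pkg", "version": "1.0"},
    "vulnerabilities": [
        {"id": "EXAMPLE-2022-0001", "fixed_in": ["1.0.1"], "details": "Illustrative only."}
    ],
}
print(summarize_vulnerabilities(sample))
```

Another index (devpi, say) would only be able to serve the same data reliably if the shape of that `vulnerabilities` field were written down in a standard, which is the point being made above.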

I definitely agree with @pradyunsg that if it’s to be part of pip, it should be added to pip’s codebase and maintained within pip - trying to make it work as both standalone and part of pip will be suboptimal for both. Let’s choose one and stick with it. Apart from anything else, I don’t think we want to be in a position where pip-audit gets fixes added and released and they take up to 3 months to get into pip (which is what vendoring could result in) - that just gives a bad message about using the pip subcommand rather than the standalone version.

One other thing I think we should be careful of here, is whether the pip team has the resources to deal with security-related issues. I am no security expert (which demonstrates my point) but if pip started flagging a project as having a vulnerability and we got an angry project developer asking for urgent help “because it’s a security issue”, which we couldn’t give, we could very easily end up in a bad situation where “pip developers cannot respond to security problems” ends up all over twitter… I, for one, don’t want to be put in a position of having to deal with something like that. And yes, I know that in the scenario I describe, “speak to the owner of the project being flagged, or to PyPI if you think the alert is wrong” should be a reasonable and sufficient response. But people aren’t like that, sadly.

I haven’t looked very much at pip-audit, but one concern I’d have here is people thinking that “no warnings from pip audit” equates to some form of approval or assurance from pip that things are OK. On the other hand, if we bury the information in disclaimers, it’s going to be a pain to use. It’s hard to tell from the pip-audit issue tracker, but how much “real world” use has this seen, and what sorts of end user issues get raised? I’d like a better feel of the expected support burden that this is going to add to pip, and if that means collecting data from pip-audit as a standalone project for a while, I’m fine with that.

[1] Maybe it could go into the (JSON) repository API, rather than the (not standardised) PyPI JSON API, but that’s a minor detail. Either way, it still needs to be standardised.

dustin · July 26, 2022, 10:28pm

I think finding a way to do this is fine with me as long as we can figure out what should happen to the features that I’m not expecting pip to want to adopt, like support for non-PEP 691 APIs, but I think these are minor enough that the option to just drop support for them is on the table.

I’d suggest thinking about pip audit as tooling to determine what to install and whether to install it.

This is the current status quo.

Yes, this project essentially has a direct dependency on pip.

I’d say that running an audit at install time is explicitly not a goal for pip audit: I think via npm’s attempt to do the same thing and the reaction to that feature, we can conclude that that’s not a good idea (npm audit: Broken by Design — overreacted).

I think this makes sense as well!

This is still in draft but essentially ‘use pip’s internal APIs directly’ is exactly what pip-api would do if it was vendored: Support being vendored by di · Pull Request #138 · di/pip-api · GitHub

I kind of addressed this in OP, but the main reason is that we want to make this “canonical and easily available to every Python user”.

(A minor reason is that this depends heavily on pip, for now we can sustain that maintenance burden but I think long term it would be much easier to maintain if it were part of pip itself in some way, similar to other tools like pip-compile that depend on pip.)

Paul Moore:

As a separate pip audit subcommand, I’m not convinced it fits within the scope of pip. I see pip more as “managing packages on my machine” and this is more about “querying information from a package index”. It’s a fine distinction, and one that’s far from hard and fast. But if we do consider it as in scope, then I’d want to follow our usual principles, of using standardised interfaces and avoiding implementation-specific approaches. As I understand it, pip-audit uses the PyPI JSON API to get vulnerability information, so I think we should look very seriously at standardising the JSON API[1] before we integrate this into pip. That way, other indexes have a clear target to work to (for example, I could imagine devpi wanting to mirror vulnerability data as well as packages from PyPI).

I think this is addressed in the roadmap, we’re not expecting pip to take a dependency on non-standardized APIs.

This is a really good point that I failed to mention – thanks for raising it.

Paul Moore:

One other thing I think we should be careful of here, is whether the pip team has the resources to deal with security-related issues. I am no security expert (which demonstrates my point) but if pip started flagging a project as having a vulnerability and we got an angry project developer asking for urgent help “because it’s a security issue”, which we couldn’t give, we could very easily end up in a bad situation where “pip developers cannot respond to security problems” ends up all over twitter… I, for one, don’t want to be put in a position of having to deal with something like that. And yes, I know that in the scenario I describe, “speak to the owner of the project being flagged, or to PyPI if you think the alert is wrong” should be a reasonable and sufficient response. But people aren’t like that, sadly.

Based on our experience with the project over the last year, this seems unlikely: given that pip-audit only reports on known vulnerabilities that have previously been disclosed to the project maintainers, and that lots of other tools and technologies will report on the same vulnerabilities (in the same format), I think it would definitely be unreasonable to expect pip developers themselves to be responsible here in any way.

That said, users will do just about anything if you let them so it probably will happen. The pip-audit maintainers can definitely help with this along with related maintenance, and if there’s other resources that would make this better I feel confident we can find a way to make them available.

We address this somewhat in our security model. I think the biggest risk here is that users confuse ‘audit for known vulnerabilities’ with ‘do some static analysis’, and think that pip audit will protect them against unknown vulnerabilities in novel code.

Adoption is definitely much lower than I would imagine we would get if this was a pip subcommand instead, but so far we’ve found that being very clear that this is about known vulnerabilities in dependencies and doing a little education about the security model has resulted in users having a pretty good understanding of what pip-audit does and how it can protect them.

So far the most common end-user issue over the last year is that they don’t want to remediate known vulnerabilities (for a multitude of reasons) and want some means to essentially ignore the results from an audit. But since this is effectively the same as never running an audit in the first place… we’ve pushed back on this. Otherwise, the support burden has been fairly low.

pf_moore · July 26, 2022, 11:27pm

Thanks. To be honest, I don’t really have much sympathy for the “we want it to be available to all Python users, so bung it in pip” argument. That’s very much a personal view, so the other pip maintainers may disagree, but I think pip is already overloaded with functionality and we should be streamlining, not adding more.

In particular, the whole point of pip is to make using other packages seamless and straightforward. What about pip install pip-audit is too much to expect people to do? That’s a genuine question, I suspect I have an idea of what you’ll say but I’d like to be explicit - my suspicion is that “putting it in pip” is an attempt to apply a technical solution (“you don’t have to install it”) to a social problem (people don’t want to bother with audits unless they are told to, and will grab at excuses like “it’s not installed” if they can). If there is a technical reason why pip install pip-audit is a problem, maybe we should solve that problem for the general case, rather than avoiding it just for this one package.

A tool like black seems to have managed to become ubiquitous without being a pip subcommand. Why can’t pip-audit?

fungi · July 26, 2022, 11:40pm

I guess where the disconnect is for me then is, how does it go about
letting you know what/whether to install unless it’s hooking into
pip’s installation dep solver in order to figure out whether a
vulnerable version of some transitive dependency is about to be
pulled in?

bernatgabor · July 26, 2022, 11:49pm

I’m personally happy and prefer the status quo.

I’m not sure why. I mean unless you want to add some pip install --fail-on-security-vulnerability or pip install --skip-on-security-vulnerability flag I don’t see why you’d be using any pip specific logic here. If you’re just checking what’s installed and if they have known security vulnerabilities (what an audit should be IMHO) then I see no reason to involve pip in the equation. Perhaps you can explain why this tool needs to depend on pip? E.g. why would it not be possible to run it on a python installation via apt or conda?

dstufft · July 26, 2022, 11:59pm

I think of this another way, a lot of times people look at pip as a tool just for installing packages, but I think that’s wrong.

If that’s all it was, then commands like pip index, pip wheel, pip show, pip list, pip inspect, etc should not exist. In fact, you could argue that pip audit is really nothing more than pip list --audit with some extra features (support for -r for instance), where --audit is a hypothetical flag like --outdated where, instead of querying the repository for new versions, pip would be querying the repository for vulnerabilities associated with the version.
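To make the “pip list plus a lookup” framing concrete, here is a rough sketch of what such a hypothetical `--audit` pass might boil down to: list what is installed, then look each name and version up in advisory data. The `ADVISORIES` table and its contents are invented for illustration; a real implementation would query the repository instead, and this is not how pip or pip-audit is actually written.

```python
import re
from importlib.metadata import distributions

# Hypothetical advisory data keyed by (normalised project name, version).
# A real tool would fetch this from the package index instead.
ADVISORIES = {
    ("example-pkg", "1.0"): ["EXAMPLE-2022-0001"],
}


def normalize(name: str) -> str:
    """Normalise a project name in the style of PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions() -> dict:
    """Map each installed distribution name to its version, much like `pip list`."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def audit(installed: dict, advisories: dict) -> dict:
    """Return {"name==version": [advisory ids]} for every installed match."""
    findings = {}
    for name, version in sorted(installed.items()):
        ids = advisories.get((name, version))
        if ids:
            findings[f"{name}=={version}"] = ids
    return findings


print(audit(installed_versions(), ADVISORIES))
```

Everything except the advisory lookup is what `pip list` already does, which is the comparison with `--outdated` being drawn here.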

I have a hard time coming up with an objective criterion for inclusion in pip that includes things like pip list or pip index or pip show but doesn’t include pip audit.

bernatgabor · July 27, 2022, 12:06am

For what it’s worth pip also includes pip wheel and when we were designing build pip explicitly refused a pip sdist saying that they’d prefer migrating non-core functionality out of pip. So refusing a pip audit would follow that direction (not to add any more functionality besides what’s already there, and only update core parts: download/install). But AFAIK I’m not a pip maintainer so ultimately this question is not for me to answer.

dstufft · July 27, 2022, 12:08am

Sure, but some of those commands mentioned were added in the most recent release.

pradyunsg · July 27, 2022, 8:32am

I still want that.

IIRC, it wasn’t explicitly rejected, but rather that we wanted this functionality to also exist outside of pip.

sbidoul · July 27, 2022, 9:10am

pip install --dry-run --report and pip inspect were added so tools can leverage the main pip algorithms without being implemented in pip or using its internals. So in a way, yes, we added small, very generic, easy to maintain features so we can avoid some further scope creep.

To give an example, pip list --outdated could easily have been implemented by composing pip inspect and pip install --dry-run --report - --quiet --upgrade.
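A minimal sketch of that composition, for illustration only. It assumes pip 22.2 or later, that `pip inspect` prints a JSON object with an `installed` list, and that the installation report has an `install` list, each entry carrying `metadata.name` and `metadata.version`. The helper names are hypothetical, and asking the resolver to upgrade every installed project at once is a simplification that will not always match what `pip list --outdated` reports.

```python
import json
import subprocess
import sys


def run_pip_json(*args: str) -> dict:
    """Run a pip command that prints JSON on stdout and parse the result."""
    completed = subprocess.run(
        [sys.executable, "-m", "pip", *args],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def versions(items: list) -> dict:
    """Map lower-cased project name to version for a list of report entries."""
    return {i["metadata"]["name"].lower(): i["metadata"]["version"] for i in items}


def outdated(inspect_report: dict, install_report: dict) -> dict:
    """Compare installed versions with what an upgrade dry run would install."""
    installed = versions(inspect_report.get("installed", []))
    candidates = versions(install_report.get("install", []))
    return {
        name: (installed[name], new)
        for name, new in candidates.items()
        if name in installed and installed[name] != new
    }


def find_outdated() -> dict:
    """Drive both pip commands; needs network access and a recent pip."""
    inspect_report = run_pip_json("inspect")
    names = [i["metadata"]["name"] for i in inspect_report.get("installed", [])]
    if not names:
        return {}
    install_report = run_pip_json(
        "install", "--dry-run", "--report", "-", "--quiet", "--upgrade", *names
    )
    return outdated(inspect_report, install_report)
```

Calling `find_outdated()` returns a mapping from project name to an (installed, candidate) version pair. The comparison logic lives entirely outside pip, which is the kind of composition these two commands were meant to enable.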

And @woodruffw has already shown interest in using these new features so pip-audit does not need the pip internals anymore, so I took that as a sign that we were on the right track.

Yes, it is not easy to draw the line, and the boundary has changed over time and will probably change again in the future.

My (very personal) feeling is that, at this point in time, the pip team is so small, and there is so much to do just to make the existing pip features consistent with each other, to support new standards and new python versions, to deprecate legacy behaviours, etc… that I tend to cringe at the idea of adding new large features that could live outside, since just the review would divert us.

Now, don’t get me wrong: I fully agree that pip-audit is a very useful and important feature, and making it available in a ubiquitous way is desirable.

But I think that at this point in time, there must be better options than including it in pip.

For instance, to make it ubiquitous, could we imagine adding it to get-pip or ensurepip? And/or should pip grow a way to have subcommands as plugins, like git does?
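As a rough illustration of the git-style idea: git resolves an unknown subcommand `foo` by looking for an executable named `git-foo` on PATH. The sketch below applies the same lookup with a `pip-` prefix. pip has no such mechanism today; the `BUILTIN` set, the function names and the exit codes are all hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Optional

# Illustrative subset of built-in commands; not pip's real command table.
BUILTIN = {"install", "list", "show"}


def resolve_external(subcommand: str, prefix: str = "pip-") -> Optional[str]:
    """Find an executable such as `pip-audit` on PATH, as git does for `git-foo`."""
    return shutil.which(f"{prefix}{subcommand}")


def dispatch(argv: list) -> int:
    """Run a built-in command, else fall back to an external `pip-<name>` tool."""
    if not argv:
        return 2
    subcommand, rest = argv[0], argv[1:]
    if subcommand in BUILTIN:
        return subprocess.call([sys.executable, "-m", "pip", subcommand, *rest])
    external = resolve_external(subcommand)
    if external is None:
        print(f"unknown command: {subcommand}", file=sys.stderr)
        return 1
    return subprocess.call([external, *rest])
```

Under this scheme `dispatch(["audit", "-r", "requirements.txt"])` would run an installed `pip-audit` executable if one is on PATH, without pip needing to vendor or import it.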

pf_moore · July 27, 2022, 9:56am

That precisely aligns with my feelings here.

There’s not so much a clear definition of pip’s scope, as an extremely limited pool of resource, which is all volunteer, and we need to be extremely careful to (a) not demand people commit time they don’t have, or don’t want to commit, and (b) support people working on what interests them, as that improves motivation and retains people. (a) is why we’re reluctant to add new features when asked, but (b) is why we do so anyway.

I have reservations here, as I don’t think it would be healthy to have pip maintainers who are only interested in one feature, and don’t feel responsible for pip as a whole. I’m also uncomfortable with the “offer a maintainer to get a feature” feel of this. On the other hand, if the pip-audit maintainers wanted to become pip maintainers in general, and were willing to be responsible for the pip audit command as simply one part of being a pip maintainer, then that would be a different matter - we really could do with more people. (I’d be very surprised if @dustin in particular had the bandwidth to become a pip committer, though!)

dustin · July 27, 2022, 3:12pm

I think both can be true: I can absolutely say the same thing for PyPI but I’d never say that should prevent us from adding new features. I can appreciate that a new feature comes with an essentially never-ending support burden, but there is a line somewhere that has to be drawn to continue to give users the features they need, and to strike a balance between minimizing support and making a useful tool.

The technical reason is that we want to align with how pip works and its functionality, but pip doesn’t have an importable API, which makes developing tools like this really cumbersome. We have to end up doing weird stuff to wrap the CLI, vendor parts of pip into our project, and there’s still a ton of rough edges.

But: I’m not asking for the pip team to fix this, partly because I realize that this would be an even larger support burden than any one subcommand I could propose, and partly because I think even if we all decided to go down that path and do it ‘right’ it would probably be a long time before we had this available to us.

I’m not sure this is an apt comparison: black has nothing to do with pip or packaging. A better comparison would maybe be “why have pipenv and poetry become popular without being pip subcommands” and the answer is because they a) offered features that pip doesn’t have and b) either decided not to become part of pip or were told they couldn’t be.

The single biggest complaint I see by Python packaging users, again and again, is that we have a very fractured ecosystem with lots of differing and competing tools. On one hand: I like this, because it means that we have an ecosystem that’s so able to facilitate new ideas that anyone can go and create their pet project if they want to. On the other hand, it means we (across the ecosystem) have an incredibly complicated and arduous learning curve for new users (and existing users!).

I think a single tool as a panacea is very likely a fallacy but I do see value in offering a single, canonical tool that is widely distributed and offers the most critical features for users to use the ecosystem most effectively. My argument is that knowing if you’re installing something that has a vulnerability or is malicious is, today, one of those features.

To be clear, pip-audit has never used pip internals. We use https://pypi.org/p/pip-api/, which wraps pip’s CLI with an importable API. It comes with a maintenance burden and limits what we can do, but we wanted to respect that pip doesn’t have an importable API itself.
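For anyone unfamiliar with what “wrapping pip’s CLI” means in practice, here is a minimal sketch of the general technique: shell out to pip, ask for JSON, and parse it. This is not pip-api’s actual code; the function names are made up, and it assumes `pip list --format=json` prints a list of objects with `name` and `version` keys.

```python
import json
import subprocess
import sys


def parse_pip_list(output: str) -> dict:
    """Turn the JSON printed by `pip list --format=json` into {name: version}."""
    return {entry["name"]: entry["version"] for entry in json.loads(output)}


def installed_distributions() -> dict:
    """Ask pip, via its CLI rather than its internals, what is installed."""
    completed = subprocess.run(
        [
            sys.executable, "-m", "pip", "list",
            "--format=json", "--disable-pip-version-check",
        ],
        check=True, capture_output=True, text=True,
    )
    return parse_pip_list(completed.stdout)
```

The cost is visible even at this size: every call starts a new process, and the wrapper can only expose what the CLI chooses to print.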

Yes, this is exactly right. We depend on pip’s dependency resolver, as well as related things for requirements file parsing, and reporting on the current environment.

I could imagine this, and would be open to it, but would still want to work on including the subcommand in pip directly as this doesn’t really do anything to resolve the technical challenges of building a separate tool against pip.

The “externally maintained subcommands” idea was kind of what I was trying to approach with the original proposal to vendor pip-audit into pip and mount it as a subcommand.

I hesitate to say a plugin system would be a good way to solve the problem here, since we’re talking about overall maintenance burden and I imagine this would require pip to not only maintain an API for plugins but also the plugin system itself, manage compatibility across pip versions and plugin versions… etc.

I probably should have led with this: I very much don’t want to add any increased work on any existing pip maintainer (aside from what would be unavoidable to land the feature) and my assumption was that regardless of how this is implemented, we’d definitely want to continue to own any maintenance/support around the subcommand.

I have bandwidth, but: I historically haven’t tried to contribute significantly to pip because I tend to pitch in in places where either a) I have a unique ability to help or b) things don’t seem to be going well. My impression of pip’s current state is that neither of those is true (with the exception of a dire need for vulnerability management features, which I’m working on). If that’s not the case, you could pretty easily convince me to help out.

That said, I also have the ability to pay for development work. Almost all development on pip-audit was paid work and ongoing maintenance of pip-audit, regardless of where it lands, as well as related work is already in a statement of work with @woodruffw’s team that will continue into 2023. It would be very reasonable for me to include additional pip maintenance in that if we’re working towards a pip audit subcommand.

And: I can also pay maintainers, especially when the work is somewhat related to open source software security, which I could argue all pip development is. I’m assuming the existing maintainers aren’t necessarily in a place where they can turn $$ into “more maintenance” but if that isn’t the case I’d very much like to talk to you: di@python.org

pf_moore · July 27, 2022, 3:55pm

Just picking up on this single point, I’m not aware of any pip users who have ever asked for an audit capability in pip. People have complained about end to end security risks in the Python packaging ecosystem, but to my knowledge no-one has ever raised an issue with pip that would have been addressed if we had a pip audit command.

So while I appreciate that sometimes users don’t know, or can’t describe, the features they need, I still have a hard time believing that this is something pip’s users want. I’d be OK with it being described as something that might be good for our users even if they don’t want it (IMO security-related features often fall into this category) but even then, it’s only one of many such possibilities, and it’s one that can stand alone, unlike others (e.g. TUF) so I’m still skeptical.

I’ve probably said enough at this point - I think my reservations are clear. I don’t want to drown out other people’s views, so I’ll leave it at that.

mwichmann · July 27, 2022, 3:59pm

Just to point out, it’s far from unprecedented if you look at the Linux
packaging space where many distros have an application that manages the
local package installation and database and an application that manages
external indexes and repositories (rpm vs yum, dnf, zypper; dpkg vs apt,
etc.). But you could also argue pip has already crossed over into the
“working with external repositories” space, so the parallels are sketchy…

(post-edit: grr, I think I ran into multiple Discourse warts here, I went for the email reply because it wouldn’t let me quote sensibly in the web interface, then the email seems to have stripped the quoted bit entirely, this was to the way-back-in-the thread comment from @pf_moore As a separate pip audit subcommand, I’m not convinced it fits within the scope of pip. I see pip more as “managing packages on my machine” and this is more about “querying information from a package index”. It’s a fine distinction, and one that’s far from hard and fast.)