Towards a `pip audit` subcommand for vulnerability analysis & management

dustin · July 27, 2022, 4:23pm

I think there’s a bit of a chicken/egg problem here: if pip is fundamentally opposed to adding new features and subcommands, and doesn’t have a path towards it, users aren’t going to ask for it. I also think you’re right that most security-related features don’t get explicitly requested by end users.

That said, there are a few issues that I think are related, if you squint at them:

Provide audit hook events during package installation · Issue #8938 · pypa/pip · GitHub asks about creating hooks to do some similar things that pip-audit does in tools external to pip
A bug database and a bug signalling method in pip or withdrawing buggy versions from repos · Issue #8315 · pypa/pip · GitHub is about “bugs” but I think they’re really talking about vulnerabilities
Feature request: Automatically uninstall malicious packages taken down from PyPI · Issue #5777 · pypa/pip · GitHub is about taking some action on malicious projects (disclaimer, I requested this, but long before we started work on pip-audit.)

pradyunsg · July 27, 2022, 5:51pm

We have this unblocked now, FWIW – pip inspect and pip install --report should provide all the information that folks could want out of pip’s resolution or environment inspection logic.
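As a rough illustration of consuming that output, here is a minimal sketch that reads a `pip inspect`-style JSON report and maps each installed distribution to its version. The sample report is a trimmed-down stand-in written for this example (real reports carry many more fields), and `installed_versions` is a hypothetical helper name, not something pip provides.

```python
import json

# Trimmed-down stand-in for `pip inspect` output, written for this example.
SAMPLE_REPORT = json.dumps({
    "version": "1",
    "pip_version": "22.2",
    "installed": [
        {"metadata": {"name": "Django", "version": "3.2"}, "requested": True},
        {"metadata": {"name": "sqlparse", "version": "0.4.2"}, "requested": False},
    ],
})


def installed_versions(report_text):
    """Map each installed distribution name to its version (hypothetical helper)."""
    report = json.loads(report_text)
    return {
        item["metadata"]["name"]: item["metadata"]["version"]
        for item in report.get("installed", [])
    }


print(installed_versions(SAMPLE_REPORT))
```

In a real environment the report text would come from running `python -m pip inspect` (for example through `subprocess.run(..., capture_output=True, text=True)`) rather than from a hard-coded sample.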

pf_moore · July 27, 2022, 5:53pm

Thanks for the links. One comment I note from the 3rd issue (warning: this is taken from a different context, please don’t assume it’s directly related to this thread without understanding the context!)

Given there’s a server side and a client side here, and IMO we should have this standardized, we probably should at a minimum discuss this on distutils-sig, if not produce a PEP for it.

I think this is a good point, and as has been mentioned here already we should probably start here by standardising the API for getting the vulnerability data. There are a lot of other good points in that discussion around mirrors, and who can give authoritative answers on what files have vulnerabilities, which we should probably consider too.

For example, I note that the PyPI JSON API has “vulnerabilities” as a key at the project level. Surely vulnerabilities should be flagged at release level at a minimum, if not at file level? Consider the example from that 3rd pip issue - if a company mirrors PyPI on its local index, and adds a wheel for a project that fixes a known vulnerability that upstream hasn’t dealt with yet - surely an audit should report that the project is safe if the build uses that local wheel, but not if it uses the upstream code?

That’s the sort of issue we should be discussing and resolving in a PEP for exposing vulnerability data.

pradyunsg · July 27, 2022, 6:01pm

And, to the broader points being made:

I don’t think pip should reject all incoming feature requests, but there’s certainly a balancing act here. I think that a pip audit command that’s pulling information from the package index is definitely reasonable to include.
I’m wary of including complete vulnerability data in the “regular” index responses, and would prefer to put them separately (or, keeping them as a separate file per project that can be requested or something like the marker in the METADATA serving PEP). It’s information that installers don’t need to use typically, and response sizes were a consideration during the PEP 691 discussion and I expect that’d be true even when the proposal for this comes through. We can discuss the details when the PEP is proposed, but I do think having this information come from the same index or a different source are both reasonable.
I think the pip-audit maintainers have done a fairly great job of doing things in an “in-contract” manner, when using pip and interacting with it. I want to explicitly note that I appreciate that.
I don’t think we’re at a point where pip can/should discuss adding APIs or plugins. I think we should have plugins, but I think having that discussion now would derail the topic at hand, through the nature of scope-creep like that. It’s a good idea, and we should talk about it – separately from this discussion. It’s a strict superset.

dustin · July 27, 2022, 6:51pm

Still not an importable API as far as I’m aware though, right?

I have zero intent of making pip use non-standardized APIs, but starting work here only makes sense if we have some buy-in that it will be used first.

I’m not following you here, the legacy JSON API has vulnerabilities at the release level, e.g. https://pypi.org/pypi/django/3.2/json has a ‘vulnerabilities’ field that contains all vulnerabilities present in that release. Practically, recording vulnerabilities at the file level is probably too fine-grained for most use cases and wouldn’t be necessary for 99% of vulnerability reports that currently exist. (Not that there isn’t an edge case where this could be necessary, it’s just unlikely.)
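To make the release-level lookup concrete, here is a minimal sketch of reading that field. The URL pattern and the top-level ‘vulnerabilities’ key are the ones mentioned above. The per-entry keys used in the sample (`id`, `fixed_in`) and the sample payload itself are assumptions made up for illustration, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_release(project, version):
    """Fetch the legacy JSON API document for one release (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urlopen(url) as response:
        return json.load(response)


def vulnerability_ids(release_doc):
    """Return the IDs listed under the release's top-level 'vulnerabilities' key."""
    return [entry.get("id") for entry in release_doc.get("vulnerabilities", [])]


# Made-up payload in the assumed shape, so the sketch runs offline.
sample = {
    "info": {"name": "example", "version": "1.0"},
    "vulnerabilities": [
        {"id": "EXAMPLE-0001", "fixed_in": ["1.1"]},
        {"id": "EXAMPLE-0002", "fixed_in": ["1.2"]},
    ],
}
print(vulnerability_ids(sample))
```

With network access, `vulnerability_ids(fetch_release("django", "3.2"))` would run the same lookup against the live endpoint.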

I appreciate the thoughts here but I think we can table this until we have a draft PEP.

Thanks!

sumanah · August 1, 2022, 3:00pm

@dustin Thanks for starting this conversation.

It sounds like part of what we’re figuring out is: what do users want/expect when they’re working with pip, and how much additional friction would it cause for them if they have to invoke the audit command one way versus another way?

Additionally: beyond maintainer capacity, what should our criteria be for including particular commands and not others within pip? Do we need to support consistency in the user’s mental model of “this is the kind of thing one uses pip for”, and if so, what are our users’ mental models about that?

These are user experience research questions. Has the pip audit team done UX research work you could point to that would shed light on these questions? Or is there perhaps research from the 2020 effort that could help guide us here?

woodruffw · August 7, 2022, 4:51am

This isn’t a totalizing reason, but one good reason that I can think of (as one of the primary pip-audit developers): users often have remarkably messy local development environments. That means multiple copies of Python, multiple copies of pip, virtual environment and pipx and pyenv wrappers, etc.

When a user does pip install pip-audit at the moment, they’re effectively communicating to us that they want pip-audit to be in the same environment as the surrounding pip, and that pip-audit’s functionality is possibly limited to however old the surrounding pip is (since we use pip-api and some custom shelling out of our own). We have workarounds for this, but (IMO) the developer experience of pip audit is much cleaner than pip install pip-audit && pip-audit. Essentially, fewer impedance mismatches to worry about.

And to second: my team at my company (Trail of Bits) is more than happy to perform maintenance work on pip, both within the context of pip-audit and in a broader sense!

I’m definitely biased, but I’ll say this as a user of other package management ecosystems: I more or less expect some amount of auditing functionality in my package installer. npm audit is probably the most widely used example (and has plenty of flaws, as described upthread), but I also regularly use cargo audit. That’s a funny case since it’s technically a third-party command, but it has a major Rust WG behind it (RustSec) and doesn’t have the same DX problems as a separate pip-audit command since cargo has allowed third-party commands from the beginning (not that I think pip should!)

The point about additional friction is a great one: I don’t want a prospective pip audit subcommand to suffer the same security fatigue fate that npm audit does. My first blush idea for integration was to have pip audit be 100% explicit, at least to begin with – users should have to explicitly invoke it, rather than it being a side effect of a pip install ... invocation. That might make sense to change over time, but I think that would be minimally disruptive for an “MVP” while also getting auditing functionality into as many developers’ hands as possible.

Again showing my bias, but I think of pip as a “package management system.” It already has functionality that’s strictly outside of package installation, but all current functionality (AFAIK) has something to do with querying, managing, or checking the state of Python packages and package distributions.

From there, I think pip-audit would qualify for subcommand inclusion on the basis that (1) it operates entirely on the same objects and state as pip ordinarily does (i.e., it does not broaden the scope of things pip concerns itself with, even though it adds new code), and (2) it exposes functionality that exists in other package management ecosystems, like npm and cargo.

I recognize, however, that “other package tools do it!” is not necessarily a precedent that the pip maintainers wish to establish. But I think the combination of prior art and the limited domain of interest make pip-audit a reasonable candidate in particular for inclusion.

woodruffw · August 7, 2022, 4:59am

Forgot to say: I’m +1 on whatever development model is easiest and involves the least friction for pip’s maintainers: if that means embedding pip-audit’s source code and moving development to the pip repo rather than vendoring, then I’m all for that!

kpfleming · August 7, 2022, 11:02am

I for one am 1000% on-board with using pipx for managing outside-a-dev-environment Python tools. I used to use pyenv for this but it became too cumbersome, and now that I’m using pipx life is so much better.

In fact I think this model is much cleaner, as it means that when I run hatch or pre-commit or whatever they aren’t affected by the version of Python I’ve chosen for the dev environment I’m working in.

pf_moore · August 7, 2022, 11:38am

And for tools that need to run in the environment they are managing, that’s easy enough - you can just add yourself (and any dependencies you need) to sys.path and then reinvoke yourself using runpy. Look at pip’s code to use itself when setting up an isolated build environment (also used in the new --python argument that will be in the next release) if you need to see how to do that.
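A minimal sketch of the sys.path plus runpy step, as it would run inside the target environment's interpreter. This is not pip's actual code: `run_tool_from` and `demo_tool` are hypothetical names, and the temporary directory stands in for wherever the tool really lives.

```python
import runpy
import sys
import tempfile
from pathlib import Path


def run_tool_from(tool_dir, module_name):
    """Make tool_dir importable, then execute module_name as if it were __main__."""
    sys.path.insert(0, str(tool_dir))
    try:
        # run_module returns the globals of the executed module.
        return runpy.run_module(module_name, run_name="__main__")
    finally:
        sys.path.remove(str(tool_dir))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a tool's entry-point module.
    Path(tmp, "demo_tool.py").write_text("RESULT = 'ran as ' + __name__\n")
    namespace = run_tool_from(tmp, "demo_tool")

print(namespace["RESULT"])
```

A real tool would first start the target environment's Python as a subprocess and run something like this there, so that the code executes against that environment's installed packages.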

woodruffw · August 8, 2022, 2:53pm

I like pipx too! I think pip-audit doesn’t make as much sense as an “outside-of-dev-environment” command, however – its default behavior (without any flags) is to audit the current environment, which makes the most sense when the user is in an active virtual environment. pip-audit also uses whichever version of pip is active and first on the $PATH and feature-tests it (via pip-api), which in turn means that we generally want to take advantage of a user’s more updated pip (when they’re in a virtual environment).

Yep – I don’t think it’s difficult, per se, just that the user/developer experience there has more friction/ambiguity than it needs to. pip audit would IMO convey intention in a way that pip-audit doesn’t on its own.

pf_moore · August 8, 2022, 3:34pm

I’m not sure I agree. It depends how you design the invocation of pip-audit. If you were to call it auditenv and give it an --environment flag, which defaulted to the currently active environment, auditenv --environment .venv seems to pretty clearly convey the intention to me.
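As a sketch of what that command line could look like: `auditenv` and `--environment` are the hypothetical names from this post, and using `sys.prefix` as “the currently active environment” is an assumption made for the example.

```python
import argparse
import sys


def build_parser():
    """Build the argument parser for the hypothetical `auditenv` command."""
    parser = argparse.ArgumentParser(prog="auditenv")
    parser.add_argument(
        "--environment",
        default=sys.prefix,  # assumption: the running interpreter's environment
        help="environment to audit (default: the one running this interpreter)",
    )
    return parser


# Explicit target, as in `auditenv --environment .venv`.
args = build_parser().parse_args(["--environment", ".venv"])
print(args.environment)

# No flag given: fall back to the environment running the interpreter.
default_args = build_parser().parse_args([])
print(default_args.environment)
```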

Yes, you named it pip-audit and designed it the way you did because you imagined it being integrated with pip. But what I’m saying is that you’re not locked into that design, if you want something that conveys intention well.

By the way, we’re developing features in pip that mean someone could have a single central pip zipapp, and use the --python flag to pick the environment. In that case, I’d argue that pip --python .venv audit expresses the intention much less clearly than auditenv --environment .venv. And even as things are now, I’m not that sure that .venv\Scripts\python.exe -m pip audit is particularly clear, either.

uranusjr · August 16, 2022, 8:01am

Stepping back from whether and how pip should accept new features, I want to figure out how the tool would fit into the user workflow. We’ve established that doing auditing during pip install is not a good idea, and from the API pip-audit provides, it seems like auditing should be done against either an environment containing installations, or a specification of such (e.g. requirements.txt), as a separate, explicit step. In this scenario, the difference that including the tool inside pip would make, from what I can tell, is that the user has one less tool to install, but with the community moving more to CI and tools like Nox and pre-commit, I feel like that’s an increasingly minor concern.

It seems that the only kind of people that would actively benefit from the inclusion into pip (either as a hooked subcommand, or direct vendor) would be those that don’t use any of the above task-running tools, and also know little enough to not know to pip install pip-audit on their own and run a command (pip-audit) that is not a “standard Python tool” (in quotes)—and honestly, I’m not sure telling them to run pip audit blindly is a good idea to begin with, for the same reason why auditing during pip install is not actively pursued. So the conclusion I’m reaching personally is that including the command in pip does not really provide any significant benefit.

This leaves the technical side of things: pip-audit does currently rely on pip functionalities, and vendoring it in pip would help mitigate a lot of the maintenance burden. But as mentioned previously in the thread, we could also achieve the same goal by splitting out pip functionalities into a new library, and expose better/new functionalities with the existing API. This is probably the direction I would prefer to take.

dustin · August 16, 2022, 10:08am

I think doing this by default (like npm) is not a good idea, but providing a way to enable this or set a policy that enables this is a good idea.

pf_moore · August 16, 2022, 10:15am

I don’t think that’s been established at this point. I’m not even clear what “audit on install” would look like? Would the install fail if vulnerabilities were found? What if the vulnerability was in a build-dependency of something that was being built as a deeply nested dependency of the project being installed? How would this be reported in a meaningful way, and what would the user’s options be? And what if the user had confirmed that the vulnerability didn’t affect their specific use case - how would they override the audit?

Maybe opt-in auditing on install is useful in some scenarios, but I don’t think it’s an unqualified “good idea” in the absence of a well thought out UI.
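To make the “fail or not, and how to override” questions concrete, here is one possible shape for an opt-in policy. Everything here is hypothetical (`AuditPolicy`, its fields, and `should_abort` are illustration only, not a proposed pip interface), and it deliberately leaves the reporting and build-dependency questions unanswered.

```python
from dataclasses import dataclass, field


@dataclass
class AuditPolicy:
    """Hypothetical opt-in policy for auditing during an install."""

    error_on_vulnerability: bool = False  # off by default: auditing is opt-in
    ignored_ids: set = field(default_factory=set)  # findings the user has reviewed


def should_abort(policy, found_ids):
    """Return True when the install should fail under this policy."""
    if not policy.error_on_vulnerability:
        return False
    # Abort only on findings the user has not explicitly overridden.
    return any(vuln_id not in policy.ignored_ids for vuln_id in found_ids)


strict = AuditPolicy(error_on_vulnerability=True)
reviewed = AuditPolicy(error_on_vulnerability=True, ignored_ids={"EXAMPLE-0001"})
print(should_abort(strict, ["EXAMPLE-0001"]))
print(should_abort(reviewed, ["EXAMPLE-0001"]))
```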

uranusjr · August 16, 2022, 10:46am

Right, and in theory I think having something during installation would be a good idea, since that’s the only chance we can provide this information for a lot of people. But the correct interface for this would be more difficult, and as long as we don’t figure it out, we won’t need to bundle the feature into pip, and thus the conclusion doesn’t change.

I guess perhaps this indicates we should keep the tools separated for now, and do the following instead?

Find a way to advertise pip-audit so people start using it like flake8, mypy, etc.
Communicate what pip-audit needs so pip can provide more streamlined API (either CLI or potentially a “private” Python interface?)
Figure out how we can better design on-install auditing and eventually incorporate auditing feature into pip install

EWDurbin · August 16, 2022, 2:43pm

I am in support of functionality in pip itself to be able to expose/raise when packages being installed from an index have known vulnerabilities. Similar to --require-hashes, I’d love to opt-in to --error-on-vulnerability or something.

However, I’m also now wondering if exposing that information via a separate API from the simple index is optimal. If PyPI knows about a vulnerability on a release, what’s keeping us from adding that as part of the Simple API spec via an attribute like requires-python? This can then trivially be part of PEP 691 as well.

I prefer to think of pip as primarily a client of the Simple API, thus if PyPI wants to communicate something to pip, we should do it via that specified mechanism. I recognize that pip is also a consumer of our “legacy” JSON APIs where the vulnerability reports are already published, but if we want something established via specification that is not the right place.

Given that, if the intention is for this functionality to be a primary part of pip, I’d propose that support for reporting known vulnerability information is added to the relevant Simple API PEPs and pip grows support to consume it.
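Purely as an illustration of what consuming that could look like, here is a sketch against a PEP 691-style JSON response. The `vulnerabilities` key on each file is hypothetical (no accepted PEP defines it), the file names, URLs, and IDs are made up, and `flagged_files` is a hypothetical helper. The remaining keys follow the PEP 691 layout.

```python
import json

# PEP 691-style project response. The "vulnerabilities" key on each file is
# HYPOTHETICAL; the other keys follow the PEP 691 JSON layout.
SAMPLE = json.dumps({
    "meta": {"api-version": "1.0"},
    "name": "example",
    "files": [
        {
            "filename": "example-1.0-py3-none-any.whl",
            "url": "https://files.example.invalid/example-1.0-py3-none-any.whl",
            "hashes": {},
            "requires-python": ">=3.7",
            "vulnerabilities": ["EXAMPLE-0001"],
        },
        {
            "filename": "example-1.1-py3-none-any.whl",
            "url": "https://files.example.invalid/example-1.1-py3-none-any.whl",
            "hashes": {},
            "requires-python": ">=3.7",
            "vulnerabilities": [],
        },
    ],
})


def flagged_files(response_text):
    """Map each filename with at least one reported vulnerability to its IDs."""
    data = json.loads(response_text)
    return {
        entry["filename"]: entry["vulnerabilities"]
        for entry in data["files"]
        if entry.get("vulnerabilities")
    }


print(flagged_files(SAMPLE))
```

Keying the data per file, as in this sketch, is one way a local mirror could serve a patched wheel without it being flagged; whether to record it per file or per release is exactly the kind of detail a PEP would need to settle.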

dustin · August 16, 2022, 3:02pm

Yep! This is the “new PEP to describe additional vulnerability fields for PEP 691” item in the roadmap.