Supplying vulnerability-based constraints to the resolution process

fridex · January 26, 2023, 10:59pm

Hi all,

here is a small prototype that can resolve application dependencies respecting vulnerabilities. By default, the tool tries to produce a lock file with all the packages resolved to versions without vulnerabilities, if possible. Users can optionally supply a listing of acceptable vulnerabilities in the application dependencies. OSV is used as a source for vulnerabilities.
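To make the idea concrete, here is a minimal sketch of vulnerability-aware version selection for a single package. It is not the prototype's actual code: the `pick_version` helper, the advisory ids, and the simplified dotted-integer version parsing are all assumptions made for illustration. A real tool would take its data from OSV and run a full resolver across all dependencies.

```python
# Hypothetical sketch: choose the newest release of a package that has no
# vulnerabilities other than the ones the user explicitly accepted.


def parse_version(text):
    """Simplified parsing: only plain dotted integers such as '2.10.1'."""
    return tuple(int(part) for part in text.split("."))


def pick_version(candidates, advisories, accepted=frozenset()):
    """Return the newest candidate whose advisories are all accepted.

    candidates: iterable of version strings available for one package
    advisories: dict mapping version string -> set of advisory ids
    accepted:   advisory ids the user is willing to tolerate
    """
    for version in sorted(candidates, key=parse_version, reverse=True):
        unaccepted = set(advisories.get(version, ())) - set(accepted)
        if not unaccepted:
            return version
    return None  # every candidate carries an unaccepted vulnerability


# Made-up example data, not real advisories.
candidates = ["1.0.0", "1.1.0", "1.2.0"]
advisories = {"1.2.0": {"EXAMPLE-0001"}, "1.1.0": {"EXAMPLE-0002"}}

default_choice = pick_version(candidates, advisories)
relaxed_choice = pick_version(candidates, advisories, accepted={"EXAMPLE-0001"})
print(default_choice, relaxed_choice)
```

With no accepted vulnerabilities the sketch falls back to the oldest, clean release; accepting one advisory lets it pick the newest release again.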

The tool is just a prototype; it uses the backtracking resolver as implemented by pip-tools to produce the lock file. There is no intention to introduce yet another tool to manage application dependencies, but as there are efforts to offer vulnerability data through the simple API, what would the community's vision be in this area?

Thanks and have a great day,
Fridolin

dustin · January 30, 2023, 6:00pm

Hey, thanks for sharing @fridex, this looks really interesting.

Personally, I’d love to see installers like pip be able to take vulnerability data into account during resolution and installation. That’s the main goal of that PEP, and so I’d be curious about your thoughts on the current draft and whether it’d suit your expected use case.

Some previous discussion about this is here as well: https://discuss.python.org/t/towards-a-pip-audit-subcommand-for-vulnerability-analysis-management/17681

pombredanne · January 30, 2023, 7:26pm

@fridex Thanks! Let me throw in a few related goodies in the same direction and that you may fancy checking out:

a paper on Non Vulnerable Dependency Resolution by @TG1999
a dependency resolver CLI and lib named python-inspector (GitHub - nexB/python-inspector: Inspect Python code and PyPI package manifests. Resolve Python dependencies.) using resolvelib like pip
a branch that implements Non Vulnerable Dependency Resolution in this tool GitHub - nexB/python-inspector at nvdr blending functional and vulnerable range constraints.

This is using vulnerablecode as a data source rather than OSV, and this also aggregates OSV (which also uses some of the vulnerablecode library for version parsing).

PS: Glad to see that you use pip-requirements-parser in pipctl!

fridex · January 30, 2023, 7:35pm

+1, IMHO vulnerability data are very valuable information for users - users should be aware of possible security implications of software they are consuming.

When @dstufft and I discussed the approach mentioned in this topic, Donald had a nice idea: first just print warnings for users. That could be a very nice starting point for pip: first, provide vulnerability information to users, and then, if they want to avoid vulnerabilities, they can do so by turning on a resolution process that considers vulnerabilities.

Thank you. This topic was also raised with the Thoth cloud-based resolver in Thoth - an enhanced server-side resolution offered to the Python community. Having this standardised could be nice.

pombredanne · January 30, 2023, 7:43pm

@fridex after diving a bit more into your code, I see you are using pip-tools, meaning that you are ultimately dependent on the version of pip that’s installed and are likely restricting resolution to requirements for the current Python runtime, OS and version. Is this correct?

Also if you are running the resolver through pip-tools, is this actually building wheels through pip?

dustin · January 30, 2023, 9:43pm

FWIW, I’m personally not too sure this would be a great UX – a similar change in the node.js ecosystem is largely regarded as a mistake: npm audit: Broken by Design — overreacted

pf_moore · January 30, 2023, 10:42pm

Speaking as a pip maintainer I agree, this seems like a very bad idea. If a vulnerability is bad enough that we want to prevent users from installing it (or even to warn users that they shouldn’t install it - after all, most users likely don’t have the knowledge to do a reasonable security review themselves), then we should be deleting the release from PyPI. Fix the issue at the root. At the very least, we should have a separate index where we move artifacts with vulnerabilities, so that to opt into vulnerable libraries, users must add pypi-vulnerable to their list of indexes.

Obviously, that’s absurd. But no more so than preventing installation of libraries with vulnerabilities at the front end.

And “just” warning has all of the problems that the linked article mentions. As has been mentioned in a number of contexts, a significant proportion of Python users aren’t programmers, and don’t have the knowledge to reasonably evaluate vulnerability warnings (especially in the face of things like the “Regex DOS” abuses of the vulnerability reporting process that are going on at the moment).

By all means let’s have a means of doing a vulnerability audit, so that people with the right sort of expertise can sensibly review the risks for a project. But please, don’t make it part of the everyday install command.

dustin · January 30, 2023, 11:28pm

To be clear, I mean the “warn by default on install” pattern is what’s potentially problematic. The original point about being able to take vulnerabilities into account at install-time is still desirable, IMO.

pf_moore · January 30, 2023, 11:46pm

I remain unconvinced. Why not provide a view on PyPI that omits packages with vulnerabilities? That way we don’t lock people needing this functionality into pip. PDM, Poetry and Hatch do installs without exposing pip’s UI, for example.

Or if PyPI doesn’t want to support this centrally, users or organisations who want it can set up a proxying index which filters out vulnerable packages.

mwichmann · January 31, 2023, 12:31am

This loops around to something I poked at before, in the context of the
survey: there may be survey data that suggests that people frequently
use PyPI to find packages, but I remain convinced (*) that many (most?)
“unsophisticated” users install packages because instructions they were
given said to do “pip install foo”, so that’s what they do. A PyPI
filtered view (default or not) is unlikely to do anything for this
audience. If the supposed vulnerable package is a dependency pulled in
by the thing they asked to install, it’s even more indirect.

(*) I see this on beginner lists/forums all the time. The instructions
often aren’t precise enough and then someone wonders why the thing they
installed can’t be imported (wrong Python/pip/paths, all the normal
stuff): the bad instructions are their own problem, but they anecdotally
show how often people install stuff because they were just told to.

pf_moore · January 31, 2023, 1:12am

Such users would not be equipped to understand a vulnerability warning anyway. If we really want to stop people installing packages with reported vulnerabilities (I don’t, personally, but the tone here seems to suggest others do) then we need to remove those packages from PyPI.

Expecting pip maintainers to deal with issue reports saying “I want to install x, which is on PyPI, but pip refuses to install it” isn’t reasonable IMO.

Also, a restriction in pip is easily bypassed, deliberately or accidentally, by downloading wheels and installing offline. Pip’s wheel cache would have this effect even for the default pip install.

steve.dower · January 31, 2023, 11:20am

I’m strongly against PyPI or pip-when-using-PyPI doing any blocking based on vulnerability reports.

My context is that I am frequently dealing with fallout from automated tools that mark an entire package as “critically vulnerable - remove immediately” for something that, on even basic analysis, is not at all vulnerable (most recently, an app at work that embeds a copy of Python for its private non-network use was flagged because of potential regex DoS in IDNA decoding… and while we can say that it’s not impacted, customers also get the flags and are stuck following the tool rather than us).

Package owners on PyPI can yank or delete their own packages if there is something wrong. It might be a nice service for PyPI to email maintainers when a report arises, because sometimes ~~borderline malicious~~well-meaning researchers will request a CVE directly from MITRE without involving the project. But PyPI is the open repository - it should not block access to legitimate packages later found to have bugs.^[1]

You could run a profitable business providing a proxied PyPI that filters out vulnerable packages. There’s a clear opening in the market right now, given that all the existing businesses that do this don’t allow using pip directly against their repositories. (I know a while back the PSF was considering doing this for their own supplemental revenue stream…) Likewise, you could distribute a constraints file to exclude vulnerable versions for those who want to use it. We have the ability to do all this already without messing with the central host or the default tools.
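As a rough illustration of the constraints-file route, the sketch below turns a list of known-vulnerable releases into constraint lines that exclude exactly those versions. The `build_constraints` helper and the package names are hypothetical; where the list of vulnerable versions comes from is left open.

```python
# Hypothetical sketch: generate a pip constraints file that excludes
# specific releases, leaving every other version installable.


def build_constraints(vulnerable):
    """vulnerable: dict mapping package name -> iterable of bad versions."""
    lines = []
    for name in sorted(vulnerable):
        # "!=1.0.0,!=1.1.0" is a comma-separated set of version exclusions.
        excluded = ",".join(f"!={version}" for version in sorted(vulnerable[name]))
        if excluded:
            lines.append(f"{name}{excluded}")
    return "\n".join(lines) + "\n"


# Made-up package names and versions.
constraints_text = build_constraints(
    {"example-lib": ["1.1.0", "1.0.0"], "other-lib": ["0.3"]}
)
print(constraints_text, end="")
```

Written to a file such as `constraints.txt`, the output can be passed to pip with `-c constraints.txt` alongside the usual requirements, so nothing changes for users who do not opt in.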

[1] Obvious exceptions here for the ones we regularly take down already for having no functionality. I’m also okay with blanket bans on things in sdist build files (setup.py et al), but that’s way off topic right now.

fridex · January 31, 2023, 12:22pm

Yes, the resolution depends on the environment in which it was done. When resolving Python dependencies, the resolved set of dependencies might differ depending on the environment - the resolution needs to find suitable wheels (respecting their tags) and respect environment markers.
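A small sketch of why the result is environment-specific: the same set of published wheels yields different usable files depending on which tags the target environment supports. The file names and the two hand-written tag sets are made up for illustration; a real installer computes the supported tags for the running interpreter and also evaluates environment markers, which this sketch ignores.

```python
# Hypothetical sketch: filter wheels by the compatibility tags encoded in
# their file names (name-version-python-abi-platform.whl).
from itertools import product


def wheel_tags(filename):
    """Expand the (python, abi, platform) tag triples of a wheel filename."""
    stem = filename[: -len(".whl")]
    python, abi, platform = stem.split("-")[-3:]
    # Each component may hold several dot-separated tags, e.g. "py2.py3".
    return set(product(python.split("."), abi.split("."), platform.split(".")))


def usable_wheels(filenames, supported):
    """Keep the wheels sharing at least one tag triple with the environment."""
    return [name for name in filenames if wheel_tags(name) & supported]


wheels = [
    "example-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",
    "example-1.0-cp311-cp311-win_amd64.whl",
    "example-1.0-py3-none-any.whl",
]
# Hand-written stand-ins for what two different environments would accept.
linux_env = {("cp311", "cp311", "manylinux_2_17_x86_64"), ("py3", "none", "any")}
windows_env = {("cp311", "cp311", "win_amd64"), ("py3", "none", "any")}

linux_wheels = usable_wheels(wheels, linux_env)
windows_wheels = usable_wheels(wheels, windows_env)
print(linux_wheels)
print(windows_wheels)
```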

If the question is about installing source distributions: to my knowledge, pip-tools will consider source distributions during the resolution, and pip builds wheels from them when they are installed (but I’m not 100% sure about this behaviour and its details).

We might want to distinguish here between a vulnerability and an exploitable vulnerability. The proposal is more about providing vulnerability information to users and providing a way to deal with vulnerabilities. It is up to the users whether they consider this information, whether they flag a vulnerability as exploitable, and whether they want to take action to remediate such exploitable vulnerabilities: that depends heavily on the environment where the application is running or, for example, on its configuration. Flagging vulnerabilities as exploitable is not something PyPI should do. I also agree with Paul: this is not something everyone who uses pip would do or have the knowledge to do. Nevertheless, there could be a use case for it, so if the Python community finds this interesting, it might be more about finding sane defaults for such a solution (such as providing optional flags and keeping the current behaviour as is).

I agree, it is a very interesting area. Maybe Python upstream could consider providing a standardised way to deal with vulnerabilities (and their exploitability) and let industry flag potentially exploitable vulnerabilities. Also, there is a non-zero investment in providing and maintaining such standardised solutions to consider.

If other ecosystems implemented it in the wrong way, the Python ecosystem can learn from that and provide a better UX. Also, npm still provides these features.

pf_moore · January 31, 2023, 1:24pm

IMO, the “sane default” is to make vulnerability reporting a separate tool that can be used by the people who know how to use it and understand what they are doing. Anything that isn’t opt-in is pretty much guaranteed in my experience to give people a negative view of the subject.

And I don’t believe that the pip maintainers are, or should be expected to become, sufficiently expert in the matter of vulnerability analysis to deliver or maintain an appropriate solution here. Which is why I don’t think this should be part of pip. It should be handled by specialists writing a tool for an audience they understand and can target appropriately.

fungi · January 31, 2023, 1:35pm

With my long-time open source vulnerability coordinator hat squarely
on, I’ll just say that in my opinion (and the opinions of many of my
colleagues) there’s no such thing as an “unexploitable
vulnerability.” It may be a bug worth fixing, but if it can’t be
exploited then it’s not a vulnerability. We already all deal with
far too many alarmist reports from so-called security researchers
who simply ran a static analyzer or fuzzer against our software and
shopped the results around hoping to make a quick buck off a bounty
or to make a name for themselves.

Talk about security vulnerabilities vs other security-related bugs
and hardening opportunities if you must, but if a bug can’t be
exploited then how is it a security vulnerability?

steve.dower · January 31, 2023, 5:36pm

Yes, except we can’t. That can only be determined in the context of where and how the code is being used.

Our vulnerability reporting system we use at work is notify-only for most issues (anything blatantly stealing credentials or doing damage is a different case - I’m talking about “potential DoS” type CVEs). The engineers consuming them have to acknowledge and respond to the notification.

I don’t think we would, but if we were to integrate this data source into our system, we’d want a command to run automatically during builds to scan all the dependencies and send a report into our issue tracking system. This is most appropriate as a separate, information-only tool, and not one that interferes with actual use.
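A minimal sketch of what such an information-only step might look like, assuming the advisory data has already been fetched from somewhere into a plain dictionary. The `installed_packages` and `audit` helpers and the advisory ids are hypothetical; forwarding the report to an issue tracker is left out.

```python
# Hypothetical sketch: list advisories that match installed packages and
# report them, without blocking or changing the installation in any way.
from importlib.metadata import distributions


def installed_packages():
    """Map lower-cased project name -> version for the current environment."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name.lower()] = dist.version
    return found


def audit(packages, advisories):
    """Return one report line per (package, version, advisory) match.

    packages:   dict mapping project name -> installed version
    advisories: dict mapping (name, version) -> set of advisory ids
    """
    report = []
    for name, version in sorted(packages.items()):
        for advisory_id in sorted(advisories.get((name, version), ())):
            report.append(f"{name} {version}: {advisory_id}")
    return report


# Made-up data; in a build step `packages` would be installed_packages().
packages = {"example-lib": "1.1.0", "other-lib": "0.4"}
advisories = {("example-lib", "1.1.0"): {"EXAMPLE-0002"}}

report = audit(packages, advisories)
for line in report:
    print(line)
```

Because the function only returns lines, the build step that calls it can always exit successfully and leave the decision to the engineers who receive the report.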

This is what MITRE does, and does quite well. There’s nothing for “Python upstream” to do here - we already work with MITRE to provide notifications via their CVE database (and I believe one of the responsibilities of the security developer in residence the PSF is hiring will be to help manage this, potentially even beyond the scope of the core runtime).

bunny-therapist · June 20, 2023, 11:48am

This is quite similar to a package I made last year: security-constraints · PyPI
My package does not take a requirements file as input, it creates a constraints file which can be given to pip (or pip-compile, through a requirements file) with the “-c” option.