Addressing the dependency version confusion exploit chain

For comparative production readiness with e.g. DNF and APT, Python packaging could gain: per-manifest-file hashes for wheels and eggs (formats that don't execute setup.py as the package-manager user), so that something like debsums or rpm -Va becomes possible; a database of installed packages and their global/user/conda_root/venv/condaenv install paths; and a way to detect package install path collisions.

Is it not the case that, to exploit this, the version number can be exactly the same as the actual version number, because once you've specified an additional --index-url/--extra-index-url you've trusted that source with equal priority to the default index server?

Is pip sufficient when combined with e.g. sudo and umask, or find -exec chmod (edit: and semanage fcontext for SELinux extended attributes)? That's not portable, and it's not built into pip, a tool that's useful both for read/write development and for building one-file software releases.

Solutions for this, AFAICT:

  • add channel priorities (to the CLI, the config syntax, and pip._internal; see the sketch after this list)
  • add per-channel allow and deny package lists
  • require per-package release signing keyrings
    • Where do I specify which GPG keys are authorized to release a package (e.g. pkgx) on PyPI under my package's name, but not other packages such as pip?
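As an illustration of the first two bullets, here is a minimal sketch of what per-channel priorities with allow/deny lists could look like. Everything here is hypothetical: pip has no such configuration or resolution rule today.

```python
# Hypothetical sketch only: pip has no per-channel priority or allow/deny
# configuration today. Each index gets a priority plus allow/deny lists, and a
# package may only be resolved from the highest-priority index that permits it.
from dataclasses import dataclass, field


@dataclass
class Channel:
    url: str
    priority: int                              # lower number = higher priority
    allow: set = field(default_factory=set)    # empty set = allow everything
    deny: set = field(default_factory=set)


CHANNELS = [
    Channel("https://pypi.internal.example/simple", priority=0, allow={"ourlib"}),
    Channel("https://pypi.org/simple", priority=1, deny={"ourlib"}),
]


def index_for(package: str) -> str:
    """Return the only index URL allowed to provide `package`."""
    for channel in sorted(CHANNELS, key=lambda c: c.priority):
        if package in channel.deny:
            continue
        if channel.allow and package not in channel.allow:
            continue
        return channel.url
    raise LookupError(f"no configured index may provide {package!r}")


print(index_for("ourlib"))    # only the internal index may serve this name
print(index_for("requests"))  # falls through to pypi.org
```

With rules like these, an attacker uploading "ourlib" to pypi.org with a higher version number would never even be consulted for that name.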

I’m sorry to be so blunt, but you are talking nonsense.

Distro packages are not less likely to contain malicious code because they contain checksums and are signed. They contain less malicious code because they go through a lengthy review and validation process. Packages are scanned, verified, tested, and checked before they even reach a staging area. Multiple humans and multiple independent systems do manual and automatic verification of code. Users have to become proven and trusted packagers before they are able to get involved.

The process is even more rigorous and complex for enterprise distributions. To give you an example: RHEL 8 is based on Fedora 28. F28 was released on May 1, 2018; RHEL 8.0 on May 7, 2019. RHEL contains only a fraction of the Fedora packages, and it took a year to finalize QE. FIPS and CC validation took even longer or is still ongoing.

It would cost tens of millions of USD to apply a similar level of scrutiny to the top 1,000 packages on PyPI. It would also mean each release on PyPI would have to be delayed by days, weeks, perhaps even months.

PyPI and pip are an open ecosystem, with all the pros and cons of an open ecosystem. For the most part they work as designed and intended. It's by design that everybody can upload code to PyPI.


No, installing Python code with pip will not set the correct file permissions on the files. If your install instructions are just pip install pdbpp, that does not leave the user with a secure installation, because the user who installed the code is the same user who runs it and can overwrite it on disk: there are no file permissions or security contexts to prevent that from occurring when you install with pip.
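To make that concrete, here is a small check, assuming a POSIX system and that some pip-installed distribution (requests is used as a stand-in) is importable: the installed files are ordinary files owned by, and writable by, the same account that later runs them.

```python
# Sketch (POSIX): files written by `pip install` are ordinary files owned by the
# installing user, so any code running as that user can rewrite them in place.
# Assumes the "requests" package is installed; substitute any pip-installed name.
import importlib.util
import os
import stat

spec = importlib.util.find_spec("requests")
path = spec.origin  # e.g. .../site-packages/requests/__init__.py
st = os.stat(path)

print(path)
print("file owner uid:", st.st_uid, "  current uid:", os.getuid())
print("writable by owner:", bool(st.st_mode & stat.S_IWUSR))
```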

No, installing Python code with pip will not set SELinux context labels for a per-application interpreter.

Many OS package maintainers do just repack code from PyPI whenever they get around to it: they subscribe to mailing lists, watch GitHub repos for releases and security issues, and may have some special training in reviewing the diffs of package archives (or the diffs between git revisions; the code before the packaging build runs) for security issues. Packaging release lag is a concern: who is responsible for reviewing and repacking this in a timely manner? Is the conda-forge version up to date because there's a bot that sends PRs to the feedstock repo when it detects a new version on PyPI? Is there any additional static analysis, dynamic analysis, or mandatory review of each Python package or downstream package release? There could be additional release keys for each origin and downstream software artifact release channel: (channel uri, pkg uri, [valid_release_keys], date, signature)
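The release-channel record suggested at the end of that paragraph could be as small as the following sketch; the field names are illustrative, not an existing format.

```python
# Illustrative only: a per-channel release attestation record, one per artifact.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseRecord:
    channel_uri: str            # index or distro channel that (re)published the artifact
    pkg_uri: str                # e.g. "pypi.org/project/pip/22.0"
    valid_release_keys: tuple   # key fingerprints authorized to sign this package
    release_date: date
    signature: bytes            # signature over the artifact by one of those keys
```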

Is there a CPE identifier for the software contained in the package? What is the contact information for the downstream maintainers? FWIU, oss-fuzz provides free fuzzing for a few hundred projects and feeds data into OSV, which should also gain additional per-package-version vulnerability information from PyPI (where we can't yet do anything like dnf upgrade --security to install only security releases, or repack only security releases of Python packages for stable OS distros where only backports are admitted).

debsums and rpm -Va check that the hashes of the installed files match what was released in the signed DEB/RPM package: this only checks the integrity of the installed files. There's a separate check for the compressed package archive hash (which is also GPG-signed in competent package repos). Indeed, file integrity checks do nothing to ensure software quality.
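pip does already write per-file hashes into each distribution's RECORD, so a rough single-distribution analogue of debsums can be sketched with only the standard library. Note that this checks integrity against pip's own unsigned RECORD, not against a signed package database.

```python
# Rough debsums / rpm -Va analogue for one installed distribution: recompute the
# hash of each installed file and compare it to the hash pip recorded in RECORD.
import base64
import hashlib
from importlib import metadata


def verify(dist_name: str) -> list:
    """Return the recorded paths whose on-disk contents no longer match RECORD."""
    mismatches = []
    for f in metadata.distribution(dist_name).files or []:
        if f.hash is None:  # RECORD itself and *.pyc entries carry no hash
            continue
        try:
            digest = hashlib.new(f.hash.mode, f.read_binary()).digest()
        except FileNotFoundError:
            mismatches.append(f"{f} (missing)")
            continue
        recorded = base64.urlsafe_b64decode(f.hash.value + "=" * (-len(f.hash.value) % 4))
        if digest != recorded:
            mismatches.append(str(f))
    return mismatches


print(verify("pip") or "all recorded hashes match")
```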

Can this (OT) issue be solved without per-package release keyrings? I don’t think that we can detect or prevent “package shadowing” (pip install pdbpp shadows stdlib pdb, for example), or “package overriding” by configuring a different index-url (or MITM), or overriding by version number in an additional index without (1) signed per-package release keyrings; and (2) signed per-package-manifest-file hashes.

With respect to the current discussions, I'd like to ask (and probably should have stated in the original post) this.

Given we know that many orgs out there do not follow the best practices laid out in this thread, even if we announce them clearly and repeatedly:

What are some practical ways we can enhance their security?

I know that we are mostly volunteers, so we’re probably interested in things that have the highest impact for the least effort.

Things that I have seen pop up in this thread:

  1. Undocumented non-deterministic behaviour, e.g. the installation order being determined by the resolution time of the respective package locations

  2. Announce within the pip CLI when someone is trying to do something insecure, with clear signposting to documentation

  3. Make pip freeze and installation from the freeze file more ergonomic, e.g. as simple as a new command that dumps the pinned versions straight to a file called "safeversions.txt" or something equally obvious.

I don’t think that we can detect or prevent “package shadowing” (pip install pdbpp shadows stdlib pdb, for example), or “package overriding” by configuring a different index-url (or MITM), or overriding by version number in an additional index without (1) signed per-package release keyrings; and (2) signed per-package-manifest-file hashes.

Why is that? Is it because setup.py can do arbitrary things to the Python install that we can't easily detect beforehand?

Those organisations can invest in the Python Packaging ecosystem, so that the projects in the ecosystem have the required resources to do something about the issues at play. :grimacing:

That is NOT pip’s behaviour, but a description of how a different package manager does things.

Sure. We'd need to identify the cases where this occurs, perform code review on them, and figure out how to write clear documentation for the same. That, IMO, is a lot of work, especially because there's a lot of improvement to be made in Python Packaging documentation as a whole.

It feels like you are describing the pip-tools workflow, which… exists already and has some great tooling built for it too.


That’s fine. A misunderstanding on my part.

… pip-tools

Sure.

If people think there's no more low-hanging fruit, that's fine.

Because any package can just overwrite any installed file: there's no detection of install path collisions.
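What could be detected with the metadata pip already records is a collision between the file lists in installed distributions' RECORD files, as in the sketch below, although that still cannot catch a setup.py that writes wherever it likes.

```python
# Sketch: detect install-path collisions between installed distributions by
# comparing the file lists each one recorded in its RECORD at install time.
# Paths are as recorded, i.e. relative to each distribution's site-packages.
from collections import defaultdict
from importlib import metadata

owners = defaultdict(set)
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    for f in dist.files or []:
        owners[str(f)].add(name)

for path, names in sorted(owners.items()):
    if len(names) > 1:
        print(path, "is claimed by", sorted(names))
```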

Because any key can sign for any package from any package index. You are not able to discard packages without valid signatures unless you require all packages to be signed. You cannot differentiate between (PyPI, pip, 22) and (local, pip, 22+payload) without having distributed release signing keys that local doesn't have.
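For illustration, here is what pinned per-package release keys buy you, sketched with an Ed25519 signature via the third-party cryptography package. The keyring layout is hypothetical; in practice the pinned public keys would have to be distributed out of band.

```python
# Hypothetical keyring: project name -> public keys allowed to sign its releases.
# Requires the third-party "cryptography" package.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# In reality this key would belong to the maintainer and only its public half
# would be pinned client-side; it is generated here just to make the sketch run.
maintainer_key = ed25519.Ed25519PrivateKey.generate()
RELEASE_KEYS = {"examplepkg": [maintainer_key.public_key()]}


def verify_release(project: str, archive: bytes, signature: bytes) -> bool:
    """Accept an artifact only if a key pinned for this project signed it."""
    for key in RELEASE_KEYS.get(project, []):
        try:
            key.verify(signature, archive)
            return True
        except InvalidSignature:
            continue
    return False


archive = b"sdist bytes, from whichever index answered"
signature = maintainer_key.sign(archive)

print(verify_release("examplepkg", archive, signature))               # True
print(verify_release("examplepkg", archive + b"payload", signature))  # False
print(verify_release("otherpkg", archive, signature))                 # False: no pinned key
```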

(Note: Someone decided to moderate my previous post here. I’m not going to spend more time on this. I’ve already listed some solutions to the problem; which is hopefully better defined for you all now.)

And of course, every distro is different in how much of this validation is done. The more reviews and validation, the more people will complain that system packages are out of date and that authors of the software aren’t in control any more.

I'm not very optimistic about a hypothetical curated.pypi.org: regardless of where on this spectrum it chose to sit, it would be wrong for someone.


With dependency links, this kind of attack could have been avoided, right?

Yes, and the same is possible if you pin to specific URLs as well, with PEP 508 url dependencies.

pkg @ https://example.com/whatever.tar.gz

I'd like to clarify a few points about TUF's protection against this attack. PEP 480's version of TUF maps specific developer keys to a package, so it solves this for those packages. However (as pointed out by @tiran), if it's an 'unclaimed' package in PEP 480 (one without a specified developer key), or if the repository is using PEP 458, the client would have to do some configuration to specify that only the version on their system should be used.

This can be done using terminating mappings, like those used in the TUF client. These allow the client to specify, for example, that package A should be found on their local repository, and only that repository, while all other packages should be found on PyPI. Using this logic, the client will not look for these local packages on PyPI, and so will not be vulnerable to this attack.
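Roughly, such a mapping could look like the following, shown here as a Python literal and simplified from the multi-repository map described in TUF's TAP 4; the real file format is defined by TUF, not by this sketch.

```python
# Simplified illustration of a terminating multi-repository mapping (cf. TUF TAP 4):
# names matching "internal-*" may only come from the local repository
# ("terminating": True means never fall back to later mappings for those names),
# while everything else is looked up on PyPI.
MAPPING = {
    "repositories": {
        "local": ["https://pypi.internal.example/"],
        "pypi": ["https://pypi.org/"],
    },
    "mapping": [
        {"paths": ["internal-*"], "repositories": ["local"], "terminating": True},
        {"paths": ["*"], "repositories": ["pypi"]},
    ],
}
```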

TUF's terminating mappings could be added to pip as part of the TUF integration to verify PEP 458 metadata.


Please start a new topic rather than resurrecting a three-year-old one. Search first, as there has been far more written about this topic than this thread, including PEPs.