PEP 817 - Wheel Variants: Beyond Platform Tags

We are happy to announce that “PEP 817 – Wheel Variants: Beyond Platform Tags” has been merged and the full text of the PEP is now available at: PEP 817 – Wheel Variants: Beyond Platform Tags | peps.python.org.

History of Conversation

TL;DR

PEP 817 – Wheel Variants: Beyond Platform Tags is a proposal that addresses limitations in how Python packages handle hardware-dependent builds (GPUs, CPU instruction sets, BLAS variants, etc.). PEP 817 enables automatic selection of optimized wheel builds based on system hardware. Instead of separate indexes, package names, or manual selection, users run pip install <package> and the right variant is automatically installed.

Proposal to help guide this conversation

To help this discussion remain organized, we’re committing to the following:

We’ll post a summary every 1-2 weeks (depending on discussion volume) capturing the major points, concerns, and themes being raised across the thread. These summaries will help new readers catch up and ensure no voices are lost in a lengthy conversation. We commit to representing all viewpoints fairly and accurately - please call us out if you feel something has been misrepresented, and we’ll correct it immediately.

We hope this will be helpful for anyone who wants to quickly follow and catch up on the conversation.

The Problem (Why This Matters)

Current State of Pain

The Python packaging ecosystem struggles with hardware-dependent packages:

  • PyTorch publishes 7 different variants (CPU-only, multiple CUDA versions, ROCm, XPU) but users must manually select: pip install torch --index-url="https://download.pytorch.org/whl/cu129"
  • CuPy publishes 55 different packages (cupy-cuda100, cupy-cuda101, …, cupy-rocm-6-3) because there’s no standard way to express variants
  • JAX requires users to play with complex combinations of extras: pip install jax[cuda13] (12 different extras exist, many overlapping)
  • NumPy/SciPy cannot easily offer BLAS/LAPACK variants (OpenBLAS vs MKL) without duplicating wheel builds or package names.

Workarounds and Their Costs

Each workaround has serious drawbacks:

  • Separate indexes: manual installation steps, security risks (combining indexes), separate infrastructure
  • Package name variants (xgboost vs xgboost-cpu): dependency confusion, potential file conflicts, name-squatting attacks
  • Bundled “mega-wheels”: excessive binary size, wasted bandwidth, exceeds PyPI size limits
  • Extras mechanism (jax[cuda12], jax[cuda13]): non-exclusivity, broken defaults (pip install jax is unusable without extras)
  • Source distribution workarounds: requires a source build, security risk (arbitrary code execution), no --only-binary support, breaks installers’ caching assumptions

This fragmentation is especially painful for scientific computing and AI/ML, fields in which, according to the 2024 Python Developers Survey, ~40% of Python developers now work.

The Proposed Solution

The design proposed in this PEP matters to end users (simpler and more robust installs, smaller downloads, better performance for some packages) as well as to package maintainers (it simplifies packaging and provides an extensible design as new hardware and complex dependencies appear).

Please see the quotes in the Motivation section of the PEP for different perspectives of why this proposed design matters.

What This Looks Like in Practice

Before (PyTorch today):

pip install torch --index-url="https://download.pytorch.org/whl/cu129"
# or
pip install torch-cu129  # hypothetical separate package name

After (with PEP 817):

pip install torch
# Installer detects your CUDA 12.9 GPU and installs torch-2.9.0-...-cuda129_openblas.whl
# If no GPU: installs torch-2.9.0-...-null.whl (CPU-only)

Implementation Status & Path Forward

Reference Implementations

  • variantlib: A library with a reference implementation of all parts of the proposed standard that we expect will be used by many packaging tools and installers.
  • uv client: Astral’s package manager has variant support (currently in a separate branch as a prototype)
  • WheelNext Index: Community initiative with wheel index demo
  • Build Backends: example integrations (the required changes are overall fairly simple)

@mgorny @konstin @rgommers @atalman @charliermarsh @msarahan @seemethere @barry @dstufft @aterrel

21 Likes

While re-reading the PEP section on the modified wheel filename, I was wondering if you have considered always requiring a build tag for wheel variant files. This would simplify parsing wheel file names: A wheel variant file would be recognizable by the number of file name components alone.

Well, I can say that it didn’t occur to me, and I don’t think anyone made such a suggestion before. It’s an interesting idea. I’d be slightly worried that build numbers are rather rare today, and we’d be proliferating them. I don’t think that’s wrong; it just feels like an artificial limitation, and I wouldn’t go that way if it’s not strictly necessary.

1 Like

I agree. Thanks for the explanation

Congratulations on the PEP! It’s a lot to understand, but I’ve done my best to give it a quick but considered read and I have these questions, requests, or nitpicks:


Under Modified wheel filename:

One of the core requirements of the design is to ensure that installers predating this PEP will ignore wheel variant files.

Have you tested the new variant wheel filename format against pip’s old non-standard filename regex:
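
For reference, here is that regex as it appeared in pip/_internal/models/wheel.py before 25.3 (reproduced here for illustration; verify against the actual pip source), plus a quick check of a hypothetical variant filename against it:

import re

# pip’s legacy, non-standard wheel filename regex (pre-25.3).
legacy_wheel_file_re = re.compile(
    r"""^(?P<namever>(?P<name>[^\s-]+?)-(?P<ver>[^\s-]*?))
    ((-(?P<build>\d[^-]*?))?-(?P<pyver>[^\s-]+?)-(?P<abi>[^\s-]+?)-(?P<plat>[^\s-]+?)
    \.whl|\.dist-info)$""",
    re.VERBOSE,
)

# Hypothetical filenames. The trailing variant label makes the would-be
# build component "cp313", which does not start with a digit, so the
# legacy regex rejects the variant wheel while accepting the plain one.
for filename in [
    "torch-2.9.0-cp313-cp313-linux_x86_64.whl",        # plain wheel: match
    "torch-2.9.0-cp313-cp313-linux_x86_64-cu129.whl",  # variant wheel: no match
]:
    print(filename, bool(legacy_wheel_file_re.match(filename)))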

I removed this regex in pip 25.3, but it’s likely to be in the wild for a long time.


Under PyTorch CPU/GPU variants:

If a GPU runtime is available and supported, the installer automatically chooses the wheel for the newest runtime supported.

I find this sentence a little confusing about the split of responsibility between the provider and the installer. An installer won’t have any concept of what the “newest runtime” is; that is surely the responsibility of the provider, and the provider gives ordered variants to the installer. E.g. if the provider gives the wrong runtime ordering, the installer will have no way to validate this, right?

In general, this PEP needs to really clearly separate what the provider is responsible for from what the installer is responsible for, and not mix up those terms; even if an installer vendors or implements its own provider logic, conceptually the two should be distinct.


Under Security implications

It is expected that a limited subset of popular provider plugins will be either vendored by the installer

I don’t agree that this PEP should be setting this expectation on all Python package installers.

For example, has a pip maintainer volunteered to do this for pip? We don’t currently vendor popular build backends, which would have a similar benefit, and I don’t expect we will suddenly have the maintainer capacity to vendor and maintain variant providers. I therefore don’t think it will be “expected”.

External plugins requiring explicit opt-in should be rare, minimizing the workflow disruption and reducing the risk that users blanket-allow all plugins.

Similarly, I don’t agree with this assertion.

For pip, under the current wording of the security implications of this PEP, I would expect users wanting to use variants would always have to explicitly opt into install time plugins.


Under Providers

It is RECOMMENDED that said tools vendor, reimplement or lock the most commonly used plugins to specific wheels. For plugins and their dependencies that are neither reimplemented, vendored nor otherwise vetted, a trust-on-first-use mechanism for every version is RECOMMENDED. In interactive sessions, the tool can explicitly ask the user for approval. In non-interactive sessions, the approval can be given using command-line interface options. It is important that the user is informed of the risk before giving such an approval.

If I am reading this correctly, it implies that tools are recommended to act differently between interactive and non-interactive sessions. But as a tool maintainer I’m generally against behavior differences between session types: bad heuristics lead to difficult-to-reproduce bugs, and even when working correctly they surprise users, as something can work on the CLI but then not work in a script.

Further, the suggested security model leads to surprising behavior. Say a user runs an install with --allow-provider foo and then forgets about it; foo doesn’t get updated for a long time, so they stop passing --allow-provider foo, and then one day things break in an unexpected manner because foo releases a bug fix.

In general I would prefer these recommendations be toned down or removed.


Under ABI Dependency Variant Namespace (Optional):

Tools that do not implement this feature MUST treat the variants using it as incompatible, and SHOULD inform users when such wheels are skipped.

Please lower SHOULD to MAY, if naively implemented this would likely result in thousands of warnings once such wheels get published, and even if the tool only warns once that this happened what is the action the users should take? Don’t use the tool?

3 Likes

I have taken a very brief read of the PEP, so please take the following comments with that in mind. But I’ll start by saying that I echo all of @notatallshaw’s comments.

I very strongly agree with this comment. In particular, I don’t think the pip maintainers have anything like the bandwidth to vendor provider plugins. If I understand correctly, vendoring provider plugins would involve being responsible for the security risks around those vendored plugins. There are some serious problems with that - pip issues releases on a 3-month cycle, and if a provider plugin has a security issue, I would want that fixed immediately, not after a 3-month wait. That’s only possible if we don’t vendor, so that fixes are picked up as soon as they are released[1].

From the “Extended wheel filename” section:

Installers that do not implement this specification MUST ignore wheels with variant label when installing from an index, and fall back to a wheel without such label if it is available. If no such wheel is available, the installer SHOULD output an appropriate diagnostic, in particular warning if it results in selecting an earlier package version or a clear error if no package version can be installed.

By far the biggest class of installers which don’t implement this spec are older versions of installers, which aren’t aware of wheel variants at all. I don’t understand how you can require that such installers SHOULD issue a diagnostic about anything to do with wheel variants.

More generally, the PEP cannot logically require anything of tools that don’t implement the PEP. I feel that the PEP authors need to reconsider what they are trying to achieve here (which may just be “better diagnostics”, in which case maybe we simply need to accept that we can’t have better diagnostics from old tools, and work out how to deal with what we can get…)

In “Providers”:

When installing or resolving variant wheels, installers SHOULD query the variant provider to verify whether a given wheel’s properties are compatible with the system and to select the best variant through variant ordering.

Has any work been done on the performance implications of this requirement? Currently, pip’s resolver (I don’t know about uv’s) can select the best wheel for a given name/version without needing to download the wheel, or any metadata outside of that provided by the simple index API. The variant provider mechanism appears to impose additional costs here, in terms of both running the provider plugin, and downloading wheel metadata to access the variant properties.

I note from later in the spec that the “index level variant metadata file” allows downloading variant data without downloading the full wheel. This reduces the performance penalty, but it’s still an additional HTTP request, beyond the fetch of the index data. Also, there’s no mention of an equivalent for the core-metadata element of the JSON simple index (and its HTML equivalent), so I’m not sure how tools will determine that the variant metadata file exists, short of unconditionally trying to fetch it (which is yet more overhead).

One thing I’d really like to see added to the PEP is a worked example of installing a package in the presence of variant wheels. To be helpful, this would need to sketch out the process of dependency resolution, identifying the points where variant data needs to be checked. I’m interested in this because at the moment, I find it really hard to understand the high level picture - the PEP has lots of individual details, but I can’t see how they all work together in the installation process.

Sorry, but that’s as much as I can offer in terms of review at the moment. The PEP is pretty complex, and my free time to work on packaging is currently severely limited, so I don’t know when I’ll be able to do a more in depth review. I hope this is useful in the meantime.


  1. Yes, we have this issue with all of the packages we vendor, but the risks around provider plugins seem notably higher to me. ↩︎

1 Like

Thanks @notatallshaw and @pf_moore for having a first read and sharing your initial feedback. Let me focus in this reply on the most important point (vendoring) only.

As discussed in the previous wheel variants discussion thread, “installer authors don’t want to vendor” was our starting assumption. However, the security/vendoring tradeoff was fairly extensively discussed in that thread, and @pf_moore did say the following (from this message):

Speaking personally, I’d be against including the code (in the sense of being responsible for it ourselves) but vendoring is much more plausible.

Your points all apply, but if I’m trading them off against the risks and problems around adding a mechanism whereby pip downloads and installs plugins on demand, based on package metadata, then I’d be willing to consider vendoring.

I don’t think that is where we want to be - I think the proposal should require specific designs from installers. I’m not talking about UI (command line flags, defaults, etc.) but I do mean everything else - should selectors be downloaded and invoked dynamically, or should they be static based on a fixed whitelist?

As mentioned in the initial post of this thread, the key takeaway from that whole 200+ post discussion was to change the previous wheel variant design from being build backend-like to “installers must not run them automatically, and should vendor the most popular providers”.

That would not have a similar benefit, because build backends are allowed to run without the user opting in to them running (i.e., pip install pkgname works), while wheel variant providers are not.

In addition, build backends are (at the very least) an order of magnitude more complex than variant providers. It’s also fine to set extra requirements on vendoring, such as: to even be considered for vendoring, a provider should be pure Python, self-contained (without any dependencies), and pass whatever checks (linting, type checkers, security scanners, etc.) are deemed necessary by installer maintainers.

Similarly for the vendoring effort, it’d be fine to say “the provider author has to do the work”, or even add more constraints, like requiring whoever does the vendoring work to commit to the maintenance effort, or to be part of some vetted group (PyPA, the PEP authors, WheelNext, a special pip team, or whatever[1]). It doesn’t have to cost the maintainers much effort.

All that said, what providers get vendored is ultimately up to the maintainers of course. I could imagine pip vendoring only 3-4 plugins (CUDA, x86-64, arm64, and maybe ROCm) while uv vendors more of them, taking a different position on the effort-vs-usability spectrum.

Your footnote is the most relevant here. I really don’t think the risk with providers is any higher, since they’re relatively small pure Python packages[2], which are going to evolve only slowly (mostly adding config values for new hardware, and bug fixes - there aren’t any features to add). And, again, if it looks too risky/complex for a particular provider, maintainers are free to reject it.

This vendoring topic is pretty critical to the whole design. Do my answers and the context of the previous discussion help change your initial takes here?


  1. You could even suggest that if the one-off initial review effort is deemed too high, the provider authors or PEP authors may have to find the funds to pay one pip maintainer, or a dev that the pip team personally trusts, to check the work. ↩︎

  2. “small” is always relative; but let’s say collectively small compared to the 250 .py files and ~90,000 LoC currently in pip’s _vendor folder in total ↩︎

4 Likes

Yeah that is fair, there are a lot of moving parts, and even writing a summary in words is ending up quite long. I think a diagram with annotations may be the most useful. We had some, but didn’t keep them up to date as the design evolved over the past year. How about we post one here specifically to answer your question, and if that seems useful then add it to the PEP?

1 Like

They clarify the position the PEP is taking, but I’m not sure how much they change my position. Also, I appreciate that my view on vendoring seems to have changed, but in reality it’s simply that I am now considering additional factors that weren’t as obvious before.

Maybe if there were some concrete examples of what plugins would look like, and how they would be vendored, I would be better able to articulate my concerns. But I’ll do my best here.

  1. Are we genuinely OK with changes to plugins (even security fixes!) only being available on a 3-month release cycle, and only by upgrading your copy of pip? Given the security implications of running 3rd party code as part of wheel installs, that seems like it would be a concern to me.
  2. Who will review plugins to ensure malicious code hasn’t been introduced? This is the reason I don’t think variant plugins are the same as our other vendored libraries. The pip maintainers do not review vendored updates, we assume that the vendored library’s normal review and testing processes (as well as use in other, non-vendored, contexts) will pick up any issues. Given that variant plugins have no practical use outside of being vendored into installers, and they don’t have a “non-vendored” user base to catch issues, we’d be basically bypassing all of the protections we rely on for other libraries that we vendor into pip. I’m not happy with that, as any incidents would (not unreasonably) be blamed on pip’s review and checking processes.

Also, the PEP doesn’t say this, but it strongly implies that plugins will only be available in Python. If that’s the case, how will uv (a project written in Rust) vendor them? Will they need to run them via a Python interpreter? And assuming so, what are the implications of uv having to call plugins via Python during the wheel selection phase? Because that’s quite an overhead, I imagine the uv team would be keen on aggressively caching the results of plugins[1] - does the PEP enforce requirements on plugins that allow such caching? For example, I see nothing in the PEP that requires a plugin to return the same values every time it’s called - obviously, this is a reasonable assumption to make, but we have numerous examples of existing standards that omit such guarantees, and as a result we have edge cases where caching gives the wrong answer. (One such example is even quoted as a technique in the PEP - caching can break the “variant selection via sdist” approach.)
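
To be concrete about the kind of caching I mean, here’s a hypothetical sketch (not uv’s actual code; run_provider is a stand-in for invoking a plugin via a Python interpreter):

import functools

def run_provider(plugin: str, version: str) -> tuple[str, ...]:
    """Stand-in for invoking the plugin through a Python interpreter."""
    return ("cuda129", "cuda126", "null")  # illustrative output

@functools.lru_cache(maxsize=None)
def cached_provider_output(plugin: str, version: str) -> tuple[str, ...]:
    # Sound only if a given plugin version is guaranteed to return the
    # same values every time on the same system -- the PEP does not
    # currently state that guarantee.
    return run_provider(plugin, version)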

The prototype implementation for uv mentioned in the PEP doesn’t seem to be complete enough yet to run benchmarks to assess the performance impact of the variant selection code (both where variants are picked, and where variants are present but not selected). I’d really like to see some performance figures before we agree on the proposed design, as it’ll be very hard to change the design later if it turns out to impose unacceptable overheads.


  1. they get a lot of their speed from caching ↩︎

Maybe the abstract talk about providers makes it hard to understand what’s being proposed: one big goal of the PEP is to support selecting the right GPU wheel for a package. What we’re proposing is effectively that pip adds code to detect GPUs and uses this information for wheel selection. This code can be shared and reused by package managers implemented in Python, similar to packaging.
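
As a toy illustration of what “code to detect GPUs” means in practice (purely illustrative; the real prototype providers linked below are more thorough):

from pathlib import Path

def nvidia_driver_version() -> str | None:
    """Toy check: report the NVIDIA kernel driver version on Linux, if any."""
    version_file = Path("/proc/driver/nvidia/version")
    if not version_file.exists():
        return None  # no NVIDIA kernel driver loaded
    # The first line looks roughly like:
    # "NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.54.14  ..."
    first_line = version_file.read_text().splitlines()[0]
    for token in first_line.split():
        if token[0].isdigit() and "." in token:
            return token  # e.g. "550.54.14"
    return None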

The code for the GPU providers for the current main GPU vendors (as in, those supported by PyTorch) is on GitHub; it shows what code we’re talking about. Developers from GPU vendors created them or participated in authoring them, there’s wider community involvement, and this code doesn’t need to be written by the pip maintainers.

There are also other providers in the GitHub org that show what functionality can be implemented. Note that these are of course all prototypes for the PEP (just like the uv branch); once we have an agreement on the design, we can polish them to the desired quality.

My assumption is that we’ll eventually have one shared, pure Python GPU provider that can be vendored by pip/poetry/pdm/you name it. I don’t foresee any security vulnerabilities specific to those providers; they should be much less security-critical than, say, the network stack or unzipping, given that they only read system information and don’t process user input.

The PEP is intentionally not more concrete about the providers: we don’t want to specify something now only for it to be insufficient again in a few years when there’s new hardware. My hope is that we can treat the default provider set similarly to e.g. manylinux platform support, where we coordinate on DPO/packaging.python.org. Currently, I see two providers that I’d include by default/vendor: a GPU provider with NVIDIA/AMD/Intel support, and a CPU extensions provider with x86_64 and aarch64 support. The goal is to fill this big gap in the ecosystem with minimal effort on the package management tooling side, in a way that allows cross-tool support and doesn’t lock us into specific hardware assumptions that may change in the future.

uv will reimplement the providers it “vendors” in Rust, similar to how it reimplements packaging libraries and specifications. Note that, unlike source distributions, provider output is global (system-wide), and there should be few providers, so the overhead is much smaller than e.g. source distributions with dynamic metadata.

Can you be more specific about what case you’re thinking about with regard to performance? For a standard uv lock, this shouldn’t change the performance measurably; for uv sync, it’s an extra download of one immutable, small JSON file per variant package; for uv pip, it’s an additional request for each package version of a variant package. For non-vendored providers, there’s a large overhead for going through a Python interpreter, similar to a prepare_metadata_for_build_wheel call.

Maybe. But conversely, maybe the fact that the PEP talks about providers in such abstract terms opens up questions and issues that wouldn’t be so difficult if we were only talking about one or two specific cases.

It’s always a balancing act between generality and precision.

So here’s a suggestion - why not add the providers to packaging? It seems like it would sit alongside the platform tag code just fine, from what you’re saying. Maybe that’s not a reasonable suggestion, but if we can establish why it isn’t, we might be a bit closer to understanding what you’re asking of the pip maintainers.

Also, I’m uncomfortable about thinking of this in terms of “effectively asking pip to add code”. We’ve[1] spent a lot of effort trying to ensure that our standards allow for multiple implementations, and while I’m currently responding very much with my “pip maintainer” hat on, from a standards perspective, I do have to ask how all this will affect other tools that need to identify compatible wheels. For example, an in-house audit tool that looks at a set of requirements, determines what wheels would be installed, and reports them for review. Or a mirroring tool that works out all the 3rd party packages needed by a set of internal projects and downloads the necessary wheels to a local index. Yes, maybe all such tools just use something like pip install --dry-run --report. But we can’t assume that, and it’s explicitly a goal of our standards process to avoid requiring it.

Just out of curiosity, will uv support dynamically loading providers written in Python as well, for cases which it doesn’t vendor? Or will you simply not support those? (Does the PEP even allow not supporting anything but vendored plugins? I haven’t checked that detail.)

Speaking in terms of pip’s resolver, the way we work is roughly that when we select wheels, we fetch the package index page (using the JSON API) for a project, and we look at the available wheels. We can tell which ones are valid on the current system by looking at the wheel filename, and the “python-requires” metadata (available from the index page). So that one fetch is all we need.

My understanding of the variant system is that we’ll now need to fetch a variants.json file for every version of that package that we’re considering, and for each wheel in that version, fetch the packaged variant.json to see what variants the wheel supports. For projects with many versions, or many wheels per version, that could be a lot of extra fetches. And even if the HTTP cache covers some of these, it’s still a lot of extra work. For some problematic cases in pip, such as resolves that backtrack through botocore[2] we can be checking many thousands of versions, each with multiple wheels. The overhead will mount up in cases like that.

Maybe I’ll turn out to be wrong - I’ll freely admit that I’m mostly speculating here - but I’d like some reassurance that the design has considered possibilities like this.

At this point, I feel that I’ve expressed my views. Rather than dominate the discussion with one person’s perspective, I’m going to step back for a while and let other people weigh in.


  1. both as a community, and me personally ↩︎

  2. A particular “bad case” :slightly_frowning_face: ↩︎

1 Like

Sure - there’s no reason for this not to live in packaging (assuming the packaging maintainers are interested).

This was written in reply to the concern about the burden this PEP puts on the pip maintainers. For other tools, it depends on what you’re implementing. If a tool just filters to e.g. py3-none-any and manylinux* because you, say, deploy to a Docker container, not much changes. If you want exact mirroring then, speaking as someone who has reimplemented this stack, that is already highly non-trivial and not routinely implementable without library support (and, similar to other PEPs, we aim for good library support). That is to say, I don’t think the addition of variants changes this a lot for non-packaging tools.

There’s one {name}-{version}-variants.json per package version (not per wheel). So for packages that use extension modules and have variants of these extension modules [1] [2], it’s correct that you need to download an additional JSON file from the index to select the wheel, and that this has a performance overhead. This file is immutable like other files on the index (it even has a hash), so it’s cacheable indefinitely, but the overhead remains for the cold-cache fetch for packages that do use variants. Whether this could be optimized is something we’re very interested in! Currently, we’re trading future-proofing and flexibility for this extra fetch.
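
To make that cost concrete, here is a rough sketch of the extra fetch (the URL layout here is an assumption for illustration; the PEP defines the actual index interaction):

import json
import urllib.error
import urllib.request

def fetch_variants_file(project_url: str, name: str, version: str) -> dict | None:
    """Fetch the per-version variants file, if one was published.

    Assumes {name}-{version}-variants.json sits alongside the wheels
    under the project's index page. An installer issues this request
    once per (package, version) it considers; since the file is
    immutable, the result can be cached indefinitely.
    """
    url = f"{project_url}/{name}-{version}-variants.json"
    try:
        with urllib.request.urlopen(url) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None  # this version publishes no variants
        raise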


  1. Pure Python packages could use variants, but I don’t see why; here we’ll assume they don’t ↩︎

  2. Many extension modules won’t use variants, e.g. they don’t use GPUs or they use runtime dispatch for vector instructions, or they just aren’t that performance critical ↩︎

1 Like

Yes, we did. Basically, this will work as long as tools check for the correct number of components, and that the build number component starts with a digit (if present). I was quite surprised to see that all the tools we’ve inspected actually were strict about the latter point.

That said, I don’t think we’re particularly married to that design. If circumstances arise that require changing the wheel name, it should be easy to change in the spec (at the cost of breaking compatibility with the prototypes).

Well, strictly speaking the provider has no idea what wheels are available, so in the end it’s the installer doing the choosing (and the provider providing a list of supported runtimes).
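
In pseudo-Python (the names are illustrative, not the actual variantlib API, and real matching works on variant properties rather than bare labels):

def select_variant(provider_preferences: list[str], published_labels: set[str]) -> str | None:
    """The installer's half of the split.

    The provider only reports what the system supports, in preference
    order; the installer intersects that with the variant labels
    actually published on the index and makes the final choice.
    """
    for label in provider_preferences:
        if label in published_labels:
            return label
    return None  # fall back to a non-variant wheel, if one exists

# e.g. select_variant(["cuda129", "cuda126", "null"], {"cuda126", "null"})
# returns "cuda126"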

This would be a really bad security practice. Having to explicitly accept providers when installing any variant package would quickly result in users “blanket accepting” any providers, and therefore make the whole opt-in process pointless.

This was merely given as an example, but I think it’s fine to remove that. I agree with your point.

I feel like we’re moving in circles here. The previous approach to the PEP merely pointed out the potential security issues and left deciding how to proceed to the installers. This was met with strong opposition, to the point of people assuming we weren’t sufficiently concerned about security (even though it was all there). It’s already “toned down” by making it a recommendation rather than a requirement, I don’t see how we could go down from there.

I don’t really understand why changing this would actually prevent people from “naively implementing” this. Are you assuming that a “naive” implementation won’t cover anything below the “SHOULD” level?

Yes, I suppose that’s one option for the users.

Let’s ask the converse question: what should happen if the tool chooses not to support variant wheels and does not inform the users about it? The way I see it, users get suboptimal and potentially surprising results.

Say, users want to install torch, but the tool doesn’t support wheel variants, so it falls back to the version that doesn’t work for them. I think it’s better that the users get a clear message “we do this deliberately”, rather than get confused why things didn’t work as expected, and report bugs to the PyTorch project and/or the tool.

1 Like

This confuses me. Why would a tool not support variants if they are standardised? I’ve never seen a packaging standard before now that tries to say what tools that decide not to support it should do. Can you give an example of a tool that would deliberately choose to not support variants? And explain why the community would continue using such a tool?

Well, we obviously don’t expect tool maintainers to go back in time and prepare their tools for a specification that didn’t exist yet. This section is specifically intended to address future versions of tools that deliberately choose not to implement the PEP.

Well, obviously you can’t have your cake and eat it too. It’s inevitable that adding functionality will have performance implications, though we did our best to keep them minimal. Please also bear in mind that variant wheels are likely to be used only by a reasonably contained subset of non-pure-Python packages, so we aren’t really talking about adding overhead to every single package query. Admittedly, in some scenarios (think dependency trees involving PyTorch) there are likely to be multiple packages with variants involved, but the overhead needs to go somewhere.

At least my demo implementation recognized the file from the index by its filename, much like pip recognized wheels at the time. We’re definitely open to adding something here, but bear in mind that the specification was specifically meant to work on “dumb indexes”, much like installers support today.

I don’t really understand this point. Does pip currently have a policy of not shipping security fixes for its vendored dependencies outside the 3-month release cycle?

This is a fair point, and I was not aware of this policy. I think our main assumption was that these plugins will be relatively simple code-wise and not changing frequently, so at least eyeballing them explicitly would not be a problem.

I’d personally lean towards a separate project in pypa/ namespace, if only to avoid arbitrarily putting additional tasks on packaging maintainers. In the end, I think that’s a reasonable approach here: have a single repository with trusted maintainers who will be responsible for preparing a standard provider plugin bundle, and have installers either depend on it or vendor it. If I understand correctly, that would fit into pip’s vendoring model.

2 Likes

Not as such, but the problem is that we don’t have any form of “maintenance branch”, so our options for security fixes are limited. I don’t think there’s ever been a case of a security issue where we weren’t in a position to wait for the next scheduled release for the fix, so we’re in unknown territory here.

If variant plugins are not expected to have significant security risks, then I think pip’s situation is fine. I just have a hard time reconciling that expectation with the discussion in the security implications section of the PEP…

The text in the PEP is vague but I can remember what it relates to from previous long discussion threads. Let me restate the point it is making explicitly in terms of pip.

In general it is known that pip install <arbitrary text> is not secure: if there are sdist-only packages then pip downloads and builds them, and that means arbitrary code execution. The primary thing that makes pip install foo safe is that pip guarantees to get what is on PyPI under the name foo, and PyPI guarantees that only certain people are allowed to control the files under the foo name. This can go wrong, though, if there is a supply chain attack, like someone getting hold of the foo name on PyPI or of one of foo’s dependencies.

Some people want a bit more security and can use pip install --only-binary=:all: numpy in which case pip guarantees to install only wheels and does not run any arbitrary code. In this case in the event of a supply chain attack pip install --only-binary=:all: numpy might install malicious code but it would not execute malicious code as part of running pip itself. Someone could take advantage of this to do something like:

  • Use pip install --only-binary=:all: foo on machine A to setup some packages.
  • On a separate machine B with lower privileges (perhaps in a sandbox) execute code using the installed packages.

Now if there is a supply chain attack B can run malicious code but A will not. However if pip runs arbitrary providers when installing wheels then a supply chain attack also places A at risk of arbitrary code execution.

If the providers had always been there, then this would not be any new security issue; the person using --only-binary=:all: would just be using a different option like pip install --no-arbitrary-code or something. However, if people are currently using --only-binary=:all: with this security expectation, then a change that allows arbitrary provider code execution when installing wheels is a backwards compatibility break, resulting in a security regression for systems that might currently be assessed as secure.

This is why the PEP addresses this:

the proposal explicitly requires that untrusted provider plugin packages are never installed without explicit user consent.

That basically solves the security issue but leaves us with another problem: users cannot just pip install torch without some extra steps, because they need the providers. The PEP proposes to solve that by having pip vendor the providers that are needed for the most popular Python packages.

The security risks here are not to do with the vendored providers. The risk is the non-vendored providers and the PEP proposes just to require that a user opts into those. The “explicit user consent” could have many forms but perhaps:

$ pip install foo
...
To install foo we need to install and run the x86_64 provider.

Do you want to install and run the x86_64 provider? [y/N]

3 Likes

Is this supposed to be --only-binary?

Yes, thanks. I edited to correct that.

I never actually use --only-binary myself so --no-binary is just the one that I am more familiar with (and they are both bad option names).

1 Like