PEP 817 - Wheel Variants: Beyond Platform Tags

I’d be careful with statements like that. It is inevitable that in every discussion some groups will be overrepresented, while others will be underrepresented. However, I don’t think that it’s fair to claim that the result was skewed if we have no way of measuring that. It may be as well that many people having the same concerns didn’t voice them, or did not participate in the thread at all. That doesn’t make their concerns any less valid.

In the end, we are working on a consensus that tries best to address the concerns of all the stakeholders, irrespective of how verbose they are. Admittedly, the compromise we arrived at may not be perfect, but I believe it strikes a reasonable balance between security and user experience. It also provides a reasonable degree of flexibility in implementation: vendoring, reimplementing, keeping an allowlist of safe provider packages all fit the bill.

It’s not. I’ll come back to this point.

First of all, I’d like to say that I think the focus on vendoring here is becoming a distraction, and we risk ignoring the rest of the proposal as a result. PEP 817 is a big, complex proposal, addressing an important problem, and it deserves thorough and careful review. I genuinely hope that we can find a good solution here, but we’re not going to do that if everyone burns out on one aspect of the discussion before looking at anything else.

As I understand it, the “vendoring” discussion is looking at one specific part of the puzzle here - namely that this proposal changes the existing expectation that selecting a wheel from a set of candidates can be done statically, without any need to run “3rd party”[1] code.
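
For reference, the status quo this expectation describes can be sketched with nothing but filename parsing. This is a minimal illustration, assuming a hand-written list of supported (python, abi, platform) tags in preference order; real installers compute that list (for example with `packaging.tags.sys_tags()`), and this toy parser only understands today's wheel filename format.

```python
from itertools import product


def wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand the (python, abi, platform) tags encoded in a wheel filename."""
    stem = filename.removesuffix(".whl")
    # name-version[-build]-python-abi-platform: the last three fields are tags
    py, abi, plat = stem.split("-")[-3:]
    # Compressed tag sets such as "py2.py3" expand to every combination
    return set(product(py.split("."), abi.split("."), plat.split(".")))


def select_wheel(candidates: list[str], supported: list[tuple[str, str, str]]):
    """Return the candidate matching the most-preferred supported tag, or None.

    `supported` is ordered from most to least preferred. No third-party code
    runs: the decision depends only on filenames and a static tag list.
    """
    for tag in supported:
        for filename in candidates:
            if tag in wheel_tags(filename):
                return filename
    return None
```

The point of the sketch is what it lacks: there is no step at which code shipped by someone other than the installer gets executed.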

That’s a trust issue, and as such is, in some very fundamental ways, not a technical question. We need to understand what end users (and in particular, security-focused end users) consider to be trusted code in any of the various contexts where “selecting the correct wheel” is the problem. Installation is the obvious candidate here, but other possibilities exist - auditing and code review tools are another one, for example.

The problem with seeing “vendoring” as a solution is that it suggests a change in who is being trusted - the trust users place in the installer (for example) is being extended to the packages that the installer vendors. But this is mistaken - not all vendoring models extend trust to cover vendored code, and pip’s in particular does not. For pip, vendoring is purely a distribution mechanism to handle bootstrapping - vendored packages are neither more nor less trustworthy than normal dependencies[2].

IMO, vendoring is irrelevant here. Whether pip (or any other tool) vendors variant plugins should make no difference to the trust model, or to the PEP. It’s entirely a tool-specific implementation detail, and must be treated as such.

Clarifying some of my earlier points (thanks to @rgommers for fishing these out of the old thread), while acknowledging that my thinking may have changed since I originally posted them:

Speaking personally, I’d be against including the code (in the sense of being responsible for it ourselves) but vendoring is much more plausible.

Note that I’m using vendoring here in direct contrast to “being responsible for it (the code)”. Specifically, vendoring “is much more plausible” because it doesn’t imply responsibility. That matches my comment above, that vendoring shouldn’t suggest trust.

Your points all apply, but if I’m trading them off against the risks and problems around adding a mechanism whereby pip downloads and installs plugins on demand, based on package metadata, then I’d be willing to consider vendoring.

Again, this is a technical view. Vendoring is simpler for pip to implement, and therefore involves less risk (of bugs in the implementation mechanism). It doesn’t change how pip treats plugins in terms of ownership or trust.

I think the proposal should require specific designs from installers. […] should selectors be downloaded and invoked dynamically, or should they be static based on a fixed whitelist?

Note that this isn’t looking at vendoring directly. It’s pointing out that the PEP should specify a design that solves the trust issues, and tools implementing the PEP should follow that design in order to ensure that they correctly handle trust. From what I recall, this statement was in response to the idea that it should be a tool choice how to handle trust (i.e., tools should make the decision as to whether dynamic invocation of plugins is safe). I don’t think the PEP can delegate a question as important as trust and security to individual tools.

Vendoring is an implementation choice orthogonal to questions of a trust-focused design. This is because, as I noted above, projects can have differing vendoring policies, with wildly different trust implications. Pip accepts no extra responsibility for vendored dependencies, whereas uv’s version of vendoring is reimplementation in Rust, which naturally takes full responsibility for the resulting code.


  1. Whatever that means

  2. And indeed, some of our redistributors devendor pip - a practice that we don’t support, but we do acknowledge the existence of.

Agreed. But it’s important that the PEP authors manage the discussion to ensure that all stakeholders have a chance to express their views. I’m not suggesting the authors aren’t doing that, but it’s not easy - and often a little extra effort can reap significant benefits.

Some concrete suggestions:

  1. Postings on social media pointing interested parties to this discussion. A lot of people don’t follow Discourse, and even those that do, might think they aren’t going to be impacted when they are[1].
  2. Explicit outreach to specific groups/individuals. Examples here would be the Poetry and PDM authors (both of whom implement their own resolvers, I believe), and security specialists (both for their views on the security questions here, and to offer input on whether tools like audit scanners might need to understand variants).
  3. Regular posts summarising the state of discussions and the list of open questions. This might make it easier for new participants to catch up without having to read hundreds of posts, or repeat old discussions.

Speaking as one of the “verbose” posters, I’m glad you’re aware of, and actively seeking to address, this issue. But I do keep an eye out for the more infrequent posters, and frankly there don’t seem to be that many participating here, yet.


  1. Opt-in prompts being an obvious example

OK, but what’s the solution for projects that still need to upload new variants after the initial release? Immutability will be a real issue for them.

This is orthogonal to the specification. Nowhere does it say anything about the immutability of the JSON files; if anything, this is entirely an implementation detail of the index server. While our goal was to give some leeway to index servers, particularly by not requiring any specific implementation effort beyond allowing upload of a JSON file, I don’t think we should go out of our way to invent workarounds for the corner case of uploading new variants of existing package versions to an index server that permits publishing the JSON file but doesn’t permit updating it.

In fact, I dare say that a complete server implementation would process the wheels as they are uploaded and automatically generate or update the index-level JSON file.

But if the spec requires that the index server allows the file to be updated as new variants are uploaded, that’s a requirement on index servers that support the PEP, and needs to be part of the spec. Conversely, if updating isn’t needed, the spec needs to cover how installers handle cases where the file isn’t up to date with the available wheels.

As far as I know, at least for PyPI, a series of checks is performed when a file is uploaded.

Well, in a variant world, a few steps would be added, namely verifying that “the variant configuration of the new file” extends, rather than conflicts with, the existing one. In other words, you can add to variants.json but never change anything that already exists.

We created a tool with variantlib to do that; if it would help make this clearer, I can provide an example.
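
The kind of check described above (uploads may add variants but never alter existing ones) could look roughly like the sketch below. This is not variantlib's tool. It assumes a simplified, hypothetical layout in which variants.json maps each variant label to a dict of properties, which is not the real schema; `extend_variants` and `VariantConflict` are illustrative names.

```python
class VariantConflict(ValueError):
    """Raised when an upload would change an already-published variant."""


def extend_variants(existing: dict, incoming: dict) -> dict:
    """Merge `incoming` into `existing`, allowing additions only.

    Both arguments map a variant label to its properties (a simplified,
    hypothetical stand-in for the real variants.json layout).
    """
    merged = dict(existing)  # never mutate the published document in place
    for label, props in incoming.items():
        if label in merged and merged[label] != props:
            raise VariantConflict(
                f"variant {label!r} is already published with different properties"
            )
        merged[label] = props  # new label, or an identical re-declaration
    return merged
```

Re-declaring an existing variant with identical properties is accepted, so every wheel of a release can carry the full set it knows about without tripping the check.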

Now, as I said before, this entire behavior was left out of the PEP, and left to each index to decide whether to support it, for two reasons:

  1. PyPI uses some pretty extensive caching through its CDN. From what I understand, I’m really not sure it would be scalable for them to allow mutable files.

  2. As I mentioned before, publishing a new architecture almost always means adding a new set of dependencies, and that will break installers like Poetry / uv. So it really should be avoided.

There is no “technical limitation” preventing publication of a variant after the first wave of publications. We did not feel it was our role to specify that corner case; whether to support it is more of an index decision. It’s possible, but there are good reasons not to allow it.

I think it is broadly agreed that it would be better if installers did not attempt to build from source e.g. if pip defaulted to --only-binary=:all:. There is a pip issue about this. Part of the reason it would be better is security but actually the more impactful reason is the poor UX that so many end users get when pip tries to build something that is not pure Python. As I understand it if there were not backwards compatibility concerns then there would not be much debate about changing the default behaviour. The difference here with variant providers is just that they are a new thing and so not constrained by backwards compatibility to have an insecure-by-default behaviour.

Importantly for this PEP, from a security standpoint what you need to get across is that this will not make anything significantly less secure than the status quo. You need to convince people who are concerned about security that you are considering these things carefully and that you do care about their concerns. This is a large and complex PEP and it is difficult to model the whole thing mentally, so anyone trying to understand it wants to be able to trust that the authors of the PEP considered things like security carefully in the design. Right now you are giving the impression that you just don’t care much about this and that you dismiss the views of people who do as invalid, which is not helpful for building confidence in the PEP.

Actually, I’m sorry, I got ahead of myself and missed the fine details. Let’s roll back a little.

In order to create new variants, you need to modify the sources. If you modify the sources, you are effectively creating a new version. You really oughtn’t start publishing new wheels with the same version number built from different sources (and PyPI won’t let you upload a new source distribution anyway). This is a moot point.

I still don’t understand this. Please explain, in as much detail as you feel is necessary to get through my obvious misunderstandings, what a package developer should do if they find themselves in a situation where they have published a set of wheels (including variants) to PyPI, along with the appropriate variants.json, and they now need to publish an additional variant wheel, for something that’s not covered in the existing variants.json.

Saying that PyPI doesn’t allow that doesn’t answer the question - it simply ignores it. We can’t realistically expect package developers to pick a new index server in this case. And “don’t do that” isn’t an answer either. It’s equivalent to saying that the PEP doesn’t support this scenario, and if that’s the case the PEP needs to explicitly say so, not duck the issue by saying “it’s up to the index”.

This actually isn’t a moot point. Whether you should or not is different from whether you can and the potential impacts this has on the ecosystem.

I’m in a somewhat privileged position at work: we run our own internal index, and we have people who actually review things that get pulled in. So I get to sit here and say we wouldn’t be as immediately impacted.

But as supply chain attacks become more common, I think this is bound, at some point, to become an exploited property, one that was discounted by those who just thought “well, you shouldn’t do it anyhow” rather than “what happens if someone does?”

This seems like a relatively easy way to somewhat stealthily impact packages that pin their dependencies to an exact version and use --only-binary=:all:, but don’t go so far as to lock hashes.
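
To make the difference concrete: an exact version pin only constrains the version field of the filename, so a wheel uploaded later for the same version still satisfies it, whereas a hash pin rejects any file whose contents weren’t recorded in advance. pip’s hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file) works on this principle; the standard-library sketch below only illustrates the idea, and the function names are made up.

```python
import hashlib


def version_pin_ok(filename: str, pinned_version: str) -> bool:
    """An exact version pin: only the version field of the filename is checked."""
    return filename.split("-")[1] == pinned_version


def verify_hash_pin(data: bytes, allowed_sha256: set[str]) -> None:
    """A hash pin: the file's bytes must match a digest recorded in advance."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in allowed_sha256:
        raise ValueError(f"file does not match any pinned hash (sha256={digest})")
```

A late-uploaded wheel such as `demo-1.0-cp312-cp312-manylinux_2_17_x86_64.whl` passes `version_pin_ok(..., "1.0")` but fails `verify_hash_pin` unless its digest was part of the lock.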

This would be solved by making releases immutable, but we don’t have that guarantee currently, and the PEP working on that has been ongoing for a very long time.

I don’t see vendoring as solving this. Better options were proposed in prior rounds, like TOFU (trust on first use) for providers, as well as a way for users to pre-specify trusted providers. These options better reflect where users are placing trust, and in which components, and they also don’t impose a 3-month delay on updating providers.
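
A rough sketch of how TOFU plus pre-specified trust could work for providers, assuming providers are identified by name and by the SHA-256 of their distribution file. `ProviderTrustStore`, its methods and the `confirm` callback are hypothetical, and an in-memory dict stands in for a persisted trust file.

```python
import hashlib
from typing import Callable


class ProviderTrustStore:
    """Hypothetical trust-on-first-use store for variant provider artifacts."""

    def __init__(self, confirm: Callable[[str, str], bool]) -> None:
        # provider name -> SHA-256 hex digest the user approved
        self._pinned: dict[str, str] = {}
        # callback asking the user whether to trust (name, digest)
        self._confirm = confirm

    def pre_trust(self, name: str, digest: str) -> None:
        """Record a provider the user trusts in advance, e.g. from a config file."""
        self._pinned[name] = digest

    def check(self, name: str, artifact: bytes) -> bool:
        """Return True if this provider artifact may be executed."""
        digest = hashlib.sha256(artifact).hexdigest()
        pinned = self._pinned.get(name)
        if pinned is None:
            # First use: trust only with explicit user consent, then remember it
            if self._confirm(name, digest):
                self._pinned[name] = digest
                return True
            return False
        # Later uses: the artifact must be exactly what was approved before
        return digest == pinned
```

With this shape, a changed provider is refused until the user re-approves it (here via `pre_trust`), so updates are an explicit user decision rather than something tied to an installer’s release schedule.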

I think the rate at which hardware is supported is a red herring here, but since the authors have argued this is essential to be supported, vendoring works against the authors’ stated interests as well.

It’s stated as a problem if one library uses one implementation and another uses another. The problem isn’t explained in a way that shows why existing solutions aren’t enough; it simply expects people who aren’t working on those problems to accept that it can’t be solved without this.

This is exactly the problem I’m talking about. There’s no detail: the authors don’t want to get into enough detail for others to see why it’s a problem that should be addressed, and package ABIs like this were never supposed to be a detail that leaks into wheels. The whole point of wheels was bundling and isolating the binary dependencies so that they were somewhat portable. A package relying on another package sharing the same internal detail isn’t doing that, even if we add a coordination layer on top with variants and providers.

@mikeshardmind has already said some of what I would have here. Vendoring as a solution is a misunderstanding of the concerns that were presented. Maybe they weren’t communicated as clearly as they could have been, I don’t know how to solve that part of it. Many trusted authors writing and maintaining packaging tools had already told you they aren’t the experts on the code needed to implement variant selection. Asking them to vendor the code and just trust it, and to also update it frequently isn’t actually helping the trust issue, it’s diluting the trust placed in core ecosystem tools.

You also had another concern expressed about running arbitrary code while simply deciding which packages to fetch. While vendoring means the code run is no longer arbitrary, it doesn’t actually address all of the reasons this was an issue, and the PEP authors didn’t actually ask whether vendoring would solve it. They went away for a few months, and are now balking at the idea that those who do care about security don’t see vendoring as the right answer.

Do you mean PEP 694? That doesn’t make releases immutable.

I think there are two choices, not necessarily mutually exclusive.

  1. You can treat {name}-{version}-variants.json as mutable on the index and allow clients to upload new versions as new variants are published.
  2. The index itself could dynamically update this file on upload of new variant wheels, by extracting a wheel’s dist-info/variants.json file and merging it into the index’s {name}-{version}-variants.json file.

The first requires indexes to treat those files differently than wheels and sdists, which are immutable; the second requires an index to do some work on every variant upload.
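
Option 2 could look roughly like this on the index side. It is a sketch only: it assumes the same simplified, hypothetical variants.json layout (variant label mapped to a dict of properties) rather than the PEP’s actual schema, and `on_variant_upload` is an illustrative name, not an existing index hook.

```python
import io
import json
import zipfile


def read_wheel_variants(wheel_bytes: bytes) -> dict:
    """Return the parsed *.dist-info/variants.json from a wheel archive."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        members = [n for n in zf.namelist() if n.endswith(".dist-info/variants.json")]
        if len(members) != 1:
            raise ValueError("expected exactly one dist-info/variants.json")
        return json.loads(zf.read(members[0]))


def on_variant_upload(index_doc: dict, wheel_bytes: bytes) -> dict:
    """Merge an uploaded wheel's variants into {name}-{version}-variants.json."""
    incoming = read_wheel_variants(wheel_bytes)
    merged = dict(index_doc)
    for label, props in incoming.items():
        # Additions are fine; changing an already-published variant is not
        if label in merged and merged[label] != props:
            raise ValueError(f"upload would change published variant {label!r}")
        merged[label] = props
    return merged
```

This also shows the cost mentioned above: the index has to open and parse every variant wheel at upload time, and the resulting file is no longer a static artifact.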

Oh. Sorry, I was under the impression that was one of the goals of having upload sessions. If it isn’t, then this is even more a concern because there’s not even a plan to be moving in that direction.

694 makes releases atomic. While a publishing session is still open, in fact the artifacts in that session can be modified[1]. After publishing the session, the artifacts are immutable, just like they are now.


  1. e.g. you use the session’s stage to test your release before publishing, and if you find a problem in one of your wheels, you can replace it

I skipped over mentioning this before but it seems that a lot of the contentious parts of the PEP come from this last step. It is not hard to imagine an alternate version of the PEP where it has the variants and has the provider plugins but everything is just off by default and there are no blessed or vendored provider plugins. The only downside is just that the instructions to install pytorch would need one extra step or flag or confirmation somewhere e.g.:

pip install --enable-providers torch

If you had that working though then you basically have all of the benefit of the PEP and it would solve all of the problems that are listed in the PEP just with the wrinkle that people have to type something like --enable-providers. I imagine that pip maintainers would be a lot happier with this rather than having a PEP that basically says that someone else is going to tell them what to vendor and I also imagine that this satisfies most/all of the security concerns.

If pip were to implement the PEP as it stands then I assume that it would do so anyway by adding a feature flag --enable-providers=yes/no with the default being off at least initially so users have to opt in to the new feature until it is well tested enough to be on by default. Maybe it would be better to have a PEP that just tries to get to that point. Then later when there are variant packages on PyPI and in use and the plugin providers are established the discussion about making some plugins become built-in installer features might be easier when everything is more well established.
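
A rough sketch of what such an opt-in switch means for the selection step. It assumes a hypothetical `Candidate` type that records whether a wheel is a variant, and a `run_providers` callback standing in for whatever provider machinery an installer implements; `--enable-providers` is the illustrative flag from the post above, not an existing pip option. When the flag is off, no provider code runs and variant wheels are simply ignored.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    filename: str
    variant: Optional[str] = None  # None means an ordinary, non-variant wheel


def choose(
    candidates: Sequence[Candidate],
    enable_providers: bool,
    run_providers: Callable[[list[Candidate]], Optional[Candidate]],
) -> Optional[Candidate]:
    """Pick a wheel, running provider code only if the user opted in."""
    variants = [c for c in candidates if c.variant is not None]
    if enable_providers and variants:
        picked = run_providers(variants)
        if picked is not None:
            return picked
    # Default path: variant wheels are ignored and no provider code runs.
    # (Ordinary tag matching is omitted; the first plain wheel stands in for it.)
    plain = [c for c in candidates if c.variant is None]
    return plain[0] if plain else None
```

If a project publishes only variant wheels, the default path returns nothing, which is where an installer would tell the user that the extra flag is needed.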

A global on/off for the feature isn’t enough for the security aspect. Ideally, you’d have something like:

pip install --variant-provider variant_package_name

and

pip install torch --use-installed-variant-selectors

Specifically registering it as a variant provider as part of install would be the way to manage that this is something being installed essentially into the package manager’s environment as an extension to use as part of package management, and not updated as part of updating packages.

You would also need a restriction that variant selectors can’t also be normal packages (or something along those lines), to prevent them from being updated automatically when a package declares a later version as a dependency; otherwise that separation of management is undercut.
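
A minimal sketch of that separation, assuming providers are tracked in a registry owned by the installer rather than in the managed environment. `ProviderRegistry` and its methods are hypothetical, the `--variant-provider` option is the illustrative one from the post above (not a real pip flag), and project-name normalization is ignored for brevity.

```python
class ProviderRegistry:
    """Hypothetical registry of variant providers installed into the
    installer's own environment, kept apart from the packages it manages."""

    def __init__(self) -> None:
        self._providers: dict[str, str] = {}  # provider name -> installed version

    def register(self, name: str, version: str) -> None:
        # Reached only through an explicit user action, such as the
        # hypothetical `--variant-provider` option, never via resolution.
        self._providers[name] = version

    def missing(self, required: list[str]) -> list[str]:
        """Providers a package asks for that the user has not registered."""
        return [name for name in required if name not in self._providers]

    def check_resolution(self, resolved: dict[str, str]) -> None:
        """Refuse a resolution that would install or upgrade a provider as an
        ordinary dependency, bypassing the explicit registration step."""
        clash = sorted(set(resolved) & set(self._providers))
        if clash:
            raise ValueError("providers cannot be ordinary dependencies: " + ", ".join(clash))
```

The check runs after resolution, so a package that lists a newer provider version among its dependencies fails loudly instead of silently upgrading code that the package manager itself executes.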

And both of these require a change in behaviour from PyPI. I find it a little strange that the PEP doesn’t make it clear that the requirements of the proposal alter a fundamental and well-known feature of PyPI, that files are immutable. I’m willing to assume that it was an oversight on the part of the authors, rather than a deliberate omission, but it doesn’t inspire confidence that the implications of the PEP have been thought through fully.

This is what I find most frustrating about the PEP and the discussion. There’s very little clear explanation of what changes are being required of tools and services. It’s all buried in implications and options, and it’s left to the reader to work out what it all means. When we do get clear proposals of what tools must do (for example this, and the idea that installers are expected to vendor provider plugins) the requirements turn out to be pretty difficult to accept.

As a technical point, can we not frame all of these suggestions in terms of pip’s UI? The same considerations apply to uv, PDM and Poetry, and it would be helpful if we had some visibility of the fact that we need a UI for all of these installers, not just for pip. I appreciate it’s not always easy - for example, I don’t know the syntax for Poetry or PDM - but someone will need to work out such details before the PEP can be accepted.