PEP 825: Wheel Variants, Package Format (split from PEP 817)

Following the suggestions given in the previous PEP 817 thread, we have decided to split PEP 817 into a series of smaller PEPs, in the hope that this will make the concept easier to comprehend and discuss. We have opened a pull request for the first PEP in the series, focusing on the changes to the wheel file itself.

Pull request: PEP 825: Wheel Variants: Package Format by mgorny · Pull Request #4819 · python/peps · GitHub
Preview: PEP 825 – Wheel Variants: Package Format | peps.python.org

For a more high-level introduction to the topic, please see the previous introductory post. This PEP specifically focuses on the low-level details necessary for variant wheels to work, namely:

  • adding variant label to the filename
  • storing variant properties in the file
  • exposing variants on the index
  • ordering/selecting variants
  • introducing variant-conditional dependencies via environment markers
  • exposing variant wheels in pylock.toml
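To make the first bullet concrete, here is a minimal sketch of how a tool might split a wheel filename that carries a variant label as an extra trailing component (as in the `...-{platform tag}-{label}.whl` examples later in this thread). It assumes, for simplicity, that the wheel has no build tag, and it models the label rule on the one discussed below (1 to 16 characters of lowercase letters, digits, `.` and `_`); `parse_wheel_filename` and `WheelName` are hypothetical names, not part of any existing library.

```python
import re
from typing import NamedTuple, Optional

# Assumption: label rule modelled on the thread's discussion
# (1-16 chars of lowercase letters, digits, "." and "_").
LABEL_RE = re.compile(r"^[0-9a-z._]{1,16}$")


class WheelName(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str
    variant_label: Optional[str]  # None for an ordinary (non-variant) wheel


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename, treating a sixth component as the variant label.

    Simplification: build tags are not handled here.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        return WheelName(*parts, variant_label=None)
    if len(parts) == 6:
        label = parts[5]
        if not LABEL_RE.fullmatch(label):
            raise ValueError(f"invalid variant label: {label!r}")
        return WheelName(*parts[:5], variant_label=label)
    raise ValueError(f"unexpected number of components in {filename!r}")


print(parse_wheel_filename("numpy-2.3.2-cp313-cp313t-musllinux_1_2_x86_64-x86_64_v3.whl"))
```

A real implementation would also have to disambiguate a build tag from a variant label when both could appear in the sixth position.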

The PEP keeps variant properties abstract, deferring their governance and the determination of their compatibility to a subsequent PEP, along with building wheels. We’ve also significantly cut down the motivation (the original is kept in PEP 817 for reference). We’ve tried to make the “specification” part easier to comprehend, and removed the duplicate “rationale-overview” in favor of a more focused “rationale” section.

Compared to the previous iteration of PEP 817, we’ve also corrected the variant ordering algorithm to handle corner cases better.

We’d really appreciate your feedback, both concerning the technical aspects of the proposal, as well as anything that needs to be clarified or explained better.

We haven’t decided on the exact content of the next PEPs yet. We feel like this new PEP is at the right level of granularity, but we’d like to learn from the feedback before deciding on the next ones.


Thank you for the work you’ve done splitting out this part of the spec. This PEP is a huge improvement over the previous one. I’ve read it all (although I had to skim the second half, as I didn’t have time for a full read right now) but my first reaction is that this is a very reasonable and well-written proposal.

I’ll need to go through it in more detail, and I’m sure questions will arise in the subsequent discussions, but it looks like a solid PEP at this point.


If multiple related subsequent PEPs are planned, it might make sense to reserve consecutive numbers early; that makes it far easier to refer back to them. See PEPs 634, 635 and 636.


Small question

It is unclear to me whether, in *.dist-info/variant.json and in the index-level {name}-{version}-variants.json, I am required, allowed, or not allowed to list a variant named null (assuming I publish a null variant).
For the in-wheel file, the PEP says it should contain the label of the variant wheel it’s in, and for the index-level file, it says it should contain all variants (which I’d think includes the null variant).


In #variant-ordering you specify:

For every namespace, the tool MUST obtain an ordered list of compatible features, and for every feature, a list of compatible values.

The list of values should also be ordered.
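To illustrate why the values need an order too, here is a deliberately simplified sketch of default ordering built from ordered namespaces, features and values. It is not the PEP’s algorithm (which was revised to handle corner cases); it only shows one plausible way to turn those three orderings into a sort key. The namespaces (`cpu`, `blas`), features and labels are hypothetical, and properties are assumed to be `(namespace, feature, value)` triples.

```python
import math
from typing import Dict, List, Optional, Tuple

Prop = Tuple[str, str, str]  # (namespace, feature, value): assumed shape
END = (math.inf,)  # sentinel that sorts after any real property rank


def property_rank(prop: Prop, ns_order, feature_order, value_order) -> Optional[tuple]:
    """Rank of one property, or None if its namespace/feature/value is incompatible."""
    ns, feature, value = prop
    try:
        return (
            ns_order.index(ns),
            feature_order[ns].index(feature),
            value_order[(ns, feature)].index(value),
        )
    except (ValueError, KeyError):
        return None


def order_variants(variants: Dict[str, List[Prop]], ns_order, feature_order, value_order) -> List[str]:
    """Return compatible variant labels, best first (simplified illustration)."""
    keyed = []
    for label, props in variants.items():
        ranks = [property_rank(p, ns_order, feature_order, value_order) for p in props]
        if any(r is None for r in ranks):
            continue  # drop variants with any incompatible property
        # Best property first; END makes a variant with an extra compatible
        # property sort before an otherwise identical variant lacking it.
        keyed.append((sorted(ranks) + [END], label))
    keyed.sort()
    return [label for _, label in keyed]


ns_order = ["cpu", "blas"]
feature_order = {"cpu": ["level"], "blas": ["lib"]}
value_order = {("cpu", "level"): ["v3", "v2"], ("blas", "lib"): ["openblas"]}
variants = {
    "v2": [("cpu", "level", "v2")],
    "v3": [("cpu", "level", "v3")],
    "v3_openblas": [("cpu", "level", "v3"), ("blas", "lib", "openblas")],
    "v3_mkl": [("cpu", "level", "v3"), ("blas", "lib", "mkl")],
    "null": [],
}
print(order_variants(variants, ns_order, feature_order, value_order))
# -> ['v3_openblas', 'v3', 'v2', 'null']
```

Without an order over values, the sketch could not decide between `v3` and `v2`, since both use the same namespace and feature.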

Thank you for all the work on splitting out this PEP from 817. I found it much easier to follow, and I understand the proposal better now.


I agree with the previous posters that this new version is much clearer! I especially like the variant markers section, it should make it much easier to select appropriate wheels for specific systems.

I’ve got some questions about the variant.json which I think all stem from the requirement that it be frozen, which seems to lack a clear justification (which should probably be added):

  1. What should happen in the presence of multiple indices, especially if the variant.json differs between them (e.g. you have PyPI and piwheels)?
  2. For locally-built wheels, what should the variant.json have (if anything)?
    1. A follow-up is how this would interact with private/non-PyPI indices where there are additional builds outside of the set that maintainers have uploaded to PyPI (e.g. for newer hardware/baselines or where PyPI is not yet ready for an architecture for which multiple variants would make sense; I’m thinking here about RISC-V, where there seem to be lots of different profiles, some of which may be “weaker” than the riscv64 manylinux baseline).
  3. Why is the ordering per package, rather than a global setting? I would imagine for features like BLAS you want to be able to configure this globally (and block specific variants like mkl) rather than leave this to the package maintainer.
  4. Why lock the hash of the variant.json, rather than including the contents directly within the lock file (which would mean the user of the lock file would not care if the variant.json file on the index changed)?

If a tool supports installing wheels from two registries for the same version of a wheel, it should also select the variants.json from the registry of the installed wheel. Or even simpler: If you merge two registries internally, also merge their variants definitions. There are a lot of implementation questions involved, but they can all be solved on the tool level.
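One possible tool-level policy for merging variant definitions from two indices is sketched below: take the union of labels, and refuse to proceed if the same label is defined differently. It assumes the index-level file has a top-level `"variants"` object mapping each label to its properties (nested as namespace → feature → list of values); the `cpu` namespace and `merge_variant_files` are hypothetical.

```python
from typing import Any, Dict


def merge_variant_files(*documents: Dict[str, Any]) -> Dict[str, Any]:
    """Union the "variants" objects of several index-level files.

    Raises if two indices define the same label with different properties.
    """
    merged: Dict[str, Any] = {}
    for doc in documents:
        for label, props in doc.get("variants", {}).items():
            if label in merged and merged[label] != props:
                raise ValueError(f"conflicting definitions for variant {label!r}")
            merged[label] = props
    return {"variants": merged}


primary = {"variants": {"null": {}, "v3": {"cpu": {"level": ["v3"]}}}}
secondary = {"variants": {"v3": {"cpu": {"level": ["v3"]}}, "rva23": {"cpu": {"profile": ["rva23"]}}}}
print(sorted(merge_variant_files(primary, secondary)["variants"]))
```

Raising on a conflict is just one choice; a tool could instead prefer the file from the index that served the selected wheel.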

Not for PEP 517 builds (for now, to not break tools), but if you explicitly request a variant, you can build one. Building is for a future PEP :wink:

Is that different from registry merging?

There are two different levels of selection: the per-package ordering is the default; it applies if you just do a pip install foo, and it ensures that the same best wheel is installed for everyone, allowing maintainers to express preferences. Globally, you can exclude mkl (it’s up to tool authors what exactly they want to support here), and this will apply to all tool resolutions.
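The two levels described above compose naturally: keep the package’s default order, then drop any variant that uses a globally excluded property. A minimal sketch, assuming properties are nested as namespace → feature → list of values; the `blas`/`cpu` namespaces, the exclusion set and `apply_global_exclusions` are hypothetical and do not reflect any existing tool’s configuration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Props = Dict[str, Dict[str, List[str]]]  # namespace -> feature -> values (assumed)


def apply_global_exclusions(
    ordered_labels: Iterable[str],
    variants: Dict[str, Props],
    excluded: Set[Tuple[str, str, str]],
) -> List[str]:
    """Keep the package's default order, dropping variants that use an excluded property."""
    result = []
    for label in ordered_labels:
        props = {
            (ns, feature, value)
            for ns, features in variants[label].items()
            for feature, values in features.items()
            for value in values
        }
        if props.isdisjoint(excluded):
            result.append(label)
    return result


variants = {
    "v3_mkl": {"cpu": {"level": ["v3"]}, "blas": {"lib": ["mkl"]}},
    "v3_openblas": {"cpu": {"level": ["v3"]}, "blas": {"lib": ["openblas"]}},
    "null": {},
}
print(apply_global_exclusions(["v3_mkl", "v3_openblas", "null"], variants, {("blas", "lib", "mkl")}))
```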

This isn’t a strong decision, but lockfiles can already be large and this file felt too large. For example, exporting the airflow workspace to pylock.toml results in a 1.8MB lockfile with 10k lines, the size of this is already a bottleneck. On the flipside, this of course means additional network requests on a cold cache.
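The trade-off is easy to see in code: the lock file carries a fixed-size digest instead of the file contents, and the installer re-fetches the file and checks it against that digest. A minimal sketch, assuming the hash is a SHA-256 over the raw bytes as served by the index; how and where the digest is recorded in pylock.toml is not shown, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the raw bytes of an index-level *-variants.json file."""
    return hashlib.sha256(data).hexdigest()


def check_against_lock(data: bytes, locked_digest: str) -> None:
    """Refuse to use a variants file whose bytes no longer match the locked hash."""
    actual = sha256_hex(data)
    if actual != locked_digest:
        raise ValueError(f"variants file changed since locking: expected {locked_digest}, got {actual}")


original = b'{"variants": {"null": {}}}'
digest = sha256_hex(original)
check_against_lock(original, digest)  # passes: bytes unchanged
print(len(digest))  # 64 hex characters, however large the file is
```

This is also why the PEP asks for the index-level file to stay frozen: any change, even reformatting, breaks existing locks.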


I prefer the current decision in the PEP and agree with this that lockfiles can already be large enough to bottleneck. I think taking the small network hit on cold cache is an easier price to pay over a bottleneck for lockfile size.

The idea is that if you publish a null variant, then its *.dist-info/variant.json has a "variants": {"null": {}} entry, and therefore the index-level metadata has such an entry as well. If you don’t publish it, then it’s technically valid to include it in the index-level metadata (as the spec doesn’t prohibit listing variants you didn’t end up publishing or removed somehow), but normally you won’t. In other words, it works the same as any other variant, except for the explicit requirement that this label be used for an empty set of properties.

Do you think we need to make that clearer in the PEP? I think the current rules imply it, but it may indeed be better to make it even more explicit. By “imply”, I mean:

  1. Every variant wheel must have a label and a metadata file.
  2. The in-wheel metadata file must have a variants object with its label-to-property mapping.
  3. The index-level metadata file must have a variants object for all variants published.
  4. A variant with zero properties must have a null label.
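The four rules above translate into a quick consistency check. A minimal sketch, assuming both files have a top-level `"variants"` object mapping labels to (possibly empty) property sets; it enforces only the rules as listed, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable


def check_wheel_variant_metadata(label: str, metadata: Dict[str, Any]) -> None:
    """Rules 1, 2 and 4: the wheel's label is described, and empty sets use 'null'."""
    variants = metadata.get("variants")
    if not isinstance(variants, dict):
        raise ValueError("variant.json must contain a 'variants' object")
    if label not in variants:
        raise ValueError(f"label {label!r} from the filename is missing from 'variants'")
    for name, props in variants.items():
        if not props and name != "null":
            raise ValueError(f"variant {name!r} has no properties, so it must be labelled 'null'")


def check_index_metadata(published_labels: Iterable[str], index_metadata: Dict[str, Any]) -> None:
    """Rule 3: every published variant appears in the index-level file."""
    missing = set(published_labels) - set(index_metadata.get("variants", {}))
    if missing:
        raise ValueError(f"index-level metadata lacks: {sorted(missing)}")


check_wheel_variant_metadata("null", {"variants": {"null": {}}})  # valid null variant
check_index_metadata(["null"], {"variants": {"null": {}}})        # valid index file
```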

Made that explicit, thanks!

Perhaps this is something that needs to be clarified in the PEP, but the algorithm covers the default ordering, i.e. in the absence of user preferences. We’re trying not to impose specific UI considerations on tooling, but the general assumption was that tools can provide ways to override the ordering or let users select specific variants.

I’m slightly confused by this statement. Are we talking about variant.json file in wheel, or index-level {name}-{version}-variants.json?

If the former, then it’s as “frozen” as the wheel itself.

If the latter, then the specification does not technically require it to be frozen (as this is not something we can really enforce with custom indexes). However, we do suggest that it’s frozen, and in the same sentence we give the justification:

The file should not be changed once it is published, as clients may have already cached it or locked to the existing hash.

From what I gather, the ordering in a namespace is global, but packages select which namespace is more important and can override (parts of) the ordering within each namespace for the selection of their variants.

From the 2 other responses it seems that a tool is allowed to let you override the ordering that the package has selected, but that the PEP as written seems to contradict that.

I think this should be clarified, yes.

Extracted from a review thread on the PEP PR:

I’m personally more concerned about unreadable variant names than slightly longer wheel names, so I thought I’d bring this up for wider visibility (I’m not advocating for no limit either of course; but 32 instead of 16 for example).

Otherwise, given that we already run out of space with 16 characters for “simple” examples (combining 2 axes), projects that run into requiring more axes will have to – effectively – reinvent the variant “hashing” that was already ruled out too as user- and maintainer-unfriendly.


The PEP currently says

I’d also consider disallowing variant labels of 1 character, and perhaps those starting or ending with . or _ as well. Otherwise, the following would all be quite non-obvious variants when looking at the wheel names:

...-{platform tag}-x.whl
...-{platform tag}-_.whl
...-{platform tag}-..whl
...-{platform tag}-.a.whl
...-{platform tag}-1_.whl

The regex could then be ^[0-9a-z][0-9a-z_.]{0,30}[0-9a-z]$ or similar.
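The effect of the proposal can be checked directly against the examples above. In this sketch, `CURRENT` is an assumption modelled on the rule discussed in this thread (1 to 16 characters of lowercase letters, digits, `.` and `_`), while `PROPOSED` is the regex from the post, used verbatim; `re.fullmatch` is used so that a trailing newline cannot sneak past `$`.

```python
import re

# Assumption: the draft rule as discussed in the thread.
CURRENT = re.compile(r"^[0-9a-z._]{1,16}$")
# The stricter alternative proposed above: 2-32 chars, alphanumeric at both ends.
PROPOSED = re.compile(r"^[0-9a-z][0-9a-z_.]{0,30}[0-9a-z]$")


def compare(labels):
    """Map each label to (accepted by CURRENT, accepted by PROPOSED)."""
    return {l: (bool(CURRENT.fullmatch(l)), bool(PROPOSED.fullmatch(l))) for l in labels}


for label, result in compare(["x", "_", ".", ".a", "1_", "x86_64_v3", "x86_64_v3_openblas"]).items():
    print(f"{label!r:22} current={result[0]!s:5} proposed={result[1]}")
```

All five examples from the post are accepted by the 16-character rule and rejected by the proposal, while the 18-character `x86_64_v3_openblas` flips the other way.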


Even if I share the intent of making the variant label readable and understandable, I don’t believe this proposal is a good idea.

It’s making the regex significantly more complicated (which makes it harder to understand) and it’s not a good “fix” for the problem you highlight. There are millions of ways to make the variant-label hard to read: ["a.......", "aaaaaaaaa", "a_________", "a._._._"].

Honestly at some point if a package maintainer wants to make it unreadable, the only people they frustrate is their own users. Common sense would tell you not to do that …

If people wanted to, they could already use ridiculously long package names or repeated letters, and I don’t believe that is a widespread “dark pattern” causing problems in the ecosystem. We didn’t put a complex regex in place to allow or reject package names either. There is one, and it tries to be reasonable.

The proposal you made just feels to me like “making things harder while not really fixing or significantly reducing the problem in any meaningful way”, just moving the goal post a few inches/cms.

Might be wrong, but I really don’t think this is a positive trade-off here.
Yes, people shouldn’t do that (for fairly obvious reasons), but if they do, well … it’s on them, and people will file GitHub/GitLab/etc. issues.

Maybe the PEP could add something along the lines of “package maintainers SHOULD use variant labels that carry meaning for their audience”. And I would probably leave it at that.


If that’s the only worry, then I have no problem with that impact. The regex is for the spec and maintainers, it can make sense to pay the “price” of it being a bit more complicated[1], in order to achieve more user-visible benefits elsewhere.

People can and will disagree about the trade-offs :smile:

I know it’s arbitrary, but I’ll still note that 3 out of the first 4 “bad” examples you reached for would be rejected under my proposal. :slight_smile:
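The “3 out of 4” count is easy to verify with the proposed regex (copied verbatim from the earlier post):

```python
import re

PROPOSED = re.compile(r"^[0-9a-z][0-9a-z_.]{0,30}[0-9a-z]$")

examples = ["a.......", "aaaaaaaaa", "a_________", "a._._._"]
rejected = [label for label in examples if not PROPOSED.fullmatch(label)]
print(rejected)  # ['a.......', 'a_________', 'a._._._']
```

Only `aaaaaaaaa` passes, because it starts and ends with an alphanumeric character; the other three end in `.` or `_`.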

My point was less about intentional obfuscation though (I agree that this cannot be prevented), but more about making the “variantness” in the wheel name harder to overlook accidentally.

Sounds reasonable in any case!


  1. BTW, I think even my longer variant is super-simple for anyone who has had to deal with regexes more than superficially.

I think if we settle on “simple enough and common sense” (which is essentially what you are proposing), i.e. it has to start and end with a letter or number, maybe that’s reasonable enough, with a strong recommendation of “please be reasonable …”. If people don’t want to be reasonable… well, you said it yourself: there’s nothing you or we can do.

Now, a total length of 30 is way too long; it’s going to cause trouble (Windows has pretty strict requirements for maximum path length).

In general yes, but if we specify something like the regex I proposed to validate, then

...-{platform tag}-_.whl
...-{platform tag}-..whl

could be rejected from ever getting uploaded, which is IMO one corner-case less that other various tools (much less humans) have to deal with.

OK, with Windows mentioned I can finally see a rationale emerging. Windows paths need to be <=260 characters including all containing folders (though that limitation can be removed from a system by anyone with admin access).

Taking an example from the PEP and expanding the variant label

>>> len("numpy-2.3.2-cp313-cp313t-musllinux_1_2_x86_64-x86_64_v3_openblas.whl")
68

I really don’t see how we’re close to problematic lengths here, even allowing for some existing folder hierarchy.
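Since the limit applies to the full path, a fairer check adds the containing directory to the filename length. A minimal sketch using the 260-character figure cited above; the directory is a hypothetical example, and files unpacked from the wheel will have their own, usually longer, paths that this does not account for.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Win32 limit cited in this thread


def path_headroom(directory: str, filename: str) -> int:
    """Characters left before the full path reaches MAX_PATH."""
    full = PureWindowsPath(directory) / filename
    return MAX_PATH - len(str(full))


wheel = "numpy-2.3.2-cp313-cp313t-musllinux_1_2_x86_64-x86_64_v3_openblas.whl"
print(path_headroom("C:\\work", wheel))  # 260 - (7 + 1 + 68) = 184
```

Raising the label cap from 16 to 32 characters would cost at most 16 of that headroom per filename.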

As someone who has windows systems and has run into these problems, this is something to get used to / work around[1], or bite the bullet and enable long paths. It does not seem like a strong enough constraint to me to justify capping variant labels at a length that’s not even expressive enough for minimal examples.


  1. e.g. move higher up in the folder hierarchy for whatever operation you want to do


I can guarantee you this is an existing problem. It’s not just the filename, it’s the full path.

uv and poetry folks have bumped into this a few times actually.

Personally, I don’t think going much beyond 16 is safe … although I don’t have direct personal experience with the matter. But 16 is enough IMHO, and reasonable without challenging the Windows gods to a staring contest.

I understand that it’s about the full path.

The problem exists with or without variants; it suffices for users to have deeply nested folder hierarchies. Such users will continue to run into problems regardless; it’s not a reasonable design constraint. Besides, library maintainers who get bugs about this can publish shorter labels. It’s not the role of the spec to over-constrain this based on such an amorphous limit.

As a professional abyss gazer (for windows and otherwise): bring it on :rofl:
