PEP 825: Wheel Variants, Package Format (split from PEP 817)

I understood what you’re proposing. What I’m saying is that, given that everything else is allowed to be mutable and to be added to over time without any sort of up-front declaration, doing that is a confusing footgun for people. It’s going to go against all of their expectations.

Adding constraints like that needs to be done holistically, in a way that actually makes sense in a big picture, not just slapping it on a random piece because that happened to be the last piece that was added.

I also don’t believe it actually allows any sort of optimization to take place, given that you still can’t assume other platform tags aren’t going to be used. At least I’m not able to think of any optimization where knowing that a different variant (and again, specifically variants) might be used in the future is actually helpful, but it does impose at least somewhat of a cost.

It’s not even something that is technically possible to require generically. An installer can’t go backwards or forwards in time to see what used to be there or what is there now. I honestly even think the wording in the current PEP is probably too strong here, and the requirement should mostly be around making sure that the current state of the index is consistent (e.g. you can’t have 2 wheels for the same (project, version) that try to define a given variant label differently).

On PyPI that would naturally imply that you can’t change the definition of a label, because deleting files on PyPI is unusual and you can’t re-use a filename once you do delete it, which makes it even less likely that people are going to delete them.

On personally managed indexes it might mean that you redefine those labels constantly (particularly as you’re first implementing variants for a project!), but that’s OK because those are your personally managed indexes and you can already do pretty much anything you want.

I’ve just opened a PR with two clarifications (related to the discussion here):

Long story short:

  1. We clarify that the values in variants are sets (that get converted to lists as a JSON limitation), and mandate that they are sorted, so that implementations can use simple == comparison on the whole dictionary without having to convert values to sets.
  2. We update the rules for merging variant metadata from individual wheels into index-level metadata to use a simpler “the result is the same irrespective of the order”. This should be much easier to follow than the previous “resolution results within a subset of variants do not change”.
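As a rough illustration of both points, here is a minimal sketch. The nested layout (label, then namespace, then feature, then values), the sample names `x86_64` and `level`, and the helpers `normalize` and `merge_variants` are assumptions made for this example, not text taken from the PEP.

```python
def normalize(variants):
    """Return a copy in which every value set is stored as a sorted list."""
    return {
        label: {
            namespace: {
                feature: sorted(set(values))
                for feature, values in features.items()
            }
            for namespace, features in namespaces.items()
        }
        for label, namespaces in variants.items()
    }


def merge_variants(*per_wheel):
    """Merge per-wheel variant dictionaries (assumed rule: one definition per label)."""
    merged = {}
    for variants in per_wheel:
        for label, properties in normalize(variants).items():
            # A label may come from several wheels, but only with one definition.
            if merged.setdefault(label, properties) != properties:
                raise ValueError(f"conflicting definitions for label {label!r}")
    return merged


# Hypothetical metadata from two wheels; the first one is not sorted yet.
a = {"x86_64_v3": {"x86_64": {"level": ["v3", "v2", "v1"]}}}
b = {"x86_64_v3": {"x86_64": {"level": ["v1", "v2", "v3"]}}, "cpu": {}}

print(merge_variants(a, b) == merge_variants(b, a))
```

Once the lists are sorted, plain `==` on the dictionaries is enough, and the merged result does not depend on which wheel is read first.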

@pf_moore

Hi all, we - the authors - would like to request pronouncement on PEP 825. It feels like the discussion has largely converged, the sharp edges have hopefully been sanded down, and the PEP seems ready to roll toward a decision.

I’ll take a look and let you know my decision. For personal reasons, my time is limited right now, so apologies in advance if it takes a little longer than usual to do so.

Fantastic! Thanks a lot, Paul; we appreciate the help.

I’ve started looking at the PEP. At the moment this doesn’t constitute a formal response, but it is flagging some initial concerns I have that will factor into my ultimate decision. For now, I’ve only focused on the change to the wheel filename.

Like it or not, the new filename is not backward compatible. There is no formal constraint that a platform tag can’t be numeric, so a wheel foo-1.0-1-none-any.whl is valid. Add a variant label and you get foo-1.0-1-none-any-var.whl, which will be interpreted incorrectly. And yes, I checked this:

>>> import packaging.utils
>>> packaging.utils.parse_wheel_filename("foo-1.0-1-none-any-var.whl")
('foo', <Version('1.0')>, (1, ''), frozenset({<none-any-var @ 1209214139968>}))

This is simple to fix - the PEP just needs to require that platform tags MUST begin with a non-numeric character (which is fine, because all existing tags follow that rule). But it needs to be made explicit, both in the PEP and in the updated specs once the PEP is accepted[1].
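The ambiguity can be reproduced without the packaging library. The sketch below is a deliberately naive dash-counting parser (the name `naive_parse` is made up for this example and is not how packaging is implemented); it assumes that a six-component name carries a build tag, which is the assumption the variant label breaks.

```python
def naive_parse(filename):
    """Split a wheel filename by counting dashes, as a lot of ad hoc code does."""
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # Assumption baked into pre-variant code: a sixth component is a build tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": platform,
    }


plain = naive_parse("foo-1.0-1-none-any.whl")
with_label = naive_parse("foo-1.0-1-none-any-var.whl")
print(plain["build"], with_label["build"], with_label["platform"])
```

With this parser the first name has no build tag, while the second is read as build `1` with `var` in the platform slot, mirroring the misparse shown above.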

There’s a more general point here as well - the “backward compatibility” section needs to be a lot more thorough on the impact of this change. The point of that section is not to claim that everything will be fine, but rather to fairly assess what issues will arise from the change, and establish that the disruption caused is acceptable, ideally discussing how the transition will be managed. For example:

  1. Like it or not, code will parse wheel filenames by counting the dashes. The packaging library does. I know I’ve written lots of code like that. And I don’t check the build number is valid[2]. My code will break. Probably not badly, but that’s not the point. The point is, is it worth breaking that code to implement this PEP? Do the benefits justify it? Probably[3].
  2. The packaging function parse_wheel_filename will need to change its signature which returns the parts of the wheel filename, because it might now have to return a variant label. That’s a breaking change to a public function. What’s their compatibility policy? How do they want to handle that API change? Do they have any idea how widely that function is used?
  3. What about the transition period? Will tools that are not prepared to handle variant wheels need to pin their dependency on packaging when the updated version which does handle variant wheels gets released? Will a delay in implementing variants in packaging cause a knock-on delay in adopting the PEP? Should a PR for packaging be a prerequisite for adoption of the PEP?

To be clear, I’m fine with the solution adopted in the PEP. But the questions above need to be considered when approving the PEP, and IMO it’s the responsibility of the PEP authors to consider those questions, and document their conclusions in the PEP, so that I don’t have to :slightly_smiling_face:

To summarise, there are two concrete actions here:

  1. Add “platform tags must not start with a digit” to the PEP.
  2. Update the backward compatibility section to discuss the breakage the filename change will cause. Ideally without making the section a lot longer - most of the paragraph discussing the filename compatibility can simply be replaced, as once you discuss what breakage could occur, there’s no longer any real need to make the point about “most of the time, it’ll be OK”.

  1. Updating the specs will be a non-trivial task - the existing specs are frustratingly vague, and incorporating PEP 825 without accidentally introducing new constraints, like the one about platform tags, will be hard. Someone should be planning what’s needed there, possibly even starting to prepare a PR for the specs repo right now. ↩︎

  2. Nobody actually cares about build numbers :slightly_smiling_face: ↩︎

  3. Although there’s an argument that we should design an extensible change, so we don’t have to go through this again for the next change, and the benefits are that much higher as we’ve “solved the problem once and for all”. I’d accept the response “that’s too complicated to tackle now” for that one, though. ↩︎

Thanks a lot, Paul. Give us a few days to send a PR to the PEP. We’ll tag you on the PR so that you can review the changes.

I’ve done my best to address your requests in python/peps pull request #4890, “PEP 825: Address filename compatibility concerns from DPO”.

I’ve started the dialog with packaging authors in pypa/packaging issue #1148, “`parse_wheel_filename()` vs. wheel variants (PEP 825)” (which is also linked within the updated PEP). Long story short, the main idea is that the current API will reject variant wheel filenames (as it does now), and support for variant wheels will involve some kind of opt-in: either using a new API, or passing a flag that says “I support variant label”.

Next stage of my review (again, apologies this is coming in small chunks over an extended period - real life concerns are limiting my available time). I’ve now read through the whole PEP, so this should cover any significant questions from the PEP text itself. I still have to read the discussions on DPO - wish me luck! :slightly_smiling_face:

For clarity, I won’t be reviewing discussions anywhere other than DPO - if there’s any important comments elsewhere, people should copy them here ASAP.

Index metadata

The location of the index level metadata is described as follows:

The exact URL where the file is hosted is insignificant, but a link to it MUST be present on all index pages where the variant wheels are linked.

This isn’t at all clear to me. If I’m looking at the JSON based index format for a project X, how do I find the URL for the X version 1.0 variants file, x-1.0-variants.json? The text “a link to it MUST be present on all index pages where the variant wheels are linked” doesn’t give any indication of how that would be done.

The quoted text seems to have been written on the assumption that the HTML index would be used, but even then it’s not sufficiently well specified. For example, is the variant file allowed to be yanked? The spec (which says “a repository MAY include a data-yanked attribute on a file link”) would imply yes - because this is a file link. Similarly, all of the other attributes allowed for file links are technically valid on a variant file (although apart from yanking, they are pretty clearly pointless or inappropriate, and so unlikely to be a problem[1]).

Variant indexes MAY elect to either auto-generate the file from the uploaded variant wheels or allow the user to manually generate it themselves and upload it to the index.

While it’s clearly intended, the PEP doesn’t explicitly note that indexes which do not auto-generate the variant index MUST block uploads of wheels that use variants not in the index, and MUST block changes to the index which remove variants that are still in use by wheels in the index.

As an example, I can imagine a simple index proxy which presents PyPI to the user, but allows supplementing it with extra wheels held locally, could be caught out by this requirement.

Acceptance process

As a heads up, I intend for this PEP to be provisionally accepted. In general, I’m reluctant to use provisional acceptance as a tool - it’s caused problems in the past - but in this case the PEP is basically useless without follow-up PEPs that define uses for the variant system. Therefore, a key requirement will be to have a process by which we can reject the variant format after the fact, if we are unable to agree on any follow-up PEPs. Provisional acceptance gives us that.

One particular concern here is the same issue that caused PEP 708 to get rejected. In order to be of use, the variant mechanism needs to be commonly available (i.e., not limited solely to PyPI). We can’t expect key packages like scipy or pytorch to adopt variants if that locks them into only being served from PyPI. And more importantly, we can’t expect users to accept such a lock-in. Specifically, commercial providers like Gitlab, Azure, and Artifactory need to be considered, as well as mirroring solutions like devpi. And another issue that came up in the PEP 708 rejection discussions - many users rely on the ability to serve a YOLO local index by downloading packages from PyPI and serving them with python -m http.server. If (for example) key wheels like numpy/scipy and pytorch started using variants, that strategy would no longer work (as the necessary wheel variant files would be missing). The PEP needs to address this issue.

While I acknowledge that this PEP is solely about the package format, I would like to see a document somewhere that walked through the process of implementing a trivial variant - essentially a sort of “Hello, world” example. In order to avoid needing to address the complexities of platform detection, I’m thinking of something like a variant that is only compatible if the install is being done on a Tuesday. That’s not a requirement of this PEP, but it should be available after this PEP is accepted, and before work starts on any of the follow-up PEPs, to demonstrate how the mechanism as a whole works.

Variant ordering

The “variant ordering” section (step 4) says “For every compatible variant”. This confused the heck out of me, as the PEP doesn’t define what the unqualified term “variant” means. Is it “variant namespace”? Or “variant wheel”? Or something else? The algorithm should be updated to carefully avoid using undefined terms (or alternatively, the PEP should formally define what an unqualified term “variant” means, and use it consistently in the text).

As a more general point, I find it very hard to get an intuitive “feel” for the ordering algorithm from the algorithm description here. Having some examples would be beneficial, IMO (especially examples that illustrate any corner cases that exist).

Lock files

Do we have any explicit feedback from lock file users on the way the PEP interacts with lockfiles? I’m concerned that the installation process for lockfiles was explicitly designed to be a single-pass mechanism with no resolution or other complex logic needed. The requirement to resolve variants seems contrary to that design. I know the PEP says tools SHOULD resolve variants, but it doesn’t say what they can do if they don’t resolve variants :slightly_frowning_face:

The pylock.toml includes a (non-normative) section describing how to install from a lockfile. PEP 825 should probably describe precisely how that section of the spec would change when variants are present.

The use case of auditing a lock file to establish what files will be installed should be considered - variants have the potential to significantly complicate that process (especially if variant compatibility involves an arbitrarily complex detection process that isn’t transparent to the auditor).

Installing

I’m assuming that for installation from a set of local wheels (pip’s --find-links option), something like the “installing from a package index” logic should be used, but with the variant data being read from the wheels themselves.

Has any consideration been given to installing from multiple sources? For example, via --extra-index-url, or having --find-links for some wheels, with the rest coming from PyPI? Will installers have to implement some sort of variant properties file merging process to support this? Has that mechanism been prototyped anywhere?


  1. Although if I wanted to be difficult, I could make a case that the spec says that clients that read the variant file must respect any requires-python attribute attached to the file… ↩︎

Not one of the PEP authors, but I would presume this would fall under the same logic as checking supported manylinux versions or OS or Python version?

Let me give some immediate answers, before we discuss how to change the PEP.

Yes, the text has been written with the assumption of the HTML index format. The intent was basically to integrate the additional file with no changes to the index format itself, so that it could be published with absolutely minimal changes to the existing tools[1]. So the sentence could be read as “include it in the same way as you’d any wheel.” But you’re right that we’ve never really considered how this interacts with all the extra metadata fields that could be present for wheels.

The general assumption for the “not auto-generating” use case is that we’re either dealing with a “dumb” index[2], or we’re dealing with an index but we do not want to force the index maintainers to go out of their way to support variants[3]. So while this is something that indexes MAY (or maybe even SHOULD) do, it’s not something that clients can rely on, i.e. from the client’s PoV you need to be prepared to see wheels whose labels are not covered by the JSON file (hence “If any of the labels present in wheel filenames are missing in the file, assume that the respective wheels are incompatible.”).
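On the client side, the quoted rule could look roughly like the sketch below. The function name `usable_wheels` and the input shape (a mapping from filename to an already-extracted label) are assumptions for illustration only.

```python
def usable_wheels(wheels, index_variants):
    """Drop variant wheels whose label the index-level file does not define.

    `wheels` maps filename -> variant label (None for a non-variant wheel).
    `index_variants` is the "variants" dictionary read from the index.
    """
    return [
        filename
        for filename, label in wheels.items()
        if label is None or label in index_variants
    ]


# Hypothetical index contents: the "friday" label is missing from the JSON file.
wheels = {
    "foo-1.0-py3-none-any.whl": None,
    "foo-1.0-py3-none-any-tuesday.whl": "tuesday",
    "foo-1.0-py3-none-any-friday.whl": "friday",
}
index_variants = {"tuesday": {"example": {"day": ["tuesday"]}}}

print(usable_wheels(wheels, index_variants))
```

The wheels that survive this step still have to go through the normal compatibility and ordering checks; the sketch only covers the "missing label means incompatible" rule.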


Okay, just to clarify: we’re talking about a document that details the process per the whole design (i.e. beyond what’s specified in PEP 825), according to how it’d be defined in the subsequent PEPs? Because obviously PEP 825 alone is insufficient to create a wheel that would be installable, unless you fill in the gaps (like missing provider system) in a non-standard way.


I’m sorry about that, I think I’ve messed that up as I was trying to clarify things. The mental process creating the term was that “variant” corresponds to a single entry in the variants dictionary. It effectively meant the “variant label and its corresponding properties”, in the sense that if you have a bunch of wheels with the same variant label, you group them together and sort the whole group together. So perhaps I should change the note above the list to say something like:

For the purpose of ordering, wheels using the same variant label MUST be grouped together, and then individual groups MUST be sorted according to the following algorithm…

And then, say “group” in place of “variant”? Do you think that would make it clearer?
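To make the proposed wording concrete, here is a minimal sketch of "group first, then sort the groups". The `group_rank` callable is a placeholder for the PEP's actual ordering algorithm, which is not reproduced here, and the filenames and ranks are invented.

```python
from collections import defaultdict


def order_groups(wheels, group_rank):
    """Group wheels by variant label, then sort the groups as whole units."""
    groups = defaultdict(list)
    for filename, label in wheels:
        groups[label].append(filename)
    # Lower rank sorts first; the rank itself is a stand-in for the real algorithm.
    return sorted(groups.items(), key=lambda item: group_rank(item[0]))


wheels = [
    ("foo-1.0-cp314-cp314-win_amd64-gpu.whl", "gpu"),
    ("foo-1.0-cp314-cp314-win_amd64-cpu.whl", "cpu"),
    ("foo-1.0-cp314-cp314-manylinux_2_28_x86_64-gpu.whl", "gpu"),
]
rank = {"gpu": 0, "cpu": 1}.__getitem__

for label, filenames in order_groups(wheels, rank):
    print(label, filenames)
```

Both `gpu` wheels move together because the sort key depends only on the label, which is the point of saying "group" instead of "variant".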


Yes, that falls somewhere in between “installing from an index” and “installing a local wheel”. Do you think we need to detail that one as well in the non-normative section, or maybe add a note in the index subsection?

I don’t think so, no. I think it’s going to largely depend on how you implement multiple source support (PEP 766 comes to mind here). My first thought (that would need to be confirmed by some deeper research) is that it boils down to where the variant sorting algorithm ends up.

So in the “index priority” variant (per PEP 766 wording), you wouldn’t be merging anything at all, since only one index would end up being used per package. I suppose that the “version priority” variant may involve looking at variants from different indexes. However, in this instance I would lean towards “namespacing” rather than “merging”: you’d process their metadata separately and combine the results for the purpose of ordering. Then you can handle the case when different indexes incidentally use the same label to mean different property sets.

In an exaggerated example: let’s say there are two pytorch-2.11.0-*-gpu.whl files: one on nVidia index, other on AMD index. Both have the same version, label but different properties. You read the gpu properties for the “nVidia index wheel” and the “AMD index wheel”, sort both groups as separate entities (“nVidia index gpu variant” and “AMD index gpu variant”) and select the one that sorts better.
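A sketch of that "namespacing" idea, under loud assumptions: the index names, the property dictionaries, and the `rank` callable (a stand-in for the real ordering algorithm) are all hypothetical. The only point is that groups are keyed by (index, label), so two indexes can reuse a label without their definitions being merged.

```python
def pick_group(indexes, rank):
    """Choose the best (index, label) group without merging label definitions.

    `indexes` maps index name -> {label: properties}.
    """
    groups = [
        (index, label, properties)
        for index, variants in indexes.items()
        for label, properties in variants.items()
    ]
    return min(groups, key=lambda group: rank(group[2]))


# Both indexes use the label "gpu" for different (invented) property sets.
indexes = {
    "nvidia-index": {"gpu": {"nvidia": {"cuda": ["12"]}}},
    "amd-index": {"gpu": {"amd": {"rocm": ["6"]}}},
}


def rank(properties):
    # Pretend the local system only has an AMD GPU, so AMD properties sort first.
    return 0 if "amd" in properties else 1


print(pick_group(indexes, rank)[:2])
```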


I’ll leave the lockfile topic to @konstin . This is one thing I don’t want to have to ponder about :-).


  1. For example, it is currently possible to publish the index on anaconda.org with zero changes to anaconda’s infrastructure, though admittedly it’s a hack because they don’t verify what files are uploaded. ↩︎

  2. I.e. it’s just a directory on a HTTP server which treats all the files opaquely, and therefore cannot enforce any rules on them. ↩︎

  3. After all, if they wanted to, they may as well auto-generate the file. ↩︎

Not really. The checks you quote are yes/no - does the system support this wheel or not? Unless I’ve missed something, the variant check is to decide which of possibly multiple supported wheels is preferred, and that’s a different (and more complex) decision to make. Which is my point, installing from lockfiles is designed to avoid having to make complex decisions.

I’m not sure it’s even possible to determine at lock time which wheel to use for a given target system. Supporting multi-platform lockfiles without a resolution step was tricky in the first place - variants may have pushed that over into being impossible.

OK. I’ll say right now that I don’t consider it acceptable to make this a HTML-only feature, so you’ll have to add variant files into the JSON index schema.

To an extent, yes. But I’m assuming that nearly all of the complexity can be ignored with a sufficiently-simple variant. So:

The client determines the supported variants via

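import datetime

def supported_variants():
    # date.weekday() numbers days from Monday == 0, so 1 is Tuesday.
    if datetime.date.today().weekday() == 1:
        return ["example :: day :: tuesday"]
    return []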

Creating a “Tuesday-only” wheel is done by creating the wheel, manually adding a variant.json file, and renaming the wheel.

If you walk through an example with something this simple, I feel that it will be a useful end-to-end description of how variants work. At the moment, discussions get bogged down with debates about plugins, cibuildwheel, etc, which are important for individual PEPs, but not for understanding the fundamental flow.
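For the installer side of such a walkthrough, the compatibility check could be as small as the sketch below. It assumes (this is not stated in this thread) that a variant wheel is compatible when every property it declares is among the properties the system reports; `is_compatible` is a made-up name.

```python
def is_compatible(wheel_properties, supported):
    """Assumed rule: every property the wheel declares must be supported."""
    return set(wheel_properties) <= set(supported)


# Properties that the "Tuesday-only" wheel would declare in its metadata.
tuesday_wheel = ["example :: day :: tuesday"]

print(is_compatible(tuesday_wheel, ["example :: day :: tuesday"]))
print(is_compatible(tuesday_wheel, []))
```

The first call corresponds to an install attempted on a Tuesday, the second to any other day, when the client reports no supported properties.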

Understood - the terminology is hard to keep straight, and explaining things clearly must have been a nightmare of a job. I’m hoping that by going through with a “dumb reader” mentality, I can help you catch the cases you missed.

I’d have to see it in context to be sure, but yes, that sounds good.

A note in the index subsection should be sufficient.

You have to work with how things work now, which is (in effect) that all indexes (and other sources of files, such as --find-links) must be treated as equal and merged. You can’t, for example, rely on PEP 766 unless you want me to block PEP 825 until PEP 766 is accepted…

I don’t see any way to avoid needing a merging algorithm. Whether it’s implemented in the client tool, or in a common library like packaging, isn’t relevant - that’s an implementation detail. What matters is that every client that supports multiple indexes (including things like adhoc auditing scripts that list what wheels would be installed from a given requirements list) has to have a merging algorithm, and a variant selection algorithm[1].

Which makes me realise - the comment in “How to teach this”, “This PEP is oriented at tool authors” isn’t helpful (in fact, it avoids the question). Sure, it’s oriented at tool authors, but how do we teach this to tool authors? It’s not just experienced developers of installers and build backends who are affected - the current wheel and index specifications are simple enough that people can (and do) write adhoc scripts analysing data. Auditing is an obvious example, but people also do data analysis of PyPI, and probably many other things. These people will encounter variant wheels, and will need to know how to interpret them.

As an example, how would someone[2] who wanted to do an analysis of platforms supported by wheels on PyPI be affected by variants? First of all, there’s now the possibility of multiple cp314-cp314-win_amd64 wheels for a single project/version, which would skew the figures unless accounted for. Secondly, how would you extend that analysis to (for example) report the proportion of NVIDIA-enabled wheels? I’m not suggesting the PEP needs to explain that level of detail itself, but it should be covering things like how does someone find out what various variant properties mean (is there a registry listing all variant namespaces and where the documentation for them can be found?) It’s valid to say something like “we don’t propose offering anything like that, tool authors will have to search out variant implementations themselves” - but I’m going to look less favourably on a PEP that centralises knowledge in a few complex installers rather than supporting a broader ecosystem of adhoc tools as well as the main installers. Avoiding lock-in is a key goal for me.


  1. As well as machinery to determine what variants a system supports, but that’s an argument we’ll be having in great depth in future PEPs… ↩︎

  2. that “someone” was me, a few years back… ↩︎

I can confirm that’s how I designed it; read the lock file from top to bottom asking, “do I install this?”, and if so, which format that’s provided (e.g. wheel or sdist or which wheel, which itself is still a linear pass through the listed wheels).

Thanks.

I’m getting increasingly concerned about the added complexity that variants are adding to a number of parts of the packaging ecosystem. I’m still trying to work out what I do about this concern, but it’s definitely an issue for me.

I’m not understanding something about this lock file discussion.

Can you not already have multiple distributions (sdist + several wheels) that are valid for your platform, listed in a lock file, and your installer has to prefer one?

How are variant wheels different other than the preference is outsourced to somewhere else? (the whole point of variants)

Without variants, lockfiles are read sequentially, and the decision about each file is taken independently of other files. In particular, there’s no preference calculation - that was done at lock time.

With variants, the process isn’t yet 100% clear, because the PEP needs a bit of clarification, but as far as I can tell, you can have wheels for N different variants, of which M are compatible with the user’s system. You can’t do that selection one by one, you have to collect the M variants and then run the ordering algorithm on them to determine which one to install. Two passes - filter, then select.
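The "filter, then select" shape described here can be sketched as follows. This is speculation in the same spirit as the post: `is_compatible` and `sort_key` are placeholders for whatever the PEP ends up requiring, and the labels are invented.

```python
def choose_variant(candidates, is_compatible, sort_key):
    """Two passes: keep the compatible candidates, then pick the preferred one."""
    compatible = [c for c in candidates if is_compatible(c)]  # pass 1: filter
    if not compatible:
        return None
    return min(compatible, key=sort_key)  # pass 2: select


# N = 3 variants in the lock file, of which M = 2 work on this system.
candidates = ["cpu", "gpu_old", "gpu_new"]
works_here = {"cpu", "gpu_old"}
preference = {"gpu_new": 0, "gpu_old": 1, "cpu": 2}

print(choose_variant(candidates, works_here.__contains__, preference.__getitem__))
```

The second pass needs the whole compatible set at once, which is why the decision cannot be made wheel by wheel.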

To be clear, my original post was asking the PEP authors to clarify how variants work with lockfiles, so everything above is speculation at this point. If the authors can explain how to run a single-pass algorithm to install a lockfile, there’s no issue here. If variants require a filter then select algorithm, then this breaks an important design principle of lockfiles, and that sort of breakage needs to be justified (by something better than “we need variants, so lockfiles have to deal with that”).

I think the answer is the same?

Variants aren’t really special here: you have N wheels that are compatible with a system (which can be true today without variants), there is some preference calculated for them (again, true today without variants; a compiled wheel is preferred over a pure Python wheel), and you have to select one of them to install. Lock files encode the preference calculation at lock time so that installers don’t have to determine that at install time in that case.

That’s true with or without variants.

The differences with variants in the mix:

  • The filename alone is not enough to determine compatibility, you need the variants.json data.
  • Instead of the ordering preference being hard coded into the installer (either at install time or lock time), the variants.json data allows the project to influence the ordering preference.

I don’t know if this scales[1], but it seems to me that one possibility is:

  • list the relevant variant categories at the top of the lock file
  • the installer reads the variant section and determines which to install (this requires whatever logic is necessary to make that decision)
  • for a dependency that has variants, all the variant wheels are listed (so long as they satisfy the lock, which has to be determined by whatever tool created the lockfile)
  • as the installer is doing its pass through the lock file, it filters out any versions that don’t match the variant(s) it selected at the beginning

That still requires using the variant selectors in the process of installation, but I don’t think that’s avoidable[2]. It could balloon the lockfile if there was an exploding combination of variants involved, but I’m not sure if that’s a realistic concern.
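A sketch of what that could look like, with a made-up in-memory lock structure (this is not the pylock.toml schema) and a hypothetical `pick_labels` callable standing in for the variant selection logic:

```python
def install_plan(lock, pick_labels):
    """Select variant labels once up front, then do a single pass over the lock."""
    selected = pick_labels(lock["variants"])  # the only variant-aware step
    plan = []
    for package in lock["packages"]:
        for wheel in package["wheels"]:
            # Non-variant wheels pass through; variant wheels must match a selected label.
            if wheel["label"] is None or wheel["label"] in selected:
                plan.append(wheel["name"])
                break
    return plan


lock = {
    "variants": ["cpu", "gpu"],
    "packages": [
        {"name": "foo", "wheels": [
            {"name": "foo-1.0-py3-none-any-gpu.whl", "label": "gpu"},
            {"name": "foo-1.0-py3-none-any-cpu.whl", "label": "cpu"},
        ]},
        {"name": "bar", "wheels": [
            {"name": "bar-2.0-py3-none-any.whl", "label": None},
        ]},
    ],
}

print(install_plan(lock, lambda available: {"cpu"}))
```

The per-wheel check stays a yes/no test, at the cost of listing every variant wheel in the lock file.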


  1. or if it needs to; I’m not sure of the balance between “there aren’t that many variants in practice” vs “there could be unlimited combinations of an unlimited number of variants” ↩︎

  2. if it can’t happen, I don’t know that variants work with lockfiles? ↩︎

That seems impossible. If a lock file has an sdist and a wheel that are both compatible with the platform, does it choose to install both or just one?

If it’s the latter then it is not an independent choice.

If they are independent choices then how does the installer act to install both? Parallel? Sequentially? What order?

I’m pretty sure that currently installers MUST make a choice on a per-project basis, not a per-file basis.