PEP 777: How to Re-invent the Wheel

While talking with people about a wheel 2.0 design, it became very clear that before we could talk about what a wheel 2.0 could look like, we needed to talk about how to get there (beyond just incrementing the wheel major version number!).

This PEP defines a path to making wheel evolution easier, so that future PEPs can focus on the changes to the format and not get bogged down by details of how to deploy the update.

My hope is that once we have a compatibility story, we can move forward with discussions about what a wheel 2.0 should look like. If you’re interested in discussing that, come join us at the wheel-next ideas repo or in the #wheel-next channel on Discord!

Previous relevant discussions:

An immediate but very minor nit. If we’re going to require a new wheel file extension, and it’s not going to be 3 characters anyway, why not just use .wheel?

I’m still digesting the actually interesting parts of the PEP 🙂

The x suffix in whlx is intended to evoke an advancement of the whl format, but I have no attachment to the naming. I figured that particular detail would get bikeshedded here a fair bit. I’m fine with wheel, but for the purposes of the main content of the PEP I don’t think it really matters.

One thing not covered in the PEP is why not store the major version in the file extension, e.g., whl2?

Actually, using .whlx threw me initially as I thought it was a placeholder for the major version digit, so I would either make that explicit or just make the switch to .wheel now in case anyone else gets confused.

Why not .whl2 and then .whl3 etc.?

I don’t want to reveal any spoilers, but the plan is that there is an extension mechanism such that you won’t need to rev the major version number of the wheel spec in a backward-incompatible way again.

Is the idea to go back to whl once this PEP is fully implemented? With the understanding that this might take many years.

Although maybe the extension can just be .wheel from then on.

Oh, I should also add: I think the discussion is about the extension because everything else seems very solid 😂

I’m not 100% clear on why the top level has to remain a zip, but I assume there’s a good reason. Maybe that should be added?

Yeah, I definitely need to add this to rejected ideas. I have another draft PEP (that Barry alluded to) that I hope to polish soon that would introduce feature flags to wheels (with similar semantics as a major version bump, but allowing for clearer communication of intent). I think feature flags better encode the idea behind some changes, but others definitely seem like a real major version bump.

I think there are three issues with using whl2:

  1. You need to encode the major version in the wheel name going forward, otherwise you’d have the confusing situation of a wheel of major version 3 named whl2
  2. Part of the brittleness of the current wheel spec comes from encoding so much information into the filename. Filenames aren’t well suited for storing complicated structured information. I hope with wheel 2, we can have a wheel format that encodes not much more than the name and version of the distribution. So putting the wheel version into the name goes against this goal.
  3. It becomes a lot harder to define “what is a wheel?”, and it requires tools to adapt to every new wheel major version. If I’m making a Windows file association for wheels, how many versions do I register? How forward-compatible is that?
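To make point 2 concrete, here is a minimal sketch that splits a current .whl filename into the fields the wheel spec packs into it: distribution, version, an optional build tag, and the python/abi/platform tags, where "." joins several tags in one field. It assumes an already-normalized filename and skips all validation; `split_wheel_filename` is a hypothetical helper (the `packaging` library has its own, stricter parser).

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: list[str]
    abi_tags: list[str]
    platform_tags: list[str]


def split_wheel_filename(filename: str) -> WheelName:
    """Hypothetical helper: split a .whl filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields: {filename!r}")
    # Compressed tag sets use "." to pack several tags into one field.
    return WheelName(name, version, build, py.split("."), abi.split("."), plat.split("."))


info = split_wheel_filename(
    "numpy-2.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info.platform_tags)
```

Every extra piece of information added to that grammar is one more thing every tool has to parse back out of a string, which is the brittleness described above.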
I’ll jump on the bike shed early. Please let’s pick an extension that’s not pronounced “wheel”, which is how everyone I’ve talked with pronounces “whl”. “Did you mean a ‘.wheel’ file or a ‘.whl’ file?” sounds like confusion waiting to happen.

It would stay .wheel or .whlx or whatever we settle on in the bikeshed, going forward. I will be explicit about this.

Thank you!

I chose this invariant because tools will need to read .dist-info/METADATA or .dist-info/WHEEL to tell what the wheel major version is and whether they can install a file on disk. Unless we go with .whl2, .whl3, etc., this will need to continue to work for all future versions of the wheel specification. I should probably clarify the rationale for this in the PEP.
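For context, the existing .dist-info/WHEEL file already carries a `Wheel-Version` field, and the current wheel spec asks installers to warn on a newer minor version and fail on a newer major version. A minimal sketch of that check, reading only the WHEEL file out of the zip; the helper name and the in-memory demo wheel are made up for illustration:

```python
import io
import zipfile
from email.parser import HeaderParser


def wheel_major_version(archive: zipfile.ZipFile) -> int:
    """Hypothetical helper: major part of Wheel-Version from <name>.dist-info/WHEEL."""
    # The WHEEL file sits in the single top-level .dist-info directory.
    candidates = [
        n for n in archive.namelist()
        if n.count("/") == 1 and n.endswith(".dist-info/WHEEL")
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected one .dist-info/WHEEL, found {candidates}")
    headers = HeaderParser().parsestr(archive.read(candidates[0]).decode("utf-8"))
    version = headers["Wheel-Version"]
    if version is None:
        raise ValueError("WHEEL file has no Wheel-Version field")
    return int(version.split(".")[0])


# Build a tiny in-memory wheel to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/WHEEL",
        "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
    )
    zf.writestr("demo/__init__.py", "")

SUPPORTED_MAJOR = 1
with zipfile.ZipFile(buf) as zf:
    major = wheel_major_version(zf)
if major > SUPPORTED_MAJOR:
    print(f"refusing to install: wheel major version {major}")
else:
    print("wheel version supported")
```

Because this lookup has to keep working for every future wheel version, the outer zip and the location of these files are exactly the parts that cannot change later.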

I can understand that this mechanism needs to be invariant moving forward. If there’s any reason at all to switch to something else, it would need to be now, while changing the extension.

That’s not to say that it should change; the only other option I can think of is a tarball, and that doesn’t seem obviously better.

Maybe a tar (with the metadata files at the beginning of the archive, if reading some files without having to read the whole archive is desirable) combined with a stream compression algorithm like zstd? I have no idea, though, how much of a reduction in file size this would actually give for real-world packages compared to zip.
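As a rough sketch of that suggestion: if the metadata is the first member of a compressed tar stream, a reader can open it in streaming mode, take that member, and stop without decompressing the rest. To stay within the standard library of older Python versions, this substitutes gzip for zstd; the member names and the `CountingReader` wrapper are illustrative assumptions.

```python
import io
import os
import tarfile


class CountingReader(io.BytesIO):
    """BytesIO that records how many bytes callers actually read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def add_member(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Illustrative layout: metadata first, then a large payload that gzip
# cannot shrink (random bytes standing in for a shared library).
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w|gz") as tar:
    add_member(tar, "demo-1.0.dist-info/METADATA",
               b"Metadata-Version: 2.4\nName: demo\nVersion: 1.0\n")
    add_member(tar, "demo/_native.so", os.urandom(2_000_000))

# Streaming read: take the first member and stop.
stream = CountingReader(raw.getvalue())
with tarfile.open(fileobj=stream, mode="r|gz") as tar:
    first = tar.next()
    metadata = tar.extractfile(first).read()

print(first.name, f"read {stream.bytes_read} of {len(raw.getvalue())} bytes")
```

The flip side is that any member after the first can only be reached by decompressing everything in front of it, so this relies on tools agreeing on member order and gives no random access.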

Wouldn’t this be the perfect time to switch to .dist-info/METADATA.json? Since it has a different extension, an installer needs to know about the extension to read it, so might as well change now. Though a METADATA file could/would be required as well for a while for extraction into site-packages. Maybe that could be Python version specific?

Agreed the change would need to happen now. I don’t think we should change it however for a few reasons:

  1. A future wheel version could provide better compression by putting non-metadata files into a .tar.zstd or some other compressed tar file and requiring installers to decompress it. The metadata would be accessible in exactly the same way as in past versions, but large shared libraries or other content could be compressed significantly. The outer compression format does not need to change to take advantage of better compression.
  2. I don’t think it’s a good idea to boil the ocean on the format. We could make something completely different from a wheel, but that would require significantly more work for tools and a much more involved migration. Unless there is some reason an outer zip file is a problem (see the next point to the contrary), I don’t think it makes sense to change things.
  3. Zip files have some nice features tar files don’t, such as random access. pip and uv both use this to make HTTP range requests (when the server supports them) if an index doesn’t serve the metadata file, and this wouldn’t be possible with an outer tar file.
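Points 1 and 3 can be combined in a minimal sketch: an ordinary outer zip whose .dist-info files stay directly readable, with the bulk of the content inside a nested compressed tar (gzip here, standing in for zstd). The member name `payload.tar.gz` is a hypothetical layout that no wheel version defines, and `CountingReader` is a stand-in for a client fetching byte ranges over HTTP.

```python
import io
import os
import tarfile
import zipfile


class CountingReader(io.BytesIO):
    """Seekable BytesIO that tallies bytes read, like a range-request client."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


# Inner payload (hypothetical layout): a compressed tar holding the large files.
inner = io.BytesIO()
with tarfile.open(fileobj=inner, mode="w:gz") as tar:
    blob = os.urandom(2_000_000)  # stands in for a large shared library
    info = tarfile.TarInfo("demo/_native.so")
    info.size = len(blob)
    tar.addfile(info, io.BytesIO(blob))

# Outer archive: a plain zip, so metadata stays directly addressable.
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("payload.tar.gz", inner.getvalue())  # default ZIP_STORED
    zf.writestr("demo-1.0.dist-info/METADATA",
                "Metadata-Version: 2.4\nName: demo\nVersion: 1.0\n")

# Random access: zipfile reads the central directory at the end, then
# seeks straight to the METADATA entry, skipping the payload bytes.
remote = CountingReader(outer.getvalue())
with zipfile.ZipFile(remote) as zf:
    metadata = zf.read("demo-1.0.dist-info/METADATA")
print(f"read {remote.bytes_read} of {len(outer.getvalue())} bytes to get METADATA")

# An installer would unpack the nested payload as a separate step.
with zipfile.ZipFile(io.BytesIO(outer.getvalue())) as zf:
    payload = io.BytesIO(zf.read("payload.tar.gz"))
    with tarfile.open(fileobj=payload, mode="r:gz") as tar:
        names = tar.getnames()
```

The metadata lookup touches only a few hundred bytes regardless of payload size, which is what the range-request trick depends on and what an outer tar would lose.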

I’ll include these points in a rejected idea about changing the outer wheel format.

I think that is a topic that would best be put in a wheel 2.0 PEP specifying changes to the file format, not this PEP that specifies how to change the file format in such a PEP. When I do write up the 2.0 format spec, I plan on including a metadata.json file.

Sorry for triggering a big bike shedding argument straight off, but I agree, the rest seems good.

One substantive question I have is around the other places core metadata is stored. Would metadata in sdists and on disk in installed distributions be expected to omit the wheel version, or will it be optional but meaningless in those places? This PEP will need to more formally define the new metadata item (in the same sort of format as the existing definitions - for reference, “Dynamic” is an example of an existing item that is only meaningful in one file format).

I was expecting the new extension to be bikeshedded, so no worries. Glad you like the rest! Would you be content with .whlx if I added a section to rejected ideas going over some of the mentioned alternatives, and clarified that the x does not mean the major version when introducing .whlx?

My thinking on this is that it should only be allowed in wheels, served from an index via PEP 658 (when pulled from a wheel), or potentially on disk in the installed directory. I’m not as sure about the last one as the other two. It’s not a big ask for installers to just strip it out at install time, but maybe someone will want to inspect the information? I don’t think there’s a reason not to let it be installed into .dist-info/METADATA, so I think I would err on the side of not making the installation process more complicated.

FWIW I would personally avoid saying that a field MUST NOT appear in another context, but only that it MUST NOT be used to change the interpretation of that format, if found.

If you say MUST NOT, then any tool that wants to validate will need to enforce that rule even if it makes no difference to the operation of that tool. Ignoring extraneous metadata is a simple, forward-compatible default.
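A minimal sketch of that forward-compatible default, assuming the new field is spelled `Wheel-Version` in core metadata (the field name and the tool are hypothetical). A consumer handling sdist metadata simply never consults the field, so its presence changes nothing and there is nothing to reject:

```python
from email.parser import HeaderParser

# Fields this hypothetical sdist-processing tool actually relies on.
SDIST_FIELDS = ("Metadata-Version", "Name", "Version")


def read_sdist_metadata(text: str) -> dict[str, str]:
    """Parse core metadata, keeping only the fields this tool uses.

    Anything else, including an assumed Wheel-Version field that arguably
    does not belong in an sdist, is ignored rather than rejected.
    """
    message = HeaderParser().parsestr(text)
    return {field: message[field] for field in SDIST_FIELDS if field in message}


pkg_info = "Metadata-Version: 2.4\nName: demo\nVersion: 1.0\nWheel-Version: 2.0\n"
print(read_sdist_metadata(pkg_info))
```

A strict "MUST NOT appear" rule would instead oblige every validating tool to add an explicit check and raise an error, even though the field never affects what it does.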

My main dislike of the x is that it feels reminiscent of its use in .docx and .xlsx to mean “extended version”, and in Windows SuchAndSuchEx APIs with the same meaning. Because it’s common in Microsoft products, I have a vague feeling that it’s some sort of “corporate over-engineering”. It’s also a dead end, in that if we ever need to do this again, .whlxx just feels silly.

I can certainly live with it, but my main complaint is: why not use a readable extension like .wheel? @ericvsmith mentioned the potential for confusion when speaking, because .whl and .wheel could be pronounced the same, and I guess that’s a fair point, but I hope we don’t all end up referring to “Wheel-X” files, so I think verbal distinction is just something we’ll need to sort out as we go along (“New wheel” works just fine for me…)

It is bikeshedding, though, and if you say the PEP’s going to choose .whlx, then that’s your right as the author. I appreciate you taking the question seriously, but I’m not going to make a fuss about it.

My feeling is:

  1. It should be prohibited in sdists.
  2. It should be mandatory in (new) wheels.

PEP 658 metadata files have to match what’s in the file itself - the PEP says:

The metadata must only be served for standards-compliant distributions such as wheels [wheel] and sdists [sdist], and must be identical to the distribution’s canonical metadata file, such as a wheel’s METADATA file in the .dist-info directory [dist-info].

The hard one is installed distributions. I really don’t want to add complexity to the process of installing a wheel - at the moment, it’s “unpack and copy a bunch of files”. If we require modifying the metadata, that means that file needs to be rewritten, and the RECORD file needs modifying to correct the size and hash of the METADATA file. And I bet we’ll end up with mistakes being made resulting in installations where RECORD wasn’t corrected.
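To make the RECORD point concrete: each RECORD line is `path,algorithm=digest,size`, where the digest is the urlsafe base64 encoding of the hash with the trailing `=` padding removed. A minimal sketch (the metadata contents and the `Wheel-Version` field name are assumptions) showing that stripping a line from METADATA at install time leaves a stale RECORD entry unless the installer also recomputes it:

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD line using sha256 and unpadded urlsafe base64."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


path = "demo-1.0.dist-info/METADATA"
# Hypothetical METADATA carrying an assumed Wheel-Version field.
original = b"Metadata-Version: 2.4\nName: demo\nVersion: 1.0\nWheel-Version: 2.0\n"
recorded = record_entry(path, original)  # what the wheel's RECORD contains

# An installer that strips the field must also remember to rewrite RECORD.
stripped = original.replace(b"Wheel-Version: 2.0\n", b"")
print(record_entry(path, stripped) == recorded)
```

The comparison is false because both the size and the hash change, which is exactly the kind of mismatch that would slip through if an installer forgot the second step.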

Overall, I think we should require that installing a distribution from a wheel must continue to copy METADATA and RECORD unchanged. So the wheel version metadata may be present in an installed distribution. However, while there’s no standard saying how to install a package from anything other than a wheel, there’s nothing prohibiting a user doing that manually. So I think we have to say that the wheel version metadata is optional when a package was not installed from a wheel.

I wonder how distributions will view this? I believe they create their distro packages by building and installing wheels into an isolated area, and then repackaging that into a distro-specific format. I could interpret that as being a case of not installing from a wheel, although I doubt anyone would actually care.

Long story short - IMO for installed packages the wheel version metadata should be optional, but the spec for installing from a new-style wheel should explicitly state that METADATA (and its RECORD entry) must be copied unchanged (so that the wheel version is always present for packages installed from a wheel).
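Put together, the rules proposed above might look like the following hypothetical check. The `Wheel-Version` field name and the `Context` enum are assumptions for illustration, not anything the PEP currently defines:

```python
from email.parser import HeaderParser
from enum import Enum


class Context(Enum):
    SDIST = "sdist"
    WHEEL = "wheel"  # new-style wheel
    INSTALLED = "installed"


def check_wheel_version_field(metadata: str, context: Context) -> list[str]:
    """Return problems with the (assumed) Wheel-Version field in this context."""
    present = "Wheel-Version" in HeaderParser().parsestr(metadata)
    if context is Context.SDIST and present:
        return ["Wheel-Version must not appear in sdist metadata"]
    if context is Context.WHEEL and not present:
        return ["Wheel-Version is required in new-style wheel metadata"]
    # Installed distributions: optional, since not everything is installed
    # from a wheel; when it was, METADATA is copied unchanged and keeps the field.
    return []


meta = "Metadata-Version: 2.4\nName: demo\nVersion: 1.0\nWheel-Version: 2.0\n"
for ctx in Context:
    print(ctx.value, check_wheel_version_field(meta, ctx) or "ok")
```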