PEP 777: How to Re-invent the Wheel

Yeah, another way of looking at it is “the .wheel has backwards-compatible pronunciation” :grin:

More seriously, the ambiguity is short-term and simple to clarify in the moment[1]. In the long term, it’s a sort of continuity: wheels will continue to be wheels. And the final version is nicer to read, in my opinion.


  1. and I would think a lot of communication happens via text where it’s trivial to be specific


Isn’t “wheel 2” (.whl2) still an option, with the meaning of “a wheel with the V2 meta-format, which allows the version of the inner format to be specified”?


Technically yes, but I think if someone gets an error indicating that a wheel has an incompatible major version “3” while the extension is .whl2, that would be exceedingly confusing. I’m going to make an update later today for the PEP about this.


I mentioned this in the discord channel before, but there was some pushback by those who liked that it was an existing file format.

The easiest way to get this behavior right now, with the ability to change everything else in the future, would be a custom file format. It could even just be an 8-bit major version header (we’re not gonna need that many breaking versions, right?) followed by the rest of the data.

This could mean just strip the first byte, and the rest is an archive file consistent with the version, or the custom file type could have other information too, so long as the major version is at a fixed offset.
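A minimal sketch of the "strip the first byte" variant, assuming the payload after the version byte is an ordinary zip archive. The names `pack`, `unpack`, `open_archive` and `SUPPORTED_MAJOR` are made up for illustration and are not part of any proposal.

```python
import io
import zipfile

SUPPORTED_MAJOR = 1  # hypothetical: highest major version this reader understands


def pack(major: int, archive: bytes) -> bytes:
    """Prefix an archive with one unsigned byte holding the major version."""
    if not 0 <= major <= 255:
        raise ValueError("major version must fit in one byte")
    return bytes([major]) + archive


def unpack(blob: bytes) -> tuple[int, bytes]:
    """Split a blob into (major version, remaining archive bytes)."""
    if not blob:
        raise ValueError("empty file: no version byte")
    return blob[0], blob[1:]


def open_archive(blob: bytes) -> zipfile.ZipFile:
    """Check the version byte first, then open the rest as a zip archive."""
    major, payload = unpack(blob)
    if major > SUPPORTED_MAJOR:
        raise RuntimeError(f"unsupported major version {major}")
    return zipfile.ZipFile(io.BytesIO(payload))
```

Because the version sits at offset 0, a reader can reject a file it does not understand before it knows anything else about the layout.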

One other thought that I had. The PEP relies pretty heavily on the idea that if we put the wheel version in the metadata, then it’s possible for installers to efficiently read that information at index scan time. But that’s only true if the index supports PEP 658. I don’t know how many index implementations do support PEP 658. But I think the PEP should include a discussion of how it degrades if used with an index without PEP 658 support.

Maybe the answer is that an index should reject upload of new style wheels if it doesn’t support PEP 658. But then we get the problem of indexes that don’t have an explicit upload process, but simply publish the contents of (say) an Amazon S3 bucket as a package index. And for that matter, how would pip’s --find-links https://some.server/directory/of/wheels option handle new wheels? Would we need to download every .wheel file that didn’t have a separate metadata file available?
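To make the PEP 658 path concrete, here is a rough sketch of what an installer could do at index scan time. PEP 658 serves a file's metadata at the file URL with `.metadata` appended. The `Wheel-Version` field in METADATA is what the PEP proposes, and the `major.minor` parsing and the default of 1 are my assumptions. The helper names are hypothetical, and the URL is assumed to carry no `#fragment`.

```python
from email.parser import HeaderParser


def metadata_url(file_url: str) -> str:
    """Return the PEP 658 metadata URL for a distribution file URL."""
    return file_url + ".metadata"


def wheel_major_version(metadata_text: str, default: int = 1) -> int:
    """Return the major part of Wheel-Version, or `default` if the field is absent.

    Assumption: the field is written as 'major.minor'.
    """
    value = HeaderParser().parsestr(metadata_text).get("Wheel-Version")
    if value is None:
        return default
    return int(value.strip().split(".")[0])
```

Without PEP 658 (a plain S3 bucket or a `--find-links` directory), there is no `.metadata` file to fetch, so the same text has to come out of the archive itself.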


Mostly on the PEP 658 question: expectations could also change over time. Maybe the initial behavior is a fallback path that assumes no PEP 658 support, but in ~2-3 years PEP 658 becomes expected for wheel 2 support.

The main reason not to keep a fallback path forever is that fallback paths which assume earlier PEPs are unsupported also slow the adoption of those PEPs. At my current workplace, I commonly install libraries from an internal index that does not support PEP 658. I think the team that maintains it will keep PEP 658 as a very low priority backlog item for a long time if future work continually assumes, and nicely handles, the case of no support. Having new features and improvements assume that PEPs from a few years ago are supported is one way to motivate other indices to move forward too.


Thanks for asking the question that I was thinking. Thanks @barry for the upcoming solution.

Overall, this looks great (and, e.g., I don’t see any issue with supporting it in uv – it looks very straightforward).

My only question: have we verified that adding a new file extension in this way is in fact a non-breaking change? E.g., would pip or Poetry or whatever else error if the Simple API returned a bunch of files with an unknown extension, or will it ignore them? (I can confirm that uv would ignore them, but if adding the extension will itself cause breakages in the ecosystem, it becomes a lot less appealing as a compatibility solution.)
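For illustration, the "ignore them" behavior would look roughly like this. This is not pip's, Poetry's or uv's actual code. `KNOWN_SUFFIXES` and `installable_files` are hypothetical names, and the suffix set is just an example of what an older installer might recognize.

```python
KNOWN_SUFFIXES = (".whl", ".tar.gz", ".zip")  # hypothetical set an older installer knows


def installable_files(filenames: list[str]) -> list[str]:
    """Keep files with a known extension and silently skip everything else."""
    return [name for name in filenames if name.lower().endswith(KNOWN_SUFFIXES)]
```

The breaking case would be an installer that raises on the first unrecognized filename instead of filtering it out.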

Since we’re bikeshedding the name, I’ll at least say that I prefer .whlx to .wheel or .whl2. I would’ve just assumed that .wheel is an alias (like .yaml and .yml), and that .whl2 means “major version 2”, neither of which are true. .whlx reminds me of .js and .jsx, as one example.


Thank you for building a basis for a better wheel format!

I see three different time-scales with different goals: Short term, we want to ship improvements that we can add generally append-only on a metadata level, such as better license (file) handling with PEP 639, PEP 753 or adding Wheel-Version to the METADATA. Mid-term, I’d want to address major points in the wheel format, such as zstd compression (major perf impact!), JSON metadata, proper dist-info naming, etc. Long-term, there should be an evolution towards removing virtualenvs/site-packages and, correspondingly, the wheel format made to unpack into them, towards packages that are isolated from each other and can be loaded separately. (Currently, each virtualenv contains a complete copy of each installed package. What if we could have venvs that are just a list of one symlink per package to a centralized cache, or, to go even further, Python launchers that read the standardized lockfiles and then load packages from a standardized shared caching location, without the struggle of a dedicated venv? But I’m digressing from the file format discussion.)

Personally, rather than nested archives I’d have the index serve a single comprehensive metadata JSON response, while the package itself is an archive that I only have to probe into for installation (for local wheels that don’t have an index, the performance hit for reading the metadata out of the compressed archive is acceptable). On a side note, I see two separate kinds of metadata: the full set that you render on the index web page, and a much smaller one for the resolver and installer (name, version and dependencies including extras).

Finally, future wheel revisions MUST NOT use any compression formats not in
the CPython standard library of at least the latest release. Wheels generated
using any new compression format should be tagged as requiring at least the
first released version of CPython to support the new compression format,
regardless of the Python API compatibility of the code within the wheel.

A major goal would be zstd compression for wheels, especially those huge compiled ML wheels. While I understand the problem that compression algorithms need native code to be competitive, and that at the moment we can only properly support this on smaller targets by moving it into CPython, this means we cannot start on this for at least 5 years, even if we were to assume zstd landing in 3.14. Many of those large wheels, such as torch or tensorflow, that would especially profit from zstd compression only exist for Linux/Windows/Mac, all targets for which we can also ship zstd as a compiled wheel. There are also packages which ship compiled wheels for the platforms that PyPI supports plus a py3-none-any.whl fallback, so I see room for some mixed fallback solution, such as a cp312-cp312-manylinux2014.whlx.zstd and py3-none-any.whl.zip.

Since we’re talking about the evolution of the wheel format, I want to share two of the less obvious pain points that we were experiencing related to the current format:

  • Wheels can override each other’s files on installation, non-deterministically. When two or more wheels contain files with the same target installation location, which commonly happens with namespace packages, they override each other, with the later package winning.
  • dist-info naming: We’ve seen a surprising amount of problems due to mismatches in the normalization of name and version between wheel filename, METADATA and dist-info directory.
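As a sketch of the first pain point, this is one way a tool could detect colliding files before installing anything. It assumes the archive path equals the install location (so it ignores `.data` directories), and `find_collisions` is a hypothetical helper, not something an existing installer exposes.

```python
import io
import zipfile
from collections import defaultdict


def find_collisions(wheels: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each archive path shipped by more than one wheel to the wheels shipping it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for wheel_name, data in wheels.items():
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for path in zf.namelist():
                if not path.endswith("/"):  # skip directory entries
                    owners[path].append(wheel_name)
    return {path: names for path, names in owners.items() if len(names) > 1}
```

An installer could turn a non-empty result into an error or a warning instead of letting the last-installed wheel win.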

Most tools polyfill PEP 658 with HTTP Range requests, including pip and uv. They are available out-of-box in most storage backends (and really should be required for all of them) and are not terribly slower than PEP 658, where downloading the whole wheel is non-viable performance-wise. Obviously, I’d still be happy if we could make PEP 658 required in indexes :slight_smile:
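To show why range requests work at all, here is a small local simulation (not pip's or uv's implementation). `zipfile` only seeks to the end of the file for the central directory and then to the one member it needs, so a reader that counts bytes stands in for a remote file where each seek/read would become an HTTP Range request. `CountingReader`, `build_wheel` and `read_metadata` are made-up names, and the archive contents are throwaway test data.

```python
import io
import os
import zipfile


class CountingReader(io.BytesIO):
    """In-memory stand-in for a remote file that records how many bytes were read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_wheel(payload_size: int) -> bytes:
    """Build a throwaway archive with a large payload and a small METADATA file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("pkg/blob.bin", os.urandom(payload_size))
        zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")
    return buf.getvalue()


def read_metadata(reader) -> bytes:
    """Read only the METADATA member through the given file-like object."""
    with zipfile.ZipFile(reader) as zf:
        return zf.read("pkg-1.0.dist-info/METADATA")
```

Comparing `reader.bytes_read` with the archive size after `read_metadata` shows that only a small fraction of the file is touched, which is the property the range-request fallback relies on.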


I have a soft preference for “.wheel” over “.whlx”.

Other options:
“.bdist”
“.weel”
“.whll”

That makes me think “.whlx” isn’t a bad option. One benefit of “.whlx” over “.wheel” is in spoken English we can refer to them as “wheel X” which is unambiguous.

To add to your list, more regularly than I’d like, I run into files that conflict during installation of single packages. Console scripts (entrypoints) often clash with the setuptools specific “scripts” since they all get unpacked into “./bin”.

So, perhaps we need a way to express in a project’s metadata the dependencies required to uncompress its binary wheels?


Ignore them.


I do have ideas about specifying dynamic library dependencies in metadata. This would allow multiple projects to rely on the same e.g. BLAS or libtorch etc.

I think however this could be a followup feature introduced in a future wheel change. I want to avoid changing too many things at once.


I’ve just posted an update (thanks @barry for merging :grin: ) that responds to the feedback discussed in this thread:

  • clarified behavior around Wheel-Version and rewrote parts to be more formal about its specification
  • clarified that the x in .whlx is the letter x
  • clarified that future versions of the wheel format should also use .whlx
  • Added a paragraph about the impact of needing to read wheel contents when resolving packages, and how this is mitigated both by range requests and future compression speedups
  • Added several rejected ideas:
    a. store the wheel major version in the file extension (i.e. .whl2)
    b. wheel 2 should change the outer wheel container format
    c. This PEP should define Wheel 2.0

Also, I noticed no one bit on the discussion topic, so I’ll prompt it here too: do people think we should allow side-by-side upload of .whl and .whlx?


I am very hesitant to make some wheel features require binary extension modules as a poly-fill. Part of that is because some people use pip coming from CPython, and I’m not sure how shipping that would work. I also like that Python packaging doesn’t require a C compiler to bootstrap :slight_smile:

I would very much like to see zstd adoption though, to the point that I have a branch where I’m working on adding support for it to CPython.


Also, I realized I didn’t add a rejected idea for .wheel. I will add that the next time I update the PEP, but I agree that having homophonic names for two different file types will be confusing. I don’t expect people to say “new wheel” enough to disambiguate the two file extensions.

Regarding additional file extension name changes, I expect that a future change that doesn’t want to be called .whlx will be an installer format altogether different from a wheel file. In my mind a wheel is a zip file with a .dist-info directory containing metadata information, and some package data in some format dependent on the specification(s). I have a hard time thinking of a change that would require a name change to the extension that would also keep the format I describe. I’ll also add something to the PEP about this in the next update.


the .dist-info metadata directory MUST be placed at the root of the archive without any compression

No compression at all seems a bit wasteful, with things like license files being put in .dist-info.
Also, AFAIK the current default in setuptools is to use deflate for the whole archive, and pip itself publishes deflate-compressed wheels. I don’t think there’s any practical reason against requiring support for it.
In CPython, zlib is nominally an optional module, but I think that should change.

I suggest:

  • Allowing deflate as well as no compression. (I guess BZIP2 & LZMA are unnecessarily progressive for this use case.)
  • Limiting the restriction to .dist-info/METADATA (i.e. the file that contains the version). This would allow new wheel versions to specify other compression schemes for individual files – after careful consideration of course.
    Practically: tools should either check the version in the METADATA file before looking at the whole .dist-info directory, or assume that a failure to unpack .dist-info is due to an incompatible wheel version.
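A sketch of the "check METADATA first" suggestion, under my assumptions: the only file a tool must be able to read is the root-level `*.dist-info/METADATA`, it is either stored or deflate-compressed, and it carries the proposed `Wheel-Version` field. `read_wheel_version` and `ALLOWED` are hypothetical names.

```python
import zipfile
from email.parser import HeaderParser
from typing import Optional

# Compression methods this sketch accepts for METADATA (stored or deflate).
ALLOWED = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def read_wheel_version(zf: zipfile.ZipFile) -> Optional[str]:
    """Read Wheel-Version from the root-level *.dist-info/METADATA only."""
    candidates = [
        info
        for info in zf.infolist()
        if info.filename.count("/") == 1
        and info.filename.endswith(".dist-info/METADATA")
    ]
    if len(candidates) != 1:
        raise ValueError("expected exactly one root-level .dist-info/METADATA")
    info = candidates[0]
    if info.compress_type not in ALLOWED:
        raise ValueError("METADATA uses a compression method this sketch rejects")
    text = zf.read(info).decode("utf-8")
    return HeaderParser().parsestr(text).get("Wheel-Version")
```

Nothing else in the archive is decompressed, so other members could use a scheme the tool does not know without breaking the version check.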

+1 to .wheel by the way :)


I think the intention with that language is to avoid the nested archive nature that was proposed in other threads, wherein the data was being compressed in a tarball, within the zip file.

This is a perfectly fine definition, but I think it can be reasonably argued that it already applies to the current implementation of .whl. So do we really even need a new extension?

The only way that old versions of installers will continue to work is with side-by-side uploads. We’ve established a couple of times in other discussions that nobody really wants to do that, and to be honest those were for more valuable purposes than changing the packaging format (wheel variants was one). I don’t think we can assume that publishers will build twice for as long as transition takes.

So assuming all publishers switch package format immediately, the question is what happens to those installing with old installers.[1]

With a new file extension, they will likely start building from sdist. We’ve found in other discussions that this is a bad and/or surprising thing, and people would almost certainly prefer to just update their installer than build from source. Potentially the installer could backtrack and find the last version that had .whl, which means silent success but missed updates - at least some users would want this to error out so they know to update their installer.

With the same file extension, they will get either a packed file in their site-packages directory (and subsequently a ModuleNotFoundError at runtime), or hopefully an error when the installer checks the wheel metadata and finds the version has changed by too much.
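The "hopefully an error" branch could be as small as the check below. It follows the rule in the existing wheel specification (warn when the version is newer than supported, fail when the major version is newer). `SUPPORTED` and `check_wheel_version` are hypothetical names, and how a given installer actually implements this is not something I've verified.

```python
import warnings

SUPPORTED = (1, 0)  # hypothetical: newest wheel format this installer implements


def check_wheel_version(wheel_version: str, supported: tuple[int, int] = SUPPORTED) -> None:
    """Fail on a newer major version, warn on a newer minor version."""
    parts = wheel_version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major > supported[0]:
        raise RuntimeError(
            f"wheel format {wheel_version} is too new for this installer; "
            "update the installer"
        )
    if (major, minor) > supported:
        warnings.warn(f"wheel format {wheel_version} is newer than supported")
```

Running this before unpacking is what turns a broken install into a clear "update your installer" message.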

Is there another path here that would be smoother? Either way, it seems that provided we get installer updates out sooner than publishers, we can transition just as smoothly with the .whl extension as with a new one.


  1. Up to date installers will of course Just Work, for the normal software engineering definition of working…


Exactly what I meant, but perhaps not clear. I should rephrase that to say they should exist at the root of the archive and should be directly readable by merely unzipping the wheel.