How to reinvent the wheel

Breaking this discussion out from the PEP 778 discussion, because this does end up needing to be figured out first.

I had originally hoped that a “wheel 2.0” could be a collection of PEPs describing a group of changes in a new major revision of wheel. But as @pf_moore pointed out, this is hard to review and handling these as “all at once” is sub-optimal. I definitely don’t want to make Paul’s life hard as PEP-delegate.

So my hope is that this thread can be used to discuss how to handle adding features to wheel that add new requirements on installers.

My original plan with PEP 777 was to list multiple features (symlinks, zstd compression with a slight layout change, enhanced metadata, and potentially others if people wished to draft them) that would all be added to the spec simultaneously.

To start off with, let’s discuss some constraints:

  • the filename structure of a wheel must stay the same, or else pip complains it is “unsupported for this platform”. This could be changed, but the messaging in installers would need to be updated
  • the basic format of a zip file with .dist-info being a folder in the wheel must be maintained, otherwise older tools can’t check Wheel-Version since they can’t find METADATA.
  • wheels must be buildable and unpackable using only standard library modules. This basically means that any extension to the wheel format will take extra time to percolate through the ecosystem

I’m probably missing hard constraints, so please do suggest any and I can edit this comment to include them.

With the first three in mind, one alternative that seems plausibly workable would be to declare, in each wheel, a list of features required to install it. If an installer doesn’t support one of those features, it bails out and suggests upgrading. This still requires time for updated installers to reach users, but it also means we aren’t bumping the wheel major version every time we add a new feature. It also means that an installer can express support for a new feature (say zstd) without needing to also support all previously added features.
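As a rough illustration of the idea (the `Requires-Feature` field name is purely hypothetical; nothing like it exists in the spec today), the installer-side check might look like:

```python
# Hypothetical sketch only: "Requires-Feature" is an invented WHEEL field.
SUPPORTED_FEATURES = {"symlinks"}  # what this imaginary installer implements

def check_wheel_features(wheel_file_text: str) -> list[str]:
    """Return the required features this installer does not implement."""
    required = [
        line.split(":", 1)[1].strip()
        for line in wheel_file_text.splitlines()
        if line.lower().startswith("requires-feature:")
    ]
    return [f for f in required if f not in SUPPORTED_FEATURES]

wheel_file = (
    "Wheel-Version: 1.0\n"
    "Requires-Feature: symlinks\n"
    "Requires-Feature: zstd\n"
)
missing = check_wheel_features(wheel_file)
if missing:
    print(f"cannot install: wheel needs {missing}; try upgrading your installer")
```

A wheel that declares no features would pass this check unchanged, which is what lets old-style wheels keep working under such a scheme.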

Also, from the other thread:

I think this describes another issue when adding new features. If we don’t have a clear way to indicate that resolvers need more information/capabilities to make decisions, older resolvers will fail silently, and that issue will manifest in unexpected ways. So ideally we would have some way of installers/resolvers knowing from the wheels right away that they should at least warn about that circumstance.

So before I actually sit down and draft PEP 777, I wanted to discuss what the best path forward for extending wheels is. I’d be very interested to hear what other people think.

2 Likes

A new wheel version sounds similar to a higher Requires-Python version. If a library I use raises its Python lower bound, then those new releases will be skipped by the installer during resolution and it will backtrack to an older release. If a library decides to depend on a new wheel feature X that my installer does not recognize, I would want my installer to backtrack and skip that release, maybe with a single warning overall (not a warning per backtrack/package) that some wheels were excluded due to lack of support for feature X.

If all wheel format extensions are support for new features, then a wheel 2.0 wheel that uses zero new features is, compatibility-wise, practically a wheel 1.0 wheel. For example, if we do accept symlinks (and assume that is the only new wheel feature) and a build backend handles it, then a package that never uses the feature should remain compatible with installers unaware of it. I don’t want a case where a backend (say setuptools) adds support for symlinks, and then someone’s simple pure-Python library that never uses a symlink is treated as an incompatible wheel for a user with an older installer that does not handle symlinks.

None of this being specific to symlinks, but any purely additive new wheel feature.

I think that one good constraint to maintain is the ability to perform a bare-bones installation by simply unzipping the wheel into the correct directory (or at least unzip-ish + a couple of mv commands). And also the ability to add wheels to sys.path (via zipimport.zipimporter).

This is important because it facilitates bootstrapping (e.g. flit and get-pip), and makes it easier for other communities to use other means of installation/distribution that are not “Python-centric/driven” (e.g. OS distributed packages)
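To make the bootstrapping point concrete, here is a self-contained sketch (the demo wheel is fabricated on the fly; it is deliberately minimal and skips RECORD, entry points, and the .data directory) showing both paths: plain extraction, and importing the wheel straight off sys.path:

```python
import os
import sys
import tempfile
import zipfile

# Fabricate a minimal pure-Python wheel for demonstration purposes.
tmp = tempfile.mkdtemp()
whl = os.path.join(tmp, "demo-1.0-py3-none-any.whl")
with zipfile.ZipFile(whl, "w") as z:
    z.writestr("demo.py", "VALUE = 42\n")
    z.writestr("demo-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    z.writestr("demo-1.0.dist-info/METADATA",
               "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n")

# Path 1: a bare-bones "install" is just an unzip into a
# site-packages-like directory.
target = os.path.join(tmp, "site-packages")
with zipfile.ZipFile(whl) as z:
    z.extractall(target)

# Path 2: no install at all; zipimport loads the module from the wheel itself.
sys.path.insert(0, whl)
import demo
print(demo.VALUE)  # 42
```

Both paths only work because the wheel is a plain ZIP with importable sources at the root, which is exactly the property this constraint would preserve.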

2 Likes

Note that this isn’t quite possible today: you can extract most of a wheel into some directory and it will be mostly correct, but that doesn’t handle the .data directory.

But I do agree that “installable with a couple of commands using a few simple tools” is a nice constraint to maintain.

2 Likes

Perhaps the way to do this is simply to schedule Wheel 2.0 now that we know there are some proposals, and any proposal that makes it in by that time gets to be part of 2.0?

We can have an escape clause where we cancel 2.0 if not enough gets accepted in time. But it feels like the only way out of the “this needs wheel 2.0”<->“we don’t have wheel 2.0” conundrum is to commit to doing 2.0 and then see what lands.

3 Likes

The following is from the perspective of a pip maintainer, rather than as PEP delegate.

I think the first thing we need to look at is how things work today. That imposes some rather strong (and not ideal) constraints on any solution.

  • If pip (or uv) chooses to install a particular version of a package, and the selected wheel declares a wheel major version greater than 1, the install fails. The installer doesn’t backtrack, or try other versions or other wheels.
    • One consequence of this is that if a project releases a version 2.0 wheel, the project will not be installable without a maximum version constraint (which is generally considered bad practice).
    • Furthermore, any package that depends on the one with a 2.0 wheel will also fail to install. And changing the dependency to add a version cap isn’t possible, without releasing a new (wheel 1.0!) version of the package with updated dependency data.
    • In practical terms, the only solution is likely to be for every user to maintain a constraints file which excludes any package versions which use wheel 2.0, and apply it to every install.
  • Once a build backend starts producing version 2.0 wheels, installers won’t be able to install packages that use that backend from source, unless the backend offers a “use wheel version 1.0” flag and the user manually includes it.

This is the position now, but of course a transition plan could include a process for installers to offer a more graceful transition. For such a plan, we need to decide how long the transition process should last, before it’s acceptable to assume that enough installer users have upgraded to at least a transitional version. Until that point arrives, build backends cannot produce 2.0 wheels and indexes (especially PyPI) cannot host version 2.0 wheels.

Ideally, we’d say that users are told to keep on the latest version of pip, so the transition could be relatively short (and I’d assume uv users are generally aware that they are using a rapidly developing tool, so they would upgrade frequently as well). But the feedback we got on the symlinks-in-wheels thread is that this isn’t the reality. So someone will need to make an assessment of what we should really use as a transition timescale.

The alternative (or maybe something we could do as well) is mandate that installers ignore wheels that they can’t handle[1]. This would reduce the issue, although it would still leave problems with packages that only ship sdists (or ship sdists alongside 2.0 wheels). That could be added to the spec and then, once installers start doing that, the transition becomes a lot simpler. However, in order to do that, it’s important (so that we have acceptable performance) that installers can detect the wheel version without downloading the whole wheel. That means something along the lines of:

  • Expose the wheel version via the index API
  • Extend the wheel API to expose the WHEEL file in the same way that the METADATA file is currently exposed.

Alternatively we could change the filename format to explicitly include the wheel version. This allows projects to publish both version 1 and version 2 wheels simultaneously. But that’s an even bigger change, and it’s not clear to me that we need to go that far at this point.

Again, we’d still need to wait until we judge that (nearly) all users are using an installer that conforms to the new spec before taking the next step, but at least we would then have a framework for making future version changes less painful.

The alternative is to simply publish the wheel 2.0 spec, let tools and indexes implement support for it, and manage the damage by getting users to upgrade as needed. But this feels to me like the sort of “big bang, break the world” approach that the Python 2-to-3 transition took, and I doubt users will have the appetite for anything like this (I know that I don’t, personally :slightly_frowning_face:)

By the way, as a side note, this is all based on the need for a major version update to the wheel format. The rules for adding a new minor version would make transition a lot simpler, but it’s not at all clear to me what exactly would be appropriate in a minor version bump. All the current spec has to say on the matter is

A wheel installer should warn if Wheel-Version is greater than the version it supports, and must fail if Wheel-Version has a greater major version than the version it supports.
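That rule is simple enough to state in code; a sketch of the check the spec expects installers to perform (the supported version here is arbitrary):

```python
SUPPORTED = (1, 0)  # highest wheel version this hypothetical installer knows

def check_wheel_version(wheel_version: str) -> str:
    """Spec rule: warn if the wheel is newer, fail if its major version is newer."""
    major, minor = (int(part) for part in wheel_version.split(".")[:2])
    if major > SUPPORTED[0]:
        raise RuntimeError(
            f"cannot install: Wheel-Version {wheel_version} has a greater "
            f"major version than supported {SUPPORTED}"
        )
    if (major, minor) > SUPPORTED:
        return "warn"  # proceed, but tell the user to upgrade
    return "ok"
```

So “1.1” only warns while “2.0” aborts, which is why everything in this thread hinges on what is allowed to go into a minor version.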

Does anyone have a feel for what a minor version change would look like? If we could formally establish what can go in a minor change, it might help us structure future proposals to work as a series of minor changes, rather than always needing a major change.


  1. technically this is a change to the spec, as the current version explicitly states to abort the install ↩︎

2 Likes

Interesting point. Could be useful indeed - assuming that the wheel version metadata is exposed separately from the wheel itself. Currently there is a hard requirement to error out, the spec says: “Check that installer is compatible with Wheel-Version. Warn if minor version is greater, abort if major version is greater.”

Nor are entry points or byte-compiling handled. Unzipping is great for debugging purposes (and yes, let’s keep that), but there is no requirement today that it yields a working or complete package, so let’s not pretend there is. In reality nothing important is going to change here either way.

+1. I’m not sure whether I prefer the “list new features” idea or a regular version bump - that may depend on how many new features folks have in mind. But either way it’d be great to commit to a wheel 2.0 version, conditional on enough features (two? symlinks and better compression are two great candidates already) and an acceptable rollout plan. That way people feel empowered to go do work, without a high chance of things getting stuck in limbo.

There’s no requirement, but packages can ensure (with minimal effort in many cases) that their wheel is also usable as a zipfile added to sys.path. This is important to how ensurepip and get-pip.py work. It may not be critical, but losing that capability would cause some disruption (and specifically, disruption to one of the installers which already has a bunch of work to do in order to support the new format).

Agreed, with one proviso, that one of the features we include has to be a mechanism to ensure that future version bumps are not this painful.

1 Like

I would think anything added to .dist-info that should be interpreted, but doesn’t have to be in order for the same install process to work.

For example, if we added some kind of cryptographic signature file to validate RECORD, you can still install the wheel as normal, but you probably shouldn’t. Or if we added some additional output (e.g. a detailed message to print on install).

Including JSON metadata that the installer should prefer over the current format might only have to be a minor update, assuming the old format is still present and there are no critical differences in the new format (e.g. if it’s just more granular environment tags, and the wheel builder wants to provide both, then they’ll just be less granular in the old one).

Possibly a new console_script-like entry point could be justified as a minor update. Maybe it registers man pages or something? Technically non-essential, but you need a newer installer for it to work.

So basically, I think anything that doesn’t change the nature of installing the package itself (i.e. the ZIP extract into site-packages[1]), but would otherwise provide the installer more useful information about what else to do, could pass as a minor update to the spec.


  1. Or some other path that Python will somehow include in sys.path “automatically” ↩︎

3 Likes

Thanks. I’m inclined to wonder if symlinks could fit this form. Stage one: require copies rather than symlinks, as now, but add a LINKS file that installers can use to replace all but one of the copies with symlinks. Stage two: make LINKS authoritative and allow omission of the copies. Stage three: allow links that don’t simply point to copies within the wheel.
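A stage-one installer pass might look roughly like this (the LINKS format here, one “link target” pair per line, is invented purely for illustration):

```python
import os
import tempfile

def apply_links(site_packages: str, links_text: str) -> None:
    """Replace redundant copies with relative symlinks, per a LINKS file.

    Hypothetical format: each line is "<link-path> <target-path>", both
    relative to the installation root.
    """
    for line in links_text.strip().splitlines():
        link, target = line.split()
        link_path = os.path.join(site_packages, link)
        if os.path.lexists(link_path):
            os.remove(link_path)  # drop the copy shipped for older installers
        # Make the symlink relative so the installed tree stays relocatable.
        rel = os.path.relpath(os.path.join(site_packages, target),
                              os.path.dirname(link_path))
        os.symlink(rel, link_path)

# Demo: two identical "copies" collapse to one real file plus a symlink.
sp = tempfile.mkdtemp()
os.makedirs(os.path.join(sp, "pkg"))
for name in ("pkg/libfoo.so.1.2.3", "pkg/libfoo.so.1"):
    with open(os.path.join(sp, name), "w") as f:
        f.write("binary payload")
apply_links(sp, "pkg/libfoo.so.1 pkg/libfoo.so.1.2.3")
```

An installer that has never heard of LINKS simply leaves the copies in place, which is what makes stage one non-breaking.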

There are lots of details to fill in, and it takes longer, but we could do it without a “big bang” major version bump.

Of course, we can say that we’ve reached a critical point where there’s enough features we want to add to warrant a major version, but (a) it’s not at all obvious that a major version is better for users, and (b) just doing a major version without improving the process or trying to make minor versions work simply leaves us with the same problems next time.

This still feels like a Python 2 to 3 situation to me right now… :slightly_frowning_face: And one thing we learned from the aftermath of that transition is that with sufficient effort and motivation, we can make big changes without huge compatibility breaks.

1 Like

I’ve been pondering the same thing, but I don’t think they can except as a transitional step, which doesn’t solve the need for a major version update at some point.

The best I’ve come up with would be to interpret RECORD entries for files that aren’t really in the wheel as links to a file that is in the wheel with the same hash. I believe that won’t raise any errors right now, and it sounds like when links are desirable there’s usually one “real” file and the others are conveniences for things other than running Python. So you could include the file once, add multiple lines to RECORD, and an older pip should just say that you need to update for full functionality but the core functionality (i.e. import module) still works.
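A sketch of that interpretation (RECORD is a CSV of path, hash, size; the archive listing and hashes here are made up):

```python
import csv
import io

record = """\
pkg/libfoo.so.1.2.3,sha256=abc123,1024
pkg/libfoo.so.1,sha256=abc123,1024
pkg/__init__.py,sha256=def456,10
"""
# Files actually present in the (hypothetical) wheel archive:
files_in_archive = {"pkg/libfoo.so.1.2.3", "pkg/__init__.py"}

rows = list(csv.reader(io.StringIO(record)))

# First pass: map each hash to a file that really exists in the archive.
real_by_hash = {}
for path, digest, _size in rows:
    if path in files_in_archive:
        real_by_hash.setdefault(digest, path)

# Second pass: RECORD entries absent from the archive but sharing a hash
# with a present file become link candidates.
links = [(path, real_by_hash[digest])
         for path, digest, _size in rows
         if path not in files_in_archive and digest in real_by_hash]
print(links)  # [('pkg/libfoo.so.1', 'pkg/libfoo.so.1.2.3')]
```

An older installer would just verify the hashes of the files it extracted and ignore the extra rows, assuming it doesn’t treat a missing file as a hard error.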

Now that I write that out, I guess you get the same level of support from a LINKS file that is going to be ignored by older installers, so my RECORD idea isn’t really needed.

Back on the “how do we avoid this in future”, I think we probably need a “critical features” mechanism in the wheel metadata. So a particular wheel can say “I use symlink and it’s critical” or “I use symlink and it’s optional”, and the installer can fail or warn based on a specific feature. Wheels that don’t use links won’t declare anything.

In general, I much prefer feature detection over version checks, but it’s harder for users to figure out whether something is going to work or not.

As I understand it, part of the problem is download size, not just on-disk install size, and ZIP files individually compress each file, right? So this approach won’t fully address the problem of, say, libcudart.so.12 -> libcudart.so.12.0.0; you’ll still have two (or three, with the dev link) copies of the compressed CUDA library stored on PyPI and transferred over the wire to users.

To be fair, that is the status quo, so maybe implementing this now is worth doing, so people shipping giant libraries have a better option in a couple of years.

(Alternatively - maybe you can combine this with the complicated approach I proposed in the other thread: ship only libcudart.so.12 and a linker script at libcudart.so, which will hopefully work, and have metadata to inform a symlink-aware installer that it should rename libcudart.so.12 to libcudart.so.12.0.0, make a symlink, and also clobber the linker script with a symlink. So you get the preferred behavior with newer installers, but there’s a viable fallback for existing ones, and you only have one copy of the library in the ZIP file.)

zstd compression probably falls into the same category of requiring a breaking change to be practical, since existing decompressors can’t handle it, and so at best you can add a .tar.zst file or whatever to the wheel, but that defeats the point.

But maybe there’s a trick here too, since at least for pip, the thing that needs to be changed is the standard library:

  File "/tmp/v/lib64/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 137, in unzip_file
    fp = zip.open(name)
         ^^^^^^^^^^^^^^
  [...]
  File "/usr/lib64/python3.12/zipfile/__init__.py", line 754, in _get_decompressor
    _check_compression(compress_type)
  File "/usr/lib64/python3.12/zipfile/__init__.py", line 734, in _check_compression
    raise NotImplementedError("That compression method is not supported")

So if e.g. Python 3.14 zipfile supports zstd out of the box, then wheels that have Requires-Python: >=3.14 will usually be fine in practice, for pip users. (For uv users and people who use an older python -m pip to install into a newer Python’s environment, it’s more complicated, yes.)
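A quick way to probe this from Python (the ZIP_ZSTANDARD constant should appear in zipfile once the interpreter ships Zstandard support; on older interpreters the attribute is absent and decompression fails as in the traceback above):

```python
import zipfile

# If the constant exists, this interpreter's zipfile can read zstd members;
# otherwise opening such a member raises NotImplementedError, as shown above.
if hasattr(zipfile, "ZIP_ZSTANDARD"):
    print("zipfile can read zstd-compressed wheels")
else:
    print("zstd-compressed wheels would fail here")
```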

For the proposal to add additional metadata, I don’t think I caught enough details to be sure, but I suspect that can be done as a non-breaking change. I don’t know if anything else is on the table.

There’s another option that should be fully backwards-compatible: encode the feature dependencies as normal dependencies, instead of creating a new metadata field for it and requiring new installers to be able to parse that. Concretely, register a PyPI package called something like wheel-feature-symlinks, upload no dists to it, and give it an informative README. Teach installers that if they do support a feature, then they should implicitly consider this dependency satisfied (pip already does this for python and a few others, I think), and also that if they don’t support a feature, they should print a better error message for a wheel-feature-* dependency, and in either case don’t look it up in the index. Then

  • current installers at least don’t fail, and maybe give the user enough hints to figure out what’s going on, or maybe backtrack and find an older version without the feature
  • slightly newer installers give the user a good error message
  • sufficiently newer installers just support the feature
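Concretely, the wheel’s existing core metadata might carry something like this (all the wheel-feature-* names are hypothetical; no such packages or reservations exist today):

```
Metadata-Version: 2.1
Name: example
Version: 3.0.0
Requires-Dist: wheel-feature-symlinks
Requires-Dist: wheel-feature-zstd
Requires-Dist: numpy>=1.26
```

An installer that implements symlinks and zstd would treat the first two requirements as implicitly satisfied and never query an index for them; an older installer would fail resolution with a name that at least points the user toward an explanation.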

This does admittedly privilege PyPI as an index, but that’s only really needed for current installers. From the point of view of the specs, we’d be reserving wheel-feature-* and telling installers to special case those names and not look them up in any index.

Since dependencies are reported in the existing metadata, this lets us fully change the wheel format to something that isn’t even a ZIP file (e.g., .tar.zst seems compelling). We’re constrained to the same filename, but not to the format, because older tools won’t even get to the point of trying to read Wheel-Version. I think that with this approach we should basically say that Wheel-Version is 1.0 forever, and it can be omitted in wheels with any feature. If we need any versioning we can do that on the wheel-feature-* virtual packages themselves.

(Alternatively, if we’re worried about people downloading wheels and installing them offline, we just need to have the wheel be a ZIP file containing a ...dist-info/METADATA file with Wheel-Version: 2.0. The actual contents can be something entirely different in the ZIP file like a single data.tar.zst file - or even a polyglot file since ZIP metadata is at the end of the file.)

2 Likes

Yeah, that is true.

But that is also why I said: “or at least unzip-ish + a couple of mv commands”. I think this does apply to the current implementation, right? (and that is why we can have a relatively easy bootstrap process).

I’d like to see some exploration of whether this is in fact desirable. I’m worried it isn’t, for the same reason as the adventures with new manylinux tags. Users often don’t read the output of pip (especially if it runs in CI or something), and they might not notice that they’re effectively stuck on an old version of a dependency and there’s something they can do about that. I think it might be better to fail the install, at least by default (a non-default pip install --ignore-unsupported-formats torch seems fine). On the one hand it doesn’t work when it could, but on the other hand it clearly communicates something is wrong that is fixable.

Actually, for packages that ship sdists, I think this will get us to the exact same problem as with new manylinux tags, that the installer won’t actually backtrack versions, it will just use the sdist of the same version and attempt to build it from source. This seems like a bad UX. Also, because it’s a bad UX, package authors are going to be incentivized to not upload sdists (to help users and lessen their own support load), and this already happens a fair bit and I think that’s unfortunate. (Though maybe what we need is a general mechanism to upload sdists and flag them as only being useful if you really know what you’re doing.)

+1. If we go with the wheel 2.0 approach, wheel and similar code should try to produce the lowest-version wheel that works. For symlinks that means wheel 1.0 if there are no symlinks. For zstd, it should be opt in, or maybe we have some heuristic that wheels that are < 10 MB uncompressed don’t use zstd, or something. IMO this should be a strong recommendation in the PEP, though how each tool chooses to implement it cannot be perfectly specified.

pip is already considering the mechanics of making --only-binary the default (as usual, it’s not as easy as just making it the default).

This is inspired! I don’t think backtracking will work today, but it’s certainly a neat way to allow feature detection without having to change any existing metadata.

I think we’re still a little more limited than this, but we can at least do split packing (so the metadata can be read from an outer ZIP file, but the contents are packed again in something else). Pretty sure it’s too soon to assume/require that the index is serving separate metadata, or that an index will be capable of parsing the new format to extract the metadata. I think you covered this with “installing offline”, but it’s going to apply to online installs too for quite a while.

Perhaps the most useful minor version change would be to define the versioning system itself, and then follow it up with a second minor version to test that the system works as expected. Something like

v1.1 - defines how the wheel version is recorded and accessed and what installers should do about it. Might add some metadata for this
v1.2 - symlinks, or some other appropriate feature

After that, the discussion around future versions is simplified.

I guess my point is that the first revision doesn’t need to include any new features, except for the versioning itself.

2 Likes

Thanks Paul! This is exactly the kind of thing I’m looking to discuss in this thread.

I figured pip didn’t take into consideration the wheel major version when resolving packages but it’s good to confirm that.

I do worry that if we go down the route of “try not to error up front and instead try to get existing installers to choose old wheels or sdists” that we will end up with a manylinux scenario, where users are confused about why they are doing a source build or getting an older version. At minimum, I think in any world where pip considers Wheel-Version in selection, it should at least warn if it comes across a major version it is incompatible with.

I am somewhat leaning towards it being better to surface the issue of an out-of-date installer to the user sooner rather than later. I don’t want a 2 → 3 scenario, but I also don’t want confusing breakage or unexpected package resolution. We could recommend that projects bump their own version when they start publishing wheel 2.0 wheels, so that users whose installer doesn’t yet support it can at least constrain the project to below that version.

I think one easy improvement would be to have pip add a call to action in the error message about major version. Right now it says “the wheel major version is higher than pip can handle” but if a user doesn’t know the implications of wheel 1.0 vs 2.0, they have no path to fixing the error. Explicitly saying that updating pip should fix this issue will get people to update more quickly.

I also think the update check could be made more assertive. The current warning is

WARNING: You are using pip version 21.2.3; however, version 24.0 is available.
You should consider upgrading via...

This does not express any urgency to updating in my reading, other than the “warning.” It reads to me like “oh hey we have a newer version, you might want to try it out!” But not “we highly recommend upgrading to the latest version to avoid issues.”

I absolutely agree we don’t want to just change things and break them. If we do the “upgrade the spec with several new features” I would include a multi-year plan of how to handle the migration trying to avoid breakage as much as possible.

I’d also like to do this in wheel 2.0, glad we are on the same page here. I think the hard part will be figuring out how to avoid having to do this in future releases.

Fundamentally, any feature that adds additional requirements on installer behavior is a “breaking change”, in the sense that if we don’t bump the major version, installers would silently do the wrong thing.

Hm, how do build backends decide if something is optional or required? If a feature isn’t used then the wheel probably doesn’t need to declare it at all, so I’m curious what an optional requirement on a feature might look like.

Easy: make the build definition specify it. The backend isn’t responsible for figuring all this out by magic; it’s allowed to have sensible defaults (e.g. no extra features) and require the user to put it in the build file (whether that’s setup.cfg or equivalent, or the tool’s section in pyproject.toml).

“If you don’t use symlinks, you’ll only be able to import and use the module. If you do use them, you’ll also be able to compile other projects from source,” feels fairly optional to me. “If you don’t use symlinks, you won’t be able to import the module at all” is required.

Let’s not lose sight of what wheels are fundamentally for: they are so that users can install and use Python packages without having to compile them. If the wheel can do that without the feature, the feature can be optional. But ultimately, it’s up to the project, and if they want to enforce that the feature should always be used, they should specify that, not rely on the backend to guess correctly.

But until the new version is standardised and pip support is added, that statement is a lie. Of course, if we wait till pip supports wheel 2.0, it’ll be too late to update the message in older pip versions. That’s the dilemma.