Currently, as the wheel format is designed, files are permitted to exist in the root of the wheel. The Root-Is-Purelib key within the WHEEL file also indicates whether those files should end up in purelib or platlib.
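As a rough illustration of how an installer might read that key, here is a minimal Python sketch. It assumes the WHEEL file's contents are already available as a string, and the function name `root_scheme` is hypothetical. The WHEEL file uses email-header style `Key: value` lines, so the standard library's `email.parser` can parse it.

```python
from email.parser import Parser


def root_scheme(wheel_file_text: str) -> str:
    """Return the scheme that files at the root of the wheel are installed into."""
    headers = Parser().parsestr(wheel_file_text)
    # Accepting any casing is an assumption of this sketch.
    value = (headers.get("Root-Is-Purelib") or "").strip().lower()
    # "true" means purelib; the spec says the root goes to platlib otherwise.
    return "purelib" if value == "true" else "platlib"


example = """Wheel-Version: 1.0
Generator: hypothetical-backend 0.1
Root-Is-Purelib: true
Tag: py3-none-any
"""
print(root_scheme(example))  # purelib
```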
Would it make sense to have a wheel 1.1 format that requires the contents of wheels to be packed so that there are no files in the root, and that they use the .data/{scheme}/ directories instead?
The primary consequence of this would be that wheels would no longer be zipimportable – something that is considered an undesirable design consequence in the original format:
Technically, due to the combination of supporting installation via simple extraction and using an archive format that is compatible with zipimport, a subset of wheel files do support being placed directly on sys.path. However, while this behaviour is a natural consequence of the format design, actually relying on it is generally discouraged.
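To make the quoted behaviour concrete, the following sketch builds a tiny wheel-shaped zip (not a valid wheel: it has no .dist-info metadata) with a package at its root, puts the archive itself on sys.path, and imports from it. The package name `demo_zipimport` is made up for the example. This only works because the package sits at the root of the zip, which is exactly what the proposed 1.1 layout would rule out.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_fake_wheel(directory: Path) -> Path:
    """Write a minimal, deliberately incomplete wheel-shaped zip with a root package."""
    path = directory / "demo_zipimport-1.0-py3-none-any.whl"
    with zipfile.ZipFile(path, "w") as zf:
        # A root-level file: with Root-Is-Purelib this would land in site-packages.
        zf.writestr("demo_zipimport/__init__.py", "GREETING = 'imported from inside the wheel'\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    whl = build_fake_wheel(Path(tmp))
    sys.path.insert(0, str(whl))  # the discouraged "wheel on sys.path" usage
    try:
        module = importlib.import_module("demo_zipimport")
        greeting = module.GREETING
        origin = module.__file__  # a path inside the .whl archive
    finally:
        sys.path.remove(str(whl))

print(greeting)
print(origin)
```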
This is probably better handled by tools that red-flag packages that install names other than their own name. There are plenty of good reasons to do this, and I don’t see why we should disallow it, but there are also good reasons why installers should know that wheels are doing this.
And the question then is, why? The only consequence, as you point out, is to break a discouraged scenario without any apparent benefit. It also complicates “normal/cheap” installation, which currently lets you extract a set of wheels and reference the directory from sys.path.
Feels like churn for build backends, installers, and users, for no real reason.
Setting aside the exact container formats, this conceptually reminds me a lot of Debian’s deb packages. For those unfamiliar with the design, it’s a GNU ar format archive which contains multiple compressed tarballs of separate file trees (each of which can use different compression algorithms too):
$ ar t util-linux_2.29-1_amd64.deb
debian-binary
control.tar.gz
data.tar.xz
That model dates back to 1995, prior to which Debian just concatenated a pair of tarballs into a file with no real outer container. Of course, there is probably much older prior art for similar designs, just pointing out that the idea of nested archives isn’t remotely novel or untested for package management.
I agree. I don’t see enough benefits to justify the disruption. And having the “main” files in the distribution at the root of the zipfile seems natural to me.
Definitely not worth spending any of our limited “churn budget” on, IMO.
One argument in favor is that the current format’s zipimport support only works in limited use cases, and it’s probably a bad experience for users to discover this when they try to use it based on very reasonable expectations. I think removing this confusing and unintended feature would provide a better UX in isolation, but suddenly breaking things that currently work, and that users might be relying on, would probably hurt the UX far more than this change would gain.
This is definitely something we should keep in mind when working on a new major version of the spec though.
I think it is generally important to build an open system that winds up being more powerful than expected, instead of trying to prevent people from using that system in unintended ways. If we were revising the language of the specification we might say something like “use this feature if you control the contents of the wheel”…
I guess I should’ve added “why do this” in the OP.
tl;dr: The technical changes are “cheap” and it’d be a good thing to see how a new WHEEL version rolls out.
The basic technical motivation here is simplifying the wheel unpacking logic and closing the “oh, you can do this thing that works but don’t because it’s fragile” loophole in the format (the loophole of python foo-vvv-py3-none-any.whl/foo running foo.__main__ from the wheel’s zip). With this change, you can tell exactly where each file in a wheel will end up, since the format no longer treats any scheme “specially”: everything ends up in {name}-{version}.data/{scheme}/{path}, which also makes introspecting the file a bit easier. As mentioned, it closes
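A minimal sketch of why the proposed layout makes introspection simpler: under the rule described above, an entry's path alone determines its scheme, with no need to consult Root-Is-Purelib. The function `destination_v11` is hypothetical; it assumes that .dist-info/ stays at the root as in 1.0 (the post does not say otherwise) and that `name_version` is the `{name}-{version}` prefix.

```python
from typing import Optional, Tuple


def destination_v11(entry: str, name_version: str) -> Optional[Tuple[str, str]]:
    """Return (scheme, relative path) for a payload file, or None for metadata.

    Assumes the hypothetical 1.1 rule: every payload file lives under
    {name}-{version}.data/{scheme}/, while .dist-info/ stays at the root.
    """
    if entry.startswith(f"{name_version}.dist-info/"):
        return None  # metadata, handled as it is today
    data_dir = f"{name_version}.data/"
    if entry.startswith(data_dir):
        scheme, sep, rel = entry[len(data_dir):].partition("/")
        if sep and rel:
            return scheme, rel
    # Under 1.0 this entry would go to purelib/platlib via Root-Is-Purelib;
    # under the proposed 1.1 rule it is simply invalid.
    raise ValueError(f"{entry!r} is outside .dist-info/ and .data/{{scheme}}/")


print(destination_v11("demo-1.0.data/purelib/demo/__init__.py", "demo-1.0"))
```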
In all honesty, I don’t think these are strong reasons on their own. Installers will already read the WHEEL file unconditionally, so bumping the WHEEL version for this doesn’t really add much value.
The more compelling reasons for me are that (1) it gives us an opportunity to see how build backends/packagers respond to a new WHEEL version (albeit a minor one), which can help inform how to go about rolling out a major version bump of WHEEL in the future, and (2) it’s easier to block WHEEL 1.0 uploads on PyPI in the future, if we choose to do so at some point.
I’ll note a couple of opinions™ of mine:
This, on its own, is a backwards-compatible change for installers (hence 1.1). Every wheel 1.0 installer will correctly install a 1.1 wheel (albeit with a warning).
The argument that “oh, it’ll break someone who relies on doing something we explicitly discourage and we shouldn’t do that” isn’t a good one IMO.
In addition to Hyrum’s law definitely applying to this case, if we ever want to evolve the wheel format, removing a detail that we explicitly “discourage” users from relying on is the change we can make with the least amount of baggage/churn/social-capital/pick-your-phrase being spent on it. Given that we’ve had a bunch of discussions about completely reworking the format, I’d really like us to better understand how a new version of wheel would roll out, and not try an incompatible version bump and hope that we get the rollout right on the first attempt based on educated guesses.
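The claim that every 1.0 installer will install a 1.1 wheel with only a warning rests on the wheel spec's versioning rule: an installer should warn when Wheel-Version is newer than what it supports, and must fail only when the major version is newer. A minimal sketch of that check follows; the function name `check_wheel_version` and the `(1, 0)` supported version are assumptions standing in for a real 1.0 installer.

```python
import warnings
from typing import Tuple

# What this hypothetical installer understands.
SUPPORTED_WHEEL_VERSION: Tuple[int, int] = (1, 0)


def check_wheel_version(value: str, supported: Tuple[int, int] = SUPPORTED_WHEEL_VERSION) -> None:
    """Apply the warn-on-minor, fail-on-major rule to a Wheel-Version value."""
    major_text, _, minor_text = value.strip().partition(".")
    version = (int(major_text), int(minor_text or 0))
    if version[0] > supported[0]:
        # Newer major version: the installer must refuse the wheel.
        raise ValueError(f"Wheel-Version {value} is not supported (major version too new)")
    if version > supported:
        # Same major, newer minor: install anyway, but warn.
        warnings.warn(f"Wheel-Version {value} is newer than {supported[0]}.{supported[1]}; installing anyway")


check_wheel_version("1.1")  # emits a UserWarning, does not raise
```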
Well, who would we be simplifying the wheel unpacking logic for? Surely not pip or installer; perhaps a corporate wheel unpacker, if one exists? Otherwise, what’s the time frame we are looking at here? Optimistically, a decade before you are able to stop parsing 1.0 wheels, assuming that all build backends follow suit. And that would involve sunsetting the 1.0 spec, emitting warnings and all the rest. I think whatever simplification is to be gained from removing this (very minor) quirk long-term will be negated by everything else that’s required to make it happen.
As for build backends, I think they are bound to respond very differently to a backward-compatible minor version bump than they will to an all-new wheel spec. I imagine that should there be a v2 wheel, build backends will opt to generate both v1 and v2 wheels during the transitory period, for example.
Thanks for clarifying. With that background, the argument is a little more persuasive.
I still think this is too small of a change to be worthwhile. It seems extremely likely to me that build backends simply won’t bother implementing the new version, as there’s no practical benefit for them. And frontends will simply change the “valid version” check to say that 1.1 is supported. Sure, frontends could check for files in the root and raise an error if they are present, but again, why bother? “Be lenient in what you accept” argues against doing it, and honestly I don’t think it’s the installer’s place to police the validity of wheels beyond what’s necessary to be sure they can be installed correctly.
I understand the value of getting a feel for how a new wheel version would go, but it’s the case of a backward incompatible wheel that’s the concern, and I don’t think we gain enough by trying a practice run with a compatible change.
Why? I don’t see how having a 1.1 version makes it any easier to block 1.0. Or indeed why blocking 1.0 and not blocking 1.1 would be of any value anyway.
I absolutely agree that this is a worthwhile goal. I just don’t see the proposed change giving us useful information on that front.
Surely there are other formats that have gone through backward incompatible changes (maybe outside the packaging ecosystem, or even in non-Python areas). Would we be better served looking for advice from the people involved in those transitions?
In some wheel unpacking code I wrote, the installer maps category name (e.g. ‘purelib’, ‘platlib’) → list of prefixes inside the zip file → {category: [list of files]} → installed locations. In other words, the added complexity of “empty prefix maps to purelib depending on WHEEL” is about two lines of code, and does not spread to any other part of the installer.
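The pipeline described above can be sketched roughly as follows. This is not the poster's actual code: `build_prefixes` and `group_by_scheme` are hypothetical names, the scheme list is the set of .data subdirectories named in the wheel spec, and the final step of mapping each group to real directories (for example via sysconfig) is left out. The wheel 1.0 special case really is just the `pairs.append(...)` line.

```python
from typing import Dict, Iterable, List, Tuple

# The .data subdirectory names listed in the wheel spec.
WHEEL_SCHEMES = ("purelib", "platlib", "headers", "scripts", "data")


def build_prefixes(name_version: str, root_is_purelib: bool) -> List[Tuple[str, str]]:
    """Return (zip prefix, scheme) pairs, most specific prefix first."""
    data_dir = f"{name_version}.data/"
    pairs = [(f"{data_dir}{scheme}/", scheme) for scheme in WHEEL_SCHEMES]
    # The wheel 1.0 special case: the zip root (empty prefix) belongs to
    # purelib or platlib depending on Root-Is-Purelib.
    pairs.append(("", "purelib" if root_is_purelib else "platlib"))
    # Longest prefix first, so the empty prefix only catches leftover entries.
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def group_by_scheme(names: Iterable[str], prefixes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group archive entries as {scheme: [paths relative to that scheme]}."""
    grouped: Dict[str, List[str]] = {}
    for entry in names:
        if entry.endswith("/"):
            continue  # directory entries carry no file content
        for prefix, scheme in prefixes:
            if entry.startswith(prefix):
                grouped.setdefault(scheme, []).append(entry[len(prefix):])
                break
    return grouped


names = [
    "demo/__init__.py",
    "demo-1.0.dist-info/WHEEL",
    "demo-1.0.data/scripts/demo-cli",
]
print(group_by_scheme(names, build_prefixes("demo-1.0", root_is_purelib=True)))
# {'purelib': ['demo/__init__.py', 'demo-1.0.dist-info/WHEEL'], 'scripts': ['demo-cli']}
```

Note that the .dist-info directory falls under the empty prefix, so it lands in the root scheme alongside the package. In this sketch an unknown .data subdirectory would also silently fall through to the root scheme; a real installer should reject it instead.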
In conda, we have .tar.bz2 and our own .conda format, which is two nested .tar.zst archives (one for metadata). We continue to provide both; in a year or so we will be able to stop providing the .tar.bz2 files, once support runs out for the last version of the conda installer that didn’t support .conda packages.
If we decided to go in the other direction, the only thing we would need to add to wheel to make executable wheels a warranted feature would be a way for the __main__.py to be omitted from installations: python somefile.whl would run the __main__.py, which could figure out anything related to sys.path etc., while installs would not put a __main__.py in the site-packages/ root.
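The behaviour described here can be sketched with the standard library: Python will run a zip archive that has a __main__.py at its root, regardless of the file extension. The sketch below builds such an archive, runs it as `python somefile.whl`, and applies a hypothetical install filter (`files_to_install`, not part of any spec or existing tool) that leaves the root __main__.py out. The archive is not a complete wheel; it contains only the files needed for the demonstration.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, List


def build_runnable_wheel(path: Path) -> None:
    """Write a wheel-shaped zip with a root __main__.py and a small package."""
    with zipfile.ZipFile(path, "w") as zf:
        # Root __main__.py: what `python somefile.whl` executes.
        zf.writestr("__main__.py", "print('running from the wheel')\n")
        zf.writestr("demo/__init__.py", "")


def files_to_install(names: Iterable[str]) -> List[str]:
    """Hypothetical rule: never install a root-level __main__.py into site-packages."""
    return [n for n in names if n != "__main__.py" and not n.endswith("/")]


with tempfile.TemporaryDirectory() as tmp:
    whl = Path(tmp) / "demo-1.0-py3-none-any.whl"
    build_runnable_wheel(whl)
    # Equivalent to running: python demo-1.0-py3-none-any.whl
    result = subprocess.run([sys.executable, str(whl)], capture_output=True, text=True, check=True)
    output = result.stdout.strip()
    with zipfile.ZipFile(whl) as zf:
        installed = files_to_install(zf.namelist())

print(output)     # running from the wheel
print(installed)  # ['demo/__init__.py']
```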
If you want to propose a “Wheel 1.1” spec, that serves both as a useful improvement, and as a vehicle to get experience for upgrading the wheel spec, how about someone puts together a proposal for supporting symlinks in wheels? That would be a useful addition in itself (there’s a request for it here and it would be another approach that editable wheels could use), while still giving all the benefits you’re after. Also, as 1.1 wheels would have a capability that 1.0 wheels don’t support, we’d get some experience with the transition questions you’re concerned about.