I have come across a few design decisions related to the wheel format which the PEP says nothing about but which are important to resolve.
File timestamps: should we store them, or set them to the UNIX epoch (1970-01-01)?
File permissions: should we store them or not? Some users say they need to flag scripts as executable.
Should symbolic links be stored in wheels? A use case was given just now in wheel issue #400. There are open questions around RECORD handling and extraction on Windows.
My personal take on these is:
I don't think they are useful, and they interfere with repeatable builds.
For PyPI uploads this is probably a no-no, but may have legitimate use cases elsewhere. This also means that wheels built from the same source would come out differently on Windows.
The zip format can already store modification time, although I'm not sure if installers currently set them.
Permission flags make sense in niche cases. It's probably better to support them by packaging metadata declaration instead of introspecting the source files. For example, maybe allow setting the x bit on files in {dist}.data/scripts?
Symbolic links have come up many times. IMO this should be done by adding support for the non-standard InfoZIP-style symlink representation to zipfile (b.p.o.27318, b.p.o.37821), plus some special handling on Windows. But additional verification should be done to ensure the link points to another file in the wheel.
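To make the "link must point to another file in the wheel" check concrete, here is a minimal sketch. The function name and the rule it enforces (relative targets only, resolved against the link's own directory) are my assumptions, not anything the wheel spec defines.

```python
import posixpath


def link_target_in_wheel(link_name: str, target: str, names: set[str]) -> bool:
    """Return True if a relative symlink target resolves to another archive member.

    Hypothetical helper: `names` is the set of member paths in the wheel.
    """
    # Absolute targets can never refer to a file inside the archive.
    if posixpath.isabs(target):
        return False
    # Resolve the target relative to the directory that contains the link.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(link_name), target)
    )
    # After normalisation, a leading ".." means the target escapes the archive root.
    if resolved == ".." or resolved.startswith("../"):
        return False
    return resolved in names and resolved != link_name
```

With members `pkg/libx.so` and `pkg/libx.so.1.1`, a link `pkg/libx.so -> libx.so.1.1` passes, while `../../etc/passwd` or `/etc/passwd` is rejected.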
Yeah, I already know this. The question is not whether we can store them (we do currently, IIRC), but whether we should. Either way this behavior should be defined in the wheel standard.
What would help here is someone pointing to a good use case for this. Console scripts are not such a use case, as the console_scripts entry point covers that one.
I didn't know if zipfile supports this feature or not, but it seems like it doesn't. This means that we would have to vendor an updated zipfile implementation in wheel.
I'd very strongly maintain that if the stdlib zipfile module doesn't support symlinks, then the wheel spec shouldn't allow them. It seems to me that requiring a non-standard zipfile implementation to unpack a wheel would place an unreasonable burden on the tool ecosystem.
Even if the stdlib module did support symlinks (for all the versions of Python we want tools to support) there would still be a lot of questions which don't have obvious answers:
Should installers reject wheels containing symlinks on platforms that don't support them? If not, is creating multiple copies of files OK? Presumably the wheel author wanted symlinks for a reason, how do we know copies will work?
Should wheels be allowed to contain symlinks that point outside the wheel?
What should go in RECORD? In particular, what should the hash and length be of (the symlink or the target)?
I consider lack of stdlib support a showstopper, but I'd be -1 even if it wasn't.
Ah, thanks for the clarification, now I understand. Yeah, it makes sense to explicitly not set them (at least by default; a flag to let the backend change it is fine).
Ditto, I don't see use cases past "I want packaging tools to do things so I don't need to".
(Symlink discussion in a separate post since there are multiple topics I want to touch on.)
I think it is actually doable with a post-processing step. The difference between a symlink and a regular file in a zip is only a file flag, so you can create the zip first and set the missing flags afterwards.
The wheel format can add it as an extension to the zip format (define "a zip entry containing a relative path with the 0xA0000000 bit set" to mean a symlink). The fact that this is exactly how InfoZip implements symlinks is a nice "accident". This means that a wheel stays a standard zip file; the wheel spec just defines additional installer behaviour when certain conditions are met.
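For anyone who wants to see what that looks like with the stdlib, here is a rough sketch of writing and detecting such an entry. The `add_symlink` and `is_symlink` helpers are names I made up; the only zipfile features used are `ZipInfo.external_attr` and `ZipInfo.create_system`. The upper 16 bits of `external_attr` hold the Unix mode, so `stat.S_IFLNK << 16` is the 0xA0000000 value mentioned above.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, link_name: str, target: str) -> None:
    """Store `target` as the entry body and mark the entry as a symlink."""
    info = zipfile.ZipInfo(link_name)
    info.create_system = 3  # 3 = Unix, so the high 16 bits are read as st_mode
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Check the Unix file-type bits stored in the entry's external attributes."""
    return info.create_system == 3 and stat.S_ISLNK(info.external_attr >> 16)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/libx.so.1.1", b"placeholder library contents")
    add_symlink(zf, "pkg/libx.so", "libx.so.1.1")

# An installer that knows the convention can list the links and their targets.
with zipfile.ZipFile(buf) as zf:
    links = {i.filename: zf.read(i).decode() for i in zf.infolist() if is_symlink(i)}
```

A reader that does not check the flag just sees `pkg/libx.so` as a small regular file whose content is the link path.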
Yeah, this was covered by the SO post I linked earlier.
This would require explicit support from installers. Without that support, installers would just create regular files with the link path as the file content.
One big problem is of course Windows where creating symlinks still requires administrative privileges, and even then you need to be on an NTFS drive and not, say, FAT32/VFAT.
I think it would be great to store symlinks in wheels. It would save a lot of space for wheels that distribute several identical copies of shared libraries only so that they can have x.so, x.so.1, and x.so.1.1 names. It's a bit of a design problem - are you allowed to symlink to anywhere, to anywhere in the wheel, or to anywhere in the category in the wheel? But the "how are symlinks represented in .zip" problem is standard enough, even if it is not part of the main pkware zip documentation?
We would have to bump the wheel major version to 2, so that existing non-symlink-aware installers would error out.
The timestamp should be set to a fixed date after 1980. (ZIP uses DOS timestamps that don't go earlier; it is possible to store more UNIX properties with an extension but it isn't common). This satisfies the "reproducible builds" crowd who expect built artifacts to be identical if the source is identical.
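A small sketch of what the fixed-timestamp approach could look like with the stdlib zipfile module. The `build_archive` helper and the choice of 1980-01-01 are illustrative assumptions on my part; the point is only that pinning `date_time`, the mode bits, and the member order makes the output bytes depend on nothing but the inputs.

```python
import io
import zipfile

# Earliest date a DOS timestamp can represent; any fixed later date works too.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write `files` into a zip with fixed metadata and sorted member order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # pin to Unix so the platform does not leak in
            info.external_attr = 0o644 << 16
            zf.writestr(info, files[name])
    return buf.getvalue()
```

Calling it twice with the same contents, even with the dict built in a different order, should give byte-identical archives on the same Python and zlib versions.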
The +x bit should be preserved but maybe other permissions should not be preserved. I've thought it might be nice to include this bit in the wheel-specific list of files RECORD.
The use case in wheel issue 400 is a real one and would really benefit from this. There's still the issue of dealing with Windows. Maybe the v2 wheel standard should say that the installer needs to make duplicates of the files instead of creating symlinks on Windows? Or should it create symlinks whenever it can?
Is there any ongoing effort to update the wheel spec?
Good point. I think I've seen mention of this in zipfile.
Do you happen to know of a plausible use case for this?
Don't we say "mark everything in scripts as +x" even-if-it-was-packaged-on-Windows? I think this part of the spec is a bit hand-wavey. A wheel that runs an included subprocess to do its work might want to put executable files not-in-scripts.
I think it's practical enough to say the installer should error out if the user attempts to install a wheel containing symlinks on a file/operating system without symlink support. Symlink support is an obvious consideration any wheel distributor should have in mind, we just do the best we can and let the wheel maintainer deal with the implications.
Whatever happens, I do not think we should rely on non-standard (not stdlib supported) symlink support in zipfiles. If we do add support (and I remain -1 on the idea) how about having a symlinks file in the .dist-info directory that describes name and target for all symlinks that should be created on install?
Installers could be permitted to ignore that file if they couldn't create symlinks. The installer would be responsible for adding the created symlinks to the installed RECORD file. (We still need to specify how symlinks should be handled for purposes of RECORD).
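To illustrate why the RECORD question is open, here is a sketch of how a RECORD line is computed today (path, urlsafe-base64 SHA-256 without padding, size) and the two candidate inputs for a symlink. The `record_line` helper is hypothetical and skips the CSV quoting a real RECORD writer needs; which of the two results is "right" is exactly what a spec would have to decide.

```python
import base64
import hashlib


def record_line(path: str, data: bytes) -> str:
    """Format one RECORD row for `path` given the bytes that are hashed."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


target_bytes = b"placeholder library contents"

# Option A: hash the link text itself (what a symlink-unaware tool would see).
as_link = record_line("pkg/libx.so", b"libx.so.1.1")

# Option B: hash the content of the file the link points to.
as_target = record_line("pkg/libx.so", target_bytes)
```

The two rows differ in both hash and length, so an installer verifying RECORD has to know which convention the builder used.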
The only use case presented so far seems to be Linux specific (it refers to the "Linux Shared Library Versioning Guidelines"). Allowing installers to error if they can't create a symlink seems reasonable. No one's yet offered a use case that needs anything better.
We've talked about doing "wheel greater compression", which is an extension to wheel that replaces everything but the .dist-info directory with a nested archive. This lets the compression algorithm work on the entire contents instead of individual files. What if we allowed the nested archive to be a .tar.zst instead of a .zip? This would allow symlinks, but would not allow an exact transform between nested and non-nested wheels.
I was assuming for the purposes of this discussion that we're looking at the possibility of a relatively small revision to the existing format. But yes, if we go for the full "Wheel 2.0" discussion, anything is possible.
If we are talking wheel 2.0, then I'm fine with adding symlink support, but I'd still want answers to questions like what do we do on systems that don't support symlinks, how do we handle them in RECORD, etc.
I don't personally have an opinion on whether we should add symlink support.
FWIW, Flit preserves the timestamps from the filesystem in wheels by default, except for generated files which have an arbitrary, fixed timestamp (1st January 2016). But if the SOURCE_DATE_EPOCH environment variable is set, all timestamps are generated from this.
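Roughly, the SOURCE_DATE_EPOCH part could be sketched like this. This is not Flit's actual code: the `zip_date_time` name, the clamping to 1980, and using the 2016 date as the fallback for every file are simplifying assumptions for illustration.

```python
import os
import time

# DOS timestamps in zip entries cannot represent anything before this.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def zip_date_time(environ=os.environ, fallback=(2016, 1, 1, 0, 0, 0)):
    """Return a ZipInfo-style date_time tuple, honouring SOURCE_DATE_EPOCH."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return fallback
    # SOURCE_DATE_EPOCH is seconds since the Unix epoch, interpreted as UTC.
    stamp = tuple(time.gmtime(int(raw))[:6])
    # Clamp so a value such as 0 (1970) still yields a valid zip timestamp.
    return max(stamp, ZIP_EPOCH)
```

The resulting tuple can be passed as `date_time` when constructing a `zipfile.ZipInfo`.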
Permissions are normalised to either 0o755 or 0o644, i.e. only a single bit is preserved for executable-ness (executability?). This seems to match what popular version control systems store.
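As a sketch of that normalisation (again not Flit's real implementation; I'm assuming the owner-execute bit is the one that decides):

```python
import stat


def normalise_mode(mode: int) -> int:
    """Collapse a file mode to 0o755 if owner-executable, else 0o644."""
    return 0o755 if mode & stat.S_IXUSR else 0o644
```

So `0o100700` becomes `0o755` and `0o100664` becomes `0o644`; for a zip entry the result would go into the upper 16 bits of `ZipInfo.external_attr`.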