Clarifications to the wheel specification

I have come across a few design decisions related to the wheel format which the PEP says nothing about but which are important to resolve.

  1. File timestamps: should we store them, or set them to the UNIX epoch (1970-01-01)?
  2. File permissions: should we store them or not? Some users say they need to flag scripts as executable.
  3. Should symbolic links be stored in wheels? A use case was given just now in wheel issue #400. There are open questions around RECORD handling and extraction on Windows.

My personal take on these is:

  1. I don’t think they are useful, and they interfere with repeatable builds.
  2. For PyPI uploads this is probably a no-no, but may have legitimate use cases elsewhere. This also means that wheels built from the same source would come out differently on Windows.
  3. I have no opinion on this.

The zip format can already store modification time, although I’m not sure if installers currently set them.

Permission flags make sense in niche cases. It’s probably better to support them through a declaration in the packaging metadata rather than by introspecting the source files. For example, maybe allow setting the x bit on files in {dist}.data/scripts?

Symbolic links have come up many times. IMO this should be done by adding the non-standard, InfoZIP-style symlink support to zipfile (b.p.o.27318, b.p.o.37821), plus some special handling on Windows. But additional verification should be done to ensure the link points to another file in the wheel.
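As a rough illustration of the verification step, here is a minimal sketch that checks whether a relative link target resolves to another member of the same archive. The function name `link_target_in_wheel` and the `members` set are hypothetical, and the sketch does not follow chains of links.

```python
import posixpath


def link_target_in_wheel(link_name: str, target: str, members: set[str]) -> bool:
    """Return True if `target`, resolved relative to `link_name`,
    names a different member of the same archive."""
    # Absolute targets always point outside the wheel.
    if posixpath.isabs(target):
        return False
    # Resolve the target relative to the directory holding the link.
    base = posixpath.dirname(link_name)
    resolved = posixpath.normpath(posixpath.join(base, target))
    # After normalisation, a leading ".." means the path escapes the archive root.
    if resolved == ".." or resolved.startswith("../"):
        return False
    return resolved != link_name and resolved in members


members = {"pkg/libx.so.1.1", "pkg/libx.so.1", "pkg/libx.so"}
ok = link_target_in_wheel("pkg/libx.so", "libx.so.1.1", members)
escapes = link_target_in_wheel("pkg/libx.so", "../../etc/passwd", members)
```

Archive member names always use forward slashes, which is why the sketch uses `posixpath` rather than `os.path`.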

Yeah, I already know this. The question is not whether we can store them (we do currently, IIRC), but whether we should. Either way this behavior should be defined in the wheel standard.

What would help here is someone pointing to a good use case for this. Console scripts are not such a use case, as the console_scripts entry point covers that one.

I didn’t know whether zipfile supports this feature, but it seems like it doesn’t. This means that we would have to vendor an updated zipfile implementation in wheel.

I’d very strongly maintain that if the stdlib zipfile module doesn’t support symlinks, then the wheel spec shouldn’t allow them. It seems to me that requiring a non-standard zipfile implementation to unpack a wheel would place an unreasonable burden on the tool ecosystem.

Even if the stdlib module did support symlinks (for all the versions of Python we want tools to support) there would still be a lot of questions which don’t have obvious answers:

  • Should installers reject wheels containing symlinks on platforms that don’t support them? If not, is creating multiple copies of files OK? Presumably the wheel author wanted symlinks for a reason, how do we know copies will work?
  • Should wheels be allowed to contain symlinks that point outside the wheel?
  • What should go in RECORD? In particular, should the hash and length be those of the symlink or of the target?
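To make the RECORD question concrete, here is a small sketch of how a RECORD line is derived today for a regular file (path, urlsafe-base64 SHA-256 digest without padding, size in bytes). The helper name `record_line` is made up, and the sketch ignores CSV quoting of unusual paths. It shows that a link entry (whose stored content would be the target path) and its target produce different lines, so the spec would have to pick one.

```python
import base64
import hashlib


def record_line(path: str, data: bytes) -> str:
    """Format one RECORD row for `path` with the given file content."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses urlsafe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


library_bytes = b"\x7fELF placeholder content"  # stand-in for a real shared library
as_target = record_line("pkg/libx.so", library_bytes)   # hash/size of the target
as_link = record_line("pkg/libx.so", b"libx.so.1.1")    # hash/size of the link text
```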

I consider lack of stdlib support a showstopper, but I’d be -1 even if it wasn’t.


To me, the biggest problem is not the lack of stdlib support but the fact that it’s not part of the official zip format.


As for the practical implementation, this SO post looks informative.

Ah, thanks for the clarification, now I understand. Yeah, it makes sense to explicitly not set them (at least by default; a flag to let the backend change it is fine).

Ditto, I don’t see use cases past “I want packaging tools to do things so I don’t need to”.

(Symlink discussion in a separate post since there are multiple topics I want to touch on.)

Re: symlinks in zip

I think it is actually doable with a post-processing step. The difference between a symlink and a regular file in a zip is only a file flag, so you can create the zip first and set the missing flags afterwards.

The wheel format can add it as an extension to the zip format (define that “a zip entry containing a relative path, with the 0xA0000000 bits set in its external attributes” means a symlink). The fact that this is exactly how InfoZip implements symlinks is a nice “accident” 🙂 This means that a wheel stays a standard zip file; the wheel spec just defines additional installer behaviour when certain conditions are met.
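A minimal sketch of the flag-based approach, using only the stdlib zipfile module. The helper names `add_symlink` and `is_symlink` are invented for illustration and are not part of zipfile or of any wheel tool. The Unix file mode lives in the upper 16 bits of `external_attr`, and `stat.S_IFLNK << 16` is the 0xA0000000 value mentioned above.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store `name` as an InfoZIP-style symlink entry pointing at `target`."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so readers treat the upper bits as a mode
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    # The entry's content is simply the link target path.
    zf.writestr(info, target)


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """True if the entry's Unix mode bits mark it as a symbolic link."""
    return stat.S_ISLNK(info.external_attr >> 16)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/libx.so.1.1", b"placeholder library bytes")
    add_symlink(zf, "pkg/libx.so", "libx.so.1.1")

# A symlink-aware reader can recover the link table from the flags alone.
with zipfile.ZipFile(buf) as zf:
    links = {i.filename: zf.read(i).decode() for i in zf.infolist() if is_symlink(i)}
```

A reader that ignores the flag sees `pkg/libx.so` as an ordinary small file whose content is the text `libx.so.1.1`.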

Yeah, this was covered by the SO post I linked earlier.

This would require explicit support from installers. Without that support, installers would just create regular files with the link path as the file content.

One big problem is of course Windows, where creating symlinks still requires administrative privileges, and even then you need to be on an NTFS drive and not, say, FAT32/VFAT.

I think it would be great to store symlinks in wheels. It would save a lot of space for wheels that distribute several identical copies of shared libraries only so that they can have x.so, x.so.1, and x.so.1.1 names. It’s a bit of a design problem: are you allowed to symlink to anywhere, to anywhere in the wheel, or to anywhere in the category in the wheel? But the “how are symlinks represented in .zip” problem is standard enough, even if it is not part of the main PKWARE zip documentation?

We would have to bump the wheel major version to 2, so that existing non-symlink-aware installers would error out.
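For context, the mechanism this relies on is the Wheel-Version field in the WHEEL metadata file: the existing spec asks installers to abort when the major version is newer than what they support. A rough sketch of such a check follows; `UnsupportedWheel`, `SUPPORTED_MAJOR`, and `check_wheel_version` are illustrative names, not taken from any real installer.

```python
from email.parser import Parser

SUPPORTED_MAJOR = 1  # assumption: this installer only understands wheel 1.x


class UnsupportedWheel(Exception):
    """Raised when a wheel declares a major version we cannot handle."""


def check_wheel_version(wheel_metadata: str) -> tuple[int, int]:
    """Parse the text of a WHEEL file and reject newer major versions."""
    # The WHEEL file uses RFC 822-style "Key: value" headers.
    msg = Parser().parsestr(wheel_metadata)
    major, minor = (int(part) for part in msg["Wheel-Version"].split("."))
    if major > SUPPORTED_MAJOR:
        raise UnsupportedWheel(f"Wheel-Version {major}.{minor} is not supported")
    return major, minor


version = check_wheel_version("Wheel-Version: 1.0\nGenerator: example\n")
```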

The timestamp should be set to a fixed date after 1980. (ZIP uses DOS timestamps that don’t go earlier; it is possible to store more UNIX properties with an extension but it isn’t common). This satisfies the ‘reproducible builds’ crowd who expect built artifacts to be identical if the source is identical.
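A minimal sketch of what a fixed timestamp buys, using the stdlib zipfile module. The function name `build_archive` and the choice of 1980-01-01 (the earliest date a DOS timestamp can hold) are assumptions for illustration. Entry order, mode bits, and the creating-system field are pinned as well, since they also affect the bytes.

```python
import io
import zipfile

FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)  # earliest value the DOS timestamp allows


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write `files` into a zip whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3            # same value on every build platform
            info.external_attr = 0o644 << 16  # fixed permission bits
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {"b.py": b"print(2)\n", "a.py": b"print(1)\n"}
first = build_archive(files)
second = build_archive(files)
```

With these fields pinned, `first` and `second` are byte-for-byte equal on the same machine; output across machines can still differ if the underlying zlib produces different compressed streams.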

The +x bit should be preserved but maybe other permissions should not be preserved. I’ve thought it might be nice to include this bit in the wheel-specific list of files RECORD.

The use case in wheel issue 400 is a real one and would really benefit from this. There’s still the issue of dealing with Windows. Maybe the v2 wheel standard should say that the installer needs to make duplicates of the files instead of creating symlinks on Windows? Or should it create symlinks whenever it can?

Is there any ongoing effort to update the wheel spec?

Good point – I think I’ve seen mention of this in zipfile.

Do you happen to know of a plausible use case for this?

Don’t we say ‘mark everything in scripts as +x’ even-if-it-was-packaged-on-Windows? I think this part of the spec is a bit hand-wavey. A wheel that runs an included subprocess to do its work might want to put executable files not-in-scripts.

I think it’s practical enough to say the installer should error out if the user attempts to install a wheel containing symlinks on a file system or operating system without symlink support. Symlink support is an obvious consideration any wheel distributor should have in mind; we just do the best we can and let the wheel maintainer deal with the implications.

Whatever happens, I do not think we should rely on non-standard (not stdlib supported) symlink support in zipfiles. If we do add support (and I remain -1 on the idea) how about having a symlinks file in the .dist-info directory that describes name and target for all symlinks that should be created on install?

Installers could be permitted to ignore that file if they couldn’t create symlinks. The installer would be responsible for adding the created symlinks to the installed RECORD file. (We still need to specify how symlinks should be handled for purposes of RECORD).

But symlinks are not universally available, so what would you fall back to? At best symlinks could be a suggestion, not a requirement.

The only use case presented so far seems to be Linux specific (it refers to the “Linux Shared Library Versioning Guidelines”). Allowing installers to error if they can’t create a symlink seems reasonable. No-one’s yet offered a use case that needs anything better.

We’ve talked about doing ‘wheel greater compression’ which is an extension to wheel that replaces everything but the .dist-info directory with a nested archive. This lets the compression algorithm work on the entire contents instead of individual files. What if we allowed the nested archive to be a .tar.zst instead of a .zip? This would allow symlinks, but would not allow an exact transform between nested and non-nested wheels.

I was assuming for the purposes of this discussion that we’re looking at the possibility of a relatively small revision to the existing format. But yes, if we go for the full “Wheel 2.0” discussion, anything is possible.

If we are talking wheel 2.0, then I’m fine with adding symlink support, but I’d still want answers to questions like what do we do on systems that don’t support symlinks, how do we handle them in RECORD, etc.

I don’t personally have an opinion on whether we should add symlink support.

Maybe I should start a new topic for that? There are a lot of proposed changes which would require a major version bump in the spec.

FWIW, Flit preserves the timestamps from the filesystem in wheels by default, except for generated files which have an arbitrary, fixed timestamp (1st January 2016). But if the SOURCE_DATE_EPOCH environment variable is set, all timestamps are generated from this.
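A rough sketch of the SOURCE_DATE_EPOCH handling described above. This is not Flit's actual code; the function name `zip_date_time`, the clamping to 1980, and the default tuple are assumptions made for illustration.

```python
import os
import time

ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)  # zip entries cannot carry earlier dates


def zip_date_time(environ=os.environ, default=(2016, 1, 1, 0, 0, 0)):
    """Return a zipfile-style date_time tuple for generated files."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return default
    # SOURCE_DATE_EPOCH is a count of seconds since the Unix epoch, in UTC.
    date_time = tuple(time.gmtime(int(value))[:6])
    # Tuples compare field by field, so max() clamps to the zip minimum.
    return max(date_time, ZIP_EPOCH)


stamp = zip_date_time({"SOURCE_DATE_EPOCH": "1451606400"})
```

The returned tuple can be passed directly as the `date_time` argument of `zipfile.ZipInfo`.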

Permissions are normalised to either 0o755 or 0o644, i.e. only a single bit is preserved for executable-ness (executability?). This seems to match what popular version control systems store.
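The normalisation can be sketched in a couple of lines. This is an illustration rather than Flit's code: the name `normalize_mode` is made up, and using the owner-execute bit as the deciding bit is an assumption.

```python
import stat


def normalize_mode(mode: int) -> int:
    """Collapse a file mode to 0o755 (executable) or 0o644 (not executable)."""
    # Assumption: the owner's execute bit decides executability.
    return 0o755 if mode & stat.S_IXUSR else 0o644


script_mode = normalize_mode(0o100700)   # private executable -> 0o755
module_mode = normalize_mode(0o100664)   # group-writable file -> 0o644
# In a zip entry the mode is stored in the upper 16 bits of external_attr.
external_attr = (stat.S_IFREG | script_mode) << 16
```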
