How Info-Zip represents symlinks

If you use zip with the -y or --symlinks option, a symlink is stored in the zip file so that stat.S_ISLNK returns True for what Python calls zipinfo.external_attr >> 16. In practice this means the file-type bits of that value are always 0o120000 (octal, i.e. stat.S_IFLNK). The target of the symlink is stored as the contents of that archive member, uncompressed. That’s all.
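A minimal sketch of that layout using Python's zipfile module. It imitates what zip -y produces rather than being taken from Info-Zip. The helpers add_symlink and symlink_target and the libfoo file names are hypothetical, and setting create_system = 3 (Unix) and permission bits 0o777 are assumptions about what unzip tools expect.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Hypothetical helper: store `name` as a symlink to `target`, as zip -y would."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # assumption: mark the entry as Unix so mode bits are honoured
    # The high 16 bits of external_attr carry the Unix st_mode: S_IFLNK | rwxrwxrwx.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    info.compress_type = zipfile.ZIP_STORED  # link target kept uncompressed
    zf.writestr(info, target)  # the member's contents are the target path


def symlink_target(zf: zipfile.ZipFile, info: zipfile.ZipInfo):
    """Hypothetical helper: return the link target, or None for non-symlink members."""
    if stat.S_ISLNK(info.external_attr >> 16):
        # Assumes the stored target bytes are UTF-8.
        return zf.read(info).decode("utf-8")
    return None


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lib/libfoo.so.1.2.3", b"library bytes")
    add_symlink(zf, "lib/libfoo.so", "libfoo.so.1.2.3")

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, "->", symlink_target(zf, info))
```

Running it prints None for the regular file and libfoo.so.1.2.3 for the link, since writestr with a plain name gives regular files mode 0o600.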

Unix timestamps, uid and gid are stored in zip extra fields. The amazingly popular Info-Zip does not use the extra field for symlinks. It’s developed on SourceForge, but this copy is convenient for linking:
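As a rough illustration of what "stored in an extra field" means: ZipInfo.extra is a sequence of records, each a little-endian 2-byte header ID, a 2-byte size, then the data. The sketch below splits those records and reads the modification time from the extended-timestamp field (ID 0x5455, "UT"). The uid/gid live in a separate Info-Zip field (ID 0x7875, "ux") that this sketch does not parse. The helper names are hypothetical.

```python
import struct
import zipfile

EXTENDED_TIMESTAMP = 0x5455  # "UT" extra field: Unix mtime/atime/ctime


def extra_fields(extra: bytes):
    """Yield (header_id, data) for each record in a raw ZipInfo.extra blob."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id, extra[pos + 4:pos + 4 + size]
        pos += 4 + size


def unix_mtime(info: zipfile.ZipInfo):
    """Return the "UT" modification time (seconds since the epoch), or None."""
    for header_id, data in extra_fields(info.extra):
        # Flag bit 0 says an mtime follows, as a little-endian signed 32-bit int.
        if header_id == EXTENDED_TIMESTAMP and len(data) >= 5 and data[0] & 1:
            return struct.unpack_from("<i", data, 1)[0]
    return None


info = zipfile.ZipInfo("example.txt")
info.extra = struct.pack("<HHBi", EXTENDED_TIMESTAMP, 5, 1, 1_700_000_000)
print(unix_mtime(info))  # 1700000000
```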

Some additional context, without any particular point in mind. While it’s easy to add support for archiving symlinks, the problems come when you want to extract the archive. There’s some relevant discussion in bpo-27318. Reading the thread, it doesn’t seem to me that there’s much opposition to the feature, but someone needs to put in the effort to think through the design details and push this through.

A few open questions off the top of my head:

  • How do you tell Windows whether a ZipInfo is a directory?
  • Should the extractor follow symlinks by default?
  • What should the extractor do to an archive containing symlinks if it is instructed to not follow them?
  • What happens if you extract an archive with symlinks on an OS without symlink support?

On Linux it’s really common to package a shared library as a chain of symlinks pointing to the most-versioned copy. In a wheel we get three copies instead, wasting space. (Whether or not the specific code needs those three copies would be a different question.)

It would be great to make platform-specific wheels that include symlinks, for this and other reasons. If you didn’t support symlinks, you would want to either raise an error or make a copy.
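One way an extractor could implement "error or make a copy", sketched under assumptions: extract_member, _resolve and the fallback parameter are hypothetical names, not a zipfile API, and the copy is taken from the archive (following symlink chains inside it) so extraction order does not matter. The sketch does no path-traversal checking, which a real extractor would need.

```python
import io
import os
import posixpath
import shutil
import stat
import tempfile
import zipfile


def _is_symlink(info: zipfile.ZipInfo) -> bool:
    return stat.S_ISLNK(info.external_attr >> 16)


def _resolve(zf: zipfile.ZipFile, info: zipfile.ZipInfo, max_hops: int = 40) -> zipfile.ZipInfo:
    """Follow symlink members inside the archive until a regular member is reached."""
    for _ in range(max_hops):
        if not _is_symlink(info):
            return info
        target = zf.read(info).decode("utf-8")
        name = posixpath.normpath(posixpath.join(posixpath.dirname(info.filename), target))
        info = zf.getinfo(name)  # KeyError if the target is not in the archive
    raise OSError(f"too many levels of symbolic links at {info.filename}")


def extract_member(zf, info, dest, fallback="error"):
    """Hypothetical extractor: recreate symlinks, or on failure raise or copy."""
    if not _is_symlink(info):
        return zf.extract(info, dest)
    # NOTE: no protection against names like "../x"; a real extractor needs that.
    path = os.path.join(dest, *info.filename.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        os.symlink(zf.read(info).decode("utf-8"), path)
    except (OSError, NotImplementedError):
        if fallback != "copy":
            raise
        # No symlink support: write the bytes of the resolved archive member instead.
        with zf.open(_resolve(zf, info)) as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return path


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lib/libfoo.so.1.2.3", b"library bytes")
    link = zipfile.ZipInfo("lib/libfoo.so")
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "libfoo.so.1.2.3")

with zipfile.ZipFile(buf) as zf, tempfile.TemporaryDirectory() as dest:
    for info in zf.infolist():
        extract_member(zf, info, dest, fallback="copy")
    print(sorted(os.listdir(os.path.join(dest, "lib"))))  # ['libfoo.so', 'libfoo.so.1.2.3']
```

On Windows, os.symlink also takes target_is_directory, which this sketch leaves at its default; that is where the question of telling Windows whether a member is a directory comes back.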

A ZipInfo is a directory if its filename ends with /. It should have no contents.
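That rule is simple enough to state as code. The is_directory helper below is hypothetical; ZipInfo.is_dir() (Python 3.6+) is the stdlib method that applies the same name-based check. An extractor could plausibly run it on a symlink's target member to decide what to pass as target_is_directory on Windows.

```python
import zipfile


def is_directory(info: zipfile.ZipInfo) -> bool:
    # The rule described above: a trailing "/" in the member name marks a directory.
    return info.filename.endswith("/")


for name in ["pkg/", "pkg/__init__.py"]:
    info = zipfile.ZipInfo(name)
    # ZipInfo.is_dir() applies the same name-based rule.
    print(name, is_directory(info), info.is_dir())
```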

I agree it would definitely be a good feature. The problem I can see, though, is that once the feature lands, people will start (accidentally or not) putting in symlinks that point to files outside the archive, which will be a big problem. Maybe the archiver should check that each symlink is relative and that its target is also in the archive. That way, if a symlink can’t be created on extraction, the unarchiver can simply fall back to copying the target’s content instead. (And it would be allowed to fail if the archive comes from a different source and contains an unresolvable symlink.)
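The archiver-side check proposed here could look roughly like this. check_symlink is a hypothetical helper, and it assumes the archiver knows the full set of member names up front. Links to other links pass as long as the next hop is in the archive, so an extractor still needs a hop limit to guard against cycles.

```python
import posixpath


def check_symlink(link_name: str, target: str, members: set[str]) -> str:
    """Return the archive member a symlink points to, or raise ValueError.

    link_name: archive name of the symlink, e.g. "lib/libfoo.so"
    target: the raw link text that will be stored as the member's contents
    members: every member name that will be in the archive
    """
    if posixpath.isabs(target):
        raise ValueError(f"{link_name}: absolute target {target!r}")
    # Resolve relative to the link's own directory, using archive (POSIX) paths.
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(link_name), target))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"{link_name}: target {target!r} escapes the archive")
    if resolved not in members and resolved + "/" not in members:
        raise ValueError(f"{link_name}: target {target!r} is not in the archive")
    return resolved


members = {"lib/libfoo.so.1.2.3", "lib/libfoo.so.1", "lib/libfoo.so"}
print(check_symlink("lib/libfoo.so", "libfoo.so.1", members))  # lib/libfoo.so.1
```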

Whether or not that’s more dangerous than executing the code that’s in the wheel,

We’d probably restrict the links to being relative and within the same category.
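A sketch of that restriction, assuming "category" means the install scheme a wheel file lands in: files under <name>-<version>.data/<category>/ (purelib, platlib, headers, scripts, data) go to that scheme, and everything else goes to the wheel root. One plausible reason for the rule is that each category is installed to a different directory, so a relative link crossing categories would not resolve after installation. The helper names and the regex are hypothetical.

```python
import posixpath
import re

# Assumption: "<name>-<version>.data/<category>/..." marks a non-root install scheme.
_DATA_DIR = re.compile(r"^[^/]+\.data/([^/]+)/")


def category(member: str) -> str:
    """Return the wheel install category of an archive member name."""
    m = _DATA_DIR.match(member)
    return m.group(1) if m else "root"


def same_category_link(link_name: str, target: str) -> bool:
    """True if the link is relative and its target stays in the link's category."""
    if posixpath.isabs(target):
        return False
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(link_name), target))
    if resolved == ".." or resolved.startswith("../"):
        return False
    return category(resolved) == category(link_name)


print(same_category_link("foo-1.0.data/data/lib/libfoo.so", "libfoo.so.1"))  # True
```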