How Info-Zip represents symlinks

dholth · May 5, 2020, 5:01pm

If you use zip with the -y or --symlinks option, a symlink is stored in the zip file so that stat.S_ISLNK returns True on what Python’s zipfile.py calls zipinfo.external_attr >> 16. In practice this means the file-type bits of that mode are 0o120000 (octal, i.e. S_IFLNK). The target of the symlink is stored as the contents of that archive member. It is not compressed. That’s all.
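As a rough illustration of that layout, here is a minimal Python sketch that writes and then detects such an entry using only the standard zipfile and stat modules. The helper names write_symlink, is_symlink and demo are made up for this example, and the 0o777 permission bits are an assumption rather than something Info-Zip is documented to use.

```python
import io
import stat
import zipfile


def write_symlink(zf, name, target):
    """Store `target` as the uncompressed contents of a symlink member."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 means Unix, so the high 16 bits hold st_mode
    # Assumption: 0o777 permissions; only the S_IFLNK type bits matter here.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    info.compress_type = zipfile.ZIP_STORED  # the link target is not compressed
    zf.writestr(info, target)


def is_symlink(info):
    """True if the Unix mode in the high 16 bits marks a symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)


def demo():
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("liba.so.1.0", b"library bytes")
        write_symlink(zf, "liba.so", "liba.so.1.0")
    with zipfile.ZipFile(buf) as zf:
        # Map each member name to (is it a symlink?, raw member contents).
        return {i.filename: (is_symlink(i), zf.read(i)) for i in zf.infolist()}
```

Reading the symlink member back returns the link target as bytes, which is all an extractor needs in order to recreate the link.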

Some Unix timestamps, the uid, and the gid are stored in a zip extra field. The extra field is not used for symlinks in the amazingly popular Info-Zip. It’s developed on SourceForge but this copy is convenient for linking:

https://github.com/LuaDist/zip/blob/master/unix/unix.c#L382


    /* Accept about any file kind including directories
     * (stored with trailing / with -r option)
     */
    free(name);
    return 0;
  }
  free(name);


  if (a != NULL) {
#ifndef OS390
    *a = ((ulg)s.st_mode << 16) | !(s.st_mode & S_IWRITE);
#else
/*
**  The following defines are copied from the unizip source and represent the
**  legacy Unix mode flags.  These fixed bit masks are no longer required
**  by XOPEN standards - the S_IS### macros being the new recommended method.
**  The approach here of setting the legacy flags by testing the macros should
**  work under any _XOPEN_SOURCE environment (and will just rebuild the same bit
**  mask), but is required if the legacy bit flags differ from legacy Unix.
*/
#define UNX_IFDIR      0040000     /* Unix directory */

https://github.com/LuaDist/zip/blob/master/zipup.c#L864


        zipwarn("-ll used on binary file - corrupted?", "");
    }
#endif
  }
  else
  {
    if ((b = malloc(SBSZ)) == NULL)
       return ZE_MEM;


    if (l) {
      k = rdsymlnk(z->name, b, SBSZ);
/*
 * compute crc first because zfwrite will alter the buffer b points to !!
 */
      crc = crc32(crc, (uch *) b, k);
      if (zfwrite(b, 1, k) != k)
      {
        free((zvoid *)b);
        return ZE_TEMP;
      }
      isize = k;

uranusjr · May 5, 2020, 8:19pm

Some additional context without any particular point in mind. While it’s easy to add support to compress symlinks, the problems come when you want to extract the archive. There’s some relevant discussion in bpo-27318. Reading the thread, it does not seem to me there’s much opposition to the feature, but someone needs to put in the effort thinking through the design details to push this through.

A few open questions off the top of my head:

How do you tell Windows whether a ZipInfo is a directory?
Should the extractor follow symlinks by default?
What should the extractor do to an archive containing symlinks if it is instructed to not follow them?
What happens if you extract an archive with symlinks on an OS without symlink support?

dholth · May 5, 2020, 8:38pm

On Linux it’s really common to package a shared library as liba.so -> liba.so.1 -> liba.so.1.0, in other words a chain of symlinks to the most-versioned copy. In a wheel we get three copies, wasting space. (Whether or not the specific code needs those three copies would be a different question.)

It would be great to make platform-specific wheels that included symlinks for this and other reasons. If you didn’t support symlinks you would want to either error or make a copy.
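The "make a copy" fallback could look something like the sketch below. It is only an illustration: link_or_copy is a hypothetical helper, not part of wheel or pip, and it assumes the link target has already been extracted to disk.

```python
import os
import shutil


def link_or_copy(target_path, link_path):
    """Create link_path as a relative symlink to target_path, or copy it.

    Returns "symlink" or "copy" so the caller can tell which path was taken.
    """
    # Express the target relative to the directory that will hold the link.
    relative_target = os.path.relpath(target_path, os.path.dirname(link_path))
    try:
        os.symlink(relative_target, link_path)
        return "symlink"
    except (OSError, NotImplementedError):
        # No symlink support or no privilege to create one: duplicate the file.
        shutil.copyfile(target_path, link_path)
        return "copy"
```

An installer that prefers to error out rather than copy would re-raise in the except branch instead.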

A ZipInfo is a directory if it ends with /. It should have no contents. https://github.com/LuaDist/zip/blob/master/zipup.c#L428
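In Python that convention can be checked with a couple of lines. This is a sketch: is_directory and has_no_contents are names invented here, and ZipInfo.is_dir() from the standard library is the ready-made equivalent of the first one.

```python
import zipfile


def is_directory(info):
    """Directory members are recognised by a trailing slash in the name."""
    return info.filename.endswith("/")


def has_no_contents(info):
    """A well-formed directory member carries zero bytes of data."""
    return info.file_size == 0
```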

uranusjr · May 5, 2020, 9:05pm

I agree it would definitely be a good feature. The problem, from what I can see, is that once the feature lands, people will start to (accidentally or not) put in symlinks pointing to files outside of the archive, which will be a big problem. Maybe the archiver should check whether a symlink is relative and the target is also in the archive. This way, if a symlink doesn’t work on extraction, the unarchiver can simply resort to copying the target’s content instead. (And it can be allowed to crash if the archive comes from a different source and has an unresolvable symlink.)
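A check along those lines might look like the following sketch. resolve_link is a hypothetical helper and names is assumed to be the set of member names in the archive. It only resolves a single hop, so a chain of links would need the check repeated for each member, and it does not account for intermediate directories that are themselves symlinks.

```python
import posixpath


def resolve_link(member, target, names):
    """Return the archive member a symlink points to, or None if rejected.

    member: archive name of the symlink entry
    target: link target read from the entry's contents
    names:  set of all member names in the archive
    """
    if posixpath.isabs(target):
        return None  # absolute targets always point outside the archive
    # Interpret the target relative to the directory containing the link.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(member), target)
    )
    if resolved == ".." or resolved.startswith("../"):
        return None  # escapes the archive root
    return resolved if resolved in names else None
```

An unarchiver that gets a member name back can either create the link or fall back to copying that member's contents.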

dholth · May 5, 2020, 9:16pm

Whether or not that’s more dangerous than executing the code that’s in the wheel, we’d probably restrict the links to being relative and within the same category.