Symbolic links in wheels

Hi,

Is there any plan to support symbolic links in wheels? Context: we package rather large .so files and those .so files are also versioned (e.g. libarrow.so symlinks to libarrow.so.14.0.0). Currently, those symlinks become copies, which doubles the file size (ending up at 50+MB). Ideally we would like to keep the symlinks inside those wheels.

IIRC there have been multiple discussions scattered around multiple venues (GitHub, mailing lists, and maybe here?). My understanding of this issue is:

  • Direct support is unlikely since Windows has poor symlink support.
  • The most significant technical blocker is not the packaging ecosystem but Python’s zipfile module, which cannot create symlinks. So unless pip or another package installer implements or vendors its own zip implementation, wheels with symlinks cannot be correctly installed anyway.

AFAICT it’s pretty straightforward: os.stat(filename, follow_symlinks=False).st_mode goes into the high 16 bits of ZipInfo.external_attr, which is where the “is a symlink” flag is stored. The link target just goes where the file contents would otherwise go.
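
A minimal sketch of that idea, using only the standard zipfile and stat modules. The helper names add_symlink and is_symlink are made up for illustration; they are not part of zipfile, wheel, or pip.

```python
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, arcname: str, target: str) -> None:
    """Store `arcname` in the archive as a symlink pointing at `target`."""
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3  # 3 = Unix, so readers know the mode bits are Unix ones
    # The Unix st_mode lives in the high 16 bits of external_attr.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    # The link target is stored where the file contents would normally go.
    zf.writestr(info, target)


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Return True if the entry's stored Unix mode marks it as a symlink."""
    return info.create_system == 3 and stat.S_ISLNK(info.external_attr >> 16)
```

Note this only covers writing and detecting the entry. Plain ZipFile.extract() does not act on the flag, so an installer would still have to read the target and call os.symlink() itself.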

I had happily subclassed ZipFile in wheel to check / generate file hashes… zipfile is a very nice module.

I notice the zip on the machine I’m using also stores mtime/atime/ctime and uid/gid in the “extra” field but zipfile.py doesn’t currently interpret those.

See also

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

If there’s a serious need for this: I have been contemplating writing a sans-I/O zip reader, which would make it easier to share baseline zip code between zipfile, importlib (for zipimport), etc. I would design it to have as few dependencies as possible, since it would need to be frozen with importlib.

Here is the most recent discussion of this issue in the pip tracker for those who are interested (issue #5919: “Symlink (and other) handling of archives”): https://github.com/pypa/pip/issues/5919

The whole scheme of symlinks like libarrow.so → libarrow.so.14.0.0 is tightly coupled to how the Linux system linker searches for libraries. Wheels have a different and incompatible way of handling library searching. Including libraries like this inside a wheel is a dubious thing to do… I can see how it might solve some problems in the short term but in the long term I think you’ll hit unsolvable problems. For example, if the user also has a system copy of libarrow.so, and you’re relying on the linker recognizing well-known names like this, then your package might end up using either the system’s copy or the wheel’s copy basically at random, which sounds like a recipe for obscure segfaults.

IMO if you want to ship shared libraries in a wheel, then you should take the search problem seriously, and not rely on the linker’s naming scheme for system libraries. Auditwheel gives each vendored library a unique mangled name, which works well for that use case. From your post, I assume you also want the library to be usable by other packages. In that case, the best approach I’ve been able to come up with on Linux is:

  • Give your library a unique name that designates a specific ABI as shipping inside a Python wheel, like libarrow-wheel-14.so or similar.
  • Provide a Python API that lets other packages query for build time configuration (linker flags, include dir, etc.), as well as the resulting wheel dependency (maybe if the third-party package is built against pyarrow 14.2.3, then that means its Install-Requires should include pyarrow >= 14.2).
  • Provide a Python API that lets other packages request the library be available at runtime, doing whatever linker finagling is necessary to make that work. (On Linux, the simplest thing is to just dlopen("path/to/libarrow-wheel-14.so"); then any future requests for that shared library will be automatically satisfied without going through the normal library search.)
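
As a rough sketch of the last bullet, assuming the wheel ships the library next to the package's Python files: the name libarrow-wheel-14.so and both helper functions are hypothetical, and ctypes.CDLL is just the stdlib way to reach dlopen() on Linux.

```python
import ctypes
from pathlib import Path

# Hypothetical file name, following the naming idea in the list above.
BUNDLED_LIB_NAME = "libarrow-wheel-14.so"


def bundled_library_path(package_dir: Path) -> Path:
    """Return where the wheel is assumed to ship its shared library."""
    return package_dir / BUNDLED_LIB_NAME


def ensure_loaded(package_dir: Path) -> ctypes.CDLL:
    """dlopen() the bundled library by explicit path, bypassing name search."""
    path = bundled_library_path(package_dir).resolve()
    # RTLD_GLOBAL makes the library's symbols visible to modules loaded later.
    return ctypes.CDLL(str(path), mode=ctypes.RTLD_GLOBAL)
```

A dependent package would call something like ensure_loaded(Path(pyarrow.__file__).parent) before importing its own extension module. Keep the returned handle alive for as long as the library is needed.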

It wouldn’t be that difficult. The regular one isn’t too welded to the filesystem: it works fine with any seekable file-like object, and without too much trouble you could get it fetching range requests over the network so that it only loads the parts of the zip being extracted. How would a sans-I/O module differ?
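
To illustrate the "any seekable file-like" point, here is a small sketch that never touches the filesystem. io.BytesIO stands in for the file-like object; a network-backed version would presumably need to provide the same read/seek/tell methods on top of range requests, which is not shown here.

```python
import io
import zipfile

# Build a small archive entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pkg/__init__.py", "VERSION = '1.0'\n")
    zf.writestr("pkg/data.txt", "hello")

# Reopen the same bytes through a fresh file-like object. ZipFile seeks to the
# central directory at the end, then to each member it is asked for.
reader = io.BytesIO(buffer.getvalue())
with zipfile.ZipFile(reader) as zf:
    names = zf.namelist()
    data = zf.read("pkg/data.txt")
```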

From my POV no dependencies, but that’s more implementation detail.

I would be very disappointed if wheels became incompatible with standard zip tools.

At most, I’d want to see it implemented as metadata, along with a requirement for a direct email contact that frontends can display to users when packages use non-portable features like this :smiling_imp:

In reality, I’d rather the shared native dependency problem be solved properly. But as long as we rely on package developers to build their own wheels, and don’t develop a culture where it’s okay for organizations to provide their own (compatible) builds on an alternate index (e.g. a per-OS index maintained by people who do nothing but maintain the builds for that OS), we’re going to be constantly searching for the next workaround like this one.

Symlinks in zips are standard, but they aren’t implemented in zipfile.py yet.

As Chris mentioned in the linked GitHub issue, the ZIP spec says nothing about symlinks; they just happen to work in certain implementations, on certain platforms. You could consider the zlib implementation the standard (whether that is valid is another question), but even that does not work on all platforms (IIUC zlib extracts symlinks as zero-size files on Windows).

Even in the scenario that zipfile.py (or pip’s own zip implementation) adds symlink support, including one in a wheel automatically makes the wheel non-portable. That might be okay for some (most?) people, but never all. I guess I am personally okay if symlinks are allowed in specifically-picked situations (say platform-specific wheels like manylinux and macosx), but would be quite unhappy if they are allowed for all wheels. This would be another potential trap for cross-platform package maintainers, and for users wasting time figuring out why a tool fails on them.

This is indeed the use case that is talked about here. See first post in discussion…

See https://pypi.org/project/zipfile2/ by the venerable David Cournapeau.

If you’re very worried, just put symlinks behind a flag on the build side. Then cross-platform wheels that have otherwise avoided Unix API calls or other incompatibilities will not accidentally include a symlink.
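
A sketch of what such a build-side flag could look like. Both function names and the allow_symlinks parameter are invented for illustration; no existing build backend is claimed to work this way.

```python
import os
from pathlib import Path


def find_symlinks(root: Path) -> list[Path]:
    """Return every symlink (to a file or a directory) under `root`."""
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in `dirnames`, so both lists are checked.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidate = Path(dirpath) / name
            if candidate.is_symlink():
                found.append(candidate)
    return sorted(found)


def check_build_tree(root: Path, allow_symlinks: bool = False) -> list[Path]:
    """Refuse to build from a tree containing symlinks unless opted in."""
    links = find_symlinks(root)
    if links and not allow_symlinks:
        listing = ", ".join(str(link) for link in links)
        raise ValueError(f"symlinks found but not enabled: {listing}")
    return links
```

With the default allow_symlinks=False, a cross-platform project gets an error instead of silently shipping a link; a platform-specific build opts in explicitly.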

Right now we have simply reverted to copying all libraries (which means larger wheels) rather than symlinking them.