PEP 778: Supporting Symlinks in Wheels

I think I agree with @njs that symlinks are not actually needed for shared libraries on UNIX. And if that’s true, I think it would be better not to use this functionality, so that wheels are installable by existing tools that only handle wheel 1.0. I’m worried that if we require a new version of install tools, this will be a repeat of the time that people were still running manylinux-unaware versions of pip and were needlessly building things from source and not realizing they could and should upgrade pip. (To be clear, I still think this PEP is worthwhile even if it’s not used for shared libraries, see below.)

There are two cases where symlinks show up for shared libraries on UNIX. The first is for runtime use, e.g., libxml2.so.2 -> libxml2.so.2.9.1. I’m not aware of anything that actually uses the full name, other than to tell a human which version is installed. So I think I agree that packaging tools for Python wheels should just get rid of the libxml2.so.2.9.1 name and ship libxml2.so.2 as a real file. Which, I agree, people shouldn’t have to do by hand.

But we already ask people to run auditwheel on wheels that contain dynamic libraries to make them work right, where “work right” is a little complicated to explain and very complicated to do by hand, but the users of auditwheel don’t have to care exactly what that process involves. It would be straightforward to extend auditwheel to do the replacement of symlinks, and it can be done in a deterministic programmatic way without asking people to do anything fragile. (Actually, what does auditwheel do currently? Surely this has come up before.)
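To make "replace the symlink programmatically" concrete, here is a minimal sketch in Python. It is not how auditwheel works today (I haven't checked), and `flatten_runtime_symlink` is a name I made up for illustration. It assumes the link and its target live in the same directory, and that nothing else refers to the long name; any other symlink pointing at the old name would be left dangling.

```python
import os
from pathlib import Path


def flatten_runtime_symlink(link: Path) -> Path:
    """Turn a symlink such as libxml2.so.2 -> libxml2.so.2.9.1 into a
    regular file named libxml2.so.2, dropping the longer name.

    Hypothetical helper for illustration only.
    """
    if not link.is_symlink():
        raise ValueError(f"{link} is not a symlink")
    # Follow the link (and any chain of links) to the real file.
    target = link.resolve(strict=True)
    # This sketch only handles a target that sits next to the link.
    if target.parent != link.parent.resolve() or not target.is_file():
        raise ValueError("only same-directory regular-file targets are handled")
    link.unlink()             # remove the symlink itself
    os.replace(target, link)  # rename the real file to the short name
    return link
```

A real tool would also want to confirm that the short name matches the library's SONAME before renaming, which this sketch does not do.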

The other case is the development symlink, in this case, libxml2.so -> libxml2.so.2 so that cc -lxml2 works right. But there are ways to do this without a symlink (and without a copy). For the GCC ecosystem, this can be done with a linker script. Create libxml2.so as a plain text file containing INPUT(libxml2.so.2), and the linker will process that. (At a former job I actually had to do this for a distribution system that, like wheel, didn’t support symlinks, so I can attest that this does work in production.)

Example of using a linker script instead of a symlink
$ export LD_LIBRARY_PATH=$PWD LIBRARY_PATH=$PWD
$ cat lib.c
int x(void) {
    return 42;
}
$ cat main.c
#include <stdio.h>
int x(void);
int main() {
    printf("x = %d\n", x());
}
$ cc -fPIC -shared -o libx.so.1 lib.c
$ cc -o main main.c -lx
/usr/bin/ld: cannot find -lx
collect2: error: ld returned 1 exit status
$ ln -s libx.so.1 libx.so
$ cc -o main main.c -lx
$ ./main
x = 42
$ rm libx.so
$ echo 'INPUT(libx.so.1)' > libx.so
$ cc -o main main.c -lx
$ ./main
x = 42

On current macOS you can do this with tapi stubify, which creates a small plaintext .tbd file that has the same effect. (I imagine that most platforms are going to have some equivalent. Linux and macOS are the only UNIX targets that pypi.org currently allows uploads for, but I’m happy to dig into how to do this on some other platform that anyone feels is important.) And again this should all be wrapped up in a tool so packagers don’t have to think about how any of this is implemented; the experience should be that they start with some existing UNIX-conventional directory layout (including an unpacked MKL, CUDA, etc. binary archive) and end up with a wheel that does the right thing.

It’s also worth noting that the development symlink case is only needed for when you’re using a wheel as a build dependency for native code, which is a little bit of an unusual case. (In particular your development package will want to ship include files, debug symbols, etc. that most of your users won’t need.) From within Python code, one of the modes of cffi would make use of this, but probably you should be precompiling your cffi code anyway.


But there were a handful more use cases identified in this discussion thread:

  1. Executables that can be invoked by multiple names, e.g. the pkg-config -> pkgconf example above, or ex -> vim. These are used where people expect to call the binary by the other name, and sometimes when the different names have different behaviors. An important special case of this is python3 -> python3.12 for the Python interpreter itself, which is one of two reasons the pybi format needs to deal with symlinks. Note this is only really needed for things that aren’t Python code (C, shell, etc.); if it’s a real entry point you can just create another entry point with a different name.
  2. macOS frameworks. This is the other use case identified in the pybi spec (PEP 711): “symlinks are required to store macOS framework builds in .pybi files. So […] we absolutely have to support symlinks in .pybi files for them to be useful at all.”
  3. Providing compatibility for Python code itself, the namespace/foo -> bar/ example. There’s an argument that this is better off explicitly disallowed, and I think I agree.
  4. Representing editable installs as normal wheels, which is currently out of scope for the PEP except to keep it in mind as a future goal.

(Did I miss any?)

Use case 1 looks very common and is clearly sensible cross-platform. But it’s also a use case that can be specified in a very narrow and precise way: we define something akin to entry points that says, when you install this wheel, make this symlink in the scripts directory (which is bin/ on UNIX) to this other file that is also in the scripts directory. By narrowing the problem to symlinks to files in the same directory, we avoid a host of security questions and we also get a precise cross-platform implementation. On Windows, this can be hard links; on any platform, this can be a simple file copy, at the cost of some disk space. So this is something that can have a straightforward cross-platform spec.
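As a rough illustration of what an installer could do for such an alias, here is a sketch in Python. Nothing here is specified anywhere: `make_alias`, its arguments, and its return values are hypothetical. It assumes both names are bare file names inside the same scripts directory, which is the restriction that sidesteps the path-traversal questions.

```python
import os
import shutil
from pathlib import Path


def make_alias(scripts_dir: Path, alias: str, target: str) -> str:
    """Create `alias` next to `target` in scripts_dir.

    Tries a relative symlink, then a hard link, then a plain copy.
    Returns which method was used. Hypothetical helper, not a spec.
    """
    for name in (alias, target):
        # Reject anything that is not a bare file name (no separators, no "..").
        if name in ("", ".", "..") or Path(name).name != name:
            raise ValueError(f"not a bare file name: {name!r}")
    src = scripts_dir / target
    dst = scripts_dir / alias
    if not src.is_file():
        raise FileNotFoundError(src)
    if os.path.lexists(dst):
        raise FileExistsError(dst)
    try:
        os.symlink(target, dst)  # relative link, stays inside scripts_dir
        return "symlink"
    except (OSError, NotImplementedError):
        pass
    try:
        os.link(src, dst)        # hard link, e.g. where symlinks need privileges
        return "hardlink"
    except (OSError, NotImplementedError):
        pass
    shutil.copy2(src, dst)       # last resort: costs disk space, always works
    return "copy"
```

For example, `make_alias(Path("bin"), "pkg-config", "pkgconf")` would cover the pkg-config case; which of the three methods gets used depends on the platform and filesystem.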

Use case 2 is Mac-specific. I think it would be great to encode something into the wheel format (in wheel 2.0 if needed) so that pybi is not a separate format. But we don’t have to answer cross-platform compatibility questions there, because these wheels will not be installed on other systems. And there are also some fairly tight constraints around what frameworks need, e.g., they only point within their own directory. (I also suspect that the symlinks can be flattened out the way that the libxml2.so.2 case can be flattened out, but I haven’t tried it.)

Use case 3 is questionable and probably worth discouraging, and use case 4 is inherently pointing to things outside the Python environment and is unsuitable for wheels that anyone is distributing. Maybe one answer is to spec them out technically, probably with the same means as use case 2, but ban them on PyPI for now and permit installers to not implement this functionality.


tl;dr, I suggest that we split this into three streams:

  • Get auditwheel and friends to fix up UNIX libraries without needing symlinks and quietly Do The Right Thing for wheel 1.0.
  • Add a thing analogous to entry points to allow aliases for binaries, which can be implemented by symlinks, hard links, or copies, whichever is most convenient.
  • Continue to spec out symlinks in wheels to unify the pybi spec with wheels. Have PyPI accept them only for macOS platform tags and only where needed for frameworks, and only require pip etc. to implement the unpacking on macOS where symlinks are guaranteed.

On that last note I do like the approach and rationale in PEP 711 for handling symlinks. In short, starting from the fact that normal files are listed both in the zip file and in RECORD, it stores symlinks in the zip file so that UNIX unzip tools work, and it also stores them in RECORD so that non-UNIX systems have a convenient option.
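For anyone who hasn't looked at how UNIX zip tools store symlinks, here is a small Python sketch of the zip side only. It relies on the Info-ZIP convention that the upper 16 bits of an entry's external attributes hold the Unix mode and that a symlink entry's data is the link target. The helper names `add_symlink` and `list_symlinks` are mine, and the sketch deliberately leaves out the RECORD side, since I don't want to misstate the exact syntax PEP 711 uses there.

```python
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Write a symlink entry the way UNIX zip tools do (hypothetical helper)."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so external_attr is read as a Unix mode
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)  # the entry's data is the link target


def list_symlinks(zf: zipfile.ZipFile) -> dict:
    """Return {entry name: link target} for every symlink entry in the archive."""
    links = {}
    for info in zf.infolist():
        mode = info.external_attr >> 16
        if info.create_system == 3 and stat.S_ISLNK(mode):
            links[info.filename] = zf.read(info).decode("utf-8")
    return links
```

An installer on a platform without symlinks can use a check like `list_symlinks` to spot these entries and decide what to do with them, rather than extracting the target path as file content.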
