PEP 778: Supporting Symlinks in Wheels

Rendered link: PEP 778 – Supporting Symlinks in Wheels | peps.python.org

This is an attempt at adding support for symlinks to wheels. The abstract summarizes it well:

Wheels currently do not handle symlinks well, copying content instead of making symlinks when installed. To properly handle distributing libraries in wheels, we propose a new LINKS metadata file to handle symlinks in a platform portable manner. This specification requires a new wheel major version, discussed in PEP 777.

Please note that I have not yet drafted PEP 777 but the goal of that PEP is to discuss the process and group of changes to adopt in a “Wheel 2.0”. @msarahan is working on another PEP which would require a wheel major version bump.

Discussion/Open questions:

PEP 660 and Deferring Editable Installation Support

This PEP leaves the specification and implementation of a PEP 660 editable installation mechanism unresolved, to be handled by a later PEP. Should we specify that here?

Security

This PEP needs to be reviewed to make sure it would not allow for new security vulnerabilities.

Are there other restrictions we should place on the source or target of symlinks to protect users?

Allow inter-package symlinks

This could be useful for projects that want to shard dependencies such as large libraries between wheels but make them available in the main parent wheel.

The Format of LINKS

Currently the format is derived from RECORD, but perhaps a better format exists.
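
To make the question concrete, here is a minimal sketch of what reading a RECORD-style file could look like. It assumes a hypothetical two-column CSV layout (link path, then target path); the PEP's actual format may differ, and `parse_links` and the sample paths are made up for illustration.

```python
import csv
import io


def parse_links(text):
    """Parse a hypothetical LINKS file: one 'link_path,target_path' row per line."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}: {row!r}")
        link_path, target_path = row
        rows.append((link_path, target_path))
    return rows


# Illustrative content only; these paths do not come from any real wheel.
sample = "pkg/lib/libfoo.so,pkg/lib/libfoo.so.1\n"
print(parse_links(sample))
```

Reusing the `csv` module mirrors how RECORD is usually read, which is the main appeal of deriving the format from it.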

I haven’t looked in detail at the PEP yet, but one immediate question I have is regarding the comment “By using a LINKS file, installers will be able to potentially use other methods of handlings symlinks”.

How closely does an alternative have to match symlink behaviour? You mention junctions on Windows - they don’t behave exactly like symlinks, but presumably you view them as “close enough”. Would it be acceptable for an installer to use hard links?

Also, I’m +1 on the constraint that links must point to files that are installed by the wheel, but I will note that this constraint makes the feature useless for implementing editable installs (which require installing links to the source tree).
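
A rough sketch of how an installer might enforce that constraint is below. It assumes link targets are written relative to the link's own directory and that all paths are wheel-relative POSIX paths; both are assumptions for illustration, and `link_target_is_installed` is a hypothetical helper, not something the PEP defines.

```python
import posixpath


def link_target_is_installed(link_path, target, installed_files):
    """Return True if `target`, resolved relative to the directory of
    `link_path`, names a file in `installed_files` (wheel-relative paths)."""
    if posixpath.isabs(target):
        return False
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(link_path), target)
    )
    # normpath keeps leading ".." components when the path escapes the root.
    if resolved == ".." or resolved.startswith("../"):
        return False
    return resolved in installed_files


installed = {"pkg/lib/libfoo.so.1", "pkg/__init__.py"}
print(link_target_is_installed("pkg/lib/libfoo.so", "libfoo.so.1", installed))
print(link_target_is_installed("pkg/lib/libfoo.so", "../../../etc/passwd", installed))
```

A check like this rejects any target in a source tree, which is exactly why the constraint rules out editable installs.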

Are the other PEPs likely to introduce new metadata files as well? Should we look at a more extensible format to keep “used at install/runtime metadata” together (especially if we’re calling all this “Wheel 2.0”)?

Hard links would be the preferred alternative for Windows, but my understanding is that the feature would have very little applicability there because there’s no convention of versioning with symlinks (because you’d just version by having a private copy of the binary, which is the situation we’ve already got for Python environments).

I don’t see any rationale in the PEP that would justify anything other than “on Windows, these will be hard links”. Perhaps it exists? Junctions sure aren’t going to be an alternative.

My goal there is to allow flexibility in what is supported. If a wheel only has symlinks between directories, then just using a junction is acceptable. In other words, I don’t think it has to exactly match a symlink; it just needs to be sufficient to handle what is in the wheel.

I think that a future PEP to define how editable installs would work with symlinks could weaken this requirement in installers, but I didn’t want to deal with that in this PEP.

Agreed. The only use case provided is for Unix shared library versioning. Are there any other use cases? If not, maybe we should just say installation must fail if symlinks aren’t available. It might even be reasonable to restrict the PEP to solely “supporting Unix shared library aliases in wheels” (which would still need symlinks, but only in a very restricted and therefore safe context).

I’d weaken requirements for editable installs, make those the special ones (and if you want to do an editable install via a wheel, flag it some way that can’t go to PyPI, such as a custom platform tag).

Edit - Basically say “editable installs can do whatever they want because this PEP is about distribution wheels”.

I don’t want to include a solution for editable installs in this PEP, I merely wish to not make that PEP have to make significant changes in the future.

For background, I am hoping to write in PEP 777 a “Wheel 2.0” PEP that discusses the process to adopt a wheel 2.0, and list changes to include (referencing other PEPs such as PEP 778). One thing that could be added is a new link-based solution for editable installs, but I do not wish to write that PEP.

I left the door open to Windows symlinks solely for the potential for a future PEP to use them for editable installs. I’m not opposed to removing support for extracting symlinks on Windows entirely, but it seems odd that Windows can have symlinks (on some systems) and users wouldn’t be able to materialize them on their system.

I do have an additional backwards compatibility argument, which is that if I change namespace.foo to namespace.bar, I’d want to put a symlink from namespace.foo pointing to namespace.bar.

The motivation section doesn’t make sense to me… as far as I know, using symlinks for library versions is only a thing on Linux for system-wide libraries managed via ldconfig. I don’t think there’s any reason to have them in wheels at all, except for people getting confused?

Note also that PEP 711 has a proposal for symlinks inside files that are almost-but-not-quite wheels, so if we do end up adding symlinks to wheels we’ll want to harmonize the two proposals.

It seems odd regardless, since nothing in Windows relies on symlinks other than OS compatibility shims. We certainly don’t have a good reason to encourage it, but I wouldn’t want someone’s wheel to just break, so there should be some reasonable enough behaviour.

This is likely more problematic than just adding namespace/foo.py with from .bar import *.
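
For illustration, the shim approach can be tried out with a throwaway package. The sketch below writes a temporary `namespace` package to disk (the module contents and the `greet` function are made up) and shows that the old import path keeps working.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: namespace/bar.py is the real module,
# namespace/foo.py is a one-line compatibility shim.
root = Path(tempfile.mkdtemp())
pkg = root / "namespace"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "bar.py").write_text("def greet():\n    return 'hello from bar'\n")
(pkg / "foo.py").write_text("from .bar import *\n")

sys.path.insert(0, str(root))
foo = importlib.import_module("namespace.foo")
bar = importlib.import_module("namespace.bar")

# The old name re-exports the same function object as the new module.
print(foo.greet is bar.greet)
```

One caveat with this pattern: a star import skips names that start with an underscore unless the module lists them in `__all__`.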

Agreed, which makes it feel like this proposal could be an attractive nuisance, encouraging people to use symlinks for situations where they aren’t the best solution. In particular, Unix developers who are used to using symlinks freely, might well reach for them too eagerly, without thinking about the compatibility issues for Windows environments.

I can’t comment on the point @njs made about Unix shared library aliases not actually being necessary for wheels, but apart from that possible use case, are there any other concrete use cases for this feature? I don’t mean “maybe it would be helpful for X”, I mean situations where projects currently have to implement suboptimal workarounds, or omit useful functionality, due to the lack of symlink support in wheels.

I know that there’s been a lot of interest in supporting symlinks in wheels for a long time now, but one of the key jobs of a PEP has to be to collect together the motivating use cases behind that interest, and present them as a justification for adding the feature.

Thank you for working on this @emmatyping, this is great to see.

There are other use cases. One example that immediately comes to mind is executables. It’s useful to install those as symlinks if you for example have a project that is an alternative to some other project. Packaging-related examples that come to mind: Samurai is an alternative implementation of Ninja, Muon of Meson, pkgconf of pkg-config. pkgconf has a 99.x% compatible interface to pkg-config and improvements over it. It actually recommends in its README that if a distro decides to use it as the preferred implementation, to do so via a symlink: https://github.com/pkgconf/pkgconf#pkg-config-symlink.

I don’t think this is true. Build systems can and do produce symlinks. I don’t have a build at hand to verify the exact details right now, but if you build OpenBLAS for example you get the actual shared library as libopenblas_skylakex_xxxx.so and a symlink libopenblas.so.

MKL also contains symlinks, and them being omitted from the wheels at mkl · PyPI has caused issues trying to use those wheels, since build support files (CMake, pkg-config) are broken (they expect the original symlinks still). Here is an example of installing that wheel and manually putting back the symlinks to make a build succeed: numpy/.github/workflows/linux_blas.yml at ea02e846d58bd157af8fbb3e89d2fddbb897157d · numpy/numpy · GitHub

Further examples for a number of projects are linked at Other issues - pypackaging-native (this link is already in the PEP).

It’s also not a Linux-only issue, some searching will find examples like Support for symlinks in wheel files · Issue #203 · pypa/wheel · GitHub that are for macOS.

Thank you for the feedback on the motivation. I was trying to reference more in-depth discussion of issues, but perhaps I need to give some better examples of projects having issues.

So yes, Ralf is correct that it is very common on Unix (which I use very intentionally in the PEP) to use symlinks for shared libraries. Also within a namespace at NVIDIA we currently set RPATH to search for neighbor wheels, so that we can ship e.g. cublas and cudnn (which depends on cublas) independently. With a modified RPATH, we must name the shared libraries e.g. libcudnn.9, whereas projects that would like to link against libcudnn at build time would like a file libcudnn.so, but we cannot include that without a copy.

I do not mind entirely dropping Windows from the PEP if we think that there isn’t a motivating use case on that platform, but I do think that it might provide some useful tools that I am unaware of as Ralf describes above.

From what @steve.dower said, it sounds like I should update the PEP to suggest extracting hard links on Windows unconditionally. Steve, is that what you were suggesting? Should we never use symlinks even if they would work? Also I have heard that ReFS does not support hard links, so I’m not sure what the behavior should be if the target file system is ReFS.
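
To make the options concrete, here is a minimal sketch of an installer helper that tries several methods in a configurable order. The function name, the `methods` parameter, and the default order are assumptions for illustration, not anything the PEP specifies; an installer that wants hard links unconditionally on Windows would pass `methods=("hardlink", "copy")`.

```python
import os
import shutil


def materialize_link(target, link_path, methods=("symlink", "hardlink", "copy")):
    """Create `link_path` referring to `target`, trying each method in order.

    Returns the name of the method that succeeded. Pass absolute paths:
    os.symlink and os.link interpret relative paths differently.
    """
    actions = {"symlink": os.symlink, "hardlink": os.link, "copy": shutil.copy2}
    for name in methods:
        try:
            actions[name](target, link_path)
            return name
        except (OSError, NotImplementedError):
            continue  # e.g. missing privilege or unsupported filesystem
    raise OSError(f"could not create {link_path!r} using any of {methods}")
```

Because the final fallback is a plain copy, the helper still succeeds on a filesystem that supports neither kind of link, at the cost of duplicating the file.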

I would mind - the executables example I gave wasn’t hypothetical, it was something I had tried in the past. Everything I work on is cross-platform and has to work on Windows. So making it the outlier that doesn’t support what every other platform supports is certainly going to be annoying. And it seems that there’s no real problem to motivate omitting Windows support.

I think the PEP draft as currently written is already quite restrictive. I’d prefer to see support for symlinks that is as generic as possible. Some of the restrictions seem motivated by security concerns, but I’m not sure that that is all justified. For example this:

“A simple example would be if a user were to run sudo pip install malicious, and there were no protections …”.

I don’t think that is correct. Installing an unknown package from PyPI is inherently unsafe. For all you know it contains an sdist with a build script that deletes your home directory. Or it can contain Python code that modifies any file outside of the install directory. So I don’t yet see a reason to treat symlinks differently here.

Editable installs would also just work (if the backend would implement them via a wheel) without any special considerations if the symlinks were allowed to point to source files elsewhere on the disk. So I’d prefer to see a better rationale for not allowing that.

I think you should reference PEP 662 as at least related to this concept.

I agree with this comment Discuss PEP 662: Editable installs via virtual wheels - #97 by steve.dower and the explanations below about why symbolic links cannot be expected to work on Windows.

Good point, I will add a reference. I think my approach when writing this PEP is that on Windows, just as on Unix, installers must be ready to fail if the backing filesystem does not support symlinks.

The additional risk here is that installing a wheel is currently safe. IMO this proposal needs to be careful not to introduce any additional risks for a hypothetical user who runs pip install --only-binary :all:.

The threat I’d be most concerned about here is a compromised wheel that contains the exact same code in it as the original wheel, but has a malicious LINKS file added. That’s the easiest case for a user to miss, and so probably the most tempting target for an attacker.

However, I do agree that we shouldn’t get too obsessed with security. The context is still “a user instructs a program they are trusting to download a package they are trusting from a hosting service they trust, using protocols they trust, and then they are invoking code from the installed package without further checking on the integrity of that package”. That’s a lot of (quite possibly unwarranted!) trust…

What I think is important is to try to avoid adding features without a reasonable motivation for them. The most secure feature is the one that doesn’t exist, after all…

FWIW, I think there is a good motivation for supporting this PEP on all platforms - encouraging “only works on platform X” is problematic as it fragments the community, and makes building a working system that much harder. So I’m -1 on making this Unix-only unless it is only for support of a Unix-only capability like shared library aliases.

The problem is that on Windows, unlike Unix, whether symlinks are supported isn’t a property of the filesystem alone, but also depends on the user’s current rights. It’s perfectly possible for a user to be able to uninstall a package they previously installed, but unable to reinstall that exact same package into the exact same environment at a later date.
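
As a sketch of what that means in practice, an installer cannot decide from the filesystem type alone; it would have to probe at install time, for example by attempting to create a symlink in the destination. The helper below is hypothetical (the name `can_create_symlinks` is made up) and only reports what the current user can do at the moment it is called.

```python
import os
import tempfile


def can_create_symlinks(directory):
    """Probe whether the current user can create a symlink inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w"):
            pass  # create an empty file to point at
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            return False
        return os.path.islink(link)


print(can_create_symlinks(tempfile.gettempdir()))
```

The answer can change between two runs on the same machine, which is the uninstall-then-cannot-reinstall scenario described above.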

Also, on Windows, symlinks are less common, so tools are less likely to handle them correctly. For example the info-zip command line zip tool will copy a symlink into a zipfile as a directory:

> set-content a "Some text"
> new-item -type SymbolicLink -Target a b
> zip -9 c.zip a b
  adding: a (172 bytes security) (stored 0%)
  adding: b/ (stored 0%)
> unzip -l .\c.zip
Archive:  ./c.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       11  21/05/2024 15:36   a
        0  01/01/1980 00:00   b/
---------                     -------
       11                     2 files

7-zip fails to include symlinks, saying “WARNING: The directory name is invalid”. Git on Windows seems to treat symlinks as normal files containing the name of the target. These or other issues could result in backup solutions or development workflows being broken if symlinks are used too freely.
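
For background on why tools diverge: a zip archive has no dedicated symlink entry type. Unix tools conventionally mark a symlink by storing the file mode in the upper 16 bits of the entry's external attributes and the link target as the entry's content. The sketch below builds such an archive in memory with Python's `zipfile` and detects the entry; the helper name `is_symlink_entry` is made up for illustration.

```python
import io
import stat
import zipfile


def is_symlink_entry(info):
    """True if a ZipInfo carries Unix mode bits that mark it as a symlink."""
    return info.create_system == 3 and stat.S_ISLNK(info.external_attr >> 16)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a", "Some text\n")
    link = zipfile.ZipInfo("b")
    link.create_system = 3  # 3 means the entry was created on Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "a")  # the entry body is the link target

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, is_symlink_entry(info))
```

A tool that ignores these attribute bits sees `b` as an ordinary file whose content is the text `a`, which is one way symlinks get silently lost.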

So I think there are both technical and “social” reasons for not treating Windows and Unix the same when it comes to symlink support. It may be that we have to accept some level of risk, and assume that if people use symlinks, they know how to deal with them, but that also means we should strive not to use symlinks more than necessary, or without sufficient visibility.

Yes, agreed.

I think part of that is valid. For example, LINKS should not be able to place the symlinks themselves in locations where other types of files in a wheel cannot be installed (that should keep installing a wheel safe). However, as far as I can tell there is no reason to restrict where the targets of those symlinks may be. After all, code in .py files can already point to anywhere on the file system and manipulate it.

I’m not an expert on symlinks or security, but that sounds reasonable. And it does make symlinks a reasonable approach for editable wheels if we take that view.