Symbolic links in wheels

A .pth file adds the directory containing a file to the search path, so if a directory contains multiple Python modules it’s all-or-nothing. Symlinks are easier to work with and have fewer gotchas. Another big advantage is that you can symlink something under a different name, which is useful for extension modules.

Regarding the implementation, IMO setting external_attr |= 0xA0000000 makes the most sense for wheels. It is not standard to treat such a file as a symlink, but setting that attribute flag is standards-compliant, so we can amend the wheel spec to say something like “if a file in a wheel has this attribute and contains a single line representing a relative path inside the same wheel, the installer should create a symbolic link at that location pointing to the target file”. This won’t need additions to zipfile (which would raise standards-compliance issues) and should be entirely doable within Python packaging specs and projects.
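A minimal sketch of what that attribute looks like when written with the standard-library zipfile module. The helpers add_symlink and is_symlink are hypothetical names; the convention of storing the Unix mode (including the S_IFLNK file-type bits) in the high 16 bits of external_attr is the one Info-ZIP-style tools use for Unix-created entries.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, link_name: str, target: str) -> None:
    """Store a symlink entry whose content is the (relative) link target."""
    info = zipfile.ZipInfo(link_name)
    # Mark the entry as Unix-created; many extractors only honour the
    # Unix mode bits for such entries.
    info.create_system = 3
    # S_IFLNK (0o120000) | 0o777, shifted into the high 16 bits -> 0xA1FF0000
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """True if the high 16 bits of external_attr carry the symlink file type."""
    return stat.S_ISLNK(info.external_attr >> 16)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/libfoo.so.1", b"\x7fELF...")
    add_symlink(zf, "pkg/libfoo.so", "libfoo.so.1")

with zipfile.ZipFile(buf) as zf:
    link = zf.getinfo("pkg/libfoo.so")
    print(is_symlink(link), zf.read(link).decode())  # True libfoo.so.1
```

OR-ing 0xA0000000 onto a regular file’s mode (type bits 0x8000 in the upper half) yields type bits 0xA000, so the |= form also turns an existing regular entry into a symlink entry.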

Implementation-wise, Windows now has symlinks and you can enable them pretty easily if you have admin control over the machine, so that’s much less of a problem now than five years ago. And if the user does not (or cannot) have them enabled, the installer can choose to either copy the file or raise an error telling the user to talk to their admins. For pip at least, there are already things that may need admin intervention for installation to work on Windows (the path length limit), so there’s precedent already.
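The link-or-copy fallback described above could look roughly like this. It is a sketch, not any installer’s actual code; link_or_copy is a hypothetical helper, and the target is assumed to be relative to the link’s directory.

```python
import os
import shutil


def link_or_copy(target: str, link_path: str, allow_copy: bool = True) -> str:
    """Create link_path -> target, where target is relative to link_path's dir.

    If symlink creation fails (e.g. on Windows without the symlink privilege)
    and the target is a regular file, copy it instead. Returns "link" or "copy".
    """
    target_abs = os.path.join(os.path.dirname(link_path), target)
    try:
        os.symlink(target, link_path)
        return "link"
    except OSError as exc:
        if allow_copy and os.path.isfile(target_abs):
            shutil.copy2(target_abs, link_path)
            return "copy"
        # Neither linking nor copying is possible: surface a clear error.
        raise RuntimeError(
            f"could not create symlink {link_path!r} -> {target!r}; symlink "
            "creation may need to be enabled by an administrator"
        ) from exc
```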


The cited blog makes it seem novel that “[s]tarting with Windows 10 Insiders build 14972, symlinks can be created without needing to elevate the console as administrator”. This has been possible another way since Windows Vista. UAC only filters the symlink privilege from the set of privileges that are provided by the Administrators group, since the group is disabled (deny-only). The symlink privilege can still be granted directly to the user or one of the user’s groups such as “Authenticated Users”. Anyone with admin access can modify this to allow creating symlinks without elevating.

So I thought a bit about this, and it seems we’d need a wheel version bump for this? Because tools that work with the current wheel version are not guaranteed to be able to handle symlinks, so we need to signify what is compatible and what is not.

If that’s the case, a PEP should specify a new wheel version (1.1) that

  • Does everything that wheel version 1.0 does.
  • If a file in the wheel has external_attr bit 0xA0000000 set, the file MUST contain a single line with a path relative to the directory containing the file. The path MUST point to another entry in the same wheel.
  • On installation, the install tool SHOULD create a symbolic link in place of the file with external_attr bit 0xA0000000 set, with the target being the wheel entry specified by the path in the file.
  • If an install tool is unable to create such a symbolic link, it MAY copy the target instead if the target is a regular file. Otherwise, the install tool SHOULD signify this failure.
  • Build tools are advised to use wheel format 1.1 only if they need to include a symbolic link in the wheel, to maintain best compatibility with existing install tools.
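As a rough illustration of the MUST rules in the list above, an installer or validator could resolve a link entry like this. resolve_link_entry is a hypothetical helper; it assumes the caller has already identified the entry as a link (e.g. via external_attr), and it does not follow chains of links.

```python
import io
import posixpath
import zipfile


def resolve_link_entry(zf: zipfile.ZipFile, info: zipfile.ZipInfo) -> str:
    """Return the archive name a link entry points to, or raise ValueError.

    Checks: exactly one line, a relative path, resolved against the directory
    containing the link, naming another entry of the same wheel.
    """
    lines = zf.read(info).decode("utf-8").splitlines()
    if len(lines) != 1 or not lines[0]:
        raise ValueError(f"{info.filename}: link must contain exactly one line")
    target = lines[0]
    if posixpath.isabs(target):
        raise ValueError(f"{info.filename}: link target must be relative")
    # Resolve relative to the directory that contains the link entry.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(info.filename), target)
    )
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"{info.filename}: link target escapes the wheel")
    if resolved == info.filename or resolved not in zf.namelist():
        raise ValueError(f"{info.filename}: {resolved!r} is not another entry")
    return resolved


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/libfoo.so.1", b"...")
    zf.writestr("pkg/libfoo.so", "libfoo.so.1\n")
    zf.writestr("pkg/evil", "../../etc/passwd")

with zipfile.ZipFile(buf) as zf:
    print(resolve_link_entry(zf, zf.getinfo("pkg/libfoo.so")))  # pkg/libfoo.so.1
```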

I’ll begin drafting some text for the PEP if the above makes sense.


This constraint precludes using this feature for editable wheels, as per PEP 660.

As far as I am aware, there are two use cases for this feature, Unix .so file symlinks, and editable wheels (if there are others, people should speak up!). Do we want to only support one of those? I assume the benefit is that there is less risk (in terms of both security and general breakage from mistaken assumptions) if the feature is limited to symlinks internal to the wheel? If we do, the PEP should definitely include a section explaining why we only allow internal symlinks, and the implications of doing so.

Some other points:

  1. Existing tools probably don’t check the wheel version, and so will likely create a regular file containing the text in the link (the target filename). We can’t do much about this, but the PEP should probably note the risk. Also note that the wheel 1.0 spec says that installers must warn but proceed if the major version is the same but the minor version is greater than the expected version. So the best we can expect older installers to do is warn.
  2. The broken behaviour above (writing a regular file) is valid according to the new PEP, as the chain of SHOULD/MUST requirements only says that installers SHOULD link, copy or fail. Maybe we should be stricter and say that installers MUST fail if they can’t link or copy?
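The version rule from point 1 can be sketched as follows, assuming a hypothetical installer that understands Wheel-Version 1.0. Under this rule a 1.1 wheel only triggers a warning, after which the installer would write link entries as regular files; a major-version bump would make it refuse the wheel instead.

```python
import warnings
from email.parser import HeaderParser

SUPPORTED_WHEEL_VERSION = (1, 0)  # what this hypothetical installer understands


def parse_wheel_version(wheel_file_text: str) -> tuple[int, int]:
    """Read Wheel-Version from the text of a .dist-info/WHEEL file."""
    value = HeaderParser().parsestr(wheel_file_text)["Wheel-Version"]
    if value is None:
        raise ValueError("WHEEL file has no Wheel-Version")
    major, minor = value.strip().split(".")
    return int(major), int(minor)


def check_wheel_version(version, supported=SUPPORTED_WHEEL_VERSION) -> None:
    """Fail on a newer major version; warn but proceed on a newer minor one."""
    if version[0] > supported[0]:
        raise RuntimeError(f"Wheel-Version {version} is not supported")
    if version > supported:
        warnings.warn(f"Wheel-Version {version} is newer than {supported}; proceeding")


meta = "Wheel-Version: 1.1\nRoot-Is-Purelib: false\nTag: py3-none-any\n"
check_wheel_version(parse_wheel_version(meta))  # emits a UserWarning, then returns
```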

I would want to restate the path-rewriting rules, e.g. package-1.0.data/platlib/x.py installs to $VIRTUAL_ENV/site-packages/, and rewrite the symbolic link target with the same mapping.
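A sketch of “rewrite the symbolic link target with the same mapping”, under stated assumptions: scheme is a hypothetical dict of install directories (a real installer would derive it from its install scheme, e.g. via sysconfig), "root" is where non-.data entries go, and target_entry is the archive path the link already resolves to inside the wheel.

```python
import posixpath


def installed_path(entry: str, data_dir: str, scheme: dict) -> str:
    """Map an archive path to its installed location.

    scheme: hypothetical mapping with keys "root", "purelib", "platlib",
    "scripts", "headers" and "data".
    """
    prefix = data_dir + "/"
    if entry.startswith(prefix):
        # package-1.0.data/<category>/<rest> -> scheme[<category>]/<rest>
        category, _, rest = entry[len(prefix):].partition("/")
        return posixpath.join(scheme[category], rest)
    return posixpath.join(scheme["root"], entry)


def installed_link_target(link_entry, target_entry, data_dir, scheme) -> str:
    """Map both ends, then express the target relative to the link's directory."""
    link_path = installed_path(link_entry, data_dir, scheme)
    target_path = installed_path(target_entry, data_dir, scheme)
    return posixpath.relpath(target_path, posixpath.dirname(link_path))


site = "/venv/lib/python3.12/site-packages"
scheme = {"root": site, "purelib": site, "platlib": site,
          "scripts": "/venv/bin", "headers": "/venv/include", "data": "/venv"}
print(installed_path("package-1.0.data/platlib/x.py", "package-1.0.data", scheme))
# /venv/lib/python3.12/site-packages/x.py
print(installed_link_target("package-1.0.data/scripts/tool", "package/cli.py",
                            "package-1.0.data", scheme))
# ../lib/python3.12/site-packages/package/cli.py
```

Keeping the rewritten target relative means the installed environment can still be moved as a whole without breaking the link.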

I would want to add a column to RECORD to indicate the executable or link bits so that the link or executable status of individual files affects the hash of RECORD.
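To make the idea concrete: RECORD rows today are path, hash and size, where the hash is the urlsafe base64 sha256 digest with padding stripped. The fourth “mode” column below is hypothetical, not part of any spec; it just shows that with it, flipping an entry from regular file to symlink changes the RECORD bytes and therefore any hash or signature over RECORD.

```python
import base64
import csv
import hashlib
import io
import stat


def record_hash(data: bytes) -> str:
    """RECORD-style digest: urlsafe base64 of sha256, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_record(entries) -> str:
    """entries: iterable of (path, data, mode). The 4th column is hypothetical."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path, data, mode in entries:
        writer.writerow([path, record_hash(data), len(data), format(mode, "o")])
    return out.getvalue()


# Same path and same content; only the file type differs.
regular = build_record([("pkg/libfoo.so", b"libfoo.so.1", stat.S_IFREG | 0o644)])
link = build_record([("pkg/libfoo.so", b"libfoo.so.1", stat.S_IFLNK | 0o777)])
print(record_hash(regular.encode()) != record_hash(link.encode()))  # True
```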

I have a draft updated PEP, although it was focused on adding the ‘greater compression’ extension, where a nested archive could provide files not in the .dist-info directory; the document still needs work to be clearer than the original.

I think it would be version 2.0? If an old tool tries to unpack it and loses all the symlinks, that will completely break the wheel in most cases, surely?

External symlinks are definitely tricky from a security perspective, because if anyone can be tricked into writing through them, they can redirect writes to arbitrary filesystem locations. For example, consider a symlink from __pycache__/something.pyc pointing to a critical system file.

Another nasty case that you have to harden against is where the zip file first contains a symlink to a system directory: something -> ../../../../../../../etc, and then later on in the zipfile it contains a regular entry that falls inside the symlinked directory: something/passwd → "some text". If the archive tool is unpacking entries sequentially, it can easily end up writing through the symlink to overwrite /etc/passwd with "some text". (For this one, I think the usual solution is to wait to create all symlinks until the other files have been unpacked, in a second pass.)
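A POSIX-only sketch of the two-pass approach, assuming links are marked via external_attr as discussed above. extract_two_pass and _inside are hypothetical helpers. The sketch only prevents writing through links from the same archive and writing outside dest; it does not check where link targets point.

```python
import os
import shutil
import stat
import zipfile


def _inside(root: str, path: str) -> bool:
    """True if path, after resolving any existing symlinks, stays under root."""
    root = os.path.realpath(root)
    return os.path.commonpath([root, os.path.realpath(path)]) == root


def extract_two_pass(zf: zipfile.ZipFile, dest: str) -> None:
    """Extract regular entries first, then create symlinks in a second pass."""
    links = []
    for info in zf.infolist():
        if stat.S_ISLNK(info.external_attr >> 16):
            links.append(info)  # defer: no links exist while files are written
            continue
        out = os.path.join(dest, info.filename)
        if not _inside(dest, out):
            raise ValueError(f"refusing to write outside {dest}: {info.filename}")
        if info.is_dir():
            os.makedirs(out, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(out), exist_ok=True)
        with zf.open(info) as src, open(out, "wb") as dst:
            shutil.copyfileobj(src, dst)
    for info in links:
        out = os.path.join(dest, info.filename)
        # The link's own parent must not resolve outside dest (e.g. via an
        # earlier link), otherwise we would create the link "through" it.
        if not _inside(dest, os.path.dirname(out)):
            raise ValueError(f"refusing to place a link outside {dest}: {info.filename}")
        os.makedirs(os.path.dirname(out), exist_ok=True)
        # os.symlink raises FileExistsError if out already exists.
        os.symlink(zf.read(info).decode("utf-8").strip(), out)
```

For the attack described above, the first pass writes something/passwd as a real file under dest, and the second pass then fails because something already exists as a directory.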

TBH I’m still not clear on what the .so file symlinks use case is – like I said upthread a few years ago, symlinks like libarrow.so -> libarrow.so.14.0.0 only make sense for libraries on the ld.so search path, and I don’t see how that ever applies to wheels.

For the editable wheels case, Tzu-Ping points out that:

Real issue! But – maybe there’s another way to fix it? Since .pth files can contain arbitrary code, and importlib has a ton of flexibility in how it searches for modules, maybe we can stick a bit of code in a .pth that only adds a single Python module/package to the search path? We’d have to check with the importlib experts – I always get lost when I try to read the docs :-). But if that works, it might be both backwards compatible and 100% reliable across platforms, neither of which symlinks are.
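One way the “bit of code in a .pth” idea could work is a meta path finder that maps only specific names to specific files. This is a sketch, not a vetted design: SingleModuleFinder, install and a hook module such as _myproject_editable_hook (imported from a .pth line starting with import, which the site module executes) are all hypothetical names.

```python
import importlib.abc
import importlib.util
import sys


class SingleModuleFinder(importlib.abc.MetaPathFinder):
    """Resolve only the listed names to explicit file locations.

    mapping: module name -> path of a module file, or of a package's
    __init__.py (the package directory then becomes its __path__).
    """

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def find_spec(self, fullname, path=None, target=None):
        location = self.mapping.get(fullname)
        if location is None:
            return None  # not ours: defer to the remaining finders
        return importlib.util.spec_from_file_location(fullname, location)


def install(mapping):
    """What a hypothetical hook module imported from a .pth file could call."""
    sys.meta_path.append(SingleModuleFinder(mapping))
```

Appending to sys.meta_path means a module of the same name found earlier on sys.path still wins; inserting at the front would override it instead. Submodules of a mapped package are found by the normal path finder through the package’s __path__, and a module can be exposed under a name different from its file name.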

I guess no-one followed my link earlier… the way I’m doing it is with a special pseudo-hash function:

name/of/symlink,symlink=path/to/symlink/target,

See here for rationale.
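A sketch of how a consumer could separate pseudo-hash rows from ordinary ones. split_record is a hypothetical helper; since whether the target is relative to the link or to the archive root isn’t restated here, it returns the target unchanged. The sha256 value in the example is a placeholder, not a real digest.

```python
import csv
import io

SYMLINK_PREFIX = "symlink="


def split_record(record_text: str):
    """Split RECORD rows into (files, links).

    files maps path -> hash field (may be empty, e.g. for RECORD itself);
    links maps path -> link target taken from the pseudo-hash.
    """
    files, links = {}, {}
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        path = row[0]
        hash_field = row[1] if len(row) > 1 else ""
        if hash_field.startswith(SYMLINK_PREFIX):
            links[path] = hash_field[len(SYMLINK_PREFIX):]
        else:
            files[path] = hash_field
    return files, links


record = (
    "pkg/__init__.py,sha256=PLACEHOLDER,0\n"
    "name/of/symlink,symlink=path/to/symlink/target,\n"
    "pkg-1.0.dist-info/RECORD,,\n"
)
files, links = split_record(record)
print(links)  # {'name/of/symlink': 'path/to/symlink/target'}
```

A hash verifier that predates this convention would see “symlink” as an unknown hash algorithm, which is one reason every consumer of RECORD has to be taught about it.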


If it’s not clear that it’s a valid use case, and neither is the editable wheel one (see below), then it seems a bit premature for anyone to be thinking about writing a PEP…

The discussions around PEP 660 were, to say the least, extensive. I personally agree with you (see my library which supports .pth files and import hooks), but there was a lot of discussion around symlink-based solutions. In the end, I approved PEP 660 on the basis that if symlinks are important to people, we can address the questions as part of adding symlink support to wheels.

I’m happy, personally, to not worry about the editable wheel case, but I felt that I should point out that if the proposal doesn’t support that use case, we will be closing the door to the option of implementing editable installs via symlinks.

I don’t mind the pseudo-hash symlinks. We agree that the rules about what to do with the symlinks are more important than how to embed them in the .zip archive. If we added symlinks to wheels it shouldn’t be difficult to have a different security rule for editable wheels.

The largest wheels on PyPI duplicate .so files; the use case is popular. It may not matter whether or not it is absolutely necessary.

An installer that understands the new format automatically understands a 1.0 wheel, which means it is backwards compatible (?), so I chose 1.1. But honestly I have never really understood how format versioning works, so if 2.0 is the right version, this should be 2.0.

Security is the main issue here to me, and to me the editable use case shows how problematic it is to use wheel both as a data interchange format between the frontend and the backend and as a distribution format for transmission between environments. Security is a big issue for the latter, but some of those concerns are non-issues for the former (since the frontend has all the information needed to decide what is valid and what is not). So if we want to treat wheel as a distribution format and keep its promise of not writing to a location outside the target environment’s prefix, editables can’t be supported. If, however, we take PEP 660’s idea and define wheel as the frontend-backend interchange format, linking something to /etc should be allowed in a wheel, and the security implications should be handled at another layer of distribution. For example, PyPI could validate uploaded wheels to ensure they contain no out-of-environment symlinks, much as a wheel may use a local version segment but such a wheel cannot be published to PyPI.

I think this can be said of basically all symlink use cases (and actually some existing wheel features), because theoretically any usage can be replaced by some kind of runtime magic. But the use case comes up so often that I think it is worthwhile to allow people to use symlinks, since otherwise people resort to solutions like duplicating binary blobs, which makes wheel sizes balloon and IMO is not good for the ecosystem. We can yell all day that they don’t need that in the first place, but it’s pretty clear people are not going to try the “good” alternatives we suggest, so IMO it is better to provide a solution more obvious to them (symlinks in wheels) as long as it does not compromise the rest of the format.

Semi-related, I also considered allowing only regular files to be symlinked (not directories), which would help with some of the issues (and still covers all known symlink use cases, I believe). But of course that doesn’t help if a wheel contains something → ../../../../../../../etc/passwd, so it’s arguably not very meaningful?

I don’t disagree with the pseudo-hash approach, but IMO it should be done in addition to the 0xA0000000 thing. Hash functions are for validation, and it does not feel right to me to overload them to describe file attributes. Also, wheels currently have a nice “feature” of being extractable directly and still being useful-ish, which the 0xA0000000 approach happens to preserve, and I like that. But that’s a minor point.


I seem to remember there being discussions on removing the arbitrary-code functionality of .pth files due to security concerns. I can’t find these discussions via search however.

The static usage of .pth files only supports adding a directory to the list of package search directories, not linking a specific package or module.

I remember self-replacement being a solution which can include specific files.

I don’t think that follows. PEP 660 doesn’t redefine the role of wheels, it just uses them as a mechanism to implement an editable install. Personally, I’m perfectly fine with not using symlinks for editable support, and PEP 660 explicitly acknowledges that wheels don’t support symlinks.

I don’t really understand the security issue here (surely if you can write to /etc via a symlink, you need to have security permissions to write to /etc directly?) but I’m not a security expert so I’ll leave that assessment to the people who are.

I remember that I would rather use a real hash assuming symlinks are also stored as zip directory entries (with a special attribute). Otherwise the hash-checking/generating part of wheel’s ZipFile subclass would also have to be aware of the pseudo hash.

Flipping the symlink flag changes the content of the wheel, though. So if RECORD is going to record the full contents of the wheel, then it has to include the symlink flag somehow.


I believe the security issue is for sudo pip install foo --only-binary=foo to potentially modify things outside of the Python prefix. This is impossible right now, but can be done if the wheel contains a symlink (and the user has already granted write permission with sudo).

Well, that would be specific to pip, and making pip capable of validating symlinks to ensure that a written path does not extend past sys.prefix is always a possibility. This can be done trivially by normalizing the resultant paths before creating a symlink and comparing them with the normalized sys.prefix.
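A sketch of that check, with link_stays_in_prefix as a hypothetical helper. It uses os.path.realpath, which both normalizes ".." components and follows symlinks that already exist on disk, so a previously created link cannot be used to step outside the prefix. It does not address races between checking and creating the link.

```python
import os
import sys


def link_stays_in_prefix(link_path: str, target: str, prefix: str = sys.prefix) -> bool:
    """Would a symlink at link_path pointing to target resolve under prefix?"""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # An absolute target makes os.path.join discard link_dir, as the OS would.
    resolved = os.path.realpath(os.path.join(link_dir, target))
    root = os.path.realpath(prefix)
    try:
        return os.path.commonpath([root, resolved]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False
```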

I have implemented a drop-in replacement for bdist_wheel called bdist_axle, that records symlinks and provides a mechanism for a post-install hook to deploy those.

The generated wheel is fully compliant, and installation with regular pip will suffice. There are limitations and security implications, discussed in the Runtime’s README.

Most of the documentation of how this works is in GitHub - karellen/wheel-axle-runtime: Wheel Axle Runtime.

That’s neat. Sorry I haven’t had a chance to look at it myself.


Old thread, I know, but thought I should mention that I ran into this issue recently too and even raised an issue for wheel.

It is better to avoid using symbolic links in packages. With CMake-built projects like Arrow C++, you can set CMAKE_PLATFORM_NO_VERSIONED_SONAME=ON when configuring so that only libarrow.so is built.