Selecting variant wheels according to a semi-static specification

This is a variant of the selector packages idea and of the build backend approach that is sort of suggested here. There is a good summary of some of the problems this is trying to solve here, so I won’t repeat that.

The basic problem is that we want a user to be able to do pip install foo and end up with different wheels installed depending on some property of their system that is not currently expressible in packaging metadata. The property in question might be the CUDA version or some property of their CPU or GPU etc. It is easy for the maintainer of the project being installed to write a small piece of code that checks this property, but much harder or impossible for maintainers of tools like pip to maintain environment/system checks that would work for all projects.

There is also a tension between the desire for static metadata and static dependency resolution and the requirement for some level of dynamism. A frequent concern with previous attempts at solving this has been that the proposed mechanisms allow arbitrary code execution when installing or when resolving dependencies. Here I have attempted to come up with something that is as close to static as possible while still containing the unavoidable dynamic part.

I will describe this in terms of someone doing pip install python-flint just because I know that case well. I described here that it would be useful to be able to provide x86-64 wheels built for newer architecture versions like x86-64-v3 in order to use things like AVX instructions. The particular case of x86_64 variants is potentially better handled through more traditional platform tags but let’s just ignore that for now…

Firstly we reduce the problem to two parts:

  1. Having variant wheels based on some property.
  2. Providing a way to select among variant wheels.

For the first part you could extend the platform tag in the wheel filename using e.g. + as a separator to have wheels like:

python_flint-0.6.0-cp312-cp312-win_amd64.whl
python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl
python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v4.whl

The extra parts of the platform tags are “extended platform tags” and are just arbitrary strings. PEP 517 build backends should possibly not output wheels with names like this by default, but tools like cibuildwheel could provide options to rename the built wheels in this way, much as they already retag wheels as e.g. manylinux. The suggestion is that you should be able to make wheels with these names and upload them to PyPI or put them in a local wheelhouse directory.
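
As a rough illustration of the parsing this would require on the tool side (the splitting rule here just follows the “+” separator suggested above; nothing like this exists today):

# Illustrative only: split an "extended platform tag" off a wheel filename,
# assuming "+" is used as the separator suggested above (and ignoring
# optional build tags for simplicity).
def split_extended_platform_tag(wheel_filename):
    stem = wheel_filename[:-len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    if "+" in platform_tag:
        platform_tag, extended = platform_tag.split("+", 1)
        return platform_tag, extended
    return platform_tag, ""

# split_extended_platform_tag("python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl")
# -> ("win_amd64", "x86_64_v3")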

The next step is selecting the right wheel. The problem here is that we usually put the metadata that tools like pip consume into the wheels themselves but pip install python-flint does not yet know which wheel to look at. For that I suggest having an additional file that sits alongside the wheels so that the wheelhouse or PyPI index page looks like:

python-flint-0.6.0.tar.gz
python-flint-0.6.0-wheel-selector.toml
python_flint-0.6.0-cp312-cp312-win_amd64.whl
python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl
python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v4.whl

A tool such as pip should check for the *-wheel-selector.toml file before choosing a wheel; this file tells it how to make that choice. The wheel selector file is a TOML file that is statically analysable by a dependency resolver. Its contents look like:

# python-flint-0.6.0-wheel-selector.toml
[wheel-selector]

variables = ["x86_64_version"]

[selector.x86_64_version]

requires = ["cpuversion >= 1.0"]
function = ["cpuversion:get_x86_64_psABI_version"]

[selector.x86_64_version.wheel_tags]

x86-64 = [""]
x86-64-v2 = [""]
x86-64-v3 = ["x86_64_v3", ""]
x86-64-v4 = ["x86_64_v4", "x86_64_v3", ""]

The cpuversion requirement is an installable package (e.g. from PyPI) that provides a get_x86_64_psABI_version function like:

>>> import cpuversion
>>> cpuversion.get_x86_64_psABI_version()
'x86-64-v3'
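
No such package exists as far as I know, but as a loose sketch of the kind of check it would perform, a Linux-only implementation could compare the CPU flags reported in /proc/cpuinfo against the feature sets defining the psABI levels (the flag lists below are indicative and may be incomplete):

# Very rough, Linux-only sketch of what a cpuversion-style check could do.
# The flag sets follow the x86-64 psABI micro-architecture levels but are
# not guaranteed to be complete.
V2 = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3 = V2 | {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
V4 = V3 | {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}

def get_x86_64_psABI_version():
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
    for level, needed in [("x86-64-v4", V4), ("x86-64-v3", V3), ("x86-64-v2", V2)]:
        if needed <= flags:
            return level
    return "x86-64"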

When pip reads the wheel selector TOML file it should install cpuversion and call the indicated function. The acceptable return values must be the strings listed on the lhs of the wheel_tags table, and each rhs is an ordered list of allowable extended platform tags for that value. The empty string allows a wheel without any extended platform tag. The order of the list indicates preference, so usually the first available item should be selected, but any item gives a valid install. If cpuversion.get_x86_64_psABI_version() returns 'x86-64-v3' then the allowable wheel files are

python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl
python_flint-0.6.0-cp312-cp312-win_amd64.whl

and the _v3 wheel is the preferred one.
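
Concretely, the selection step the installer would perform might look something like this rough sketch (the data structures and function name here are made up for illustration, not part of any proposed API):

# Illustrative sketch of the wheel preference logic described above.
wheel_tags = {
    "x86-64": [""],
    "x86-64-v2": [""],
    "x86-64-v3": ["x86_64_v3", ""],
    "x86-64-v4": ["x86_64_v4", "x86_64_v3", ""],
}

available = {
    # extended platform tag -> wheel filename ("" means no extended tag)
    "": "python_flint-0.6.0-cp312-cp312-win_amd64.whl",
    "x86_64_v3": "python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl",
    "x86_64_v4": "python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v4.whl",
}

def acceptable_wheels(selector_value):
    """Return the acceptable wheels for this value, most preferred first."""
    allowed = wheel_tags[selector_value]  # any other value is unacceptable
    return [available[tag] for tag in allowed if tag in available]

# acceptable_wheels("x86-64-v3")
# -> ['python_flint-0.6.0-cp312-cp312-win_amd64+x86_64_v3.whl',
#     'python_flint-0.6.0-cp312-cp312-win_amd64.whl']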

From a project maintainer’s perspective it is ideal to be able to do this without needing to make separate packages, either for the different variants or for a separate selector package that then installs the main package. Adding a single file alongside the wheels is the nicest way to handle wheel selection from that side. In more complex situations I suppose that the variant wheels could still be used to pull in different dependencies, as is the idea with selector packages.

From a dependency resolver’s perspective I imagine that this fits in neatly with a part of the process that any resolver already needs to handle. Basically at some point there is a requirement for a project, a version is selected and then there is a list of release artefacts from which one must be chosen. The requirement never says which artefact to choose, so the resolver chooses a wheel/sdist for the given version based on some preference system. The suggestion here just alters how that choice is made. I might be massively underestimating the complexity of how this fits into the broader resolution though…

This does involve some arbitrary code execution as part of the install because we need to call a function from the installed cpuversion package. However this is more limited than in some other proposals and there are some other advantages.

The arbitrary code execution comes from calling a function in the cpuversion module. In practice there would not be many such modules and so it could be feasible for someone to maintain a list of allowed modules that are vetted and that work for the dependencies that they want to install. Also if particular packages like cpuversion, cudaversion etc were established then an installation tool could potentially vendor them to avoid arbitrary external code execution.

For a dependency locking tool it is impossible to know what the output of get_x86_64_psABI_version() might be on the target machine. However the tool can see all the possible values that it is allowed to return and can also see what the implications of those different values would be. A locking tool could exhaustively resolve all possible cases for the version and produce a lockfile that accommodates all of them. Alternatively a locking tool might recognise that the empty string "" matches all cases and could choose the wheel with no extra platform tags. A final option is that the locking tool could provide a way to specify what values to choose like:

locktool -r requirements.txt --selector-tags x86_64_version=x86-64-v2
locktool -r requirements.txt --extra-wheel-tags x86_64_v3

Equally installers like pip could use the same options for handling extra platform tags during dependency resolution.

Probably many of the details above are not right and need more careful thought, and I have avoided going into more complicated cases like multiple extended platform tags. My intention here is to present a dynamic resolution scenario that is as constrained as possible so that I can ask:

Is this level of dynamism acceptable to those who want to make locker tools or want to minimise arbitrary code execution etc?

Is this sufficient to handle all cases like GPUs etc where people want dynamic resolution?

8 Likes

At first glance, this seems sufficient. But a lot of approaches are technically sufficient while being socially or politically difficult.

My concern is the gatekeeping inherent to only certain packages being allowed to execute at resolution time. While I 100% get the security angle, I intensely dislike that in order to provide specialisations you have to first get yourself accepted by install tools and your new platform tag extensions approved (by whom?). Disabling selector packages and statically choosing some fallback would be a great security option for installers to reduce arbitrary code execution, but I wouldn’t want to prevent tools from experimenting with their own approaches to reducing risk.

The relevant part of my counter-proposal is making the selector package specify the desired package by name, which means you can easily specify the exact package you want instead of the selector. It’s an approach that is open to anyone, can be bypassed by anyone, works at the granularity desired by the project, and doesn’t require anyone else’s permission or effort to extend.

2 Likes

We should also consider how this would affect uv - @charliermarsh do you foresee any issues for uv with this proposal?

I imagined this being done by a person who wants to create static lockfiles rather than by any maintainer of any particular tools. If someone is concerned about such things then they could disable dynamic execution but allow it for the cases that they need and find acceptable like:

pip install --max-security-disable-everything --allow cudaversion

Thanks for the tag. N.B. I’m familiar with the problem here (and found the intro link very good) but admittedly haven’t engaged in any prior conversations around solutions.

I don’t see any issues that would be specific to uv, if this were to be proposed. I mean, it wouldn’t be particularly fun to implement, but I don’t see any fundamental problems. It’s somewhat similar to extras, somewhat similar to how platform tags work already (e.g., we use the tags on the current platform but allow users to provide alternate tags to resolve for other platforms). It’s good that the domain of possible values would be provided upfront (i.e., the wheel_tags field) and that they’re mutually exclusive.

A few questions / comments:

  1. What if a user doesn’t want the x86_64_v3 variant, despite it being supported? E.g., perhaps they have GPU support, but don’t want to install the GPU variant of a package. (I have a bias towards solutions that require users to encode the accelerators (or similar) that they want in the specifier, as opposed to automatically resolving them, and I suppose this is one reason why.)
  2. We probably can’t really cache these lookups reliably since it’s just ambient system state (e.g., you can install CUDA, and so CUDA support can vary over time on a single Python platform).
3 Likes

When Fedora looked at supporting v3- and v4-optimised code for its RPMs, the recent discussion suggested that it is often better to do run-time checks for the CPU features within a single program than to ship separate builds. Indeed this is already done for some software that benefits from AVX, for example.

1 Like

As far as I know there is no way to specify which wheel you want pip to install if there are multiple matches. There is --no-binary to say that you want to use the sdist, but otherwise pip just chooses a wheel, and I assume that uv is the same. Usually that is fine because all wheels have mutually exclusive tags, so the only question is whether to use the sdist or the only matching wheel. The difference here is that it is more likely that there would be multiple possible wheels and strong reasons to prefer one over another. Note that it is not just about preference though: wheel_tags also encodes hard requirements.

I suggested two mechanisms for overriding the automatic selection:

locktool -r requirements.txt --selector-tags x86_64_version=x86-64-v2
locktool -r requirements.txt --extra-wheel-tags x86_64_v3

The first of these refers to the lhs of the wheel_tags table and the second to the rhs, i.e. the first specifies the dynamically determined value and the second specifies more directly which wheel to choose. These are out of band with respect to requirements.txt because requirements cannot express wheel selection. There are potential uses for specifying which wheel you want in requirements, for example to encode an ABI dependency between wheels, but currently there is no way to express this.

You can also install hardware like a new CPU/GPU etc. At some point you have changed things enough that you should just nuke your environment and rebuild. We have tried to make it as easy as possible to build from scratch and sometimes that is just what you need to do. I don’t know if uv’s cache design makes that more complicated than it would be with pip/virtualenv.

Let’s not get too distracted by the details of the example. Runtime checks are definitely nicer from the packaging side, but they are harder for the maintainers of the libraries that have to implement the checks and fat builds, and more difficult for the people who build the binaries. Most importantly, runtime checks definitely do not help with the GPU file size problem.

Feel free to improve the selector package proposal. I don’t mean for this to be a competitor but rather a thought experiment: is the static wheel_tags table enough to satisfy those who dislike the opaqueness of other proposals for dynamic resolution? I am sure that the selector package proposal can be modified to provide something equivalent to wheel_tags as well.

From my perspective what I dislike about the selector package approach (if I have understood it correctly) is just that I don’t want to have to make multiple packages like python-flint, python-flint-x86-64-v3, python-flint-selector etc as separate names on PyPI just so that a user can install what is logically a single package.

1 Like

That’s not precisely true. It’s possible for multiple wheels to be compatible with a given system, and in that case the installer is supposed to pick the “best match”. That’s done currently by the installer having an ordered list of compatible tags, but it would certainly be possible to give the user the option to modify that list. The problem would be designing a good UI for this, because tag lists are pretty long and messy…

That’s why I said “usually”. I’m sure such cases exist, but I don’t personally know of any examples where a project uploads wheels with tags that are not all mutually exclusive. The examples I know are all either:

  • sdist plus single wheel or
  • sdist plus matrix of wheels for Python version, OS and architecture.

In these cases an installer only needs to decide between the sdist and the only matching wheel so --no-binary and --only-binary are sufficient to decide what happens.

Yes, and this would be made more complicated both for UI and implementation if extra platform tags were allowed.

1 Like

Thanks for this writeup! My initial thought is that we should avoid things that require new implementation in installer tools and in wheelhouse. The extra TOML file does limit the black-box nature of the lookup package, relative to a PEP 517 build backend approach, but adding new functionality to installers feels like it dramatically increases the potential effort to adopt this feature.

Perhaps we could achieve something similar if the PEP 517 build backend “dispatch” packages contained (or could obtain/download) enough metadata about their potential dispatches to construct a more complete static environment picture. This would limit their flexibility and correctness, of course, since remote options may change depending on updates and partial mirroring, but maybe it’s enough to get the right compromise.

One thing that GPU packages have to do pretty often right now is host some or all packages on an external index. The size constraint, plus the practical tedium of PyPA staff managing manual overrides, means that we can’t assume that all variant packages live in the same place. Do you see a clear way to allow referencing external indexes? Could it be just another field in this TOML file?

Yes, I think the selector should be at a higher level than a single package. Once installed, it should activate logic for any package with a matching selection to be made.

xref What to do about GPUs? (and the built distributions that support them) - #71 by msarahan - I think it is important to be able to change selector and re-evaluate an environment. I don’t know what the right implementation is, but treating variant packages exactly the same as other packages does seem like it will inevitably mean environment nuking/recreation instead of swapping stuff out. If variant packages are plain dependencies alongside others, then environment specs recorded from that env are hardware-specific.

From my experience, building envs from scratch is unfortunately very rare in many workflows, and people don’t do it often enough. Conda works great at first, but the more history an environment has, the harder things get. People could avoid so many problems by nuking/recreating envs, but in practice, they mostly iterate on envs and keep them around until the env is unmanageable.

Do you mean caching the hardware metadata lookup in the selector package, or caching the package lookup based on that metadata? I think you can cache decently well: it is volatile system state, but that state doesn’t change often. Knowing when to invalidate that cache is the hard part, as usual.

2 Likes

If the selector tag variables were exposed to the code that does marker evaluation for other things like the Python language version, then the requirements list could express wheel selection, e.g. something (hypothetically) like python-flint ; x86_64_version == "x86-64-v3".

What’s the benefit of putting the function name in the selector file, rather than defining a plugin interface based on entry points?

No particular benefit. I just suggested the first thing that came to mind. Possibly a plugin interface would be better.
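
For comparison, a sketch of what the installer side might look like with an entry point based plugin interface (the group name wheel_selector.providers is invented here purely for illustration):

# Hypothetical sketch: selector packages could register their check function
# under an agreed entry point group instead of being named as
# "module:function" in the TOML file.
from importlib.metadata import entry_points

def evaluate_selector(variable_name):
    # Requires Python 3.10+ for the keyword form of entry_points().
    for ep in entry_points(group="wheel_selector.providers"):
        if ep.name == variable_name:
            return ep.load()()
    raise LookupError(f"no selector provider installed for {variable_name!r}")

# evaluate_selector("x86_64_version")  ->  e.g. "x86-64-v3"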

I am not sure if you have explicitly spelled out the “build backend approach” anywhere. I think I sort of understand what you intend to do, and I also think that if you were to spell it out fully then some people would not be happy with it. It sounds to me like you are suggesting using PEP 517 as a backdoor for reintroducing the equivalent of setup.py based installation with dynamically determined dependencies, which is something that people have spent ten years or so trying to eradicate. I don’t think anyone would stop you from using PEP 517 like that, but if you want to discuss with packaging people what a “proper” solution is then I expect that many of them would not want to endorse that approach.

In this thread I have tried to present something that I hope would be more palatable to the people who would (I presume) dislike the build backend approach if it were spelled out more clearly. What I have suggested does require making changes to installers and would first need a PEP etc. I am not proposing that I would do that myself but rather putting this out to see if it is the sort of compromise that could plausibly get consensus around an “officially supported” way of doing dynamic dependency resolution.

I am not personally a maintainer of pip, PyPI or any other packaging tools or infrastructure. I am fairly sure that the people who do maintain those things will not like the idea of referencing external indexes without any explicit opt-in from the user.

More importantly though, I thought that a big part of the goal here is to solve the file size problem by making the files smaller. As I understood it from your summary in the other thread, if we could handle dynamic dependency resolution then we could detect the CUDA version and the GPU, we wouldn’t need fat wheels, and we would end up with smaller wheels. I don’t have much of a sense for what is in these wheels or how much smaller anything could be though.

2 Likes

I don’t think the current process offers any scope for install-time dynamic behaviour. The only thing it’s possible to install is a wheel, and the install process for a wheel is clearly specified as being nothing more than moving files into their final locations. PEP 517 is purely about building a wheel from a sdist, and while arbitrary code execution is possible there, installers can and do cache wheels built from a sdist, so even when installing a sdist there’s no guarantee that any code will be run.

The setuptools “dependency links” feature offered this sort of thing, and we explicitly removed it as a security risk. So yes, I think you’re right, allowing a package to trigger the use of an extra index without explicit user opt-in is very unlikely to be acceptable.

Thanks! I’m not looking to sell a certain approach. I appreciate your proposal here. Let me lay out how I understand our current approach:

  1. A user wants to install cudf, which implements GPU support for dataframe (Pandas) operations. They can’t install cudf alone, because we currently put the CUDA version into the package name. So the user needs to install cudf-cu12.
  2. If you look at the release page for cudf-cu12, you’ll see a tiny sdist: cudf-cu12 · PyPI
  3. The sdist depends on a build backend, called nvidia-stub:
[build-system]
requires = ["nvidia-stub"]
build-backend = "nvidia_stub.buildapi"
  4. nvidia-stub contains the lookup and download logic, returning a wheel that matches what the “build” of the sdist is expecting. This wheel then gets installed. This is the part that is essentially behaving like setup.py that you say people have been trying to kill off.

It sounds like this is what you gathered/expected/feared (though I’d certainly appreciate confirmation), and I’m very grateful for your feedback.

I’d like to try to understand what degree of dynamic behavior might be palatable. Am I reading your example correctly in that its constraints on the expected values are what improve it relative to “setup.py arbitrary code or things like it” (let’s call this ACE for short)? You still have some ACE in your design, but it is not involved in downloading anything. It only serves to map some system property to some pre-determined tag.

Can we establish some boundaries, then?

  • ACE may not write any files to the filesystem (including downloading files)
  • ACE may open and execute shared libraries (but these may not modify the filesystem)
  • ACE is limited to returning tags that match pre-defined values

Is that a reasonable start?

On the flip side, I’ll mention some expectations that I have perceived from the corporate community:

  • They don’t want to use indexes external to PyPI, but it has been a functional necessity. As you know well, the support for variant packages on PyPI is poor.
  • They really want things to be on PyPI, because it dramatically simplifies instructions for customers and reduces friction with the major projects (e.g. PyTorch, TensorFlow, JAX).
  • They need to operate on their release cycles. If it takes days or weeks to get a size override approved, there’s a LOT of angst going on inside the company.

Some of this stuff is not at all related to our current discussion, but I bring it up to give some context especially around the question of external repos. We can’t get rid of them soon enough, but they are a necessary evil at the moment, and they’ll be necessary until 100% of our software can be on PyPI.

2 Likes

OK, I thought maybe there was some aspect of the static analysis that would benefit. Or maybe tools not written in Python? Although either way they have to be able to run the function, so it’s a minor detail.

I’m not sure I follow your next comments about constraints for ACE, but the key constraint on a system like cudf-cu12 is simply that every build of that sdist on the user’s machine must result in the same wheel[1]. That’s required because the installer can cache the built wheel and then skip the build (and hence the code execution) in future.

Personally, I’m not even sure I have a problem with this, from the POV of an installer maintainer. It’s just another build backend as far as I’m concerned.

The user might want to review the security implications of this, though. There’s a lot of magic going on when “building a wheel”, and they should probably be auditing that. And the average user probably isn’t used to auditing build backends.


  1. with some exceptions around things like config settings, but I doubt they apply here ↩︎

1 Like

I think your reference to “exceptions around things like config settings” is key here. I believe that user preferences should direct dispatch. This might switch between CPU and GPU implementations, for example. Where I absolutely agree is that for a system that has a non-changing hardware configuration and non-changing dispatch preferences, the “build” of the sdist to yield a wheel should always yield the same wheel. This forbids any kind of hotfixing or swapping out wheels on the server. They must be stable, with filenames being absolutely unique, as they are on PyPI.

In other words, your cache key is a function of hardware state and preference settings.
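
Purely as an illustration of that idea (not any tool’s actual cache design), the cache key could be derived along these lines:

# Purely illustrative: a built-wheel cache key that changes whenever the
# hardware state or the user's dispatch preferences change.
import hashlib
import json

def wheel_cache_key(sdist_digest, hardware_state, preferences):
    payload = json.dumps(
        {"sdist": sdist_digest, "hardware": hardware_state, "preferences": preferences},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# wheel_cache_key("<sdist hash>", {"cuda": "12.4"}, {"accelerator": "gpu"})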

1 Like

I’m not making the connection between this comment about building wheels and how the installer selects a pre-built wheel. If the filenames are still unique, which the original proposal ensures, then the caching of selected wheels should be fine, right? Unless some alternative input is passed as an override to the selection process, at which point a different wheel could be picked and that one would be cached.