Selecting variant wheels according to a semi-static specification

mdrissi · May 17, 2024, 7:20pm

One thing I’m bit confused by is that these approaches generally feel like a way to add markers somewhere. At same time we want users to be able to still have some control over flavor picked. So would it be enough to allow users to specify custom markers themselves? As a concrete example,

pip install torch --custom-markers=gpu:cuda11

It seems like there’s relatively small set of custom markers and libraries can document which ones they support and try to have informal consensus. I’d expect all nvidia maintained packages to pick consistent marker name and then other libraries like pytorch/tensorflow that are gpu sensitive can align. Similarly numpy/scipy/python-flint can decide on their own which custom marker they’ll use.

edit: This is similar to extras except here you define marker once and all packages/dependencies use it in consistent manner. So if tensorrt depends on tensorflow/onnx/other packages that use gpu you are choosing cuda11 flavor for all of them.

dhellmann · May 17, 2024, 7:37pm

I agree those are separate. I assert that 1 is more important for this part of the ecosystem. There are way more people installing these pre-built wheels than building them. If we can only work on part of the solution, we should focus on that first one.

I don’t think we need to block on it, but if we could come up with a way to provide for these sorts of extensions without having to change the wheel standard each time there’s a new reason for an extension, that seems ideal. This space is evolving pretty quickly, especially with new hardware, so it’s likely we’ll see new variations as well as new reasons for variations in the next couple of years.

It’s not just what command the user issues explicitly. These libraries are dependencies of other packages, many of which don’t have their own optimizations (they are applications or abstractions on top of the library with the optimizations). So, I think the real requirement here is to be able to pick “a correct” build with a simple dependency like torch or python-flint without any other direction from the user. Allowing the user to provide a hint is useful, but shouldn’t be required.

I look at those as having a mode to override the value provided by the selector logic (whether it’s built into the installer or provided a some sort of dynamically loaded module). So that’s an extension of the first use case. It could be

pip install --tag x86_64_version=v2 python-flint

We may need multiple tags for different things, so a more complete example might be

pip install --tag x86_64_version=v2 --tag hardware_accelerator=cuda python-flint

If we ignore the user interface for actually passing different compilation flags, I would expect the build front-end to add an option to make that explicit. So if I run something like

CFLAGS=some-special-sauce pip wheel --tag x86_64_version=v2 --tag hardware_accelerator=cuda .

then the output wheel should include those tags. Strictly speaking, there are versions of CUDA, too, but I don’t think that level of detail is needed here. The main thing is there may be multiple tags, representing different characteristics of the build, and they must be combined in order to select the right package.

I’m comfortable leaving it up to the person doing the build to get the combination of input flags to the compiler and tag arguments to pip match correctly, for now, because that’s the sort of thing that’s going to go into a Makefile or build script. In the future, it would be nice if those compiler flags could be defined in pyproject.toml somehow keyed by various tag values.

With the right types of tags available, we might be able to say something like “here is the Fedora 39 build of this wheel, linking to the system libraries instead of including them in the wheel itself” (a more specific OS platform tag could help with that). That approach would let some of us OS vendors package the big dependencies so they don’t have to be served on PyPI at all. I know in the past we’ve avoided that because it makes it hard for users to know what they need on their system, but selector plugins could play a role in expressing those outside dependencies (at least via error messages like “pip could not select a package for hardware_accelerator=cuda, do you have $package-name installed?”).

A more immediate potential outcome of this is we should be able to build lots of small packages instead of a few very large ones. I don’t know how that lines up with the storage requirements for PyPI any better, because the total storage for a given package might not actually go down.

Regardless, I don’t think the community at large needs to take on the burden of hosting gigantic pre-compiled artifacts for free and I hope solving that isn’t seen as a blocker for progress in some of the other areas we seem to be closer to agreement.

Those use cases are both valid, but also intersect. When you’re building highly optimized binaries, you apply all of the optimizations for all the reasons you can. It often means linking to different .so files, as in the hardware accelerator case, as well as emitting different compiled code, as in the CPU instruction set case.

oscarbenjamin · May 17, 2024, 7:38pm

The primary purpose of the OP proposal is the ability to implement automatic selection on behalf of the user so that usually a user just does pip install foo and gets the right wheel without passing any additional arguments to pip. Likewise usually a downstream project does requires = ["foo"].

Maybe markers is a better way to implement this in requirements syntax.

dhellmann · May 17, 2024, 7:53pm

Oscar Benjamin:

dhellmann:

Sure, although I was thinking more about the resolver/selector, than about a user specifying the value. For example, if the x86_64_version variable was exposed as a marker, then the metadata about each wheel could be served by the package index and the installer could just evaluate the expression without having to have the extra TOML file. It would need to know how to get the value of the x86_64_version variable, but the selector dependencies could be metadata provided by the package index, too.

I don’t think I understand enough about the mechanics of how an installer interacts with pypi to be able to reply to this… I thought that there was a predefined set of markers. Is there already a way to dynamically declare new markers like x86_64_version? If not then where could the project put that information if it is not in the wheel-selector.toml file?

No, there’s not a predefined way, today. I’m proposing that it could make the implementation of the installer easier, at the expense of having to change the hosting standard, as well as the wheel format. I don’t really think there’s a way to provide the extensibility we need without doing both.

If, instead of putting every tag in the filename, the filename had the standard tags it has today and then a single hash of all of the rest of the new (potentially arbitrary tags), that would give us unique filenames. The actual, parseable, metadata for which tags apply to a dist could be (a) in the wheel itself and (b) in the HTML anchor tag on PyPI’s simple index.

Other values like python version are already part of the HTML in the data-requires-python attribute, for example (Simple repository API - Python Packaging User Guide). We could add an attribute like data-selector-rule with an expression to be evaluated.

There is also a separate existing standard for pointing to a metadata file outside of the wheel. I don’t know if the installer uses that for making choices about which package to grab, but extending that existing file could be better than adding a new one (Simple repository API - Python Packaging User Guide).

Basically, we need a way to express “use this dist when X is true”, and the marker syntax already has some of those sorts of rules. Extending that syntax, and exposing it in more places, would make the evaluation of the rules consistent (and the code reusable).

dhellmann · May 17, 2024, 7:58pm

That’s the idea, yes, but we want users to not have to explicitly specify the markers. Doing that isn’t substantially better than the user experience we have today of pointing pip at a different package index – the user still has to know the right value and remember to pass it on the command line. Instead, we want a system where torch can say “in order to select the right build, you need torch-selector installed” and pip will install that automatically and the torch-selector will probe the current system’s hardware to figure out the right values to give to pip. Ideally, we would have a generic accelerator-selector instead of torch-selector, but that’s going to be up to the package authors to agree on.

mdrissi · May 17, 2024, 8:15pm

I think big difference with package index case is when not all libraries are maintained by same owners. If you only have 1 library in mind with 0 dependencies then yes I think user specified custom markers vs index-url are similar. If you depend on various libraries maintained by different groups there is no straightforward index to use at that point.

The other aspect is I think it’s valuable to have a way for user to opt-out of certain hardware specific settings. A gpu being present does not mean you want all libraries to use it nor is it always case your cpu supports hardware flag means you want the more specialized wheel. If users need a way to specify explicitly which variant to pick, then you could start by supporting manual explicit custom markers and then have automatic custom default marker evaluation built on top using selector package idea here.

pf_moore · May 17, 2024, 8:31pm

Having just read some more of the discussion around the marker idea, I see your point here, and I agree. The sorts of packages that need this feature are generally sufficiently complex to build that we should be actively trying to prevent end users from ever having to build them.

Maybe we can take an explicit stance here and say that we’re not even going to consider the question of building variant wheels - the build process is entirely down to the project and may be as customised as necessary. Building from sdist on an end user machine is out of scope, except to say that it must either fail or build something that works on the user’s machine, but what “variant” gets built is unspecified.

(This is probably what @oscarbenjamin has been trying to say by referring to variants getting created via cibuildwheel, but I’d missed the point until now).

Encoding compatibility in the filename was essential in the original design because we didn’t want tools to have to download every wheel to check for compatibility. The filename check is still by far the fastest way of reducing the list of wheels to check (and with some projects publishing significant numbers of wheels per version, that’s an important optimisation) but having a separate metadata file means the cost of keeping some compatibility information just in the metadata is acceptable. The fly in the ointment here being that separate metadata isn’t mandatory - we can’t assume that all indexes (or index proxies) publish metadata files[^we had an example of one that doesn’t on the pip tracker just a couple of days ago].

I can think of a couple of ways of saying “here’s a number of wheels, all with the same name, version and compatibility tags, you now need to filter them further by checking metadata before you select the best match”. Maybe even without a spec change. But I think that for now we should keep the focus on the higher level questions - just knowing that it’s possible is sufficient for the moment.

This is where I think we need to look at actual use cases. As a user, I can only ever imagine wanting one of two things: “the best variant for my hardware of everything that gets installed”, and “make sure everything works on x86_64_v2 because I may be copying the environment to other environments and that’s the lowest common denominator I want to support”. Are there other realistic possibilities that I’m not thinking of?

In particular, I don’t see a need for a package to ever explicitly demand the x86_64_v4 version of python-flint, because it shouldn’t care. I have a recollection that @oscarbenjamin mentioned APIs that might only exist if certain hardware instructions are available, but I would assume that the API would always exist, and simply error if the hardware support wasn’t there. And a package depending on python-flint needs to cater for the possibility of that error. So I don’t see why variants would appear in dependency specifications. Do you have a concrete scenario in mind?

That’s my “make sure everything works on this combination” use case, basically. But yes, that’s the sort of UI I think is going to be needed.

Agreed, but I mentioned this because I don’t want hosting issues to be seen as a justification for automatically adding extra indexes into the mix as part of the selection process (which is essentially what the nvidia stuff is doing, with the justification that "users don’t like having to set --extra-index-url).

At the moment it doesn’t (the filename design is specifically to ensure it doesn’t have to). In pip’s case the “finder” pre-selects valid wheels using only the data from the project’s index page. That’s highly efficient (one HTTP call to get the candidate set, and from that exactly one candidate per version). Changing that so it gives multiple valid candidates per version which can only be filtered down to a single answer by doing one or more further HTTP calls per candidate is going to be a big step backwards in terms of performance.

HTTP caching will eliminate a lot of cost, but only over the long term. Ephemeral environments like docker containers and CI workers may well not be able to gain from caching, so they make a big hit on their build times. (It’s possible to persist the pip cache, and CI often does this, but it’s added complexity that we can’t assume everyone will do).

dhellmann · May 17, 2024, 9:08pm

That’s fair. There is definitely room for some standards across projects. But it’s not actually required. If I publish a package and say I want a tag with specific values, I just have to publish the selector package that knows how to pick the right values for a given host. I don’t have to agree with anyone on what those are, or even tell the user, because it’s all automated. Ideally we wouldn’t have a huge proliferation of these, and there would be community consensus on the names, of course.

Yes, indeed. There are some examples of doing that elsewhere in the thread.

dhellmann · May 17, 2024, 9:09pm

Paul Moore:

dhellmann:

I agree those are separate. I assert that 1 is more important for this part of the ecosystem. There are way more people installing these pre-built wheels than building them. If we can only work on part of the solution, we should focus on that first one.

Having just read some more of the discussion around the marker idea, I see your point here, and I agree. The sorts of packages that need this feature are generally sufficiently complex to build that we should be actively trying to prevent end users from ever having to build them.

Maybe we can take an explicit stance here and say that we’re not even going to consider the question of building variant wheels - the build process is entirely down to the project and may be as customised as necessary. Building from sdist on an end user machine is out of scope, except to say that it must either fail or build something that works on the user’s machine, but what “variant” gets built is unspecified.

(This is probably what @oscarbenjamin has been trying to say by referring to variants getting created via cibuildwheel, but I’d missed the point until now).

I can definitely go along with that, especially if we say “for now.” I do think there’s room to improve the lives of the packagers, too. It’s just not the first priority and isn’t needed to improve the lives of package consumers.

dhellmann · May 17, 2024, 9:13pm

For the vast majority of packages it won’t matter. So keeping it optional, and implementing it in warehouse, devpi, and a few other commonly used tools may be enough. I may be able to find resources to help with that (definitely for warehouse), if there’s interest in the tool maintainers.

Sounds good to me.

dhellmann · May 17, 2024, 9:17pm

Paul Moore:

dhellmann:

These libraries are dependencies of other packages, many of which don’t have their own optimizations (they are applications or abstractions on top of the library with the optimizations).

This is where I think we need to look at actual use cases. As a user, I can only ever imagine wanting one of two things: “the best variant for my hardware of everything that gets installed”, and “make sure everything works on x86_64_v2 because I may be copying the environment to other environments and that’s the lowest common denominator I want to support”. Are there other realistic possibilities that I’m not thinking of?

In particular, I don’t see a need for a package to ever explicitly demand the x86_64_v4 version of python-flint, because it shouldn’t care. I have a recollection that @oscarbenjamin mentioned APIs that might only exist if certain hardware instructions are available, but I would assume that the API would always exist, and simply error if the hardware support wasn’t there. And a package depending on python-flint needs to cater for the possibility of that error. So I don’t see why variants would appear in dependency specifications. Do you have a concrete scenario in mind?

No, I think what you’re saying lines up exactly what what I had in mind. My main point was that we shouldn’t worry about what a user types to install torch, we should think about what the user types to install the thing that uses torch.

I could however envision, in the future, a situation where torch itself moves some of its code out into separate packages and those packages are just compiled C/C++/Rust code. Something similar to the packages in the standard library where there is a pure python implementation that’s replaced by a compiled version for better performance. That’s now how the code is organized today, but if we expose these new tags via the marker syntax it could be organized that way in the future.

dhellmann · May 17, 2024, 9:17pm

Only for the override case, but yes.

dhellmann · May 17, 2024, 9:18pm

OK, we’re on the same page.

dhellmann · May 17, 2024, 9:21pm

That’s part of why I like exposing the selector expression (is that the right name? marker expression?) in the HTML tags. You maintain the same ability to get everything you need to pick a distribution in that first call.

Good point, yes. Relying on client caching to improve service performance isn’t going to be reliable, even for clients that try to be well-behaved.

pf_moore · May 17, 2024, 9:21pm

Warehouse has it already. I don’t know about devpi. Proprietary index providers like Artifactory are often slow adopters (for obvious reasons). The case I saw recently was a custom proxy, which proxied the index page and the wheels, but didn’t proxy the metadata files (because they aren’t mentioned explicitly in the index page).

I agree that 99% of the time it’s not important. But given the size of things like the wheels for torch, we really don’t want to download the full wheel just to say “nope, wrong variant”. We can worry about the cases where things go badly later, but it would be good not to forget about them completely.

dhellmann · May 17, 2024, 9:57pm

Absolutely. I just meant that if we do the work in some of the places we know are dealing with large wheels now, then the maintainers of the other tools can update if/when the problem affects them or their users. And the only work I would do in the installers to cope with an index that does not support the new metadata standards is to have a fallback so if there’s no way to resolve the right wheel, there’s a deterministic default. I say that based on the assumption that someone using an index that doesn’t support the new standard could continue to do what they’re doing today (running an index for each variant, or whatever) and still be functional.

oscarbenjamin · May 17, 2024, 10:08pm

I realise that the discussion around markers etc suggests changing the format of how this is specified but I don’t know what the format would be so I’m going to stick with the selector.toml format for now.

These two cases are very different. I described the first case in the OP and suggested that you would have a wheel-selector.toml like:

# python-flint-0.6.0-wheel-selector.toml
[wheel-selector]

variables = ["x86_64_version"]

[selector.x86_64_version]

requires = ["cpuversion >= 1.0"]
function = ["cpuversion:get_x86_64_psABI_version"]

wheel_tags = {
    x86-64 = [""],
    x86-64-v2 = [""],
    x86-64-v3 = ["x86_64_v3", ""],
    x86-64-v4 = ["x86_64_v4", "x86_64_v3", ""],
}

Here the idea is that there are different levels and of CPU support and you would ideally want to select the best one but others can also work.

For the CUDA case it is different because the variants are incompatible. In this case the selector file for cudf would be like:

# cudf-0.6.1-wheel-selector.toml
[wheel-selector]

variables = ["cuda_version"]

[selector.cuda_version]

requires = ["cuda_selector >= 1.0"]
function = ["cuda_selector:get_cuda_version"]

wheel_tags = {
    11 = ["cu11"],
    12 = ["cu12"],
}

The important differences are that:

In the x86_64 version case it is just about selecting a preferred wheel. There is always a fallback “generic” wheel that could be used. There might be some need/desire for users to be able to control which wheel is chosen since there are multiple valid choices.

In the CUDA version case there is only one correct wheel so no concept of preference applies. There is no generic fallback and it does not really make sense for users to influence the choice.

More generally the GPU case can also involve more complicated things depending on exact GPU etc but I am just presenting the simple cudf scenario that was described above.

Another difference is that in the x86_64 case it is really just a question of which wheel you used for python-flint and there are no questions about compatibility between different python-flint wheels and any other packages: the python-flint wheel just needs to be compatible with the CPU.

In the CUDA case all packages using CUDA within the same venv need to use the same CUDA version so as soon as one package is installed all future packages need to match. One way to handle this is that they all depend on some cuda-base package that fixes the version. An obvious thing would be that cuda_selector could check which version cuda-base is using but that raises another question:

Which venv does cuda_selector run in?

Is it an isolated venv or does it run in the same one that the packages will be installed in and does that mean that it can access the already installed packages?

dhellmann · May 17, 2024, 10:57pm

One fallback might be to not run accelerated at all. That approach might not work for other types of selectors, though. But in that case, if there’s really no built package that matches the system, and there is no fallback, the appropriate thing to do is either install from source or say the requirement can’t be resolved.

Oscar Benjamin:

Another difference is that in the x86_64 case it is really just a question of which wheel you used for python-flint and there are no questions about compatibility between different python-flint wheels and any other packages: the python-flint wheel just needs to be compatible with the CPU.

In the CUDA case all packages using CUDA within the same venv need to use the same CUDA version so as soon as one package is installed all future packages need to match. One way to handle this is that they all depend on some cuda-base package that fixes the version. An obvious thing would be that cuda_selector could check which version cuda-base is using but that raises another question:

Which venv does cuda_selector run in?

Is it an isolated venv or does it run in the same one that the packages will be installed in and does that mean that it can access the already installed packages?

That’s a great thing to call out. Why would some existing package have a different result for the CUDA type? What leads to that situation? Maybe the user was explicit about what they wanted in one invocation of the installer (passing the --tag option the first time, but not the second)? Or maybe the hardware was swapped out? Or something else?

oscarbenjamin · May 18, 2024, 11:14am

Doug Hellmann:

No, there’s not a predefined way, today. I’m proposing that it could make the implementation of the installer easier, at the expense of having to change the hosting standard, as well as the wheel format. I don’t really think there’s a way to provide the extensibility we need without doing both.

If, instead of putting every tag in the filename, the filename had the standard tags it has today and then a single hash of all of the rest of the new (potentially arbitrary tags), that would give us unique filenames. The actual, parseable, metadata for which tags apply to a dist could be (a) in the wheel itself and (b) in the HTML anchor tag on PyPI’s simple index.

Other values like python version are already part of the HTML in the data-requires-python attribute, for example (Simple repository API - Python Packaging User Guide). We could add an attribute like data-selector-rule with an expression to be evaluated.

There is also a separate existing standard for pointing to a metadata file outside of the wheel. I don’t know if the installer uses that for making choices about which package to grab, but extending that existing file could be better than adding a new one (Simple repository API - Python Packaging User Guide).

Basically, we need a way to express “use this dist when X is true”, and the marker syntax already has some of those sorts of rules. Extending that syntax, and exposing it in more places, would make the evaluation of the rules consistent (and the code reusable).

Using marker syntax makes a lot of sense since it is already very complicated and powerful enough to express any logic if the markers can be extended.

Part of my thinking with suggesting a separate *-wheel-selector.toml file is that it also works when you have a wheelhouse i.e. you ask pip to find packages in a local directory like

pip install --no-index -f ./wheel_directory foo

If you can put the *-wheel-selector.toml there as well as the wheels then it can still work. That’s why I put project name and version in the name of the file like python_flint-0.6.0-wheel-selector.toml.

I assume that the logic here works like:

pip wants to satisfy a requirement foo >= 1.0.
pip asks the repo (or local directory) for a list of foo versions and then (usually) chooses the most recent i.e. foo == 1.6.
pip then asks the repo (or local directory) for the list of files associated with version 1.6 of foo.
pip then uses hard-coded rules based on the filenames to select which wheel or sdist it wants from the repo or local directory.
pip then downloads/installs from the repo or local directory.

If the selector.toml can be checked for at step 4 (using only the filenames) then usually it won’t be there and wheel selection proceeds as normal. If it is there then step 4a is to read the selector.toml and see what rules it specifies and step 4b is to apply those rules to choose a wheel or sdist instead of the normal rules.

I didn’t realise that data-requires-python is provided separately when using an index. Is that just ignored when using a wheelhouse?

pf_moore · May 18, 2024, 12:36pm

It’s read from the wheel metadata when you use --find-links. The data-requires-python field in the index is just a copy of what’s in the metadata (for performance - hence my earlier comments on the way the finder works).