This is essentially how PEP 517 works, and there’s no reason to use a different pattern from that.
One reason to use a different pattern would be performance. PEP 517 is triggering a build, so the performance of the hook is irrelevant. I’ve frankly forgotten the underlying proposal here, but if it’s something that could get called multiple times in the wheel selection process (the “finder” in pip’s terms) performance might be a lot more important.
But at this point, it’s sufficient to conclude that it’s possible, and go back to the more important parts of the proposal.
I like the consistency of using the 517 pattern.
It’s hard to get any more performant than import/call. Non-Python package managers don’t really have a choice but to launch a Python interpreter to interpret Python code.
Yeah, I think this idea applies to a selector package - a package that instead of installing files actually contains a plugin that knows how to determine which packages should actually be installed. So yes, there could potentially be multiple, but only really in a large install process anyway.
I personally still prefer adding more specific platform tags, but it looks like we need to overcome the rapid expansion caused by manylinux first. (Though even in this case, an extension that determines which other platform tags should be included might be valuable.)
The installer should need to invoke each selector at most once to collect its values and then cache them. There may, of course, be multiple selectors for the full set of target packages.
I think the slowdown of having to invoke the selectors is still a better experience for a user than what we have today. And in the future, if it’s deemed a problem, we could look at other optimizations.
Ah. PEP 517 is “run in an isolated interpreter”, not import and call. And of course there’s security risks in importing untrusted code into a pip process (which might be running sudo, much as we’d rather nobody ever runs sudo pip).
As I say, though, this can be thrashed out later, once the overall idea is looking plausible.
New versions of the selectors could be released with bugfixes or different semantics. I suggested in the OP that the `*-wheel-selector.toml` file have a requirement for the selector:
requires = ["cpuversion >= 1.0"]
The resolver might need to try different versions of the package to install and those might have different requirements for the selector package. In principle you could assume that values returned from nominally compatible versions of the selector can be cached but I can imagine various situations where this goes wrong because of an old/buggy value being cached.
And there’s still the question of which environment the selector runs in…
Good point. I wonder how often that’s likely to happen in practice. I could certainly see newer versions of target packages needing newer versions of selectors (say if the selector learns to detect a new version of an accelerator, for example).
What happens if 2 versions of the target package depend on 2 incompatible versions of the selector package? How is the installer supposed to decide when to use the response from each selector? Is that an issue? If the selector value is being used in the context of choosing a distribution from a given version of the target package, maybe not?
I think that lets us cache the responses, the cache key just needs to include the version of the selector.
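A minimal sketch of what I mean, assuming (purely for illustration) that the selector is exposed as an executable that prints JSON - the real invocation mechanism, and the made-up CLI flag, are still to be decided:

```python
import json
import subprocess

# Cache selector responses keyed by (selector name, selector version), so a
# bugfix release of a selector never reuses stale values.
_selector_cache: dict[tuple[str, str], dict] = {}


def selector_values(name: str, version: str) -> dict:
    key = (name, version)
    if key not in _selector_cache:
        # How the requested version of the selector is provisioned and run is
        # an open question; a subprocess emitting JSON on stdout is assumed
        # here purely for illustration.
        result = subprocess.run(
            [f"{name}-selector", "--version-spec", version],
            capture_output=True, check=True, text=True,
        )
        _selector_cache[key] = json.loads(result.stdout)
    return _selector_cache[key]
```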
If we’re following 517 rules, it would run in an isolated environment unless the installer is told to run not isolated, at which point the user will need to have already installed it like the other build system dependencies, right? That precludes a selector from looking at what else is installed (or being installed) in the virtualenv, but I don’t think we have a strong use case for examining that information, do we?
Please let’s not assume PEP 517 rules. They are an utter pain to manage, and having to use them for wheel builds as well as sdist builds would just be a pain (especially for anyone who wanted to create a wheel-only installer tool “because handling sdists is the hard bit”).
Examples of why PEP 517 rules are hard:
- If a selector needs a selector, you need nested environments. A malicious selector could cause an infinite loop, “fork bombing” the user’s PC. Yes, pip has explicit code to prevent this for sdists.
- How do you handle installer options? Are they inherited when you install the selector in the isolated environment? Some (like network settings) should be. Some (like `--target`) shouldn’t. For some it’s not clear (`--index-url`? `--no-binary`?).
In many ways, I’d prefer a process based hook implementation - the hook is a named executable that must be available on $PATH and you pass the input on stdin and get the output on stdout. It’s completely transparent, and language independent.
But as I’ve said before, let’s worry about higher level issues before getting into implementation details.
Makes sense.
Sounds good. Which questions not related to implementation are still open?
(I’m going to use “flavour” to imply any of the additional variants we’re covering, such as SIMD set or GPU type/version, as well as similar things we may come up with in the future.)
- what name do we use instead of “flavour”?
- do we embed the additional flavours in package names, ABI tag, platform tag, or update the wheel filename spec to allow for a new tag?
- are the additional flavours defined centrally or can publishers define their own?
- is the detection of flavours defined centrally or can publishers detect it on their own?
- how do other packages depend on either a “flavourless” package or a specific flavour?
- can multiple flavours be installed simultaneously in the same environment?
- how equivalent does package metadata (primarily transitive requirements) have to be for different flavours?
- can/how does a user set their preferred flavours?
- can/how do locking tools handle flavours?
- how is fallback handled when no suitable flavour is available?
Maybe it’s time to put the proposal into (pre-) PEP format?
Things that come to mind for me:
- I’m not at all clear what the point is of the “variables” part of the `wheel-selector.toml` file.
- If multiple tags get selected, I assume the TOML file says what order to prefer them in.
- What, if any, options exist for sharing `wheel-selector.toml` files? Having to include one for every version of your package seems excessive.
- What about cross-package dependencies? Packages A and B may both use any one of a number of BLAS implementations, but it may be essential that they use the same implementation.
- What if the wheels have different dependencies? Is there any need for that (I imagine so, as GPU-based code may need to depend on the appropriate GPU support library)? How would a resolver handle that? At the moment, for example, pip’s algorithm is roughly: pick a package/version, select the “best matching” wheel, get the dependencies from it and add them to the list we need, rinse and repeat backtracking if things go wrong. Once things go wrong with a particular version of a package, we discard it - we wouldn’t, for example, try a different wheel. The implementation details don’t really matter, though, as much as the expectation that only one wheel for a given version is ever picked as a “candidate”.
A couple of examples, worked through in detail (including explicit code for a selector, because I’m honestly not sure how, for example, I’d detect what CPU extensions are available in a Python hook) would likely also be useful. From a personal perspective, I’m interested most in knowing how the installer (specifically resolution) process should work.
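To be concrete about the level of detail I’m hoping for, something like the following is roughly what I imagine a CPU selector doing on Linux - but it’s a sketch only, I haven’t checked that these are the right flags, and it does nothing for Windows, macOS or non-x86 machines:

```python
def cpu_flags() -> set[str]:
    """Return the CPU feature flags reported by the kernel (Linux/x86 only)."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def x86_64_level(flags: set[str]) -> str:
    """Map feature flags to an x86-64 microarchitecture level.

    The flag sets here are deliberately abbreviated; the authoritative
    definitions live in the x86-64 psABI document.
    """
    if {"avx512f", "avx512bw", "avx512vl"} <= flags:
        return "x86-64-v4"
    if {"avx2", "bmi2", "fma"} <= flags:
        return "x86-64-v3"
    if {"sse4_2", "popcnt"} <= flags:
        return "x86-64-v2"
    return "x86-64-v1"
```

A worked example like this, but correct and cross-platform, is the sort of thing I’d want to see in the PEP.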
I can work on that. Can you point me to the process docs? Is it the standard PEP process or is there something separate for PyPA?
Basically the standard PEP process, with some minor packaging related adjustments documented here.
I’ve been working on this in a Google Doc at Packaging metadata extension - Google Docs
If there’s a better way to collaborate somewhere else, I’m up for whatever. I’ve been studying the pip, uv, and packaging code that generates tags, with the goal of having a customized installer (probably pip) that demonstrates the behavior. I haven’t started coding that yet, but I think I’m about ready.
This said, if you’d like to take the reins, Doug, just let me know if/how I can help.
How about “variant”? Not trolling. Seriously. I can’t come up with anything better.
I’ve been assuming that these are additions to the platform tag, like you describe above:
I’ve been assuming that publishers should be able to define their own. In trying to cobble together some examples, I’m not so sure it makes sense. I have been tinkering with a few different use cases:
- “accelerators” - this is the CUDA/ROCm/TPU use case. My assumption here is that users can specify one or more accelerator families to enable, and that they are not mutually exclusive.
- “cpu” - the SIMD stuff that Oscar has described so well.
- MPI - where implementations are mutually exclusive. Users must specify which implementation they want.
- OpenMP - implementations are mutually exclusive, but users probably don’t need to specify which implementation they want. Instead, the first implementation installed should “win” and activate that variant, which would make it impossible to install a package built with other implementations.
In these use cases, I started out with a meta-selector for “accelerators” and then had selectors for CUDA version and arch. I don’t think that separation really makes sense. I think it makes sense to lump in CUDA, ROCm, TPU and whatever else into one logical unit that people collaborate on. Maybe I’ll regret that idea, but there’s also nothing stopping anyone from making their own selector.
I think you mean some kind of advertisement of variants that are available across the collection of packages. Is that accurate? The installer workflow that I’ve had in mind goes something like the following (a rough code sketch follows the list):
- Installer requests index metadata
- Index metadata contains a list of variants (dimensions, axes, labels, whatever, NOT values).
- Installer uses that list to initialize variant values from the system/environment
- Installer uses variant values to create tag candidates
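As a rough code sketch of that workflow - the `variant_values` dict stands in for whatever the detectors report, and the way variants get encoded into the platform tag is just an assumption here:

```python
from packaging.tags import sys_tags


def candidate_tags(variant_values: dict[str, str]) -> list[str]:
    """Build an ordered list of acceptable wheel tags.

    ``variant_values`` is assumed to already hold the result of the earlier
    steps: the variant names advertised by the index, each initialized from
    the system by its detector, e.g. {"cpu": "x86_64_v3", "gpu": "cuda12"}.
    """
    tags = []
    for tag in sys_tags():
        # Variant-qualified tags first, in a deterministic order...
        for name, value in sorted(variant_values.items()):
            tags.append(f"{tag.interpreter}-{tag.abi}-{tag.platform}_{name}_{value}")
        # ...then the plain tag as the variant-less fallback.
        tags.append(str(tag))
    return tags
```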
If variants are encoded in the system platform tags, I’m not sure this is relevant. There’s no such thing as depending on a variant (unlike what we have today with something like cudf-cu12).
As I mentioned above, I think there are several behaviors that we’d need to account for. I think these would need to be accounted for in the detection program, and as such, that detection program would need to be aware of the environment that it was running in.
I’m operating under the assumption that the core metadata file must be similar across all wheels. I expect that differences in requirements will have to be expressed with environment markers. That does seem to point to a need to extend the environment marker scheme so that it is also using information from the variant detector programs.
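Sketching what that extension might look like (today’s packaging.markers grammar doesn’t know about a cpuversion variable, so the final check is written by hand here, and the detected value is made up):

```python
from packaging.markers import default_environment

# Start from the standard marker variables (python_version, platform_machine,
# and so on) and overlay whatever the variant detectors report.
env = default_environment()
env.update({"cpuversion": "x86-64-v3"})  # value assumed for illustration

# With an extended marker grammar this would be something like
#   Marker("cpuversion == 'x86-64-v3'").evaluate(environment=env)
# but since that variable isn't part of the spec, check it by hand for now:
wants_v3_wheel = env.get("cpuversion") == "x86-64-v3"
```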
Users definitely should be able to override anything, and I also think that overriding detected values will be a key part of package building. I was imagining the usual environment variable schemes, along with maybe entries in pip.conf or some new config, since the idea is to be independent of any single installer. This new config should be shared by any “detector program”.
I don’t think there’s anything special here - locking tools can use the variant detector stuff the same as installers. There is potential for override, though, or for locking with some matrix of values, such that a lockfile might have many variants to choose from, and only one gets used when creating an environment with the lockfile. This is somewhat like the multi-platform lockfiles that conda-lock does.
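As a sketch of that matrix idea (the structure is entirely made up, and the filenames are placeholders), a lock could carry one pinned set of wheels per variant combination, and environment creation would pick the entry matching the detected values:

```python
# Hypothetical lock data, roughly in the spirit of conda-lock's
# multi-platform lockfiles: one entry per variant combination.
LOCK = {
    ("x86_64_v3", "cuda12"): ["pkg_a-1.0-...-x86_64_v3.whl", "pkg_b-2.0-...-cuda12.whl"],
    ("x86_64_v2", "none"): ["pkg_a-1.0-...-x86_64_v2.whl", "pkg_b-2.0-...-cpu.whl"],
}


def select_locked_wheels(detected: tuple[str, str]) -> list[str]:
    """Return the pinned wheels for the detected (or overridden) variant combo."""
    try:
        return LOCK[detected]
    except KeyError:
        raise RuntimeError(f"no locked entry for variant combination {detected}") from None
```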
Is an sdist available? If so, try to build it. If not, provide an error message that clearly states which package(s) did not have a matching set of variants, and also provide the current values for the variants.
Paul, is this kind of like the setuptools entry point executables on Windows? My inclination is to do this in Go or Rust, because I don’t trust myself with C/C++. Does that make sense to you, or did you have a different way in mind to make a small standalone executable for these?
This is a really important part of the motivation for this effort, and it’s essential to get this right. The metadata tag stuff should be able to keep things aligned, but legacy stuff that doesn’t express tags will catch people for quite a while - until version constraints effectively age things out of consideration.
Is it accurate to say:
given a library A, that adds transitive dependency B
introducing library C, that conflicts with A
that transitive dependency B will be discarded along with A?
If so, then I’m confident that we’ll be fine. One thing I don’t really understand, though, is the concern you have about the number of tags that need to be checked. The variable space is combinatorial, but I think we can keep the tags manageable with some basic rules. As long as the rules keep the variants presented in the same order, it seems tractable. The rules I had in mind are (see the sketch after the list):
- Preserve the position of older system platform tags (unless we decide to omit them)
- Sort the variant programs alphabetically, or alternatively allow the variant program to set its “weight” that would control how early in the tag it should come.
- variant weight/position must be global across the package ecosystem. No one can decide that they want a different order of platform tag for their packages.
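Here’s the sketch I mentioned - the weights are invented values that would have to be fixed ecosystem-wide, and the way the suffix attaches to the platform tag is just an assumption:

```python
def variant_suffix(variants: dict[str, str], weights: dict[str, int]) -> str:
    """Compose the variant portion of a platform tag in a globally fixed order.

    ``variants`` maps variant name to value, e.g. {"cpu": "x86_64_v3",
    "cuda": "12"}; ``weights`` gives each variant its ecosystem-wide position
    (lower sorts first), with alphabetical order breaking ties.
    """
    ordered = sorted(variants.items(), key=lambda kv: (weights.get(kv[0], 0), kv[0]))
    return "_".join(f"{name}_{value}" for name, value in ordered)


# e.g. appended to "manylinux_2_28_x86_64":
print(variant_suffix({"cuda": "12", "cpu": "x86_64_v3"}, {"cpu": 0, "cuda": 1}))
# -> cpu_x86_64_v3_cuda_12
```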
Resolution is not a matter of combining all the variants that the user has on their system. Resolution is only a matter of getting a list of the variants that a given package reports using, and then building the tag set with that set of variants.
Does that make sense, or am I missing something? I suppose preferences when more than one variant is acceptable could get a lot more involved.
As a concrete example, JAX already breaks out their builds by CUDA version and cuDNN version in their package collection at https://storage.googleapis.com/jax-releases/jax_cuda_releases.html. This is a realistic approximation of how many files there would be, and what data would be in the filename. Of course, the metadata is in the PEP 440 local version, not the platform tag, but the idea remains - we’d sort filenames according to user preferences, and then try them in order.
My document describes hashes in the filename as an ultimate differentiator. I think in the vast majority of cases, the filename sorting approach will be enough, and by far the fastest way. The hash and the extra metadata that it represents would be needed to differentiate between two wheels that were identical aside from their hash. If that’s the case, then we’d need to download the extra metadata and compare it with our preferences.
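To illustrate the sorting step (the preference list and filename spellings below are made up, though the shape matches the JAX index above):

```python
def rank(filename: str, preferences: list[str]) -> int:
    """Rank a wheel filename by the first preferred variant token it contains;
    files matching nothing sort last."""
    for position, token in enumerate(preferences):
        if token in filename:
            return position
    return len(preferences)


wheels = [
    "jaxlib-0.4.0-cp312-cp312-manylinux2014_x86_64_cuda11_cudnn86.whl",
    "jaxlib-0.4.0-cp312-cp312-manylinux2014_x86_64_cuda12_cudnn89.whl",
]
preferences = ["cuda12", "cuda11"]  # user/detected preference order (assumed)

# Try candidates in preference order; only if two filenames tied would the
# installer need to fetch the extra hash-addressed metadata to break the tie.
for candidate in sorted(wheels, key=lambda f: rank(f, preferences)):
    print(candidate)
```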
The idea I had (which is mostly unformed, so you’ll need to thrash out the details yourself) is just that instead of defining a hook as calling a Python function, you define it as running an executable. Parameters are passed as command line arguments, and the return value is just the process stdout (presumably as a JSON document if you want structured values). The person writing the hook can do so in any language, as long as running the command works (which probably requires a .exe on Windows, because OS support for scripts is patchy). PEP 516 was an alternative to PEP 517 which used a process based implementation, if you want to see an example.
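For concreteness, the installer side of such a hook might look roughly like this (the hook name is made up, and the JSON-on-stdout convention is just one option):

```python
import json
import shutil
import subprocess


def run_hook(name: str, *args: str) -> dict:
    """Run a process-based hook found on PATH and decode its JSON stdout."""
    exe = shutil.which(name)
    if exe is None:
        raise FileNotFoundError(f"hook executable {name!r} not found on PATH")
    result = subprocess.run([exe, *args], capture_output=True, check=True, text=True)
    return json.loads(result.stdout)


# e.g. values = run_hook("cpu-selector")  # hypothetical hook name
```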
Honestly, having just gone into a bit more detail here, I’m not sure it’s as workable as I’d hoped (the fact that you can’t do `subprocess.run(["foo-hook", *args])` on Windows to run foo-hook.py makes writing portable hooks annoyingly tricky).
The main point I was making is that the PEP 517 approach of trying to require hooks to be written in Python, but run in an isolated environment, is far more complex to get right than we’d hoped, and there’s no guarantee that because tools have implemented PEP 517, that infrastructure is reusable for these new hooks.
So the PEP will need to have a really good discussion on the transition process. “This will catch people out for quite a while” definitely isn’t an acceptable transition plan.
I don’t quite follow what you mean here. Are you saying that A depends on B, and B depends on C, but C is incompatible with A? That’s a broken dependency graph and nothing can be validly installed in that case. But broken dependency graphs are a bug, and need to be fixed - yet you seem to be thinking here as if this is something normal that resolvers expect to handle as a matter of routine. So I think I’m missing something.
The concern I have is around whatever replaces or extends packaging.tags.sys_tags(). On my relatively straightforward Windows machine, I have 42 entries in that list. On Python 3.11 on Ubuntu (WSL) there are 888! If you add in CPU architecture and GPU tags, just to give some simple examples, that 888 will start to multiply fast. If my machine supports x86_64 v4 instructions, I’ll need to multiply by 4 (x86_64_v4, x86_64_v3, x86_64_v2 and x86_64_v1). Plus any other tags that might apply, like avx2. Then multiply again by whatever tags apply for my GPU (a few CUDA versions, for example) and we end up with thousands of tags.
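The multiplication is easy to demonstrate - the variant axes here are invented, but the arithmetic is the point:

```python
from itertools import product

from packaging.tags import sys_tags

base = list(sys_tags())  # 42 on my Windows machine, 888 on the Ubuntu example
cpu_levels = ["x86_64_v4", "x86_64_v3", "x86_64_v2", "x86_64_v1"]
gpu_variants = ["cuda11", "cuda12", "cuda12_cudnn9", "none"]  # invented axis

expanded = list(product(base, cpu_levels, gpu_variants))
print(len(base), "->", len(expanded))  # 888 -> 14208 with just these two axes
```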
Every wheel is checked for matches against the list of supported tags - and I think the design means that search has to be linear. That’s a very costly matching process.
Furthermore, there’s the prioritisation algorithm to consider. Is a (x86_64_v4, cuda11) wheel higher or lower priority than a (x86_64_v2, cuda12) wheel? We can’t defer that choice to the user - no user is ever going to be able to answer that question.
My point here is simply that the wheel tag matching algorithm and design weren’t written to scale to this sort of level. Even manylinux and the fun that macOS architecture support gets up to are already pushing its limits, so adding yet more multiplying factors is by no means guaranteed to work.
It’s not about how many wheels need to exist, though. It’s about the fact that the search algorithm needs an explicit enumeration of the full combinatorial set of possibilities (and potentially needs to do a linear search through that enumeration for every one of those wheels).
It may work - after all, even the current number of tags is outside what we originally expected - but you’ll need to make sure you establish that properly. “It seems tractable” isn’t going to be sufficient, I’m afraid.
In pip terms (which is what I’m most familiar with) it’s not resolution that matters, it’s wheel selection (the “finder”). We look at each package/version combination, pick a “candidate” by selecting the best matching wheel based on what tags the wheel has and what tags the system supports, and then feed those best candidates into the resolution process. We never backtrack into the selection process from the resolver - if the wheel we chose doesn’t work, we discard that package/version from consideration, and try something else.
This is why I was talking about BLAS versions. If the wheel selection chooses a BLAS-1 wheel for A and a BLAS-2 wheel for B, nothing in the resolver can ask pip to try a different wheel for A - the resolver will simply say “A and B are incompatible”. And the finder has no global context - it doesn’t do any matching between what tags get selected for A and B, it considers each one in isolation.
Add to that the fact that the finder is designed to be fast. It skims the index (which as you’ve shown, could contain hundreds of wheels for one package version) picking out one candidate per name/version as quickly as it can, before passing that much shorter list onto the resolver, which is the slow (in terms of big-O performance) part. Resolution is basically an exponential algorithm, and passing multiple candidates per name/version will multiply the input size, blowing up performance to unacceptable levels very, very quickly.
Of course, all of the above is implementation details, and isn’t set in stone. But a proposal that involves “rewrite the installer resolution algorithms” as a starting point won’t get very far.
Hopefully, all of the above matches with what you’ve discovered in your preliminary investigations into the existing code. So maybe you already have answers to the concerns I’m raising. If not, then I hope they act as pointers for you.
Ultimately, a good prototype implementation plus some non-trivial performance benchmarks are likely to be critical for getting this PEP accepted.
Thanks for the insight. I had not realized that backtracking from the resolver to the selection process was not an option.
I’ll try to come up with a different way to do things. My first gut feeling is to try a scheme that filters packages based on variants, then passes that into the finder. I recognize that all of this is speculation and none of it means anything without a prototype to study behavior, but I’m mostly posting this to iterate on ideas out in the open.
- Installer recurses the dependency chain, downloading variant info for each package. I believe that a global approach to the variants will be necessary for ensuring alignment, given what Paul has said about how the finder works. Variant info looks like:
    {
        "variant_name": {
            "most_preferred_value": {
                "excludes": ["another_variant_name==somevalue"],
                "compatibility_range": ""
            },
            "less_preferred_value": {}
        }
    }
This is Oscar’s design, more or less, but with some additions to express compatibility or incompatibility with other variants.
- The user specifies which variants to prioritize by listing them in order in some configuration. Prioritization can be done as:
- maximization of number of packages in the graph that have a given feature with a given value (maximize usage of the feature)
- maximize number of packages in the graph that have the latest value of a given feature (optimize for most preferred variant value over broadest availability variant)
- The installer optimizes the collection of variants (possibly formulated as a resolver problem). This is close to how Conda used to operate, and I think the problem space of just variant names and values (NOT packages and dependencies!) will be small enough to be fast with a pure Python implementation. Again, speculation, and I know this needs to be proven.
- The installer uses the collection of variant values to select a subset of the available wheels for each package in the recursed, exploded package graph. This subset goes into the finder as it exists today (a rough sketch of this pre-filtering step follows).
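The pre-filtering step mentioned above might look something like this - the per-wheel `variants` mapping is assumed, since nothing like it exists in index metadata today:

```python
def prefilter(wheels: list[dict], chosen: dict[str, str]) -> list[str]:
    """Keep only the wheels whose declared variant values match the set the
    installer settled on, so the finder never sees the rest.

    Each entry in ``wheels`` is assumed (purely for illustration) to carry a
    ``variants`` mapping alongside its filename. Wheels that don't declare a
    given variant at all are kept, as the variant-less fallback.
    """
    return [
        w["filename"]
        for w in wheels
        if all(w.get("variants", {}).get(k, v) == v for k, v in chosen.items())
    ]


# e.g. prefilter(index_listing, {"cuda": "12", "cpu": "x86_64_v3"})
```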
Jumping to a slightly different spin on this idea, how far did we ever look into having custom/plugin-provided environment tags?
Rearranging Oscar’s original example (and going back to claiming one package name for each variant), imagine installing python-flint and having these to choose from:
python_flint-0.6.0-cp312-cp312-win_amd64.whl
python_flint_neutral-0.6.0-cp312-cp312-win_amd64.whl
python_flint_x86_64_v3-0.6.0-cp312-cp312-win_amd64.whl
python_flint_x86_64_v4-0.6.0-cp312-cp312-win_amd64.whl
Obviously we grab the one named python_flint-0.6.0.... But what’s in there? Basically just metadata:[1]
...
Requires-Dist: python_flint_x86_64_v4==0.6.0; cpuversion == 'x86-64-v4'
Requires-Dist: python_flint_x86_64_v3==0.6.0; cpuversion == 'x86-64-v3'
Requires-Dist: python_flint_neutral==0.6.0; cpuversion != 'x86-64-v4' and cpuversion != 'x86-64-v3'
Then the trick is to make cpuversion available. I don’t think we’ve yet come up with a way to do any of our proposals without installing something extra first, but while that currently looks like a big deal, I imagine there’ll eventually be one popular tool and distros will just start bundling it. IOW, no reason to block an idea.
I get that having multiple package names isn’t as convenient as silently selecting the right one, but it also sounds like silent selection will either not work or not be silent. Splitting up into separate packages does help stay under PyPI’s quotas though, which ought to be very attractive for these kinds of packages (if you put all CUDA versions under one name, you’ll probably only fit one version at a time on PyPI).
Using environment markers as the selector mechanism also seems likely to be flexible enough. Otherwise we’re going to invent a new flexible-enough system, which will slow adoption time fairly drastically compared to reusing an existing one (though marker extensions will take time to get out there too I suppose).
[1] IIRC, there’s a `self[extra]` version of this that would work too, but I don’t know the details well enough to write it up.