Selecting variant wheels according to a semi-static specification

The idea I had (which is mostly unformed, so you’ll need to thrash out the details yourself) is just that instead of defining a hook as calling a Python function, you define it as running an executable. Parameters are passed as command-line arguments, and the return value is just the process stdout (presumably as a JSON document if you want structured values). The person writing the hook can do so in any language, as long as running the command works (which probably requires a .exe on Windows, because OS support for scripts is patchy). PEP 516 was an alternative to PEP 517 that used a process-based approach, if you want an example of this style.
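Roughly, the kind of thing I mean on the frontend side (the hook name, subcommand and JSON shape here are all made up, purely for illustration):

```python
import json
import subprocess


def call_hook(executable: str, *args: str):
    # Run the hook as a separate process; structured results come back
    # as JSON on stdout, failures via the exit code / stderr.
    result = subprocess.run(
        [executable, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical usage: ask a "variant-hook" executable what this system supports.
supported = call_hook("variant-hook", "get-supported-variants")
```

The frontend doesn’t need to know or care what language the hook is written in, which is the main attraction.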

Honestly, having just gone into a bit more detail here, I’m not sure it’s as workable as I’d hoped (the fact that you can’t do `subprocess.run(["foo-hook", *args])` on Windows to run foo-hook.py makes writing portable hooks annoyingly tricky).

The main point I was making is that the PEP 517 approach of requiring hooks to be written in Python, but run in an isolated environment, is far more complex to get right than we’d hoped, and there’s no guarantee that the infrastructure tools built for PEP 517 will be reusable for these new hooks.

So the PEP will need to have a really good discussion on the transition process. “This will catch people out for quite a while” definitely isn’t an acceptable transition plan :slightly_smiling_face:

I don’t quite follow what you mean here. Are you saying that A depends on B, and B depends on C, but C is incompatible with A? That’s a broken dependency graph, and nothing can be validly installed in that case. But a broken dependency graph is a bug, and needs to be fixed - yet you seem to be treating it as something normal that resolvers are expected to handle as a matter of routine. So I think I’m missing something.

The concern I have is around whatever replaces or extends packaging.tags.sys_tags(). On my relatively straightforward Windows machine, I have 42 entries in that list. On Python 3.11 on Ubuntu (WSL) there are 888! If you add in CPU architecture and GPU tags, just to give some simple examples, that 888 will start to multiply fast. If my machine supports x86_64 v4 instructions, I’ll need to multiply by 4 (x86_64_v4, x86_64_v3, x86_64_v2 and x86_64_v1). Plus any other tags that might apply, like avx2. Then multiply again by whatever tags apply for my GPU (a few CUDA versions, for example) and we end up with thousands of tags.
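To put some rough numbers on that (the extra tag names below are invented just for the sake of the arithmetic, they’re not a proposed naming scheme):

```python
import itertools

import packaging.tags

base_tags = list(packaging.tags.sys_tags())
print(len(base_tags))  # 888 on the Ubuntu (WSL) / Python 3.11 machine mentioned above

# Hypothetical extra axes - illustrative only.
cpu_levels = ["x86_64_v4", "x86_64_v3", "x86_64_v2", "x86_64_v1"]
gpu_variants = ["cuda12", "cuda11", "none"]

combined = list(itertools.product(base_tags, cpu_levels, gpu_variants))
print(len(combined))  # 888 * 4 * 3 = 10656 entries to match against
```

And that’s before avx2 or any of the other possible tags get added to the mix.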

Every wheel is checked for matches against the list of supported tags - and I think the design means that search has to be linear. That’s a very costly matching process.
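For context, the matching is conceptually something like this (a simplified sketch of the shape of the logic, not pip’s actual code):

```python
def best_match(wheel_tags, supported_tags):
    # supported_tags is ordered from most to least preferred. For each wheel,
    # scan that list until we find a tag the wheel provides; the position of
    # that tag is the wheel's priority. No hit means "not installable".
    for priority, tag in enumerate(supported_tags):
        if tag in wheel_tags:
            return priority
    return None
```

Do that for every wheel of every version of every package in your dependency tree, against a supported-tags list that’s thousands of entries long, and the cost adds up fast.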

Furthermore, there’s the prioritisation algorithm to consider. Is a (x86_64_v4, cuda11) wheel higher or lower priority than a (x86_64_v2, cuda12) wheel? We can’t defer that choice to the user - no user is ever going to be able to answer that question.

My point here is simply that the wheel tag matching algorithm and design weren’t written to scale to this sort of level. Even manylinux and the complexities that macOS architecture support gets up to are pushing its limits, so adding yet more multiplying factors is by no means guaranteed to work.

It’s not about how many wheels need to exist, though. It’s about the fact that the search algorithm needs an explicit enumeration of the full combinatorial set of possibilities (and potentially needs to do a linear search against that enumeration for every one of those wheels).

It may work - after all, even the current number of tags is outside what we originally expected - but you’ll need to make sure you establish that properly. “It seems tractable” isn’t going to be sufficient, I’m afraid.

In pip terms (which is what I’m most familiar with) it’s not resolution that matters, it’s wheel selection (the “finder”). We look at each package/version combination, pick a “candidate” by selecting the best matching wheel based on what tags the wheel has and what tags the system supports, and then feed those best candidates into the resolution process. We never backtrack into the selection process from the resolver - if the wheel we chose doesn’t work, we discard that package/version from consideration, and try something else.
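In (heavily simplified) pseudo-Python, the finder step looks something like this - again, a sketch of the shape of the logic under those assumptions, not the real implementation:

```python
def tag_priority(wheel_tags, supported_tags):
    # Lower is better; None means "not installable on this system".
    return next(
        (i for i, tag in enumerate(supported_tags) if tag in wheel_tags),
        None,
    )


def find_best_candidate(wheels_for_version, supported_tags):
    # Exactly one wheel is chosen per name/version, based purely on tag
    # priority. The resolver never sees the alternatives, so it has no way
    # to ask for "the next best wheel" later on.
    scored = [
        (prio, wheel)
        for wheel in wheels_for_version
        if (prio := tag_priority(wheel.tags, supported_tags)) is not None
    ]
    if not scored:
        return None  # nothing installable; this version is discarded entirely
    return min(scored, key=lambda item: item[0])[1]
```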

This is why I was talking about BLAS versions. If the wheel selection chooses a BLAS-1 wheel for A and a BLAS-2 wheel for B, nothing in the resolver can ask pip to try a different wheel for A - the resolver will simply say “A and B are incompatible”. And the finder has no global context - it doesn’t do any matching between what tags get selected for A and B; it considers each one in isolation.

Add to that the fact that the finder is designed to be fast. It skims the index (which, as you’ve shown, could contain hundreds of wheels for one package version), picking out one candidate per name/version as quickly as it can, before passing that much shorter list on to the resolver, which is the slow (in terms of big-O performance) part. Resolution is basically an exponential algorithm, and passing multiple candidates per name/version will multiply the input size, blowing up performance to unacceptable levels very, very quickly.
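As a very rough illustration of why that matters (worst-case back-of-the-envelope numbers - real resolvers prune aggressively, but the shape of the growth is the point):

```python
# Backtracking resolution already explores up to (versions per package) ** depth
# states in the worst case. If each version now contributes several wheel
# variants instead of exactly one, the branching factor gets multiplied again.
depth = 20  # packages involved in the resolution
versions_per_package = 5
for variants_per_version in (1, 3, 10):
    branching = versions_per_package * variants_per_version
    print(variants_per_version, branching ** depth)
# going from 1 variant to 3 multiplies the worst case by 3**20 (~3.5 billion)
```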

Of course, all of the above is implementation details, and isn’t set in stone. But a proposal that involves “rewrite the installer resolution algorithms” as a starting point won’t get very far.

Hopefully, all of the above matches with what you’ve discovered in your preliminary investigations into the existing code. So maybe you already have answers to the concerns I’m raising. If not, then I hope they act as pointers for you.

Ultimately, a good prototype implementation plus some non-trivial performance benchmarks are likely to be critical for getting this PEP accepted.
