Implementation variants: rehashing and refocusing

While looking up the current macOS platform compatibility tags, I stumbled across this issue where it turns out the “optimise for portability or for the current system?” logic for macOS is already problematic due to x86-64 wheels being preferred to universal ones when both are available: Order of architectures in platform tags on macOS · Issue #381 · pypa/packaging · GitHub

That’s not an issue, it’s the right thing to do. And it has worked just fine for several years now. The arguments around universal2 being more portable are a bit nonsensical. If the installer knows it’s on x86-64, then grab the x86-64 wheel. It has very little to do with portability, unless you want to manually copy envs around between machines (which is unsupported by Python packaging). universal2 wheels are an anomaly, and it’s debatable whether they should exist at all, just like we don’t have combined 32/64-bit Windows wheels. Please do not mix that into this discussion - it’s hard enough to even follow along with this thread already.


Copying environments between compatible machines is entirely supported. You have to be careful about it to ensure you don’t mess up and inadvertently depend on things you’re not shipping, but it’s supported. There are lots of packaging tools that rely on this (including some that are mentioned in the linked ticket, but also things like conda-pack, shiv, and more).

The reason the issue hasn’t been pushed further under the status quo is that there are ways to ensure the x86_64 wheels are excluded when the environment builder knows it really wants the universal2 ones, and the problem only comes up when specifically building portable environments.

The connection to this thread is just to emphasise that we definitely need at least the ability to choose between “optimise for this hardware” and “optimise for portability” when selecting between available variants (recognising that this is likely to be a spectrum between “highest performance with the most restrictive hardware selection” and “broadest hardware compatibility with either lowest common denominator runtime performance or larger packages containing multiple implementation variants”).
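
To make that spectrum concrete with the macOS case from the linked issue, here is a minimal sketch of how a tag-ordering preference might flip between the two ends (the prefer_portable knob and the function are hypothetical, not something existing installers expose):

# Hypothetical sketch: order macOS platform tags either for the current
# hardware or for portability. universal2 wheels run on both x86_64 and
# arm64, so a portable environment builder would rank them first, while a
# hardware-optimised install keeps the current native-arch-first behaviour.
def order_macos_tags(tags, prefer_portable=False):
    def key(tag):
        is_universal = tag.endswith("universal2")
        return (not is_universal) if prefer_portable else is_universal
    return sorted(tags, key=key)

tags = ["macosx_11_0_x86_64", "macosx_11_0_universal2"]
print(order_macos_tags(tags))                        # native arch first
print(order_macos_tags(tags, prefer_portable=True))  # universal2 first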

This is definitely a critically important part of this addition, but I think we need to do better than making users choose between portability and speed at package install time. Instead, I hope we can get to a place like we’ve discussed above, where the variant can be somewhat orthogonal to the exact package specs, and where we can have two different environment specs, one capturing a portable set of requirements and the other a reproducible set. The default install behavior should be to always attempt a hardware-optimal install, but it should be easy to create environment specs that are more portable, and just as easy to take portable specs to a different system and use them to create a hardware-optimized environment there.
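
As a toy illustration (borrowing the hypothetical python-flint[x86_64_v4] selector discussed later in this thread, so the exact syntax is not the point), the same project could then carry two specs:

# portable spec: resolvable on any machine, says nothing about hardware
python-flint >= 1.0

# frozen, hardware-optimized spec: captures the choice made on this machine
python-flint[x86_64_v4] == 1.0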

The user experience in question, to my mind, is whether people create environments iteratively from the CLI and then “freeze” them to produce the environment spec to pass around, or instead write a more complete environment spec file and then create an env from it. The former case will be really hard to retroactively make portable. The latter case should be fine.


I think we have a very similar idea for how this might work, Doug. I think we need more than just iteration over variants, though. I think we need a resolver step that operates on variant variables and values. This would give us the ability to express relationships among variant variables and values, which is going to be really important for things like mutual exclusion relationships between OpenMP and BLAS implementations.

Here’s a rough diagram.

I’m working first on a tool that produces packages with some prototype variant metadata, but coding this is my next step.

The “backtracking” that I propose here would hopefully just reuse resolvelib’s existing code. I haven’t looked at it closely, but the conceptual problem is the same. It may differ in how the constraints get formulated as input to the solver, but the idea is to avoid the need for a more complicated separate solver implementation.
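
To hand-wave slightly less (every name here is made up, and a simple enumeration in priority order stands in for whatever resolvelib-style backtracking would actually do), the variant resolution step might look something like this:

from itertools import product

# Candidate values per variant variable, listed in priority order.
VARIANT_VALUES = {
    "blas": ["mkl", "openblas"],
    "openmp": ["intel", "gnu"],
}

# Pairs of (variable, value) assignments that may not appear together,
# e.g. a BLAS build that is incompatible with a particular OpenMP runtime.
MUTUAL_EXCLUSIONS = [
    (("blas", "mkl"), ("openmp", "gnu")),
]

def consistent(assignment):
    items = set(assignment.items())
    return not any(a in items and b in items for a, b in MUTUAL_EXCLUSIONS)

def resolve_variants():
    # Yield consistent assignments, highest priority first.
    names = list(VARIANT_VALUES)
    for combo in product(*(VARIANT_VALUES[name] for name in names)):
        assignment = dict(zip(names, combo))
        if consistent(assignment):
            yield assignment

print(next(resolve_variants()))  # {'blas': 'mkl', 'openmp': 'intel'}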


My immediate thought is that “collect variants by recursing the dependency tree” is problematic, because the dependency tree doesn’t exist (as a completely defined entity) at this point. Because Python packages can have different dependencies depending on which version or even which wheel you select, the dependency tree is only discovered incrementally, during the resolution process. That’s the key point I’ve been trying to get across, but no one seems to be picking up on it.

For a very simple example, suppose we have package A. Version 2.0 of A depends on B; version 1.0 does not. B has a variant X. If I request installation of A, do I need variant X? At this point, I don’t know whether I’ll pick A 1.0 or A 2.0 (it might depend on whether A 2.0 has a compatible wheel for my platform, for example).
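
Spelling that out as data (purely illustrative, not a real metadata format):

# Per-release metadata is only known once it is fetched, and it can differ
# between releases (and, today, even between wheels of the same release).
INDEX = {
    ("A", "2.0"): {"requires": ["B"]},
    ("A", "1.0"): {"requires": []},
    ("B", "1.0"): {"requires": [], "variants": ["X"]},
}
# Whether variant X of B matters at all depends on whether the resolver ends
# up picking A 2.0 or A 1.0 - and that isn't known until resolution time.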

Now imagine that happening at the bottom of a deep dependency tree, with boto (which has hundreds, if not thousands, of versions) somewhere in the middle.

No practical algorithm exists which allows you to consider “the dependency tree” as a concrete, fully-known, entity. This is a fundamental complexity of Python packaging, which most other dependency resolution problems don’t have to deal with. We’ve had extended discussions about the possibility of requiring all wheels for a given version of a project to have the same metadata (which would be a step towards addressing this, but would not be sufficient by itself) and even that has proved impossible to get consensus on.

The purpose of the dependency tree idea is to allow discovery of the set of variants that needs to be considered for a given set of packages. If discovery is too hard, we can back off and just rely on some public list of all variants, along with the user’s choice of which ones to pre-install in order to enable them. It puts more onus on the user to opt in to these things, rather than being given the choice to enable them where relevant.

I have been envisioning inverting this problem. You don’t ask whether you need a variant for a particular package. You ask what set of variants is “best” according to some priority ordering, and then you take the corresponding package sets (which kind of behave like mini-indexes), and install packages using the normal algorithms in these sets.

One thing I haven’t really settled on is how variant-less packages fit in here. They should probably be available in every variant package set, so that we can fulfill dependencies. Maybe another way to do it is to say that “the installation from the variant package set only installs packages that have variants” and “dependencies are satisfied from the non-variant package set as a later step.”
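
A rough sketch of that second option (the package names and filenames are just placeholders, including the hypothetical +x86_64_v4 suffix): each selected variant behaves like a mini-index that only holds variant-bearing wheels, and everything else falls through to the ordinary, variant-less set:

VARIANT_SET = {
    "numpy": "numpy-2.1.0-cp312-cp312-win_amd64+x86_64_v4.whl",
}
BASE_SET = {
    "numpy": "numpy-2.1.0-cp312-cp312-win_amd64.whl",
    "requests": "requests-2.32.0-py3-none-any.whl",
}

def find_candidate(name, package_sets):
    # Consult the variant package set first, then the non-variant set.
    for package_set in package_sets:
        if name in package_set:
            return package_set[name]
    raise LookupError(f"no candidate for {name}")

print(find_candidate("numpy", [VARIANT_SET, BASE_SET]))     # variant wheel
print(find_candidate("requests", [VARIANT_SET, BASE_SET]))  # plain dependency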

This is all hand-waving in the absence of a working demo, but I think it is worthwhile to build that demo and poke at it to understand what the hard limitations might be.

FWIW, I think this is helpful and I support it. Maybe in the context of variants, there’s room for a higher-order definition that relaxes some of the need for every wheel of a given version to align. If you resolve variant values first, then can you say that every package with a given variant value must have the same metadata? This allows lots of nice variation between variant values while also preserving the benefits of metadata sameness in other ways. Older installers that do not understand variants would just not see these packages at all - it would require some specification of variant to see them. This is related to the way that PyTorch handles their wheels, where the variant metadata is encoded in the index folders instead of in each wheel itself.
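
For reference, PyTorch’s pattern is (roughly) one index per variant, selected explicitly by the user, along the lines of:

pip install torch --index-url https://download.pytorch.org/whl/cu121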


I thought that changing the syntax of requirements would invalidate all metadata standards that involve requirements, e.g. it would require a metadata 3.0 anyway. Part of my thinking in suggesting the use of extras was that the requirements syntax would be unchanged, so at least some standards and tooling based on it would be unaffected.

Also, the reason I proposed using the platform tag in the wheel filename is that I hoped older installers would ignore the wheels with unrecognised platform tags. More details need to be worked out, but I hoped there could be a way for a project to upload wheels/sdists that make things no worse for old installers while still achieving the new behaviour for new installers.


That would be the reason I described the approach of adding new fields to improve backwards compatibility as counterintuitive 🙂

The trick is that it is changing the permitted contents of existing fields (whether syntactically or semantically) that causes problems for existing clients. New fields are inherently ignored by existing clients, so as long as the “status quo” metadata continues to be published in the existing fields in the same way it has historically, it is possible to avoid a major version bump.

You definitely incur extra complexity doing things that way, but the pay-off is in smoother potential rollout plans for new functionality.

The situation with wheel filenames is similar: as long as older clients see the filenames they expect for default variants, the primary consideration for non-default variants is that older clients should fail to install such wheels, rather than seeming to succeed but getting their installed distribution metadata wrong somehow. Putting the variant info in one of the existing fields may prove a convenient path to that outcome, but adding a completely new field may be judged even better (since it will fail early, at the filename partitioning step).
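
As a sketch of that “fail early” behaviour, roughly approximating the partitioning checks existing tools perform (real parsers are stricter, but the failure mode is the same): wheel filenames have five or six dash-separated fields, and the optional sixth (build tag) field has to start with a digit, so a filename carrying an additional variant field no longer partitions cleanly for older clients.

def partition_wheel_filename(filename):
    # {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields: {filename}")
    if len(parts) == 6 and not parts[2][:1].isdigit():
        raise ValueError(f"invalid build tag in {filename}")
    return parts

partition_wheel_filename("python_flint-1.0-cp312-cp312-win_amd64.whl")  # fine
# With an extra trailing field, the python tag is read as a (non-numeric)
# build tag and parsing fails early, which is the desired outcome here:
partition_wheel_filename("python_flint-1.0-cp312-cp312-win_amd64-x86_64_v4.whl")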


I was imagining that you could do something similar with extras. In the python-flint case the current situation is that you have one wheel for Windows which is what gets installed:

python_flint-1.0-cp312-cp312-win_amd64.whl

Hypothetically in future we add some other wheels so it looks like:

python_flint-1.0-cp312-cp312-win_amd64.whl
python_flint-1.0-cp312-cp312-win_amd64+x86_64_v4.whl

Old versions of pip ignore the new file and continue to install the same wheel as before. New versions of pip allow you to select the alternative wheel explicitly:

pip install python-flint[x86_64_v4]

In the python-flint installation instructions we tell users something like:

Run ... command to find out if your CPU has AVX512, then install the latest version of pip and run pip install python-flint[x86_64_v4] to get a Flint build with the fft_small module and assembly enabled.

In future someone might want to add this sort of thing to a distribution requirement somewhere so you have an sdist with:

Requires-Dist: python_flint[x86_64_v4]

That is a problem because then both old and new installers could see this metadata (unless it only exists in wheels that old installers would ignore). A solution is that the default wheel can have an empty extras field sort of like:

extras = {
   'x86_64_v4': []
}

Then old installers install the old wheel and find the empty extras specification and consider it satisfied. New installers could know to check for variants metadata rather than just extras and then use that to select the other wheel.
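
A minimal sketch of what that selection might look like in a variant-aware installer (the variants mapping, the cpu_supports() probe, and the +x86_64_v4 filename are all hypothetical):

VARIANTS = {
    "x86_64_v4": "python_flint-1.0-cp312-cp312-win_amd64+x86_64_v4.whl",
}
DEFAULT_WHEEL = "python_flint-1.0-cp312-cp312-win_amd64.whl"

def cpu_supports(variant):
    # Stand-in for a real CPU feature probe (e.g. checking for AVX512).
    return True

def select_wheel(requested_extras):
    # Honour an explicitly requested variant if the hardware supports it,
    # otherwise fall back to the status quo wheel that old pip would pick.
    for extra in requested_extras:
        wheel = VARIANTS.get(extra)
        if wheel is not None and cpu_supports(extra):
            return wheel
    return DEFAULT_WHEEL

print(select_wheel(["x86_64_v4"]))  # the +x86_64_v4 wheel
print(select_wheel([]))             # the default wheel, as today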

This approach works for the python-flint case, where there is always a clear fallback: the status quo wheel, which is acceptable even if suboptimal. I’m not sure how a good fallback scenario works for other cases under any of the proposals. In cases where a basic pip install foo already doesn’t work (you need to use a custom index, etc.), I guess the fallback is less of a concern.

There has been a lot of confusion about this sort of thing in this thread so let me be clear that for python-flint in particular there would never be a reason for another project to require a particular variant like this. I mention this only as a hypothetical example to consider how installing could work with this metadata.

The challenge here is that, in isolation, you can’t tell whether this is referring to an extra or a variant. That ambiguity is a problem, not a benefit, since it means the build tooling can’t provide any hints that this might pose a backwards compatibility problem with older installation clients.

By contrast, if new syntax is defined, then this would be disallowed:

Requires-Dist: python_flint(x86_64_v4)

And the build tooling could recommend replacing it with this:

Requires-Dist: python_flint
Provides-Extra: x86_64_v4
Requires-Dist-Variant: python_flint(x86_64_v4); extra == "x86_64_v4"

and handling detection of the more optimised version of the dependency at runtime.
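
The runtime-detection half could then be an ordinary feature probe in the depending package, along these entirely hypothetical lines (neither the marker attribute nor the variant itself exists today):

def have_optimised_flint():
    # Check whichever capability marker the optimised build would expose.
    try:
        import flint
    except ImportError:
        return False
    return getattr(flint, "__variant__", None) == "x86_64_v4"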

There’d still be some compatibility issues with that approach (old tools wouldn’t handle dependencies declared on the python_flint[x86_64_v4] extra correctly, since they’d ignore the Requires-Dist-Variant field), but if anyone did run into problems, the discrepancy would be much easier to detect than if the only way to identify it were to look at the content of the extras definitions rather than at the use of a new metadata field that old installers ignore.

Some of those potential problems could also be mitigated by having PyPI initially require that variant dependency declarations be limited to non-default variants until variant support in installation clients becomes more widespread. That is, the above example would have to be written for PyPI as:

Requires-Dist: python_flint
Provides-Variant: x86_64_v4
Requires-Dist-Variant: python_flint(x86_64_v4); variant == "x86_64_v4"

That way, default variants wouldn’t be able to transitively bring in dependencies on non-default variants; you’d only get one by explicitly requesting it at the top level of the installation request, and such a request would only be accepted if the installer being used understood build variants.

It’s not elegant, but it’s still a smoother transition path than having to bump the major metadata version, or having every declaration of a dependency on an extra becoming a potential installation inconsistency trap.