Implementation variants: rehashing and refocusing

Completely agree, but we’ll save that discussion for another thread :smiley:

As a side effect of selecting different builds, variants will need the ability to select different dependencies though (e.g. a self-contained build may have fewer dependencies than a dynamically linked one). At that point, they may as well also be explicit about any variant consistency constraints that imposes.

(btw, I’m intentionally ignoring the complexities of actually relying on dynamic linking between different Python packages on platforms that don’t allow processes to mutate their shared object loading path at runtime the way Windows does. That’s a real technical problem, but Windows is a sufficient platform to prove out any related design concepts that come up in this thread)

That’s no different than any wheel though, right?

As part of introducing the concept of variants, I’d like to see us make the current assumption many tools already make explicitly valid: that wheels built from the same variant all share common dependencies, with environment markers used to describe any platform or version specific dependencies. (This would need a deprecation period, but eventually PyPI would disallow inconsistent uploads)

Variants would keep the freedom to have differences in dependencies though.

The current cases where projects publish wheels with differing dependencies that can’t be combined via environment markers would be required to migrate to publishing multiple variants instead.

This is a good point. The primary objection to the consistent metadata proposal, as I remember it, was that some projects might want to ship wheels to PyPI that have different dependencies from the sdist. The reason for wanting to do that is precisely to unbundle libraries like BLAS, which has all the benefits I mentioned above. Having variants and allowing them to have additional dependencies provides a way to support that case while making the metadata differences understandable for tools that want to assume consistent metadata.

The unbundling issue has been a problem for a while but looms large at the moment with free-threading threatening to double the size of a PyPI release by doubling the number of wheels that bundle identical binaries of BLAS, Flint etc.

I was almost about to agree, but then I had some uncertainty in exactly what you’re defining as a “variant”.

You have an sdist that represents a distribution, i.e. package+version. That sdist gets built into potentially many different wheels, each of which (in the cases that matter here) has a different ABI compatibility matrix. Which of those things is the “variant”, or is it a new, different “object”?

I’ve been thinking that we have “selectors”, which are essentially the platform tags and other new ABI tags that combine to resolve which wheel to install once the package+version has been chosen. To me, that means the different ABI-compatible wheels are the “variants” for a specific package+version.

My view is that variants need to be a new tier of object between distributions and wheels, along the lines of this tentative metadata sketch: Implementation variants: rehashing and refocusing - #44 by ncoghlan

(originally I thought variants would need to be a new physical artifact, but was later persuaded they could just be represented via an updated wheel naming convention)

In that approach:

  • distributions are unchanged aside from new metadata fields
  • variants would be a new concept to allow grouping of distinct categories within a distribution’s wheels (with each variant offering different runtime characteristics and potentially different dependencies)
  • wheels change such that all wheels built from the same variant are explicitly required to declare the same dependencies, with any differences indicated via environment markers rather than declaring different dependencies

Each project would still have an implicit or explicit default variant, so existing tools would just work with default variants without needing to know that alternate variants even exist.
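The "same variant, same dependencies" rule above is the kind of thing an index or linter could check mechanically. Here is a minimal sketch of such a check, assuming each wheel's variant name and `Requires-Dist` entries have already been extracted. The tuple layout, the use of `None` for the default variant, the `libblas` dependency, and the filenames are all illustrative assumptions, not an agreed format.

```python
from collections import defaultdict


def inconsistent_variants(wheels):
    """Return the variant names whose wheels declare differing dependencies.

    `wheels` is an iterable of (filename, variant, requires_dist) tuples.
    `variant` is None for the default variant (an assumption of this sketch).
    """
    deps_by_variant = defaultdict(set)
    for _filename, variant, requires_dist in wheels:
        # The order of Requires-Dist entries is not significant: compare as sets.
        deps_by_variant[variant].add(frozenset(requires_dist))
    return sorted(
        (v for v, dep_sets in deps_by_variant.items() if len(dep_sets) > 1),
        key=lambda v: "" if v is None else v,
    )


# Hypothetical upload: the "bar" wheels disagree about their dependencies.
wheels = [
    ("foo-1.0-cp312-cp312-win_amd64.whl", None, ["numpy"]),
    ("foo-1.0-cp312-cp312-manylinux_2_28_x86_64.whl", None, ["numpy"]),
    ("foo-1.0-cp312-cp312-win_amd64+bar.whl", "bar", ["numpy", "libblas"]),
    ("foo-1.0-cp312-cp312-manylinux_2_28_x86_64+bar.whl", "bar", ["numpy"]),
]
print(inconsistent_variants(wheels))  # ['bar']
```

Platform-specific differences would still be expressed through environment markers inside the shared dependency list, so they would not trip this check.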

Yes, the list of install dependencies for a given package will be different for different variants, in some cases. That’s an outgoing dependency, though, and those outgoing dependencies don’t need to include the variant information in the requirements specification. The dependency will either be unique to the variant (so there might only be 1 dist to download and it only makes sense to use that thing in the context of the variant build that depends on it), or the dependency will have its own variants that will be picked based on the selector parameters. It’s not necessary for those variants to match the variant of the thing that depends on them.

So, A has variants 1 and 2. Variant 2 is being installed, it depends on B. B has variants 3 and 4 for completely different reasons than A has variants. The selection process will have to decide whether to use 3 or 4, and that’s fine.
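To make the A/B example concrete, here is a rough sketch of what independent selection could look like. Everything in it (the dict layouts, the `preferences` ordering, the `resolve` helper) is made up for illustration. The point is only that A's requirement on B names the project and nothing else, and B's variant is chosen by the same selection step that chose A's.

```python
def resolve(project, published, requires, preferences, chosen=None):
    """Pick a variant for `project` and, recursively, for its dependencies.

    published:   project -> set of variant names it ships
    requires:    (project, variant) -> list of dependency project names
    preferences: project -> variant names usable here, most preferred first
    """
    chosen = {} if chosen is None else chosen
    if project in chosen:
        return chosen
    for variant in preferences[project]:
        if variant in published[project]:
            break
    else:
        raise LookupError(f"no usable variant of {project}")
    chosen[project] = variant
    # Dependencies are plain project names: no variant appears in the requirement.
    for dep in requires.get((project, variant), []):
        resolve(dep, published, requires, preferences, chosen)
    return chosen


published = {"A": {"1", "2"}, "B": {"3", "4"}}
requires = {("A", "2"): ["B"]}
preferences = {"A": ["2", "1"], "B": ["4", "3"]}
print(resolve("A", published, requires, preferences))  # {'A': '2', 'B': '4'}
```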

This is way more complex than I’ve been looking at it. There’s no need to model variants as a new type of object. They’re an update of the existing wheel format. We theoretically could just use more and more platform tags, but we’ve established that doing that is not practical. Conceptually, though, that’s all variants are.

I don’t know what “wheels built from the same variant” means. Wheels are the manifestation of the variation. There’s no difference in the source, only the arguments passed to the build tool when creating the wheel.

I like defaults, but they need to be optional. Not all projects can provide default builds.

I’m still having a hard time understanding the hierarchy of objects involved in this vision of variants.

  • Are variants actual artifacts that are produced from an sdist?
  • Do we have multiple sdists each of which has different variant metadata (Requires-Dist-Variant, Provides-Variant)? If so, how do package managers turn a code repository into different sdists?
  • Are wheels built from these intermediate variant artifacts? Is this a separate step from the usual “code repo” → “1 x sdist” → “N x wheels” process?
  • Are the variants things you upload to PyPI or just additional metadata in existing artifacts (sdists and wheels)?
  • Do variants participate in the installer resolution algorithm, or just help to choose which artifact to download and install (sure, possibly extending transitive dependencies)?

No, there is only one sdist. Variants are like project build options. You start from a single source distribution and then you build it with different options to get different variants. This is analogous to:

# foo variant:
./configure --enable-foo && make install

# bar variant:
./configure --enable-bar && make install

Package managers don’t produce different sdists. They just build a particular variant for the binaries that they ship. Possibly the distro variant is a different variant from anything available prebuilt on PyPI. The distro should record the variant information as appropriate in the PEP 376 metadata.

It is unlikely but possible that a distro would want to provide multiple variants somehow but that would be multiple binaries derived from the same source distribution rather than multiple source distributions.

Okay, that totally makes sense to me. In that view, variants aren’t themselves physical artifacts.

For an sdist, no, but from my understanding they are for wheels.

I’m afraid all of this talking at a high level is making this all a bit confusing to follow (at least for me). In my head, this all works as the following:

  1. Take an sdist (or source tree)
  2. Build your variants from the sdist (or source tree)
  3. Each variant in its metadata specifies what that variant represents (e.g. compiled against cuda12)
  4. That metadata is somehow exposed to installers, probably by making the variant metadata something recorded in the wheel file name, otherwise you will have file name collisions (this would also be the first time a project contributed data to its own wheel filename instead of the build back-end doing it all, which might not be a bad thing for allowing community coordination to figure out what a variant represents)
  5. The user runs some code to calculate what variants an environment supports
  6. Installers can query that variant support data to know if they can/should download a variant to satisfy an installation requirement
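Steps 5 and 6 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Wheel` record, the probe results, the `cuda11` name, and the fallback to a default variant are all hypothetical, and how the environment is actually probed is exactly the part left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Wheel:
    filename: str
    variant: Optional[str]  # None means the default variant (assumed convention)


def detect_supported_variants(probe_results):
    """Step 5: reduce (variant name, usable?) probes to an ordered list of names."""
    return [name for name, usable in probe_results if usable]


def choose_wheel(wheels, supported):
    """Step 6: take the first supported variant, else the default, else None."""
    by_variant = {w.variant: w for w in wheels}
    for name in supported:
        if name in by_variant:
            return by_variant[name]
    return by_variant.get(None)


wheels = [
    Wheel("foo-1.0-cp312-cp312-win_amd64.whl", None),
    Wheel("foo-1.0-cp312-cp312-win_amd64+cuda12.whl", "cuda12"),
    Wheel("foo-1.0-cp312-cp312-win_amd64+cuda11.whl", "cuda11"),
]
# Pretend the environment probe found only cuda11 to be usable.
supported = detect_supported_variants([("cuda12", False), ("cuda11", True)])
print(choose_wheel(wheels, supported).filename)
```

Returning `None` when nothing matches and no default exists covers projects that cannot provide a default build.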

I’m purposefully leaving out how to get something like cuda12 installed as that feels like we’re getting into PEP 725 territory and I feel like this discussion has been focusing on how to simply tell what wheel files are a variant compared to sibling wheel files.

Your mental image aligns with mine.

Even if we don’t model them explicitly and give them an official name, variants will exist, since they’re just a way of categorising wheels that differ in ways that platform tags and environment markers cannot express.

Since the whole point of the discussion is to allow projects to publish multiple wheels with the same platform tag and give installers a reasonable way to decide which of those variants is the preferred choice for a given installation environment, I don’t see the value in trying to claim that variants are just wheels like any other. That’s true when it comes to physically installing them into an environment, but it’s not true when it comes to satisfying dependencies (in either direction).

It’s why I see the various discussions of potential categories of variants that projects might want to publish as interesting, but ultimately irrelevant. If a project wants to call their variants “foo”, “bar”, and “bob”, then that should be entirely possible, and the question of whether they’re mutually exclusive or not, or whether they mean the same thing in two different projects or not, should be expressed in the metadata rather than having to be explicitly encoded in the tools consuming that metadata. To the machinery, they should just be opaque strings, the same way extras are.

But A should also have a way to declare that “A variant 1” depends specifically on “C variant 5” while “A variant 2” depends specifically on “C variant 6” (and in some cases, the names of those variants will match, either because one of those projects copied the other, or they’re both following recommendations from another source, like a selector library)
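A sketch of how that could sit alongside free selection: a requirement either pins a variant of the dependency or leaves the choice to the installer. The `REQUIRES` table and `dependency_variant` helper are hypothetical stand-ins for whatever metadata fields end up being defined, and variant names are treated as opaque strings throughout.

```python
# (project, variant) -> list of (dependency, pinned variant or None for "any").
# This table stands in for metadata that does not exist yet.
REQUIRES = {
    ("A", "1"): [("C", "5")],
    ("A", "2"): [("C", "6"), ("B", None)],
}


def dependency_variant(project, variant, dependency, installer_choice):
    """Return which variant of `dependency` to install for `project`'s `variant`.

    A pinned variant wins. Otherwise the installer's own selection logic,
    passed in as `installer_choice`, decides.
    """
    for name, pinned in REQUIRES.get((project, variant), []):
        if name == dependency:
            return pinned if pinned is not None else installer_choice(name)
    raise LookupError(f"{project} ({variant}) does not depend on {dependency}")


def pick_four(name):
    """Stand-in for the installer's selector: always answers "4"."""
    return "4"


print(dependency_variant("A", "1", "C", pick_four))  # 5
print(dependency_variant("A", "2", "C", pick_four))  # 6
print(dependency_variant("A", "2", "B", pick_four))  # 4
```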

What is this “variant” thing that you end up with after that build? Is it a wheel or something different? I’m trying to get at whether variants are a new kind of artifact (i.e. not an sdist or a wheel) or just some special kind of existing artifact (i.e. a wheel with some metadata encoded somehow).

What I’m surmising from this quote is that you see variants as a special kind of wheel file, distinct from, but essentially the same format as “sibling” wheels, though I’m not sure what a sibling wheel is now either :smiley:

Actually, I think this is a key point, because I want to understand whether we have a new type of artifact with a new format or not, and in either case, whether this thing we’re calling a variant directly participates in dependency resolution or not.

I think variant wheels should have the same file format as regular wheels, but (aside from default variants) a filename structure that installers that aren’t aware of variants will ignore.
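One way the "ignored by older installers" property could fall out naturally, assuming purely for illustration that the variant name is appended to the platform tag after a `+`: a variant-unaware installer would see a platform tag it does not recognise and skip the file. The function below is a simplified stand-in for tag matching, not any real installer's code, and it ignores build tags and compressed tag sets.

```python
def legacy_compatible(filename, supported_tags):
    """Mimic a variant-unaware installer deciding whether a wheel is usable.

    It compares the trailing (python, abi, platform) fields of the filename
    against the tag triples the environment supports.
    """
    stem = filename[: -len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return (python_tag, abi_tag, platform_tag) in supported_tags


supported_tags = {("cp312", "cp312", "win_amd64")}

# The default variant keeps today's filename, so it is still found.
print(legacy_compatible("foo-1.0-cp312-cp312-win_amd64.whl", supported_tags))      # True
# "win_amd64+bar" is not a tag this installer knows, so the wheel is skipped.
print(legacy_compatible("foo-1.0-cp312-cp312-win_amd64+bar.whl", supported_tags))  # False
```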

As far as dependency resolution goes, I think variants should take over the niche currently occupied by wheels that already have differences in their declared dependencies without any externally visible notification of the inconsistency. Wheels that share the same variant name (including anonymous default variants) would then be required to declare the same dependencies rather than leaving that as a common, but technically incorrect, client assumption.

I think that this wording is confusing because it makes “variant wheels” still seem like something different from “regular wheels”. It is difficult to answer Barry’s question though without presuming some details that are not agreed or are currently deferred in the discussion. Let me fill out some details hypothetically to try to make this clearer…

First there is an sdist and when you build the sdist you get a wheel. So you start with foo-1.0.tar.gz and you can extract the archive and do:

cd foo-1.0
python -m build

Then out the other end pops a wheel:

foo-1.0-cp312-cp312-win_amd64.whl

If we go ahead with the variant idea then it becomes possible to do something like:

python -m build --variant=bar
python -m build --variant=qux

Now what pops out is (respectively):

foo-1.0-cp312-cp312-win_amd64+bar.whl
foo-1.0-cp312-cp312-win_amd64+qux.whl

The outputs here are still wheels. The extra part of the filename can be called a “variant tag”. While all wheels are “wheel format” the contents of these different wheels are different in some way that is relevant for compatibility reasons so we encode that difference in the variant tag.

The output of building any variant is always a wheel but when we want to distinguish these different wheels we refer to them as “variants”. There is no distinction between “wheels” and “variant wheels” except that perhaps there is a “default variant” which would be the wheel with no variant tag.
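For tools that do know about variants, pulling the variant tag back out of such a filename is straightforward. This is a sketch under the same hypothetical `+tag` convention used in the filenames above. The `split_variant_tag` helper is made up for illustration and only inspects the final dash-separated field, because a `+` can already appear in the version field as a local version label.

```python
def split_variant_tag(filename):
    """Split a wheel filename into (filename without variant tag, variant tag).

    The variant tag is None for a default-variant wheel.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    # Only the last field (the platform tag) may carry a variant tag here.
    head, _, platform = stem.rpartition("-")
    platform, plus, variant = platform.partition("+")
    return f"{head}-{platform}.whl", (variant if plus else None)


print(split_variant_tag("foo-1.0-cp312-cp312-win_amd64+bar.whl"))
# ('foo-1.0-cp312-cp312-win_amd64.whl', 'bar')
print(split_variant_tag("foo-1.0-cp312-cp312-win_amd64.whl"))
# ('foo-1.0-cp312-cp312-win_amd64.whl', None)
```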

That jibes with my thinking as well.