Implementation variants: rehashing and refocusing

I think this is also going to be a good way to illustrate why new syntax and new metadata fields are going to be desirable from a backwards compatibility point of view. This may seem counterintuitive, but it makes sense if the following guiding principles are adopted:

  • the semantics of all existing fields remain unchanged
  • new syntax is only permitted in new fields (at least on PyPI, but potentially everywhere)
  • the end-to-end process of publishing and installing default variants should be unchanged

That way projects would be able to publish metadata for their default build variants that was still compatible with older client tools, even while adding the new build variant metadata for newer clients that could make use of it.

The design sketch below follows these principles. While we wouldn’t necessarily do things exactly the way it proposes, I think it’s illustrative enough to suggest that something along these lines could get us quite a long way (just as extras have).

Also, to be clear, it would be possible with the design sketch below to make installation fail by being overly specific with build variant dependencies. This isn’t massively different from the status quo, where overly specific version dependencies are likely to result in unresolvable dependency trees. I see that as a quality of implementation concern at the level of individual projects, and believe it can be handled the same way overly specific version dependencies are: either by avoiding the affected projects (if there are viable alternatives with better dependency management), or else by discussing the problem with the projects involved, and perhaps offering them PRs to improve the situation (if they’re amenable to that).

Dependency declarations

Dependency declarations would change to allow a new (variant) field to appear between the distribution name and the [extras] field.

Depending on name and depending on name() mean different things: name will accept any build variant, while name() explicitly requires the default variant. It will primarily make sense to depend on name() when that’s just a shorthand for a particular named build variant declared as Default-Variant in the underlying project’s metadata. (Edit: inadvertently omitted this paragraph in the initial writeup)
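To make the three spellings concrete, here is a minimal parsing sketch. It is only an illustration under stated assumptions: the regular expression, the `ParsedRequirement` type and the `parse_requirement` helper are all hypothetical names, version specifiers and environment markers are deliberately left out, and the exact grammar would be up to whatever spec eventually gets written.

```python
import re
from typing import FrozenSet, NamedTuple, Optional

# Hypothetical grammar: a distribution name, an optional "(variant)" field,
# then an optional "[extras]" field. Version specifiers and environment
# markers are intentionally not handled in this sketch.
_REQUIREMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:\((?P<variant>[^)]*)\))?\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*$"
)


class ParsedRequirement(NamedTuple):
    name: str
    # None means "any build variant"; an empty set means "the default variant".
    variant: Optional[FrozenSet[str]]
    extras: FrozenSet[str]


def _split(text: str) -> FrozenSet[str]:
    """Split a comma-separated list into a set, ignoring empty items."""
    return frozenset(part.strip() for part in text.split(",") if part.strip())


def parse_requirement(text: str) -> ParsedRequirement:
    match = _REQUIREMENT.match(text)
    if match is None:
        raise ValueError(f"unsupported requirement: {text!r}")
    variant = match.group("variant")
    return ParsedRequirement(
        name=match.group("name"),
        # The variant group is None only when no parentheses were present at all.
        variant=None if variant is None else _split(variant),
        extras=_split(match.group("extras") or ""),
    )
```

The key point is that `numpy` and `numpy()` parse differently: the first leaves `variant` as `None` (anything goes), while the second produces an empty set (the default variant was explicitly requested).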

Environment markers would also gain access to a new variant == ... clause.

Using a new dependency syntax and environment marker clause means that older clients will fail fast when given a set of requirements that depends on the client understanding build variants for it to be correct. The source metadata rules below are also designed such that older clients won’t be inadvertently exposed to the new syntax via transitive dependencies.

It also means that the installers themselves know they need to check for Provides-Variant metadata if the package is already installed in the target environment.

By contrast, if we tried to reuse the existing extras syntax and metadata fields for this purpose, we’d be forced to bump the major metadata version to keep older clients from being exposed to it. Projects would then have to choose between supporting older installation clients and making use of the new build variant support, and we know from past experience that imposing such a requirement is enough to kill a new capability before it ever gets anywhere. (If anyone is fortunate enough not to know what I’m referring to: explore the fate of the old metadata 2.0 PEP, which languished for years before finally being put out of its misery with the publication of metadata 2.1, which actually came with a viable transition plan.)

Metadata updates

Source metadata

New fields:

  • Provides-Variant: like Provides-Extra, but for build variants. To avoid confusion, default would be disallowed as a variant name (being reserved as the name of the otherwise anonymous variant indicated by an empty build variant string).
  • Default-Variant: normally a dependency on distname() would refer to the anonymous default variant. This field allows it to mean something else (e.g. numpy() might be equivalent to numpy(openblas)).
  • Requires-Dist-Variant: allows the new name(variant) syntax in dependency declarations and the new variant == ... expression in environment markers (so older clients will never see the new syntax).

The idea here would be to allow projects to publish source metadata where their default variants continued to be backwards compatible with old installation clients, while still allowing new variants to be more specific.

Numpy for example, might declare:

Provides-Variant: openblas
Provides-Variant: mkl
Default-Variant: openblas

While Scipy declared:

# Default build variant continues to bundle its own copy of OpenBLAS and works with any NumPy
Provides-Variant: openblas
Provides-Variant: mkl
Requires-Dist-Variant: numpy(openblas); variant == openblas
Requires-Dist-Variant: numpy(mkl); variant == mkl

When it comes to combining build variants for optional features that aren’t mutually exclusive, I think it could reasonably be handled by allowing comma-separated lists anywhere a variant name is specified. When compared, these lists would be converted to sets first so the order doesn’t matter (i.e. Provides-Variant: FeatureA,FeatureB would match a dependency declared as name(FeatureB,FeatureA)), and the check would be that the requested features are a subset of the provided features (i.e. Provides-Variant: FeatureA,FeatureB would match dependencies declared as name(FeatureA) and name(FeatureB)). Combined variants would also satisfy variant == ... clauses for any of the features they contain. (Edit: clarified that the actual compatibility check would be for subsets, not equality)

However, if the distribution metadata doesn’t explicitly list a combination of features as supported, installation tools would assume those features are mutually exclusive.
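The comparison rules above can be sketched in a few lines. This is just my reading of the idea, not a spec: `features`, `satisfies` and `find_provider` are hypothetical helper names, variant names are compared exactly as written, and `requested=None` is used to stand in for a bare `name` dependency (with the empty string standing in for `name()`).

```python
from typing import Iterable, Optional


def features(variant: str) -> frozenset:
    """Split a comma-separated variant string into an order-independent set."""
    return frozenset(part.strip() for part in variant.split(",") if part.strip())


def satisfies(provided: str, requested: Optional[str], default_variant: str = "") -> bool:
    """Check one provided variant against one requested variant."""
    if requested is None:
        return True  # bare "name": any build variant is acceptable
    wanted = features(requested)
    if not wanted:
        # "name()": resolve through Default-Variant if the project set one,
        # otherwise only the anonymous default variant will do.
        wanted = features(default_variant)
        if not wanted:
            return not features(provided)
    # The request must be a subset of what the variant provides.
    return wanted <= features(provided)


def find_provider(provides_variant: Iterable[str], requested: str) -> Optional[str]:
    """Return the first declared Provides-Variant entry covering the request.

    Combinations that are not explicitly declared are never synthesised,
    which is what makes undeclared feature combinations mutually exclusive.
    """
    for declared in provides_variant:
        if satisfies(declared, requested):
            return declared
    return None
```

With this sketch, a project declaring only `FeatureA` and `FeatureB` separately gives `find_provider(["FeatureA", "FeatureB"], "FeatureA,FeatureB")` a result of `None`, while adding an explicit `FeatureA,FeatureB` entry makes the same request resolvable.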

Wheel metadata

Same fields as the source metadata, except:

  • Provides-Variant: only the variant matching the wheel is kept
  • Requires-Dist-Variant: any entry with a variant == ... clause that doesn’t match the wheel is omitted

Default-Variant is retained (if set) so name() dependencies against already installed packages can be checked by installers.
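As a rough illustration of that filtering step, the sketch below derives wheel metadata from a list of (field, value) pairs. The `wheel_fields` helper and the pair-based representation are assumptions made for the example, and the marker handling only recognises a single unquoted `variant == name` clause, as in the Scipy example above; a real implementation would need a proper marker parser.

```python
import re
from typing import List, Tuple

# Assumption: markers use the unquoted "variant == name" spelling shown above.
_VARIANT_CLAUSE = re.compile(r"variant\s*==\s*([A-Za-z0-9_.-]+)")

Field = Tuple[str, str]


def _features(variant: str) -> set:
    return {part.strip() for part in variant.split(",") if part.strip()}


def wheel_fields(source_fields: List[Field], wheel_variant: str) -> List[Field]:
    """Filter source metadata down to what a wheel for wheel_variant would carry."""
    wheel_features = _features(wheel_variant)
    kept = []
    for field, value in source_fields:
        if field == "Provides-Variant":
            # Only the variant matching the wheel is kept.
            if _features(value) != wheel_features:
                continue
        elif field == "Requires-Dist-Variant":
            # Drop entries whose variant clause does not match this wheel.
            _, _, marker = value.partition(";")
            clause = _VARIANT_CLAUSE.search(marker)
            if clause and clause.group(1) not in wheel_features:
                continue
        # Everything else, including Default-Variant, passes through unchanged.
        kept.append((field, value))
    return kept
```

Feeding it the Scipy declarations above with `wheel_variant="mkl"` keeps only the `mkl` entries, while an empty `wheel_variant` (the default variant) drops all four variant-specific lines, so the default wheel carries nothing an older client would need to understand.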

For backwards compatibility, wheels for the default variant would omit the variant portion of the wheel name (however that ends up being spelled).

Installation METADATA file

Matches the wheel metadata (and is the actual reason for defining the wheel metadata that way).

Installation process

If a dependency set ends up requiring a specific build variant of an already installed package, other than the one that is currently installed, the behaviour would depend on whether package upgrades/replacements were allowed or not (similar to what happens when an installed package doesn’t meet the determined version requirements).

If replacement is allowed, swap out the installed variant for a variant that satisfies the requirement set. If replacement is not allowed, fail the install and report the conflict.
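The decision itself is small enough to sketch. Everything here is hypothetical naming (`Action`, `plan_action`), and the sketch assumes the requested variant has already been resolved, so `None` means no variant constraint, the empty string means the anonymous default variant, and any name() shorthand has already been translated via Default-Variant.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"        # installed variant already satisfies the requirement
    REPLACE = "replace"  # swap in a variant that satisfies the requirement set
    FAIL = "fail"        # report the conflict and abort the install


def _features(variant: str) -> frozenset:
    return frozenset(part.strip() for part in variant.split(",") if part.strip())


def plan_action(installed: str, requested: Optional[str], allow_replacement: bool) -> Action:
    """Decide what to do with an already installed package."""
    if requested is None:
        return Action.KEEP  # no variant constraint at all
    wanted = _features(requested)
    have = _features(installed)
    # An empty request means the anonymous default variant; otherwise the
    # installed variant must provide every requested feature.
    satisfied = wanted <= have if wanted else not have
    if satisfied:
        return Action.KEEP
    return Action.REPLACE if allow_replacement else Action.FAIL
```

For instance, with `openblas` installed and `mkl` requested, `plan_action("openblas", "mkl", True)` gives `Action.REPLACE`, while the same call with `allow_replacement=False` gives `Action.FAIL`.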
