I wanted to put a proposal out here and solicit some advice. This proposal is mostly based on the top post, which is why I’m posting in this thread.
There are three parts to this proposal, which combine other proposals we’ve seen in this and other threads.
Packages:
- Contain a record of variants (metadata provider programs, keys and values) that were used for producing the packages. These are namespaced, like metadata-provider:variable=value
- Hash the record of variants, and use this hash in the build tag for differentiating variants
Example filename: myproject-1.2.3-h8d84c3-py3-abi3-linux_x86_64
On the hosting side
I propose a file, variants.json, that the repo/index software is responsible for creating. When a wheel file is uploaded, the server uses the record in the package to create or update the variants.json file. That file looks like:
{
  "8d84c3": {
    "provider_abc:variable1": 123,
    "provider_xyz:variable2": 123
  },
  "b5670a": {
    "provider_xyz:variable2": 123
  }
}
The hash keys there are just the first 6 characters (the length is arbitrary) of the SHA-256 hash of the dict values as strings.
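To make the hashing step concrete, here is a minimal sketch of how a variant record could be turned into a short key. The serialization is my assumption, since only "dict values as strings" is pinned down so far: I sort the namespaced keys and join them as key=value lines so the result does not depend on dict order. The function name variant_hash is hypothetical.

```python
import hashlib


def variant_hash(record: dict, length: int = 6) -> str:
    """Return a short hex key for a variant record.

    Assumption: the record is serialized as sorted "key=value" lines,
    with every value converted to a string before hashing.
    """
    canonical = "\n".join(f"{key}={value}" for key, value in sorted(record.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # The truncation length is arbitrary; 6 matches the examples in this post.
    return digest[:length]


record = {"provider_abc:variable1": 123, "provider_xyz:variable2": 123}
print(variant_hash(record))
```

Whatever serialization is chosen, the index and the client would have to agree on it exactly, otherwise the hashes computed on each side would never match.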
This file does not list any individual files. Instead, it is a map of available variants, and a way to associate meaning to the hashes (by associating the hash key to human-readable info about the input content).
On the client’s system
There will be a file that specifies the state of variants that tools should find/install. Right now, I think it lives at the environment level, in line with @barry’s thoughts. That file might look something like:
[providers.provider_abc]
version = "1.0.0"
[providers.provider_abc.variables.variable1]
description = "This is a description"
values = ["123", "456"]
[providers.provider_xyz]
version = "1.0.0"
[providers.provider_xyz.variables.variable2]
description = "This is a description"
values = ["primary", "secondary"]
This file could be generated manually, or with some combination of hardware detection programs (the standalone executables that have been mentioned might be relevant here). The values here do not have to come from the set of values that are available remotely, but of course if they don’t, then that variant is considered unavailable.
This file would be “compiled” locally into a cached collection of hashes, exhaustively hashing every combination of variables and values. The ordering of that collection would prioritize:
- Combinations with more variables (more specific variants)
- Position of the provider/variable entry in the variants.toml file
- Sorting based on order in the list of values provided in the variants.toml file
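The "compile" step with the three ordering rules above could be sketched roughly like this. It assumes the variants.toml content has already been parsed into a dict (for example with tomllib on Python 3.11+), and it reuses the same assumed sorted key=value serialization for hashing. The names compile_hashes and variant_hash are hypothetical.

```python
import hashlib
from itertools import combinations, product


def variant_hash(record: dict, length: int = 6) -> str:
    # Assumed serialization: sorted "key=value" lines, values as strings.
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(record.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def compile_hashes(config: dict) -> list:
    """Return hashes for every combination, most preferred first."""
    # Flatten to [(namespaced key, [values])], preserving file order.
    variables = []
    for provider, pdata in config["providers"].items():
        for name, vdata in pdata.get("variables", {}).items():
            variables.append((f"{provider}:{name}", vdata["values"]))

    ordered = []
    # 1. Combinations with more variables come first.
    for size in range(len(variables), 0, -1):
        # 2. Subsets are enumerated in file order of their variables.
        for subset in combinations(variables, size):
            keys = [key for key, _ in subset]
            # 3. Values are enumerated in the order they are listed.
            for values in product(*(vals for _, vals in subset)):
                ordered.append(variant_hash(dict(zip(keys, values))))
    return ordered


config = {
    "providers": {
        "provider_abc": {"version": "1.0.0", "variables": {
            "variable1": {"values": ["123", "456"]}}},
        "provider_xyz": {"version": "1.0.0", "variables": {
            "variable2": {"values": ["primary", "secondary"]}}},
    }
}
print(compile_hashes(config))
```

For the example file this yields 8 hashes: 4 for the two-variable combinations, then 2 for each single variable. The count grows multiplicatively with the number of variables and values, so a real implementation might need to cap or prune the space.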
When a tool wants to look up matching variants, it:
- fetches package/variants.json
- loads the local collection of pre-computed hashes
- does a set intersection of the remotely available variants with the locally available hashes
- sorts the result according to the order in the locally available hashes
- retrieves the file listing, which is named according to the hash: package/files-8d84c3 or similar. This file listing would follow the existing PEP 503 and/or PEP 691 standards, but would only show files for that variant. This is similar to filtering by filename using the build string, if it makes more sense to have a flat hierarchy, but then the variant files may confuse “normal” package resolution.
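The lookup steps above, minus the actual network fetches, might look something like the following sketch. It takes the text of variants.json and the locally compiled hash list as inputs. The function names and the files-<hash> path layout are only the placeholders used in this post, not an existing index API.

```python
import json


def select_variants(variants_json: str, local_hashes: list) -> list:
    """Return remotely available variant hashes, in local preference order."""
    remote = set(json.loads(variants_json))  # top-level keys are the hashes
    # Rank each local hash by its first position in the compiled list.
    rank = {}
    for position, value in enumerate(local_hashes):
        rank.setdefault(value, position)
    matches = remote.intersection(rank)
    return sorted(matches, key=rank.__getitem__)


def listing_path(package: str, variant: str) -> str:
    # Hypothetical endpoint layout: one file listing per variant hash.
    return f"{package}/files-{variant}"


remote_text = """
{
  "8d84c3": {"provider_abc:variable1": 123, "provider_xyz:variable2": 123},
  "b5670a": {"provider_xyz:variable2": 123}
}
"""
local = ["b5670a", "ffffff", "8d84c3"]
for variant in select_variants(remote_text, local):
    print(listing_path("package", variant))
```

A tool would then walk the returned list in order and request each listing until it finds files that match.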
What I’m not clear on is:
- Is it feasible to add these new endpoints (one for variants.json, another for the per-variant files listing)?
- If the variant matches, but there end up being no packages that match the user system’s platform tags, what recourse is there to either find other variants or fall back to the no-variant listing?
I’m playing with an implementation that I’ll post soon. I don’t know if the repo stuff (extra files/endpoints) I’m doing is going to be viable, but hopefully it will at least be good food for thought.