Publishing manylinux install tag heuristics as a dedicated PyPI package?

ncoghlan · July 26, 2019, 2:03pm

Continuing the discussion from The next manylinux specification:

The idea of moving the build profile definitions out of pip core in order to ease future rollouts is an interesting one, though. That could be done in a couple of different ways:

amending the simple API to include an extra metadata file (potentially tricky to evolve over time, but theoretically no harder than evolving a Python API. Definitely harder to roll out though, since we’d need to update all simple API implementations, not just PyPI)

creating and publishing a new “wheel_build_profiles” package that serves as a perennially updated map from wheel naming conventions to installation target checking heuristics.

Updating “wheel_build_profiles” to a new version could then be as routine a task as updating to a new timezone database, such that even highly conservative distros would be willing to keep it up to date.

s/wheel_build_profiles/manylinux_wheel_tags/

This idea is technically a competitor to the perennial manylinux proposal, since it would aim to provide at least some of the same benefits, but in a different way.

Summarising the core problem that both ideas attempt to tackle:

one of the delays in rolling out new versions of the manylinux spec is that it requires end users to upgrade to a newer version of pip in order to consume wheels targeting the newer baseline
because pip is still aiming to improve its default behaviour in various areas, a lot of end users and distributors are quite cautious about pip major upgrades, since they may bring new deprecations and changes to default settings
this means that publishers may delay targeting the newer baseline solely due to their users using the older pip version, rather than due to their users actually running older incompatible Linux distros
it also means that if publishers do switch to targeting the newer baseline, then some of their users may no longer be able to install the published wheel archives

The perennial manylinux concept encodes the tuning parameters for a compatibility checking heuristic directly into the archive filename. This allows a new iteration of an existing heuristic to be published and remain compatible with existing versions of pip, but the definition and rollout of new heuristics would still require a pip upgrade.
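To make "tuning parameters in the filename" concrete, here is a minimal sketch. It assumes a tag shape such as manylinux_2_17_x86_64 (glibc major, glibc minor, then architecture), which this thread does not itself spell out. The function names are hypothetical and are not part of pip or packaging.

```python
import re
from typing import Optional, Tuple

# Assumed tag shape: manylinux_<glibc major>_<glibc minor>_<arch>
_PERENNIAL = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def parse_perennial_tag(tag: str) -> Optional[Tuple[Tuple[int, int], str]]:
    """Return ((major, minor), arch) for a perennial-style tag, else None."""
    match = _PERENNIAL.match(tag)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2))), match.group(3)


def is_compatible(tag: str, system_glibc: Tuple[int, int], system_arch: str) -> bool:
    """Check a tag against the running system using only data in the tag."""
    parsed = parse_perennial_tag(tag)
    if parsed is None:
        return False
    required_glibc, arch = parsed
    return arch == system_arch and system_glibc >= required_glibc
```

Because the required glibc version is read from the tag, a wheel built against a newer baseline needs no installer change. A heuristic based on something other than glibc would still need new code.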

There’s a different way of looking at this though, which focuses on the question “Why are folks reluctant to upgrade pip?”. The answer is that it’s a complex piece of software with a lot of technical debt still to be paid down, so switching to a new version is something folks will often want to allow a bit of testing time for first.

Looking at the question that way leads to the follow-on question: what if there was a dedicated package on PyPI that just generated an ordered sequence of manylinux tags for a particular target system, and the logic in packaging.tags was updated to use it, rather than embedding the logic directly?

For reference, that logic can be found at packaging/packaging/tags.py at 5ef37d364bfb18461da28fd667e4d538ba67255b · pypa/packaging · GitHub, and the key point is that when run on the target system, it generates an ordered sequence of manylinux platform tags.
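As a rough illustration of what "an ordered sequence of manylinux platform tags" means, here is a simplified sketch. It is not the actual packaging.tags code: the function name is hypothetical, the glibc version is passed in rather than detected, and per-architecture restrictions (manylinux1 is only defined for x86_64 and i686) are ignored.

```python
from typing import Iterator, Tuple

# Minimum glibc version each manylinux profile is defined against,
# listed newest first so the newest usable profile is preferred.
_PROFILES = [
    ("manylinux2014", (2, 17)),
    ("manylinux2010", (2, 12)),
    ("manylinux1", (2, 5)),
]


def manylinux_tags(glibc_version: Tuple[int, int], arch: str) -> Iterator[str]:
    """Yield platform tags for the target system, most preferred first."""
    for name, minimum_glibc in _PROFILES:
        if glibc_version >= minimum_glibc:
            yield f"{name}_{arch}"
    # The plain linux tag is the last resort.
    yield f"linux_{arch}"
```

For a system with glibc 2.12 on x86_64, this yields manylinux2010_x86_64, manylinux1_x86_64 and then linux_x86_64. Supporting a new profile means adding a row to the table, which is the kind of change a small standalone package could ship on its own schedule.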

Right now, pip always uses its vendored copy of packaging.tags to calculate the linux tags, but it would be possible to amend that to say:

If the target environment includes a newer copy of manylinux_install_tags, use that to generate the list;
Otherwise use the vendored copy
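The two-step fallback above could look roughly like the following. This is only a sketch: manylinux_install_tags does not exist yet, and the VERSION attribute used to decide which copy is "newer" is an assumption made for illustration.

```python
import importlib


def load_tag_generator(vendored, module_name="manylinux_install_tags"):
    """Return the installed tag module if it is newer, else the vendored one.

    Both objects are assumed to expose a comparable VERSION tuple
    (a hypothetical convention, not an existing API).
    """
    try:
        external = importlib.import_module(module_name)
    except ImportError:
        # Nothing installed in the target environment: use the vendored copy.
        return vendored
    external_version = getattr(external, "VERSION", (0,))
    vendored_version = getattr(vendored, "VERSION", (0,))
    if external_version > vendored_version:
        return external
    return vendored
```

Comparing versions rather than always preferring the installed copy means a stale manylinux_install_tags cannot downgrade the heuristics that ship with a newer pip.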

The potential benefit of this approach over encoding heuristic tuning parameters directly in the wheel filename is that it would also allow the introduction of new heuristics (such as ones based on the contents of /etc/os-release), without requiring a full pip upgrade (just an upgrade of manylinux_install_tags).

If we’re able to get manylinux_install_tags to be seen as a data file update (akin to folks updating pytzdata), then the barriers to getting compatible systems updated to accept the new tags should be much lower than they are now.

With this approach in place, even switching over to the perennial manylinux wheel tagging concept would no longer technically require a pip update - just a newer version of manylinux_install_tags.

dholth · July 26, 2019, 3:48pm

The idea was there from the start. “Installers are also recommended to provide a way to configure and re-order the list of allowed compatibility tags; for example, a user might accept only the *-none-any tags to only download built packages that advertise themselves as being pure Python.”

I think the right way would be to have an entry point that accepts the entire list of default tags, and returns an iterable producing the new list. The no-op filter would be def customize_tags(default_tags): return default_tags.

If you had more than one filter, would it be important to be able to order them?
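A small sketch of the filter idea, which also shows why ordering can matter. Entry point discovery is left out, and every name other than customize_tags is hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, List

TagFilter = Callable[[Iterable[str]], Iterable[str]]


def customize_tags(default_tags: Iterable[str]) -> Iterable[str]:
    """The no-op filter: hand the default tags back unchanged."""
    return default_tags


def pure_python_only(tags: Iterable[str]) -> Iterable[str]:
    """Keep only tags advertising a pure Python build."""
    return (tag for tag in tags if tag.endswith("-none-any"))


def first_two(tags: Iterable[str]) -> Iterable[str]:
    """Keep only the two most preferred tags."""
    return islice(tags, 2)


def apply_filters(default_tags: Iterable[str], filters: List[TagFilter]) -> List[str]:
    """Run each filter in turn, feeding its output to the next one."""
    tags = default_tags
    for tag_filter in filters:
        tags = tag_filter(tags)
    return list(tags)


DEFAULT_TAGS = [
    "cp37-cp37m-manylinux1_x86_64",
    "cp37-cp37m-linux_x86_64",
    "py3-none-any",
]
```

With these definitions, apply_filters(DEFAULT_TAGS, [pure_python_only, first_two]) keeps py3-none-any, while the reverse order returns an empty list because the pure Python tag has already been cut. So if more than one filter can be registered, the order in which they run is part of the behaviour.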

cjerdonek · July 26, 2019, 5:51pm

Note that pip isn’t quite using a vendored copy of that module, but rather an old copy? See pip/src/pip/_internal/pep425tags.py at 4a7b345cc518cecbdc817e503391c231a6e967ac · pypa/pip · GitHub, so it would be a task to switch pip to using that. I haven’t checked to see if it’s in pip’s tracker.

brettcannon · July 26, 2019, 6:53pm

Actually packaging.tags is my re-implementation of the logic to replace pep425tags. And packaging hasn’t done a new release with packaging.tags included in it yet.