PEP 440: Relative ordering between public and local version identifiers

spoolar · December 18, 2021, 11:23am

TL;DR; How does version string 1.0 compare to 1.0+abc? Which one would get downloaded if both were available and you only specified the name of a package? Would it be deterministic?

Hello, I’ve read PEP 440 and I couldn’t find an answer to the question: how do public version identifiers compare to local version identifiers, especially when both have the same “base version”, e.g. 1.10.0 and 1.10.0+cpu?

There is a section called “Summary of permitted suffixes and relative ordering”, but it doesn’t mention anything about local version identifiers, only how to compare two public version identifiers.

Further, the section defining local version identifiers explains how to compare local version labels of two local version identifiers and it doesn’t explain how they compare to public version identifiers.

There are only two places in PEP 440 which implicitly suggest how to compare public and local version identifiers.

The first one is the example at the end of section “Summary of permitted suffixes and relative ordering”. You can see that 1.0 < 1.0+abc.5 < 1.0.post456. This suggests that if you want to compare a public version identifier with a local one, you have to strip away the local version label and compare remaining strings as if both were public version identifiers:

if they compare equal, then the local version identifier compares greater than the public one by default,
otherwise, the ordering between both identifiers is determined by the result of that comparison.

In other words, the ordering is deterministically 1.0 < 1.0+abc < 1.0.post1 < 1.1 < 1.1+abc < 1.1.post1.
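
The ordering inferred above can be checked with the third-party packaging library (assumed here to be installed; it is not part of the standard library). This is only a minimal sketch: it shuffles the identifiers from the example and sorts them again using Version objects as sort keys.

```python
from packaging.version import Version

# Identifiers from the example above, deliberately out of order.
identifiers = ["1.1+abc", "1.0.post1", "1.1", "1.0+abc", "1.1.post1", "1.0"]

# Version objects implement PEP 440 ordering, so they can serve as sort keys.
ordered = sorted(identifiers, key=Version)
print(ordered)
```

If the inferred rule is right, the printed list should match the sequence given above, with each local version placed directly after its public counterpart and before the next post-release.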

Another place that relates to the question is a sentence (near the end of subsection “Version matching”; emphasis mine):

If present, the development release segment is always the final segment in the public version, and the local version is ignored for comparison purposes, so using either in a prefix match wouldn’t make any sense.

Note that the exact phrase used is local version, not local version identifier or local version label. This allows two interpretations:

it refers to a label; it means that you must strip labels away before making comparisons. If that’s true, then it means that 1.0 == 1.0+abc < 1.0.post1 < 1.1 == 1.1+abc < 1.1.post1.
it refers to an identifier; it means that packages with local version identifiers are completely ignored and they won’t get downloaded(?), unless you provide a full identifier when downloading a package (e.g. pip install foobar -f <url> will download version 1.0 and pip install foobar==1.0+abc -f <url> will download 1.0+abc).

I’ve done the following empirical test:

pip install torch --find-links https://download.pytorch.org/whl/cpu/torch_stable.html

It downloaded 1.10.1+cpu, even though 1.10.1 was also available, contradicting my second interpretation.

So, did I miss or misunderstand something in PEP 440 or is the answer actually not there?

CAM-Gerlach · December 20, 2021, 2:34am

Perhaps this might be better suited to the packaging section, and I’m just basing this off skimming the PEP and confirming this for myself with packaging, but here’s my analysis.

You can test this yourself in the de facto modern reference implementation of PEP 440, the packaging library. You can use packaging.version.parse to parse versions, which can then be compared for equality, precedence, etc. The standard library distutils also has the similar distutils.version.StrictVersion, but it is deprecated and has not been kept up to date, so packaging is preferred.

You can see that

parse("1.0") < parse("1.0+abc") < parse("1.0.post1")

compares truthy, as does

parse("1.0") < parse("1.0+abc.5") < parse("1.0.post456")

and, as you’d expect,

parse("1.0") != parse("1.0+abc")

The sorting within local version labels is described in the PEP:

Comparison and ordering of local versions considers each segment of the local version (divided by a .) separately. If a segment consists entirely of ASCII digits then that section should be considered an integer for comparison purposes and if a segment contains any ASCII letters then that segment is compared lexicographically with case insensitivity. When comparing a numeric and lexicographic segment, the numeric section always compares as greater than the lexicographic segment. Additionally a local version with a great number of segments will always compare as greater than a local version with fewer segments, as long as the shorter local version’s segments match the beginning of the longer local version’s segments exactly.

And indeed, as you’d expect,

parse("1.0+abc") < parse("1.0+def")

is truthy, and so too is

parse("1.0+abc") < parse("1.0+abc.1") < parse("1.0+abc.5")
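
The segment rules quoted above can also be sketched in plain Python, without any third-party library. This is an illustrative sort key, not the code packaging actually uses; the function name local_label_key is made up for this example, and it assumes the label is already normalized to use "." as the separator.

```python
def local_label_key(label: str) -> tuple:
    """Build a sort key for a local version label such as "abc.5".

    Illustrative only: numeric segments compare as integers and always
    rank above lexicographic segments, which compare case-insensitively.
    """
    key = []
    for segment in label.split("."):
        if segment.isascii() and segment.isdigit():
            # Numeric segment: the leading 1 ranks it above any text segment.
            key.append((1, int(segment), ""))
        else:
            # Lexicographic segment: the leading 0 ranks it below numbers.
            key.append((0, 0, segment.lower()))
    # Tuple comparison makes a longer label with a matching prefix greater.
    return tuple(key)


labels = ["abc.5", "1", "abc", "def", "abc.1", "10", "9"]
print(sorted(labels, key=local_label_key))
```

Because Python compares tuples element by element and treats a shorter tuple with an identical prefix as smaller, the "more segments wins" rule falls out of the data structure without extra code.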

The sentence you quoted from “Version matching” refers to prefix matching with version specifiers, not comparison of version identifiers. Version specifiers are what you include in install_requires and requirements.txt files to specify your dependencies, not the version identifier stored in the core metadata of your package (e.g. version in setup.cfg, pyproject.toml, etc). This means that for matching purposes, the version specifier == 1.0.* would match 1.0+abc. You can also see this in the code. So if you pass pip install torch==1.10.1.*, it will still match torch 1.10.1+cpu.
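
To try the prefix-matching behaviour directly, here is a small sketch using packaging.specifiers.SpecifierSet (again assuming packaging is installed). The version numbers are the torch ones from this thread, used purely as sample strings.

```python
from packaging.specifiers import SpecifierSet

# A prefix match: any 1.10.1 release, whatever follows the prefix.
prefix_spec = SpecifierSet("==1.10.1.*")

# The local version label is ignored when checking a prefix match.
matches_public = prefix_spec.contains("1.10.1")
matches_local = prefix_spec.contains("1.10.1+cpu")
matches_other = prefix_spec.contains("1.10.2+cpu")

print(matches_public, matches_local, matches_other)
```

The first two checks should succeed and the third should fail, since only the public part of the candidate is compared against the prefix.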

However, to note, this is technically prohibited by PEP 440, as these labels are intended for downstream integrators (Linux distros, Conda, etc), not PyPI:

As the Python Package Index is intended solely for indexing and hosting upstream projects, it MUST NOT allow the use of local version identifiers.

However, it would seem that this is not actually strictly enforced as of the release of that version.

Does this help clarify things for you?

CAM-Gerlach · December 20, 2021, 8:49pm

Thanks @brettcannon for the move to a more appropriate category.

I realized I missed explicitly answering your TL;DR

Taking your question a bit too literally, the version string comparison "1.0" < "1.0+abc" is truthy. But pedantry aside, the version identifier 1.0 compares less than 1.0+abc, and conversely the version identifier 1.0+abc compares greater than 1.0. I.e. packaging.version.parse("1.0") < packaging.version.parse("1.0+abc").
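
The difference between comparing version strings and comparing version identifiers matters more than this one case suggests. As a short sketch (assuming packaging is installed), plain string comparison happens to agree for 1.0 and 1.0+abc, but it goes wrong as soon as a release segment has more than one digit:

```python
from packaging.version import Version

# Strings compare character by character, which happens to agree here.
string_agrees = "1.0" < "1.0+abc"
version_agrees = Version("1.0") < Version("1.0+abc")

# As strings, "1.10" sorts before "1.9" because "1" < "9" at the third character.
string_says_older = "1.10" < "1.9"
# As versions, the release segments compare as integers: 10 is not less than 9.
version_says_older = Version("1.10") < Version("1.9")

print(string_agrees, version_agrees, string_says_older, version_says_older)
```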

Assuming all other package metadata (or specifically, that affecting the solve) were identical (which very well may not be the case, particularly in the torch example you cited), and both were the highest current version available, and neither was yanked, and both had wheels for your platform, and there were no other local versions of 1.0, and they both were a valid solve for your Python version, current environment and supported wheel tags, and you were using the latest pip version to install, you’d get 1.0+abc. Basically, TL;DR, everything else equal, 1.0+abc.

As far as I am aware, yes, to the extent the pip solver is deterministic, but if and only if you fully specify all the variables and constraints above, and they do not change.

spoolar · March 2, 2022, 12:35am

I’ve just realized that if you run

# first scenario
pip install -U torch==1.10.2
pip install -U torch==1.10.2 --find-links https://download.pytorch.org/whl/cpu/torch_stable.html

then torch will get downloaded twice (versions 1.10.2 and 1.10.2+cpu, respectively). However, if you run those commands in the reverse order, i.e.

# second scenario
pip install -U torch==1.10.2 --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
pip install -U torch==1.10.2

then torch will get downloaded only once (version 1.10.2+cpu), even though in both scenarios you can see “Requirement already satisfied” message in the output of the second command. Could you help me understand it?

This quote from PEP 440 - [exact] version matching confuses me:

If the specified version identifier is a public version identifier (no local version label), then the local version label of any candidate versions MUST be ignored when matching versions.

My initial interpretation of that quote is that torch should be downloaded only once in the first scenario, not twice.
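
The quoted matching rule can be exercised with packaging.specifiers.Specifier (assuming packaging is installed). This sketch only demonstrates matching; it says nothing about which matching candidate an installer would then choose.

```python
from packaging.specifiers import Specifier

public_pin = Specifier("==1.10.2")      # no local version label
local_pin = Specifier("==1.10.2+cpu")   # includes a local version label

# A public pin ignores the candidate's local label, so both candidates match.
public_matches_public = public_pin.contains("1.10.2")
public_matches_local = public_pin.contains("1.10.2+cpu")

# A pin with a local label must match the label exactly.
local_matches_public = local_pin.contains("1.10.2")
local_matches_local = local_pin.contains("1.10.2+cpu")

print(public_matches_public, public_matches_local)
print(local_matches_public, local_matches_local)
```

So torch==1.10.2 is satisfied by both 1.10.2 and 1.10.2+cpu, which is consistent with the “Requirement already satisfied” message appearing in both scenarios.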

This is the only way I could explain it:

first, pip makes a list of all available versions online and sorts them based on “relative ordering” rules described in PEP 440; it means that 1.10.2+cpu is placed higher on the list than 1.10.2 (based on discussions above),
then, pip goes one-by-one through the list and strips the “local label” from every version on that list (so 1.10.2+cpu temporarily becomes 1.10.2 and now there are two entries for the same version, but the CPU one is still higher) and compares versions to the provided version string,
when pip comes across a version that compares equal, it then checks some kind of hash of that package and checks if it appears in the local cache. If yes, it uses the cached package, otherwise it downloads it. Since 1.10.2 and 1.10.2+cpu have different hashes, it decides to download it a second time in the first scenario.

Is that a correct explanation? If so, why doesn’t pip first look at the list of available packages locally and conclude that 1.10.2 is already available?

Additionally, why do both scenarios show the message “Requirement already satisfied: torch==1.10.2”, while pip still downloads torch a second time in the first scenario?
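
The hypothesis in this post can be written down as a small model. To be clear, this is not pip's code: best_candidate and would_upgrade are hypothetical helpers, the lists of available versions are illustrative, and the rule "upgrade only if the best match sorts higher than what is installed" is an assumption that merely happens to reproduce both observed scenarios.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version


def best_candidate(available, requirement):
    """Return the highest available version that satisfies the requirement."""
    spec = SpecifierSet(requirement)
    matching = [Version(v) for v in available if spec.contains(v)]
    return max(matching)  # raises ValueError if nothing matches


def would_upgrade(installed, available, requirement):
    """Hypothetical rule: replace only if the best match sorts higher."""
    return best_candidate(available, requirement) > Version(installed)


# Illustrative candidate lists (assumed, not fetched from any index).
pypi_only = ["1.10.1", "1.10.2"]
pypi_plus_find_links = pypi_only + ["1.10.1+cpu", "1.10.2+cpu"]

# First scenario, second command: 1.10.2 installed, +cpu builds now visible.
first = would_upgrade("1.10.2", pypi_plus_find_links, "==1.10.2")
# Second scenario, second command: 1.10.2+cpu installed, only PyPI visible.
second = would_upgrade("1.10.2+cpu", pypi_only, "==1.10.2")

print(first, second)
```

Under this model the first scenario triggers a second download because 1.10.2+cpu both matches ==1.10.2 and sorts above the installed 1.10.2, while in the second scenario nothing on offer sorts above the installed 1.10.2+cpu.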

CAM-Gerlach · March 2, 2022, 12:45am

It’s a little unclear what you mean by downloaded; could you provide the full output to illustrate?

It seems probable that the difference is ultimately the result of local version identifiers sorting higher than non-local ones, as your more detailed hypothesized mechanism speculates, perhaps combined with the fact that the second install is from a third party index which doesn’t expose the package metadata without downloading the full package. The full output might help clarify, though ultimately to confirm this one will need to dig into the code, and perhaps summon a pip maintainer (which I can do).

spoolar · March 2, 2022, 1:24am

(Note: I forgot to add the -U flag to all commands in my previous reply. I’ve fixed it just now.)

I tested those scenarios using Docker.

This is the Dockerfile:

FROM python:3.8-slim-buster
WORKDIR /var/workdir
COPY . .

I build it like this: docker build -t customimage .

There are two scripts:

script1.sh:

#!/bin/sh
pip install -U torch==1.10.2
pip install -U torch==1.10.2 --find-links https://download.pytorch.org/whl/cpu/torch_stable.html

script2.sh:

#!/bin/sh
pip install -U torch==1.10.2 --find-links https://download.pytorch.org/whl/cpu/torch_stable.html
pip install -U torch==1.10.2

And I run them like this (I made a shortcut script run.sh, seen on screenshots):

docker run -it --rm customimage ./script1.sh

[Screenshot from 2022-03-02 02-17-30]

docker run -it --rm customimage ./script2.sh

[Screenshot from 2022-03-02 02-17-39]

I hope that better explains my issue. By “downloaded twice” I meant that I could see two loading bars and it took longer to download than in the other scenario (slow internet connection).

pf_moore · March 2, 2022, 10:40am

This is starting to sound less like a general question about how version ordering works (that question appears to have been answered) and more like a report of a possible pip bug. So would it be better to move the pip-related part of the discussion here onto the pip tracker?

spoolar · March 2, 2022, 12:28pm

Of course, if you think it’s better to move it elsewhere, then we can do that. It’s just that when I noticed this behavior, I thought that I misunderstood something about relative ordering and @CAM-Gerlach’s explanations and wanted to confirm if that’s the case.

Btw, how do I move this part to the pip tracker? And where do I find the pip tracker?

CAM-Gerlach · March 2, 2022, 2:11pm

Copy and paste the relevant parts of your posts into the relevant parts of the appropriate pip issue template. They both use GFM, so you shouldn’t need to change the syntax.

Sidenote—pasting text with the appropriate formatter (e.g. ```console) is almost always preferable to screenshots.

Google is a pretty good way to find it: search for “pip bug tracker”.