PEP 440: Relative ordering between public and local version identifiers

TL;DR: How does the version string 1.0 compare to 1.0+abc? Which one would get downloaded if both were available and you only specified the name of a package? Would it be deterministic?

Hello, I’ve read PEP 440 and I couldn’t find an answer to the question: how do public version identifiers compare to local version identifiers, especially when both have the same “base version”, e.g. 1.10.0 and 1.10.0+cpu?

There is a section called “Summary of permitted suffixes and relative ordering”, but it doesn’t mention anything about local version identifiers, only how to compare two public version identifiers.

Further, the section defining local version identifiers explains how to compare the local version labels of two local version identifiers, but it doesn’t explain how they compare to public version identifiers.

There are only two places in PEP 440 which implicitly suggest how to compare public and local version identifiers.

The first one is the example at the end of section “Summary of permitted suffixes and relative ordering”. You can see that 1.0 < 1.0+abc.5 < 1.0.post456. This suggests that if you want to compare a public version identifier with a local one, you have to strip away the local version label and compare remaining strings as if both were public version identifiers:

  • if they compare equal, then the local version identifier compares greater than the public one by default,
  • otherwise, the ordering between both identifiers is determined by the result of that comparison.

In other words, the ordering is deterministically 1.0 < 1.0+abc < 1.0.post1 < 1.1 < 1.1+abc < 1.1.post1.
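My reading of that rule can be sketched in Python and checked against the packaging library. This is only a sketch: it assumes packaging is installed, and interpretation_key is a hypothetical helper name of mine, not something the PEP or any library defines.

```python
from packaging.version import Version


def interpretation_key(text):
    """Hypothetical sort key: compare the public parts first,
    then let a version with a local label win a tie."""
    version = Version(text)
    # Version.public is the version string without the "+local" part;
    # Version.local is None when there is no local label.
    return (Version(version.public), version.local is not None)


candidates = ["1.1.post1", "1.0+abc", "1.1", "1.0.post1", "1.1+abc", "1.0"]

by_interpretation = sorted(candidates, key=interpretation_key)
by_packaging = sorted(candidates, key=Version)

print(by_interpretation)
print(by_interpretation == by_packaging)
```

If the two orderings agree, the interpretation is at least consistent with what packaging does for this small set of versions.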

Another place that relates to the question is a sentence (near the end of subsection “Version matching”; emphasis mine):

If present, the development release segment is always the final segment in the public version, and the local version is ignored for comparison purposes, so using either in a prefix match wouldn’t make any sense.

Note that the exact phrase used is local version, not local version identifier or local version label. That leaves room for two interpretations:

  • it refers to a label; it means that you must strip labels away before making comparisons. If that’s true, then it means that 1.0 == 1.0+abc < 1.0.post1 < 1.1 == 1.1+abc < 1.1.post1.
  • it refers to an identifier; it means that packages with local version identifiers are completely ignored and they won’t get downloaded(?), unless you provide a full identifier when downloading a package (e.g. pip install foobar -f <url> will download version 1.0 and pip install foobar==1.0+abc -f <url> will download 1.0+abc).

I’ve done the following empirical test:

pip install torch --find-links

It downloaded 1.10.1+cpu, even though 1.10.1 was also available, contradicting my second interpretation.

So, did I miss or misunderstand something in PEP 440 or is the answer actually not there?

Perhaps this might be better suited to the packaging section, and I’m just basing this off skimming the PEP and confirming this for myself with packaging, but here’s my analysis.

You can test this yourself in the de facto modern reference implementation of PEP 440, the packaging library. You can use packaging.version.parse to parse versions, which can then be compared for equality, precedence, etc. The standard library’s distutils also has the similar distutils.version.StrictVersion, but it is deprecated and has not been kept up to date, so packaging is preferred.

You can see that

parse("1.0") < parse("1.0+abc") < parse("1.0.post1")

compares truthy, as does

parse("1.0") < parse("1.0+abc.5") < parse("1.0.post456")

and, as you’d expect,

parse("1.0") != parse("1.0+abc")
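If you want to run those checks as a script rather than in a REPL, something like the following should work, assuming packaging is installed in your environment. The variable names are just for illustration.

```python
from packaging.version import parse

public = parse("1.0")
local = parse("1.0+abc")
post = parse("1.0.post1")

# Chained comparison: public release < local version < post-release.
print(public < local < post)

# The example from the PEP's "Summary of permitted suffixes and relative ordering".
print(parse("1.0") < parse("1.0+abc.5") < parse("1.0.post456"))

# A local version is not equal to its public counterpart.
print(public != local)

# The parsed object exposes the public part and the local label separately.
print(local.public, local.local)
```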

The sorting within local version labels is described in the PEP:

Comparison and ordering of local versions considers each segment of the local version (divided by a .) separately. If a segment consists entirely of ASCII digits then that section should be considered an integer for comparison purposes and if a segment contains any ASCII letters then that segment is compared lexicographically with case insensitivity. When comparing a numeric and lexicographic segment, the numeric section always compares as greater than the lexicographic segment. Additionally a local version with a great number of segments will always compare as greater than a local version with fewer segments, as long as the shorter local version’s segments match the beginning of the longer local version’s segments exactly.

And indeed, as you’d expect,

parse("1.0+abc") < parse("1.0+def")

is truthy, and so too is

parse("1.0+abc") < parse("1.0+abc.1") < parse("1.0+abc.5")
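To make the quoted rule concrete, here is a rough pure-Python sketch of a sort key for local version labels. It is my own illustration of the rule as quoted, not packaging's actual code; local_label_key is a hypothetical name, and it assumes the label is already normalized to use . as the separator.

```python
import re


def local_label_key(label):
    """Hypothetical sort key for a local version label such as 'abc.5'."""
    key = []
    for segment in label.split("."):
        if re.fullmatch(r"[0-9]+", segment):
            # All-digit segment: compared as an integer, and ranked
            # above any lexicographic segment via the leading 1.
            key.append((1, int(segment), ""))
        else:
            # Segment with letters: compared as a case-insensitive string.
            key.append((0, 0, segment.lower()))
    # Tuple comparison makes a longer label compare greater when the
    # shorter label is an exact prefix of it.
    return tuple(key)


labels = ["abc.5", "def", "abc", "abc.1", "abc.x"]
ordered_labels = sorted(labels, key=local_label_key)
print(ordered_labels)
```

Encoding each segment as a tuple keeps integers and strings from ever being compared directly, which would raise a TypeError in Python 3.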

The sentence you quoted from “Version matching” refers to prefix matching with version specifiers, not comparison of version identifiers. Version specifiers are what you include in install_requires and requirements.txt files to specify your dependencies, not the version identifier stored in the core metadata of your package (e.g. version in setup.cfg, pyproject.toml, etc). This means that, for matching purposes, the version specifier == 1.0.* would match 1.0+abc. You can also see this in the code. So if you run pip install torch==1.10.1.*, it will still match torch 1.10.1+cpu.
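You can check specifier matching with packaging.specifiers as well. A small sketch, again assuming packaging is installed; the variable names are only for illustration:

```python
from packaging.specifiers import SpecifierSet

# A prefix match ignores the local version label of the candidate.
prefix_matches_local = SpecifierSet("==1.0.*").contains("1.0+abc")
torch_prefix_matches = SpecifierSet("==1.10.1.*").contains("1.10.1+cpu")

# A specifier without a local label also ignores the candidate's label.
public_spec_matches_local = SpecifierSet("==1.0").contains("1.0+abc")

# A specifier with a local label requires that label on the candidate.
local_spec_matches_public = SpecifierSet("==1.0+abc").contains("1.0")

print(prefix_matches_local, torch_prefix_matches)
print(public_spec_matches_local, local_spec_matches_public)
```

Note this is matching a candidate against a specifier, which is a different operation from ordering two version identifiers.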

Note, however, that publishing local version identifiers on PyPI is technically prohibited by PEP 440, as local version labels are intended for downstream integrators (Linux distros, Conda, etc), not PyPI:

As the Python Package Index is intended solely for indexing and hosting upstream projects, it MUST NOT allow the use of local version identifiers.

However, it would seem this is not actually strictly enforced, at least as of the release of that version.

Does this help clarify things for you?

Thanks @brettcannon for the move to a more appropriate category.

I realized I missed explicitly answering your TL;DR

Taking your question a bit too literally, the version string comparison "1.0" < "1.0+abc" would be truthy. But pedantry aside, the version identifier 1.0 compares less than 1.0+abc, and conversely the version identifier 1.0+abc compares greater than 1.0. I.e. packaging.version.parse("1.0") < packaging.version.parse("1.0+abc").

Assuming all other package metadata (or specifically, that affecting the solve) were identical (which very well may not be the case, particularly in the torch example you cited), and both were the highest current version available, and neither was yanked, and both had wheels for your platform, and there were no other local versions of 1.0, and they both were a valid solve for your Python version, current environment and supported wheel tags, and you were using the latest pip version to install, you’d get 1.0+abc. Basically, TL;DR, everything else equal, 1.0+abc.
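As a very rough illustration of the "everything else equal" case, picking the highest version by PEP 440 ordering looks like this. This is not pip's resolver, just a sketch of the tie-break under the assumptions listed above, with a made-up list of available versions and packaging assumed to be installed:

```python
from packaging.version import Version

# Hypothetical set of versions that all satisfy the request equally well.
available = ["0.9", "1.0", "1.0+abc"]

# Highest version wins; the local version sorts above its public counterpart.
best = max(available, key=Version)
print(best)
```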

As far as I am aware, yes, to the extent the pip solver is deterministic, but if and only if you fully specify all the variables and constraints above, and they do not change.
