(For context, this is a concrete proposal split out from Reconciling `packaging` module & The Dependency Specifier Spec's non-version to version comparison rules - #14 by henryiii)
PEP 508 has some very strange rules related to packaging markers, and `packaging` has never been compliant with these rules (and uv is not either). In short, the spec states that any version comparison operator (like `>`, `>=`, and `==`) must treat its operands as versions, even if the table lists the marker as a string: the implementation tries to parse the value as a version and, if it can't, falls back to Python string comparison rules. This means `implementation_name > "cpython"` is valid and is supposed to evaluate to `False` on CPython but `True` on PyPy. The implementation is also supposed to try to convert these values to versions first, which in `packaging` is actually fairly expensive when done many times.
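To make that rule concrete, here is a rough sketch of "try versions first, then fall back to strings". The names `parse_simple_version` and `spec_compare` are made up for this illustration, and the parser is a deliberate simplification that only understands dotted integers; real PEP 440 parsing accepts far more, and real `==` matching has extra rules that are ignored here.

```python
import operator
import re

# Hypothetical, simplified stand-in for a PEP 440 parser: dotted integers only.
_SIMPLE_VERSION = re.compile(r"\d+(\.\d+)*")

OPS = {
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def parse_simple_version(text):
    """Return a tuple of ints, or None if text is not a dotted-integer version."""
    if _SIMPLE_VERSION.fullmatch(text) is None:
        return None
    return tuple(int(part) for part in text.split("."))


def spec_compare(lhs, op, rhs):
    """Compare as versions when both sides parse, else as plain Python strings."""
    left, right = parse_simple_version(lhs), parse_simple_version(rhs)
    if left is not None and right is not None:
        return OPS[op](left, right)
    return OPS[op](lhs, rhs)  # string-comparison fallback


print(spec_compare("cpython", ">", "cpython"))  # False
print(spec_compare("pypy", ">", "cpython"))     # True
print(spec_compare("3.10", ">", "3.9"))         # True
```

Note that every call has to attempt the version parse before it can fall back, which is where the repeated cost comes from.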
Here's the table (minus a column for clarity here):
| Marker | Type | Sample values |
|---|---|---|
| `os_name` | String | posix, java |
| `sys_platform` | String | linux, linux2, darwin, java1.8.0_51 (note that "linux" is from Python 3 and "linux2" from Python 2) |
| `platform_machine` | String | x86_64 |
| `platform_python_implementation` | String | CPython, Jython |
| `platform_release` | String | 3.14.1-x86_64-linode39, 14.5.0, 1.8.0_51 |
| `platform_system` | String | Linux, Windows, Java |
| `platform_version` | String | #1 SMP Fri Apr 25 13:07:35 EDT 2014<br>Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle Corporation<br>Darwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64 |
| `python_version` | Version | 3.4, 2.7 |
| `python_full_version` | Version | 3.4.0, 3.5.0b1 |
| `implementation_name` | String | cpython |
| `implementation_version` | Version | 3.4.0, 3.5.0b1 |
| `extra` | String | toml |
| `extras` | Set of strings | {"toml"} |
| `dependency_groups` | Set of strings | {"test"} |
It should be noted that, of these string fields, only one might actually look like a version: `platform_release` (`platform.release()`). On some systems, like macOS, this is a valid version. The only other one that anyone on GitHub has ever tried to use with a comparison that's not equality is `sys_platform >= "win32"`, which is obviously hoping that it would cover an imaginary `"win64"` (but `"win128"` would compare less!).
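A quick check of how plain Python string ordering treats that `sys_platform >= "win32"` idea. The values `"win64"` and `"win128"` are imaginary, as above.

```python
# "win64" and "win128" are imaginary platform names, used only for illustration.
platforms = ["win32", "win64", "win128", "linux", "darwin"]

# Plain string comparison, as the fallback rule would apply it.
matched = [name for name in platforms if name >= "win32"]

print(matched)  # ['win32', 'win64']
```

The comparison is character by character, so `"win128"` loses to `"win32"` at the fourth character (`"1"` sorts before `"3"`).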
There is one valid use for `platform_release`: when you are on a system that you know has a valid PEP 440 style version, you can in theory gate on it:
```
# scipy is not supported on Mac M1 with Mac OS < 12.0
scipy; platform_system != "Darwin" or platform_machine != "arm64" or platform_version >= "12"
```
In packaging<22, `LegacyVersion` was used, so these comparisons always returned `False` and did not fall back to Python string comparison, since that's how `LegacyVersion` worked. Starting in packaging 22, with the removal of `LegacyVersion`, these started raising `InvalidVersion` errors instead (also not spec compliant). The spec does not specify whether short-circuit evaluation is required (since it basically has fallbacks for everything, there's not really a point), so the above expression fails in packaging 22-25.0 on any system that doesn't have a valid PEP 440 version here, rendering it useless unless you only support the subset of systems where this does convert to a version. This was an issue when adopting newer packaging versions in pip, and it has resulted in basically every actively maintained project having to stop relying on this mechanism: fewer than 50 examples of this remain on GitHub. A significant number of pull requests and issues are open on packaging with various ways to fix these issues.
From some discussions with the uv team and from looking at the code, uv follows the table and does not try to convert everything to `Version`, though it does implement string comparisons. For example, `sys_platform == "win32" and platform_release > "9"` would fail on Windows 11 under uv, while it would pass on packaging; but packaging would currently crash with an error if this line were evaluated on most Linux systems.
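The two readings of `platform_release > "9"` can be compared in plain Python. The value `"11"` is an assumed `platform_release` for illustration only, and the integer tuple is a simplified substitute for real version parsing.

```python
release = "11"  # assumed platform_release value, for illustration only

# String reading: compared character by character, and "1" sorts before "9".
as_strings = release > "9"

# Version reading (simplified): compare the numeric value instead.
as_versions = (int(release),) > (9,)

print(as_strings)   # False
print(as_versions)  # True
```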
Now we are faced with an issue: packaging 25.1 is about ready for release, and the way it handles invalid version comparisons has changed. As the code stands now, these will return `False` again, like they used to. Also, we've been really focused on performance: reading every `Version` on PyPI is 2x faster, and constructing/using `SpecifierSet` is ~3x faster (partially because we construct fewer `Version`s, which are costly due to needing a regex, even at 2x faster). This should improve the performance of the pip resolver, which constructs thousands of versions and specifier sets. One of the remaining places where we still run `Version` is on every dependency marker.
So there are three problems:
- The behavior is changing from an error to a non-standards-compliant behavior (even though it's the same behavior from years ago).
- We still have to try to construct versions on every single item in the above table. We are having to try to run the version regex on every marker: `"cpython"`, `"x86_64"`, etc.
- There are weird bugs open, like `v0` can't be used as the name of an extra, because it parses as a version. We have to take an expression like `thing; extra == "v0"` and parse the `v0` as a version according to the spec: since `==` is a version comparison, the type of the field is meaningless according to the spec.
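As a rough illustration of the `v0` problem: PEP 440 allows a single leading `v`, which is ignored, so `v0` is a valid version that normalizes to `0`. The sketch below uses a hypothetical, heavily simplified parser (`as_version`, optional `v` plus dotted integers) to contrast "always try versions" with a plain string comparison for `extra`. It does not reproduce how `packaging` evaluates markers.

```python
import re

# Hypothetical, simplified stand-in for PEP 440 parsing:
# an optional leading "v" followed by dotted integers.
_VERSION_LIKE = re.compile(r"v?(\d+(?:\.\d+)*)")


def as_version(text):
    """Return a tuple of ints if text looks like a version, else None."""
    match = _VERSION_LIKE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def eq_version_first(lhs, rhs):
    """'==' under the current spec wording: versions if possible, else strings."""
    left, right = as_version(lhs), as_version(rhs)
    if left is not None and right is not None:
        return left == right
    return lhs == rhs


def eq_string(lhs, rhs):
    """'==' when the field type (String) is respected."""
    return lhs == rhs


print(eq_version_first("v0", "0"))       # True: both parse to the same version
print(eq_string("v0", "0"))              # False: different extra names
print(eq_version_first("toml", "toml"))  # True, but only after a failed parse
```

Under the version-first reading, an extra named `v0` cannot be told apart from one named `0`, and even an ordinary name like `toml` pays for a parse attempt.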
So I'd like to propose the following spec changes to align the spec with the way these have been handled since the beginning in packaging, and reduce our work required as well. It's a minimal change; larger changes could be worked on later if someone wanted to work on a PEP for cleaning this up. But here's my proposal:
- Change the spec to state only `Version` values must have the "convert to version if possible" behavior. This will allow implementations to fix errors like using `"v0"` as an extra, and provide a performance boost. `uv` is doing this anyway.
- Make `platform_release` a `Version` (could be indicated as `string | Version` in the table to help users realize it will often fail the conversion, but it keeps the legacy `Version` behavior above).
- Define `>` and `<` as always `False`, and `<=`, `>=` as equivalent to `==`, for strings and failed Version conversions. This is the legacy (<22) and current (in main) behavior of `packaging`. Python string ordering is never reliable; even if it happens to work going from `8` to `9`, it will break on the next release because `9` is more than `10`. And this requires that other languages, like Rust, follow Python's rules for string ordering.
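The third bullet can be sketched as a small function. `compare_non_version` is a hypothetical name, not an existing `packaging` API, and treating `!=` as ordinary inequality is my assumption, since the bullet only covers the ordering operators and `==`.

```python
def compare_non_version(lhs, op, rhs):
    """Proposed rule for string fields and failed version conversions."""
    if op in ("==", "<=", ">="):
        return lhs == rhs   # <= and >= collapse to equality
    if op == "!=":
        return lhs != rhs   # assumption: unchanged by the proposal
    if op in ("<", ">"):
        return False        # strict ordering is never satisfied
    raise ValueError(f"unsupported operator: {op!r}")


# Why string ordering is unreliable: "9" sorts after "10".
print("9" > "10")                                         # True

print(compare_non_version("pypy", ">", "cpython"))        # False
print(compare_non_version("win64", ">=", "win32"))        # False
print(compare_non_version("win32", ">=", "win32"))        # True
```

No character ordering is involved anywhere, so an implementation in another language only needs string equality to get identical results.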
This should only affect <50 (legacy) packages on GitHub, and it will do the right thing for them as well (making the packaging <22 behavior official). (Most of these have other typos, like 21 instead of 12 for the macOS version, so I'm pretty sure they are dead projects, but it won't break them.)
I'd like to do this now, since we are replacing an error with our old behavior, which was not spec compliant, so packages may start appearing expecting this behavior (again).
(I'd also be fine if we kept the string comparison (dropping bullet 3 above) and changed packaging to support that; there has been pushback since this isn't reliable even for version-like values, and it's viewed as easier to change a `False` to a `True` than the other way around, due to the asymmetry in markers not supporting `not`, and historically it has returned `False` here. I'll quote Konsti from the uv discussion here, hopefully he doesn't mind: "this version-to-string-comparion fallback behavior is a big footgun and i wholeheartedly endorse removing it".)