The discussion in "Clarify naming of .dist-info directories" has reminded me of a couple of subtle problems with PEP 427 and its handling of versions in wheel filenames. Now that progress is being made on escaping of versions in sdist and `.dist-info` directory names, this seems like as good a time as any to bring them up.
Problem number 1: PEP 427 requires that the version component of a wheel filename only contain alphanumeric characters, underscores, and periods, with all other characters converted to underscores; however, this overlooks the fact that, per PEP 440, version strings can also contain exclamation points (to denote version epochs) and plus signs (to denote local version identifiers). Converting these two symbols to underscores causes a loss of information that makes it impossible in certain cases to compare the versions of two wheels just by inspecting their filenames; for example, `1!2`, `1+2`, and `1-2` all end up escaped as `1_2` (which, incidentally, is not a valid PEP 440 version; see below). The wheel project ran into this problem in issue 268, which led to them greatly loosening their wheel filename regex, and the author of PEP 427 has written that applying the same escaping rules to the version component as to the other filename components is "probably a mistake", yet there does not appear to have ever been any follow-up on this.
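To make the collision concrete, here is a minimal sketch of the escaping rule using the regular expression given in PEP 427. The helper name `pep427_escape` is my own, for illustration only; it is not taken from any existing tool.

```python
import re


def pep427_escape(component: str) -> str:
    """Replace each run of characters other than alphanumerics,
    underscores, and periods with a single underscore (PEP 427 rule)."""
    # Flags are passed by keyword so they are not mistaken for `count`.
    return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)


# Three distinct PEP 440 versions: an epoch, a local version, and an
# implicit post-release.
versions = ["1!2", "1+2", "1-2"]
escaped = {v: pep427_escape(v) for v in versions}

for original, result in escaped.items():
    print(f"{original!r} -> {result!r}")

# All three collapse to the same filename component.
assert set(escaped.values()) == {"1_2"}
```

Once the filename has been written, there is no way to tell from `1_2` alone which of the three versions the wheel actually contains.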
I would thus like to request that the relevant standards be amended to allow `!` and `+` in version components of wheel, sdist, and `.dist-info` names. For the record, a scan yesterday of the 1,396,899 wheels on PyPI found 58 with exclamation points in their versions and 244 with plus signs (the latter presumably uploaded before Warehouse started blocking local versions), in comparison to the 998 with underscores in their version components.
The second problem with version escaping as currently specified is its blanket transformation of all hyphens to underscores. Under PEP 440, hyphens and underscores in version strings are completely interchangeable, with one exception: the `post` in a post-release specifier can be replaced by a hyphen and only by a hyphen. (Interestingly, this restriction contradicts the statement later in the PEP that "[PEP 440] allows [the underscore's] use anywhere that `-` is acceptable.") So if we start out with a version string of the form `1.0-1` (an alternative spelling of the canonical `1.0.post1`), it gets escaped to `1.0_1`, which is not a valid PEP 440 version.
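The asymmetry can be checked with a deliberately reduced version pattern. The sketch below is an assumption-laden simplification: it keeps only the release and post-release segments of the regular expression from PEP 440's Appendix B and ignores epochs, pre-releases, dev releases, and local versions. The names `SIMPLE_VERSION` and `is_valid` are hypothetical.

```python
import re

# Simplified subset of the PEP 440 Appendix B pattern: a release segment
# followed by an optional post-release segment. Everything else in the
# full pattern is intentionally left out.
SIMPLE_VERSION = re.compile(
    r"""
    ^\s*v?
    (?P<release>[0-9]+(?:\.[0-9]+)*)
    (?P<post>
        (?:-(?P<post_n1>[0-9]+))               # implicit post-release: hyphen only
        |
        (?:[-_.]?(?:post|rev|r)[-_.]?(?P<post_n2>[0-9]+)?)
    )?
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_valid(version: str) -> bool:
    """Return True if the string matches the simplified pattern."""
    return SIMPLE_VERSION.match(version) is not None


for candidate in ["1.0.post1", "1.0-1", "1.0_post1", "1.0_1"]:
    print(f"{candidate!r}: {is_valid(candidate)}")
```

With an explicit `post`, the separator may be a hyphen, underscore, or period, but the implicit form accepts only the hyphen, so `1.0_1` is rejected while `1.0-1` and `1.0_post1` are accepted.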
Possible ways to handle this are:

- Amend PEP 440 and `packaging` to permit underscores in place of `post`.
- Require versions to be canonicalized before escaping, thereby eliminating all hyphens without affecting PEP 440 validity. (The `.dist-info` name proposal already requires project names to be canonicalized, but not versions.)
- Amend the escaping rules for version components to be "Replace all hyphens with underscores, except for those hyphens that indicate an implicit post release, which should instead be replaced with the string `.post`."
- Require versions to be escaped by converting them to an equivalent form modulo canonicalization that does not contain a hyphen and leave it up to the wheel, sdist, and `.dist-info` generators exactly what they want to do.
- Document that version strings in file & directory names need to be unescaped before use. Assuming that `!` and `+` are allowed unescaped in version components, this leaves the hyphen as the only character in a valid PEP 440 string that needs escaping, and so unescaping is just `s.replace("_", "-")`.
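As a rough illustration of the last option, here is a minimal sketch that assumes `!` and `+` are left unescaped, so the hyphen is the only character that gets replaced. The function names `escape_version` and `unescape_version` are hypothetical and do not correspond to any existing API.

```python
def escape_version(version: str) -> str:
    """Escape a version for use in a filename, assuming the hyphen is
    the only character that needs replacing."""
    return version.replace("-", "_")


def unescape_version(escaped: str) -> str:
    """Reverse the escaping before handing the string to a PEP 440 parser."""
    return escaped.replace("_", "-")


# The implicit post-release survives the round trip.
assert unescape_version(escape_version("1.0-1")) == "1.0-1"

# Epochs and local versions no longer collide with each other.
assert escape_version("1!2") != escape_version("1+2")

# A version that originally contained an underscore comes back with a
# hyphen instead; this is a different spelling, not the original string.
print(unescape_version(escape_version("1.0_post1")))
```

Note that the round trip is only exact for versions without underscores: a version written as `1.0_post1` comes back as `1.0-post1`, which relies on the interchangeability of hyphens and underscores described above to denote the same version.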