This was first raised in pipxproject/pipx#528. I’m copying some of the comments I left there so people don’t need to read through the whole thread, which contains tangentially related things.
The only mentions I can find regarding `.dist-info` names are in PEP 376 and the PyPA Specifications. All other specifications (including the wheel specification) only refer to one of these. PEP 376 defines the directory name as
name + '-' + version + '.dist-info'
but does not otherwise say what values can be used for either `name` or `version`. The PyPA specification expands on this:
This directory is named as `{name}-{version}.dist-info`, with `name` and `version` fields corresponding to Core metadata specifications. The name field must be in normalized form (see PEP 503 for the definition of normalization).
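To make the specified behaviour concrete, here is a minimal sketch of the directory name that wording implies. The regular expression is the one PEP 503 gives for name normalisation; the helper name `spec_dist_info_dirname` is made up for illustration, and the version is passed through unchanged because the text quoted above says nothing about normalising it.

```python
import re


def pep503_normalize(name: str) -> str:
    # PEP 503: runs of "-", "_" and "." collapse to a single "-",
    # and the result is lower-cased.
    return re.sub(r"[-_.]+", "-", name).lower()


def spec_dist_info_dirname(name: str, version: str) -> str:
    # Hypothetical helper: the directory name the specification text calls for.
    # The version is used as given (an assumption; see the lead-in).
    return f"{pep503_normalize(name)}-{version}.dist-info"


print(spec_dist_info_dirname("Zope.Interface", "5.1.0"))
# zope-interface-5.1.0.dist-info
```

Note that the normalised name itself can contain dashes, so a reader of the directory name cannot tell from the first dash where the name ends.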
The problem is, none of the widespread tools producing `.dist-info` directories actually do this. Instead, both the name and version parts have their dashes replaced by underscores (presumably to avoid ambiguity, since the dash is used to separate name and version), and that’s it. The dots are not replaced, and the name is not lower-cased (both mandated by PEP 503). Existing tools inspecting installed packages (e.g. `importlib.metadata`) also use these rules to discover packages.
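As a rough illustration of the behaviour described in this paragraph, the sketch below only performs the dash-to-underscore replacement. It is not the code of any particular tool (real tools may escape other characters as well), and `observed_dist_info_dirname` is a name invented for this example.

```python
def observed_dist_info_dirname(name: str, version: str) -> str:
    # Assumption: only dashes are replaced, as described in the post.
    # Dots are kept and the case of the name is left untouched.
    safe_name = name.replace("-", "_")
    safe_version = version.replace("-", "_")
    return f"{safe_name}-{safe_version}.dist-info"


print(observed_dist_info_dirname("my-package", "1.0"))
# my_package-1.0.dist-info
print(observed_dist_info_dirname("Zope.Interface", "5.1.0"))
# Zope.Interface-5.1.0.dist-info
```

The second result keeps both the dot and the upper-case letters, which is exactly where this behaviour departs from PEP 503 normalisation.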
Since it is not realistic to fix all the tools out there to follow the specification (for more than one reason), I intend to propose a pull request to the PyPA specification to define the rules as follows instead, to match reality:
- The name part should replace any run of dash (`-`) and underscore (`_`) characters with a single underscore (`_`), and any run of dot (`.`) characters with a single dot (`.`). This is similar to the normalisation rule in PEP 503, but with two differences. First, the underscore character is used instead of the dash, to avoid ambiguity when parsing the directory name. Second, the dot character is treated differently, for backward compatibility reasons.
- The version part should always use the normalised form according to the rules defined in PEP 440. This means that the version part never contains a dash (`-`) character, again eliminating ambiguity for parsers.
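The two rules above could be sketched roughly as follows. All function names are hypothetical. The version is assumed to be already normalised per PEP 440 (the sketch does not implement that normalisation), and the case of the name is left untouched because the rules do not mention it; that is one reading, not something the proposal states.

```python
import re


def proposed_name_part(name: str) -> str:
    # Runs of "-" and "_" collapse to a single "_".
    name = re.sub(r"[-_]+", "_", name)
    # Runs of "." collapse to a single ".".
    return re.sub(r"\.+", ".", name)


def proposed_dist_info_dirname(name: str, normalized_version: str) -> str:
    # Assumes normalized_version is already in PEP 440 normalised form,
    # so it contains no "-".
    return f"{proposed_name_part(name)}-{normalized_version}.dist-info"


def split_dist_info_dirname(dirname: str) -> tuple:
    # With neither part containing "-", the single dash is an
    # unambiguous separator between name and version.
    suffix = ".dist-info"
    if not dirname.endswith(suffix):
        raise ValueError(f"not a .dist-info directory: {dirname!r}")
    stem = dirname[: -len(suffix)]
    if stem.count("-") != 1:
        raise ValueError(f"expected exactly one dash in {stem!r}")
    name, _, version = stem.partition("-")
    return name, version


dirname = proposed_dist_info_dirname("my-_-package..extra", "1.0.post1")
print(dirname)
# my_package.extra-1.0.post1.dist-info
print(split_dist_info_dirname(dirname))
# ('my_package.extra', '1.0.post1')
```

The parser rejects any directory name with more than one dash, which is the ambiguity the underscore rule is meant to remove.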