What extras names are treated as equal and why?

Sorry, thought I sent this yesterday…

To note, this previous thread has some substantial discussion about what extras names are allowed.

In particular, per PEP 508 and confirmed by my testing, at least in requirement specifiers (not necessarily metadata), the following rules are enforced by packaging and the tools that rely on it for parsing (pip, etc.), and extras names that violate them fail fast with an error:

  • No characters outside of [A-Za-z0-9._-] are allowed anywhere in specified extras names (regardless of Unicode character class)
  • Otherwise valid punctuation (., - and _) is not allowed as the leading or trailing character
  • Case folding is not specified, but the spec does not rule it out either, and pip implements it in practice (AFAIK packaging does likewise, though I haven’t explicitly checked the code to verify)
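
To make the first two rules concrete, here is a minimal sketch of the check using only the standard library. The pattern follows the identifier grammar given in PEP 508; the function name `is_valid_extra` is made up for illustration and is not an API of packaging or pip.

```python
import re

# Identifier grammar per PEP 508: ASCII letters and digits, with ".", "-"
# and "_" allowed only in the interior of the name.
# The character classes are spelled out rather than using re.IGNORECASE,
# so that no non-ASCII character can match through Unicode case folding.
_IDENTIFIER = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_extra(name: str) -> bool:
    """Return True if `name` could be written as an extra in a requirement specifier.

    Hypothetical helper for illustration only.
    """
    return _IDENTIFIER.fullmatch(name) is not None


for candidate in ["test", "my-extra.name_1", "-dev", "dev.", "dév"]:
    print(f"{candidate!r}: {is_valid_extra(candidate)}")
```

Only the first two candidates pass; the others break either the leading/trailing punctuation rule or the ASCII-only rule.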

As such, backends may allow extras names not matching this spec to be stored in core metadata (though the nominal spec restricts this fairly similarly, to “valid Python identifiers” as of Python 2), but it is not possible for anyone to actually specify them as requirements. So if any package has been using them, such extras have been uninstallable and unusable as-is anyway (both following the spec and in practice).

Thus, the rules for allowable distribution package names in requirement specifiers (both per PEP 508 and in practice) are identical to those for extras, since both are built on the same base identifier specification. The existing PEP 503 canonicalization logic can therefore safely be applied, as @hroncok originally proposed (preferably using _ as the replacement character instead of -, for consistency with previous implementations and with Metadata 2.1), since any non-conforming extras names have been unspecifiable and uninstallable anyway. That leaves only one delta from the existing safe_extra(), which @uranusjr mentioned and I later clarified: runs of one or more - and . are not normalized to _, which as @uranusjr stated

So @hroncok, this would support your proposal that started this off,

Aside from the tweak of normalizing to _ instead of -, to conform to the discussion here, safe_extra() and Metadata 2.1. This would be particularly easy to implement in packaging, since it is only a one-character difference from canonicalize_name (or could even be used as-is, if we accept a change of _ to - as the normalization character).
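
For illustration, here is a rough sketch of how small that difference is. The `canonicalize_name` body below mirrors the normalization regex from PEP 503; `normalize_extra` is a hypothetical name for the `_` variant; and `legacy_safe_extra` is my recollection of what `safe_extra()` does, so treat that one as an assumption rather than a copy of the real code.

```python
import re


def canonicalize_name(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." into a single "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_extra(name: str) -> str:
    # Hypothetical variant for extras: same logic, but "_" as the
    # replacement character (the one-character difference).
    return re.sub(r"[-_.]+", "_", name).lower()


def legacy_safe_extra(extra: str) -> str:
    # Assumed behavior of the existing safe_extra(): only characters outside
    # [A-Za-z0-9.-] are replaced with "_", so "-" and "." pass through as-is.
    return re.sub(r"[^A-Za-z0-9.-]+", "_", extra).lower()


for raw in ["Foo-.Bar", "foo__bar", "foo.bar"]:
    print(raw, canonicalize_name(raw), normalize_extra(raw), legacy_safe_extra(raw))
```

Under these assumptions, the only inputs where `normalize_extra` and `legacy_safe_extra` disagree are names containing `-` or `.`, which is the delta described above.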
