What extras names are treated as equal and why?

hroncok · March 16, 2021, 2:26pm

Packaging (and hence our tooling) treats this as two separate extra names, and we reduce the problem to foo[bzr] && foo[baz].

I’d be happy to help define it explicitly.

That happens at “build time”. While definitely important, it is not what determines the answer here. I need to know what conversions happen (or should happen) in installers when they resolve dependencies. Does pip follow the same logic?

It parses the requirement via packaging:

import string
from pyparsing import Combine, Word, ZeroOrMore

ALPHANUM = Word(string.ascii_letters + string.digits)
PUNCTUATION = Word("-_.")
IDENTIFIER_END = ALPHANUM | (ZeroOrMore(PUNCTUATION) + ALPHANUM)
IDENTIFIER = Combine(ALPHANUM + ZeroOrMore(IDENTIFIER_END))
EXTRA = IDENTIFIER

So it fails if it contains characters like ! or @. Good.
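Outside pyparsing, the same grammar can be written as a single regular expression. The sketch below should accept the same strings as the IDENTIFIER rule above, and it is equivalent to the pattern PEP 508 gives for names (extras use the same identifier grammar there). `is_valid_extra` is a hypothetical helper name, not part of packaging or pip.

```python
import re

# Should match exactly what the pyparsing IDENTIFIER rule accepts: the name
# starts and ends with an ASCII letter or digit, and "-", "_" or "." may only
# appear in between. Explicit A-Za-z (instead of re.IGNORECASE) keeps it ASCII-only.
_EXTRA_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_extra(name: str) -> bool:
    """Hypothetical helper: True if `name` is a syntactically valid extra name."""
    return _EXTRA_RE.fullmatch(name) is not None


for candidate in ["bzr", "foo-bar", "1st", "a..b", "-foo", "foo_", "foo!", "a@b"]:
    print(f"{candidate!r}: {is_valid_extra(candidate)}")
```

Using `fullmatch` rather than `match` with `^...$` avoids accepting a trailing newline.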

It seems that it does not treat any of the listed punctuation characters as equivalent.
It seems that it simply ignores case.

Should we standardize on what packaging already uses to parse it? A name can begin with a digit, but not with a punctuation character (and, per the grammar above, it must also end with a letter or digit).

For 100% backwards compatibility, we could define the normalization rule as:

def normalize(extra_name):
    return extra_name.lower()

If we are not afraid of change, we could define it the way PEP 503 normalizes project names:

import re

def normalize(extra_name):
    # Collapse runs of "-", "_" and "." into a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", extra_name).lower()
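To make the difference between the two candidate rules concrete, here is a small sketch that applies both to a few pairs of names. The function names `normalize_lower` and `normalize_pep503` are hypothetical, chosen only to tell the two options apart.

```python
import re


def normalize_lower(extra_name: str) -> str:
    # Option 1: case-insensitive only (fully backwards compatible).
    return extra_name.lower()


def normalize_pep503(extra_name: str) -> str:
    # Option 2: PEP 503 style; runs of "-", "_" and "." collapse to one "-".
    return re.sub(r"[-_.]+", "-", extra_name).lower()


pairs = [("Foo", "foo"), ("foo_bar", "foo-bar"), ("foo.bar", "FOO--BAR"), ("foobar", "foo-bar")]
for a, b in pairs:
    same_lower = normalize_lower(a) == normalize_lower(b)
    same_pep503 = normalize_pep503(a) == normalize_pep503(b)
    print(f"{a!r} vs {b!r}: lower={same_lower}, pep503={same_pep503}")
```

The PEP 503 rule only merges runs of punctuation; it never removes a separator, so `foobar` and `foo-bar` stay distinct under both rules.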

That would require some changes in pip (or in some of its vendored libraries). It would only break if a project defines multiple extras whose names differ only in punctuation. I’d say that is no risk, but maybe I am an optimist.
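One way to gauge that risk is to scan a project's declared extras for names that would collapse into the same normalized form. This is a minimal sketch, assuming the extras are already available as a list of strings (collecting them from package metadata is left out); `find_collisions` is a hypothetical helper.

```python
import re
from collections import defaultdict


def normalize(extra_name: str) -> str:
    # The PEP 503-style rule proposed above.
    return re.sub(r"[-_.]+", "-", extra_name).lower()


def find_collisions(extras: list[str]) -> dict[str, list[str]]:
    """Hypothetical check: map each normalized name to the declared extras that
    collapse into it, keeping only names shared by two or more extras."""
    groups: dict[str, list[str]] = defaultdict(list)
    for extra in extras:
        groups[normalize(extra)].append(extra)
    return {key: names for key, names in groups.items() if len(names) > 1}


declared = ["docs", "test_utils", "test-utils", "bzr"]
print(find_collisions(declared))  # {'test-utils': ['test_utils', 'test-utils']}
```

An empty result means the project's extras would keep their distinct identities under the new rule.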

Should I PEP this? The PEP would define the rules for “valid” extra names (as defined by the current packaging parser) and their normalized form (one of the above).