I submitted a pull request with some technical and proofreading changes, plus a few copyedits, to the text of the PEP.
There was, however, one substantive and rather significant issue with the PEP’s content that should be discussed here: the normalization algorithm it specifies does not appear to be the one that represented the final rough consensus on the previous thread. Furthermore, unlike that algorithm, its properties and quirks directly contradict several of the advantages and motivations claimed for it elsewhere in the PEP, greatly diminish its practical benefit, and mean that it does not actually solve the original issue that sparked the PEP to begin with, as cited therein (that `adhoc-ssl` does not compare equal to `adhoc_ssl`).
The normalization algorithm currently cited in the PEP is:
re.sub('[^A-Za-z0-9.-]+', '_', name).lower()
However, as discussed on the previous issue, the algorithm should instead be
re.sub('[^A-Za-z0-9]+', '_', name).lower()
(i.e., the previous algorithm, except with `.` and `-` also normalized to `_`).
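To make the difference concrete, here is a minimal sketch that runs both expressions quoted above on the extra names from the original report. The function names `normalize_pep_draft` and `normalize_proposed` are my own labels for illustration, not anything defined by the PEP or by pip.

```python
import re


def normalize_pep_draft(name: str) -> str:
    # Expression as currently written in the PEP: '.' and '-' are kept as-is.
    return re.sub('[^A-Za-z0-9.-]+', '_', name).lower()


def normalize_proposed(name: str) -> str:
    # Expression from the previous thread: '.' and '-' also become '_'.
    return re.sub('[^A-Za-z0-9]+', '_', name).lower()


for name in ("adhoc-ssl", "adhoc_ssl", "Adhoc.SSL"):
    print(f"{name!r}: draft={normalize_pep_draft(name)!r}, "
          f"proposed={normalize_proposed(name)!r}")
```

With the first expression, `adhoc-ssl` and `adhoc_ssl` stay distinct; with the second, both come out as `adhoc_ssl`.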
In real-world practice, the latter is exactly equivalent to PEP 503 normalization except with `_` as the replacement character, because per PEP 508 and as actually implemented in packaging tools, no characters outside of `[A-Za-z0-9._-]` have been allowed anywhere in specified extra names.
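The equivalence claim can be spot-checked with a rough sketch like the one below. It assumes the PEP 503 expression `re.sub(r"[-_.]+", "-", name).lower()`, uses a small sample alphabet of my choosing drawn from the allowed character set, and only covers short strings, so it is a sanity check rather than a proof. The helper names are hypothetical.

```python
import itertools
import re


def pep503_with_underscore(name: str) -> str:
    # PEP 503 normalization, then swap its '-' replacement character for '_'.
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


def normalize_proposed(name: str) -> str:
    # Expression from the previous thread.
    return re.sub('[^A-Za-z0-9]+', '_', name).lower()


# Sample alphabet (an assumption): a few characters from [A-Za-z0-9._-].
alphabet = "aB1._-"
mismatches = [
    "".join(chars)
    for length in range(1, 5)
    for chars in itertools.product(alphabet, repeat=length)
    if pep503_with_underscore("".join(chars)) != normalize_proposed("".join(chars))
]
print("mismatches:", mismatches)
```

Every string built from that alphabet up to length four normalizes identically under both functions, so the list of mismatches is empty.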
Using the latter instead of the former means that:
- Normalization is actually useful, as the only actual normalization the former algorithm does on currently possible extras names is making `test__extra` equivalent to `test_extra`, whereas the latter means that `test_extra`, `test--extra` and `test.extra` will all be normalized to `test_extra`.
- The original issue that sparked the PEP, “the extra `adhoc-ssl` was not considered equal to the name `adhoc_ssl` by pip”, is actually solved.
- The normalized form will always be a valid Python identifier, as currently required by the Extras spec (whereas the normalization proposed by the PEP, contradicting its claim, has no practical effect on any currently possible Extras name’s validity as a Python identifier, and allows both `.` and `-`, which are invalid characters anywhere in such).
- The strange, unexpected and confusing behavior with `test__extra` being normalized to `test_extra`, but `test--extra` being left alone, is avoided (by normalizing both to `test_extra`); to wit, the PEP itself is confused on that point, as it states “Runs of characters, unlike PEP 503, do not get collapsed, e.g. ___ stays the same.” when in fact, `___` is collapsed (as I described on the previous thread), while `---` is not.
- The normalization is consistent between project and extras names, except for the replacement character.
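The run-collapsing and identifier points in the list above can be checked with a short sketch. The `DRAFT` and `PROPOSED` names and the `normalize` helper are hypothetical, and the sample names all start with a letter or underscore; a leading digit would fail `str.isidentifier()` under either expression and is a separate matter.

```python
import re

# Character classes from the two expressions discussed above.
DRAFT = re.compile('[^A-Za-z0-9.-]+')
PROPOSED = re.compile('[^A-Za-z0-9]+')


def normalize(pattern: re.Pattern, name: str) -> str:
    # Replace each run of matched characters with a single '_', then lowercase.
    return pattern.sub('_', name).lower()


samples = ["test_extra", "test__extra", "test--extra", "test.extra", "___", "---"]
for name in samples:
    draft = normalize(DRAFT, name)
    proposed = normalize(PROPOSED, name)
    print(f"{name!r}: draft={draft!r} (identifier: {draft.isidentifier()}), "
          f"proposed={proposed!r} (identifier: {proposed.isidentifier()})")
```

Under the first expression, `___` collapses to `_` while `---` and `test--extra` pass through unchanged and are not identifiers; under the second, every sample ends up as `test_extra` or `_`.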
As likewise discussed on the previous thread, this has effectively no greater real-world backward compatibility impact than the currently-specified behavior, as the only cases that would be meaningfully affected are very unlikely, fundamentally user-hostile and (based on pip’s behavior) appear to be mostly broken currently anyway:
(to note, given the problem identified by the OP and my later testing, it appears that these extras cannot even currently be selected with pip to begin with) and
which, to note, due to the strangeness of the currently-specified implementation, the above actually has it backwards: `a--b` is not normalized, but `a__b` is normalized to `a_b`.