What extras names are treated as equal and why?

CAM-Gerlach · January 9, 2022, 10:22pm

It seems the next step here is to write a short PEP specifying what @uranusjr mentioned. @uranusjr or @hroncok, is this something you’re actively working on or plan to work on in the near future? If not, maybe I can help.

Just to confirm, what is the exact normalization procedure currently proposed? @uranusjr, you mentioned that the logic should follow pkg_resources.safe_extra(), but later highlighted a couple of pathological, user-hostile corner cases. Just to be clear, is

re.sub('[^A-Za-z0-9]+', '_', extra).lower()

the desired process, one that avoids these issues while still preserving full, meaningful backward compatibility?
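For concreteness, here is that expression wrapped in a function and applied to a few spellings a user might reasonably expect to be the same extra. The name normalize_extra is just a placeholder for illustration, not an existing API:

```python
import re


def normalize_extra(extra: str) -> str:
    # Proposed rule: collapse every run of non-alphanumeric ASCII characters
    # into a single '_' and lowercase the result.
    return re.sub('[^A-Za-z0-9]+', '_', extra).lower()


# Spellings that should all compare equal under the proposed rule.
variants = ["Array_Types", "array-types", "array.types", "array--types", "ARRAY__TYPES"]
print({v: normalize_extra(v) for v in variants})
```

Every variant above maps to array_types, so a single comparison of normalized strings is enough to match them.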

Actually, my testing confirms that safe_extra() collapses runs of its replacement character (_) into a single _, but leaves - and . untouched, including runs of them. So a__b does normalize to a_b, but a--b and a..b remain as-is. The above procedure handles this case more sensibly, as well as the other ones you mention.
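To make the difference easy to reproduce without importing pkg_resources, the sketch below re-implements safe_extra() locally. It assumes the setuptools implementation is re.sub('[^A-Za-z0-9.-]+', '_', extra).lower(); safe_extra_replica and proposed are illustrative names only:

```python
import re


def safe_extra_replica(extra: str) -> str:
    # Assumed to mirror pkg_resources.safe_extra(): '.' and '-' are excluded
    # from the replaced class, so they survive untouched, while runs of any
    # other non-alphanumeric character (including '_') become a single '_'.
    return re.sub('[^A-Za-z0-9.-]+', '_', extra).lower()


def proposed(extra: str) -> str:
    # The procedure proposed above: every non-alphanumeric run becomes '_'.
    return re.sub('[^A-Za-z0-9]+', '_', extra).lower()


for name in ["a__b", "a--b", "a..b"]:
    print(f"{name!r:8} safe_extra -> {safe_extra_replica(name)!r:8} proposed -> {proposed(name)!r}")
```

Only a__b is collapsed by the safe_extra() replica, whereas the proposed rule reduces all three to a_b.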

In terms of spec implementation, it seems the PEP should mention the need both to revise the PEP 508 language on the topic and to update/correct the text of the Provides-Extra field in the Core Metadata spec. The former is not currently hosted on the PyPA specifications site; perhaps the PEP could take the opportunity to formally declare it a PyPA specification? The latter is, and so can be updated there. Since this tweak just matches existing, established practice and doesn’t add, remove or substantially change the semantics of a metadata field, I’d think it doesn’t need a new Core Metadata version? @pf_moore, any insight on either of these?

Finally, regarding implementation in packaging tools: @uranusjr, is your intent that this be implemented in packaging (e.g. packaging.utils.canonicalize_extra), so that pip can call it on both sides of the comparison when matching a requested extra, and setuptools and other backends can call it when writing Provides-Extra?
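Roughly, the division of labour I have in mind looks like the sketch below. canonicalize_extra here is a hypothetical local stand-in for a shared helper (it just applies the rule proposed earlier), and provides_extra_lines / extra_is_provided are illustrative names, not real setuptools or pip functions:

```python
from __future__ import annotations

import re
from typing import Iterable


def canonicalize_extra(extra: str) -> str:
    # Hypothetical shared helper implementing the proposed normalization.
    return re.sub('[^A-Za-z0-9]+', '_', extra).lower()


def provides_extra_lines(extras: Iterable[str]) -> list[str]:
    # Backend side: emit one Provides-Extra header per normalized extra,
    # de-duplicated and sorted for stable output.
    normalized = sorted({canonicalize_extra(e) for e in extras})
    return [f"Provides-Extra: {e}" for e in normalized]


def extra_is_provided(requested: str, provided: Iterable[str]) -> bool:
    # Installer side: normalize both the requested extra and every
    # provided extra before comparing, so neither side can drift.
    wanted = canonicalize_extra(requested)
    return any(canonicalize_extra(p) == wanted for p in provided)


print(provides_extra_lines(["Array_Types", "array-types", "cli"]))
print(extra_is_provided("array.types", ["array_types"]))
```

The point is that both tools go through the same function, so whatever the final rule is, it only needs to change in one place.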

It looks like what’s happening here in the former case is that array_types is being normalized to array-types per the PEP 503 rules for distribution names, just like the name part of a PEP 508 requirement specifier in that context. However, the extra names it is checking against are normalized per the rules implemented by safe_extra().
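A minimal model of that mismatch, assuming the requested extra goes through PEP 503 name normalization while the provided extra goes through safe_extra() (re-implemented locally, as an assumption, rather than imported). This is not pip’s actual code path, just the two rules side by side:

```python
import re


def pep503_normalize(name: str) -> str:
    # PEP 503 distribution-name normalization: runs of '-', '_' and '.'
    # become a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def safe_extra_replica(extra: str) -> str:
    # Assumed to mirror pkg_resources.safe_extra().
    return re.sub('[^A-Za-z0-9.-]+', '_', extra).lower()


requested = pep503_normalize("array_types")     # requested side
provided = {safe_extra_replica("array_types")}  # provided side
print(requested, provided, requested in provided)
```

The same user-supplied string ends up as array-types on one side and array_types on the other, so the lookup fails even though nothing was misspelled.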

Unless I’m missing something, the fact that the normalization is not consistent between the two sides of the comparison seems like a bug, regardless of what the final normalization rule should be. @uranusjr, should this be addressed as such, or would you still prefer to await the outcome of this PEP on what the normalization should be?