What extras names are treated as equal and why?

A gentle bump. I’m offering to work on a PEP, but I’d like to have a consensus first.

Sorry for the late reply (I was actually reminded by an entirely unrelated thread elsewhere that wants to add an uncommon extra name…)

Yes, pip normalises all incoming extra names with the same safe_extra() function, so:

  1. Only ASCII alphanumerics, -, _, and . are allowed in an extra name.
  2. - and . are normalised to _ for comparison. So foo[my-bar], foo[my_bar], and foo[my.bar] are equivalent.
  3. Case is folded.

I think this would actually not cause any backwards compatibility issues for pip. Can’t say for other installers, but I don’t really care (they should’ve asked for a clarification like this thread before doing something different).

Does this count as a “small specification change” as described in PyPA Governance - Specification Updates? I’d say let’s first send a clarification PR to PyPA specifications, and write a PEP only if someone thinks it is required. :slightly_smiling_face:


From the rules referenced in that thread:

If a change being considered this way has the potential to affect software interoperability, then it must be escalated to the distutils-sig mailing list for discussion, where it will be either approved as a text-only change, or else directed to the PEP process for specification updates.

It’s not entirely clear what constitutes “approval” here, but I’d take the view that getting a consensus is sufficient, with the PEP-delegate having a veto (not that the latter is relevant in this case - see next sentence :wink:).

I’m fine with doing this as a spec update, as long as no-one else wants to argue that it needs to be a PEP.

If that is the intended behavior, there must be a bug somewhere:

(venv) [tmp]$ pip install -U pip
Successfully installed pip-21.1

(venv) [tmp]$ pip install 'webscrapbook[adhoc-ssl]'
Collecting webscrapbook[adhoc-ssl]
  Downloading webscrapbook-0.40.0-py3-none-any.whl (146 kB)
     |████████████████████████████████| 146 kB 2.1 MB/s 
WARNING: webscrapbook 0.40.0 does not provide the extra 'adhoc-ssl'
Installing collected packages: MarkupSafe, werkzeug, jinja2, itsdangerous, click, lxml, flask, commonmark, webscrapbook
Successfully installed MarkupSafe-1.1.1 click-7.1.2 commonmark-0.9.1 flask-1.1.2 itsdangerous-1.1.0 jinja2-2.11.3 lxml-4.6.3 webscrapbook-0.40.0 werkzeug-1.0.1

(venv) [tmp]$ pip install 'webscrapbook[adhoc_ssl]'
Installing collected packages: pycparser, cffi, cryptography
Successfully installed cffi-1.14.5 cryptography-3.4.7 pycparser-2.20

That is why I started this thread.

Works for me! I’ve only said PEP because I assumed it was required.

Note that I said it needs “consensus” - I’d like to see a few more people agree that the proposed behaviour is acceptable here before it goes to a PR. The key is to give interested parties a chance to see the proposal and comment in a well-known forum, and the tracker for a PR doesn’t qualify for that IMO.

Indeed! I checked again, and pip’s extra normalisation behaviour is quite convoluted and erratic. There is code to normalise extras, but I couldn’t find anywhere that logic is actually reached, and I can’t help but wonder whether this worked at some point in the past and silently broke without anyone noticing. I guess that’s yet another reason to have an enforced specification around this…


There has been no disagreement here, but there hasn’t been agreement either. Is there anywhere else I should go to promote this discussion?

I don’t know, to be honest. Maybe it needs a PEP just to raise visibility and trigger a proper discussion. At a minimum, given that setuptools and pip are mentioned in the thread, I think you need to get explicit confirmation from the maintainers of those projects that they don’t have a problem with whatever you are proposing. @uranusjr has commented here already, so I guess that’s sufficient for pip (I’m also a pip maintainer, but I haven’t researched the question or how it would affect pip).

If you can’t get sufficient voices confirming it can be done as a spec update, the fallback is that it’s a PEP.

I think a PEP is needed, but I haven’t had enough time to write it yet. The specification should be fairly short, but it is substantial enough to be its own document IMO. It needs to standardise at least two things:

  1. What extra names should be considered equivalent. This should use pkg_resources.safe_extra() since it’s the only logic that can possibly work without breaking things.

  2. How extras should be compared in a PEP 508 environment marker. This is needed because PEP 508 does not sufficiently define how the value of extra should be compared. There are two possibilities:

    1. The standard should mandate that all metadata-producing tools normalise an extra before putting it into a marker (e.g. write foo; extra == 'x.y' instead of foo; extra == 'X.Y').
    2. Marker evaluation logic (e.g. packaging.markers) must be amended to perform normalisation when evaluating markers (i.e. Marker("extra == 'X.Y'").evaluate({'extra': 'x.y'}) must return True).

    I think the latter solution is likely more viable, since we can’t fix metadata in existing packages on PyPI.
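As a rough illustration of option 2, here is a minimal sketch (not packaging’s actual implementation) in which the comparison normalises both the literal found in the marker and the extra the user requested. extra_marker_matches is a hypothetical helper, and safe_extra is a local copy of the regex used by pkg_resources.safe_extra(), reproduced so the sketch does not depend on setuptools.

```python
import re


def safe_extra(extra: str) -> str:
    # Local copy of the normalisation done by pkg_resources.safe_extra().
    return re.sub("[^A-Za-z0-9.-]+", "_", extra).lower()


def extra_marker_matches(marker_value: str, requested_extra: str) -> bool:
    """Hypothetical option-2 comparison for `extra == marker_value`.

    marker_value is the literal written in existing (possibly un-normalised)
    metadata; requested_extra is what the user asked for on the command line.
    """
    return safe_extra(marker_value) == safe_extra(requested_extra)


# Existing metadata such as `foo; extra == 'X.Y'` keeps working:
print(extra_marker_matches("X.Y", "x.y"))  # True
# ...but under safe_extra() rules '.' is still distinct from '_':
print(extra_marker_matches("X.Y", "x_y"))  # False
```

The appeal of this option is that nothing already uploaded to PyPI has to change; only the evaluator does.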

This topic actually came up again when I was working on pip’s importlib.metadata support. It’s really a PITA, and I wish more people would take an interest in it.

(See _iter_egg_info_dependencies in pip/_internal/metadata/importlib/_dists.py.)


I think option (1) is better (we recently had a similar debate about whether project names should be explicitly normalised, which I think we should avoid repeating) but we should also require (2) for compatibility with existing un-normalised data. (That’s the “Transition plan” section I’m proposing we include in new PEPs :slightly_smiling_face:)


For anyone else who, like me, doesn’t know what that involves:

GitHub link

Seems reasonable: loose on the input, strict on the output.

Is it just how to normalize extras, or is there something bigger you’re referring to?

Can we make extra validation/normalization the same as package name validation/normalization? It looks like it’s pretty similar, at least. That would be useful for simplicity, and also to keep our options open for reifying extras as part of the package name in the future.
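If extras reused PEP 503 normalisation, the whole rule would be the same one-line regex PEP 503 gives for project names. A minimal sketch; canonicalize_extra is a hypothetical name used for illustration, not an existing API.

```python
import re


def canonicalize_extra(extra: str) -> str:
    """Hypothetical: apply PEP 503 project-name normalisation to an extra.

    Any run of '-', '_' and '.' collapses to a single '-', and case is folded.
    """
    return re.sub(r"[-_.]+", "-", extra).lower()


spellings = ["my-bar", "my_bar", "my.bar", "My__Bar"]
print({canonicalize_extra(s) for s in spellings})  # {'my-bar'}
```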

For marker evaluation: simplest might be to declare that extra can only appear on the left-hand side of a == or !=, and that this then uses normalizing comparison rules?

If we do that then I don’t really see the point of also mandating that tools produce a specific string form.
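One way to read the restriction suggested here, as a minimal sketch: an extra clause is accepted only in the form extra == '<name>' or extra != '<name>', and both sides are normalised before comparing. The regex, the helper names and the choice of PEP 503-style normalisation are all assumptions for illustration; real marker parsers (such as the one in packaging) handle full boolean expressions, which this sketch deliberately does not.

```python
import re

# Hypothetical restricted grammar: a single clause of the form
#   extra == '<name>'   or   extra != '<name>'
# Anything else (extra on the right-hand side, other operators, or
# compound expressions) is rejected.
_EXTRA_CLAUSE = re.compile(r"""\s*extra\s*(==|!=)\s*(['"])([^'"]*)\2\s*""")


def normalize(name: str) -> str:
    # PEP 503-style normalisation; substitute whichever rule is agreed on.
    return re.sub(r"[-_.]+", "-", name).lower()


def evaluate_extra_clause(clause: str, requested_extra: str) -> bool:
    match = _EXTRA_CLAUSE.fullmatch(clause)
    if match is None:
        raise ValueError(f"unsupported use of 'extra' in marker: {clause!r}")
    op, _quote, literal = match.groups()
    same = normalize(literal) == normalize(requested_extra)
    return same if op == "==" else not same


print(evaluate_extra_clause("extra == 'Adhoc_SSL'", "adhoc-ssl"))  # True
print(evaluate_extra_clause('extra != "a.b"', "A-B"))  # False
```

Because the comparison itself normalises, it no longer matters which spelling a tool wrote into the metadata, which is the point made above.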

It’s theoretically possible, but I’m not sure it’s a good idea to declare all existing package managers broken to pursue theoretical purity. You’d get no objection from me if you write that into a PEP, but I’m not going to try writing that PEP myself and defending the decision against user complaints.

Can you give some examples of extra names that would be broken? From a quick look it seems like safe_extra and PEP 503 normalization are equivalent but I could easily be missing some details.

a£b. Also, safe_extra uses _ as the replacement character whereas PEP 503 uses -.

I don’t think the replacement character matters, since users are always supposed to re-normalize before doing any comparisons. So if pip or whatever wants to prefer one replacement character or another internally, it doesn’t affect anything.

I guess the difference with £ is that safe_extra treats it as punctuation, a£b == a-b, while PEP 503 says that it’s illegal? And the same goes for every other character that’s not an ASCII alphanumeric, -, . or _?

The safe_extra approach doesn’t seem very useful – “sure, you can write your extra name in Greek or Cyrillic, but all extras written in those alphabets will be interpreted as if you had written a single _”. And making those characters illegal probably wouldn’t be too disruptive – I doubt many people are using them? But idk if it’s like, “literally no-one” or “1 package” or “100 packages”, so maybe it would be disruptive enough to not be worth it, not sure.
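To make the £ and non-Latin-alphabet cases concrete, here is a small sketch comparing what safe_extra() does with the name grammar from PEP 508. Both functions are local re-implementations for illustration, not imports from setuptools or packaging.

```python
import re


def safe_extra(extra: str) -> str:
    # Local copy of the regex used by pkg_resources.safe_extra().
    return re.sub("[^A-Za-z0-9.-]+", "_", extra).lower()


def is_valid_pep508_name(name: str) -> bool:
    # PEP 508 name rule: ASCII letters and digits, with '.', '_' and '-'
    # allowed only between them.
    pattern = r"(?:[A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9._-]*[A-Za-z0-9])"
    return re.fullmatch(pattern, name) is not None


for name in ["a£b", "αβγ", "a-b"]:
    print(name, safe_extra(name), is_valid_pep508_name(name))
# a£b a_b False
# αβγ _ False
# a-b a-b True
```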

The replacement character itself does not matter, but the problem is that safe_extra does not normalise - (or .). Under safe_extra rules, a£b and a-b are not equivalent: a£b normalises to a_b, and a-b is already in normalised form.

This is a problem for a package currently using - in an extra name. For example, if a package declares an extra a-b and a dependency foo ; extra == 'a-b', pip install package[a_b] currently does not install foo, while a PEP 503-based rule would. This is more problematic if the package declares both a-b and a_b as extras, although that is so fundamentally user-hostile I’d hope no package would do it. The same applies to .: foo ; extra == 'a.b' can currently be selected with [a.b] but not [a-b], whereas PEP 503 normalisation would allow both.
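The two rules side by side, for a package whose metadata declares the extra a-b. A minimal sketch; both helpers are local re-implementations of safe_extra() and PEP 503 normalisation.

```python
import re


def safe_extra(extra: str) -> str:
    # Local copy of the regex used by pkg_resources.safe_extra().
    return re.sub("[^A-Za-z0-9.-]+", "_", extra).lower()


def pep503_normalize(name: str) -> str:
    # PEP 503 project-name normalisation.
    return re.sub(r"[-_.]+", "-", name).lower()


declared = "a-b"  # as in `foo ; extra == 'a-b'`
for requested in ["a-b", "a_b", "a.b"]:
    print(
        requested,
        safe_extra(requested) == safe_extra(declared),
        pep503_normalize(requested) == pep503_normalize(declared),
    )
# a-b True True
# a_b False True
# a.b False True
```

Every False in the middle column is a request that selects the extra today only under the PEP 503 rule, which is exactly the behaviour change described above.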

BTW, there is an additional minor difference between safe_extra and PEP 503 normalisation. PEP 503 specifies that any run of non-alphanumeric characters be normalised into a single dash, which means a--b becomes a-b, but safe_extra only replaces runs of non-safe characters, so a__b normalises to a__b. But again, anyone relying on this is probably too user-hostile to be a meaningful consideration.

I ran into this issue too, with flit and self-referring optional-dependencies:

[project]
name = "etils"

[project.optional-dependencies]
array_types = ["numpy"]
all = ["etils[array_types]"]

pip install -e .[all] is failing with:

WARNING: etils 0.2.0 does not provide the extra 'array-types'
  • "etils[array_types]" is normalized to etils[array-types]
  • But not array_types = ["numpy"].

The current fix is to replace array_types = ["numpy"] with array-types = ["numpy"].
But this behavior feels inconsistent.

See: Inconsistency in `_` -> `-` normalization · Issue #503 · pypa/flit · GitHub for the original issue.

It seems the next step here is to write a short PEP specifying what @uranusjr mentioned. @uranusjr or @hroncok , is this something you’re actively working on or plan to in the near future? If not, maybe I can help.

Just to confirm, what is the exact normalization procedure currently proposed? @uranusjr, you mentioned that the logic should follow pkg_resources.safe_extra(), but then later highlighted a couple of pathological, user-hostile corner cases. Just to be clear, is

re.sub('[^A-Za-z0-9]+', '_', extra).lower()

the desired process, to avoid these issues while still preserving full meaningful backward compat?

Actually, as confirmed by my testing, in safe_extra(), any runs of the normalization character (_) are normalized to _, but runs of - and . are not normalized. So a__b does normalize to a_b, but a--b and a..b remain as-is. The above procedure handles this case more sensibly, as well as the other ones you mention.
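That testing can be reproduced in a few lines. safe_extra mirrors the pkg_resources regex, and proposed is the re.sub() one-liner from the question above; both are local definitions for illustration.

```python
import re


def safe_extra(extra: str) -> str:
    # Local copy of the regex used by pkg_resources.safe_extra().
    return re.sub("[^A-Za-z0-9.-]+", "_", extra).lower()


def proposed(extra: str) -> str:
    # The one-liner under discussion: every run of non-alphanumerics becomes '_'.
    return re.sub("[^A-Za-z0-9]+", "_", extra).lower()


for name in ["a__b", "a--b", "a..b", "A-b"]:
    print(f"{name!r}: safe_extra={safe_extra(name)!r} proposed={proposed(name)!r}")
# 'a__b': safe_extra='a_b' proposed='a_b'
# 'a--b': safe_extra='a--b' proposed='a_b'
# 'a..b': safe_extra='a..b' proposed='a_b'
# 'A-b': safe_extra='a-b' proposed='a_b'
```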

In terms of spec implementation, it seems the PEP should mention the need to both revise the PEP 508 language on the topic and update/correct the text in the Provides-Extra field of the Core Metadata spec. The former is not currently hosted on the PyPA specifications site; perhaps the PEP could take the opportunity to formally declare such? The latter is, and so can be updated there; given this tweak just matches existing established practice and doesn’t add, remove or substantially change the semantics of a metadata field, I’d think it doesn’t need a new core metadata version? @pf_moore, any insight on either of these?

Finally, regarding implementation in packaging tools, @uranusjr is your intent that this be implemented in packaging (e.g. packaging.utils.canonicalize_extra), and then pip can call that on both sides of the comparison when getting the extra, and setuptools and other backends can call it when writing Provides-Extra?
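A sketch of how that division of labour might look, following the “loose on the input, strict on the output” idea from earlier in the thread. canonicalize_extra is hypothetical here (this does not claim such a function exists in packaging) and uses the re.sub() rule proposed above; the build backend writes normalised Provides-Extra values, and the installer normalises both sides before comparing.

```python
import re
from typing import Iterable, List


def canonicalize_extra(extra: str) -> str:
    """Hypothetical helper of the kind suggested for packaging.utils.

    Uses the one-liner proposed in this thread; the final rule is whatever
    the PEP settles on.
    """
    return re.sub("[^A-Za-z0-9]+", "_", extra).lower()


def provides_extra_lines(extras: Iterable[str]) -> List[str]:
    # Build backend side: write every extra in normalised form (strict output).
    return [f"Provides-Extra: {canonicalize_extra(e)}" for e in extras]


def provides_extra(requested: str, provided: Iterable[str]) -> bool:
    # Installer side: normalise the request and the metadata (loose input),
    # so older, un-normalised metadata still matches.
    return canonicalize_extra(requested) in {canonicalize_extra(p) for p in provided}


print(provides_extra_lines(["Adhoc-SSL", "array_types"]))
# ['Provides-Extra: adhoc_ssl', 'Provides-Extra: array_types']
print(provides_extra("adhoc-ssl", ["adhoc_ssl"]))  # True
```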

It looks like what’s happening here in the former case is that array_types is getting normalized to array-types per the rules for distribution names in PEP 503, just like the name part of the PEP 508 requirements specifiers in that context. However, the actual extras names themselves it is checking against are normalized per the rules implemented by safe_extra().
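The mismatch can be reproduced in isolation, assuming (as described here) that the requested extra goes through PEP 503 normalisation while the declared extras go through safe_extra(). This is a simplified model of the reported behaviour, not pip’s actual code path.

```python
import re


def pep503_normalize(name: str) -> str:
    # Rule applied to the extras in the requirement "etils[array_types]".
    return re.sub(r"[-_.]+", "-", name).lower()


def safe_extra(extra: str) -> str:
    # Rule applied to the extras the package declares (pkg_resources regex).
    return re.sub("[^A-Za-z0-9.-]+", "_", extra).lower()


declared = {safe_extra(e) for e in ["array_types", "all"]}  # contains 'array_types' and 'all'
requested = pep503_normalize("array_types")                  # 'array-types'
print(requested in declared)  # False -> "does not provide the extra 'array-types'"

# Applying the same function to both sides makes the lookup succeed,
# whichever rule is eventually chosen.
print(safe_extra("array_types") in declared)  # True
```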

Unless I’m missing something, the fact that the normalization is not internally consistent on each side of the comparison seems like a bug, regardless of what the final normalization rule should be. @uranusjr, should this be addressed as such, or do you still prefer to await the outcome of this PEP as to what the normalization should be?

Unless I’m missing something, the fact that the normalization is not internally consistent on each side of the comparison seems like a bug

I think this is a pip bug too. I previously reported it in Inconsistency in `_` -> `-` normalization (extras) · Issue #10757 · pypa/pip · GitHub.

They closed the bug redirecting here:

Moving discussion to What extras names are treated as equal and why? since this is ultimately a bug in the specification and pip can only follow what the rules allow it to. pip would receive a fix automatically once this gets resolved at the PEP level and implemented in packaging.