PEP 685: Comparison of extra names for optional distribution dependencies

That’s true during the transition. I don’t know whether @pf_moore wants a change to the PEP before implementation starts.


Do you have a suggestion for how we’d mitigate this issue? It seems to me that all we can do is:

  1. Note the issue, so that if people hit it they have a place to find out what happened and what to do.
  2. Prioritise implementing PEP 685 in consumers before producers.

If you want to add a clarifying note covering these two points in the “backward compatibility” section of the PEP, I’d be fine with that. (It’s not technically backward compatibility, but that feels like the best section). If we’d had a “Transition Plan” section as I proposed in this post, that would have been a better place for it. I guess we could add that section if you want to.

Those 2 steps sound fine to me.

The reliable way to avoid it (at least during the transition) would be to normalize only on read rather than on write, as mentioned above. Producer-side normalization without consumer-side normalization creates at least as many short-term problems as it solves, and I’m not clear on how much practical benefit it brings. It makes no difference to consumers that normalize. For those that don’t, requested extras names will no longer match those in the metadata, since normalization is applied on only one side of the comparison, and to a form that is currently substantially less common.
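A minimal sketch of the one-sided mismatch described above, assuming the normalization in question is PEP 503’s (`re.sub(r"[-_.]+", "-", name).lower()`). The helpers `normalize` and `extra_matches` and the extra name `Foo_Bar` are hypothetical illustrations, not code from pip or any other tool.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extra_matches(requested: str, provided: str, *, consumer_normalizes: bool) -> bool:
    """Hypothetical consumer check: does a requested extra match one in the metadata?"""
    if consumer_normalizes:
        # Normalizing both sides makes the comparison independent of spelling.
        return normalize(requested) == normalize(provided)
    # A non-normalizing consumer compares the raw strings.
    return requested == provided


declared = "Foo_Bar"            # as written by the project author
written = normalize(declared)   # a producer that normalizes on write stores 'foo-bar'
requested = "Foo_Bar"           # a dependent still asks for pkg[Foo_Bar]

print(extra_matches(requested, declared, consumer_normalizes=False))  # True
print(extra_matches(requested, written, consumer_normalizes=False))   # False
print(extra_matches(requested, written, consumer_normalizes=True))    # True
```

Only the middle case, where the producer normalized but the consumer did not, fails; that is the short-term breakage being weighed here.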

On the other hand, normalizing on output does ensure that all tools use the normalized form, we are bumping the metadata version for this, and it’s not clear this is much of a problem in practice. Apparently pip may already be normalizing via PEP 503, at least in some places; if that’s the case, producer-side normalization might actually unbreak some currently common consumer use cases.

I would assume the number of non-alphanumeric extra names is so small that we need not worry and just add a note.

I will add a note to the PEP when I get a chance.

We can actually test that assumption, thanks to @pf_moore’s data above. Of the 7337 unique extras in all PyPI wheels, 2333 are non-alphanumeric. The number of names containing each character at least once works out like this:

[('_', 1153),
 ('-', 985),
 ('.', 222),
 (' ', 78),
 ('=', 12),
 ('"', 11),
 (':', 10),
 ('<', 9),
 ("'", 6),
 (',', 4),
 ('/', 3),
 ('[', 2),
 (']', 2),
 (')', 2),
 ('(', 2),
 (';', 2),
 ('+', 2),
 ('>', 1)]
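The per-character counts above sum to 2506, more than the 2333 non-alphanumeric names, because a name containing several of these characters is counted once under each. A minimal sketch of how such a tally could be produced, assuming a plain list of extras names as input; the sample names are made up, and this is not necessarily how the figures above were computed:

```python
from collections import Counter


def char_distribution(names):
    """For each non-alphanumeric character, count the unique names containing it."""
    counts = Counter()
    for name in set(names):  # unique names only, as in the figures above
        # A set of characters, so each one is counted at most once per name
        counts.update({ch for ch in name if not (ch.isascii() and ch.isalnum())})
    return counts.most_common()


# Made-up sample, not the real PyPI data
sample = ["dev", "test_all", "with-gpu", "s3.fs", "docs_and-tests", "dev"]
print(char_distribution(sample))
```

Here three names are non-alphanumeric, yet the counts sum to five, because `docs_and-tests` is counted under both `_` and `-`.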

While this seems pretty high, it comes with a huge caveat: there is almost certainly a large bias toward short, common, alphanumeric extras names (dev, test, lint, docs, etc.) among the total number of usages, so it’s not really possible to infer their total prevalence directly from these data. It may be that 10% of all packages define at least one non-alphanumeric extra, or only 0.1%. To resolve this, @pf_moore would have to count the total number of extras that are non-alphanumeric, rather than just the unique ones.
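For scale, 2333 of 7337 is about 31.8% of unique names. The sketch below, on made-up data, shows why that share can differ sharply from the share of packages declaring such an extra: a few packages can contribute many distinct non-alphanumeric names, while a common name like dev counts once no matter how many packages use it. The `(package, extra)` input format is an assumption.

```python
# Made-up (package, extra) declarations, not real PyPI data
DECLARATIONS = [
    ("pkg-a", "dev"), ("pkg-a", "test"),
    ("pkg-b", "dev"), ("pkg-b", "docs"),
    ("pkg-c", "dev"), ("pkg-c", "test"),
    ("pkg-d", "test"), ("pkg-d", "all_extras"), ("pkg-d", "with-gpu"), ("pkg-d", "s3.fs"),
    ("pkg-e", "docs"),
]


def is_alphanumeric(name: str) -> bool:
    return name.isascii() and name.isalnum()


def unique_share(decls) -> float:
    """Fraction of distinct extras names that are non-alphanumeric."""
    names = {extra for _, extra in decls}
    return sum(not is_alphanumeric(n) for n in names) / len(names)


def package_share(decls) -> float:
    """Fraction of packages declaring at least one non-alphanumeric extra."""
    packages = {pkg for pkg, _ in decls}
    flagged = {pkg for pkg, extra in decls if not is_alphanumeric(extra)}
    return len(flagged) / len(packages)


print(f"unique names: {unique_share(DECLARATIONS):.0%}")   # 50%
print(f"packages:     {package_share(DECLARATIONS):.0%}")  # 20%
```

Counts of unique names only ever yield the first kind of number; the second needs per-package (or per-usage) data.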

There’s also another side to it, though: if even one dependency anywhere in a package’s dependency stack requests a non-alphanumeric extras name that doesn’t already use -, things are liable to break, assuming pip is not already doing this normalization (which I’m unclear on). If pip is, we’d already be seeing close to this amount of breakage in certain cases, which producer-side normalization would negate.

There’s also a bias because I did nothing to distinguish between the latest version of packages and older versions. A package with an initial release, 10 years ago, declaring an extra 1>2 which was changed in the next release to one_gt_two, would still show up with both extra names…

I’ve added a basic transition plan in PEP 685: add a tranisition plan · python/peps@4d8bc00 · GitHub. Please feel free to open PRs to tweak it. Otherwise, I will focus my attention on updating the specifications for this PEP.


Add changes introduced by PEP 685 by brettcannon · Pull Request #1070 · pypa/packaging.python.org · GitHub has the update to the specifications at packaging.python.org.

My PR to update the specs on packaging.python.org hasn’t been merged yet, and I’m a bit reluctant to work on packaging without it merged. Is there anything I can do to help move the PR forward?

I’ve reviewed and merged it.


I’m trying to add PEP 685 support to whey, but I can’t get the regular expression shown in the PEP on PyPUG to actually match seemingly valid names with hyphens:

>>> import re
>>> re.match("^([a-z0-9]|[a-z0-9]([a-z0-9-](?!-))*[a-z0-9])$", "dev-test")
>>> re.match("^([a-z0-9]|[a-z0-9]([a-z0-9-](?!-))*[a-z0-9])$", "dev")
<re.Match object; span=(0, 3), match='dev'>

Am I doing something wrong, or is there a mistake in the regex?

The pattern doesn’t seem to appear in PEP 685, but is in the PyPA specification Core metadata specifications — Python Packaging User Guide

Perhaps it should be a double hyphen in the negative lookahead?


Oh, you’re right. I had both open and must have gotten confused.

A double hyphen does indeed work, and correctly rejects multiple hyphens in a row:

>>> re.match("^([a-z0-9]|[a-z0-9]([a-z0-9-](?!--))*[a-z0-9])$", "dev-test")
<re.Match object; span=(0, 8), match='dev-test'>
>>> re.match("^([a-z0-9]|[a-z0-9]([a-z0-9-](?!--))*[a-z0-9])$", "dev--test")
>>> re.match("^([a-z0-9]|[a-z0-9]([a-z0-9-](?!--))*[a-z0-9])$", "dev---test") 

I will use that for now. Thanks.
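A script version of the REPL session above, using the double-hyphen lookahead. One caveat worth checking before relying on it: the lookahead only runs after characters consumed inside the repeated group, and the first character is matched outside that group, so a double hyphen immediately after the first character (for example `a--b`) still matches. The `check` and `check_strict` helpers are hypothetical; the extra `"--" not in name` test is just one simple way to close that gap.

```python
import re

# The spec's pattern with the lookahead corrected to "--", as tested in the thread
CORRECTED = re.compile(r"^([a-z0-9]|[a-z0-9]([a-z0-9-](?!--))*[a-z0-9])$")


def check(name: str) -> bool:
    return CORRECTED.match(name) is not None


def check_strict(name: str) -> bool:
    # Hypothetical extra guard for a double hyphen right after the first character
    return check(name) and "--" not in name


results = {name: check(name) for name in ("dev", "dev-test", "dev--test", "dev---test", "a--b")}
print(results)
# {'dev': True, 'dev-test': True, 'dev--test': False, 'dev---test': False, 'a--b': True}
```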

TL;DR: The additions/changes to the spec appear to contain several inconsistencies vs. what is actually stated in the PEP.

I meant to review @brettcannon’s PR, and should have caught this, but unfortunately it slipped through the cracks at the time.

Discrepancies:

  • The PEP states that for Metadata 2.3, valid extras names are those allowed by PEP 508 for names, and that they should be normalized per PEP 503 on write and on comparison. The spec, by contrast, requires that extras names in core metadata, project source metadata (pyproject.toml), and requirement specifiers match a new, restrictive format (the result of applying PEP 503 normalization to PEP 508 names), without reference to that normalization or to the allowed input form, while retroactively changing the valid names for prior Core Metadata versions.
  • Furthermore, the spec specifies a regex not mentioned in the PEP, which appears not to match many names that are valid even under this more restrictive specification, as @domdfcoding first mentioned.
  • The spec also states that tools should warn, and may error, if extras names in older metadata versions do not conform to the normalized form, rather than only when they don’t conform to PEP 508 or when a conflict is present, as the PEP specifies.
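To make the first and third points concrete, here is a minimal sketch of the two readings as described above. The function names are hypothetical and this is not code from any tool; the name pattern is the one given in PEP 508 (run with `re.IGNORECASE`), and the normalization is PEP 503’s.

```python
import re

# Name pattern from PEP 508, matched case-insensitively
PEP508_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def normalize(name: str) -> str:
    """PEP 503 normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def accept_then_normalize(name: str) -> str:
    """PEP reading: any PEP 508-valid name is accepted, then normalized."""
    if not PEP508_NAME.fullmatch(name):
        raise ValueError(f"not a valid PEP 508 name: {name!r}")
    return normalize(name)


def require_normalized(name: str) -> str:
    """Spec reading: the name must already be in normalized form."""
    if not PEP508_NAME.fullmatch(name) or normalize(name) != name:
        raise ValueError(f"extra name not normalized: {name!r}")
    return name


print(accept_then_normalize("Dev_Test"))  # dev-test
try:
    require_normalized("Dev_Test")
except ValueError as exc:
    print(exc)  # extra name not normalized: 'Dev_Test'
```

Under the first reading, an author’s existing `Dev_Test` keeps working and is simply recorded as `dev-test`; under the second, it has to be renamed by hand.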

Implications:

  • This requires users to manually modify their existing extras names in optional-dependencies keys, in dependencies and optional-dependencies values, in requirement specifiers, and in other places, rather than having them normalized automatically. That imposes a non-trivial burden on a far larger population of authors and users with otherwise perfectly working, PEP 508-valid extras names, rather than only on names that don’t conform to PEP 508 or that clash in normalized form (which should be orders of magnitude less common, and are actually broken).
  • Furthermore, it means that the rules for extras names in core metadata and requirement specifiers are not, in fact, consistent with those for other names, which was @pradyunsg’s primary justification for proposing PEP 503 normalization rather than PEP 503 normalization with _ as the replacement character (which would match existing standards and conventions).
  • Also, normalization is not mentioned in the context of either requirement specifiers or entry points, only “restrictions”, even though the PEP never specified that extras names must be pre-normalized in these contexts, only that tools normalize them when comparing.

I initially included a more detailed analysis of the specific inconsistencies, but ended up summarizing them in bullets and eliding the full version for brevity. I’m happy to provide more detail, quotes, and links if people would like, as well as a PR to the packaging site for whatever we decide on this.

It’s not retroactive if you read the spec as stating what is true for core metadata 2.3. If you would like to add a note about old behaviour then that’s fine, but I don’t view it as a retroactive change.

Yep, there’s a typo in the regular expression (and thanks to @domdfcoding for finding it!). I have a fix in Fix the regular expression validating extra names by brettcannon · Pull Request #1076 · pypa/packaging.python.org · GitHub .

But the regex isn’t meant to reject anything valid; it simply uses more regular expression features so that the pattern itself rejects illegal names, instead of requiring a separate check for the multiple-hyphen rule stated in the spec. You don’t have to use the regex, but it does encapsulate the logic in the previous paragraph more succinctly.
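To illustrate the trade-off between one regex and separate checks, here is a sketch that writes the rules out one at a time (lowercase ASCII letters, digits, and hyphens; an alphanumeric first and last character; no consecutive hyphens, as reflected in the pattern quoted earlier in the thread) and compares them exhaustively on short strings against one compact pattern, `[a-z0-9]+(?:-[a-z0-9]+)*`. That pattern is an alternative formulation assumed here, not the regex in the spec, and the helper names are hypothetical.

```python
import itertools
import re
import string

ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def valid_by_rules(name: str) -> bool:
    """The rules checked one at a time (hypothetical helper)."""
    return (
        bool(name)
        and set(name) <= ALLOWED
        and name[0] != "-"
        and name[-1] != "-"
        and "--" not in name
    )


# Alphanumeric runs separated by single hyphens (assumed alternative pattern)
COMPACT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def valid_by_regex(name: str) -> bool:
    return COMPACT.fullmatch(name) is not None


def disagreements(alphabet: str = "a0-_A", max_len: int = 5) -> list[str]:
    """Every string up to max_len over alphabet on which the two checks differ."""
    found = []
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            name = "".join(chars)
            if valid_by_rules(name) != valid_by_regex(name):
                found.append(name)
    return found


print(disagreements())  # []
```

The exhaustive comparison over a small alphabet is a cheap way to catch mistakes like the single-hyphen lookahead before a pattern lands in a spec.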

I’m not quite following what your concern is. Are you referring to the added references to PEP 685 in Add changes introduced by PEP 685 by brettcannon · Pull Request #1070 · pypa/packaging.python.org · GitHub for every other location in the spec that references extras? If that’s the concern, then I disagree that the mentions are in any way problematic. A key goal of this PEP was to get the whole extras situation under control so that all names are normalized when written. Not mentioning where extra names are heading outside the core metadata section of the spec would hinder that goal.

BTW I’ve opened Adhere to PEP 685 when evaluating markers with extras by hroncok · Pull Request #545 · pypa/packaging · GitHub before seeing the recent discussion here.