PEP 685: Comparison of extra names for optional distribution dependencies

CAM-Gerlach · March 11, 2022, 5:21am

Actually, Option 1 does not do that, as test-extra, test.extra and test_extra (and test--extra and test..extra) are all left alone, not normalized to one common form. The only change that Option 1 makes to PEP 508-valid extras names is normalizing test__extra to test_extra. This is the fundamental difference between Option 1 and Option 2 (and why you’re not seeing different spellings of dev_lint get normalized to the same name).

I did previously do some limited testing of this which appeared to suggest at least some current tools reject extras that have non-PEP 508-conforming names, at least in some contexts. However, I would be concerned that tools should be tolerant of extras produced by older or other tools that may not have conforming names, given the current situation.

Given that my background and expertise are a pretty good match for that, and that it would be a more valuable use of my skills than copyediting others’ PEPs, it’s something I’ve been meaning to offer to help with as one of my next projects, but I want to finish PEP 639 and some remaining PEP infra/documentation work before overextending myself further.

To be fair, we have a PEP (clarifying this being part of the motivating purpose for said PEP), so isn’t this a moot point, as we can just specify it there (as PEPs are change proposals to the existing canonical PyPA specifications, rather than standalone specifications)?

CAM-Gerlach · March 11, 2022, 6:00am

I’ve been taking care of a family situation over the past couple days, so I’ll save a detailed pass on the specific changes in the PEP for tomorrow, but responding to a few higher-level points on the content:

I fully agree with standardizing on PEP 508 for valid extras names (which is identical to the requirements for Name in core metadata), provided the PEP 503 or Option 2 normalized form (which are identical for valid such names) is always used to compare them.

Just to note, it is stated in a few places in the PEP that PEP 508 is looser than the Provides-Extra spec, but that is not strictly the case. Namely, the former requires that extras names neither start nor end with _ (or . and -), whereas valid Python identifiers can start or end with _. Also, non-ASCII alphanumeric characters are valid in Python 3 identifiers (though not in Python 2 identifiers, which was presumably still relevant at the time the spec was written), so if tools or users have interpreted these as being valid under the Provides-Extra spec, then these will also be invalid.

Such cases should probably at least be mentioned in the relevant place(s), and the current text clarified accordingly.

To note, Option 2 (with - instead of _) is PEP 503 normalization, provided the source names are valid per PEP 508 (which for this is identical to the format specified for the Name core metadata field). Therefore, applying it to PEP 508-valid package names and extras produces the same result as the specific regex used by packaging for PEP 503.
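As a quick illustration of that point, here is a minimal sketch of PEP 503 normalization applied to a few spellings of the same extra. The regex is the one quoted for PEP 503 elsewhere in this thread; the function name and the sample spellings are just for illustration.

```python
import re


def pep503_normalize(name: str) -> str:
    """Collapse any run of '-', '_' and '.' into a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Different spellings of what a user would consider the same extra.
spellings = ["test-extra", "test.extra", "test_extra", "test--extra", "Test__Extra"]

# All of them end up as one comparison key.
normalized = {pep503_normalize(s) for s in spellings}
print(normalized)
```

With this form used for comparison, every spelling above is treated as the same extra.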

The one area where it differs is for existing extras that do not conform to PEP 508, where it replaces invalid characters with valid ones. This allows users to still specify extras for existing packages that are invalid under the updated specification (and the existing PEP 508 one, but not necessarily the Provides-Extra spec). This would seem to follow the principle of “loose on input, strict on output” and addresses most remaining backward compatibility concerns, but most of these names were always invalid, and it requires handling conflicts and adds complexity.

Runs of _, unlike PEP 503, do not get collapsed, e.g. ___ stays the same.

Seems like we’re still confused on what this code actually does. Runs of _ do get collapsed, as do runs of other characters outside alphanumerics and -/. (which are normalized to _), while runs of - and . do not, unlike PEP 503.
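To make the difference concrete, here is a rough side-by-side sketch. It assumes the code under discussion is the `safe_extra()` regex from `pkg_resources`, which as far as I recall is `re.sub('[^A-Za-z0-9.-]+', '_', extra).lower()`; `safe_extra_like` below is a local stand-in written for this post, not the real function.

```python
import re


def safe_extra_like(extra: str) -> str:
    # Assumed stand-in for pkg_resources.safe_extra(): any run of characters
    # outside [A-Za-z0-9.-] becomes a single '_', then the result is lowercased.
    return re.sub(r"[^A-Za-z0-9.-]+", "_", extra).lower()


def pep503_normalize(name: str) -> str:
    # PEP 503: any run of '-', '_' and '.' becomes a single '-'.
    return re.sub(r"[-_.]+", "-", name).lower()


for raw in ["test___extra", "test--extra", "test..extra", "test @ extra"]:
    print(f"{raw!r:16} safe_extra-like={safe_extra_like(raw)!r:14} "
          f"pep503={pep503_normalize(raw)!r}")
```

Under that assumption, `_` runs collapse because `_` is outside the allowed character class, while `-` and `.` are inside it and so pass through untouched.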

pradyunsg · March 11, 2022, 8:24am

Uhm… if we’re calling them invalid right now and they don’t work today, why bother stretching to ensure that we don’t reject them in the future?

Hmmm… maybe this could also say:

… tools SHOULD warn when the extra name is invalid and ignore such an extra name. Alternatively, tools MAY raise an error, in effect refusing to process such an extra name.

I think it’s probably ~20-30 hours of copy-pasting + copy-editing work, to be honest – for someone who knows what they’re doing.

I agree that we shouldn’t block other improvements on doing that, but I wouldn’t expect this to be that much work. It’s just that it’ll be grindy and repetitive. It also requires someone else to sit and review 1000s of lines of prose (~1-20 hours, depending on familiarity + how much we trust that the first person did it correctly).

brettcannon · March 11, 2022, 10:57pm

True. I will tweak the PEP. I have also created an open issue around whether we need to bump the core metadata version for this.

No, I’ve just had a lot going on IRL and so my brain wasn’t reading the regex properly.

brettcannon · March 11, 2022, 11:06pm

Bump the core metadata version to 2.3.
Have metadata writers raise errors when encountering invalid names (based on the core metadata version).
Have metadata writers warn when a name wouldn’t be valid in the future.
At least ignore invalid names when reading, potentially raise an error.
Minor tweaks.

pf_moore · March 11, 2022, 11:28pm

Looking at this, if a sdist has metadata version 2.3 (or 2.2) but doesn’t have a “Dynamic” field at all, it’s not obvious to me from PEP 643 whether that would be treated as “everything is static because it’s not specified as dynamic”, or if it would mean that everything is dynamic (for backward compatibility) because the rules only apply “When (Dynamic is) found in the metadata of a source distribution”.

I don’t recall if I had a particular interpretation in mind when I wrote the PEP, unfortunately. But with hindsight, I think it would be better to go with the backward compatible approach, as otherwise tools might hold off on moving to later metadata standards because they’d need to handle Dynamic first.

Would anyone object if I submitted a clarification for PEP 643 which made that point explicit:

If a project specifies metadata version 2.2 or later, but the Dynamic field is not present at all, then for backward compatibility, all fields are assumed to be dynamic. However, projects SHOULD explicitly include the Dynamic field if at all possible, rather than relying on this behaviour.
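For concreteness, a rough sketch of how a consumer might apply the wording proposed above. This only illustrates the suggested clarification, not adopted behaviour; the function name and the `KNOWN_FIELDS` subset are hypothetical, and the metadata version check is left out for brevity.

```python
from email.parser import HeaderParser

# Hypothetical subset of core metadata fields, just for the example.
KNOWN_FIELDS = {"Version", "Requires-Dist", "Provides-Extra"}


def dynamic_fields(metadata_text: str, known_fields: set) -> set:
    """Return lowercased names of fields treated as dynamic under the proposed wording."""
    headers = HeaderParser().parsestr(metadata_text)
    listed = headers.get_all("Dynamic")  # None if the field is absent
    if listed is None:
        # Proposed rule: no Dynamic field at all means every field is assumed dynamic.
        return {name.lower() for name in known_fields}
    return {value.strip().lower() for value in listed}


no_dynamic = "Metadata-Version: 2.2\nName: demo\nVersion: 1.0\n"
with_dynamic = "Metadata-Version: 2.2\nName: demo\nVersion: 1.0\nDynamic: Requires-Dist\n"

print(sorted(dynamic_fields(no_dynamic, KNOWN_FIELDS)))
print(sorted(dynamic_fields(with_dynamic, KNOWN_FIELDS)))
```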

(I’m specifically thinking that setuptools might want to add support for the new license expression metadata when it’s finalised, and this would have a metadata version >2.2. I wouldn’t want that to be blocked on getting support for dynamic sorted out).

Or am I being too cautious here? It’s pretty easy just to add a Dynamic field listing everything if you have no better way of knowing what might be static, and your tool has to change to bump the metadata version anyway.

layday · March 12, 2022, 10:09am

That doesn’t make sense to me. If you don’t have any dynamic fields, will you need to add a dummy Dynamic entry to denote that your metadata are static? What should its value then be?

pf_moore · March 12, 2022, 10:15am

Oops, good point. So I think this is a more substantial question than I’d originally thought. I’ll repost my question as a separate topic, to avoid hijacking this one.

uranusjr · March 12, 2022, 6:21pm

I feel the PEP text should explicitly state what is considered a valid extra name. Currently the rule is sort of inferred from various other rules, and it is not easy for a tool to know how a user-provided extra name should be validated before normalisation.

CAM-Gerlach · March 14, 2022, 4:57am

I opened a PR with some clarifications, copyedits and other fixes:

github.com/python/peps

PEP 685: Copyedit and fix various minor issues following further changes

python:main ← CAM-Gerlach:pep-685-copyedit-v2

opened 04:39AM - 14 Mar 22 UTC

CAM-Gerlach

+34 -32

Clarifications:

* Refine description of PEP 503 substitution to avoid implying …that only the substitution character (`-`) is replaced or that only collapsing is done, and slightly expand examples.
* Add that a critical distinction between `safe_extra()` and PEP 503 is normalizing non-PEP 508 characters, avoid implying that only collapsing (as opposed to both collapsing and replacement) is not done for `-` and `.`, and briefly mention the further inconsistency with the function's docstring.
* Tweak discussion of PyPI results to explicitly make clear that potential "clashes" were global, not project-specific
* Clarify what specifically was being followed in Setuptools and that a key factor in going with PEP 503 instead was a lack of real-world backward compat issues

Copyedits:

* Fix some grammar issues
* Add clarifying verbiage in a handful of places
* Avoid repetition in a few places
* Use more appropriate diction in a couple spots

Fixes:

* Add a few missing commas
* Add verbatim around a few appropriate elements
* Use inline links per PEP 12 and clean Discourse URLs
* Update Post-History per new PEP 12 guidance
* Remove empty Open Issues section

Also, for those interested, @pf_moore’s aforementioned thread is here:

I had a few additional, more substantive questions and comments on the PEP text:

Tools generating metadata MUST raise an error if an invalid extra name is provided as appropriate for the specified core metadata version. If an older core metadata version is specified and the name would be invalid with newer core metadata versions, tools SHOULD warn the user.

How is the core metadata version determined/specified? AFAIK, there is no user-facing way to specify this, at least in the build backends I’m familiar with. Per the discussion in @pf_moore’s thread, should this just be the latest, so that we can simplify this to just raising an error if an invalid extra name is provided (since the core metadata spec is only specifying the current version, I’m not sure it makes sense to discuss older versions)?

Tools SHOULD warn users when an invalid extra name is read and SHOULD not use the name to avoid ambiguity.

What does it mean to “not use the name”? Should tools error out? Or just not write/install the extra? That seems like rather unexpected and undesirable behavior; if the tool is going to not just try to use the name anyway (presumably with normalization), it would be better to just error out rather than do something other than what the user explicitly requested.

Moving to PEP 503 normalization and PEP 508 name acceptance, it allows for all preexisting, valid names to continue to be valid.

Perhaps this should be clarified to say that valid extras specifiers per PEP 508 will continue to remain valid, since as mentioned above, this isn’t strictly true for the existing spec for Provides-Extra in core metadata.

hugovk · March 14, 2022, 7:20am

It’s good the rationale section follows the regex spec (re.sub(r"[-_.]+", "-", name).lower()) with a prose description (" This collapses any run of the substitution character down to a single character, e.g. --- gets collapsed down to - .")

The specification section also repeats the regex spec. It would be nice to also have a prose description here.

I realise (currently) the two are identical, but I can see people directly linking to https://peps.python.org/pep-0685/#specification so there’s value in seeing that relevant information immediately.

pf_moore · March 14, 2022, 9:01am

You only quoted parts of the relevant paragraph. Here’s the full paragraph:

Tools generating metadata MUST raise an error if a user specified two or more extra names which would normalize to the same name. Tools generating metadata MUST raise an error if an invalid extra name is provided as appropriate for the specified core metadata version. If an older core metadata version is specified and the name would be invalid with newer core metadata versions, tools SHOULD warn the user. Tools SHOULD warn users when an invalid extra name is read and not use the name to avoid ambiguity. Tools MAY raise an error instead of a warning when reading an invalid name if they so desire.
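As a minimal sketch of the producer side of that paragraph (the first two MUSTs), a build tool could do something along these lines. The function name is hypothetical, the validity pattern is the PEP 508 name regex run case-insensitively, and what exactly gets written to the metadata is deliberately left open here.

```python
import re

# PEP 508 name pattern, matched case-insensitively.
VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def normalize(name: str) -> str:
    """PEP 503 normalization, used as the comparison key."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_extras_for_writing(extras):
    """Map normalized name -> name as given, raising on invalid or clashing names."""
    seen = {}
    for extra in extras:
        if not VALID_NAME.match(extra):
            raise ValueError(f"invalid extra name: {extra!r}")
        key = normalize(extra)
        if key in seen:
            raise ValueError(
                f"extras {seen[key]!r} and {extra!r} both normalize to {key!r}"
            )
        seen[key] = extra
    return seen


print(check_extras_for_writing(["dev_lint", "test"]))
```

Passing `["dev_lint", "Dev.Lint"]` or `["_private"]` to the same function raises `ValueError` instead of returning a mapping.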

The only context in which “an older core metadata version” can be specified is in the case of a metadata consumer reading data generated by a tool that hasn’t been updated for the new spec. In that case, the consumer should warn, so that the user knows that the extra will have to change in future.

You seem to be thinking in terms of multiple valid metadata formats (one for each version). That’s not a helpful way of thinking of things - there’s only one metadata format, and it gets updated over time. The expectation is that all tools and projects will conform to the current metadata spec. The version numbering is solely to manage the fact that once metadata is generated, it doesn’t get rewritten, so handling legacy formats is a necessary evil, but only for metadata consumers. Producers should always follow the rules for the current (latest) specification^[1].

Yes, the spec says

For broader compatibility, build tools MAY choose to produce distribution metadata using the lowest metadata version that includes all of the needed fields.

IMO that’s a mistake, and we should remove it, and expect build tools to produce metadata that conforms to the latest standard (yes, I will go back to my post on the other thread where I said this wasn’t worth the effort and update it).

[1] Obviously there will always be delays in implementing spec changes, though.

CAM-Gerlach · March 14, 2022, 8:23pm

Ah okay, thanks, that does make more sense in the context of a metadata consumer—I had been thinking just in terms of a producer.

Actually, the latter was more or less what I intended to mean by

but I was only thinking in terms of the context of a metadata producer rather than a consumer.

Why, you don’t say…

I’ve opened a pull request as pypa/packaging.python.org#1063; should I open (yet) another dedicated thread to ensure this has appropriate visibility, or should we just wait for further discussion on the existing thread?

brettcannon · March 14, 2022, 10:30pm

That’s how I interpreted the PEP.

It currently says …

brettcannon:

The `core metadata`_ specification will be updated such that the allowed
names for `Provides-Extra`_ matches what :pep:`508` specifies for names.
This will bring extra naming in line with that of the Name_ field.

What are you specifically after here? Do you want me to copy ^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$ into the PEP?
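For anyone who wants to see what that pattern accepts, here is a small sketch of a validity check built on it. It assumes the regex is run with `re.IGNORECASE`, as the core metadata spec describes for Name; the helper name and sample names are made up for the example.

```python
import re

EXTRA_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_extra_name(name: str) -> bool:
    """True if the name starts and ends with an ASCII letter or digit and
    only has letters, digits, '.', '_' or '-' in between."""
    return EXTRA_NAME.match(name) is not None


for candidate in ["test-extra", "a", "Dev.Lint_2", "_private", "trailing_", "tést", ""]:
    print(f"{candidate!r:14} valid={is_valid_extra_name(candidate)}")
```

The leading or trailing underscore cases and the non-ASCII case are the ones that are valid Python identifiers but fail this check.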

Feel free to send a PR, but I expect people to read the entire PEP. Plus the actual specification will be reflected at packaging.python.org and not this PEP, so that won’t be important long-term (if this PEP gets accepted).

brettcannon · March 14, 2022, 10:33pm

We seem to be getting into verbiage discussions more than technical details. And since this all has to be translated into a PR for packaging.python.org, I don’t want to get too hung up on how things are written as long as the semantic meaning is correct.

So is there any more technical feedback on this PEP?

CAM-Gerlach · March 14, 2022, 10:40pm

I’d consider at least this issue to potentially be more than verbiage, as it relates to what we expect and are telling tools to do.

brettcannon · March 14, 2022, 11:32pm

Sorry, missed that as it was interleaved with core metadata version stuff.

“Not use it” means to not “install the extra” to me.

And just to be clear, this is for reading metadata as the PEP says, not writing.

I don’t think so if they were warned by the tool as the PEP suggests.

You could view it as another form of an error. But there’s a concern of breaking folks who may read poorly written metadata. For instance, let’s say a person specifies they want to install an extra that has an ambiguous name and it’s embedded in a workflow pipeline that’s worked up until now. In the next version the distribution updates their core metadata version. Is it better to have that pipeline completely blow up due to an error about this, or “In the face of ambiguity, refuse the temptation to guess” and simply not install something? I don’t know, but I don’t think someone should guess regardless, hence the “SHOULD” suggestion.

But the following part of the PEP explicitly allows tools to error out as a “MAY” if they think that may be better. So the PEP is basically saying, “at minimum, don’t guess, but you could error out if you think that’s better for your users”. This ties into “be loose on the input, strict on the output” where erroring out could be viewed as being strict on input. So I think the PEP says what the bare minimum is (“SHOULD”), and suggests a potentially stronger reaction if it makes sense (“MAY”).
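A rough sketch of that SHOULD/MAY split on the reading side might look like the following. The function and its `strict` flag are hypothetical and only illustrate the two reactions; this is not how pip or any other tool actually implements it.

```python
import re
import warnings

VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def read_provides_extra(names, strict=False):
    """Return normalized extras read from metadata.

    Invalid names are skipped with a warning (the "SHOULD" behaviour);
    with strict=True they raise instead (the "MAY" behaviour).
    """
    usable = []
    for name in names:
        if VALID_NAME.match(name):
            usable.append(normalize(name))
        elif strict:
            raise ValueError(f"invalid extra name in metadata: {name!r}")
        else:
            warnings.warn(f"ignoring invalid extra name {name!r}", stacklevel=2)
    return usable


print(read_provides_extra(["Dev_Lint", "_bad"]))  # warns about '_bad' and skips it
```

The flag keeps both reactions available, so a tool can choose whichever is friendlier to its own users.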

pf_moore · March 14, 2022, 11:37pm

Speaking as a pip maintainer, I’d rather the PEP stays as it is in this regard. I can understand what it means (I believe) and it’s open enough to allow pip to make appropriate choices.

Who benefits from making this any more explicit? This is the section about consumers, and if a consumer assumes PEP 685 is being followed, they will never even see invalid extras (because a conformant producer is not allowed to write invalid data). So what a consumer does with data that doesn’t follow PEP 685 isn’t really up to PEP 685 to specify…

CAM-Gerlach · March 14, 2022, 11:40pm

I guess my view is that I’d rather refuse the temptation to guess that the user just wants the package without the extra and rather fail early with an explicit error message.

The alternative is to have things blow up somewhere else when a required dependency isn’t available, and have the user have to figure out that’s the problem, then figure out that it’s because the extra isn’t getting installed, then backtrack and figure out where in their pipeline the extra isn’t getting installed as requested, and finally find the warning and hopefully draw the connection. In an ideal world, they’d see the warning prominently and immediately realize what it entails, but at least in the context you seem to be referring to, it may well be buried deep in their CI output.

That said, unlike you I’m neither a packaging expert nor a tool maintainer, and this isn’t likely to be a common case regardless, so if having considered this you’ve determined the present approach is best, then it very likely is.

ofek · March 16, 2022, 11:46pm

I’m having difficulty understanding where we ended up regarding the decision. Is this the desired outcome (PEP 508 regex + PEP 503 normalization)? Normalize the names of optional dependency groups by ofek · Pull Request #163 · ofek/hatch · GitHub