To be honest, I think you’re over-complicating things, but it’s your PR so it’s up to you, ultimately.
Err, well, wouldn’t it be up to @brettcannon as the PEP author?
I thought you were talking about the PR to update the docs to reflect the PEP. My apologies if I misunderstood, but if it’s about the PEP then that’s been submitted for review now, so I don’t expect it to be changed at this point.
Yeah, but as @brettcannon is the author of the PEP motivating these updates, I would defer to his guidance on whether he’d like my help with the PR, and if so on what to update, if he has a preference.
On reflection while reviewing this section of the PEP, I think it could be worded slightly more clearly. To be specific, I think the intent is fine, but the phrasing could be clearer, so I will submit a PR to the PEP which IMO improves the wording. This won’t affect my decision on the PEP, though.
It is with great pleasure that I confirm that PEP 685 is approved. Congratulations to @brettcannon and thanks to everyone who participated in the discussions.
There’s not much else to say on this PEP: it’s a nice, clean and well-focused change that provides clarity on a previously ill-defined aspect of packaging. I’d strongly encourage tools to adopt it as soon as possible.
Hatchling already does as of v0.21.0, in case anyone wants to try it out.
For those spec pages that are just referring to PEPs currently, yes. I only have so much bandwidth and rewriting all packaging specs that still directly reference PEPs is not a small undertaking.
Thanks so much! I have gone ahead and marked the PEP as accepted. The PEP could jump straight to “final” or wait until we feel enough of the community has adopted it (e.g.
I have also opened “Implement PEP 685” (Issue #526, pypa/packaging) against `packaging` so it can adopt this PEP.
I did (belatedly) think of one potential backward compat issue with PEP 685 to be aware of (in particular, having producers write out the normalized form combined with using `-` instead of `_` for the normalization character).
If a package’s metadata is produced with a PEP 685-conforming tool and an extra has a name that includes a `_` (or a `.`), it is written out in its PEP 503-normalized form (with `-`). If tools consuming that metadata are not PEP 685-aware, i.e. they do not perform PEP 503 normalization on existing user inputs (e.g. a user attempting to install a package with that extra, a `requirements.txt` specifying it, or a dependent package requiring it), then existing working requirement specifiers with the non-normalized name will stop working until the consuming tools are updated, or the normalized form of the extras name is specified instead.
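To make the failure mode concrete, here is a minimal sketch. The `normalize` function follows the PEP 503 rule (runs of `-`, `_` and `.` collapsed to a single `-`, then lowercased); the two consumer functions and the extra name `my_extra` are hypothetical stand-ins for illustration, not any real tool’s behavior.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_' and '.' to '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def naive_consumer_has_extra(requested: str, metadata_extras: list[str]) -> bool:
    # Hypothetical consumer that is not PEP 685-aware: compares names verbatim.
    return requested in metadata_extras


def aware_consumer_has_extra(requested: str, metadata_extras: list[str]) -> bool:
    # Hypothetical PEP 685-aware consumer: normalizes both sides before comparing.
    return normalize(requested) in {normalize(e) for e in metadata_extras}


# A PEP 685-conforming producer writes the normalized name to the metadata.
written_extras = [normalize("my_extra")]  # ["my-extra"]

# The user's existing requirement still asks for the non-normalized name.
print(naive_consumer_has_extra("my_extra", written_extras))  # False
print(aware_consumer_has_extra("my_extra", written_extras))  # True
```

The verbatim comparison fails only because normalization was applied on one side; once the consumer normalizes too, the spelling the user typed no longer matters.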
This could be avoided by not having metadata producers write out the normalized form of the extras name to the core metadata (or mostly mitigated by choosing `_` instead of `-` as the normalization character, given the former is much more popular currently). However, I’m not sure this is enough of a problem in practice to warrant that: pip at least (by far the most popular tool in a position to encounter this problem) does appear to already normalize extras names, and to `-` as well, so either these cases already work, or they already don’t. In practice, `-` and producer-side normalization may actually be less problematic for this existing case, the one which originally motivated this PEP. But I just wanted to mention that so others are aware.
That’s true during the transition. I don’t know if @pf_moore wants a change to the PEP before it starts getting implemented?
Do you have a suggestion for how we’d mitigate this issue? It seems to me that all we can do is:
- Note the issue, so that if people hit it they have a place to find out what happened and what to do.
- Prioritise implementing PEP 685 in consumers before producers.
If you want to add a clarifying note covering these two points in the “backward compatibility” section of the PEP, I’d be fine with that. (It’s not technically backward compatibility, but that feels like the best section). If we’d had a “Transition Plan” section as I proposed in this post, that would have been a better place for it. I guess we could add that section if you want to.
Those 2 steps sound fine to me.
The reliable way to avoid it (at least during the transition) would be to normalize only on read rather than on write, as mentioned above. Producer-side normalization without consumer-side normalization creates at least as many short-term problems as it solves, and I’m not clear on how much practical benefit it brings. It makes no difference to consumers that normalize; for those that don’t, it means requested extras names will no longer match those in the metadata, since normalization is applied on only one side of the comparison, and to a currently substantially less common form.
On the other hand, normalizing on output does ensure all tools use the normalized form, and we are bumping the metadata version for this. It’s also not clear whether this is much of a problem in practice: pip apparently may already be normalizing via PEP 503, at least in some places, and if that’s the case, producer-side normalization might actually unbreak some currently common consumer use cases.
I would assume the number of non-alphanumeric extra names is so small that we need not worry and just add a note.
I will add a note to the PEP when I get a chance.
We can actually test that assumption, thanks to @pf_moore’s data above. Of the 7337 unique extras in all PyPI wheels, 2333 of them are non-alphanumeric. The number of names containing each character at least once works out like this:
[('_', 1153), ('-', 985), ('.', 222), (' ', 78), ('=', 12), ('"', 11), (':', 10), ('<', 9), ("'", 6), (',', 4), ('/', 3), ('[', 2), (']', 2), (')', 2), ('(', 2), (';', 2), ('+', 2), ('>', 1)]
While this seems pretty high, it comes with a huge caveat: there is almost certainly a large bias toward short, common, alphanumeric extras names (`docs`, etc.) among the total number of usages, so it’s not really possible to directly infer their total prevalence from just these data; it may be that 10% of all packages define at least one non-alphanumeric extra, or only 0.1%. To resolve this, @pf_moore would have to count the actual number of total extras that were non-alphanumeric, rather than just the uniques.
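The per-character tally and the unique-versus-total distinction can be sketched as follows. The `usages` list is made-up illustrative data (not the PyPI dataset), `special_chars` and `char_distribution` are hypothetical helpers, and “non-alphanumeric” is assumed here to mean any character outside ASCII letters and digits, which may not be exactly how the figures above were derived.

```python
import string
from collections import Counter

ALNUM = set(string.ascii_letters + string.digits)


def special_chars(name):
    """Return the set of non-alphanumeric characters appearing in a name."""
    return {c for c in name if c not in ALNUM}


def char_distribution(names):
    """Count, per character, how many names contain it at least once."""
    counts = Counter()
    for name in names:
        counts.update(special_chars(name))
    return counts.most_common()


# Made-up data: one entry per declaration of an extra across packages.
usages = ["docs", "docs", "docs", "test", "test",
          "dev_tools", "all.extras", "dev_tools", "c-ext"]
unique = set(usages)

non_alnum_unique = [n for n in unique if special_chars(n)]
non_alnum_total = [n for n in usages if special_chars(n)]

print(len(non_alnum_unique), "of", len(unique))  # 3 of 5 unique names
print(len(non_alnum_total), "of", len(usages))   # 4 of 9 total usages
print(char_distribution(unique))
```

In this toy data the share of non-alphanumeric names drops from 60% of uniques to about 44% of usages, because the common alphanumeric names are repeated more often. That is the direction of bias described above.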
There’s also another side of it, though: if even one dependency of a given package anywhere in the stack requires a non-alphanumeric extras name that doesn’t already use `-`, things are liable to break if pip is not already doing this normalization (which I’m unclear on). If it is, we’d already be seeing pretty close to this amount of breakage in certain cases, which producer-side normalization would negate.
There’s also a bias because I did nothing to distinguish between the latest version of packages and older versions. A package with an initial release, 10 years ago, declaring an extra `1>2` which was changed in the next release to `one_gt_two`, would still show up with both extra names…
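One way to remove that bias would be to count only the extras declared by each package’s latest release. A rough sketch, using made-up records and hypothetical helper names; versions are simplified to tuples of integers here, whereas real version strings would need proper parsing (for example with `packaging.version.Version`):

```python
# Hypothetical records: (package, version as a tuple of ints, extras declared).
releases = [
    ("example-pkg", (1, 0), ["1>2"]),
    ("example-pkg", (1, 1), ["one_gt_two"]),
    ("other-pkg", (2, 0), ["docs"]),
]


def extras_all_versions(records):
    """Unique extras across every release (what the tally above effectively did)."""
    return {extra for _, _, extras in records for extra in extras}


def extras_latest_only(records):
    """Unique extras taken only from the highest version of each package."""
    latest = {}
    for package, version, extras in records:
        if package not in latest or version > latest[package][0]:
            latest[package] = (version, extras)
    return {extra for _, extras in latest.values() for extra in extras}


print(sorted(extras_all_versions(releases)))  # ['1>2', 'docs', 'one_gt_two']
print(sorted(extras_latest_only(releases)))   # ['docs', 'one_gt_two']
```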
A basic transition plan was added in “PEP 685: add a tranisition plan” (python/peps@4d8bc00). Please feel free to open PRs to tweak it if you want. Otherwise I will focus my attention on updating the specifications for this PEP.