As promised in What extras names are treated as equal and why? , here is the PEP to standardize how to normalize and compare extra names.
A rendered version can be found at PEP 685 – Comparison of extra names for optional distribution dependencies | peps.python.org .
PEP: 685
Title: Comparison of extra names for optional distribution dependencies
Author: Brett Cannon <brett@python.org>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/14141
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Mar-2022
Post-History: 08-Mar-2022
Abstract
========
This PEP specifies how to normalize `distribution extra <Provides-Extra_>`_
names when performing comparisons.
This prevents tools from either failing to find an extra name, or
accidentally matching against an unexpected name.
Motivation
==========
The `Provides-Extra`_ core metadata specification states that an extra's
name "must be a valid Python identifier".
:pep:`508` specifies that the value of an ``extra`` marker may contain a
letter, digit, or any one of ``.``, ``-``, or ``_`` after the initial character.
Otherwise, there is no other `PyPA specification
<https://packaging.python.org/en/latest/specifications/>`_
which outlines how extra names should be written or normalization for comparison.
Due to the amount of packaging-related code in existence,
it is important to evaluate current practices by the community and
standardize on one that doesn't break most code, while being
something tool authors can agree to following.
The issue of there being no standard was brought forward by an
`initial discussion <https://discuss.python.org/t/7614>`__
noting that the extra ``adhoc-ssl`` was not considered equal to the name
``adhoc_ssl`` by pip 22.
Rationale
=========
:pep:`503` specifies how to normalize distribution names::
re.sub(r"[-_.]+", "-", name).lower()
This collapses any run of the substitution character down to a single
character,
e.g. ``---`` gets collapsed down to ``-``.
This does **not** produce a valid Python identifier as specified by
the core metadata 2.2 specification for extra names.
`Setuptools 60 does normalization <https://github.com/pypa/setuptools/blob/b2f7b8f92725c63b164d5776f85e67cc560def4e/pkg_resources/__init__.py#L1324-L1330>`__
via::
re.sub(r'[^A-Za-z0-9-.]+', '_', name).lower()
The use of an underscore/``_`` differs from PEP 503's use of a
hyphen/``-``.
Runs of ``.`` and ``-``, unlike PEP 503, do **not** get collapsed,
e.g. ``..`` stays the same.
For pip 22, its
"extra normalisation behaviour is quite convoluted and erratic" [pip-erratic]_,
and so its use is not considered.
.. [pip-erratic] https://discuss.python.org/t/what-extras-names-are-treated-as-equal-and-why/7614/10?
Specification
=============
When comparing extra names, tools MUST normalize the names being compared
using the semantics outlined in `PEP 503 for names <https://peps.python.org/pep-0503/#normalized-names>`__::
re.sub(r"[-_.]+", "-", name).lower()
The `core metadata`_ specification will be updated such that the allowed
names for `Provides-Extra`_ matches what :pep:`508` specifies for names.
This will bring extra naming in line with that of the Name_ field.
Because this changes what is considered valid, it will lead to a core
metadata version increase to ``2.3``.
For tools writing `core metadata`_,
they MUST write out extra names in their normalized form.
This applies to the `Provides-Extra`_ field and the `extra marker`_
when used in the `Requires-Dist`_ field.
Tools generating metadata MUST raise an error if a user specified
two or more extra names which would normalize to the same name.
Tools generating metadata MUST raise an error if an invalid extra
name is provided as appropriate for the specified core metadata version.
If an older core metadata version is specified and the name would be
invalid with newer core metadata versions,
tools SHOULD warn the user.
Tools SHOULD warn users when an invalid extra name is read and not use
the name to avoid ambiguity.
Tools MAY raise an error instead of a warning when reading an
invalid name if they so desire.
Backwards Compatibility
=======================
Moving to :pep:`503` normalization and :pep:`508` name acceptance, it
allows for all preexisting, valid names to continue to be valid.
Based on research looking at a collection of wheels on PyPI [pypi-results]_,
the risk of extra name clashes is limited to 73 clashes when considering
even invalid names,
while *only* looking at valid names leads to only 3 clashes:
1. dev-test: dev_test, dev-test, dev.test
2. dev-lint: dev-lint, dev.lint, dev_lint
3. apache-beam: apache-beam, apache.beam
By requiring tools writing core metadata to only record the normalized name,
the issue of preexisting, invalid extra names should be diminished over
time.
.. [pypi-results] https://discuss.python.org/t/pep-685-comparison-of-extra-names-for-optional-distribution-dependencies/14141/17?u=brettcannon
Security Implications
=====================
It is possible that for a distribution that has conflicting extra names, a
tool ends up installing distributions that somehow weaken the security
of the system.
This is only hypothetical and if it were to occur,
it would probably be more of a security concern for the distributions
specifying such extras names rather than the distribution that pulled
them in together.
How to Teach This
=================
This should be transparent to users on a day-to-day basis.
It will be up to tools to educate/stop users when they select extra
names which conflict.
Reference Implementation
========================
No reference implementation is provided aside from the code above,
but the expectation is the `packaging project`_ will provide a
function in its ``packaging.utils`` that will implement extra name
normalization.
It will also implement extra name comparisons appropriately.
Finally, if the project ever gains the ability to write out metadata,
it will also implement this PEP.
Rejected Ideas
==============
Using setuptools 60's normalization
-----------------------------------
Initially this PEP proposed following setuptools to try and minimize
backwards-compatibility issues.
But after checking various wheels on PyPI,
it became clear that standardizing **all** naming on :pep:`508` and
:pep:`503` semantics was easier and better long-term.
Open Issues
===========
N/A
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _extra marker: https://peps.python.org/pep-0508/#extras
.. _Name: https://packaging.python.org/en/latest/specifications/core-metadata/#name
.. _packaging project: https://packaging.pypa.io
.. _Provides-Extra: https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use
.. _Requires-Dist: https://packaging.python.org/en/latest/specifications/core-metadata/#requires-dist-multiple-use