PEP 804: An external dependency registry and name mapping mechanism

Hello everyone!

On behalf of my co-authors @pradyunsg, @rgommers, @mgorny and @msarahan, I’d like to share our work on this new PEP 804 “An external dependency registry and name mapping mechanism”.

This PEP is meant to complement PEP 725 by adding a central registry of known identifiers and a mapping mechanism to translate the ecosystem-agnostic identifiers to names known to target package managers.

From previous rounds of PEP 725 community reviews, it was clear that a significant number of people wanted a way to ensure external dependency specifiers have an agreed-upon canonical form and recommended (possibly enforceable) ways of spelling them for each package. This PEP will allow implementing that.

To that end, we propose a series of schemas and a few examples in external-metadata-mappings, which can also be browsed in the accompanying Streamlit app.

A CLI and Python API are provided by the package pyproject-external, which allows you to perform tasks like “Print and validate the [external] table in this PEP-725-ready package” or “give me the install command to provide the external dependencies I need to build this project”. You can see it in action in external-deps-build’s CI.

Please feel free to check the rendered version.

Full PEP text:

PEP: 804
Title: An external dependency registry and name mapping mechanism
Author: Pradyun Gedam pradyunsg@gmail.com,
Ralf Gommers ralf.gommers@gmail.com,
Michał Górny mgorny@quansight.com,
Jaime Rodríguez-Guerra jaime.rogue@gmail.com,
Michael Sarahan msarahan@gmail.com
Discussions-To: Pending
Status: Draft
Type: Standards Track
Topic: Packaging
Requires: 725
Created: 03-Sep-2025
Post-History: 03-Sep-2025,

Abstract

This PEP specifies a name mapping mechanism that allows packaging tools to map
external dependency identifiers (as introduced in :pep:725) to their
counterparts in other package repositories.

Motivation

Packages on PyPI often require build-time and runtime dependencies that are not
present on PyPI. :pep:725 introduced metadata to express
such dependencies. Using concrete external dependency metadata for
a Python package requires mapping the given dependency identifiers to the specifiers
used in other ecosystems, which would allow:

  • tools to automatically map external dependencies to packages in other
    packaging repositories/ecosystems,
  • error messages emitted by Python package installers and build frontends to
    include the needed external dependencies under the package names used by the
    relevant system package manager on the user’s system, as well as the user to
    query for those names directly to obtain install instructions.

Packaging ecosystems like Linux distros, conda, Homebrew, Spack, and Nix need
full sets of dependencies for Python packages, and have tools like pyp2rpm_
(Fedora), Grayskull_ (conda), and dh_python_ (Debian) which attempt to
automatically generate dependency information from the metadata available in
upstream Python packages. Before PEP 725, external dependencies were handled manually,
because there was no metadata for this in pyproject.toml or any other
standard metadata file. Enabling automatic conversion of that metadata is a key benefit of
this PEP, making Python packaging easier and more reliable. In addition, the
authors envision other types of tools making use of this information; e.g.,
dependency analysis tools like Repology_, Dependabot_ and libraries.io_.

Rationale

Prior art

The R language has a “System Requirements for R packages” mechanism (the
rstudio/r-system-requirements repository on GitHub) with a central
registry that knows how to translate external dependency metadata to install
commands for package managers like apt-get. This registry centralises the
mappings for a series of Linux distributions, and also Windows; macOS is not
covered. The "Rule Coverage" section of its README
used to show that this system improves the chance of successfully building packages
from CRAN from source. Across all CRAN packages,
Ubuntu 18 improved from 78.1% to 95.8%, CentOS 7 from 77.8% to 93.7% and openSUSE
15.0 from 78.2% to 89.7%. The chance of success depends on how well the registry
is maintained, but the gain is significant: ~4x fewer packages fail to build on
Ubuntu and CentOS in a Docker container.

RPM-based distributions, like Fedora, can use a rule-based implementation
(NameConvertor) in pyp2rpm_, as described by encukou in the “Wanting a singular
packaging tool/vision” thread. The main rule is that the RPM name for a PyPI package is
f"python-{pypi_package_name}". This seems to work quite well; there are a
few variants like Python version specific names, where the prefix contains the
Python major and minor version numbers (e.g. python311- instead of
python-).

Gentoo follows a similar approach to naming Python packages, using the dev-python/
category and some well-specified rules <``https://projects.gentoo.org/python/guide/package-maintenance.html``>__.

Conda-forge has a more explicit name mapping, because the base names are the
same in conda-forge as on PyPI (e.g., numpy maps to numpy), but there
are many exceptions because of both name collisions and renames (e.g., the PyPI
name for PyTorch is torch while in conda-forge it’s pytorch). There are
several name mapping efforts maintained by different teams. Conda-forge’s infrastructure
generates one in the regro/cf-graph-countyfair repository (mappings/pypi).
Grayskull maintains its own curated mapping (grayskull/strategy/config.yaml).
Prefix.dev created the parselmouth mappings
to support conda and PyPI integrations in their tooling. A more complete overview of
their approaches, strengths and weaknesses can be found in
conda/grayskull#564 (“Tame the PyPI / Conda mapping chaos”).

The OpenStack <``https://www.openstack.org/``>__ ecosystem also needs to deal with
some mapping efforts. All of them focus exclusively on Linux distributions.
pkg-map accompanies diskimage-builder and provides a file format where the user defines
arbitrary variable names and their corresponding names in the target distro
(Red Hat, Debian, OpenSUSE, etc.); see the pkg-map example for PyYAML in the
diskimage-builder repository.
bindep <https://opendev.org/opendev/bindep>__
defines a file bindep.txt
(see the example in the bindep repository)
where users can write down dependencies that are not installable from PyPI. The format is
line-based, with each line containing a dependency as found in the Debian ecosystem.
For other distributions, it offers a “filters” syntax between square brackets where users
can indicate other target platforms, optional dependencies and extras.

The need for mappings also appears in other ecosystems like SageMath (see sagemath/sage#36356),
as well as among end users who want to install PyPI packages with their system
package manager of choice (see for example this Unix & Linux Stack Exchange question
<``https://unix.stackexchange.com/q/761371``>__).

Governance and maintenance costs of name mappings

The maintenance cost of external dependency mappings to a large number of packaging
ecosystems is potentially high. We choose to define the registry in such
a way that:

  • A central authority maintains the list of recognized DepURLs and the
    known ecosystem mappings.
  • The mappings themselves are maintained by the target packaging ecosystems.

Hence this system is opt-in for a given ecosystem, and the associated
maintenance costs are distributed.

Generating package manager-specific install commands

Python package authors with external dependencies usually have installation
instructions for those external dependencies in their documentation. These
instructions are difficult to write and keep up-to-date, and are usually only
covering one or at most a handful of platforms. As an example, here are SciPy’s
instructions for its external build dependencies (C/C++/Fortran compilers,
OpenBLAS, pkg-config):

  • Debian/Ubuntu: sudo apt install -y gcc g++ gfortran libopenblas-dev liblapack-dev pkg-config python3-pip python3-dev
  • Fedora: sudo dnf install gcc-gfortran python3-devel openblas-devel lapack-devel pkgconfig
  • CentOS/RHEL: sudo yum install gcc-gfortran python3-devel openblas-devel lapack-devel pkgconfig
  • Arch Linux: sudo pacman -S gcc-fortran openblas pkgconf
  • Homebrew on macOS: brew install gfortran openblas pkg-config

The package names vary a lot, and there are differences like some distros
splitting off headers and other build-time dependencies in a separate
-dev/-devel package while others do not. With the registry in this PEP,
this could be made both more comprehensive and easier to maintain through a tool
command with semantics of “show this ecosystem’s preferred package manager
install command for all external dependencies”. This may be done as a
standalone tool, or as a new subcommand in any Python development workflow tool
(e.g., Pip, Poetry, Hatch, PDM, uv).

To this end, each ecosystem mapping can provide a list of package managers
known to be compatible, with templated instructions on how to install and query
packages. The provided install command templates are paired with query command templates
so those tools can check whether the needed packages are already present without
having to attempt an install operation (which might be expensive and have unintended
side effects like version upgrades).
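As a rough sketch (not part of this specification) of how a tool might pair the two
templates, assuming a hypothetical Debian-style entry where ``apt install`` is the
install command and ``dpkg -s`` the query command; the helper names are ours:

.. code-block:: python

# Hypothetical illustration of expanding paired install/query command templates.
# The "commands" structure mirrors the package_managers schema described below.
import subprocess

commands = {
    "install": {"command": ["apt", "install", "--yes", "{}"], "requires_elevation": True},
    "query": {"command": ["dpkg", "-s", "{}"], "requires_elevation": False},
}

def render(template: list[str], specs: list[str]) -> list[str]:
    """Replace the '{}' placeholder with the mapped package name(s)."""
    rendered = []
    for token in template:
        if token == "{}":
            rendered.extend(specs)
        else:
            rendered.append(token)
    return rendered

def missing_packages(specs: list[str]) -> list[str]:
    """Use the query template (one spec per invocation) to find absent packages."""
    absent = []
    for spec in specs:
        cmd = render(commands["query"]["command"], [spec])
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            absent.append(spec)
    return absent

to_install = missing_packages(["g++", "zlib1g-dev"])
if to_install:
    cmd = render(commands["install"]["command"], to_install)
    if commands["install"]["requires_elevation"]:
        cmd = ["sudo", *cmd]
    print("Would run:", " ".join(cmd))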

Registry design

The mapping infrastructure has been designed to provide the following components and properties:

  • A central registry of PEP 725 identifiers (DepURLs), including at least the
    well-known generic and virtual identifiers considered canonical.
  • A list of known ecosystems, where ecosystem maintainers can register their name mapping(s).
  • A standardized schema that defines how mappings should be structured. Each mapping can
    also provide programmatic details about how their supported package manager(s) work.

The above documents are provided as JSON files validated by accompanying JSON schemas.
A Python library and CLI are provided to query and utilize these resources. The user can
configure which system package manager they prefer to use for the default package mappings
and command generation (e.g. a user on Ubuntu may prefer conda, brew or spack
instead of apt as their package manager of choice to provide external dependencies).

Specification

Three schemas are proposed:

  1. A central registry of known DepURLs, as introduced in PEP 725.
  2. A list of known ecosystems and the canonical URL for their mappings.
  3. The ecosystem-specific mappings of DepURLs to their
    corresponding ecosystem specifiers, plus details of their package manager(s).

Central registry

The central registry defines which identifiers are recognized as canonical,
plus known aliases. Each entry MUST provide a valid DepURL in the
field id, with an optional free-form description text. Additionally,
an entry MAY refer to other entries via its provides field, which takes
a string or a list of strings already defined as id in the registry. This is useful
both for aliases (e.g. dep:generic/arrow and dep:github/apache/arrow) and for
concrete implementations of a dep:virtual/ entry (e.g. dep:generic/gcc
would provide dep:virtual/compiler/c). Entries whose provides field is absent
or only contains dep:virtual/ identifiers are considered
canonical. The provides field MUST NOT be present in dep:virtual/ definitions.
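For illustration, a couple of registry entries using provides might look like the
following sketch (shown as Python literals for brevity; these particular entries are
examples, not additions to the registry):

.. code-block:: python

# Sketch of registry entries: a canonical entry, an alias, and a virtual implementation.
registry_entries = [
    # Canonical entry.
    {"id": "dep:generic/arrow", "description": "Apache Arrow columnar format libraries"},
    # Alias: provides a non-virtual identifier, so it is *not* canonical.
    {"id": "dep:github/apache/arrow", "provides": "dep:generic/arrow"},
    # Provides only dep:virtual/ identifiers, so it is still considered canonical.
    {"id": "dep:generic/gcc", "provides": ["dep:virtual/compiler/c"]},
]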

Having a central registry enables the validation of the [external] table.
All involved tools MUST check that the provided identifiers are well formed.
Additionally, some tools MAY check whether the identifiers in use are recognized as
canonical. More specifically:

  • Build backends, build frontends, and installers SHOULD NOT do any validation
    of identifiers being canonical by default.
  • Uploaders like twine SHOULD validate if the identifiers are canonical
    and warn or report an error to the user, with opt-out mechanisms. They
    SHOULD suggest a canonical replacement, if available.
  • Index servers like PyPI MAY perform the same validation as the uploaders and
    reject the artifact if necessary.
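A minimal sketch of the two levels of validation (well-formedness vs. being canonical),
assuming the registry has been loaded as a list of entries like the ones above; the
regex comes from the schema below, while the helper names are ours:

.. code-block:: python

import re

# Well-formedness: the central registry schema only requires the "dep:" scheme prefix.
DEPURL_RE = re.compile(r"^dep:.+$")

def is_well_formed(identifier: str) -> bool:
    return bool(DEPURL_RE.match(identifier))

def is_canonical(identifier: str, registry_entries: list[dict]) -> bool:
    """Canonical entries have no 'provides', or only dep:virtual/ entries in it."""
    for entry in registry_entries:
        if entry["id"] != identifier:
            continue
        provides = entry.get("provides") or []
        if isinstance(provides, str):
            provides = [provides]
        return all(p.startswith("dep:virtual/") for p in provides)
    return False  # unknown identifiers are not canonical

# A build frontend would stop at is_well_formed(); an uploader could additionally
# warn when is_canonical() returns False and suggest a replacement.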

This registry SHOULD also centralize authoritative decisions about its
contents, such as which entry of a collection of aliases is preferred as
canonical, or which versioning scheme applies to virtual DepURLs (see Appendix
B). The corresponding answers are not given in this PEP; instead we delegate
that responsibility to the central registry maintainers.

Mappings

The mappings specify which ecosystem-specific identifiers provide the canonical
entries available in the central registry. A mapping mainly consists of a list
of dictionaries, in which each entry contains:

  • an id field with the canonical DepURL.

  • an optional free form description text.

  • a specs field whose value MUST be one of:

    • a dictionary with three keys (build, host, run). The values
      MUST be a string or list of strings representing the ecosystem-specific package
      identifiers as needed as build-, host- and runtime dependencies (see PEP 725 for
      details on these definitions).

    • for convenience, a string or a list of strings are also accepted as a
      shorthand form. In this case, the identifier(s) will be used to populate
      the three categories mentioned in the item above.

    • an empty list, which is understood as the ecosystem not having packages to
      provide such a dependency.

  • a specs_from field whose value is a DepURL from which the specs
    field will be imported. Either specs or specs_from MUST be present.

  • an optional urls field whose value MUST be a URL, a list of URLs, or a
    dictionary that maps a string to a URL. This is useful to link to external
    resources that provide more information about the mapped packages.
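As a sketch of how a consumer might normalize these entry forms (shorthand string,
expanded dictionary, or specs_from indirection), assuming the mapping has been loaded
into Python dictionaries; the Debian-style package names and the dep:github/madler/zlib
alias below are hypothetical:

.. code-block:: python

# Hypothetical mapping entries exercising the three allowed forms of "specs".
mapping_entries = [
    {"id": "dep:generic/zlib", "specs": "zlib1g-dev"},                   # shorthand
    {"id": "dep:generic/libwebp",
     "specs": {"build": ["libwebp-dev"], "host": ["libwebp-dev"], "run": ["libwebp7"]}},
    {"id": "dep:github/madler/zlib", "specs_from": "dep:generic/zlib"},  # indirection
]

def resolve_specs(entry: dict, entries_by_id: dict) -> dict:
    """Return the normalized {'build': [...], 'host': [...], 'run': [...]} form."""
    if "specs_from" in entry:
        entry = entries_by_id[entry["specs_from"]]
    specs = entry["specs"]
    if isinstance(specs, (str, list)):
        # Shorthand: the same identifier(s) populate all three categories.
        as_list = [specs] if isinstance(specs, str) else list(specs)
        return {"build": as_list, "host": as_list, "run": as_list}
    return {category: ([v] if isinstance(v, str) else list(v)) for category, v in specs.items()}

by_id = {entry["id"]: entry for entry in mapping_entries}
print(resolve_specs(by_id["dep:github/madler/zlib"], by_id))
# {'build': ['zlib1g-dev'], 'host': ['zlib1g-dev'], 'run': ['zlib1g-dev']}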

The mappings SHOULD also specify another section package_managers, reporting
which package managers are available in the ecosystem and how to use them. This field MUST
take a list of dictionaries, with each of them reporting the following fields:

  • name (string), unique identifier for this package manager. Usually, the executable name.
  • commands (list of dictionaries), the commands to run to install the mapped package(s) and
    check whether they are already installed.
  • specifier_syntax: instructions on how to map a subset of PEP 440 specifiers to
    the target package manager. Three levels of support are offered: name-only, exact-version-only,
    and version-range compatibility (with per-operator translations).

Each mapping MUST have a canonical URL for online retrieval. These mappings
MAY also be packaged for offline distribution on each platform. The authors
recommend placing them in the standard location for data artifacts in each operating
system; e.g. $XDG_DATA_DIRS on Linux and similar systems, ~/Library/Application Support on
macOS, and %LOCALAPPDATA% on Windows. The subdirectory identifier MUST
be external-packaging-metadata-mappings. This data directory SHOULD only
contain mapping documents named {ecosystem-identifier}.mapping.json. The central
registry and known ecosystem documents MAY also be distributed in this directory,
as registry.json and known-ecosystems.json, respectively.
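As a sketch of how a tool might locate such offline copies (shown for the Linux case
only; the precedence of user over system directories is our assumption, not part of
this PEP):

.. code-block:: python

import json
import os
from pathlib import Path

SUBDIR = "external-packaging-metadata-mappings"

def candidate_mapping_paths(ecosystem: str) -> list[Path]:
    """Offline locations for '{ecosystem}.mapping.json' on Linux ($XDG_DATA_DIRS)."""
    data_dirs = os.environ.get("XDG_DATA_DIRS", "/usr/local/share:/usr/share").split(":")
    # User-level data dir first, then the system-wide ones.
    home_data = os.environ.get("XDG_DATA_HOME", str(Path.home() / ".local" / "share"))
    paths = []
    for base in [home_data, *data_dirs]:
        if base:
            paths.append(Path(base) / SUBDIR / f"{ecosystem}.mapping.json")
    return paths

def load_offline_mapping(ecosystem: str) -> dict | None:
    for path in candidate_mapping_paths(ecosystem):
        if path.is_file():
            return json.loads(path.read_text())
    return None  # fall back to the canonical online URL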

Known ecosystems

The list of known ecosystems has two roles:

  1. Reporting the canonical URL for each ecosystem’s mapping.
  2. Assigning a short identifier to each ecosystem. This is the identifier
    that MUST be used in the mapping filenames mentioned above so they can be
    found in the local filesystem.

For ecosystems corresponding to Linux distributions, the identifier MUST be the
one reported by the ID field of their os-release file.
For other ecosystems, it MUST be decided during the submission to
the list of known ecosystems document. It MUST only use the characters allowed in
os-release’s ID field, as per the regex [a-z0-9\-_.]+.
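For example, a tool running on a Linux distribution might derive (and sanity-check)
the ecosystem identifier roughly as in this sketch; platform.freedesktop_os_release()
is available in Python 3.10+, while the function name is ours:

.. code-block:: python

import platform
import re

# Allowed characters for ecosystem identifiers, as in os-release's ID field.
ID_RE = re.compile(r"[a-z0-9\-_.]+")

def linux_ecosystem_identifier() -> str:
    os_release = platform.freedesktop_os_release()  # Python 3.10+
    identifier = os_release.get("ID", "linux")
    if not ID_RE.fullmatch(identifier):
        raise ValueError(f"Invalid ecosystem identifier: {identifier!r}")
    return identifier

# e.g. 'ubuntu' -> the tool would then look for 'ubuntu.mapping.json'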

Schema details

Three JSON Schema documents are provided to fully standardize the registries and mappings.

Central registry schema
^^^^^^^^^^^^^^^^^^^^^^^

The central registry is specified by the following
`JSON Schema <https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/central-registry.schema.json>`__:

``$schema``
~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``string``
    * - Description
      - URL of the definition list schema in use for the document.
    * - Required
      - False

``schema_version``
~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``integer``
    * - Required
      - False

``definitions``
~~~~~~~~~~~~~~~


.. list-table::
    :widths: 25 75

    * - Type
      - ``array``
    * - Description
      - List of DepURLs currently recognized.
    * - Required
      - True

Each entry in this list is defined as:

.. list-table::
    :header-rows: 1
    :widths: 20 25 40 15

    * - Field
      - Type
      - Description
      - Required
    * - ``id``
      - ``DepURLField`` (``string`` matching regex ``^dep:.+$``)
      - DepURL
      - True
    * - ``description``
      - ``string``
      - Free-form field to add some details about the package. Allows Markdown.
      - False
    * - ``provides``
      - ``DepURLField | list[DepURLField]``
      - List of identifiers this entry connects to.
        Useful to annotate aliases or virtual package implementations.
      - False
    * - ``urls``
      - ``AnyUrl | list[AnyUrl] | dict[NonEmptyString, AnyUrl]``
      - Hyperlinks to web locations that provide more information about the definition.
      - False

Known ecosystems schema
^^^^^^^^^^^^^^^^^^^^^^^

The known ecosystems list is specified by the following
`JSON Schema <https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/known-ecosystems.schema.json>`__:

``$schema``
~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``string``
    * - Description
      - URL of the mappings schema in use for the document.
    * - Required
      - False

``schema_version``
~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``integer``
    * - Required
      - False

``ecosystems``
~~~~~~~~~~~~~~


.. list-table::
    :widths: 25 75

    * - Type
      - ``dict``
    * - Description
      - Ecosystems names and their corresponding details.
    * - Required
      - True

This dictionary maps non-empty string keys referring to the ecosystem identifiers
to a sub-dictionary defined as:

.. list-table::
    :header-rows: 1
    :widths: 20 25 40 15

    * - Key
      - Value type
      - Value description
      - Required
    * - ``Literal['mapping']``
      - ``AnyURL``
      - URL to the mapping for this ecosystem
      - True

Mappings schema
^^^^^^^^^^^^^^^

The mappings are specified by the following
`JSON Schema <https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/external-mapping.schema.json>`__:

``$schema``
~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``string``
    * - Description
      - URL of the mappings schema in use for the document.
    * - Required
      - False

``schema_version``
~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``integer``
    * - Required
      - False

``name``
~~~~~~~~


.. list-table::
    :widths: 25 75

    * - Type
      - ``string``
    * - Description
      - Name of the schema
    * - Required
      - True

``description``
~~~~~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``string | None``
    * - Description
      - Free-form field to add information about this mapping. Allows Markdown.
    * - Required
      - False

``mappings``
~~~~~~~~~~~~


.. list-table::
    :widths: 25 75

    * - Type
      - ``array``
    * - Description
      - List of DepURL-to-specs mappings.
    * - Required
      - True

Each entry in this list is defined as:

.. list-table::
    :header-rows: 1
    :widths: 20 25 40 15

    * - Field
      - Type
      - Description
      - Required
    * - ``id``
      - ``DepURLField`` (``string`` matching regex ``^dep:.+$``)
      - DepURL, as provided in the central registry
      - True
    * - ``description``
      - ``string``
      - Free-form field to add some details about the package. Allows Markdown.
      - False
    * - ``urls``
      - ``AnyUrl | list[AnyUrl] | dict[NonEmptyString, AnyUrl]``
      - Hyperlinks to web locations that provide more information about the definition.
      - False
    * - ``specs``
      - ``string | list[string] | dict[Literal['build', 'host', 'run'], string | list[string]]``
      - Ecosystem-specific identifiers for this package. The full form is a dictionary
        that maps the categories ``build``, ``host`` and ``run`` to their corresponding
        package identifiers. As a shorthand, a single string or a list of strings can be
        provided, in which case they will be used to populate the three categories identically.
      - Either ``specs`` or ``specs_from`` MUST be present.
    * - ``specs_from``
      - ``DepURLField`` (``string`` matching regex ``^dep:.+$``)
      - Take specs from another mapping entry.
      - Either ``specs`` or ``specs_from`` MUST be present.
    * - ``extra_metadata``
      - ``dict[NonEmptyString, Any]``
      - Free-form key-value store for arbitrary metadata.
      - False

``package_managers``
~~~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: 25 75

    * - Type
      - ``array``
    * - Description
      - List of tools that can be used to install packages in this ecosystem.
    * - Required
      - True

Each entry in this list is defined as a dictionary with these fields:

.. list-table::
:header-rows: 1
:widths: 20 25 40 15

* - Field
  - Type
  - Description
  - Required
* - ``name``
  - ``string``
  - Short identifier for this package manager (usually the command name)
  - True
* - ``commands``
  - ``dict[Literal['install', 'query'], dict[Literal['command', 'requires_elevation', 'multiple_specifiers'], list[str] | bool | Literal['always', 'name-only', 'never']]]``
  - Commands used to install or query the given package(s). Only two keys
    are allowed: ``install`` and ``query``. Their value is a dictionary
    with:

    - a required key ``command`` that takes a list of strings
      (as expected by ``subprocess.run``).

    - an optional ``requires_elevation`` boolean (``False`` by default)
      to indicate whether the command must run with elevated permissions
      (e.g. administrator on Windows, superuser on Linux and macOS).

    - an enum ``multiple_specifiers`` that determines whether the command
      accepts multiple package specifiers at the same time; one of:

        - ``always``, the default for ``install``.

        - ``name-only``, the command only accepts multiple specifiers if they do
          not contain version constraints.

        - ``never``, the default for ``query``.

    Exactly one of the ``command`` items MUST include a ``{}`` placeholder,
    which will be replaced by the mapped package identifier(s). The
    ``install`` command SHOULD support the placeholder being replaced by
    multiple identifiers, while ``query`` MUST only receive a single identifier
    per command.
  - True
* - ``specifier_syntax``
  - ``dict[Literal['name_only', 'exact_version', 'version_ranges'], None | list[str] | dict[Literal['and', 'equal', 'greater_than', 'greater_than_equal', 'less_than', 'less_than_equal', 'not_equal', 'syntax'], None | str | list[str]]]``
  - Mapping of allowed PEP440 version specifiers to the syntax used in this
    package manager. Three top-level keys are expected and required:

    - ``name_only`` MUST take a list of strings as the syntax used for specifiers
      that do not contain any version information; it MUST include the placeholder
      ``{name}``.

    - ``exact_version`` MUST be ``None`` or a list of strings that describe
      the syntax used for specifiers that only express exact version
      constraints; in the latter case, the placeholders ``{name}``
      and ``{version}`` MUST be present in at least one of the strings
      (although not necessarily the same string for both).

    - ``version_ranges`` MUST be ``None`` or a dictionary with the
      following required keys:

      - the key ``syntax`` takes a list of strings where at least one MUST
        include the ``{ranges}`` placeholder (to be replaced by the
        maybe-joined version constraints, as determined by the value of
        ``and``). They MAY also include the ``{name}`` placeholder.

      - the keys ``equal``, ``greater_than``, ``greater_than_equal``,
        ``less_than``, ``less_than_equal``, and ``not_equal`` take a string
        if the operator is supported, ``None`` otherwise. In the former case,
        the value MUST include the ``{version}`` placeholder, and MAY include
        ``{name}``.

      - the key ``and`` takes a string used to join multiple version
        constraints in a single token, or ``None`` if only a single
        constraint can be used per token. In the latter case, the different
        constraints will be "exploded" into several tokens using the
        ``syntax`` template.

      When ``exact_version`` or ``version_ranges`` are set to ``None``, it
      indicates that the respective types of specifiers are not supported
      by the package manager.

  - True
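To make the ``specifier_syntax`` machinery more concrete, here is a rough sketch of how
a tool might translate a PEP 440-style requirement into package-manager syntax using
these templates; the conda-style templates match the example mapping in the next
section, while the translation helper itself is ours:

.. code-block:: python

# Conda-style specifier_syntax, as in the example mapping further below.
specifier_syntax = {
    "name_only": ["{name}"],
    "exact_version": ["{name}=={version}"],
    "version_ranges": {
        "and": ",",
        "equal": "={version}",
        "greater_than": ">{version}",
        "greater_than_equal": ">={version}",
        "less_than": "<{version}",
        "less_than_equal": "<={version}",
        "not_equal": "!={version}",
        "syntax": ["{name}{ranges}"],
    },
}

OPERATORS = {">=": "greater_than_equal", "<=": "less_than_equal", "==": "equal",
             "!=": "not_equal", ">": "greater_than", "<": "less_than"}

def translate(name: str, constraints: list[tuple[str, str]]) -> list[str]:
    """Render one or more (operator, version) constraints for a single package."""
    if not constraints:
        return [t.format(name=name) for t in specifier_syntax["name_only"]]
    ranges_cfg = specifier_syntax["version_ranges"]
    rendered = [ranges_cfg[OPERATORS[op]].format(version=version, name=name)
                for op, version in constraints]
    if ranges_cfg["and"] is not None:
        joined = ranges_cfg["and"].join(rendered)
        return [t.format(name=name, ranges=joined) for t in ranges_cfg["syntax"]]
    # No joining support: "explode" into one token per constraint.
    return [t.format(name=name, ranges=r) for t in ranges_cfg["syntax"] for r in rendered]

print(translate("openblas", [(">=", "0.3.21"), ("<", "0.4")]))
# ['openblas>=0.3.21,<0.4']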

Examples

Registry, known ecosystems and mappings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A simplified registry would look like this:

.. code-block:: js

{
  "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/central-registry.schema.json",
  "schema_version": 1,
  "definitions": [
    {
      "id": "dep:generic/zlib",
      "description": "A Massively Spiffy Yet Delicately Unobtrusive Compression Library"
    },
    {
      "id": "dep:generic/libwebp",
      "description": "WebP codec is a library to encode and decode images in WebP format. This package contains the library that can be used in other programs to add WebP support"
    },
    {
      "id": "dep:generic/clang",
      "description": "Language front-end and tooling infrastructure for languages in the C language family for the LLVM project."
    }
  ]
}

A minimal list of known ecosystems with a single entry would look like this:

.. code-block:: js

{
  "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/known-ecosystems.schema.json",
  "schema_version": 1,
  "ecosystems": {
    "conda-forge": {
      "mapping": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/refs/heads/main/data/conda-forge.mapping.json"
    }
  }
}

That hypothetical conda-forge mapping (conda-forge.mapping.json), with only a couple of entries
for brevity, could look like this:

.. code-block:: js

{
  "schema_version": 1,
  "name": "conda-forge",
  "description": "Mapping for the conda-forge ecosystem",
  "mappings": [
    {
      "id": "dep:generic/zlib",
      "description": "zlib data compression library for the next generation systems. From zlib-ng/zlib-ng.",
      "specs": "zlib-ng", // Simplest form
      "urls": {
        "feedstock": "https://github.com/conda-forge/zlib-ng-feedstock"
      }
    },
    {
      "id": "dep:generic/libwebp",
      "description": "WebP image library. libwebp-base ships libraries; libwebp ships the binaries.",
      "specs": { // expanded form with single spec per category
        "build": "libwebp",
        "host": "libwebp-base",
        "run": "libwebp"
      },
      "urls": {
        "feedstock": "https://github.com/conda-forge/libwebp-feedstock"
      }
    },
    {
      "id": "dep:generic/clang",
      "description": "Development headers and libraries for Clang",
      "specs": { // expanded form with specs list
        "build": [
          "clang",
          "clangxx"
        ],
        "host": [
          "clangdev"
        ],
        "run": [
          "clang",
          "clangxx",
          "clang-format",
          "clang-tools"
        ]
      },
      "urls": {
        "feedstock": "https://github.com/conda-forge/clangdev-feedstock"
      }
    }
  ],
  "package_managers": [
    {
      "name": "conda",
      "commands": {
        "install": {
          "command": [
            "conda",
            "install",
            "{}"
          ],
          "multiple_specifiers": "always",
          "requires_elevation": false
        },
        "query": {
          "command": [
            "conda",
            "list",
            "-f",
            "{}"
          ],
          "multiple_specifiers": "never",
          "requires_elevation": false
        }
      },
      "specifier_syntax": {
        "exact_version": [
          "{name}=={version}"
        ],
        "name_only": [
          "{name}"
        ],
        "version_ranges": {
          "and": ",",
          "equal": "={version}",
          "greater_than": ">{version}",
          "greater_than_equal": ">={version}",
          "less_than": "<{version}",
          "less_than_equal": "<={version}",
          "not_equal": "!={version}",
          "syntax": [
            "{name}{ranges}"
          ]
        }
      }
    }
  ]
}

The following repository provides examples of what these schemas could look like in real cases.
They are not meant to be prescriptive, merely illustrative of how to apply these schemas:

  • Central registry <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/registry.json``>__.

  • Known ecosystems <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/known-ecosystems.json``>__.

  • Mappings:

    • Arch-linux <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/arch-linux.mapping.json``>__.

    • Chocolatey <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/chocolatey.mapping.json``>__.

    • Conan <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/conan.mapping.json``>__.

    • Conda-forge <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/conda-forge.mapping.json``>__.

    • Fedora <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/fedora.mapping.json``>__.

    • Gentoo <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/gentoo.mapping.json``>__.

    • Homebrew <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/homebrew.mapping.json``>__.

    • Nix <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/nix.mapping.json``>__.

    • PyPI <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/pypi.mapping.json``>__.

    • Scoop <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/scoop.mapping.json``>__.

    • Spack <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/spack.mapping.json``>__.

    • Ubuntu <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/ubuntu.mapping.json``>__.

    • Vcpkg <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/vcpkg.mapping.json``>__.

    • Winget <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/winget.mapping.json``>__.

pyproject-external CLI
^^^^^^^^^^^^^^^^^^^^^^

The following examples illustrate how the name mapping mechanism may be used.
They use the CLI implemented as part of the pyproject-external package.

Say we have cloned the source of a Python package named my-cxx-pkg with a
single extension module, implemented in C++, linking to zlib, using pybind11,
plus meson-python as the build backend:

.. code:: toml

[build-system]
build-backend = 'mesonpy'
requires = [
  "meson-python>=0.13.1",
  "pybind11>=2.10.4",
]

[external]
build-requires = [
  "dep:virtual/compiler/cxx",
]
host-requires = [
  "dep:generic/zlib",
]

With complete name mappings for apt on Ubuntu, this may then show the
following:

.. code:: bash

# show all external dependencies as DepURLs
$ python -m pyproject_external show .
[external]
build-requires = [
    "dep:virtual/compiler/cxx",
]
host-requires = [
    "dep:generic/zlib",
]

# show all external dependencies, but mapped to the autodetected ecosystem
$ python -m pyproject_external show --output=mapped .
[external]
build_requires = [
    "g++",
    "python3",
]
host_requires = [
    "zlib1g",
    "zlib1g-dev",
]

# show how to install external dependencies
$ python -m pyproject_external show --output=command .
sudo apt install --yes g++ zlib1g zlib1g-dev python3

We have not yet run those install commands, so the external dependencies may be
missing. If we get a build failure, the output may look like:

.. code::

$ pip install .
...
× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.

This package has the following external dependencies, if those are missing
on your system they are likely to be the cause of this build failure:

  dep:virtual/compiler/cxx
  dep:generic/zlib

If Pip has implemented support for querying the name mapping registry, the end
of that message could improve to:

.. code:: bash

The following external dependencies are needed to install the package
mentioned above. You may need to install them with `apt`:

  g++
  zlib1g
  zlib1g-dev

If the user wants to use conda packages and the mamba package manager to
install external dependencies, they may specify that in their
~/.config/pyproject-external/config.toml (or equivalent) file:

.. code:: toml

preferred_package_manager = "mamba"

This will then change the output of pyproject-external:

.. code:: bash

$ python -m pyproject_external show --output command .
mamba install --yes --channel=conda-forge --channel-priority=strict cxx-compiler zlib python

The pyproject-external CLI also provides a simple way to perform
[external] table validation against the central registry to check
whether the identifiers are considered canonical or not:

.. code-block:: bash

$ python -m pyproject_external show --validate grpcio-1.71.0.tar.gz
WARNING  Dep URL 'dep:virtual/compiler/cpp' is not recognized in the
central registry. Did you mean any of ['dep:virtual/compiler/c',
'dep:virtual/compiler/cxx', 'dep:virtual/compiler/cuda',
'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']?
[external]
build-requires = [
    "dep:virtual/compiler/c",
    "dep:virtual/compiler/cpp",
]

pyproject-external API
^^^^^^^^^^^^^^^^^^^^^^

The pyproject-external Python API also allows users to do these operations programmatically:

.. code-block:: python

>>> from pyproject_external import External
>>> external = External.from_pyproject_data(
      {
        "external": {
          "build-requires": [
            "dep:virtual/compiler/c",
            "dep:virtual/compiler/cpp",
          ]
        }
      }
    )
>>> external.validate()
Dep URL 'dep:virtual/compiler/cpp' is not recognized in the central registry. Did you
mean any of ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx',
'dep:virtual/compiler/cuda', 'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']?
>>> external = External.from_pyproject_data(
      {
        "external": {
          "build-requires": [
            "dep:virtual/compiler/c",
            "dep:virtual/compiler/cxx",  # fixed
          ]
        }
      }
    )
>>> external.validate()
>>> external.to_dict()
{'external': {'build_requires': ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx']}}
>>> from pyproject_external import detect_ecosystem_and_package_manager
>>> ecosystem, package_manager = detect_ecosystem_and_package_manager()
>>> ecosystem
'conda-forge'
>>> package_manager
'pixi'
>>> external.to_dict(mapped_for=ecosystem, package_manager=package_manager)
{'external': {'build_requires': ['c-compiler', 'cxx-compiler', 'python']}}
>>> external.install_command(ecosystem, package_manager=package_manager)
# {"command": ["pixi", "add", "{}"]}
['pixi', 'add', 'c-compiler', 'cxx-compiler', 'python']
>>> external.query_commands(ecosystem, package_manager=package_manager)
# {"command": ["pixi", "list", "{}"]}
[
  ['pixi', 'list', 'c-compiler'],
  ['pixi', 'list', 'cxx-compiler'],
  ['pixi', 'list', 'python'],
]

Grayskull
^^^^^^^^^

A proof-of-concept implementation was contributed to Grayskull, a conda recipe generator for
Python packages, via conda/grayskull#518 <``https://github.com/conda/grayskull/pull/518``>__.

To use the name mappings when generating a recipe for our package, we
can now run Grayskull_:

.. code::

$ grayskull pypi my-cxx-pkg
#### Initializing recipe for my-cxx-pkg (pypi) ####

Recovering metadata from pypi...
Starting the download of the sdist package my-cxx-pkg
my-cxx-pkg 100% Time:  0:00:10   5.3 MiB/s|###########|
Checking for pyproject.toml
...

Build requirements:
  - python                                 # [build_platform != target_platform]
  - cross-python_{{ target_platform }}     # [build_platform != target_platform]
  - meson-python >= 0.13.1                 # [build_platform != target_platform]
  - pybind11 >= 2.10.4                     # [build_platform != target_platform]
  - ninja                                  # [build_platform != target_platform]
  - libboost-devel                         # [build_platform != target_platform]
  - {{ compiler('cxx') }}
Host requirements:
  - python
  - meson-python >=0.13.1
  - pybind11 >=2.10.4
  - ninja
  - libboost-devel
Run requirements:
  - python

#### Recipe generated on /path/to/recipe/dir for my-cxx-pkg ####

Backwards Compatibility

There is no impact on backwards compatibility.

Security Implications

This proposal does not introduce any security implications for existing projects.
The proposed schemas, registries and mappings are resources that downstream
tooling can use at will, in whatever way they find suitable.

We do have some recommendations for future implementors. The mapping schema
proposes fields to encode instructions for command execution
(package_managers[].commands). A tampered mapping may change these
instructions into something else. Hence, tools should not rely on internet
connectivity to fetch the mappings from their online sources. Instead:

  • they should vendor the relevant documents in the distributed packages,
  • or depend on prepackaged, offline distributions of these documents,
  • or implement best-practices for authenticity verification of the fetched documents.
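For instance, a tool vendoring a pinned copy of a mapping might verify a fetched update
against a known checksum before trusting its command templates; a minimal sketch, where
the URL and checksum pinning scheme are our assumptions rather than something this PEP
mandates:

.. code-block:: python

import hashlib
import json
import urllib.request

# Hypothetical pin: the mapping URL and the sha256 of the vetted revision.
MAPPING_URL = "https://example.invalid/conda-forge.mapping.json"
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def fetch_verified_mapping() -> dict:
    with urllib.request.urlopen(MAPPING_URL) as response:
        payload = response.read()
    digest = hashlib.sha256(payload).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError(f"Mapping checksum mismatch: {digest}")
    return json.loads(payload)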

The install commands have the potential to modify the system configuration of the user.
When available, tools should prefer creating ephemeral, isolated environments for the
installation of external dependencies. If the ecosystem lacks that feature natively,
other solutions like containerization may be used. At the very least, informative messaging
of the impact of the operation should be provided.

How to Teach This

There are at least four audiences that may need to get familiar with the contents of this PEP:

  1. Central registry maintainers, who are responsible for curating the list of
    well-known DepURLs and mapped ecosystems.
  2. Packaging ecosystem maintainers, who are responsible for keeping the
    mapping for their ecosystem up-to-date.
  3. Maintainers of Python projects that require external dependencies.
  4. End users of packages that have external dependency metadata.

Central DepURL registry maintainers

Central DepURL registry maintainers curate the collection of DepURLs and the
known ecosystems. These contributors need to be able to refer to clearly
defined rules for when a new DepURL can be defined. It is undesirable to be
loose with canonical DepURL definitions, because each definition added increases
maintenance effort in the mappings in the target ecosystems.

The central registry maintainers should agree on the ground rules and write them
down as part of the repository documentation, perhaps supported by additional
affordances like issue and pull request templates, or linting tools.

Packaging ecosystem maintainers

Missing mapping entries will result in the absence of tailored error messages and
other UX affordances for end users of the impacted ecosystems. It is thus
recommended that each packaging ecosystem keep its mappings up-to-date with
the central registry. The key to this will be automation, like linting scripts
(see example at external-metadata-mappings <``https://github.com/jaimergp/external-metadata-mappings/blob/main/scripts/lint-mapping-entries.py``>__),
or periodic notifications via issues or draft submissions.

Establishing the initial mapping is likely to involve a lot of work, but ideally ongoing maintenance should require considerably less effort.

As best practices are discovered and agreed on, they should get documented
in the central registry repository as learning materials for the mapping
maintainers.

Maintainers of Python projects

A package maintainer’s responsibility is to decide the DepURL that best
represents the external dependency that their package needs. This is covered
in :pep:725; the interactive mappings browser demo located at
external-metadata-mappings.streamlit.app <``https://external-metadata-mappings.streamlit.app/``>__
may come in handy. The central registry documentation may include examples and
frequently asked questions to guide newcomers with their decisions.

If no suitable DepURL is available for a given dependency, maintainers may
consider submitting a request in the central registry. Instructions on how to do
this should be provided as part of the central registry documentation.

End user package consumers

There will be no change in the user experience by default. This is particularly
true if the user only relies on wheels, since the only impact will be driven by
external runtime dependencies (expected to be rare), and even in those cases
they need to opt-in by installing a compatible tool.

Users that do opt in may find missing entries for their target ecosystems, in
which case they should receive informative error messages that point to the relevant
documentation sections. This will allow them to get acquainted with the nature
of the issue and its potential solutions.

We hope that this results in a subset of them reporting the missing entries,
submitting a fix to the affected mapping or, if totally absent, even deciding
to maintain a new one on their own. To that end, they should get familiar with
the responsibilities of mapping maintainers (discussed above).

Reference Implementation

A reference implementation should include three components:

  1. A central registry that captures at a minimum a DepURL and its description. This registry MUST
    NOT contain specifics of package ecosystem mappings.
  2. A standard specification for a collection of mappings. JSON Schema is widely supported
    (including for validation in many text editors) and would be a natural choice for expressing
    the standard specification.
  3. An implementation of (2), providing mappings from the contents of the central
    registry to the ecosystem-specific package names.

For (1), the JSON Schema is defined at central-registry.schema.json <``https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/central-registry.schema.json``>__.
An example registry can be found at registry.json <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/registry.json``>__.
For (2), the JSON Schema is defined at external-mapping.schema.json <``https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/external-mapping.schema.json``>__.
A collection of example mappings for a sample of packages can be found at external-metadata-mappings <``https://github.com/jaimergp/external-metadata-mappings/tree/main/data``>__.
For (3), the JSON Schema is defined at known-ecosystems.schema.json <``https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/known-ecosystems.schema.json``>__.
An example list can be found at known-ecosystems.json <``https://github.com/jaimergp/external-metadata-mappings/blob/main/data/known-ecosystems.json``>__.
The JSON Schemas are created with these Pydantic models <``https://github.com/jaimergp/external-metadata-mappings/blob/main/schemas/schema.py``>__.

The reference CLI and Python API to consume the different JSON documents and [external] tables
can be found in pyproject-external <``https://github.com/jaimergp/pyproject-external``>__.

Rejected Ideas

Centralized mappings governed by the same body

While a central authority for the registry is useful, the maintenance burden
of handling the mappings for multiple ecosystems is unfeasible at the scale of PyPI.
Hence, we propose that the central authority only governs the central registry and
the list of known ecosystems, while the maintenance of the mappings themselves is handled
by the target ecosystems.

Allowing ecosystem-specific variants of packages

Some ecosystems have their own variants of known packages; e.g. Debian’s
libsymspg2-dev. While an identifier such as dep:debian/libsymspg2-dev
is syntactically valid, the central registry should not recognize it as a
well-known identifier, preferring its generic counterpart instead. Users
may still choose to use it, but tools may warn about it and suggest using the
generic one. This is meant to encourage ecosystem-agnostic metadata whenever
possible to facilitate adoption across platforms and operating systems.

Adding more package metadata to the central registry

A central registry should only contain a list of DepURLs and a
minimal set of metadata fields to facilitate its identification (a free-form
text description, and one or more URLs to relevant locations).

We have chosen to leave additional details out of the central registry, and instead
suggest that external contributors maintain their own mappings, where they can
annotate the identifiers with extra metadata via the free-form extra_metadata field.

The reasons include:

  • The existing fields should be sufficient to identify the project home,
    where that extra metadata can be obtained (e.g. the repository at the URL will likely
    include details about authorship and licensing).
  • These details can also be obtained from the actual target ecosystems. In some
    cases this might even be preferable; e.g., for licenses, where downstream packaging
    can actually affect it by unvendoring dependencies or adjusting optional bits.
  • Those details may change over the lifetime of the project, and keeping them
    up-to-date would increase the maintenance burden on the governance body.
  • Centralizing additional metadata would hence introduce ambiguities and
    discrepancies across target ecosystems, where different versions may be
    available or required.

Mapping PyPI projects to repackaged counterparts in target ecosystems

It is common that other ecosystems redistribute Python projects with their own
packaging system. While this is required for packages with compiled extensions, it
is theoretically unnecessary for pure Python wheels; the only need for this seems to
be metadata translation. See Wanting a singular packaging tool/vision #68 <``https://discuss.python.org/t/wanting-a-singular-packaging-tool-vision/21141/68``>,
Wanting a singular packaging tool/vision #103 <``https://discuss.python.org/t/wanting-a-singular-packaging-tool-vision/21141/103``>
,
and spack/spack#28282 <``https://github.com/spack/spack/issues/28282#issuecomment-1562178367``>__
for examples of discussions in this direction.

The proposals in this PEP do not consider PyPI → ecosystem mappings, but
the same schemas can be repurposed to that end. After all, it is trivial to build a PURL or
DepURL from a PyPI name (e.g. numpy becomes pkg:pypi/numpy). A hypothetical
mapping maintainer could annotate their repackaging efforts with the source PURL identifier,
and then use that metadata to generate compatible mappings, such as:

.. code:: json

{
  "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/external-mapping.schema.json",
  "schema_version": 1,
  "name": "PyPI packages in Ubuntu 24.04",
  "description": "PyPI mapping for the Ubuntu 24.04 LTS (Noble) distro",
  "mappings": [
    {
      "id": "dep:pypi/numpy",
      "description": "The fundamental package for scientific computing with Python",
      "specs": ["python3-numpy"],
      "urls": {
        "home": "https://numpy.org/"
      }
    }
  ]
}

Such a mapping would allow downstream redistribution efforts to focus on the
compiled packages and instead delegate pure wheels to Python packaging
solutions directly.

Strict validation of identifiers

The central registry provides a list of canonical identifiers, which may tempt
implementors into ensuring that all supplied identifiers are indeed canonical. We
have decided to only recommend this practice for some tool categories, but in no
case require such checks.

It is expected that as the [external] metadata tables are adopted by the
packaging community, the canonical identifier list will grow to accommodate the
requirements found in different projects; for example, when a new C++ library or a
new language compiler is introduced.

If validation is made too strict and rejects unknown identifiers, it would
introduce unnecessary friction in the adoption of external metadata, and require
human interaction to review and accept the newly requested identifiers in
a time-critical manner, potentially blocking publication of the package
that needs a new identifier added to the central registry.

We suggest simply checking that the provided identifiers are well-formed. Future
work may choose to also enforce that the identifiers are recognized as canonical,
once the central registry has matured with significant adoption.

Open Issues

None at this time.

References

Appendix A: Operational suggestions

In contrast with the ecosystem mappings, the central registry and the list of known
ecosystems need to be maintained by a central authority. The authors propose to:

  • Host the external-metadata-mappings and pyproject-external repositories under the PyPA_
    GitHub organization (or equivalent as per :pep:772).
  • Create a maintainers team for these two repositories, seeded with the authors of this PEP and
    regulated as per :pep:772.

Appendix B: Virtual versioning proposal

While virtual dependencies can be versioned with the same syntax as non-virtual
dependencies, their meaning can be ambiguous (e.g. there can be multiple
implementations, and virtual interfaces may not be unambiguously versioned).
Below we provide some suggestions for the central registry maintainers to
consider when standardizing such meaning:

  • OpenMP: has regular MAJOR.MINOR versions of its standard, so would look
    like >=4.5.
  • BLAS/LAPACK: should use the versioning of Reference LAPACK_, which
    defines what the standard APIs are. It uses MAJOR.MINOR.MICRO, so would look
    like >=3.10.0.
  • Compilers: these implement language standards. For C, C++ and Fortran these
    are versioned by year. In order for versions to sort correctly, we recommend
    using the full year (four digits). So “at least C99” would be >=1999, and
    selecting C++14 or Fortran 77 would be ==2014 or ==1977 respectively.
    Other languages may use different versioning schemes. These should be
    described somewhere before they are used in pyproject.toml.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

.. _PyPI: https://pypi.org
.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _setuptools: https://setuptools.readthedocs.io/
.. _setuptools metadata: Building and Distributing Packages with Setuptools - setuptools documentation
.. _SPDX: https://spdx.dev/
.. _PURL: https://github.com/package-url/purl-spec
.. _vers: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst
.. _vers implementation for PURL: https://github.com/package-url/purl-spec/pull/139
.. _pyp2rpm: https://github.com/fedora-python/pyp2rpm
.. _Grayskull: https://github.com/conda/grayskull
.. _dh_python: https://www.debian.org/doc/packaging-manuals/python-policy/
.. _Repology: https://repology.org/
.. _Dependabot: https://github.com/dependabot
.. _libraries.io: https://libraries.io/
.. _crossenv: https://github.com/robotpy/crossenv
.. _Python Packaging User Guide: https://packaging.python.org
.. _pyOpenSci Python Open Source Package Development Guide: https://www.pyopensci.org/python-package-guide/
.. _Scikit-HEP packaging guide: Scikit-HEP developer/packaging guide
.. _PyPA: https://github.com/pypa
.. _Reference LAPACK: https://github.com/Reference-LAPACK/lapack

13 Likes

Cross-referencing the thread for PEP 725 (which is required by this PEP):

https://discuss.python.org/t/103890/

1 Like

Did you discuss providing a standardized hook that can be used to modify the name mapping?

Background: At SageMath we have been using the external dependencies for some time now to automatically create Conda env lock files (which are then used for CI and dev). This works really well in general, but now and then we need to overwrite/customize the result that we get from grayskull. In such a usage scenario, you really need to make sure that whatever you are producing definitely works. These customizations are mostly of the form:

  • The external packages are not yet in grayskull or have a wrong mapping (but we don’t want to necessarily wait until this is fixed in the registry)

  • There are bugs in a certain version of a dependency, which we hence would like to exclude (but only for Conda, not for the other systems)

  • Make a few dependencies optional due to known issues (but again, only for Conda and mostly only on Windows).

    If you are interested in the code, see sage/tools/update-conda.py in the sagemath/sage repository.

Such fine-grained control is hard to achieve with static metadata, but it seems important if you really want to enable use cases where the external metadata is used directly in CI to set up the correct build & test env.

Not sure if this remark concerns more PEP 804 or 725.

2 Likes
  • The mappings themselves are maintained by the target packaging ecosystems.

In the event that a given distribution doesn’t want to maintain their mapping, could we have a PyPA/community owned mapping instead?

2 Likes

Incidentally, have you been able to get feedback from distribution maintainers as to whether they would contribute mappings?

1 Like

A hook hasn’t come up yet. What we have discussed is that the CLI should allow users to override the “default” mapping in some way (configuration file, CLI flag, environment variable), which would work as a complete replacement. What you are describing is a mechanism to perform an operation like mapping = deep_update(default_mapping, user_override), correct? I feel that this is up to CLI tools to implement, but maybe we could leave a hint that such a feature may be desirable?
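For what it’s worth, a rough sketch of the kind of override I understand you mean, with a hypothetical deep_update helper that overlays user-provided entries (keyed by DepURL) on top of the default mapping; this is not something the PEP currently specifies:

.. code-block:: python

# Hypothetical sketch: overlay user-provided entries on top of a default mapping,
# keyed by DepURL, so individual entries can be replaced or added locally.
def deep_update(default_mapping: dict, user_override: dict) -> dict:
    merged = dict(default_mapping)
    entries = {e["id"]: e for e in default_mapping.get("mappings", [])}
    entries.update({e["id"]: e for e in user_override.get("mappings", [])})
    merged["mappings"] = list(entries.values())
    return merged

default_mapping = {"name": "conda-forge", "mappings": [{"id": "dep:generic/zlib", "specs": "zlib"}]}
user_override = {"mappings": [{"id": "dep:generic/zlib", "specs": "zlib-ng"}]}
print(deep_update(default_mapping, user_override)["mappings"])
# [{'id': 'dep:generic/zlib', 'specs': 'zlib-ng'}]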

I guess that could happen and we can leave language there to support it, but I don’t want to dictate that in the PEP so it is an obligation for the central authority. I could imagine a sentence like “The mappings themselves are maintained externally by any given community, preferably with involvement by the target packaging ecosystem.”

I know first hand there’s interest in the conda community, and given the existence of mapping tools or mechanisms in several Linux distributions, I’d imagine some of them would be interested too.

1 Like

That’s an important question indeed. To expand on Jaime’s answer:

  • @msarahan was interested enough to add preliminary support in Grayskull for PEP 725 and the mappings last year already, and is one of the co-authors of this PEP
  • @mgorny is the primary maintainer of Python packages in Gentoo, and is another co-author of this PEP
  • Red Hat has been interested enough in this topic (since it solves a significant pain point for them) to sponsor the work done over the past 6 months - mostly by @jaimergp, and also review & writing by @mgorny and myself - to get this PEP to its current state as well as the needed updates to PEP 725. Which has been amazingly helpful and I’d like to thank them for that here; we wouldn’t have been able to do the ton of prototyping and testing needed to get to this point without that sponsorship, since it was quite a heavy lift.
    • I’m fairly sure that that means there will be RHEL mappings, and probably also related distributions. @tiran may be able to say more here.

There was interest in PEP 725 from quite a few other distro maintainers. We didn’t explicitly circle back with them yet about whether they’d adopt this particular design, but I was kinda expecting that. I’ll take the liberty of listing a couple here[1] so they can hopefully take a look and share to what extent it fits their needs:

I think there’s enough signal that this proposal fills a significant need. I personally expect that most distros who don’t have maintainers that actively participate in Python packaging will write mappings as well, but probably only after it starts saving them work - meaning after a lot of Python packages have the PEP 725 metadata, since as always it’s chicken-and-egg: it starts paying off only after the initial part of the adoption curve has been climbed.


  1. There may have been others that I missed, or whom I didn’t realize were distro packagers ↩︎

3 Likes

The obvious missing distro here is Debian/Ubuntu. Given that things like GitHub’s CI workers tend to use Ubuntu, this feels like a fairly key target. Have you engaged at all with the Ubuntu folks on this (or PEP 725)?

1 Like

I would rather avoid putting additional obligations on PyPA here, but I don’t see a reason not to have community-maintained mappings. I think it makes sense to let distributions have the final say on where and how their mappings are maintained, but I don’t see why distributions couldn’t delegate that to the community.

I suppose the main question would be whether a community should be able to claim a mapping if the distribution does not reply to requests for confirmation, or is even outright opposed to having a community-maintained mapping.

1 Like

I believe a smaller scope for this PEP and PEP 725 would be to focus on the packages that have Python metadata, and to figure out how to integrate with the package managers, especially working with/around sudo permissions.

A simple use case: python3-foo is a compiled package installed on your system and you are trying to work with a package bar that requires it. Currently the way to handle this is to create a venv with system site packages enabled and have python3-foo pre-installed.

What if instead we asked distros to package a wheel (or a redirect) and provide their own index, which could be merged with the PyPI one (with some higher priority)? This way we could avoid the special venv handling, but it would still require the relevant packages to be pre-installed. In a venv scenario this could either copy the files as if they were provided by a wheel or point to the read-only files.

A further integration would be for the front-end to do the distro package installations and ask for limited sudo access if necessary (there are also better ways that don’t require sudo access). This would resolve the last piece, requiring the dependencies to be pre-installed. On the distro side, they would just have to translate the package name to their equivalent, with helpers like py3dist(foo) to check for its existence.

As for the packages that do not have a PyPI counterpart or Python metadata in general, which these PEPs are focusing on, this could be addressed in a similar way by providing the bare minimum parts that the Python index, or rather the distro integration, would require to work around it. This would be much more manageable for distro maintainers and would allow them to dictate their preference, e.g. using clang instead of gcc for the default C compiler.

There would still be the issue of package splitting on the distros, but that is an issue for another day.

I can’t see why the central index of mappings couldn’t list several alternative mappings for the same ecosystem. I wonder whether it makes sense to recognize one of them as the “official” one, but ultimately it could be offered as an overridable choice client-side.

A hypothetical configuration file could be:

ecosystem_index = "./my/local/ecosystems.json"  # defaults to the central authority list
ecosystem = "debian-custom"

And ecosystems.json would define debian-custom pointing to the custom mapping.
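For illustration only, such an ecosystems.json could look something like this (the field names here are a guess at what would be needed, not necessarily the actual registry schema):

{
  "ecosystems": {
    "debian-custom": {
      "description": "Community-maintained Debian mapping with local tweaks",
      "mapping": "https://example.org/mappings/debian-custom.json"
    }
  }
}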

At the same time, the user could choose to host debian-custom online, and the central authority could choose to accept debian-custom as a valid entry in the registry of ecosystems, but I’d rather not specify that now.

Not much; I don’t know many that are actively involved in Python packaging to be honest. I did email Sandro Tosi, the Debian maintainer who most often reports issues against packages I maintain, a couple of years ago but IIRC didn’t receive an answer back then. The closest person I can think of is @barry, who was an Ubuntu dev in a previous life and really wanted to see PEP 725 finished.

If anyone knows a good person to ask, it’d be great to hear about them, or to ping them directly and ask them to please have a look.

1 Like

I’m not sure I understand this point, since a venv is Python-specific and cannot contain system packages. Do you mean some kind of distro-specific environment rather than a regular Python venv?

You say “smaller scope”, but this is at least two orders of magnitude more work for distro maintainers than maintaining a JSON file with some mappings. It’s a related but different problem; I think it’s this one: https://peps.python.org/pep-0804/#mapping-pypi-projects-to-repackaged-counterparts-in-target-ecosystems, right?

To be clear, I think it’s a good idea in principle, but since it’s so much more work I’d consider doing that in a follow-up if there is enough interest from distro maintainers.

This is definitely possible with this mappings system. The maintainers can decide to map dep:virtual/compiler/c to their clang system package; that’s totally up to them.

I mean python3 -m venv --system-site-packages

Well, yes and no. The issues there will have to be addressed in one form or another: the sudo and package installation handling in the frontend, etc. What the distro maintainers would have to implement is:

  • Index definition and the methods to translate the normalized project names given to pip to the backend
  • Relevant Python metadata files. For Python-built projects they are already there; for non-Python ones, some helper functions would have to be added to make the maintenance on that side easier

Apart from the first point, the distro maintainer work is within the scope of the usual maintainer duties.

The majority of the work is on the front-end side, e.g. (see the sketch after this list):

  • allow defining a distro index from a config file
  • provide ways to interact with the index, either via a URL or Python hooks (preferably the latter)
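Purely to illustrate the shape of that, a hypothetical frontend configuration could look like the following; none of these keys exist in pip or any other tool today:

distro_index = "https://packages.example-distro.org/simple"  # hypothetical: distro-provided index merged with PyPI
distro_index_priority = "before-pypi"                        # hypothetical: resolve against the distro index first
distro_index_hook = "example_distro:translate_project_name"  # hypothetical: Python hook instead of a plain URL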

Way previous life. But I still have contacts in the Debian/Ubuntu world so I’ll ask around.

3 Likes

Yes, my idea was a mechanism to partially overwrite/fine-tune a given resolution. What I vaguely had in mind was a hook

(depurl, central registry resolution, target system) → custom user resolution (if needed)

But I guess the details don’t matter much at this stage.
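To make that concrete anyway, such a hook could look roughly like this (names, types and the example packages are illustrative only, not an existing API):

def resolve_override(depurl: str, registry_resolution: list[str], target_system: str) -> list[str] | None:
    """Return a custom resolution for this DepURL on this system, or None to keep the registry one."""
    if target_system == "debian" and depurl == "dep:generic/libffi":
        # e.g. force the -dev package so headers are available during builds
        return ["libffi-dev"]
    return None

# Falls back to the central registry resolution when the hook returns None:
resolved = resolve_override("dep:generic/libffi", ["libffi8"], "debian") or ["libffi8"]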

At least in my experience, I would expect you to need such a hook as soon as your package has sufficiently complex dependencies and you would like a reproducible env that works “all” the time (e.g. to install the necessary system deps in cibuildwheel).

I think it would also provide a quite flexible solution to some of the issues discussed in the other PEP thread, like PEP 725: Specifying external dependencies in pyproject.toml (round 2) - #35 by ncoghlan .

2 Likes

Yes, from the feedback we are gathering I can see there’s an ask for two levels of overrides:

  • Package maintainers wishing to provide their known mappings directly without having to use the DepURL -> mappings -> package manager syntax. These overrides are specified directly in pyproject.toml with special syntax or in optional dependency tables.
  • End users or package builders, who may need to override their mappings once the wheel or sdist (respectively) has been distributed. These overrides are specified with additional files passed to their tool of choice via a custom flag.

The former case can also be solved with the latter approach if the package maintainer distributes their own mapping overrides separate from the package metadata. That would prevent the pyproject.toml-bundled mapping metadata from getting stale or outdated if something changes in the target distro(s), and could be patched without having to release new versions of the package.

3 Likes

Hi gang! Interesting idea; having “one specifier to rule them all” is definitely a lofty goal, and I applaud this kind of initiative.

Considering that PEP 725 creates a new standard, and the authors propose adding the virtual type upstream to the PURL spec, would that inclusion remove the need to maintain a separate DepURL specification?

If so, would it be an even better ecosystem-wide initiative to combine efforts into making the registry mapping part of the work for purldb.readthedocs.io/ instead of creating another bespoke registry, or asking packaging ecosystems to adopt a registry?

There are already some efforts in play, live at https://public.purldb.io/ - it doesn’t appear to be fully launched yet, but maybe there’s already a bunch of work done there on scanning and matching package names to PURLs that might accelerate adoption of PEP 725, if it satisfies the need? See also More decentralized, distributed PURL metadata collection · Issue #727 · aboutcode-org/purldb · GitHub

1 Like

We really wanted to use the original pkg: implementation, and indeed we did open a proposal for virtual pkg: URLs at Support for "virtual packages" / dependencies on interfaces? · Issue #222 · package-url/purl-spec · GitHub and Add `virtual` type by jaimergp · Pull Request #450 · package-url/purl-spec · GitHub, unfortunately with limited feedback. However, even if this went through, it wouldn’t land until the next revision of the standard (according to the milestones), and we would still be missing an ergonomic way to express version constraints (the PURL solution is to append a ?vers=... qualifier, %-encoded). Given that context, we compromised on providing an ergonomic identifier that is still compatible with PURL; you can consider DepURL syntactic sugar sprinkled on top.

From what I can see in their API, their database entries (e.g. this query for apache/arrow) are more based on concrete artifacts identified by PURLs (like those found in SBOMs), and not so much on the “abstract requirements” side of the standard. I’m not sure if the maintainers would be interested in extending the scope of that service, since it seems to be targeted at enumerating PURLs found in the wild. Additionally, our mappings do not only map DepURLs to ecosystem-specific names (not PURLs!), but also detail the package manager syntax used to generate installation commands.

I would have loved to find an existing platform that covered all of our needs, but we didn’t find any. That said, thank you for your question, because I believe that covering this question in Rejected Ideas is an excellent addition to the PEP! :folded_hands:

1 Like

Hello @tobiasdiez, we have been thinking about this ask and have come up with this strategy:

  • Mapping overrides are a UX detail that should not be standardized by the PEP (i.e., not made part of the Specification), but we would like to provide recommendations on how to solve the problem.
  • Install tools should provide an override flag that allows users to pass their custom mappings, with two modes: replace and extend.
  • In replace mode, matching DepURLs in the canonical mapping are removed and custom ones take their spot. In extend mode, the custom entries are added at the end so they act as a fallback for existing entries (see the sketch after this list).
  • We will implement this in pyproject-external as an example, with demos in external-deps-build.
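A minimal sketch of the two modes, assuming each mapping is a list of entries keyed by "id" as in the JSON below (the function name is illustrative, not the pyproject-external API):

def apply_override(canonical: list[dict], custom: list[dict], mode: str) -> list[dict]:
    custom_ids = {entry["id"] for entry in custom}
    if mode == "replace":
        # Drop canonical entries that the override redefines, then add the custom ones
        return [e for e in canonical if e["id"] not in custom_ids] + custom
    if mode == "extend":
        # Keep everything; custom entries go last so they only act as a fallback
        return canonical + custom
    raise ValueError(f"unknown mode: {mode}")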

So, for example, if we had a package with this external table:

[external]
build-requires = [
  "dep:virtual/compiler/c",
]
host-requires = [
  "dep:generic/libffi",
]

And if my target ecosystem (let’s say conda-forge) maps dep:virtual/compiler/c to gcc but I want to use clang, I could pass the following JSON:

{
  "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/external-mapping.schema.json",
  "schema_version": 1,
  "name": "conda-forge override",
  "description": "Mapping override for the conda-forge ecosystem",
  "mappings": [
    {
      "id": "dep:virtual/compiler/c",
      "description": "Clang override",
      "specs": "clang"
    }
  ]
}

to pyproject-external, like this (CLI flags not determined yet):

$ python -m pyproject_external show sdist/cryptography-46.0.2.tar.gz --output install-command --replace-mapping-entries-with=conda-forge.override.json

Let me know what you think!

2 Likes