PEP 725: Specifying external dependencies in pyproject.toml

Hi all, I’d like to share PEP 725 about specifying PyPI-external dependencies in pyproject.toml with you - co-authored with @pradyunsg.

This PEP proposes a new [external] table with fields for build and runtime dependencies on tools/packages that are external to PyPI (“system dependencies” or “native dependencies”). It does so in a way that mirrors, as much as possible, the keys and types of dependencies that can currently be specified in pyproject.toml for PyPI packages - with the addition of cross-compilation support from the start, because adding it now is much easier than retrofitting it later. Some of the key needs and ideas captured in this PEP have been discussed on this forum before.

Here are the PR to the peps repo and a rendered version.

Full PEP text:

PEP: 725
Title: Specifying external dependencies in pyproject.toml
Author: Pradyun Gedam, Ralf Gommers
Discussions-To:
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 17-Aug-2023
Post-History: 17-Aug-2023

Abstract

This PEP specifies how to write a project’s external, or non-PyPI, build and
runtime dependencies in a pyproject.toml file for packaging-related tools
to consume.

Motivation

Python packages may have dependencies on build tools, libraries, command-line
tools, or other software that is not present on PyPI. Currently there is no way
to express those dependencies in standardized metadata
[#singular-vision-native-deps]_, [#pypacking-native-deps]_. Key motivators for
this PEP are to:

  • Enable tools to automatically map external dependencies to packages in other
    packaging repositories,
  • Make it possible to include needed dependencies in error messages emitted
    by Python package installers and build frontends,
  • Provide a canonical place for package authors to record this dependency
    information.

Packaging ecosystems like Linux distros, Conda, Homebrew, Spack, and Nix need
full sets of dependencies for Python packages, and have tools like pyp2rpm_
(Fedora), Grayskull_ (Conda), and dh_python_ (Debian) which attempt to
automatically generate dependency metadata from the metadata in
upstream Python packages. External dependencies are currently handled manually,
because there is no metadata for this in pyproject.toml or any other
standard location. Enabling automation of this conversion is a key benefit of
this PEP, making packaging Python projects easier and more reliable. In addition, the
authors envision other types of tools making use of this information, e.g.,
dependency analysis tools like Repology_, Dependabot_ and libraries.io_.
Software bill of materials (SBOM) generation tools may also be able to use this
information, e.g. for flagging that external dependencies listed in
pyproject.toml but not contained in wheel metadata are likely vendored
within the wheel.

Packages with external dependencies are typically hard to build from source,
and error messages from build failures tend to be hard to decipher for end
users. Missing external dependencies on the end user’s system are the most
likely cause of build failures. If installers can show the required external
dependencies as part of their error message, this may save users a lot of time.

At the moment, information on external dependencies is only captured in
installation documentation of individual packages. It is hard to maintain for
package authors and tends to go out of date. It’s also hard for users and
distro packagers to find it. Having a canonical place to record this dependency
information will improve this situation.

This PEP is not trying to specify how the external dependencies should be used,
nor a mechanism to implement a name mapping from names of individual packages
that are canonical for Python projects published on PyPI to those of other
packaging ecosystems. Those topics should be addressed in separate PEPs.

Rationale

Types of external dependencies

Multiple types of external dependencies can be distinguished:

  • Concrete packages that can be identified by name and have a canonical
    location in another language-specific package repository. E.g., Rust
    packages on crates.io <https://crates.io/>__, R packages on
    CRAN <https://cran.r-project.org/>__, JavaScript packages on the
    npm registry <https://www.npmjs.com/>__.
  • Concrete packages that can be identified by name but do not have a clear
    canonical location. This is typically the case for libraries and tools
    written in C, C++, Fortran, CUDA and other low-level languages. E.g.,
    Boost, OpenSSL, Protobuf, Intel MKL, GCC.
  • “Virtual” packages, which are names for concepts, types of tools or
    interfaces. These typically have multiple implementations, which are
    concrete packages. E.g., a C++ compiler, BLAS, LAPACK, OpenMP, MPI.

Concrete packages are straightforward to understand, and are a concept present
in virtually every package management system. Virtual packages are a concept
also present in a number of packaging systems – but not always, and the
details of their implementation vary.

Cross compilation

Cross compilation is not yet (as of August 2023) well-supported by stdlib
modules and pyproject.toml metadata. It is however important when
translating external dependencies to those of other packaging systems (with
tools like pyp2rpm). Introducing support for cross compilation immediately
in this PEP is much easier than extending [external] in the future, hence
the authors chose to include it now.

Terminology

This PEP uses the following terminology:

  • build machine: the machine on which the package build process is being
    executed
  • host machine: the machine on which the produced artifact will be installed
    and run
  • build dependency: dependency for building the package that needs to be
    present at build time and itself was built for the build machine’s OS and
    architecture
  • host dependency: dependency for building the package that needs to be
    present at build time and itself was built for the host machine’s OS and
    architecture

Note that this terminology is not consistent across build and packaging tools,
so care must be taken when comparing build/host dependencies in
pyproject.toml to dependencies from other package managers.

Note that “target machine” or “target dependency” is not used in this PEP. That
is typically only relevant for cross-compiling compilers or other such advanced
scenarios [#gcc-cross-terminology]_, [#meson-cross]_ - this is out of scope for
this PEP.

Finally, note that while “dependency” is the term most widely used for packages
needed at build time, the existing key in pyproject.toml for PyPI
build-time dependencies is requires (under [build-system]). Hence this PEP
uses the keys build-requires and host-requires under [external] for
consistency.

Build and host dependencies

Clear separation of metadata associated with the definition of build and host
platforms, rather than assuming that build and host platform will always be
the same, is important [#pypackaging-native-cross]_.

Build dependencies are typically run during the build process - they may be
compilers, code generators, or other such tools. In case the use of a build
dependency implies a runtime dependency, that runtime dependency does not have
to be declared explicitly. For example, when compiling Fortran code with
gfortran into a Python extension module, the package likely incurs a
dependency on the libgfortran runtime library. The rationale for not
explicitly listing such runtime dependencies is two-fold: (1) it may depend on
compiler/linker flags or details of the build environment whether the
dependency is present, and (2) these runtime dependencies can be detected and
handled automatically by tools like auditwheel.

Host dependencies are typically not run during the build process, but only used
for linking against. This is not a rule though – it may be possible or
necessary to run a host dependency under an emulator, or through a custom tool
like crossenv_. When host dependencies imply a runtime dependency, that runtime
dependency also does not have to be declared, just like for build dependencies.

When host dependencies are declared and a tool is not cross-compilation aware
and has to do something with external dependencies, the tool MAY merge the
host-requires list into build-requires. This may for example happen if
an installer like pip starts reporting external dependencies as a likely
cause of a build failure when a package fails to build from an sdist.
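
To illustrate the split, here is a minimal sketch (the choice of packages is
illustrative only) for a project that compiles Fortran code and links against
OpenBLAS when cross-compiling:

.. code:: toml

[external]
build-requires = [
  # a compiler runs on the build machine during the build
  "virtual:compiler/fortran",
]
host-requires = [
  # linked against, hence must be built for the host machine
  "pkg:generic/openblas",
]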

Specifying external dependencies

Concrete package specification through PURL

The two types of concrete packages are supported by PURL_ (Package URL), which
implements a scheme for identifying packages that is meant to be portable
across packaging ecosystems. Its design is::

scheme:type/namespace/name@version?qualifiers#subpath 

The scheme component is a fixed string, pkg, and of the other
components only type and name are required. As an example, a package
URL for the requests package on PyPI would be::

pkg:pypi/requests

Adopting PURL to specify external dependencies in pyproject.toml solves a
number of problems at once - and there are already implementations of the
specification in Python and multiple languages. PURL is also already supported
by dependency-related tooling like SPDX (see
External Repository Identifiers in the SPDX 2.3 spec <https://spdx.github.io/spdx-spec/v2.3/external-repository-identifiers/#f35-purl>__),
the Open Source Vulnerability format <https://ossf.github.io/osv-schema/#affectedpackage-field>__,
and the Sonatype OSS Index <https://ossindex.sonatype.org/doc/coordinates>__;
not having to wait years before support in such tooling arrives is valuable.

For concrete packages without a canonical package manager to refer to, either
pkg:generic/pkg-name can be used, or a direct reference to the VCS system
that the package is maintained in (e.g.,
pkg:github/user-or-org-name/pkg-name). Which of these is more appropriate
is situation-dependent. This PEP recommends using pkg:generic when the
package name is unambiguous and well-known (e.g., pkg:generic/git or
pkg:generic/openblas), and using the VCS as the PURL type otherwise.
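
As an illustrative sketch of both options:

.. code:: toml

[external]
dependencies = [
  "pkg:generic/git",             # unambiguous, well-known name
  "pkg:github/AbiWord/enchant",  # no canonical package manager; use its VCS home
]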

Virtual package specification

There is no ready-made support for virtual packages in PURL or another
standard. There are a relatively limited number of such dependencies though,
and adopting a scheme similar to PURL but with the virtual: rather than
pkg: scheme seems like it will be understandable and map well to Linux
distros with virtual packages and the likes of Conda and Spack.

The two known virtual package types are compiler and interface.
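
As a sketch, virtual dependencies of both types look as follows (the entries
are taken from the examples later in this PEP):

.. code:: toml

[external]
build-requires = [
  "virtual:compiler/c",      # any C compiler
]
host-requires = [
  "virtual:interface/blas",  # any BLAS implementation
]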

Versioning

Support in PURL for version expressions and ranges beyond a fixed version is
still pending, see the Open Issues section.

Dependency specifiers

Regular Python dependency specifiers (as originally defined in PEP 508) may
be used behind PURLs. PURL qualifiers, which use ? followed by package
type-specific key=value pairs, must not be used. The reason for this is
pragmatic: dependency specifiers are already used for other metadata in
pyproject.toml, and any tooling that is used with pyproject.toml is likely
to already have a robust implementation to parse them. And we do not expect to
need the extra possibilities that PURL qualifiers provide (e.g. to specify a
Conan or Conda channel, or a RubyGems platform).
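
As a sketch, a PEP 508 environment marker behind a PURL looks as follows
(mirroring the NAVis example below; the entry itself is illustrative):

.. code:: toml

[external]
host-requires = [
  # only an external dependency on Linux
  "pkg:generic/openssl; platform_system == 'Linux'",
]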

Usage of core metadata fields

The core metadata_ specification contains one relevant field, namely
Requires-External. This has no well-defined semantics in core metadata 2.1;
this PEP chooses to reuse the field for external runtime dependencies. The core
metadata specification does not contain fields for any metadata in
pyproject.toml’s [build-system] table. Therefore the build-requires
and host-requires content also does not need to be reflected in core
metadata fields. The optional-dependencies content from [external]
would need to either reuse Provides-Extra or require a new
Provides-External-Extra field. Neither seems desirable.

Differences between sdist and wheel metadata

A wheel may vendor its external dependencies. This happens in particular when
distributing wheels on PyPI or other Python package indexes - and tools like
auditwheel_, delvewheel_ and delocate_ automate this process. As a result, a
Requires-External entry in an sdist may disappear from a wheel built from
that sdist. It is also possible that a Requires-External entry remains in a
wheel, either unchanged or with narrower constraints. auditwheel does not
vendor certain allow-listed dependencies, such as OpenGL, by default. In
addition, auditwheel and delvewheel allow a user to manually exclude
dependencies via a --exclude or --no-dll command-line flag. This is
used to avoid vendoring large shared libraries, for example those from CUDA.

The Requires-External entries in a wheel, generated from the external
dependencies in pyproject.toml, are therefore allowed to be narrower than
those for the corresponding sdist. They must not be wider, i.e. constraints must not
allow a version of a dependency for a wheel that isn’t allowed for an sdist,
nor contain new dependencies that are not listed in the sdist’s metadata at
all.
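
As an illustrative sketch, assuming a tool like auditwheel vendors libssl into
a Linux wheel:

.. code:: toml

# sdist metadata, from pyproject.toml:
[external]
host-requires = [
  "pkg:generic/openssl",
]
# A wheel built from this sdist may drop the corresponding
# Requires-External: pkg:generic/openssl entry (narrowing), but may not
# loosen constraints or add new entries (widening).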

Specification

If metadata is improperly specified then tools MUST raise an error to notify
the user about their mistake.

Details

Note that pyproject.toml content is in the same format as in PEP 621.

Table name

Tools MUST specify fields defined by this PEP in a table named [external].
No tools may add fields to this table which are not defined by this PEP or
subsequent PEPs. The lack of an [external] table means the package either
does not have any external dependencies, or the ones it does have are assumed
to be present on the system already.

build-requires/optional-build-requires

  • Format: Array of PURL_ strings (build-requires) and a table
    with values of arrays of PURL_ strings (optional-build-requires)
  • Core metadata_: N/A

The (optional) external build requirements needed to build the project.

For build-requires, it is a key whose value is an array of strings. Each
string represents a build requirement of the project and MUST be formatted as
either a valid PURL_ string or a virtual: string.

For optional-build-requires, it is a table where each key specifies an
extra set of build requirements and whose value is an array of strings. The
strings of the arrays MUST be valid PURL_ strings.
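
As a format sketch (the optional-build-requires entry is borrowed from the
jupyterlab-git example below):

.. code:: toml

[external]
build-requires = [
  "virtual:compiler/c",
]

[external.optional-build-requires]
dev = [
  "pkg:generic/nodejs",
]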

host-requires/optional-host-requires

  • Format: Array of PURL_ strings (host-requires) and a table
    with values of arrays of PURL_ strings (optional-host-requires)
  • Core metadata_: N/A

The (optional) external host requirements needed to build the project.

For host-requires, it is a key whose value is an array of strings. Each
string represents a host requirement of the project and MUST be formatted as
either a valid PURL_ string or a virtual: string.

For optional-host-requires, it is a table where each key specifies an
extra set of host requirements and whose value is an array of strings. The
strings of the arrays MUST be valid PURL_ strings.

dependencies/optional-dependencies

  • Format: Array of PURL_ strings (dependencies) and a table
    with values of arrays of PURL_ strings (optional-dependencies)
  • Core metadata_: Requires-External (dependencies); N/A (optional-dependencies)

The (optional) dependencies of the project.

For dependencies, it is a key whose value is an array of strings. Each
string represents a dependency of the project and MUST be formatted as either a
valid PURL_ string or a virtual: string. Each string maps directly to a
Requires-External entry in the core metadata_.

For optional-dependencies, it is a table where each key specifies an extra
and whose value is an array of strings. The strings of the arrays MUST be valid
PURL_ strings. Optional dependencies do not map to a core metadata field.
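
As a sketch of the mapping to core metadata (the extra name is hypothetical):

.. code:: toml

[external]
dependencies = [
  "pkg:generic/git",  # maps to: Requires-External: pkg:generic/git
]

[external.optional-dependencies]
server = [            # hypothetical extra; no core metadata equivalent
  "pkg:generic/nodejs",
]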

Examples

These examples show what the [external] content for a number of packages is
expected to be.

cryptography 39.0:

.. code:: toml

[external]
build-requires = [
  "virtual:compiler/rust",
]
host-requires = [
  "pkg:generic/openssl",
]

SciPy 1.10:

.. code:: toml

[external]
build-requires = [
  "virtual:compiler/c",
  "virtual:compiler/cpp",
  "virtual:compiler/fortran",
  "pkg:generic/ninja",
]
host-requires = [
  "virtual:interface/blas",
  "virtual:interface/lapack",  # >=3.7.1 (can't express version ranges with PURL yet)
]

[external.optional-host-requires]
dependency_detection = [
  "pkg:generic/pkg-config",
  "pkg:generic/cmake",
]

pygraphviz 1.10:

.. code:: toml

[external]
build-requires = [
  "virtual:compiler/c",
]
host-requires = [
  "pkg:generic/graphviz",
]

NAVis 1.4.0:

.. code:: toml

[project]
optional-dependencies = { r = ["rpy2"] }

[external]
build-requires = [
  "pkg:generic/XCB; platform_system=='Linux'",
]

[external.optional-dependencies]
nat = [
  "pkg:cran/nat",
  "pkg:cran/nat.nblast",
]

Spyder 6.0:

.. code:: toml

[external]
dependencies = [
  "pkg:cargo/ripgrep",
  "pkg:cargo/tree-sitter-cli",
  "pkg:golang/github.com/junegunn/fzf",
]

jupyterlab-git 0.41.0:

.. code:: toml

[external]
dependencies = [
  "pkg:generic/git",
]

[external.optional-build-requires]
dev = [
  "pkg:generic/nodejs",
]

PyEnchant 3.2.2:

.. code:: toml

[external]
dependencies = [
  # libenchant is needed on all platforms but only vendored into wheels on
  # Windows, so on Windows the build backend should remove this external
  # dependency from wheel metadata.
  "pkg:github/AbiWord/enchant",
]

Backwards Compatibility

There is no impact on backwards compatibility, as this PEP only adds new,
optional metadata. In the absence of such metadata, nothing changes for package
authors or packaging tooling.

Security Implications

There are no direct security concerns as this PEP covers how to statically
define metadata for external dependencies. Any security issues would stem from
how tools consume the metadata and choose to act upon it.

How to Teach This

External dependencies, and whether and how those external dependencies are
vendored, are topics that are typically not understood in detail by Python
package authors. We intend to start from how an external dependency is defined
and the different ways it can be depended on - from runtime-only use with
ctypes or a subprocess call, to a build dependency that is linked against -
before going into how to declare external dependencies in metadata. The
documentation should make explicit what is relevant for package authors, and
what for distro packagers.

Material on this topic will be added to the most relevant packaging tutorials,
primarily the Python Packaging User Guide_. In addition, we expect that any
build backend that adds support for external dependencies metadata will include
information about that in its documentation, as will tools like auditwheel.

Reference Implementation

There is no reference implementation at this time.

Rejected Ideas

Specific syntax for external dependencies which are also packaged on PyPI

There are non-Python packages which are packaged on PyPI, such as Ninja,
patchelf and CMake. What is typically desired is to use the system version of
those, and if it’s not present on the system then install the PyPI package for
it. The authors believe that specific support for this scenario is not
necessary (or too complex to justify such support); a dependency provider for
external dependencies can treat PyPI as one possible source for obtaining the
package.

Using library and header names as external dependencies

A previous draft PEP ("External dependencies" (2015) <https://github.com/pypa/interoperability-peps/pull/30>__)
proposed using specific library and header names as external dependencies. This
is too granular; using package names is a well-established pattern across
packaging ecosystems and should be preferred.

Open Issues

Version specifiers for PURLs

Support in PURL for version expressions and ranges is still pending. The pull
request at vers implementation for PURL_ seems close to being merged, at
which point this PEP could adopt it.

Syntax for virtual dependencies

The current syntax this PEP uses for virtual dependencies is
virtual:type/name, which is analogous to but not part of the PURL spec.
This open issue discusses supporting virtual dependencies within PURL:
purl-spec#222 <https://github.com/package-url/purl-spec/issues/222>__.

Should a host-requires key be added under [build-system]?

It may make sense to add host-requires for host dependencies that are on
PyPI, in order to better support name mapping to other packaging systems
with support for cross-compiling.
This issue <https://github.com/rgommers/peps/issues/6>__ tracks this topic
and has arguments in favor and against adding host-requires under
[build-system] as part of this PEP.

References

.. [#singular-vision-native-deps] The “define native requirements metadata”
   part of the “Wanting a singular packaging vision” thread (2022, Discourse):
   Wanting a singular packaging tool/vision - #92 by steve.dower

.. [#pypacking-native-deps] pypackaging-native: “Native dependencies”
   Native dependencies - pypackaging-native

.. [#gcc-cross-terminology] GCC documentation - Configure Terms and History,
   Configure Terms (GNU Compiler Collection (GCC) Internals)

.. [#meson-cross] Meson documentation - Cross compilation
   Cross compilation

.. [#pypackaging-native-cross] pypackaging-native: “Cross compilation”
   Cross compilation - pypackaging-native

.. [#pkgconfig-and-ctypes-findlibrary] The “pkgconfig specification as an
   alternative to ctypes.util.find_library” thread (2023, Discourse):
   `pkgconfig` specification as an alternative to `ctypes.util.find_library`

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

.. _PyPI: https://pypi.org
.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _setuptools: https://setuptools.readthedocs.io/
.. _setuptools metadata: Building and Distributing Packages with Setuptools (setuptools documentation)
.. _SPDX: https://spdx.dev/
.. _PURL: https://github.com/package-url/purl-spec
.. _vers: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst
.. _vers implementation for PURL: https://github.com/package-url/purl-spec/pull/139
.. _pyp2rpm: https://github.com/fedora-python/pyp2rpm
.. _Grayskull: https://github.com/conda/grayskull
.. _dh_python: Debian Python Policy documentation
.. _Repology: https://repology.org/
.. _Dependabot: https://github.com/dependabot
.. _libraries.io: https://libraries.io/
.. _crossenv: https://github.com/benfogle/crossenv
.. _Python Packaging User Guide: https://packaging.python.org
.. _auditwheel: https://github.com/pypa/auditwheel
.. _delocate: https://github.com/matthew-brett/delocate
.. _delvewheel: https://github.com/adang1345/delvewheel

25 Likes

Some prior art, though focused primarily on system package managers
in GNU/Linux distributions, is bindep (on PyPI), which
parses a DSL for listing non-Python-package dependencies in
projects, differentiating them across multiple distributions, and
annotating them by use case (think PEP 508 “extras”). Another is the
pkg-map element of diskimage-builder, which serves similar purposes
(see the pkg-map section of the diskimage-builder documentation).

Neither currently looks for their data in pyproject.toml files, but
theoretically could if the specification is flexible enough to cover
their use cases.

1 Like

Great to see this PEP, thanks!

Looking at the examples, it may be worth explaining again for each of them why each dependency is where it is.

I am a bit surprised to see cmake and pkg-config as an optional host dependency for scipy. If I am correct, they’re added to the list because meson can use these tools for dependency detection, right? Then I would expect them to be native to the build machine.

2 Likes

How are virtual deps resolved?
Does each platform provide a plugin to do the mapping?

Virtual C will work for a lot of projects, but some must have gcc or llvm.
H is that expressed?

How are minimum versions managed for tools like C compilers and packages?

2 Likes

Sounds cool. Does this support vcpkg? On Windows that seems to be the most convenient way to install external dependencies for writing Python extensions.

1 Like

Thank you @fungi, that is very helpful. I wasn’t aware of these tools yet. The use of name mapping between package managers is a key use case of this new metadata, and we did start on a second PEP and aim to prototype a generic mechanism like that. The prior art section has several other examples of name mapping tools, for R, Fedora and conda-forge: https://github.com/rgommers/peps/blob/pep-name-mapping/pep-9999.rst. This draft is nowhere near ready for submission, but there is a detailed worked example that will hopefully give insight into how tools can use the [external] metadata.

You are completely right, that example has to be fixed. And yes, it may be quite useful to annotate each example with more comments as to what the dependencies are and why they are where they are. I’ll aim to do that in a follow-up PR.

I hope the link to the “name mapping” PEP draft above helps answer this. I’ll note that I wouldn’t expect any resolving to happen for a build command like pip install some-pkg (at least in the near future); only for things like recipe generators or error messages.

If you must have GCC or Clang specifically, then that’s a regular PURL instead of a virtual package. So pkg:generic/gcc.

For minimum versions, see the open issue on PURL version specifiers. And in addition I’d also expect that one would check that in their build config files for a compiler, and error out for too-old versions.

PURL by design can be extended to any package manager, and I see that a vcpkg contributor opened a PR a few weeks ago to do exactly that: purl-spec PR #245, “A purl spec for the C/C++ package manager vcpkg”.

That said, vcpkg is almost always not the canonical location for a package, so I’d expect it to be quite rare for a Python package to contain pkg:vcpkg/ (just like it should be quite rare to see pkg:conda/, pkg:brew/ or pkg:nix/). You’re supposed to refer to the primary upstream package location, so another language-specific repository like pkg:cargo/ or otherwise a pkg:generic/ or the VCS repo like pkg:github/.

2 Likes

The writing is quite dense - including in the Motivation section - so I’m having a hard time getting my head around what exactly this is for. Could I possibly see a simple user story for where this is useful? What “runtime dependencies… external to PyPI” are commonly used? And what do you mean about “cross-compilation support” - how can the project depend on code that is compiled for a different platform than the one where the project is running?

Build backends and installers that understand how to work with dependencies outside the Python ecosystem can use this metadata to know what those dependencies are. It’s up to the tools themselves to do something such as install the external dependencies, link to them, etc. This PEP adds a standard way for any project to specify those external dependencies to any tool that might need to know them.

2 Likes

For both of these, see PEP 725 – Specifying external dependencies in pyproject.toml - that shows projects that this would be useful for as well as what is common.

If you could point to specifics, that would be appreciated! :slight_smile:

If a package is available under one name/PURL in Fedora, another in Conda and yet another in Chocolatey, what should I do?
I guess what I’m missing is some kind of “OR”.


What exactly does virtual:compiler/c mean? Something like “a C compiler can be invoked as cc and can find+compile <Python.h>”?
Just ensuring that a C compiler exists somewhere on the machine doesn’t sound very useful.


tools like pyp2rpm (Fedora)

Note that pyp2rpm does not support pyproject.toml at all, and that is not likely to change soon. You might want to avoid it as an example.

@ksurma is working on a replacement which

  • delegates everything it can to pyproject RPM macros, which AFAIK aren’t used in all the distros pyp2rpm covers :(
  • is still experimental (but should work!)
  • would probably want to be an early adopter of this standard

I think you are interested in the name mapping, which we intend to submit as a separate PEP from this one about the metadata itself, as described in my comment higher up.

To comment further on that: I think that Python package authors very likely do not want to know about these name mappings for all possible packaging ecosystems (there may be dozens). They should only use the canonical name. The name mapping files for a given other distro are best maintained by that distro itself.

All virtual:compiler/c expresses is “this package needs a C compiler” - and that’s certainly useful info by itself. I would not want to get into what executable name or environment variable must be defined, that’s again distro (and/or build system) specific. E.g., for MSVC it’s cl rather than cc, and the compiler must be activated with something like vcvarsall.bat rather than only be on the PATH. Those kinds of details should be well out of scope here.

Python.h is a separate question too - maybe the package only builds a pure C shared library and loads it with ctypes. If I were to maintain Python recipes for a distro, I’d probably add a rule like “if a package uses either a C or C++ compiler, then add the Python development headers as a dependency automatically”. But it’s not something that should be prescribed as the semantics of this metadata I believe.

Thanks. It looks like we can replace the example with pyp2spec; happy to do that. Its PypiPackage.python_name still encodes the same name mapping rule for Fedora.

OK, I’ll save my comments on that for when the PEP is up.

Hm. What does the PEP want to accomplish? Make it easy for humans to determine what the requirements are, or allow tools to install the necessary dependencies?

If the latter, how should tools interpret virtual:compiler/c? It sounds like most would go with something like “setuptools.Extension should work”, and I don’t think that would be a good outcome.

That’s just a reasonable default, not by any means a hard rule.
For package foo on PyPI, the machine-readable name on Fedora is python3dist(foo). (It’s a virtual provide rather than a name, but you can use it as a name in most contexts, like installation.)

3 Likes

That’s just a reasonable default, not by any means a hard rule.
For package foo on PyPI, the machine-readable name on Fedora is
python3dist(foo). (It’s a virtual provide rather than a name,
but you can use it as a name in most contexts, like installation.)

Let’s say a tool following the proposed spec wants to determine
whether all the listed system requirements are installed, and
possibly inform the user which ones are missing (full disclosure, I
currently help maintain a tool which attempts to cover this use
case). Does Fedora expose an API of some kind to discover if a
package is installed which satisfies python3dist(foo) short of
actually asking the package manager to install it? Naively
requesting the list of installed packages and looking for a match
doesn’t work for those, obviously.

I do agree that virtual packages (Fedora is not the only distro with
such a concept) are likely good to support in such tools, as long as
the corresponding distributions have a way to discover which virtual
package names are supplied by the currently installed set of real
packages.

Of course.

$ rpm -q --whatprovides 'python3dist(requests)'
python3-requests-2.28.2-2.fc38.noarch
[exit code 0]

$ rpm -q --whatprovides 'python3dist(nope-not-here)'
no package provides python3dist(nope-not-here)
[exit code 1]

Note that the name must be normalized, as rpm doesn’t do Python-style normalization.

2 Likes

It’s primarily tools-oriented. PURLs are okay-ish for readability by humans, but not optimal. However they are precise, which is good for tools. And I think that’s why they’re also used by SBOM-related tooling.

Virtual dependencies by definition express a more abstract requirement than “package X from package repository Y”, so I don’t think there is any optimization for humans at the cost of usability for tools. Also, keep in mind that there are a pretty limited number of virtual dependencies that will actually be relevant. I’d be curious to see if there’s many more than given as examples in the PEP (compilers, BLAS, LAPACK, OpenMP, MPI). Maybe folks have a need for something like “a SQL engine” or “an SSL implementation”, or other such APIs that have a number of independent implementations? Not sure of that though - I suspect the vast majority of usage will be compilers.

It depends on the type of tool:

  • A recipe generation tool will translate it to the default for that distro. E.g., a Linux distro that builds everything with GCC may translate it to gcc or gcc-12. Conda-forge will translate it to {{ compiler('c') }}. Spack always has a C compiler available and hence just drops it completely (it only expresses known incompatible compiler versions, see this example).
  • If pip wants to show an error message for external dependencies when a build fails, it may show it unmodified or map it to a more human-friendly “a C compiler”.
  • Other types of tools could do yet other things. E.g., SBOM generation tools may have a rule that derives from this that a runtime library may be pulled in: for C there won’t be one, but for virtual:compiler/fortran it can deduce that there’s a runtime library (e.g., libgfortran.so) pulled into the final build artifact.

For none of these tools setuptools.Extension is of relevance.

To give you an example that’s hopefully close to home (and I hope I’m getting this right, because I’m not very familiar with Fedora): Koschei - scipy shows recent package builds and dependencies for scipy, and it says that gcc-c++ and gcc-gfortran are direct dependencies. Where did this info come from? Probably put there by hand by a packager, because it is not info that’s present in any metadata files. The purpose of virtual dependencies is to let the scipy maintainers say “we need C, C++ and Fortran compilers” and the Fedora tooling to translate that to gcc-c++ and gcc-gfortran.

3 Likes

Is there a list of valid (virtual) dependencies, or can anyone make up / invent / define new ones?
Can build tools somehow validate that they are spelled correctly?
Can one require PostgreSQL or SQL Server (but not MySQL)? (OpenOffice or LibreOffice but not Office 365, …)
Where can users look up how to require e.g. a .NET runtime (Mono or .NET Framework or .NET Core)?

1 Like

Question regarding reusing the Requires-External metadata field.

PEP 621 – Storing project metadata in pyproject.toml doesn’t currently have anything to say about supplying values for this field, but there are users: setuptools-ext is a build backend which exists mainly to specify this field in pyproject.toml using a tool subsection, e.g.:

[build-system]
requires = ["setuptools-ext"]
build-backend = "setuptools_ext"

...

[tool.setuptools-ext]
supported-platform = [
    "RedHat 8.6",
]
requires-external = [
    "systemdep1",
    "systemdep2",
]

The backend is otherwise a wrapper around setuptools. There’s additional tooling and automation which consumes the Requires-External value(s).

Would the acceptance of this PEP have any consequences for existing users of that field?

As a packager I will be very interested in the mappings. How easily can I know that it’s possible to have a successful build?

For Fedora my pysvn deps are:
Requires
apr-devel
gcc-c++
glibc-langpack-en
krb5-devel
make
neon-devel
openssl-devel
python3-devel
python3-pycxx-devel >= 7.1.8
subversion
subversion-devel

On Windows I have to compile the following from source to allow pysvn to be built.
build zlib
build openssl
build apr
build serf
build svn

The scripts that build all this have specific build configuration options for all of those packages.

If you are interested in all my packaging scripts, they are in pysvn / Code / [r2126] /trunk/pysvn/ReleaseEngineering,
with windows and macOS folders. The Fedora script is defined inside the RPM spec file.

(Koschei shows resolved requirements, but the input also has gcc-gfortran, swig, gcc-c++.)

Yup, this is put in place by the packager. But the packager might want to do that even if this PEP is implemented, to make sure tools that just want to satisfy a virtual:compiler/c++ dependency don’t suddenly swap in a different compiler.

I guess the more important requirement here is python3-devel, which (among other things) makes sure <Python.h> is available and all the pieces for building a C/C++ extension (like pkg-config data) are in place. I don’t see a way to map that to requirements in this PEP. So it seems to me that the PEP will encourage “recipe generators” to translate virtual:compiler/c as “compiling C extensions should work” – that is, whatever support is necessary for setuptools.Extension and similar.

Perhaps that’s fine. It’s pretty much what we do in Fedora with python3-devel: there’s not much incentive for reducing the size of the build/debug environments so we don’t worry too much about fine-grained build dependencies. But is it right for all use cases?

In other words,

I’m looking for something that’ll make pip tell me to install Python header files. As I read the current PEP, that something could be virtual:compiler/c, but if so, it should be explicit.

2 Likes

Reading through the specification, I’m still a little unclear on how
dependencies for specific tasks should be differentiated. For
example, in the projects I help maintain we may have dependency
profiles like:

  • running integration tests
  • building the documentation
  • generating PDF copies of documentation
  • bundling into a container image

…and so on. Is this the sort of thing optional-dependencies is
intended to be used for? Or are some of those more appropriate for
optional-build-requires instead?

Also is there a way to indicate a dependency is used in multiple
tasks without listing that same dependency multiple times in the
file? The proposed format doesn’t look like it lends itself to that,
so just making sure. (The format used by bindep and our other
tooling associates tasks with dependencies, rather than the other
way around, so would presumably require dereferencing to convert.)

1 Like