PEP 725: Specifying external dependencies in pyproject.toml (round 2)

No, we don’t. We have some “best effort” tooling, which is far from optimal.

If a dependency is unsatisfied in the build environment, we can generally determine that because the build fails. Or rather, if the build fails. If the missing dependency causes some feature to be disabled or some Python extension to be skipped, it’s quite easy to miss that.

If a dependency is present in the build environment but missing (or extraneous) in the distro package lists, the tooling can generally detect that as well. But again, that’s a big if.

Really, at this point the only real solution is to scour the build system (and given PEP 517, this means a bunch of different build systems, many of them using diverse ways of specifying dependencies, often hidden in the middle of lots of building code, split into dozens of files, using a lot of different keywords…) and/or documentation. Having even a “best effort” list in pyproject.toml would be a huge improvement.

Of course, you could argue that this problem is not specific to Python, because it is generic to all software. Still, even if only Python packages, and only some of them (presuming not all will adopt this PEP), use this, it is a huge improvement.

@mgorny: The tooling I was referring to was determining runtime dependencies for a binary, from a system that has the development tools for it available (i.e. mapping linker NEEDED entries to packages). Practically, this allows you to only specify Build-Depends, and [Install-]Depends are generated at build time.
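
As a rough illustration of that kind of tooling, here is a minimal sketch that pulls the NEEDED entries out of the text printed by `readelf -d` and looks them up in a soname-to-package table. The function names and the lookup table are hypothetical; real distro tooling derives the mapping from its own package database rather than from a hard-coded dict.

```python
import re

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libz.so.1]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the sonames listed as NEEDED in `readelf -d` text output."""
    return _NEEDED.findall(readelf_output)


def runtime_packages(sonames, soname_to_package):
    """Map sonames to package names; unknown sonames are reported separately."""
    found = sorted({soname_to_package[s] for s in sonames if s in soname_to_package})
    missing = [s for s in sonames if s not in soname_to_package]
    return found, missing


sample = """\
Dynamic section at offset 0x2d90 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libexample.so.1]
"""

# Illustrative table only (assumption): a distro would generate this.
table = {"libz.so.1": "zlib1g", "libc.so.6": "libc6"}

sonames = needed_libraries(sample)
packages, unknown = runtime_packages(sonames, table)
print(sonames, packages, unknown)
```

The SONAME line is deliberately ignored: it names the library itself, not something it depends on.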

But I see that the intention is for this mapping to live in the PEP-804 metadata for a generic library (in the run key).

In this sentence:

  • host-requires, build dependencies needed for the host machine but also needed at build time.

“also needed at build time” reads weird. Did you mean “also needed at runtime”? (but I hope that’s not the case: some dependencies might be needed only for building, for example a code generator of some kind).

A code generator would be a build dependency, since it’s something that runs on the build machine. For example, Cython is a pure-Python build dependency, and Bison is an “external” one.

The other definition later in the PEP is clearer: a host dependency “must be available during the build and is built for the host machine’s OS and architecture. These are usually libraries the project links against.”

This area is obviously confusing a lot of people, so maybe the PEPs could give some more examples. In particular, the Examples section of PEP 804 covers pip (which installs packages) and Grayskull (which converts packages to a different ecosystem). But it doesn’t discuss building packages, which is the third major use case of the PEP, and the one where the host vs. build platform distinction matters most.

Here’s a sketch of how I imagine builds working; feel free to use it in the PEP or point out where I’ve got anything wrong:

When not cross-compiling, the host and build platforms are the same, so the build and host dependencies are handled in the same way. The build frontend (e.g. pypa/build) would use pyproject-external or similar to determine what these dependencies are, and do one of the following:

  • Install them locally in the isolated build environment, with a tool like conda.
  • Install them with a system package manager like apt.
  • Show the user a message explaining how to install them manually.

When cross-compiling, the host and build platforms are different, such as building for Android or Pyodide on Linux, or for iOS on macOS.

  • Build dependencies are handled in the same way as the non-cross-compiling case.
  • Host dependencies cannot be installed with the system package manager, because they’re not for the build machine’s platform. So they’ll need to be installed with something like conda. Imagine that conda and conda-forge add support for Android, then the build frontend could do something like this:
    • Create a conda environment for Android (using conda create --subdir).
    • Use pyproject-external to determine the conda install commands for the host dependencies.
    • Run those commands within the Android environment.
    • Arrange for the environment to be used by the build by setting CFLAGS, LDFLAGS, PKG_CONFIG_LIBDIR, etc.
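
The last step of that list could look something like the sketch below. It assumes a Unix-like build machine and a host environment laid out with `include/`, `lib/` and `pkgconfig` directories under a single prefix; the function name and the example prefix are hypothetical, and a real frontend would likely need more variables than these three.

```python
import os
from pathlib import PurePosixPath


def cross_build_env(host_prefix, base_env=None):
    """Return a copy of the environment pointing the compiler, linker and
    pkg-config at the host environment under host_prefix."""
    prefix = PurePosixPath(host_prefix)
    env = dict(os.environ if base_env is None else base_env)

    def append_flag(name, flag):
        # Keep any flags the user already set and add ours at the end.
        existing = env.get(name, "").strip()
        env[name] = f"{existing} {flag}" if existing else flag

    append_flag("CFLAGS", f"-I{prefix / 'include'}")
    append_flag("LDFLAGS", f"-L{prefix / 'lib'}")

    # PKG_CONFIG_LIBDIR replaces pkg-config's default search path, so .pc
    # files belonging to the build machine are not picked up by accident.
    env["PKG_CONFIG_LIBDIR"] = ":".join(
        [str(prefix / "lib" / "pkgconfig"), str(prefix / "share" / "pkgconfig")]
    )
    return env


# Hypothetical prefix for an Android host environment.
env = cross_build_env("/opt/android-env", {"CFLAGS": "-O2"})
for name in ("CFLAGS", "LDFLAGS", "PKG_CONFIG_LIBDIR"):
    print(name, "=", env[name])
```

The returned mapping can then be passed as the `env` argument when the frontend launches the build backend in a subprocess.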

This is effectively what we’re already doing to support Android in cibuildwheel, except the Android environment isn’t created with conda, but by unpacking an Android release from python.org, which only contains Python, OpenSSL and SQLite. We’re still using pypa/build under the hood, but because it doesn’t know about cross-compilation, we’re breaking the build up into smaller steps and using pypa/build’s API to do each of them for the build or host platform as appropriate.

This approach has been good enough to enable several packages to release Android and iOS wheels on PyPI. But I expect that as we get into more complex packages, many of them will need their host dependencies to be pre-built. So this PEP has come at just the right time for those of us who are working on Python cross-compilation, and I’m happy with its overall design.

Thanks @mhsmith, you got that pretty much 100% right.

I’ll put up a PR on the PEPs repo soon with an attempt at improving this language and adding another example along the lines you suggested. And I’ll then post the link here so everyone who had suggestions on this can weigh in.

Thanks for sharing your concrete cross compilation use case. I’ll add that to the Motivation section as well. The design will be useful for any kind of cross compilation need, but it does feel worth it to mention Android/iOS given that cross compiling is the only option there.

Not always impossible. Debian multiarch supports installing packages for a foreign architecture.

I believe some other Linux distributions have multilib support that can install 32-bit libraries on 64-bit platforms (which may not be enough to be useful here).

And macOS supports fat binaries that contain code for multiple architectures (also not sure if that’s actually useful at this level).

The PEP explains its choice of names by saying “the existing key in pyproject.toml for PyPI build-time dependencies is build-requires”. But that’s not quite right: the key is requires, in the [build-system] table.

And that gave me an idea: what if instead of an [external] table, each new item was placed in the same position in pyproject.toml as its pure-Python equivalent, with an external prefix? In other words:

[project]
dependencies
optional-dependencies
external-dependencies
optional-external-dependencies

[build-system]
# For "optional" keys, see note below.
requires
external-requires
external-host-requires

[external-dependency-groups]
...

This has the advantage that dependencies which are used at the same time are listed in the same table – in particular, the presence of external-host-requires in the [build-system] table makes it clear that it’s a build-time dependency. We can’t avoid the unfortunate inconsistency between “requires” and “dependencies” that already exists, but at least we’re consistent within each table. And we retain backward compatibility by not altering the meaning of any existing keys.

I didn’t include any “optional” keys in the [build-system] table, because I’m not clear how these would be used. As it stands, extras are an install-time concept, not a build-time concept, which is why [build-system] has no optional-requires key. It’s possible that build-time extras might be added in the future, but like the “OR operator” discussed above, it seems like that’s outside the scope of this PEP.

Also on the subject of extras: do we really need a Provides-External-Extra field? The Provides-Extra specification says “It is legal to specify Provides-Extra: without referencing it in any Requires-Dist:”, so there shouldn’t be any compatibility problems with using Provides-Extra for both pure-Python and external extras.

Thanks for the suggestion. I don’t know where that “it is legal without referencing” came from, but my guess is it was from platform-specific extras in a time before build backends wrote the same METADATA into all wheels. So a Requires-Dist: may be present on one platform but missing on another.

Reusing Provides-Extra would probably dilute the semantic meaning of that field. It currently means “has an extra with Python packages” and it would then change to “has an extra with either Python or external dependencies”. That in turn determines what you can do with it, like use pip install .[extra-name]. Currently pip and uv don’t fail on missing names, they only produce a warning (again not sure why), but still this doesn’t seem ideal to me.

Provides-Extra also shows up in the sidebar with metadata info about a package version on PyPI. I’m not sure if it’s better or worse to have Python and external extras mixed together there or separate.

I can’t think of a concrete reason that we really shouldn’t go that way though, and it is one less Core Metadata field if we can reuse Provides-Extra which would be nice.

I’m not sure that that would be a win (I personally don’t think so), but either way: that doesn’t seem possible given backwards compatibility / rollout constraints. E.g. PEP 621 (Storing project metadata in pyproject.toml) says “When specifying project metadata, tools MUST adhere and honour the metadata as specified in this PEP. If metadata is improperly specified then tools MUST raise an error to notify the user about their mistake.” The same goes for PEP 518 (Specifying Minimum Build System Requirements for Python Projects).

Quick test:

$ python -m build -wnx
ERROR Failed to validate `build-system` in pyproject.toml: Unknown properties: external-requires

A new table can safely be added in pyproject.toml, new keys cannot.
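
To make the difference concrete, here is a simplified sketch of the kind of strict check a frontend applies to `[build-system]`. It is not the actual validation code of pypa/build; the function name is hypothetical, the parsed pyproject.toml is represented as a plain dict, and the dependency strings are placeholders. The allowed keys are the ones defined by PEP 517 and PEP 518.

```python
# Keys that PEP 517/518 define for the [build-system] table.
ALLOWED_BUILD_SYSTEM_KEYS = {"requires", "build-backend", "backend-path"}


def validate_build_system(pyproject):
    """Raise ValueError if [build-system] contains a key the frontend does not
    know. Other top-level tables are simply not looked at."""
    table = pyproject.get("build-system", {})
    unknown = sorted(set(table) - ALLOWED_BUILD_SYSTEM_KEYS)
    if unknown:
        raise ValueError(
            "Unknown properties in build-system: " + ", ".join(unknown)
        )


# A new key inside an existing, strictly validated table is rejected.
with_new_key = {
    "build-system": {
        "requires": ["setuptools"],
        "external-requires": ["placeholder-dependency"],
    }
}

# A new top-level table is invisible to this check.
with_new_table = {
    "build-system": {"requires": ["setuptools"]},
    "external": {"build-requires": ["placeholder-dependency"]},
}

for name, data in [("new key", with_new_key), ("new table", with_new_table)]:
    try:
        validate_build_system(data)
        print(f"{name}: accepted")
    except ValueError as exc:
        print(f"{name}: rejected ({exc})")
```

Frontends that are already installed on users' machines would behave like the first case, which is why adding keys to existing tables cannot be rolled out without breaking them.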

Thanks, that seems pretty conclusive.

I can think of another reason: if something that used to be an optional extra is changed into a mandatory dependency, then the package might want to keep on listing it in Provides-Extra to avoid existing users of that extra getting a warning or error from the installer.

I think that’s exactly what you would want to do with it, whether the extra includes Python or external dependencies, or both. How else would a user even request an optional external runtime dependency?

On the other hand, I’m still not clear about how you would request an optional build or host dependency, so maybe these shouldn’t be included in the PEP. See the last paragraph of my previous comment.

I think we need to be very careful here. I don’t expect pip will ever install external dependencies. They are by definition external, and therefore not managed by Python package management tools like pip. I really don’t want pip to have to interface with the plethora of system installers out there, and I don’t think doing so is sustainable.

What I expect is that if pip encounters an external dependency that isn’t installed, we will report that to the user and expect them to manually fix the issue.

What I expect is that if pip encounters an external dependency that isn’t installed, we will report that to the user and expect them to manually fix the issue.

The middle ground we take in bindep is that by default running it will report which system dependencies you’re missing in a human-friendly way, but it has a command-line option to just print a bare list of missing package names so that the user can feed that directly to their platform’s package manager for convenience. This is especially useful in automation, since our CI/CD jobs also rely on it to figure out exactly what system packages to install before installing the Python-based project or running its tests.

Miscellaneous comments:

Support in PURL for version expressions and ranges beyond a fixed version is available via vers URIs (see specification):

Broken link: should be https://github.com/package-url/vers-spec.

Allowing use of ecosystem-specific version comparison semantics

This is listed as a “rejected” idea, but as I understand PEP 804, version comparisons are entirely done by ecosystem-specific tools, in which case they don’t necessarily follow PEP 440 ordering semantics at all.

Maybe this section should be replaced with a discussion of why we require versions to be in PEP 440-compatible syntax.

I don’t think that you can assume that external packages use PEP-440 compatible versions.

That’s true: as you pointed out here, virtually no Debian packages have PEP 440-compatible versions, so they couldn’t be selected with ==, but they could still be matched against a PEP 440 version range with >= or similar, if the >= was interpreted by Debian’s own tools.

Is there so much of a difference between using the plethora of system package managers to query installed packages and using them to install packages? Each requires using the mapping of package managers to query/install commands.

(FWIW, I also don’t like the idea of pip installing system packages. But, even if it doesn’t, that doesn’t get pip away from having to interface with the external package managers.)

Strictly speaking, yes. “Querying” can generally be reduced to a 0/1 operation: either it’s installed or not. “Installing” may involve lots of problems and side effects, e.g. dependency conflicts, required configuration changes, perhaps even implicitly uninstalling some other package that is needed.
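
A minimal sketch of the “0/1” side of that comparison, under the assumption that the ecosystem offers some query command whose exit status is zero when the package is present. The function name is hypothetical, and which command to run for a given package manager is exactly the mapping a tool would still need to obtain from somewhere; the stand-in commands below just exit with a fixed status.

```python
import subprocess
import sys


def is_installed(query_command):
    """Run an ecosystem-specific query command and reduce it to True/False.

    Assumes the command exits with status 0 if and only if the package is
    installed; the command itself is supplied by the caller.
    """
    result = subprocess.run(
        query_command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


# Stand-ins for real query commands, so the sketch runs anywhere.
present = [sys.executable, "-c", "raise SystemExit(0)"]
absent = [sys.executable, "-c", "raise SystemExit(1)"]
print(is_installed(present), is_installed(absent))
```

Nothing on the system changes when this runs, which is the property that installing cannot offer.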

If their version can even be represented in a PEP-440 version. Many can’t.

I thought I saw a reference somewhere in this thread to something that defined a Python function tools could call to check for existence of an external dependency. If I’m wrong about that, then no, there’s not that much difference and I wouldn’t expect pip to check whether external dependencies are present either[1].


  1. Unless I’m misreading it, there’s nothing in the PEP that requires tools to do anything with this new metadata. ↩︎
