Thanks @msarahan, and great to see you here! Congrats on your new role.
It’d be very nice to see an implementation for Grayskull indeed. Recipe generators are a major use case, and Grayskull is near the top of my wish list to try this on. If you’d be willing to work on that, that would be super helpful.
A few weeks ago I did write another prototype, for using `[external]` metadata directly in wheel build workflows. It worked out even better than I expected, and I had wanted to write a blog post on that before including it in the next iteration of the PEP - but I ran out of time then, so I may as well share it now: [rgommers/external-deps-build](https://github.com/rgommers/external-deps-build).
The basic goal was: insert `[external]` metadata into sdists of the latest release of each of the top 150 most downloaded packages from PyPI with platform-specific wheels (i.e., containing compiled code), and show that this allows building wheels without any other modifications, in clean Docker containers with only a distro Python installed (no GCC, no `make`, no `python-dev` headers, etc.). The only supporting tool is the `py-show` prototype, which maps `[external]` metadata to distro-specific package names.
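For readers who haven't looked at the PEP draft recently: an `[external]` table in `pyproject.toml` looks roughly like this (a sketch following the draft PEP syntax; the specific entries are illustrative, not taken from any particular package):

```toml
[external]
# Tools needed on the build machine
build-requires = [
  "virtual:compiler/c",       # any C compiler
  "pkg:generic/pkg-config",   # used to detect host dependencies
]
# Libraries the built extension links against
host-requires = [
  "pkg:generic/openssl",
]
```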
End result:
- Arch Linux, Python 3.11: 35 out of 37 wheels built successfully (and can be imported, so they at least don’t contain completely broken extensions)
- Fedora, Python 3.12: 33/37 successful builds
- Conda-forge, Python 3.12: 33/37 builds
Without the external metadata, only 8 packages build successfully - the ones with only optional C extensions and slow pure Python fallbacks (e.g., `pyyaml`, `sqlalchemy`, `coverage`).
The few remaining failures could all be understood, and none were due to the external metadata design. Causes included:
- Latest release on PyPI did not support Python 3.12 yet (`aiohttp`),
- Latest release on PyPI did not use `pyproject.toml` at all, and inserting a default 3-line one to use `build-backend = "setuptools.build_meta"` (see the sketch after this list) did not work either (`lxml`, `grpcio`),
- Invoking `make` from within `setup.py` in a subprocess, to build a C dependency from source in an uncontrolled manner, led to an undefined symbol import error (`matplotlib` - fixed in their main branch already, because they moved to Meson, which has better support for building external dependencies as part of the package build),
- Failure to detect a dependency due to a missing pkg-config file (`scipy` - a known Fedora bug that scipy still needs to work around).
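For reference, the default 3-line `pyproject.toml` mentioned in the second failure above looks something like this (a minimal sketch - the exact `requires` line may differ from what the demo inserts):

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```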
The README of the repo contains more details on results and exact steps used. The demo is basically complete at this point, although it’d be nice to extend it further to other platforms (help very much welcome). The two I had in mind next were:
- Windows, with Vcpkg and/or conda-forge
- macOS with Homebrew
I don’t see a reason why those wouldn’t work. Iterating on those in CI is a bit time-consuming though; it’s easier to do locally, so I’d be particularly interested in someone who develops on Windows trying this out with their favorite package manager. Both conda-forge and Vcpkg should contain all needed dependencies, with the notable exception of MSVC of course, which should therefore be pre-installed.
It may also be interesting to browse the external metadata for each package. I extracted the number of packages using each language that needs a compiler:
| Compiler | # of packages |
|----------|---------------|
| C        | 35            |
| C++      | 11            |
| Rust     | 4             |
| Fortran  | 2             |
A few packages show some interesting dependencies, where it wasn’t immediately clear what the correct dependency was. E.g., for `psycopg2-binary` I had to choose whether to depend on PostgreSQL or `libpq`, and after choosing the latter I found that the mapping from

```toml
host-requires = [
    "pkg:generic/libpq",
]
```

to distro-specific package names was non-uniform (`libpq` on Fedora and Conda-forge, but `postgresql-libs` on Arch Linux). The set of dependencies was interesting enough to serve as a proof of concept for a central name mapping repository as well, I’d say.
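To make that concrete: a central name mapping could be as simple as one small table per PURL, along these lines (a hypothetical format sketch - neither `py-show` nor Grayskull uses exactly this layout):

```toml
# Hypothetical mapping from a generic PURL to distro-specific package names
["pkg:generic/libpq"]
arch = "postgresql-libs"
fedora = "libpq"
conda-forge = "libpq"
```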
@msarahan I believe Grayskull already has its own name mapping mechanism and PyPI ↔ Conda-forge metadata files. I haven’t thought much about whether to extend that mechanism for a prototype in Grayskull, or to use the simple mapping inside the `py-show` tool in my demo. Hopefully there’s something of use there though - perhaps only the TOML files with `[external]` sections and the sdist patching?
If anyone has some thoughts on this experiment, I’d love to hear them of course. Other than that, the next step is a PEP update which incorporates the feedback to date.