PEP 725: Specifying external dependencies in pyproject.toml

None of the listed tasks are requirements for building a wheel nor for running the contents of the wheel after installation.

When we have a way to specify other environments or task groups, then we should also add support for native dependencies there.

1 Like

None of the listed tasks are requirements for building a wheel nor
for running the contents of the wheel after installation.

When we have a way to specify other environments or task groups,
then we should also add support for native dependencies there.

Apologies, it was unclear to me that the proposed PEP is exclusively
for building wheels. Most of the projects I work on are pure Python,
so our struggles are more with “what system packages do I need to
have a usable development environment for this project on my chosen
distro?” or “what distro packages am I going to need for running
tests in this project?” or instructing our CI system where to find
project-specific system dependencies for generating related
artifacts like documentation.

Those could all be seen as dependencies for building and running the
project, if you consider tests and documentation to actually be part
of your project, rather than thinking of the project as just the bits
written in Python for which the developers are not the target users.
(Our tests are written in Python and do have users who want to run
them; those users are just usually the developers who maintain the
other software in that project.)

2 Likes

That is a good question. I can see it working either way. In analogy to how it works for Python dependencies, I’d lean towards only specifying what is valid syntax, and recommending syntax for common dependencies like compilers. Nothing is stopping you from adding dependencies = ["my-nonsensical-dep"] in pyproject.toml today either. However, a centrally maintained list could also work - it just brings up the questions of where it gets maintained and who is responsible for it. I’d like to hear how others feel about this question.

The point is not to remove the entry in your dependency list, it should still end up there. The point is for your tooling to be able to generate the entry the first time you package a new Python package, and to alert you if anything changes in a new version of that package.

Fair enough. It could work like that - it would require that pip queries the name mapping mechanism to “translate” virtual:compiler/c to the relevant distro-specific packages (e.g., gcc and python-devel). I think that’s a useful thing to do indeed.
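For concreteness, the metadata such a translation would start from is just a regular [external] table like the ones in the PEP; a minimal sketch (the entries are illustrative, not taken from a real package):

[external]
build-requires = [
  "virtual:compiler/c",  # the name mapping would turn this into e.g. gcc (and python-devel, on distros that split out headers)
]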

It is not. But @FRidh’s answer is correct, and it applies to other questions like “Can one require PostgreSQL or SQL Server” too: these things work the same way as for Python build-requires/dependencies/optional-dependencies on purpose (except for the build/host split to support cross-compilation). They’re big topics by themselves, and we don’t want to boil the ocean in this PEP. If boolean conditions or named dependency groups with defined semantics get added for Python dependencies, they can get added for external dependencies at the same time.

Thank you for pointing that out. I am unsure - it’s the first time I’ve heard of this field being used. I’m hoping someone with more experience with core metadata can weigh in about whether this field can and should be reused.

Thanks for sharing @barry-scott. That’s a complicated set of dependencies. It seems like a nice test case actually. I have opened an issue (add an example for `pysvn`'s external dependencies · Issue #9 · rgommers/peps · GitHub) to work on an example of what the [external] table for pysvn should contain. I could give that a try later this week. Maybe you can comment on that issue with some more context? For example, there seems to be neither a release on PyPI nor a pyproject.toml file, so I’m not sure if it’s helpful to try this.

@rgommers:

  • A recipe generation tool will translate it to the default for that distro. E.g., a Linux distro that builds everything with GCC may translate it to gcc or gcc-12. Conda-forge will translate it to {{ compiler('c') }}. Spack always has a C compiler available and hence just drops it completely (it only expresses known incompatible compiler versions, see this example).

FWIW, compilers will soon just be regular packages in Spack, and they’ll be able to provide c, cxx, fortran, openmp, etc. virtuals, which would satisfy dependencies like any other package. So you could satisfy depends_on("c") with:

  • Building gcc from source, or
  • A binary, or
  • An external package, e.g.:
    packages:
        gcc:
            externals:
            - spec: gcc@12.2.1
              path: /opt/gcc-12
    

The plan is to unify the way packages and compilers are managed. A nice thing that falls out of this is that we can manage other interfaces provided by compilers, like openmp, the same way.

One thing I see missing in the proposal is versioning of virtuals. I suppose you could just build that into the name, but I think it’s easier for package authors to be able to express dependencies on version ranges. e.g., you might want something like (what we’d do):

    depends_on("openmp@4.5:")

where the gcc package could have something like this:

    provides("openmp-c@4.5", when="@6:")
    provides("openmp-fortran@4.5", when="@7:")

We already do this for other versioned interfaces/standards like MPI. We just haven’t been able to do it for compilers until now.

I suppose you could embed the versions of versioned interfaces in the various provided strings, or add another / component to your naming convention, but it gets cumbersome to specify all the interface versions provided by an external package when you don’t have ranges.

2 Likes

I’m not a fan of this implied behaviour. It’s possible to use a compiler without the Python development headers, and I think we need to expect that people will use the proposed mechanism to start describing python-less parts of their projects with such metadata.

For example, it’s common to have a C/C++ library and python wrappers for that library in the same project, call it A-lib and A-py. Now assume that another project B with the same pattern wants its B-lib to depend on A-lib (and then build B-py based on B-lib as usual; this happens a lot with protobuf, for example).

Neither A-lib nor B-lib requires the python headers, and IMO they should not be implied by virtual:compiler/c, because that can actually lead to wrong results (e.g. boost will build additional libs if python is present), not to mention wrong dependency metadata (i.e. whether *-lib needs to be built per python version or not).

Also, it’d be inconsistent between compilers; virtual:compiler/{fortran,rust} would presumably not get that implicit inheritance. I understand that python has a special role here, but perhaps that could also be solved via a virtual package, e.g. virtual:devel/python (which could be satisfied by CPython, PyPy etc. of a given version).
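To make that concrete, here is a hypothetical spelling (virtual:devel/python is not in the current PEP draft, it’s only the suggestion above):

[external]
build-requires = [
  "virtual:compiler/c",    # just a C toolchain, no implied Python headers
]
host-requires = [
  "virtual:devel/python",  # hypothetical: request the Python development files explicitly
]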

2 Likes

Thanks for the context on upcoming Spack changes @tgamblin, that looks good.

Agreed - there’s an open issue for version range support: PEP 725 – Specifying external dependencies in pyproject.toml | peps.python.org.

Versioning for a well-defined interface like OpenMP is clear; >=4.5 is unambiguous. For compilers it’s more difficult though, because there’s no relationship between the version numbers of GCC, Clang, MSVC & co. And even expressing “I need at least C99 and C++14 support” seems tricky. There are a couple of issues I can think of there:

  • a “how do I spell that” issue. >=99 isn’t helpful if the next version is C11, and using 1999 seems unusual,
  • a “how exact is that expression really” issue. E.g., C99 support implies that complex float must work, but MSVC doesn’t support it despite claiming full support for C99 (and hence complex support was made optional in C11).
  • For packagers this info isn’t all that helpful anyway, because there’s no mapping from >=99 to concrete compiler versions available.

So I suspect that we may be better off not using such version specifiers for compilers and other virtual dependencies when things are that ambiguous. That should instead be handled by feature testing and a clear error at build time - that’ll be easier and more precise.

4 Likes

If the presence of headers makes the result wrong (rather than, say, longer to compile), I’d call this an issue in boost. Either that, or venv is now an insufficient isolation mechanism, since header location is shared with the “main” Python.

I’d argue they should, if compiler/c implicitly contains the headers.

But yes, it would be much better to be explicit. If nothing else, a package could bundle/require its own C header parser – say, Zig or cffi.

So, python should be implicit, but not python-devel?


Hmm, stepping back, the PEP conflates lib/dev packages in general. By my reading:

[external]
build-requires = [
  "pkg:generic/openssl",  # need headers -> `openssl-devel` package in Fedora
]
host-requires = [
  "pkg:generic/openssl",  # just library -> `openssl` package in Fedora
]

Is that expected? Should the PEP be more explicit here?

1 Like

I’d argue that this is not a problem in the PEP. Rather, it’s a problem caused by distros splitting up Python into multiple packages. If you obtain Python from python.org, the Windows Store, or macOS, you get all of Python. That includes header files, python-config, python.pc, etc. These are part of Python.

This is true for other projects with headers too. E.g., why do we have blas and blas-dev/blas-devel on a bunch of distros, but not numpy/numpy-devel even though NumPy also installs headers that you only need if you use a C/C++ compiler?

I think that in general, Python package authors should depend on a single package with its canonical name, and if distros split that package up into two or more separate installable packages in their package repository, they should be responsible for dealing with that complexity.

So here:

  • A python dependency is implicit for any use of pyproject.toml, so python-devel should also be implicit,
  • For other dependencies that are named, we need a one-to-many name mapping option. So if I build-depend on pkg:generic/openblas, the distro should map that to either openblas or [openblas, openblas-dev] as appropriate (some split it up, some don’t),

The question has come up twice already now, so I think indeed the PEP should address it. But this is the resolution I’d recommend. Metadata should not contain anything about distro-specific package splitting, and the second PEP on name mapping should enable the desired mapping here and have explicit examples on how to use that.
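As a sketch of what such a one-to-many mapping could look like on the distro side (the format below is made up purely for illustration - defining the real format is the job of the name mapping PEP):

# hypothetical name mapping entry, illustrative format only
["pkg:generic/openblas"]
fedora = ["openblas", "openblas-devel"]
arch = ["openblas"]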

10 Likes

Dunno about BLAS specifically, but AFAIK a traditional Unixy make install isn’t expected to install headers. (edit: Hmm, maybe it is? I was wrong! But…) Or sources and debug info, for that matter – yet those are necessary for debugging.

If a package doesn’t support being split (like numpy) then yes, a distro deciding to split it up would be questionable. But not a priori unforgivable – after all, distros try to serve their users.

FWIW I was thinking we would actually use c@1999:, c@2011:, etc. for this. Or start with c@99 since people say it that way so frequently, but require the 20 prefix for c@2011 and up. It’s unusual but it’s at least clean when the dates map to something monotonically increasing.

  • a “how exact is that expression really” issue. E.g., C99 support implies that complex float must work, but MSVC doesn’t support it despite claiming full support for C99 (and hence complex support was made optional in C11).

The current plan here is to offer language levels as versions with about the same level of precision as the compiler authors provide, i.e. “most of it”. I’m not sure what features that will include, but I don’t think it’s unreasonable to look at a matrix like this one: C++ compiler support - cppreference.com and pick a version at which the compiler “starts” to provide a language level.

To solve issues like the complex float one you mention, and to allow people to use bleeding edge compilers that have far from complete support, we were planning to add additional virtuals, since the features themselves are essentially just interfaces. So you can have a compiler provide virtuals like cxx-lambda, c-complex-float, cxx-mdspan etc., and you can have a package depend on those instead of the language level when finer granularity is needed.

I’ve had people argue that it’s better to only provide feature virtuals, but I think that is unrealistic. There are tons of them, and in Spack, most packagers aren’t even authors. Also, code teams will decide things like “we decided to use C++14 from version 4 onward”, but that doesn’t necessarily mean they picked a subset of features. They might use different features from patch to patch. So I think there is merit to having broader language-level virtuals for the common case, along with more specific ones for more obscure (or less frequently implemented but sometimes required) features.

  • For packagers this info isn’t all that helpful anyway, because there’s no mapping from >=99 to concrete compiler versions available.

I think this should be defined somewhere – I am not sure where it makes sense for the python ecosystem, but it seems like some PyPI package could provide the information so it could be easily used by all tools.

For Spack, we plan to provide this information directly on the compiler packages, where we declare the virtuals they provide. So you’d have, in our python DSL, something like:

class Gcc(Package):
    provides("cxx-lambda", when="@4.5:")
    provides("cxx@:2011", when="@4.8.1:")

i.e., Lambdas are available before full C++11 support in GCC 4.8.1.

We are already able to do this to some extent for things like CUDA, but the burden is on packages to provide the information. Since you can inherit metadata from base classes in Spack, we are able to consolidate the information in places like CudaPackage. e.g., these conflict declarations tell Spack which compiler versions are required by particular CUDA versions (which we get from CUDA Compilers · GitHub – yeah it’s a gist).

Most of that has to do with language level. With proper compiler dependencies and language-level virtuals, I expect that:

  1. Specific compiler version information will move into compiler packages (e.g., provides("cxx@:2014", when="@<compiler version>")), and
  2. CudaPackage will just declare dependencies on virtuals (e.g., depends_on("cxx@2014", when="+cuda ^cuda@<cuda version>") – “I need to compile with a compiler that supports C++14 when I’m using this particular version of CUDA”)

I think information like (1) could be provided by some common PyPI package.

(2) is actually more interesting because it’s a common pain point in Spack that I don’t think is addressed in the PEP. External packages/interfaces (like CUDA) force requirements on their dependents. The PEP would allow a project to say it depends on a C compiler or an external CUDA package, but I don’t think it allows you to say “if you use a particular CUDA version, you need a particular C language level”. That requirement comes from CUDA and ideally it’d be expressed by whatever describes CUDA. We’re able to do this with base classes, but I dunno what the equivalent is for pyproject.toml.

So I suspect that we may be better off not using such version specifiers for compilers and other virtual dependencies when things are that ambiguous. That should instead be handled by feature testing and a clear error at build time - that’ll be easier and more precise.

I don’t agree w/this, because we, at least, need this information to be available for dependency resolution. We don’t currently run any compile tests before resolving dependencies, and I don’t think we want to (we don’t even fetch the packages before resolution). For the cross-compiled case, we can’t run tests on the build machine at all.

I am not sure how useful the requirements will be if tooling can’t satisfy them precisely. I think it would be frustrating to depend on a cxx compiler, but to frequently have build tests fail when tools can’t select the right cxx compiler to use.

So if the idea is just that this enables tools to have better checks for pre-existing dependencies, I’m not sure it’s a clear improvement: I don’t see that much difference between failing to find a C++ compiler at build time and failing to pass a build test at build time. For tools that actually control the build environment, though, I think there’s a bigger difference, because the tool could actually use the information to get you the compiler you want.

2 Likes

@tgamblin that’s a compelling argument, thank you. I think it’s feasible to add language level versions. I’d like to sleep on it for a bit, and make sure we don’t make this so complex that it becomes difficult for the average package author to understand. But I think I agree that something like @2011 is useful and understandable.
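As a purely hypothetical spelling (version support for virtuals is still an open issue in the PEP, and how to express “at least” rather than an exact version is part of what needs deciding), that could end up looking something like:

[external]
build-requires = [
  "virtual:compiler/c@2011",  # hypothetical: a compiler supporting at least C11
]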

This seems potentially useful indeed. I’ll note that for common compilers and C/C++, we have collected a lot of info in Toolchain Roadmap — SciPy v1.12.0.dev Manual already (thanks to work by @h-vetinari), because we need it to decide if and when to upgrade.

2 Likes

Based on Paul’s announcement about the PEP process, it might be good to expand the Reference Implementation section to include a rationale for the lack of reference implementation.

2 Likes

Yes, I completely agree. Distro packagers get to make their own trade-offs; they have to be pragmatic too, and if splitting up a package is on average good for their users, they may do that. All I’m trying to say here is that such decisions shouldn’t affect metadata for Python packages - so let’s give distro packagers the tool to register their decision in some name mapping that can be queried. That should work for both Python package authors and distro packagers, I’d think.

2 Likes

I like this approach too – I think it should be the job of the tooling to map canonical package names to distro package names.

But this only makes sense to me because spack, conda, etc. have their own package names… which I guess are “canonical” within those ecosystems. Who defines canonical names and canonical package structure for externals as described in this PEP, though? That should probably be defined a bit more specifically.

For Spack it’s mostly what you describe – the “canonical” whole package is what you get if you build the distribution tarball from the upstream package source. I could poke holes in that definition though. There are build options that can be enabled and disabled, and it can depend on the distro or user preferences which ones are/aren’t. There are also layout issues – e.g., the way Debian mangles the pip install location with /local (don’t get me started on this), or the way different BLAS implementations have completely different lib names.

In Spack, the package.py author is responsible for writing some code to handle different external layouts, and we have some abstractions like spec["blas"].libs (get the libraries needed to link with this BLAS provider) to deal with some of them. But it’s very specialized for particular packages, and I’m not sure how you can generalize all of these pyproject.toml externals without something that describes the external packages – i.e., something more than a name.

1 Like

Thanks @abravalheri, good point. The metadata itself is more a spec, with multiple types of consumers. For the “recipe generator” type, we plan to implement support in Grayskull, as part of the work on name mapping. So that’s more an implementation that goes with that second PEP rather than with the metadata one. But I assume that’s fine - if it consumes the metadata and we show that Grayskull can then produce higher-quality recipes for a decent-sized set of packages, I think that should be sufficient here?

2 Likes

In the cases where it is split, I’d like to see the distro skip openblas-dev when the same symbol is merely host-depended on. Though I guess you were already thinking along those lines?

It’s a shame IMO that Numpy (not to mention the rest of the SciPy stack) doesn’t offer this kind of split, like a [dev] feature when installing from Pip. In that world, there’s probably quite a bit more free hard drive space. I’d hazard a guess that quite a large fraction of Numpy/SciPy users have no interest in doing anything at the C or C++ level (and in fact this is exactly why they are using Python with those third-party libraries).

Maybe if there were an option, for those who don’t want to be messed with in that way, to force looking on PyPI for the dependency? (Of course it would break if the developer demands that PyPI provision a shrubbery C compiler, but I hardly think anyone would be surprised by that.)

My thinking is, if you use that option, then the naming and package structure is whatever the PyPI maintainers of that project dictate; otherwise, it’s according to your distro’s name mapping (which you can of course look up yourself to verify what you’ll get on that distro, in case you start getting a lot of bug reports from that distro’s users).

(Although, actually, it’s not clear to me why Numpy would ever be an “external” dependency…)

Just to be sure: would this ever be something that the packager needs to figure out, based on the code’s own idiosyncrasies? Or is it something that’s purely intrinsic to CUDA?

I’m cautiously optimistic about this PEP as it does seem a step towards smoother integration of various types of dependencies.

I have one small question, one medium question, and one-and-a-half larger questions.

The small question is:

The lack of an [external] table means the package either does not have any external dependencies, or the ones it does have are assumed to be present on the system already.

I don’t quite understand that last part. Does that “assumed” part just mean “if there is no [external] table, the package may be doing the old bad way where it just doesn’t tell you what you need and you have to figure it out yourself”? The wording just gave me pause because I can also read “already” as meaning “before the code can be run”, in which case it would mean not declaring the dependencies is the same as declaring them (i.e., in both cases you assume they will be there by the time you’re ready to run).

Perhaps it could be changed to something like “or the ones it does have are undeclared and anyone using the package must use non-metadata means (e.g., reading the source or documentation) to figure out what they are, as was the case prior to this PEP”?

My medium-sized question has to do with what is intended for cases where a PyPI package currently bundles non-Python dependencies (e.g., PyQt). If I write a package for PyPI publication that depends on the PyPI PyQt6 package, am I supposed to list the underlying Qt6 as an external dependency?

In some sense this question is also about how to specify metadata in such a way that tools which do separate Python and non-Python dependencies, as well as those which don’t, can get what they need from the metadata. In other words, how can a package say “if you don’t have Qt6 as a separate lib, then give me the big-bundle Python PyQt6 that has Qt6 inside it, but if you do have Qt6 as a separate lib, then give me that separate Qt6 plus a thinner PyQt6 that is meant to go with it”? Or is this all just totally up to a human distributor to suss out by looking at the normal vs. external dependencies and pondering them?

My larger question is (perhaps unsurprisingly) about where this PEP is intended to lead us. What would be helpful to me in understanding this is one or a few “case studies” that sketch out what kind of smoother workflow would be enabled. The PEP currently includes examples of the metadata for some packages, which is helpful, but what remains unclear to me is what sorts of beneficial ripple effects are envisioned from those metadata changes. Something like “right now, to build package X, we have to do such-and-such cumbersome/fragile process, but if tool Y is adapted to read the new metadata, now we can do it in such-and-such simpler/robust way”. And perhaps something similar for benefits (if any) that cascade to those who are installing the packages rather than building them.

I realize that the actual creation of such a future is outside the scope of the PEP, but it seems to me that in some sense the hope of such a future is the PEP’s only reason for existing. :slight_smile: So it would be useful (at least for me) to see some concrete ideas about what we hope would happen, even if they’re in a non-normative “example” section of the PEP or even just in this discussion thread. The PEP can’t control what will happen but it seems reasonable to have some specific preview of what good stuff could be enabled by this PEP.

As it is now, I like the idea of the PEP, but its abstractness (and the discussion of some nitty gritty points in this thread) leave me uncertain about what its practical effect would be.

My additional question is half a question because it’s an addendum of the above. The PEP implicitly accepts that PyPI will remain primary and that there will be a continuing split between dependencies available on PyPI and those not available on PyPI. In my own view, the ultimate progress would be for this split to be closed so that a single mechanism (i.e., tools and repository) can not only declare but provide the needed dependencies for all packages (a la conda).

In the happy event that such a situation comes about, how would this metadata adapt? It seems the answer might be as simple as: “When non-Python dependency X becomes available on FuturePackageRepo[1], the developer of a package that previously listed X under [external] can just move X under [project.dependencies]”. Is that the idea? However, the different formats of the dependency info (PEP 508-style vs PURL) do leave me with some uncertainty about whether there could be awkward seams in the transition. Or perhaps if this happy situation comes about, PURL and Python-style dependency specifiers would evolve convergently so that they could be converted automatically?


  1. i.e., the version of PyPI that can directly handle non-Python deps ↩︎

Yes, that would indeed make sense. Assuming you meant for a runtime dependency, so if it’s in the dependencies table.

To clarify terminology, since I saw the same “host” confusion once before in this thread: build-requires and host-requires are both build-time dependencies; the difference is that the former run on the build machine (compilers, code generators) and the latter on the host machine (typically libraries you link against).
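A minimal illustration of that split, in the spirit of the examples in the PEP (the exact entries are illustrative):

[external]
build-requires = [
  "virtual:compiler/c",   # runs on the build machine
]
host-requires = [
  "pkg:generic/openssl",  # linked against, so it has to match the host machine
]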

It wouldn’t ever be, because it’s a Python package on PyPI. I’m not sure what else to reply to that section of your comment - there’s no such thing as looking up or getting external dependencies from PyPI.

Your interpretation is correct. It’s basic backwards compatibility. There’s no [external] table now, so in the future the absence of an [external] table simply means the situation is unchanged from what it is today. I’ll try to reword that sentence.

No. Qt6 is a transitive dependency here, and you should never describe those. Your package should contain only metadata for direct dependencies.
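For example, the metadata of a hypothetical package wrapping PyQt6 would contain no more than something like (name and version made up):

[project]
name = "my-pyqt-app"
version = "0.1.0"
dependencies = ["PyQt6"]

# no [external] entry for Qt6 - that's PyQt6's concern, not this package's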

You cannot express something like that in metadata, because it requires a runtime check on the user’s machine. Nothing changes here, it’s not possible today and it will not be possible after this PEP. The way to deal with this, if you really want such install logic, is something like a separate metapackage on PyPI which does a runtime check inside a setup.py file, and then dynamically adds the thin or fat PyQt6 package to install_requires based on the result of the runtime check.

Did you see https://github.com/rgommers/peps/blob/pep-name-mapping/pep-9999.rst? The long-form examples will hopefully help, and also this:

“The “Rule Coverage” of its README shows how that improves the chance of success of building packages from CRAN from source. Across all CRAN packages, Ubuntu 18 improves from 78.1% to 95.8%, CentOS 7 from 77.8% to 93.7% and openSUSE 15.0 from 78.2% to 89.7%. The chance of success depends on how well the registry is maintained, but the gain is significant - ~4x fewer packages fail to build on Ubuntu and CentOS in a Docker container.”

Missing external dependencies are a leading cause of build failures when installing Python packages from source. The name mapping mechanism, as one of the things envisioned to be consuming this new metadata, could be used similarly to the R external dependencies. I’d expect the effect to be similar too: better metadata == fewer failing builds.

Yes, the answer will likely be that simple. Beyond that, I don’t think this is the place to discuss the role of PyPI in the far future, there are enough other threads for that.

2 Likes

I am so, so happy to see this discussion and this PEP. This is what I should have done after the Packaging Summit at PyCon 2019. Thanks for making that discussion (among many others!) into such a nice spec, @pradyunsg and @rgommers .

What can I do to help move this forward? It seems like perhaps a prototype implementation could live in Grayskull, as a translator from the package metadata here into the Conda ecosystem equivalent metadata. Is that a helpful exercise?

My interest is more than Conda, though. I’m now employed by NVIDIA on their RAPIDS team, which at one point dropped wheels as a distribution mechanism. There is customer demand to not require Conda, however, so wheels are supported, but their size is presenting the usual problems with distributing something so large. We want to support shared libraries and distribute only our specific bits, rather than having to bundle all of our dependencies.

Disclaimer: My opinions are my own, not NVIDIA’s, but I am making efforts here in the hopes of making NVIDIA software easier to install.

10 Likes

Thanks @msarahan, and great to see you here! Congrats on your new role.

It’d be very nice to see an implementation for Grayskull indeed. Recipe generators are a major use case, and Grayskull is near the top of my wish list to try this on. If you’d be willing to work on that, that would be super helpful.

A few weeks ago I did write another prototype, for using [external] metadata directly in wheel build workflows. It worked out even better than I expected, and I had wanted to write a blog post on that before including it in the next iteration on the PEP - but I ran out of time then, so may as well share it now: GitHub - rgommers/external-deps-build.

The basic goal was: insert [external] metadata into sdists of the latest release of each of the top 150 most downloaded packages from PyPI with platform-specific wheels (i.e., containing compiled code), and show that that allows building wheels without any other modifications in clean Docker containers with only a distro Python installed (no GCC, no make, no python-dev headers, etc.). The only supporting tool is the py-show prototype, which maps [external] metadata to distro-specific package names.

End result:

  • Arch Linux, Python 3.11: 35 out of 37 wheels built successfully (and can be imported, so they at least don’t contain completely broken extensions)
  • Fedora, Python 3.12: 33/37 successful builds
  • Conda-forge, Python 3.12: 33/37 builds

Without the external metadata, only 8 packages build successfully - the ones with only optional C extensions and slow pure Python fallbacks (e.g., pyyaml, sqlalchemy, coverage).

The few remaining failures could all be understood, and none were due to the external metadata design. Causes included:

  • Latest release on PyPI did not support Python 3.12 yet (aiohttp),
  • Latest release on PyPI did not use pyproject.toml at all and inserting a default 3 line one to use build-backend = "setuptools.build_meta" did not work either (lxml, grpcio),
  • Invoking make from within setup.py in a subprocess to build a C dependency from source in an uncontrolled manner led to an undefined symbol import error (matplotlib - fixed in their main branch already, because they moved to Meson which has better support for building external dependencies as part of the package build)
  • Failure to detect a dependency due to a missing pkg-config file (scipy - a known Fedora bug that scipy still needs to work around).

The README of the repo contains more details on results and exact steps used. The demo is basically complete at this point, although it’d be nice to extend it further to other platforms (help very much welcome). The two I had in mind next were:

  • Windows, with Vcpkg and/or conda-forge
  • macOS with Homebrew

I don’t see a reason that those wouldn’t work. Iterating on those in CI is a bit time-consuming though, easier to do locally so I’d be particularly interested in someone who develops on Windows trying this out with their favorite package manager. Both conda-forge and Vcpkg should contain all needed dependencies, with the notable exception of MSVC of course, so that should be pre-installed.

It may also be interesting to browse the external metadata for each package. I extracted the number of packages needing a compiler for each language:

Compiler    # of packages
C           35
C++         11
Rust        4
Fortran     2

A few packages show some interesting dependencies where it wasn’t immediately clear what the correct dependency was. E.g., for psycopg2-binary I had to choose whether to depend on PostgreSQL or libpq, and after choosing the latter I found that the mapping from:

host-requires = [
  "pkg:generic/libpq",
]

to distro-specific package names was non-uniform (libpq on Fedora and Conda-forge, but postgresql-libs on Arch Linux). The set of dependencies was interesting enough to serve as a proof of concept for a central name mapping repository as well I’d say.

@msarahan I believe Grayskull already has its own name mapping mechanism and PyPI ↔ Conda-forge metadata files. I haven’t thought much about whether to extend that mechanism for a prototype in Grayskull, or use the simple mapping inside the py-show tool in my demo. Hopefully there’s something of use there though - perhaps only the toml files with [external] sections and the sdist patching?

If anyone has some thoughts on this experiment, I’d love to hear them of course. Other than that, the next step is a PEP update which incorporates the feedback to date.

8 Likes