PEP 804: An external dependency registry and name mapping mechanism

Thanks! From the functionality point of view, this should cover all the important use cases that I can think of. On the other hand, I’m not sure if a simple JSON-file override is the most ergonomic and flexible solution. You could pretty quickly end up with a dozen override files for various system configurations. But since you will not standardize the override UX in the PEP, it’s also fine to not settle on a specific mechanism for now…

Awesome, thanks! I’ll probably face the challenges of ergonomics once I start prototyping some examples, and that will inform the final recommendations. The proposal above is preliminary in that regard, and I’m open to feedback once we have something tangible!


I’m probably the best person to talk to for these issues. I think I’m more involved here than anyone else in the Debian community.

General :+1: to this proposal. External dependencies have been a headache forever, and making it easier for users to navigate this is long-overdue.

So, Debian context:

  • Debian packages use a standardized approach to map PyPI modules to Debian packages. The python package providing a public module is typically: python3-{import_name}. That’s not always true. Some sub-communities within Python use a prefix, like python3-django-X. And Python applications (that aren’t typically used as a library) will just be named after the application.
  • It doesn’t make any sense to map an exact dependency onto a Debian (or probably any distro) package in a generic context. Debian packages have a packaging revision encoded in the version (e.g. python3.14 version 3.14.0-4 has the -4 Debian suffix). So, exact dependencies on upstream versions map to version ranges of Debian packages. (python3.14==3.14.0 would map to python3.14 >= 3.14.0, python3.14 << 3.14.1~.)
  • The specifier_syntax scheme doesn’t look like it allows any way to transform PEP 440 pre-release versions into sensible Debian package versions. (Any part of the version that starts with a|b|rc gets a ~ prefix, which sorts before the empty string in Debian version comparisons.)
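
The two version-mapping rules above can be sketched as follows (a rough illustration, not part of either PEP: the function names and the last-component bump are my own simplifications, and real Debian version handling is more involved; note that Debian dependency syntax puts relations in parentheses):

```python
import re

def debian_range_for_exact(pkg: str, version: str) -> str:
    """Widen an exact upstream pin to cover any Debian packaging
    revision of that same upstream version, e.g.
    python3.14==3.14.0 -> 'python3.14 (>= 3.14.0), python3.14 (<< 3.14.1~)'."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)  # naive bump of the last component
    upper = ".".join(parts)
    return f"{pkg} (>= {version}), {pkg} (<< {upper}~)"

def debianize_prerelease(version: str) -> str:
    """Hypothetical mangling: prefix a trailing PEP 440 pre-release
    segment with '~' so it sorts before the final release in Debian
    comparisons, e.g. '3.14.0rc1' -> '3.14.0~rc1'."""
    return re.sub(r"(a|b|rc)(\d+)$", r"~\1\2", version)
```

A final release like 1.2.0 passes through debianize_prerelease unchanged, since there is no pre-release segment to mangle.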

Other feedback:

  • The meaning of “query packages” is not defined. From context, I can guess that it means “display information about the named package (which may or may not be installed)”, but this could be spelled out. Other potential meanings: looking up potentially installable packages, listing installed packages, or searching for packages matching keywords.
  • How do we handle packages that change name across distro versions? For example, library packages usually contain the SONAME to allow co-installability of multiple versions during the transition. In your example Ubuntu mapping, dep:generic/libffi maps to libffi8, but a couple of releases ago that would have been libffi7. So, do we list all potential variants of the package name? Or do we version the registry by distro release?
  • The distinction between build, host, and run is not clear in either PEP. It took me a while to understand that this is about providing context for interpreting a PURL. I raised this in the PEP 725 thread, but both PEPs could make this clearer.
  • I don’t see any built-in primitives to make derived distributions simple. I guess any Debian derivatives could just be mapped to the same mapping file as Debian (assuming we can have a single file that covers all releases), but we could have an inheritance mechanism to allow derivatives to override things.

How are mappings associated with their intended platform? pyproject-external seems to key on package managers, meaning that all Debian/Ubuntu derivatives and versions share the same mapping. Then in the external-metadata repository, the mapping file for apt is called ubuntu without a version, yet its description and all subsequent package URLs specifically refer to Ubuntu 24.04 (a.k.a. Noble). We’ve got three contradictory levels of specificity here?

Thank you for the feedback, @stefanor! I’ll reply inline.

It doesn’t make any sense to map an exact dependency into a Debian (or probably any distro) package in a generic context.

conda packaging also has a notion of “builds”, similar to those packaging revisions. What happens there is that when a user asks for pkg==1.2.3, an exact version match takes place, but the solver still selects the most convenient “build variant” for the system. Same with compiled wheels. Isn’t there a way to tell apt or equivalent to pick the latest revision of a given exact version?

The specifier_syntax scheme doesn’t look like it allows any way to transform versions PEP-440 pre-release versions into sensible Debian package versions.

Hm, yes, you are right, there’s no mechanism (yet?). I hoped we wouldn’t go into pre-releases and the extra can of worms they bring :sweat_smile: I’m inclined to say “not supported” for now, but happy to hear thoughts.

The meaning of “query packages” is not defined.

Thanks, noted. I’ll clarify it means “check if the package is already installed”.

How do we handle packages that change name across distro versions

Our proposal so far is to version the registry. Each distro release gets a new mapping.

I don’t see any built-in primitives to make derived distributions simple.

We haven’t proposed any. Our thinking is that downstream derivatives can take the original mapping, copy it with a different name, and introduce the modifications they need. A cron job on GitHub Actions or similar would ensure it’s reasonably up to date. Otherwise, we may end up with nested “parent mapping” resolutions, broken links, and a whole assortment of complications that I don’t think we need.

How are mappings associated with their intended platform?

I’d say this is not standardized. The pyproject-external implementation proposes keying on the package manager (because so far we’ve only added one example ecosystem per package manager), but in the future, as you point out, several mappings may be available for e.g. apt. In those cases, tooling may want to use fields from os-release (e.g. ID + VERSION_ID?) to match on the mapping name, and then let the user pick the version if there are several.
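
For illustration, matching on os-release fields might look something like this (a sketch only; the candidate-name scheme, most specific first, is hypothetical and not something the PEP specifies):

```python
def mapping_candidates(os_release: str) -> list[str]:
    """Derive candidate mapping names from os-release content:
    ID-VERSION_ID first, then ID, then each ID_LIKE entry as a
    fallback for derivative distributions."""
    fields = {}
    for line in os_release.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip().strip('"')
    candidates = []
    if "ID" in fields and "VERSION_ID" in fields:
        candidates.append(f'{fields["ID"]}-{fields["VERSION_ID"]}')
    if "ID" in fields:
        candidates.append(fields["ID"])
    candidates += fields.get("ID_LIKE", "").split()
    return candidates
```

On an Ubuntu 24.04 system (ID=ubuntu, ID_LIKE=debian, VERSION_ID="24.04") this would yield ["ubuntu-24.04", "ubuntu", "debian"], letting tooling fall back from a versioned mapping to a generic one.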

the mapping file for apt is called ubuntu without a version yet its description and all subsequent package URLs specifically refer to Ubuntu 24.04 (a.k.a. Noble)

That ubuntu mapping can be considered the latest LTS release. Assuming we had mappings for Ubuntu 22.04, maybe it would have been archived as ubuntu-22.04 once 24.04 became available. Or served under a version-specific subdirectory.

I’ll give a thought to the version fields but I want to strike a balance between specificity for Linux distributions and non-Linux ecosystems that don’t need these fields.


apt doesn’t do that, I’m afraid. I thought about filing a wishlist bug for something like this, but I don’t even think it would have much of a use case. Theoretically, it’s a missing feature, but in practice, you’d never use it (and it would need to invent some new syntax to differentiate it from the current = operator). The expectation is that each APT repository provides you with a single version of a package, the best one it has to offer, not a menu of available versions.

Apt’s job is typically to provide you with the latest version of a package available for a given release, without violating any other constraints on the system.

Given the inability to usefully use == on most Linux distributions (as discussed above), I think not supporting this is reasonable.

It’s still unclear whether this is user-facing output, or an API that communicates by exit code. If it’s user-facing, some context for the user may be necessary.

If we used dpkg -l {package}, we have the problem that dpkg does not keep information about packages that are not installed on the system, which can lead to confusing output. It will just say no packages found matching ... rather than saying X is not installed.

If we used apt list {package}, the presence of [installed] on the line tells you it’s installed; if it’s missing, it isn’t. To a novice user, it isn’t exactly clear that something isn’t installed.

So, how well these options would work would depend a lot on the context provided to the user.
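
As a rough illustration of the context problem, a wrapper could translate apt list output into an explicit status message (a sketch only; it assumes apt's current line format, e.g. "libffi8/noble,now 3.4.6-1build1 amd64 [installed]", which is not a stable interface):

```python
def parse_apt_list(output: str, package: str) -> str:
    """Turn `apt list <pkg>` output into a novice-friendly status string,
    making the 'not installed' case explicit instead of implicit."""
    for line in output.splitlines():
        if line.startswith(package + "/"):
            # "[installed" also covers variants like [installed,automatic]
            if "[installed" in line:
                return f"{package} is installed"
            return f"{package} is available but not installed"
    return f"{package} was not found"
```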

OK. That sounds like the best plan to me.

Another reason for this, that I keep forgetting to mention, is version epochs. Debian archives (like any package manager, I’d assume) require package versions to monotonically increase. So, when an upstream switches to a new version scheme and introduces a lower version than the previous one, e.g. moving from 20251029 to 1.0.0, we increment an epoch at the start of the version. So, we’d publish that 1.0.0 as 1:1.0.0. That 1: epoch prefix is then required forever into the future (or at least, until it bumps to 2: when the upstream reads another blog post about how versioning should be done).

Epochs can also be introduced when one package takes over a name from another, with a lower version. So, it’s not necessarily due to any fault of the upstream, just unlucky history.
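
The epoch ordering described above can be sketched like this (a deliberately simplified model: real dpkg comparison also handles non-numeric parts, the ~ rule, and Debian revisions, so this only illustrates why 1:1.0.0 sorts after 20251029):

```python
def split_epoch(version: str) -> tuple[int, str]:
    """Split a Debian version into (epoch, rest); a missing epoch means 0."""
    epoch, sep, rest = version.partition(":")
    return (int(epoch), rest) if sep else (0, version)

def newer(a: str, b: str) -> bool:
    """Simplified Debian-style ordering: compare epochs first, then
    dotted-numeric upstream versions."""
    ea, ra = split_epoch(a)
    eb, rb = split_epoch(b)
    key = lambda r: [int(p) for p in r.split(".") if p.isdigit()]
    return (ea, key(ra)) > (eb, key(rb))
```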


Hm, now we are mapping DepPURLs to names only, but maybe that’s a simplification of the broader case. If the mapping is given by a function, we would be extending the signature from (name) to (name, epoch=0).

Side quest driven by curiosity: What happens when upstream publishes a new version under their own epoch? Let’s say we have a project called peach which used calver but got abandoned at 2019.01.20. A new peach is now making the rounds, using half-year calver, currently at 25.10.4. The distro maintainers publish it as 1:25.10.4. However, modern peach maintainers choose to change to semver and will introduce their own epoch 1 in the upcoming 1!1.0.0 release. Does the distro have to push to 2:1.0.0, or can they do 2:1!1.0.0?

All of that to say… do we have to track the epoch deltas across releases too? :sob: I’d really love to have an alias package we could use in that case (e.g. peach-with-epoch x.y.z, which depends on peach 1:x.y.z), so the mapping is still done between names only.

Side quest driven by curiosity: What happens when upstream publishes a new version under their own epoch? Let’s say we have a project called peach which used calver but got abandoned at 2019.01.20. A new peach is now making the rounds, using half-year calver, currently at 25.10.4. The distro maintainers publish it as 1:25.10.4. However, modern peach maintainers choose to change to semver and will introduce their own epoch 1 in the upcoming 1!1.0.0 release. Does the distro have to push to 2:1.0.0, or can they do 2:1!1.0.0?

As I understand it, a Python package (sdist, wheel) epoch is specific to the package itself, and not supposed to be part of the project’s literal version. Other downstream artifacts, for example packages in GNU/Linux distros, would apply their own separate epoch following their specific versioning guidelines if warranted, which might be the same as the epoch used in the Python package or might differ. Ultimately it’s up to the package maintainer in every distribution to decide.
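
The two epoch counters are also easy to tell apart mechanically: PEP 440 spells its epoch N! while Debian uses N:, and the numbers are independent namespaces. A trivial sketch (the helper name is mine):

```python
def pep440_epoch(version: str) -> tuple[int, str]:
    """Split a PEP 440 version into (epoch, release part);
    a missing 'N!' prefix means epoch 0."""
    epoch, sep, rest = version.partition("!")
    return (int(epoch), rest) if sep else (0, version)
```

So a project at PEP 440 version 1!1.0.0 might still ship in a distro as 0:1.0.0, or as 1:1.0.0 for entirely distro-internal reasons; neither number implies the other.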


I’d forgotten that PEP 440 included epochs too. I dug around a bit, and found one example of a package with an epoch on PyPI that’s in Debian too: metomi-isodatetime (GH, PyPI, Debian). In this case Debian only started packaging it after its epoch bump, so PyPI has 1! and Debian only has the implicit 0:. Any versioned dependencies on it are going to need manual adjustment…

I’m skeptical that epochs will be useful at all. For one because most distros do not support such a concept, so it seems like a stretch to make it a distro-independent concept in the registry. For another because it’s ultra-rare that it’s needed.

The metomi-isodatetime example is something I’ve never seen before in the real world; that package has deleted all its non-epoch-1 versions (or there never were any), so probably Python tooling doesn’t properly support epochs. A quick test with pip shows various issues, like pip install metomi-isodatetime>4 appearing to work but giving no stdout output (the unquoted > is shell redirection, so the output lands in a file named 4), and pip install "metomi-isodatetime>1!3.0" returning apparent garbage because the ! needs to be escaped on the command line (>1\!3.0 does work fine). Plus there are no tests or docs.

Thinking about whether epochs could be added in the future in a backwards compatible way in case an important enough need should come up seems fine, but adding it now does not appear to be justified. If you want to make a case for inclusion now, I think it needs to come from actual real-world external dependencies with epochs that existing Python packages have. Do you know of any such cases?

From what I can tell they used to be the isodatetime PyPI package, and you can see the epoch bump happen in it. I don’t know the history behind their epoch bump or name change.


There are about 100 Python packages in Debian that have epochs. A good example is Django (on epoch 3).


But my point here is that we shouldn’t expect to reliably automatically translate upstream versions into distro versions. Sometimes some mangling is going to be required. So I’m sceptical about any scheme that would require something like this.

I also don’t see situations where == relations would make sense, in most linux distros.


Yes, that is a good point, thanks for the examples.

That I see a little differently. It’s package authors who decide which versions of any of their dependencies they support: any version, a range with lower and/or upper bound, or a single exact version. In the latter case, that is ==. It’s indeed likely that Linux distros don’t have that exact version [1], because of the “one version per distro release” rule as you point out. At that point, the correct behavior is probably for the user to get an error - the version really is not available. That isn’t useless; it just reports the actual situation. There is no other reasonable action that can be taken at that point.

It’d of course be better for the package authors to specify a version range, because that has much better compatibility - but the trade-off there invariably is that it’s more work for package authors. So they sometimes use ==. Distro packagers then usually loosen that restriction and add patches as needed to make things work with a different version; in a standard “build Python package from source using external metadata” scenario we can’t really do that though.


  1. Leaving aside build number issues, I’m talking about the package saying it wants 1.2.0 and the distro only has 1.1.0 or 1.3.0 ↩︎


The meaning of “query” is already defined under “Generating package manager-specific install commands”:

install command templates are paired with query command templates so those tools can check whether the needed packages are already present without having to attempt an install operation (which might be expensive and have unintended side effects like version upgrades).

That implies that the tools will use the query command’s exit code. And I guess this is the reason why the PEP says “query MUST only receive a single identifier per command”: if it accepted more than one, then in the event of a non-zero exit code, the caller wouldn’t know which packages it needed to install.

However, the example command conda list -f {} always exits with a code of 0 whether the package is found or not. Also, it apparently doesn’t accept version specifiers, only names.
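
The exit-code contract being discussed could be consumed roughly like this (a sketch of tooling behavior, not code from the PEP; needs_install is a hypothetical helper name):

```python
import shlex
import subprocess

def needs_install(query_template: str, identifier: str) -> bool:
    """Substitute exactly one identifier into the query template and
    treat a non-zero exit code as 'not present'. The single-identifier
    rule keeps the exit-code-to-package mapping unambiguous."""
    cmd = shlex.split(query_template.format(identifier))
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode != 0
```

For example, needs_install("dpkg -s {}", "libffi8") should report True on a system where libffi8 is not installed, since dpkg -s exits non-zero for unknown or uninstalled packages; with a query command that always exits 0 (like older conda list -f), this scheme silently breaks.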

Wouldn’t the correct mapping to Debian be libffi-dev, though? My understanding is that since the specification doesn’t provide for the library/devel split, you’d always have to depend on the widest variant by using -dev — and then that should handle the versioning indirection, right?

Yes, I noticed this as part of the research for the write-up and sent a PR upstream, which was merged and released as part of conda 25.9.0:

$ conda --version
conda 25.9.0
$ conda list -n base python > /dev/null; echo $?
0
$ conda list -n base pythonnnnn > /dev/null; echo $?
CondaValueError: No packages match 'pythonnnnn'.
1

The definition could be clearer. In particular, the “mappings schema” section is where the most detailed definition is, and there’s no mention of exit-code relevance there.

And if the output is to be ignored, that could be specified.

My understanding is that you’d depend on different things in different contexts. For build-time, it would be -dev and for runtime it could just be the shared library.

There are contexts where runtime could need access to headers (e.g. cffi’s in-line modes), but that’s rare and this scheme doesn’t allow specifying that level of detail.
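
Purely as an illustration of that split, a mapping entry could hypothetically distinguish contexts like this (invented field shape, not the actual PEP 804 schema; the package names follow the Ubuntu example discussed above):

```json
{
  "dep:generic/libffi": {
    "build": ["libffi-dev"],
    "host": ["libffi-dev"],
    "run": ["libffi8"]
  }
}
```

The rare runtime-needs-headers case (like cffi’s in-line modes) would then have to map run to the -dev package as well, at the cost of pulling in headers for everyone.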