PEP 794: Import name metadata

I updated PEP 794 – Import Name Metadata | peps.python.org so that it’s less prescriptive about what Import-Name must contain and instead describes what projects should put in there. That means the spec is much looser as discussed, to the point that the only requirements for what can be in Import-Name are:

  1. Importable names (i.e. syntactically valid)
  2. Each name must refer to something that can actually be imported on some platform when that version of the project is installed (i.e. don’t lie)

Otherwise everything else is guidance.

I also tried to clean up the specification section so that it is more in line with what I would expect to end up in the actual core metadata spec (sans the examples).
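To make the consumer side concrete: assuming the field ends up in the core metadata as Import-Name (as in the current draft), a tool could read it from an installed project with importlib.metadata. A minimal sketch, not normative:

```python
# Read Import-Name entries from an installed project's metadata.
# Assumes the field name from the current PEP 794 draft; nothing here is normative.
from importlib.metadata import metadata

meta = metadata("pytest")  # any installed project
# get_all() returns None when a multiple-use field is absent, e.g. for
# projects built before this metadata existed.
names = meta.get_all("Import-Name") or []
print(names)  # hypothetically: ['_pytest', 'py', 'pytest']
```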

3 Likes

I think the proposal is good overall. I have some comments and one more substantive reservation which I put at the end.

By keeping the information to the import names a project would own if installed, it makes it clear which project maps directly to what import name once the project is installed.

This sentence seems somewhat tautological to me, or else a restatement of the previous paragraph. It seems to be saying “because the information is about what names a project will own if installed, it tells us about the names a project will own if installed”. This may be because the meaning of “own” was not clear to me until later in the PEP (see below). I’m not sure there is a need to say this here. Maybe it can be combined with the previous paragraph? Something like:

This PEP proposes extending the packaging Core metadata specifications so that project owners can specify the highest-level import names that a project provides and owns if installed. This allows indexes or other tools to create a clear mapping between project names and import names.

Later:

The names specified in Import-Name MUST be importable when the project is installed on some platform for the same version of the project (i.e. the metadata MUST be consistent across all sdists and wheels for a project release).

The way I first read this it sounded like a contradiction, because the first part of the sentence seems to use “some” in the sense of “there exists”, but the second part is saying that the information must be consistent across all (i.e., a “for all” rather than “exists”). It took me a few reads to understand that what you mean is that because the metadata must not vary, it may not be able to capture any cross-wheel variations, so only accuracy on “some” platform is required.

I think the parenthetical here is at least as important as the first part. I think it would be clearer phrased more like:

The metadata MUST be consistent across all sdists and wheels for a project release. This means that the metadata in any one artifact may not reflect the names importable when that artifact is installed (since, e.g., some names may not be provided on all platforms). Rather, for each name specified in Import-Name, there MUST exist some platform on which that name is provided when the project is installed.

Later…

If a project is part of a namespace package named ns and it provides a subpackage called ns.myproj (i.e. ns.myproj.__init__ exists), then ns.myproj should be listed in Import-Name, but NOT ns alone as that is not “owned” by the project upon installation (i.e. other projects can be installed which also contribute to ns).

Only here do I understand what is meant by “owned”. It makes sense, but perhaps better to not use that word earlier in the PEP (as I mentioned above) as its meaning is unclear before this explanation. It is only used a couple times earlier on, and I think this is a subtle enough detail that it doesn’t need to be foregrounded at the outset.

In pytest 8.3.5 there would be 3 expected entries:

  1. _pytest
  2. py
  3. pytest

The inclusion of the apparently private _pytest here is surprising. If this is the intention (as discussed in a few earlier posts), I think it should be mentioned somehow in the text.

In the “How to teach this” section, should there be any mention of build backends or similar tools? From some earlier discussion it seems we’re envisioning a future in which build backends automatically fill in “obvious” values. But things can be nonobvious in different ways; for instance, a package author may understand that their main project’s name will be included, but still be surprised that a private name they provide is included as well. So maybe something more general like “package authors should be taught to check their build backend’s documentation to understand how (or whether) it automatically fills in import-names, and to sanity-check the generated metadata”.

My more substantive reservation is that I feel the PEP should somehow address the alternative of “make no specification and simply encourage indexes to provide such a mapping based on the information they already have”. I guess this would go in Rejected Ideas, although I’m still not sure the PEP’s gains are worth it without that index support. For instance, in the rationale section:

Various other attempts have been made to solve this, but they all have to make various trade-offs. For instance, one could download every wheel for every project release and look at what files are provided via the Binary distribution format, but that’s a lot of CPU and bandwidth for something that is static information (although tricks can be used to lessen the data requests such as using HTTP range requests to only read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone independently instead of having the metadata hosted by a central index server like PyPI.

Yes, it’s a lot if every tool or person that wants this has to download them all, but it’s not a lot for PyPI because it already has them all and doesn’t have to download anything. And if PyPI provided that information, it’s unclear to me whether anyone would feel the need for it to be in the metadata. (Or if they did, maybe they’d want something different from this PEP, in order to fill in the gaps in whatever PyPI did.) In other words, the “central index server” provision of a bidirectional project-import name mapping is possible with or without this metadata. Moreover, if PyPI does not use this information to provide such a mapping itself, it will still be a pain (albeit a smaller pain) for everyone to download all the metadata for every package. And although such a mapping might be wrong in various ways, so might the proposed metadata. So it still seems to me like the real missing piece is the actual public provision of a complete mapping, not the individual statements by individual packages about what names they provide.

Later in that same section it does give the example of sdists, which can’t obviously be handled in this manner. As usual I hate sdists and think the solution is to just stop supporting them as an install mechanism :wink: . But, absent that, I still think it would be helpful if the PEP more directly tackled the question of how this metadata in and of itself can reduce pain (i.e., even if PyPI doesn’t do anything with it), and why it is needed if PyPI could provide a similar service without the metadata.
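As an aside, the zip trick mentioned in the quoted rationale is cheap to sketch: a lazy, read-only file object that fetches bytes on demand via HTTP Range requests lets zipfile read just the end-of-central-directory and central directory records of a wheel instead of the whole archive. A rough sketch, assuming a server that honors Range headers (a real tool would want caching, retries, and error handling):

```python
import io
import urllib.request
import zipfile


class HTTPRangeFile(io.RawIOBase):
    """File-like object that reads a remote file via HTTP Range requests."""

    def __init__(self, url):
        self.url = url
        self.pos = 0
        # A one-byte range request is a cheap way to learn the total size.
        req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
        with urllib.request.urlopen(req) as resp:
            self.size = int(resp.headers["Content-Range"].rpartition("/")[2])

    def seekable(self):
        return True

    def readable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.size + offset
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        end = min(self.pos + n, self.size) - 1
        if end < self.pos:
            return b""
        req = urllib.request.Request(
            self.url, headers={"Range": f"bytes={self.pos}-{end}"}
        )
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        self.pos += len(data)
        return data


# Usage: list a wheel's contents without downloading the whole file.
# wheel_url = "https://files.pythonhosted.org/.../some_project-1.0-py3-none-any.whl"
# names = zipfile.ZipFile(HTTPRangeFile(wheel_url)).namelist()
```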

1 Like

I think it would be very beneficial if the PEP included an example of a use case where only the metadata for a single package was needed. I can’t honestly think of one myself, which suggests that maybe the above point is true - the PEP as it stands is only part of a solution, with the actual need being for a mapping from import name to package name, available from an index.

2 Likes

Sure, but we don’t need to specify that. Where’s the specification for how an index should make classifiers browsable/searchable? How invalid is an index that doesn’t have a search feature?

As I said early in the discussion, this is essentially a field containing search keywords. What anyone does with them once they’re available is just as flexible as that, the only thing we need in an interoperability spec is to say where they should be put.

Adding an example of a package index offering a search filtered by import name to the Motivation section ought to cover it.

2 Likes

But if the only use case is to build an import name → project mapping, why not standardise something that provides that mapping directly? All I’m asking for is other use cases that demonstrate that the data needs to be available from an individual wheel, and an installed project (i.e. as part of core metadata).

1 Like

Because it doesn’t need to be standardised so much as it just needs to be done. 90% of uses really just need someone to go through PyPI, use some heuristics to determine top-level module names, patch any outliers, and put a list up on a CDN so that it can be downloaded efficiently by tools that want the mapping.
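To illustrate how naive such heuristics can afford to be (and why outliers need patching), here’s a rough sketch that derives top-level import names from a wheel’s file listing:

```python
def top_level_names(wheel_paths):
    """Guess top-level import names from the paths inside a wheel.

    Deliberately naive: it ignores extension-module suffixes (e.g.
    foo.cpython-313-x86_64-linux-gnu.so), namespace packages, and .pth
    tricks -- exactly the outliers that would need hand patching.
    """
    names = set()
    for path in wheel_paths:
        first, _, rest = path.partition("/")
        if first.endswith((".dist-info", ".data")):
            continue  # wheel metadata, not importable code
        if rest:
            names.add(first)  # a package directory
        elif first.endswith(".py"):
            names.add(first[: -len(".py")])  # a top-level module
    return names


# top_level_names(["pytest/__init__.py", "_pytest/main.py", "py.py",
#                  "pytest-8.3.5.dist-info/METADATA"])
# -> {"pytest", "_pytest", "py"}
```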

For some reason, nobody has done that. But the reason isn’t the lack of a specification for it - it’s because it’s a nice thing to have but ultimately not important enough for anyone to have applied funding/effort. Creating a spec saying “you[1] must create this list” isn’t the right way to go about getting the list.

Right now, the PEP is basically distributing the heuristics and exceptions process to individual packages, which reduces the cost to whoever eventually makes the list. Which is as far as it ought to go, IMHO.

Again, I’ll equate it to classifiers, which serve no purpose in an individual wheel or an installed project. Do we require use cases for classifiers other than as search keywords?


  1. For some currently unspecified value of “you”, which presumably could only be the Warehouse maintainers, along with anyone else who maintains an index server implementation. ↩︎

2 Likes

Ah, I see now. Sorry for being slow. For some reason, I hadn’t thought of “searching for a classifier” as needing a classifier->project mapping in the same way as we need an import name->project mapping to use this PEP.

I’d still rather that the “given an import name, get me a list of packages that provide it” service be provided by PyPI[1] rather than having 3rd parties doing bulk queries to maintain an external service. Has anyone asked PyPI whether they would support such an API (either as a Warehouse-specific API or as a standardised index API specification)? But apart from the very nebulous matter of “will the expected use cases result in enough extra load on PyPI to be a concern?” I accept that this is a separate question.


  1. And other indexes ↩︎

1 Like

I’ll take the paragraph out.

It’s because the first quote is from the rationale, which comes before the spec that the second quote is from. It’s a bit of a chicken-and-egg problem. I’ll try to clarify it a bit in the first instance.

But I also can’t be too specific since I don’t want to restrict this to just wheels and sdists because who knows what is coming in the future.

Why? _pytest is importable when pytest is installed. The PEP says “projects SHOULD list the highest-level/shortest import name that the project would “own” when installed”, and that includes _pytest.

I don’t think so as the build back-ends in nearly all cases will just copy what’s in project.import-names in pyproject.toml.

I’ll add a sentence addressing that.

But it’s still going to be inaccurate as the PEP pointed out when saying why parsing a RECORD file isn’t enough.

I haven’t floated that hypothetical past anyone on the PyPI side of things.

I would have expected some backends to offer to automatically calculate it, if left dynamic in pyproject.toml and when it makes sense (pure Python wheels, explicit package/module inclusion). I think that UX doesn’t need to be included (nor prohibited!) in the PEP, though.

4 Likes

This is more or less the way I’m viewing it too, which is why I think it’s okay but I’m still unsure how useful it is in and of itself.

The important difference is that many of the classifiers are impossible to derive from the package contents[1], while PyPI already has considerable information about the importable names from each package. It’s true that this information may be inaccurate or incomplete in various ways, but so may the information from this new metadata — and also the new metadata will definitely not be available for all the stuff already on PyPI, whereas it still is possible to make a best-effort computation for that stuff based on the existing data.


  1. e.g., you can’t tell that a package is intended for image processing unless the author says it is ↩︎

1 Like

Does it really have that information in a usable form that doesn’t run into the edge-cases described above[1]? I think the point of this metadata is to simplify that process and make the results more reliably reflect the authors’ intent.


  1. and is a reasonable thing for PyPI to compute on every package ↩︎

1 Like

I suspect some will, but …

It is neither included nor prohibited in the PEP. :wink:

But one is way easier to fix than the other.

That’s normal for Python packaging. :sweat_smile:

And if someone wants to do that then this metadata doesn’t prevent it. You could view the two approaches as complementary, or see some automation as supplementing the manually-specified metadata.

2 Likes

Regarding private names: the “list may not be exhaustive” wording could (and I think should) be clarified to say that module names starting with an underscore may be considered private implementation details and hence omitted from the metadata.

Regarding platform dependent entries: could we allow the inclusion of environment markers in the names, with the same syntax that we use for dependency declarations? I’m genuinely not sure the complexity would be worth it, though, since the whole question goes away if the platform dependent APIs are nested inside a platform independent parent module instead of being published as top level modules.
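If markers were allowed, consumers could presumably reuse the existing dependency-marker machinery. Purely illustrative (this syntax is not in the PEP, and the second entry below is made up):

```python
from packaging.markers import Marker

entries = [
    "myproj",
    'myproj._win_accel; sys_platform == "win32"',  # hypothetical platform-only name
]

for entry in entries:
    name, _, marker = (part.strip() for part in entry.partition(";"))
    if not marker or Marker(marker).evaluate():
        print("importable here:", name)
```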

1 Like