PEP 794: Import name metadata

I updated PEP 794 – Import Name Metadata | peps.python.org so that it’s less prescriptive about what Import-Name must contain and instead describes what projects should put in there. That means the spec is much looser as discussed, to the point that the only requirements for what can be in Import-Name are:

  1. Importable names (i.e. syntactically valid)
  2. Each name must refer to something that can actually be imported on some platform when that version of the project is installed (i.e. don’t lie)

Otherwise everything else is guidance.

I also tried to clean up the specification section so that it is more in line with what I would expect to end up in the actual core metadata spec (sans the examples).
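To make the consumer side concrete: assuming the field ends up in the core metadata as Import-Name (as in the current draft), a tool could read it from an installed project with importlib.metadata. A minimal sketch, not normative:

```python
# Read Import-Name entries from an installed project's metadata.
# Assumes the field name from the current PEP 794 draft; nothing here is normative.
from importlib.metadata import metadata

meta = metadata("pytest")  # any installed project
# get_all() returns None when a multiple-use field is absent, e.g. for
# projects built before this metadata existed.
names = meta.get_all("Import-Name") or []
print(names)  # hypothetically: ['_pytest', 'py', 'pytest']
```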

3 Likes

I think the proposal is good overall. I have some comments and one more substantive reservation which I put at the end.

By keeping the information to the import names a project would own if installed, it makes it clear which project maps directly to what import name once the project is installed.

This sentence seems somewhat tautological to me, or else a restatement of the previous paragraph. It seems to be saying “because the information is about what names a project will own if installed, it tells us about the names a project will own if installed”. This may be because the meaning of “own” was not clear to me until later in the PEP (see below). I’m not sure there is a need to say this here. Maybe it can be combined with the previous paragraph? Something like:

This PEP proposes extending the packaging Core metadata specifications so that project owners can specify the highest-level import names that a project provides and owns if installed. This allows indexes or other tools to create a clear mapping between project names and import names.

Later:

The names specified in Import-Name MUST be importable when the project is installed on some platform for the same version of the project (i.e. the metadata MUST be consistent across all sdists and wheels for a project release).

The way I first read this it sounded like a contradiction, because the first part of the sentence seems to use “some” in the sense of “there exists”, but the second part is saying that the information must be consistent across all (i.e., a “for all” rather than “exists”). It took me a few reads to understand that what you mean is that because the metadata must not vary, it may not be able to capture any cross-wheel variations, so only accuracy on “some” platform is required.

I think the parenthetical here is at least as important as the first part. I think it would be clearer phrased more like:

The metadata MUST be consistent across all sdists and wheels for a project release. This means that the metadata in any one artifact may not reflect the names importable when that artifact is installed (since, e.g., some names may not be provided on all platforms). Rather, for each name specified in Import-Name, there MUST exist some platform on which that name is provided when the project is installed.

Later…

If a project is part of a namespace package named ns and it provides a subpackage called ns.myproj (i.e. ns.myproj.__init__ exists), then ns.myproj should be listed in Import-Name, but NOT ns alone as that is not “owned” by the project upon installation (i.e. other projects can be installed which also contribute to ns).

Only here do I understand what is meant by “owned”. It makes sense, but perhaps better to not use that word earlier in the PEP (as I mentioned above) as its meaning is unclear before this explanation. It is only used a couple times earlier on, and I think this is a subtle enough detail that it doesn’t need to be foregrounded at the outset.

In pytest 8.3.5 there would be 3 expected entries:

  1. _pytest
  2. py
  3. pytest

The inclusion of the apparently private _pytest here is surprising. If this is the intention (as discussed in a few earlier posts), I think it should be mentioned somehow in the text.

In the “How to teach this” section, should there be any mention of build backends or similar tools? From some earlier discussion it seems we’re envisioning a future in which build backends automatically fill in “obvious” values. But things can be nonobvious in different ways; for instance, a package author may understand that their main project’s name will be included, but still be surprised that a private name they provide is included as well. So maybe something more general like “package authors should be taught to check their build backend’s documentation to understand how (or whether) it automatically fills in import-names, and to sanity-check the generated metadata”.

My more substantive reservation is that I feel the PEP should somehow address the alternative of “make no specification and simply encourage indexes to provide such a mapping based on the information they already have”. I guess this would go in Rejected Ideas, although I’m still not sure the PEP’s gains are worth it without that index support. For instance, in the rationale section:

Various other attempts have been made to solve this, but they all have to make various trade-offs. For instance, one could download every wheel for every project release and look at what files are provided via the Binary distribution format, but that’s a lot of CPU and bandwidth for something that is static information (although tricks can be used to lessen the data requests such as using HTTP range requests to only read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone independently instead of having the metadata hosted by a central index server like PyPI.

Yes, it’s a lot if every tool or person that wants this has to download them all, but it’s not a lot for PyPI because it already has them all and doesn’t have to download anything. And if PyPI provided that information, it’s unclear to me whether anyone would feel the need for it to be in the metadata. (Or if they did, maybe they’d want something different from this PEP, in order to fill in the gaps in whatever PyPI did.) In other words, the “central index server” provision of a bidirectional project-import name mapping is possible with or without this metadata. Moreover, if PyPI does not use this information to provide such a mapping itself, it will still be a pain (albeit a smaller pain) for everyone to download all the metadata for every package. And although such a mapping might be wrong in various ways, so might the proposed metadata. So it still seems to me like the real missing piece is the actual public provision of a complete mapping, not the individual statements by individual packages about what names they provide.

Later in that same section it does give the example of sdists, which can’t obviously be handled in this manner. As usual I hate sdists and think the solution is to just stop supporting them as an install mechanism :wink: . But, absent that, I still think it would be helpful if the PEP more directly tackled the question of how this metadata in and of itself can reduce pain (i.e., even if PyPI doesn’t do anything with it), and why it is needed if PyPI could provide a similar service without the metadata.
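As an aside, the zip trick mentioned in the quoted rationale is cheap to sketch: a lazy, read-only file object that fetches bytes on demand via HTTP Range requests lets zipfile read just the end-of-central-directory and central directory records of a wheel instead of the whole archive. A rough sketch, assuming a server that honors Range headers (a real tool would want caching, retries, and error handling):

```python
import io
import urllib.request
import zipfile


class HTTPRangeFile(io.RawIOBase):
    """File-like object that reads a remote file via HTTP Range requests."""

    def __init__(self, url):
        self.url = url
        self.pos = 0
        # A one-byte range request is a cheap way to learn the total size.
        req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
        with urllib.request.urlopen(req) as resp:
            self.size = int(resp.headers["Content-Range"].rpartition("/")[2])

    def seekable(self):
        return True

    def readable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.size + offset
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        end = min(self.pos + n, self.size) - 1
        if end < self.pos:
            return b""
        req = urllib.request.Request(
            self.url, headers={"Range": f"bytes={self.pos}-{end}"}
        )
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        self.pos += len(data)
        return data


# Usage: list a wheel's contents without downloading the whole file.
# wheel_url = "https://files.pythonhosted.org/.../some_project-1.0-py3-none-any.whl"
# names = zipfile.ZipFile(HTTPRangeFile(wheel_url)).namelist()
```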

1 Like

I think it would be very beneficial if the PEP included an example of a use case where only the metadata for a single package was needed. I can’t honestly think of one myself, which suggests that maybe the above point is true - the PEP as it stands is only part of a solution, with the actual need being for a mapping from import name to package name, available from an index.

2 Likes

Sure, but we don’t need to specify that. Where’s the specification for how an index should make classifiers browsable/searchable? How invalid is an index that doesn’t have a search feature?

As I said early in the discussion, this is essentially a field containing search keywords. What anyone does with them once they’re available is just as flexible as that, the only thing we need in an interoperability spec is to say where they should be put.

Adding an example of a package index offering a search filtered by import name to the Motivation section ought to cover it.

2 Likes

But if the only use case is to build an import name → project mapping, why not standardise something that provides that mapping directly? All I’m asking for is other use cases that demonstrate that the data needs to be available from an individual wheel, and an installed project (i.e. as part of core metadata).

1 Like

Because it doesn’t need to be standardised so much as it just needs to be done. 90% of uses really just need someone to go through PyPI, use some heuristics to determine top-level module names, patch any outliers, and put a list up on a CDN so that it can be downloaded efficiently by tools that want the mapping.
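To illustrate how naive such heuristics can afford to be (and why outliers need patching), here’s a rough sketch that derives top-level import names from a wheel’s file listing:

```python
def top_level_names(wheel_paths):
    """Guess top-level import names from the paths inside a wheel.

    Deliberately naive: it ignores extension-module suffixes (e.g.
    foo.cpython-313-x86_64-linux-gnu.so), namespace packages, and .pth
    tricks -- exactly the outliers that would need hand patching.
    """
    names = set()
    for path in wheel_paths:
        first, _, rest = path.partition("/")
        if first.endswith((".dist-info", ".data")):
            continue  # wheel metadata, not importable code
        if rest:
            names.add(first)  # a package directory
        elif first.endswith(".py"):
            names.add(first[: -len(".py")])  # a top-level module
    return names


# top_level_names(["pytest/__init__.py", "_pytest/main.py", "py.py",
#                  "pytest-8.3.5.dist-info/METADATA"])
# -> {"pytest", "_pytest", "py"}
```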

For some reason, nobody has done that. But the reason isn’t the lack of a specification for it - it’s because it’s a nice thing to have but ultimately not important enough for anyone to have applied funding/effort. Creating a spec saying “you[1] must create this list” isn’t the right way to go about getting the list.

Right now, the PEP is basically distributing the heuristics and exceptions process to individual packages, which reduces the cost to whoever eventually makes the list. Which is as far as it ought to go, IMHO.

Again, I’ll equate it to classifiers, which serve no purpose in an individual wheel or an installed project. Do we require use cases for classifiers other than as search keywords?


  1. For some currently unspecified value of “you”, which presumably could only be the Warehouse maintainers, along with anyone else who maintains an index server implementation. ↩︎

2 Likes

Ah, I see now. Sorry for being slow. For some reason, I hadn’t thought of “searching for a classifier” as needing a classifier->project mapping in the same way as we need an import name->project mapping to use this PEP.

I’d still rather that the “given an import name, get me a list of packages that provide it” service be provided by PyPI[1] rather than having 3rd parties doing bulk queries to maintain an external service. Has anyone asked PyPI whether they would support such an API (either as a Warehouse-specific API or as a standardised index API specification)? But apart from the very nebulous matter of “will the expected use cases result in enough extra load on PyPI to be a concern?” I accept that this is a separate question.


  1. And other indexes ↩︎

1 Like

I’ll take the paragraph out.

It’s because the first quote is from the rationale, which comes before the spec that the second quote is from. It’s a bit of a chicken-and-egg problem. I’ll try to clarify it a bit in the first instance.

But I also can’t be too specific since I don’t want to restrict this to just wheels and sdists because who knows what is coming in the future.

Why? _pytest is importable when pytest is installed. The PEP says “projects SHOULD list the highest-level/shortest import name that the project would “own” when installed”, and that includes _pytest.

I don’t think so as the build back-ends in nearly all cases will just copy what’s in project.import-names in pyproject.toml.

I’ll add a sentence addressing that.

But it’s still going to be inaccurate as the PEP pointed out when saying why parsing a RECORD file isn’t enough.

I haven’t floated that hypothetical past anyone on the PyPI side of things.

I would have expected some backends to offer to automatically calculate it, if left dynamic in pyproject.toml and when it makes sense (pure Python wheels, explicit package/module inclusion). I think that UX doesn’t need to be included (nor prohibited!) in the PEP, though.

4 Likes

This is more or less the way I’m viewing it too, which is why I think it’s okay but I’m still unsure how useful it is in and of itself.

The important difference is that many of the classifiers are impossible to derive from the package contents[1], while PyPI already has considerable information about the importable names from each package. It’s true that this information may be inaccurate or incomplete in various ways, but so may the information from this new metadata — and also the new metadata will definitely not be available for all the stuff already on PyPI, whereas it still is possible to make a best-effort computation for that stuff based on the existing data.


  1. e.g., you can’t tell that a package is intended for image processing unless the author says it is ↩︎

1 Like

Does it really have that information in a usable form that doesn’t run into the edge-cases described above[1]? I think the point of this metadata is to simplify that process and make the results more reliably reflect the authors’ intent.


  1. and is a reasonable thing for PyPI to compute on every package ↩︎

1 Like

I suspect some will, but …

It is neither included nor prohibited in the PEP. :wink:

But one is way easier to fix than the other.

That’s normal for Python packaging. :sweat_smile:

And if someone wants to do that then this metadata doesn’t prevent it. You could view the two approaches as complementary, or see some automation as supplementing the manually-specified metadata.

2 Likes

Regarding private names: the “list may not be exhaustive” wording could (and I think should) be clarified to say that module names starting with an underscore may be considered private implementation details and hence omitted from the metadata.

Regarding platform dependent entries: could we allow the inclusion of environment markers in the names, with the same syntax that we use for dependency declarations? I’m genuinely not sure the complexity would be worth it, though, since the whole question goes away if the platform dependent APIs are nested inside a platform independent parent module instead of being published as top level modules.
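If markers were allowed, consumers could presumably reuse the existing dependency-marker machinery. Purely illustrative (this syntax is not in the PEP, and the second entry below is made up):

```python
from packaging.markers import Marker

entries = [
    "myproj",
    'myproj._win_accel; sys_platform == "win32"',  # hypothetical platform-only name
]

for entry in entries:
    name, _, marker = (part.strip() for part in entry.partition(";"))
    if not marker or Marker(marker).evaluate():
        print("importable here:", name)
```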

1 Like