I think the proposal is good overall. I have some comments and one more substantive reservation which I put at the end.
By keeping the information to the import names a project would own if installed, it makes it clear which project maps directly to what import name once the project is installed.
This sentence seems somewhat tautological to me, or else a restatement of the previous paragraph. It seems to be saying “because the information is about what names a project will own if installed, it tells us about the names a project will own if installed”. This may be because the meaning of “own” was not clear to me until later in the PEP (see below). I’m not sure there is a need to say this here. Maybe it can be combined with the previous paragraph? Something like:
This PEP proposes extending the packaging Core metadata specifications so that project owners can specify the highest-level import names that a project provides and owns if installed. This allows indexes or other tools to create a clear mapping between project names and import names.
Later:
The names specified in Import-Name MUST be importable when the project is installed on some platform for the same version of the project (i.e. the metadata MUST be consistent across all sdists and wheels for a project release).
The way I first read this it sounded like a contradiction, because the first part of the sentence seems to use “some” in the sense of “there exists”, but the second part is saying that the information must be consistent across all (i.e., a “for all” rather than “exists”). It took me a few reads to understand that what you mean is that because the metadata must not vary, it may not be able to capture any cross-wheel variations, so only accuracy on “some” platform is required.
I think the parenthetical here is at least as important as the first part. I think it would be clearer as something like:
The metadata MUST be consistent across all sdists and wheels for a project release. This means that the metadata in any one artifact may not reflect the names importable when that artifact is installed (since, e.g., some names may not be provided on all platforms). Rather, for each name specified in Import-Name, there MUST exist some platform on which that name is provided when the project is installed.
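To make the “for all” versus “there exists” distinction concrete, here is a rough sketch of the two conditions as I understand them. The function name and both data structures are made up for illustration; nothing here comes from the PEP itself.

```python
def check_import_names(declared_by_artifact, importable_by_platform):
    """Return a list of problems; an empty list means both conditions hold.

    declared_by_artifact: dict of artifact filename -> declared import names
    importable_by_platform: dict of platform label -> set of names that are
        actually importable there
    Both structures are hypothetical, chosen only for this illustration.
    """
    problems = []
    declared_sets = {frozenset(names) for names in declared_by_artifact.values()}

    # "For all": every sdist and wheel must carry the same metadata.
    if len(declared_sets) > 1:
        problems.append("metadata differs between artifacts")

    # "There exists": each declared name must be importable on at least
    # one platform, not necessarily on every platform.
    all_declared = set().union(*declared_sets)
    for name in sorted(all_declared):
        if not any(name in names for names in importable_by_platform.values()):
            problems.append(f"{name!r} is not importable on any platform")
    return problems


declared = {
    "proj-1.0.tar.gz": ["proj", "_proj_win"],
    "proj-1.0-py3-none-win_amd64.whl": ["proj", "_proj_win"],
}
importable = {"linux": {"proj"}, "windows": {"proj", "_proj_win"}}
print(check_import_names(declared, importable))  # prints []
```

Note that `_proj_win` passes even though it is missing on Linux, which is exactly the case the reworded sentence is meant to allow.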
Later:
If a project is part of a namespace package named ns and it provides a subpackage called ns.myproj (i.e. ns.myproj.__init__ exists), then ns.myproj should be listed in Import-Name, but NOT ns alone as that is not “owned” by the project upon installation (i.e. other projects can be installed which also contribute to ns).
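For what it’s worth, here is how I would sketch that ownership rule in code. This is only my reading of the rule, not anything specified by the PEP: the function name is hypothetical, it looks only at `.py` files (so extension modules are ignored), and it treats any directory without an `__init__.py` as a namespace portion.

```python
from pathlib import PurePosixPath


def owned_import_names(paths):
    """Guess the highest-level import names that a list of files "owns".

    Hypothetical helper: walks down from the top directory and stops at the
    first regular package (a directory containing __init__.py).
    """
    files = [PurePosixPath(p) for p in paths]
    # Directories that contain __init__.py are regular packages.
    regular = {f.parent for f in files if f.name == "__init__.py"}
    names = set()
    for f in files:
        if f.suffix != ".py":
            continue
        parts = f.parts
        if len(parts) == 1:
            # Top-level module, e.g. "py.py" -> "py".
            names.add(f.stem)
            continue
        for depth in range(1, len(parts)):
            if PurePosixPath(*parts[:depth]) in regular:
                # First regular package on the way down is the owned name.
                names.add(".".join(parts[:depth]))
                break
        else:
            # Module sitting directly inside namespace directories only.
            names.add(".".join(parts[:-1] + (f.stem,)))
    return sorted(names)


print(owned_import_names(["ns/myproj/__init__.py", "ns/myproj/core.py"]))
# prints ['ns.myproj']
```

Feeding it an abbreviated, illustrative file list such as `["_pytest/__init__.py", "py.py", "pytest/__init__.py"]` yields `['_pytest', 'py', 'pytest']`, so a purely mechanical approach like this would also report private-looking names.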
Only here do I understand what is meant by “owned”. It makes sense, but perhaps better to not use that word earlier in the PEP (as I mentioned above) as its meaning is unclear before this explanation. It is only used a couple times earlier on, and I think this is a subtle enough detail that it doesn’t need to be foregrounded at the outset.
In pytest 8.3.5 there would be 3 expected entries:
_pytest
py
pytest
The inclusion of the apparently private _pytest here is surprising. If this is the intention (as discussed in a few earlier posts), I think it should be mentioned somehow in the text.
In the “How to teach this” section, should there be any mention of build backends or similar tools? From some earlier discussion it seems we’re envisioning a future in which build backends automatically fill in “obvious” values. But things can be nonobvious in different ways; for instance, a package author may understand that their main project’s name will be included, but still be surprised that a private name they also provide is included too. So maybe something more general like “package authors should be taught that they should check their build backend’s documentation to understand how (or whether) it automatically fills in import-names, and should sanity-check the generated metadata”.
My more substantive reservation is that I feel the PEP should somehow address the alternative of “make no specification and simply encourage indexes to provide such a mapping based on the information they already have”. I guess this would go in rejected ideas, although I’m still not sure this PEP’s gains are worth it without that index support. For instance, in the rationale section:
Various other attempts have been made to solve this, but they all have to make various trade-offs. For instance, one could download every wheel for every project release and look at what files are provided via the Binary distribution format, but that’s a lot of CPU and bandwidth for something that is static information (although tricks can be used to lessen the data requests such as using HTTP range requests to only read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone independently instead of having the metadata hosted by a central index server like PyPI.
Yes, it’s a lot if every tool or person that wants this has to download them all, but it’s not a lot for PyPI because it already has them all and doesn’t have to download anything. And if PyPI provided that information, it’s unclear to me whether anyone would feel the need for it to be in the metadata. (Or if they did, maybe they’d want something different from this PEP, in order to fill in the gaps in whatever PyPI did.) In other words, the “central index server” provision of a bidirectional project-import name mapping is possible with or without this metadata. Moreover, if PyPI does not use this information to provide such a mapping itself, it will still be a pain (albeit a smaller pain) for everyone to download all the metadata for every package. And although such a mapping might be wrong in various ways, so might the proposed metadata. So it still seems to me like the real missing piece is the actual public provision of a complete mapping, not the individual statements by individual packages about what names they provide.
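To illustrate how little is needed on the index side, here is a naive sketch using only the standard library. It is not how PyPI works or would have to work: the helper names are hypothetical, the “wheel” is a fake one built in memory, and the name detection only looks at the first path component of each entry (so it does not handle namespace packages).

```python
import io
import zipfile
from collections import defaultdict


def top_level_names(wheel_bytes):
    """Naively list top-level import names from a wheel's file listing."""
    names = set()
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # namelist() comes from the zip's table of contents; no file
        # contents are decompressed here.
        for entry in zf.namelist():
            first = entry.split("/", 1)[0]
            if first.endswith((".dist-info", ".data")):
                continue
            # "pkg/..." -> "pkg"; "mod.py" -> "mod"
            names.add(first[:-3] if first.endswith(".py") else first)
    return names


def reverse_mapping(project_to_names):
    """Invert project -> import names into import name -> projects."""
    result = defaultdict(set)
    for project, names in project_to_names.items():
        for name in names:
            result[name].add(project)
    return dict(result)


def make_fake_wheel(files):
    """Build an in-memory zip standing in for a wheel (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in files:
            zf.writestr(path, "")
    return buf.getvalue()


wheel = make_fake_wheel(
    ["demo/__init__.py", "helper.py", "demo-1.0.dist-info/METADATA"]
)
forward = {"demo-project": top_level_names(wheel), "other-project": {"helper"}}
backward = reverse_mapping(forward)
```

Here `forward` is the project-to-import-name direction and `backward` the reverse; an index that already holds every wheel could compute both without any new metadata field, with the same caveats about accuracy that apply to author-declared names.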
Later in that same section it does give the example of sdists, which can’t obviously be handled in this manner. As usual I hate sdists and think the solution is to just stop supporting them as an install mechanism. But, absent that, I still think it would be helpful if the PEP more directly tackled the question of how this metadata in and of itself can reduce pain (i.e., even if PyPI doesn’t do anything with it), and why it is needed if PyPI could provide a similar service without the metadata.