As promised (or threatened), here is my next planned PEP. I haven’t bothered making it an official PEP yet in case the idea is considered downright bad, which would save me writing the PoC and translating this to reST.
Abstract
This PEP proposes extending the core metadata specification for Python packaging to include a new, repeatable field named Import-Name to record the import names that a project owns once installed. This also leads to the introduction of core metadata version 2.5.
Motivation
In Python packaging there is no requirement that a project name match the name that you can import from that project. As such, there is no clean, easy, accurate way to go from import name to project name and vice versa. This makes things difficult for tools that try to help people discover the right project to install when they know the import name, or learn what import names a project will provide once installed.
As an example, a code editor may detect a user has an unsatisfied import in a selected virtual environment. But with no way to reliably gather the import names that various projects provide, a code editor cannot accurately provide a user with a list of potential projects to install to satisfy that import requirement (e.g. it is not obvious that import PIL very likely implies the user wants the Pillow project). This also applies when a user vaguely remembers the project name but does not remember the import name, and would have their memory jogged by seeing a list of import names a package provides. Finally, tools would be able to notify users what import names will become available once they install a project.
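To make the editor scenario concrete, here is a minimal sketch of the lookup a tool could do if an index exposed Import-Name values. The projects_providing function and the INDEX mapping are hypothetical and only for illustration; no such API exists today, and the entries are the examples used in this post.

```python
def projects_providing(module, index):
    """Return projects whose Import-Name entries cover the imported module.

    `index` maps a project name to its list of Import-Name values
    (a hypothetical data source, e.g. something an index server could expose).
    """
    matches = []
    for project, import_names in index.items():
        for name in import_names:
            # An entry covers the module itself and anything nested below it.
            if module == name or module.startswith(name + "."):
                matches.append(project)
                break
    return sorted(matches)


# Illustrative data only, based on the examples in this post.
INDEX = {
    "Pillow": ["PIL"],
    "httpx": ["httpx"],
    "azure-mgmt-search": ["azure.mgmt.search"],
}

print(projects_providing("PIL.Image", INDEX))
print(projects_providing("azure.mgmt.search.models", INDEX))
```

Note that a bare import of azure matches nothing in this sketch, since no project owns an implicit namespace package.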
Various other attempts have been made to solve this, but they all have to make various trade-offs. For instance, one could download every wheel for every project release and look at what files are provided, but that’s a lot of CPU and bandwidth for something that is static information (although tricks can be used to lessen the data requests, such as using HTTP range requests to only read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone independently instead of having the metadata hosted by a central index server like PyPI.
Rationale
This PEP proposes extending the packaging core metadata so that build back-ends can specify the highest-level import names that a project provides and owns if installed. By having the back-ends provide the information it increases the chances it will be specified and makes adoption easier. It also allows for quick pick-up by people’s toolchains.
By limiting the information to import names a project would own (i.e. modules, regular packages, and submodules, but not implicit namespace packages), it is clear which import names map directly to the project and are provided exclusively by it once installed.
By keeping it to the highest-level name that’s owned, it keeps the data small and allows for inferring implicit namespace packages that a project contributes to. It also helps the build back-end be accurate with the data for import names when import semantics are the default ones (i.e. the import-related attributes in the sys module have not been manipulated). This should minimize the need for users to provide this information themselves in order for it to be accurate, since regular packages and modules that manipulate import details typically do so for import names below them. Admittedly, this does mean that if someone accidentally releases a single implicit namespace package that only contains submodules, then all of the submodules would be individually listed.
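As a rough sketch of the inference mentioned above: because each entry is the highest-level name the project owns, every parent of a dotted entry has to be an implicit namespace package. The implied_namespaces helper is a hypothetical name used only for this illustration.

```python
def implied_namespaces(import_names):
    """Return the implicit namespace packages implied by Import-Name entries."""
    namespaces = set()
    for name in import_names:
        parts = name.split(".")
        # Every proper prefix of a dotted entry is a namespace the project
        # contributes to but does not own.
        for i in range(1, len(parts)):
            namespaces.add(".".join(parts[:i]))
    return sorted(namespaces)


print(implied_namespaces(["azure.mgmt.search"]))  # ['azure', 'azure.mgmt']
print(implied_namespaces(["httpx"]))              # []
```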
Because this PEP introduces a new field to the core metadata, it bumps the latest core metadata version to 2.5.
Specification
The Import-Name field is a “multiple uses” field. Each entry of Import-Name represents an importable name that the project provides. The names provided MUST be importable via some artifact the project provides for that version, i.e. the metadata MUST be consistent across all sdists and wheels for a project release to avoid having to read every file to find variances. It also avoids having to declare this field as dynamic in an sdist due to the import names varying across wheels.
The names provided MUST be one of the following, as provided by the project:
- Highest-level, regular packages
- Top-level modules
- The submodules and regular packages within implicit namespace packages
This means the vast majority of projects will need only a single Import-Name entry, representing the top-level, regular package the project provides. But it also allows projects that use implicit namespace packages to differentiate among themselves (e.g., it avoids having every project that contributes to the azure namespace via an implicit namespace package list azure as its Import-Name entry, and instead gives a more accurate entry like azure.mgmt.search).
The names provided in Import-Name MUST NOT be filtered based on what is considered private to the project, i.e. it must be exhaustive for names that an import statement would succeed in using. This is because even “private” names can be imported by anyone and can “take up space” in the namespace of the environment.
Build back-ends SHOULD set Import-Name on behalf of users when they can infer the import names a project would provide.
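Since there is no reference implementation yet, here is a minimal sketch of how a back-end might infer the entries from the list of files it is about to install. It assumes default import semantics and only looks at .py files (no extension modules, .pth files, or .data directories); infer_import_names is a hypothetical name, not an existing API.

```python
from pathlib import PurePosixPath


def infer_import_names(files):
    """Infer Import-Name entries from relative, '/'-separated file paths."""
    paths = [PurePosixPath(f) for f in files if f.endswith(".py")]
    # A directory containing __init__.py is a regular package.
    regular_packages = {p.parent for p in paths if p.name == "__init__.py"}
    names = set()
    for path in paths:
        dirs = path.parts[:-1]
        # Walk down from the top; the first regular package found is the
        # highest-level name the project owns for this file.
        for depth in range(1, len(dirs) + 1):
            if PurePosixPath(*dirs[:depth]) in regular_packages:
                names.add(".".join(dirs[:depth]))
                break
        else:
            # No regular package above the file: it is a top-level module or
            # a submodule inside implicit namespace packages.
            if path.stem != "__init__":
                names.add(".".join((*dirs, path.stem)))
    return sorted(names)


print(infer_import_names([
    "_pytest/__init__.py",
    "_pytest/config/__init__.py",
    "py.py",
    "pytest/__init__.py",
    "pytest/__main__.py",
]))
print(infer_import_names([
    "azure/mgmt/search/__init__.py",
    "azure/mgmt/search/models.py",
]))
```

With these (abbreviated, illustrative) file lists, the first call gives the three pytest entries and the second gives only azure.mgmt.search. A namespace directory holding nothing but submodules, such as ns/a.py and ns/b.py, would yield ns.a and ns.b individually.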
Examples
In httpx 0.28.1 there would be only a single entry for the httpx package as it’s a regular package and there are no other regular packages or modules at the top of the project.
In pytest 8.3.5 there would be 3 entries:
- _pytest (a top-level, regular package)
- py (a top-level module)
- pytest (a top-level, regular package)
In azure-mgmt-search 9.1.0, there would be a single entry for azure.mgmt.search as azure and azure.mgmt are implicit namespace packages.
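For illustration, this is roughly what the pytest example could look like in a METADATA file and how a tool might read it back. The metadata text is a hand-written sketch of the proposal (no released pytest ships this field); the parsing uses the standard library email module, which handles the header-style core metadata format.

```python
from email.parser import HeaderParser

# Hypothetical metadata, written by hand to match the pytest example.
METADATA = """\
Metadata-Version: 2.5
Name: pytest
Version: 8.3.5
Import-Name: _pytest
Import-Name: py
Import-Name: pytest
"""

message = HeaderParser().parsestr(METADATA)
# A "multiple uses" field shows up as repeated headers; get_all() collects them.
import_names = message.get_all("Import-Name", [])
print(import_names)  # ['_pytest', 'py', 'pytest']
```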
Backwards Compatibility
As this is a new field for the core metadata and a new core metadata version, there should be no backwards compatibility concerns.
Security Implications
The information provided by build back-ends may not be accurate (either accidentally or on purpose), and so tools should NOT make security-related decisions based on the information provided in an Import-Name entry.
How to Teach This
Project authors should be taught that build back-ends can now record what namespaces their project provides. They should be told that if their project’s namespace is not obvious from its file structure, they should specify the appropriate information manually. They should have it explained to them that they should use the shortest name possible that appropriately explains what the project provides (i.e. what the specification requires to be recorded).
Users of projects don’t necessarily need to know about this new metadata. While they may be exposed to it via tooling, the details of where that data came from aren’t critical. It’s possible they may come across it if PyPI exposed it (e.g., listed the values from Import-Name and marked whether the file structure backed up the claims the metadata makes), but that still wouldn’t require users to know the technical details of this PEP. Users may need to learn that if their package leads to all the submodules being listed, they may have wanted a regular package instead.
Reference Implementation
XXX
Rejected Ideas
Re-purpose the Provides field
Introduced in metadata version 1.1 and deprecated in 1.2, the Provides field was meant to provide similar information, except for all names provided by a project instead of the distinguishing namespaces as this PEP proposes. Based on that difference and the fact that Provides is deprecated and thus could be ignored by preexisting code, the decision was made to go with a new field.
Name the field Namespace
While the term “namespace” is technically accurate from an import perspective, it could be confused with implicit namespace packages.
Open Issues
N/A
Acknowledgments
Thanks to Josh Cannon (no relation) for reviewing drafts of this PEP and providing feedback. Also thanks to everyone who participated in a previous discussion on this topic.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.