One of the longer-term goals in the packaging ecosystem has been a move towards statically defined package metadata, with the ultimate intention that tools can read package metadata without any need to execute Python code. With the adoption of PEP 621, for defining metadata in pyproject.toml, and PEP 643, for allowing projects to record static metadata in sdists, we are now in a position where most projects can reasonably expect to define metadata statically.
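To make this concrete, PEP 621 metadata in pyproject.toml looks something like the following (the project name and details here are purely illustrative):

```toml
[project]
# Hypothetical project -- every field below is defined statically,
# so a build backend can record them all as static (PEP 643) in the
# sdist's PKG-INFO, rather than marking any of them as Dynamic.
name = "example-package"
version = "1.0.0"
description = "A project whose metadata is fully static"
requires-python = ">=3.8"
dependencies = [
    "requests>=2.28",
]
```

If a field were instead listed under `dynamic`, the backend would compute it at build time, and it could in principle differ from one built artifact to the next.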
The next big step will be to ensure that all artifacts for a given version of a project have consistent metadata. Once we can do this, it will be possible to simplify processes around resolving sets of requirements quite significantly. In fact, even though it is not currently guaranteed, some tools like Poetry and PDM that produce lock files already assume consistent metadata, to make the problem tractable, and have encountered very few issues as a result.
Why bother?
Being able to assume that all files associated with a particular version of a package have the same metadata will simplify a lot of processes - resolution algorithms, generation of lockfiles, package analysis, etc. Many of these work right now, but they either have to do significant extra work to deal with the possibility of inconsistent metadata, or they simply fail if they find their assumption of consistency is invalid. The result is extra maintainer work, and unnecessarily fragile tools.
What’s the problem?
The reason we can’t simply declare that metadata must be consistent for a given project version is that in order to make a useful standard, we need to address the various edge cases that might come up. So the point of this post is to give people a chance to publicly describe possible situations where a rule that “all artifacts for a given version of a project must have the same metadata” would cause issues.
At the moment, there’s no plan or timescale for implementing a rule like this. The point of this post is simply to collect information to inform such a plan - it’s notoriously hard in the packaging ecosystem to find out how people are pushing the limits of “common practice”, except by implementing something and seeing what breaks. If we can get some discussion of this topic, my hope is that we can spot the issues in advance.
I’ll start with some cases that have come up recently.
Enforcing consistency
The first problem is probably the most fundamental. How do we even enforce such a consistency rule? With sdist metadata, we already have the means to state that every wheel built from a given sdist will have the same metadata (by marking every field as static). To make it universal would mean deprecating, and ultimately removing, the ability to have fields in a sdist marked as dynamic. Is this sufficient? Do we need a further rule that all sdists for a given project version must have the same metadata? Is it even meaningful to talk about multiple sdists for a project version?
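To make the "marking every field as static" point concrete, here is a standard-library sketch of how a tool might check whether a sdist's PKG-INFO declares any fields as dynamic under PEP 643. The function name, and the assumption that PKG-INFO sits at the top level of the archive, are mine:

```python
import tarfile
from email.parser import HeaderParser

def dynamic_fields(sdist_path):
    """Return the metadata fields a sdist declares as Dynamic (PEP 643).

    An empty result means every field is static, so any wheel built
    from this sdist must carry identical metadata.
    """
    with tarfile.open(sdist_path) as tf:
        # PKG-INFO lives at the root of the sdist, e.g. pkg-1.0/PKG-INFO
        name = next(m.name for m in tf.getmembers()
                    if m.name.count("/") == 1 and m.name.endswith("/PKG-INFO"))
        with tf.extractfile(name) as f:
            headers = HeaderParser().parsestr(f.read().decode("utf-8"))
    # Metadata 2.2+ records dynamic fields in repeated "Dynamic" headers
    return headers.get_all("Dynamic") or []
```

Deprecating dynamic fields would amount to requiring that this function return an empty list for every conforming sdist.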
Tools like pip already assume that any two artifacts (wheels or sdists) with the same filename are functionally identical, for practical reasons. Maybe we should just make that assumption official?
Visibility of files
Consistency only matters in the context of a tool using the artifacts. So if I edit the metadata of a wheel, but never publish it, and never use it, my actions have no impact. What this means is that in practice, a standard for consistent metadata only applies to sets of files presented to a tool for consideration.
How do we state such a constraint without making it the user’s responsibility to check every file? It doesn’t seem unreasonable for a user to expect a package index like PyPI to only serve conforming packages, so do we need to make it a requirement for indexes to enforce consistency? But what about private directories (accessed via options like pip’s --find-links)? How do we make it reasonable for maintainers of such directories to ensure the rules are followed?
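One way to make this practical for maintainers of a --find-links directory would be a small verification script. The sketch below is illustrative rather than standard: the names are mine, and a real checker would compare more fields than the two chosen here. It groups wheels by name and version and flags any group whose metadata disagrees:

```python
import zipfile
from email.parser import HeaderParser
from pathlib import Path

# Fields whose values we compare; a real checker would cover more.
CHECKED = ("Requires-Dist", "Requires-Python")

def wheel_metadata(path):
    """Parse the METADATA file from inside a wheel."""
    with zipfile.ZipFile(path) as zf:
        name = next(n for n in zf.namelist()
                    if n.endswith(".dist-info/METADATA"))
        return HeaderParser().parsestr(zf.read(name).decode("utf-8"))

def inconsistent_versions(directory):
    """Group wheels by (name, version); report groups whose checked
    metadata fields disagree between files."""
    groups = {}
    for wheel in Path(directory).glob("*.whl"):
        meta = wheel_metadata(wheel)
        key = (meta["Name"], meta["Version"])
        fields = tuple(tuple(sorted(meta.get_all(f) or [])) for f in CHECKED)
        groups.setdefault(key, set()).add(fields)
    return [key for key, seen in groups.items() if len(seen) > 1]
```

Even with a helper like this, the open question remains whether running it should be the directory maintainer's obligation, or whether tools should simply be allowed to assume a clean result.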
Installed packages
When a package is installed, the metadata from the wheel is stored in the environment’s site-packages, as per the installed package metadata spec. This metadata needs to be consistent with other sources of metadata for that version of the project, for exactly the same reasons that sdist and wheel metadata need to be consistent.
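For reference, that installed metadata is already readable with the standard library's importlib.metadata, which is what a consistency check would build on. A minimal sketch (the helper name is mine):

```python
from importlib import metadata

def installed_requirements(dist_name):
    """Read the version and Requires-Dist entries recorded for an
    installed distribution in its site-packages *.dist-info directory."""
    dist = metadata.distribution(dist_name)
    # dist.metadata is the parsed METADATA file copied from the wheel
    return (dist.version, dist.metadata.get_all("Requires-Dist") or [])
```

Comparing this against the metadata in the corresponding wheel or sdist is straightforward; the hard part is deciding what a tool should do when they differ.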
Patching sdists
Linux distributors routinely patch sdists and build their system packages from the resulting wheels. This patching will, by design, violate the consistency rules we’re discussing. How do we handle this?
Specifically, an installed package must also have consistent metadata, and if that installed package is a distro-packaged version of a Python project, the distro’s patches could violate the consistent metadata requirement.
Is a statement of intent sufficient?
Given all of the above, and any other cases that may come up in the subsequent discussion, is it even worth trying to come up with an enforceable standard? As I already mentioned, tools like Poetry and PDM currently manage fine by assuming consistent metadata, and pip has made similar assumptions for years about the interchangeability of files with the same name.
Maybe rather than trying to write a strict standard, all we really need is a formal statement that such assumptions are allowed and expected, and situations where they do not apply will no longer be required to be supported by the Python packaging ecosystem.
I’d be interested in hearing people’s views on this subject, particularly people who are working in areas where metadata consistency is already an issue, such as backend developers, maintainers of tools that produce lockfiles, Linux distro packagers, and maintainers of projects who cannot reasonably publish 100% consistent metadata.