On reflection, I think that the issue here around unpacked sdists is somewhat orthogonal to the intent behind Metadata 2.2. As far as the spec is concerned, the key is that if a tool is given a sdist and builds a wheel from it, the resulting wheel will contain metadata that matches what’s in the PKG-INFO of the sdist. The spec has no interest in what happens if you unpack a sdist, modify it, and then build. It’s an accident of implementation that the sdist is unpacked by one tool (the frontend) and built by another (the backend), and it’s only that separation that even allows patching.
So the behaviour you describe for flit is perfectly fine. In practice, you don’t need to check PKG-INFO, because flit’s behaviour simply reflects the fact that you cannot get different metadata without changing the source code. The only fields that flit allows to be dynamic in pyproject.toml are version and description, and those are picked up from the source code, so everything can be static in PKG-INFO without needing a check (if the check failed, it would be a bug in flit).
For other backends, this may not be true, as they may allow more dynamic generation of metadata. But there’s a statement in PEP 643 which is relevant here:
Backends MUST NOT mark a field as Dynamic if they can determine that it was generated from data that will not change at build time.
Data that is read statically from pyproject.toml (i.e., fields not in the dynamic list in that file) must therefore be static in the sdist metadata, and must match the pyproject.toml value. So any discrepancy between a non-dynamic value in pyproject.toml and a value in PKG-INFO must indicate that we are not building from a sdist.
Where that leaves me is that, if we ignore patching for a moment, backends can reliably use either pyproject.toml or PKG-INFO to get values for metadata that is marked as “static” in the PKG-INFO file, as both approaches are guaranteed, by design, to give the same result.
When we consider patching, we are not “building from a sdist” in the PEP 643 sense, so the rules from PEP 621 apply instead. They say:
Data specified using this PEP is considered canonical. Tools CANNOT remove, add or change data that has been statically specified. Only when a field is marked as dynamic may a tool provide a “new” value.
So pyproject.toml takes precedence.
For static data, then (dynamic data is covered below), backends can always safely use pyproject.toml as the canonical source. They can only use PKG-INFO safely if they know the sdist hasn’t been patched, and the most reliable way of checking for patching is to ensure that the values in pyproject.toml and PKG-INFO match.
(As an implementation note, I’ll comment that reading static data from pyproject.toml is likely to be no slower, and possibly marginally faster, than reading it from PKG-INFO, simply because TOML is an easier format to parse than RFC 822 and there’s no calculation to do for static data.)
Dynamic data is different, though. If a field is marked as dynamic in both pyproject.toml and PKG-INFO, it has to be recomputed at build time; it’s hard to see how anything else is even possible. The complicated case is when the field is marked as dynamic in pyproject.toml but static in PKG-INFO. In that case, PKG-INFO holds a “frozen” value, the result of the dynamic calculation done at sdist build time, and Metadata 2.2 explicitly says that wheels must be built using that frozen value.
What that means is:
- If a field is marked as “dynamic” in pyproject.toml but “static” in PKG-INFO…
- And the build backend cannot guarantee that, assuming none of the source code has changed, recalculating will give the same value

Then, and only then, is the backend required to give PKG-INFO precedence.
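The decision above can be written down as a small function. This is only a sketch of the rule as stated here; the Source enum and metadata_source name are hypothetical, and when recalculation is reproducible the sketch recomputes, although using the frozen PKG-INFO value would be equally acceptable in that case.

```python
from enum import Enum


class Source(Enum):
    PYPROJECT = "use the static value from pyproject.toml"
    RECOMPUTE = "recalculate the value at build time"
    PKG_INFO = "use the frozen value from PKG-INFO"


def metadata_source(dynamic_in_pyproject, static_in_pkg_info, recompute_is_reproducible):
    """Decide where a wheel build takes one metadata field from (hypothetical)."""
    if not dynamic_in_pyproject:
        return Source.PYPROJECT  # PEP 621: statically specified data is canonical
    if not static_in_pkg_info:
        return Source.RECOMPUTE  # dynamic in both places (or no PKG-INFO at all)
    if recompute_is_reproducible:
        return Source.RECOMPUTE  # gives the same value as the frozen one
    return Source.PKG_INFO  # the only case where PKG-INFO must take precedence
```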
The only case I can see when this might happen is if a backend is calculating the version from VCS metadata like tags, as that data isn’t part of the source code. That’s a fairly well-known case, and tools already seem to have solutions in place (such as environment variables) to replace the VCS data if it isn’t available.
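For a VCS-derived version, one possible ordering is sketched below. EXAMPLE_PRETEND_VERSION is a hypothetical variable in the spirit of setuptools-scm’s SETUPTOOLS_SCM_PRETEND_VERSION, resolve_version is an illustrative name, and this ordering is an assumption rather than a description of what any existing tool does: a frozen Version in PKG-INFO wins, and the override only stands in for missing VCS data.

```python
import os
from email.parser import Parser

# Hypothetical override variable standing in for unavailable VCS data.
OVERRIDE_VAR = "EXAMPLE_PRETEND_VERSION"


def resolve_version(pkg_info_text=None, vcs_lookup=None, environ=None):
    """Pick a version for a backend that normally derives it from VCS tags."""
    environ = os.environ if environ is None else environ
    if pkg_info_text is not None:
        frozen = Parser().parsestr(pkg_info_text).get("Version")
        if frozen is not None:
            return frozen  # building from a sdist: honour the frozen value
    if OVERRIDE_VAR in environ:
        return environ[OVERRIDE_VAR]  # replaces VCS data that isn't available
    if vcs_lookup is not None:
        return vcs_lookup()  # e.g. computed from the latest tag
    raise LookupError("no version source available")
```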
The above is my analysis based on re-reading PEPs 643 and 621, and based on my intent as PEP 643 author. For now, I’m offering it simply as a personal interpretation, but I’m confident enough in it that I would be comfortable converting it into a formal pronouncement on the intended behaviour if people wanted me to (assuming, of course, that no-one was able to demonstrate a flaw in my reasoning to me).
I’m sorry, @ofek, but this means that while hatchling’s new behaviour isn’t in violation of Metadata 2.2, it is in violation of PEP 621 (something that Eli Schwartz mentioned on the issue that triggered this discussion, but which had never been brought up here, and whose implications I hadn’t considered until now).
In terms of patching, this means that patching static metadata in pyproject.toml is allowed, and the patched data should be respected. But when patching dynamic data, patchers must take care to ensure that there isn’t a static value in PKG-INFO that would override the patch.
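A patcher could check for that situation with something like the following sketch; frozen_in_pkg_info is a hypothetical helper, and it relies only on the standard library email parser (header lookups there are case-insensitive). If it returns True, the patch to the code that computes the field would be silently overridden unless PKG-INFO is updated too.

```python
from email.parser import Parser


def frozen_in_pkg_info(pkg_info_text, field):
    """Return True if PKG-INFO holds a static (frozen) value for `field`."""
    msg = Parser().parsestr(pkg_info_text)
    dynamic = {name.lower() for name in msg.get_all("Dynamic", [])}
    # Present as a header and not listed under Dynamic means frozen.
    return field in msg and field.lower() not in dynamic
```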
If anyone has any issues with the above analysis, please flag them - I’m not trying to stop the debate by posting this, just to collect my thoughts on where the discussion so far has led us.