Non-sequential acceptance and implementation of new Core Metadata features and changes

Back when I was originally rewriting PEP 639, I’d been thinking about the issue discussed there, namely adoption of the Dynamic field being a blocker for potential implementations of PEP 639. I think it points to a broader problem (maybe better opened as another thread?), especially given the current state of the packaging ecosystem: the lag between specification and implementation, the variance between different tools, and the framing of PEPs as proposals for specific discrete changes.

Specifically, the metadata version is currently incremented with each PEP, the specifications and the frontend and backend implementations are numerous and decoupled, and PEPs vary in complexity and discreteness. As a result, the order in which core metadata features are implemented in any given backend, frontend or other tool may not align with the order in which the PEPs happen to be accepted and PRs to core metadata are made, but there is no way to indicate support for a “later” feature without an earlier one. Furthermore, the metadata version each PEP proposes may shift depending on the order in which PEPs are accepted, leading to potential confusion and inaccurate assumptions.

For example, some changes are incremental tightenings and clarifications of previous specifications, such as PEP 685, which tools would likely want to adopt quickly and many may already implement in practice. Other PEPs add a new field (PEP 643) that intersects with but does not directly change the other fields. Still other PEPs both add new field(s) and deprecate existing ones (PEP 639); this case is particularly problematic when tools implement a draft version of a field whose semantics may change in the final specification (as Wheel and Setuptools now have with License-File), so the metadata version is the only way to reliably indicate whether the field has the standardized semantics specified by that version.

On one hand, besides being the status quo, the current version scheme ensures that metadata producers and consumers continue to evolve sequentially, that following versions can depend upon previous ones, and that the implementation of new standards is not unnecessarily delayed or even declined. On the other, it requires tools to implement features in the order they happen to be approved and added to the spec, even if a later version is a small but important tightening/loosening of syntax/semantics, while an earlier one is a fairly substantial set of additions and deprecations.

I’m not sure if this is enough of a problem in practice that it’s worth considering things like adding or switching to a mechanism (bitfield, feature tags, etc.) that would allow signalling support for individual features/changes, possibly with periodic rollup version updates (sort of like IEEE 802.11), but I think it might at least be worth discussing the issue and potentially how to address/mitigate it. If so, I can split this off into a new topic for that.
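For illustration only, a feature-tag mechanism along those lines might be sketched like this (the tag names and the function here are entirely invented; nothing like this exists in the current spec):

```python
# Hypothetical sketch of feature-tag signalling, NOT part of any spec.
# A consumer would advertise the individual features it understands,
# rather than a single monotonic metadata version.
SUPPORTED_FEATURES = {"extras-normalization", "dynamic-field"}


def consumer_can_handle(required_features):
    """Return True if this (hypothetical) consumer supports every
    feature tag that a piece of metadata declares it relies on."""
    return set(required_features) <= SUPPORTED_FEATURES


print(consumer_can_handle(["dynamic-field"]))       # True
print(consumer_can_handle(["license-expression"]))  # False
```

The point of the sketch is just that features could be adopted out of order, at the cost of a more complex compatibility check than a single version comparison.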

This seems like a separate issue. While it could certainly be useful for the case of “Dynamic”, it doesn’t seem like something we’d be likely to do right now just to address the question in this thread, especially given that PEP 643 is agreed, and potentially implemented by backends (I don’t actually know if any have as yet, but they could well have done so).

Can I suggest that you split this discussion off into a separate thread, both so that it doesn’t get missed, and also so that I can keep this thread focused on “does anyone object if I clarify that using metadata 2.2 or greater means you need to specify dynamic?”


On reflection, I don’t personally think this is a good idea. Ultimately, we do not want to have multiple metadata standards. The goal is always that tools produce and consume the latest standard, with all of the features implemented and present.

Any problems caused by tools wanting to adopt features from later standards before earlier ones are purely transitional, and in practice indicate that we’re changing standards too fast, and not allowing tools time to keep up. So rather than adding complexity to the standards to allow tools to “opt in” to features bit by bit, we should probably be helping projects implement the standards as they are.

While I understand that this has the potential to cause problems for people proposing new metadata fields, it should mostly be minor, as we rarely, if ever, add mandatory fields. The case of “Dynamic” is unusual, because version 2.2 assigned a meaning to not having any fields marked as “Dynamic”, which is not 100% backward compatible. I made a mistake there, as I should have noted the issue in the “Backward Compatibility” section of the PEP. For future metadata PEPs, I would suggest they learn from this and make sure that the backward compatibility section addresses the question “If a project updates the metadata version, but makes no other change, is the meaning of the metadata the same? And if not, then is that an issue?”

The backwards compatibility section of PEP 639 does, naturally, address these points. In that case, the only specified instance where the meaning changes is that tools are recommended to issue a warning on use of the License field and license classifiers (use of them is only disallowed in metadata that contains the new License-Expression field).

There is one practical instance, though, which perhaps spawns a related question: Wheel and Setuptools have already added a License-File field based on an earlier draft of PEP 639, but the current version tweaks the semantics slightly to avoid several problems brought up on the thread and the original Setuptools and Wheel issues (namely, not flattening the original license relative paths, to avoid name clashes between licenses, and storing them in a subdirectory of .dist-info, to avoid clutter and name clashes with other current and future files there, and to make it unambiguous which files are licenses).

This means that for metadata produced by tools that implemented an earlier draft of this PEP, adopting the metadata version in which the standard semantics land does in effect change the meaning of the field. The PEP accounts for this by explicitly requiring both that the field be present and that the metadata version be the one specified in the PEP or later for anything related to it in the specification to apply; otherwise the behavior is unspecified.
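A consumer applying that rule might be sketched roughly as follows (a minimal illustration, assuming header-style METADATA parsing via the stdlib; the final version number PEP 639 gets depends on acceptance order, so “2.4” here is an assumption):

```python
from email.parser import HeaderParser

# Assumed final metadata version for PEP 639; the real number depends
# on the order in which PEPs are accepted.
PEP_639_VERSION = (2, 4)


def license_files_are_standardized(metadata_text):
    """Apply PEP 639 License-File semantics only when BOTH conditions
    hold: the field is present AND Metadata-Version is at or above the
    version that standardized it. Otherwise the behavior is unspecified
    (e.g. the field may carry earlier draft semantics)."""
    msg = HeaderParser().parsestr(metadata_text)
    version = tuple(int(p) for p in msg["Metadata-Version"].split("."))
    has_field = msg.get_all("License-File") is not None
    return has_field and version >= PEP_639_VERSION


meta = "Metadata-Version: 2.4\nName: example\nLicense-File: LICENSES/MIT.txt\n"
print(license_files_are_standardized(meta))  # True
```

The same field under “Metadata-Version: 2.2” would return False, i.e. a consumer could not assume the standardized semantics for it.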

Is this a good approach? Also, has this happened before? More broadly, what is our stance on tools adding additional fields not (yet) defined in the specification? I don’t see anywhere in Core Metadata that specifically addresses this.

One other related point: the Metadata-Version section of the core metadata spec states:

For broader compatibility, build tools MAY choose to produce distribution metadata using the lowest metadata version that includes all of the needed fields.

This means that if (say) PEP 639 had been accepted before PEP 685 (which we can ensure doesn’t happen, but it could have), with the former becoming “v2.3” and the latter “v2.4”, then if the user didn’t specify any License-Expression or License-File fields, tools would be allowed (if not implicitly encouraged) to use a lower metadata version that doesn’t include the standardized extras format and normalization in “v2.4”. Is that something we really want to explicitly endorse? Not specifying it doesn’t stop tools from doing it, but at least it doesn’t implicitly encourage them.
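The quoted MAY clause amounts to something like the following (a simplified sketch; the field-to-version mapping below is illustrative and incomplete, and the PEP 639 entries assume a “2.4” version number):

```python
# Simplified sketch of the "lowest metadata version that includes all
# of the needed fields" rule from the Metadata-Version section.
# Mapping is illustrative only, not a complete field inventory.
FIELD_INTRODUCED_IN = {
    "Description-Content-Type": (2, 1),
    "Dynamic": (2, 2),                 # PEP 643
    "License-Expression": (2, 4),      # assumed PEP 639 version
    "License-File": (2, 4),            # assumed PEP 639 version
}


def lowest_metadata_version(fields_used):
    """Pick the lowest metadata version covering every field emitted;
    fields not in the mapping are treated as pre-2.1 for simplicity."""
    return max(
        (FIELD_INTRODUCED_IN.get(f, (2, 1)) for f in fields_used),
        default=(2, 1),
    )


print(lowest_metadata_version(["Name", "Dynamic"]))  # (2, 2)
```

Under this logic, a project using no PEP 639 fields would legitimately be emitted as “v2.2”, sidestepping any behavioral changes that later versions attach to the version number itself.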

IMO, if tools extend the spec in ways that aren’t (yet) standardised, that’s their problem. We shouldn’t feel any more constrained by that happening than we do about any other invalid metadata. Which is to say that we should consider the disruption the existence of that metadata “in the wild” might cause, but we can legitimately choose to simply not handle it.

I’m not too bothered, personally. I would prefer to drop the quoted statement, and let the whole thing come under the “be strict in what you produce and lenient in what you consume” principle, so that metadata producers should always produce the latest version that they can, and consumers should be capable of dealing with older versions. I don’t think it matters much in reality, though.

Right, which is why PEP 639 takes a precaution to ensure it won’t cause an issue, but doesn’t actually mention or directly address it as something that otherwise has direct bearing on the PEP.

My question was more as to whether we should actually clarify the stance on adding non-standard metadata fields in the specification, since currently it addresses removing/not including fields (no for mandatory, yes for optional) and specifies the syntax and semantics of the defined fields, but doesn’t provide any clarity on whether fields not specified are allowed (MAY), advised against (SHOULD NOT) or prohibited (MUST NOT). Whatever the guidance, IMO it would be better to make it explicit than implicit, so that tools have a level playing field and can take into account the guidance from the PyPA’s end.

That was my thinking as well. Is it worth me submitting a PR to drop the quoted section? Or would it not be worth it, in your opinion?

Based on my experience analyzing existing metadata, I don’t think it’s worth it. Tools that want to deal with what’s on PyPI currently need to be prepared to handle a lot of junk regardless. Tools that want to be strict can be, and will end up rejecting a bunch of existing stuff.

IMO, it’s implied by the existence of the spec that you can’t just add random extra stuff. It’s not like we’re writing legal documents here - PyPA specs simply aren’t written to be unambiguous in the face of deliberately contrary interpretation[1].

Personally, I think it’s wasted effort. But so far, this discussion has just been you and me. See what other people think.

Edit: Your comments on the PEP 685 thread convince me that this would be worthwhile. I doubt many tool implementers actually read this bit of the PEP, which is why I originally thought it wasn’t worth worrying about, but if it’s causing confusion in standards discussion, that seems like a good reason to remove it.

  1. Maybe you could argue that they should be, but to achieve that would involve a lot of work on existing specs, and I personally think it’s effort that could be better spent elsewhere. ↩︎


To be clear, PEP 643 is not a blocker for PEP 639. If a hypothetical PEP were to significantly hamper adoption of future additions to the spec, I would recommend backporting simpler changes to the immediately preceding (minor) version, so that e.g. PEP 639 would be introduced in both 2.2.1 and 2.3 simultaneously, rather than uprooting the versioning mechanism that we’ve got in place and replacing it with something new entirely. I just don’t see the scenario you are describing as likely to happen, or as happening often enough to warrant a paradigm shift.

FYI, following the discussion on the PEP 685 thread and @pf_moore’s edits here, I’ve opened pypa/ to update this. Should we open (yet) another dedicated thread to ensure this has appropriate visibility, or should we just wait for further discussion here?

I would like to see more consensus from interested parties (in particular, maintainers of some of the tools that create metadata, who are the people affected) that this is OK with them. I don’t care how we collect that consensus - I’ll leave that to you. I’m not looking for a vote, or everyone to be asked to respond, but I do want the change to be based on more than just me having said it seems reasonable…

Opened as