PEP 643: Metadata for Package Source Distributions

Here’s the official version of PEP 643: Metadata for Package Source Distributions. Discussion and feedback are welcome :slightly_smiling_face:

PEP: 643
Title: Metadata for Package Source Distributions
Author: Paul Moore <p.f.moore@gmail.com>
BDFL-Delegate: Paul Ganssle <paul@ganssle.io>
Discussions-To: https://discuss.python.org/t/pep-643-metadata-for-package-source-distributions/5577
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Oct-2020
Post-History: 24-Oct-2020


Abstract
========

Python package metadata is stored in the distribution file in a standard
format, defined in the `Core Metadata Specification`_. However, for
source distributions, while the format of the data is defined, there has
traditionally been a lot of inconsistency in what data is recorded in
the sdist. See `here
<https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620>`_
for a discussion of this issue.

As a result, metadata consumers are unable to rely on the data available
from source distributions, and need to use the (costly) :pep:`517` build
mechanisms to extract metadata.

This PEP defines a standard that allows build backends to reliably store
package metadata in the source distribution, while still retaining the
necessary flexibility to handle metadata fields that have to be calculated
at build time. It further defines a set of metadata values that must be
fixed when the sdist is created, ensuring that consumers have a minimum
"core" of metadata they can be sure is available.


Motivation
==========

There are a number of issues with the way that metadata is currently
stored in source distributions:

* The details of how to store metadata, while standardised, are not
  easy to find.
* The specification requires an old metadata version, and has not been
  updated in line with changes to the core metadata spec.
* There is no way in the spec to distinguish between "this field has been
  omitted because its value will not be known until build time" and "this
  field does not have a value".
* The core metadata specification allows most fields to be optional,
  meaning that the previous issue affects nearly every metadata field.

This PEP proposes an update to the metadata specification to allow
recording of fields which are expected to be "filled in later", and
updates the sdist specification to clarify that backends should record
sdist metadata using that version of the spec (or later). It restricts
which fields can be "filled in later", so that a core set of metadata is
available, and reliable, when read from a sdist.


Rationale
=========

:pep:`621` proposes a mechanism for users to specify metadata in
``pyproject.toml``. As part of that mechanism, a way was needed to say
that a particular field is defined dynamically by the backend. During
discussions on the PEP, it became clear that the same type of mechanism
would address the issue of distinguishing between "not known yet" and
"definitely has no value" in sdists. This PEP defines the ``Dynamic``
metadata field by analogy with the ``dynamic`` field in :pep:`621`.


Specification
=============

This PEP defines the relationship between metadata values specified in
a sdist, and the corresponding values in wheels built from that sdist.
It requires build backends to clearly mark any fields which will *not*
simply be copied unchanged from the sdist to the wheel.

A new field, ``Dynamic``, will be added to the `Core Metadata Specification`_.
This field will be multiple use, and will be allowed to contain the name
of another core metadata field. The ``Dynamic`` metadata item is only
allowed in source distribution metadata.

If a field is marked as ``Dynamic``, there is no restriction placed on
its value in a wheel built from the sdist. A field which is marked as
``Dynamic`` MUST NOT have an explicit value in the sdist.
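
As a rough illustration of how a consumer might read the proposed field, the sketch below parses a made-up ``PKG-INFO`` file with the standard library's ``email`` package. The project name, field values, and the helper name ``dynamic_fields`` are assumptions for the example, not part of the PEP.

```python
from email import message_from_string

# Hypothetical PKG-INFO content, for illustration only.
PKG_INFO = """\
Metadata-Version: 2.2
Name: example
Version: 1.0
Summary: An example project
Dynamic: Requires-Dist
Dynamic: Provides-Extra
"""


def dynamic_fields(pkg_info_text):
    """Return the field names listed under Dynamic, lower-cased."""
    msg = message_from_string(pkg_info_text)
    # Dynamic is a multiple-use field, so collect every occurrence.
    return {name.strip().lower() for name in msg.get_all("Dynamic", [])}


fields = dynamic_fields(PKG_INFO)
```

Field names are lower-cased here only so that later comparisons are case-insensitive; that is a choice made for this sketch.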

If a field is *not* marked as ``Dynamic``, then the value of the field
in any wheel built from the sdist MUST match the value in the sdist.
If the field is not in the sdist, and not marked as ``Dynamic``, then it
MUST NOT be present in the wheel.

Build backends MUST ensure that these rules are followed, and MUST
report an error if they are unable to do so.
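
One way a backend might enforce the rules above is sketched below. It is a minimal check under stated assumptions: the function name ``check_wheel_metadata`` is hypothetical, only header-style fields are compared (a description stored in the message body is ignored), and multiple-use fields are compared without regard to order.

```python
from email import message_from_string


def check_wheel_metadata(sdist_text, wheel_text):
    """Return a list of error strings; an empty list means no rule was broken."""
    sdist = message_from_string(sdist_text)
    wheel = message_from_string(wheel_text)
    dynamic = {n.strip().lower() for n in sdist.get_all("Dynamic", [])}
    errors = []

    # A field marked Dynamic must not also carry a value in the sdist.
    for name in sorted(dynamic):
        if sdist.get_all(name):
            errors.append(f"{name}: marked Dynamic but has a value in the sdist")

    # Every other field must be identical in the sdist and the wheel,
    # which also covers "absent from the sdist means absent from the wheel".
    names = {k.lower() for k in sdist.keys()} | {k.lower() for k in wheel.keys()}
    names -= dynamic | {"dynamic"}  # Dynamic itself is sdist-only
    for name in sorted(names):
        if sorted(sdist.get_all(name, [])) != sorted(wheel.get_all(name, [])):
            errors.append(f"{name}: wheel value differs from sdist value")
    return errors
```

A backend following the PEP would turn a non-empty result into a build error rather than a warning.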

The following fields MAY NOT be marked as ``Dynamic``:

* ``Name``
* ``Version``
* ``Summary``
* ``Description``
* ``Requires-Python``
* ``License``
* ``Author``
* ``Author-email``
* ``Maintainer``
* ``Maintainer-email``
* ``Keywords``
* ``Classifier``
* ``Project-URL``
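
A tool validating sdist metadata against this draft could reject prohibited ``Dynamic`` entries along the following lines. The set simply mirrors the list above as it stands in this version of the PEP; the helper name ``invalid_dynamic_entries`` is an assumption.

```python
from email import message_from_string

# Fields this draft says may not be marked Dynamic (lower-cased for comparison).
NOT_DYNAMIC = {
    "name", "version", "summary", "description", "requires-python",
    "license", "author", "author-email", "maintainer", "maintainer-email",
    "keywords", "classifier", "project-url",
}


def invalid_dynamic_entries(pkg_info_text):
    """Return the Dynamic entries that name a field which must stay static."""
    msg = message_from_string(pkg_info_text)
    listed = [name.strip() for name in msg.get_all("Dynamic", [])]
    return [name for name in listed if name.lower() in NOT_DYNAMIC]
```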

As it adds a new metadata field, this PEP updates the core metadata
format to version 2.2.

Source distributions MUST use the latest version of the core metadata
specification (which will be version 2.2 or later).

The ``Requires-Python`` field for a project may vary by target platform,
but is not allowed to be declared as ``Dynamic`` in the sdist metadata.
To handle this situation, build backends MUST use environment markers on
the ``Requires-Python`` field to allow that metadata to remain common
across the sdist and all wheel archives, rather than generating platform
dependent ``Requires-Python`` metadata as part of the wheel build
process.  Build backends SHOULD also use this approach for other
metadata fields that may vary by target platform (e.g. dependency
declarations).

Backwards Compatibility
=======================

As this proposal increments the core metadata version, it is compatible
with existing sdists, which will use an older metadata version. Tools
can determine whether a sdist conforms to this PEP by checking the
metadata version.
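
The version check described above might look like the following sketch. The function name ``supports_dynamic`` is hypothetical, and treating a missing or malformed ``Metadata-Version`` as "does not conform" is an assumption of the example rather than something the PEP states.

```python
from email import message_from_string


def supports_dynamic(pkg_info_text):
    """Return True if the metadata declares version 2.2 or later."""
    msg = message_from_string(pkg_info_text)
    raw = msg.get("Metadata-Version", "")
    try:
        version = tuple(int(part) for part in raw.strip().split("."))
    except ValueError:
        # Missing or non-numeric version: treat as an older sdist.
        return False
    return version >= (2, 2)
```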


Security Implications
=====================

As this specification is purely for the storage of data that is intended
to be publicly available, there are no security implications.


How to Teach This
=================

This is a data storage format for project metadata, and so will not
typically be visible to end users. There is therefore no need to teach
users how to use the format. Developers wanting to reference the
metadata will be able to find the details in the `PyPA Specifications`_.


Rejected Ideas
==============

1. Rather than marking fields as ``Dynamic``, fields should be assumed
   to be dynamic unless explicitly marked as ``Static``.

   This is logically equivalent to the current proposal, but it implies
   that fields being dynamic is the norm. Packaging tools can be much
   more efficient in the presence of metadata that is known to be static,
   so the PEP chooses to make dynamic fields the exception, and require
   backends to "opt in" to making a field dynamic.

2. Rather than having a ``Dynamic`` field, add a special value that
   indicates that a field is "not yet defined".

   Again, this is logically equivalent to the current proposal. It makes
   "being dynamic" an explicit choice, but requires a special value.  As
   some fields can contain arbitrary text, choosing such a value is
   somewhat awkward (although likely not a problem in practice). There
   does not seem to be enough benefit to this approach to make it worth
   using instead of the proposed mechanism.

Open Issues
===========

1. Should we allow ``Dynamic`` to be used in wheels and/or installed
   distributions?

   ``Dynamic`` has no obvious meaning in either of these situations, and
   the PEP therefore disallows it. However, backends may find it useful
   to simply copy the field across, and it may have some usefulness in
   recording "other wheels built from the source this came from may have
   different values". However, the value seems marginal, and the added
   complexity involved in explaining the feature does not seem worth it.

   Allowing this could be done in a follow-up proposal if there turned
   out to be sufficient benefit.

2. If a field is marked as ``Dynamic``, but has a value in the sdist
   metadata, how should that be interpreted?

   The simplest answer is to just not allow dynamic fields to have a
   value in the sdist at all. For now, this is what the PEP proposes.
   But is there benefit in having a value which tools can take as a
   "hint" for what the value in the wheel will be?

3. Should this PEP change the canonical source for the sdist
   specification to the `PyPA Specifications`_ document?

   It would be beneficial to collect all of the details of the sdist
   format in one place. However, distribution formats are not currently
   collected there, and making the move would extend the impact of this
   PEP significantly.


References
==========

.. _Core Metadata Specification: https://packaging.python.org/specifications/core-metadata/
.. _PyPA Specifications: https://packaging.python.org/specifications/

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

Would it be better to flip the list of fields that cannot be dynamic into a list of fields that explicitly are allowed to be? The current method is technically fine, since a new field addition means a new metadata version, and that new version should specify whether the new field can be dynamic. But using an allow-list would make related specifications easier to read, and future metadata specification writers wouldn’t all be burdened with remembering to specify dynamic-ness so that implementers don’t get things wrong.


I’m fine with doing this, but it seems like a minor improvement at best. Can you give an example of the sort of related specification you mean? I’m having a hard time thinking of anything.

The net effect of this change would be that by default, all new metadata fields would be required to be fixed at sdist build time. I quite like that, to be honest.

This is the reason I proposed it. By related specifications I mean things like future PEPs proposing new Core Metadata versions, and the Core Metadata page in the PyPA Specifications. Those documents don’t always write out all the rules, but usually build on an existing Core Metadata PEP, and I feel this would make it easier for readers to keep track of which fields are guaranteed to be static (all are unless explicitly said otherwise).


LGTM! In terms of the open issues:

1. I say “no” for now, and if it needs to be loosened then that’s something to do later; usual “it’s easier to give than take away” logic for standards.

2. Error. See above about “taking away”. :wink:

3. The comment says “distribution formats are not currently collected there, and making the move would extend the impact of this PEP significantly”, but I don’t understand that logic. Why are PEPs less impactful than packaging.python.org? I thought PEPs were for guiding the discussion, but they were all expected to end up on packaging.python.org as a spec (or at worst a link from the specs page to the PEP if there was no deviation yet from the PEP)?

Mostly what I meant by that was that there’s no section for “distribution formats”, and the wheel spec isn’t there. By “impactful” I meant “I’d have to do a bigger rewrite of that page in my PR” - so I was basically just being lazy :slightly_smiling_face:

You’re right though, this should be hosted there. I’ll do that.


Responses have so far been mostly positive, and the discussion has been fairly quiet. I’m going to take that as implying that people are generally OK with this, so I’ll do a tidy-up pass on the PEP this weekend, with the intention of maybe giving it another week and then submitting it for a decision.

If anyone feels this is too quick, or there’s more to discuss, please speak up.

The biggest open question is this - @xafer pointed out that Requires-Python doesn’t support markers, so not allowing it to be Dynamic limits how projects can use it. I don’t have a particularly good answer here - I’d rather not drop it from the “not allowed to be dynamic” list, but I don’t want to make it impossible for projects to write their metadata.

I think I’m going to have to err on the side of caution and remove it, but I’d be very happy if someone wanted to persuade me to leave it :slightly_smiling_face:

I’d prefer to err on the side of caution and keep Requires-Python in the list :slightly_smiling_face: The “it’s easier to give than take away” logic applies here as well; we can always have a Version 2.5 to remove it, if real-world examples prove it is beneficial.


Ditto for me. As I used for justification in PEP 621 for sdist stuff, anything that is expected to be exposed on PyPI should probably be static in an sdist.


https://packaging.python.org/specifications/distribution-formats/ :slight_smile:

We could make that link more obvious on https://packaging.python.org/specifications/ though (I knew it existed because I either wrote it myself, or reviewed the PR adding it, and even I took a second to spot it in the list of other metadata subsections)

Edit: folks being unaware of that page means it is also rather out of date when it comes to the way it describes the state of PEP 517/518 support in tools :frowning:

After Xavier mentioned the problem with Requires-Python not supporting environment markers in the other thread, I remembered that there’s another constraint on that field: it’s exposed in the package server repository API (as defined in https://www.python.org/dev/peps/pep-0503/ ).

A repository MAY include a data-requires-python attribute on a file link. This exposes the Requires-Python metadata field, specified in PEP 345, for the corresponding release. Where this is present, installer tools SHOULD ignore the download when installing to a Python version that doesn’t satisfy the requirement.
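
For readers unfamiliar with that attribute, here is a small sketch of reading it from a simple-index page using only the standard library. The page content and the class name ``RequiresPythonParser`` are made up for the example; a real client would fetch the page from an index.

```python
from html.parser import HTMLParser


class RequiresPythonParser(HTMLParser):
    """Collect (href, data-requires-python) pairs from file links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The attribute is optional, so a missing value is recorded as None.
        self.links.append(
            (attr_map.get("href"), attr_map.get("data-requires-python"))
        )


# Hypothetical index page; the attribute value is HTML-escaped.
PAGE = """
<a href="example-1.0.tar.gz" data-requires-python="&gt;=3.6">example-1.0.tar.gz</a>
<a href="example-0.9.tar.gz">example-0.9.tar.gz</a>
"""

parser = RequiresPythonParser()
parser.feed(PAGE)
links = parser.links
```

Because the attribute sits on each file link, the value is per file rather than per release.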

So while in theory Requires-Python should support varying by platform, in practice I don’t think declaring that is going to work right at this point in time anyway, so we don’t lose anything by having PEP 643 prohibit it.

For wheel compatibility declarations, combining a cross-platform Requires-Python from the sdist with the more specific compatibility tags in the wheel filename should provide sufficient expressivity in the near term, rather than having Requires-Python itself change in the wheel metadata.

If/when Requires-Python gains environment marker support, then the associated PEP will need to update the simple repository API specification as well (e.g. adding a new May-Require-Python metadata field that allows environment markers, and exposing a data-may-require-python attribute in the simple API for the new field, so older clients that won’t understand the environment markers won’t even look at them)

I also like @uranusjr’s suggestion of flipping the “The following fields MAY NOT be marked as Dynamic:” list to instead be “The following fields MAY be marked as Dynamic:”

In addition to being a better default for new metadata field additions, I think that will also make the spec marginally easier to follow for tool implementers emitting the new metadata, as the definition of “Dynamic” would contain the full list of allowed entries, rather than the allowed list having to be inferred from the full set of keys and the disallowed list.

For informational purposes, the PEP could still include the list of “Must be static as of metadata 2.2” fields, but the format spec would only contain the “allowed to be Dynamic” list (probably in the definition of the Dynamic field itself)

Oh! I thought I’d seen something, but couldn’t find it when I looked again. Yes, not having it as a subsection under “Metadata” would help :slight_smile:

I can move it when I write the docs PR, no worries.

One thought I had, specifically about setuptools, is that setuptools itself doesn’t build wheels, that’s handled by the bdist_wheel command supplied by the wheel package.

I would expect bdist_wheel to need to be aware of this change, so that it can either use the sdist metadata directly, or report an error if the current process of getting metadata from setup() results in a value which doesn’t match the sdist metadata (the PEP mandates that backends MUST do this).

/cc @agronholm as the wheel package maintainer for awareness.

OK, it seems like most people prefer this, so I’ll go with it. On review of the fields, I’m adding Description-Content-Type, Home-page and Download-URL to the "may not be Dynamic" list. The first one is closely linked to the Description field, and the latter two are variants of Project-URL, so I assume these additions aren’t controversial.

The way PEP 503 exposes Requires-Python is on a per-file basis, though. So it’s perfectly possible for a project to publish wheels and a sdist, all with different data-requires-python values.

Having said this, nobody seems to think that this is a problem, and I scanned the files on PyPI and there aren’t currently any projects where Requires-Python varies between files uploaded for a single version. So it’s not an issue right now, and if anyone comes up with a reason why their project needs that flexibility, we can look at updating the spec at that point.


I’ve updated PEP 643 with the latest round of comments. The updated version is published at that link. I’ve also created a PR for the packaging user guide to make the proposed changes there.

If there’s no further feedback, I’ll submit this to @pganssle for a decision in a couple of days.

Should this change be included/referenced in https://www.python.org/dev/peps/pep-0639/? Or will we produce a 2.2 & 2.3 update back-to-back?

PEP 639 has been ongoing for some time. I’m working on the basis that metadata version numbers are “first come, first served”. Whichever proposal gets accepted first gets 2.2, and the other will simply be updated to get 2.3.

/cc @pombredanne for info.

There’s no process (and I see no need for one) to try to “combine” PEPs that update the metadata version.

I wonder how version numbering would work going forward. Source distributions are fundamentally different from wheels, and although both use the Core Metadata format, there are fields in each that don’t apply to the other, e.g. Tag for wheels, and Dynamic for sdists. It would obviously be very confusing if sdists and wheels used different metadata version schemes (and the above discussion implies that won’t be the case), but how would future PEPs updating the metadata format keep track of which field can be used in which distribution format?


Edit: I misremembered—Tag is in WHEEL, not METADATA. Maybe this is an indication Dynamic shouldn’t be a part of Core Metadata, and we should create a new file for that…?

My 2c from an implementation standpoint: it would be a lot simpler if everything was in PKG-INFO.

In theory, you have a fair point, but I’d argue that practicality beats purity - we’ve had so many false starts trying to standardise sdists that I do not want to extend the scope of this PEP to include adding a new file, with the inevitable bikeshedding on name, etc. Unlike wheels, we don’t have control over the content of sdists (we have no standardised metadata directory, etc), so any filename we pick will risk clashing with a project file.

IMO, standardising the layout of a sdist (including a “metadata” directory and/or a SDIST file analogous to the WHEEL file in wheels) would be a separate PEP. Whoever writes that PEP can handle the migration details.