Clarification regarding the `authors` / `maintainers` field in `pyproject.toml`


I’ve noticed what I think is an inconsistency between the specification for the authors / maintainers fields in pyproject.toml and the Author-Email / Maintainer-Email specification in the core metadata (Core metadata specifications - Python Packaging User Guide).

The latter says that the field needs to be a valid RFC822 From header, while the former says that, if both the name and email fields are provided, they should be combined as {name} <{email}>. However, the result of that combination is not necessarily a valid RFC822 From header value. Should the pyproject.toml specification be rectified to reflect that, maybe suggesting to use email.utils.formataddr()?
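To illustrate the concern, here is a minimal sketch comparing the literal {name} <{email}> template with the standard library's email.utils.formataddr(). The name and address are made-up examples, and a comma in the display name is only one of several characters that need quoting.

```python
from email.utils import formataddr, getaddresses

# Hypothetical author entry; a comma in the display name is enough
# to break the plain "{name} <{email}>" template.
name = "Doe, Jane"
email = "jane@example.com"

# Literal template from the pyproject.toml specification.
naive = f"{name} <{email}>"

# formataddr() quotes the display name when it contains special characters.
quoted = formataddr((name, email))

# Parse both values back the way a consumer of the metadata might.
parsed_naive = getaddresses([naive])
parsed_quoted = getaddresses([quoted])

print(quoted)
print(parsed_naive)
print(parsed_quoted)
```

The quoted form round-trips to a single (name, address) pair, while the naive form does not, because the unquoted comma is read as an address separator.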

It is also not very clear to me how to produce the Maintainer / Author fields when more than one author or maintainer is specified in authors / maintainers in pyproject.toml and not all of them specify both the name and email fields. Reading the prescription in the specification in the strictest way, I would place authors / maintainers for which there is no email in the Author / Maintainer fields and the others in Author-Email / Maintainer-Email (and this would be consistent with the requirement for these fields to be valid RFC822 From headers). However, the PyPI rendering of the package page when both Maintainer and Maintainer-Email are present is messy: it uses the name in Maintainer and links it to the first email address in Maintainer-Email. Example: meson-python · PyPI. Is this a PyPI bug or a wrong interpretation of the specification?
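The strict reading described above can be sketched roughly as follows. This is only an illustration of that interpretation, not what any particular build backend does; the helper name split_people and the sample entries are hypothetical.

```python
from email.utils import formataddr


def split_people(people):
    """Split pyproject.toml-style author tables into two metadata values.

    Entries without an email go to the plain field (Author / Maintainer);
    entries with an email go to the -Email field as RFC 822 style addresses.
    """
    names = []
    emails = []
    for person in people:
        name = person.get("name")
        email = person.get("email")
        if email:
            # formataddr() returns the bare address when the name is empty.
            emails.append(formataddr((name or "", email)))
        elif name:
            names.append(name)
    return ", ".join(names), ", ".join(emails)


# Hypothetical `authors` array as it would look after parsing pyproject.toml.
authors = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "John Smith"},
    {"email": "team@example.com"},
]

author, author_email = split_people(authors)
print("Author:", author)
print("Author-Email:", author_email)
```

With this split, the Author field lists only the people without an email address, which is exactly the combination that PyPI currently renders in a confusing way.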

The original intention of the text is to describe how the structured name and email fields in pyproject.toml should be constructed into RFC 822 form for Core Metadata, so suggesting email.utils.formataddr() makes sense to me. It would also be a good idea to revise the text to clarify the intention and not strictly require the fields to be formatted as {name} <{email}>, although maybe keeping the string as an example for readability (since RFC 822 is not a very human-friendly identifier).

The Author and Maintainer fields are basically free-form, so you are free to do anything you see fit. PyPI is trying to be smart here because historically people have misunderstood what the fields mean and put a name and email separately. It should be smarter when Maintainer-Email already contains a name, and not blindly assign the email to a Maintainer entry. Assuming that gets fixed, what you want to do seems appropriate, although again anything is technically appropriate since there are no rules around the non-email fields.

Maybe related: Structure for importlib metadata identities

Related, indeed. I would love a stricter definition of the metadata fields. However, the approach proposed in that thread, which AFAICT didn’t receive any answer, is based on a misunderstanding of how the metadata fields are defined, namely that Author contains the list of author names, and Author-Email the list of the respective email addresses. This is not the case.

Thanks @uranusjr. It is very unfortunate that the metadata fields are not better specified. However, PyPI is the main consumer of the metadata, thus I think that the most important thing is to write the metadata in a way that is interpreted by PyPI as intended. For the currently deployed PyPI, mixing Author and Author-Email does not work. Having invalid addresses (names only, for authors missing the email field in pyproject.toml) in Author-Email violates the metadata specification. Therefore, the only way to obtain something sensible is to put the author names and email addresses in the Author field. I’ll check how that renders on PyPI.

Somewhat related warehouse issue regarding how this metadata is displayed on PyPI: Odd rendering of author when using PEP 621 metadata. · Issue #9400 · pypi/warehouse · GitHub

Related issue for Hatch: Build breaks for non-ascii emails · Issue #965 · pypa/hatch · GitHub

I’m wondering if non-ASCII emails should be supported, even if it is technically possible.
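For context, this is how the standard library's email.utils.formataddr() treats non-ASCII input: a non-ASCII display name is encoded as an RFC 2047 encoded word, while a non-ASCII address is rejected. The names and addresses below are made up, and whether this behavior is what triggers the linked Hatch issue is an assumption, not something confirmed here.

```python
from email.utils import formataddr

# A non-ASCII display name is accepted and encoded (RFC 2047 encoded word).
encoded = formataddr(("Zoë Müller", "zoe@example.com"))
print(encoded)

# A non-ASCII address is rejected: formataddr() requires the address
# itself to be ASCII and raises UnicodeEncodeError otherwise.
try:
    formataddr(("Zoe", "zoë@example.com"))
except UnicodeEncodeError:
    address_rejected = True
else:
    address_rejected = False

print(address_rejected)
```

A tool that wants to accept such addresses would therefore need to handle them itself rather than rely on formataddr().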

If I interpret the linked issues correctly, it seems that there is no way to have more than one name or email address in Author / Author-Email or Maintainer / Maintainer-Email and have it correctly rendered on PyPI. As these fields are not strictly defined, PyPI is the only important consumer of this metadata. However, warehouse is not able to make sense of the metadata when multiple authors or maintainers are specified, and it shows some more unexpected behavior when both the plain and -Email versions of the fields are present. Therefore, despite the PyPA standard document, the only supported metadata format is to have exactly one author or maintainer and to use either the plain or the -Email suffixed field, but not both. Am I missing something?

What would be the road to have warehouse interpret the metadata fields as the specifications suggest? Should the specification be clarified with amendments to the PyPA documents before any patch to warehouse is proposed?

I think the best way forward would be to deprecate the -Email suffixed fields in favor of simply using comma-separated, RFC822-compliant From addresses in the Author and Maintainer fields.

I would suggest opening an issue toward PyPI on rendering the fields more appropriately. It is entirely possible to add additional guidelines to the metadata fields so PyPI can treat them as expected, or there potentially might even be a way to just make PyPI do mostly what you want. Since PyPI is the only meaningful consumer, how it renders the fields is basically the de facto standard. That would be more useful than trying to migrate the metadata format to new fields.