PEP 621 defines authors and maintainers lists in the pyproject.toml file, and says of the transformation into core metadata:
“If both email and name are provided, the value goes in Author-email/Maintainer-email as appropriate, with the format {name} <{email}>.”
I’ve implemented this literally in Flit (PR) using Python string formatting. A contributor pointed out cases where it could go wrong, and pointed me to email.utils.formataddr() as a more careful way to achieve the same thing. However, I found that this produces odd-looking results with non-ASCII names ('=?utf-8?q?Zo=C3=AB?= <zoe@example.com>'), and rejects non-ASCII email addresses altogether.
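A minimal sketch of the behaviour described above, using only the standard library. The helper naive_format, the comma-containing name and the non-ASCII address are illustrative assumptions, not taken from Flit or from the contributor's report.

```python
from email.utils import formataddr


def naive_format(name, email):
    # Hypothetical helper: a literal reading of the PEP 621 wording,
    # plain string formatting with no quoting or escaping.
    return f"{name} <{email}>"


# Assumed problem case: a comma is a special character in address headers.
print(naive_format("Doe, Jane", "jane@example.com"))
# formataddr() wraps a display name containing special characters in quotes.
print(formataddr(("Doe, Jane", "jane@example.com")))

# Non-ASCII display name: formataddr() emits an RFC 2047 encoded word.
encoded = formataddr(("Zoë", "zoe@example.com"))
print(encoded)

# Non-ASCII address (made up for illustration): formataddr() raises.
try:
    formataddr(("Zoë", "zoë@example.com"))
except UnicodeError as exc:
    rejected = True
    print("rejected:", type(exc).__name__)
else:
    rejected = False
```

The plain string formatting version keeps the name readable but does no quoting at all, which is the trade-off in question here.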
The core metadata spec says that both Author-email and Maintainer-email “can contain a name and e-mail address in the legal forms for a RFC-822 From: header.” RFC 822 dates from 1982 and, unsurprisingly, doesn’t appear to mention anything beyond ASCII (as far as I can see; I confess I haven’t read it all). There are newer standards for email which do allow non-ASCII characters.
This also goes for the core metadata format as a whole. PEP 241 (approaching its 20th birthday!) describes the format as “a single set of RFC-822 headers parseable by the rfc822.py module”, and I can’t see any changes to that in the subsequent PEPs. Do we take the email.parser stdlib module as the successor to rfc822.py? And is there a good summary of what that expects, without reading the various RFCs?
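As a rough illustration of what email.parser does with such input, the sketch below parses a made-up metadata fragment held as a str. It only shows the stdlib's behaviour under its default policy; it is not a statement of what the core metadata spec requires, and the field values are assumptions for the example.

```python
from email.header import decode_header, make_header
from email.parser import Parser

# Hypothetical METADATA fragment, as a str already decoded from UTF-8.
metadata_text = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Author: Zoë\n"
    "Author-email: =?utf-8?q?Zo=C3=AB?= <zoe@example.com>\n"
)

# Parser() uses the compat32 policy unless told otherwise.
msg = Parser().parsestr(metadata_text)

# A non-ASCII value parsed from a str comes back unchanged.
author = msg["Author"]
print(author)

# An RFC 2047 encoded word also comes back as written;
# turning it into readable text is a separate, explicit step.
raw = msg["Author-email"]
decoded = str(make_header(decode_header(raw)))
print(raw)
print(decoded)
```

So a consumer that reads the file as UTF-8 text can get either form through the parser, but it has to decode the escaped form itself.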
To sum up:
- How should non-ASCII characters be represented in core metadata in general? Is it safe to write it as UTF-8, as Flit currently does? Or should it be escaped into a pure ASCII form?
- Are there special rules for non-ASCII characters in the Author-/Maintainer-email fields?
- Should we update the core metadata spec to clarify this?
- Should the wording I quoted from PEP 621 mention quoting/escaping?