Some fields are explicit multiple-use fields, while others are implicit multiple-use fields (CSV). E.g., Keywords, Requires-Python, Author-email and Maintainer-email are examples of “implicit multiple-use fields”. Explicitly defining this feature as “compact” multiple-use fields would’ve made things more transparent, generic and readily automatable. E.g., this way Keywords wouldn’t have to be treated as a special case in the email to JSON conversion rule.
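A minimal sketch of what a generic “compact” multiple-use rule could look like. The `COMPACT_FIELDS` set and the `expand_compact` helper are hypothetical names used for illustration only, not part of any spec or library:

```python
# Hypothetical: the set of fields a spec could declare as "compact"
# (comma-separated) multiple-use fields.
COMPACT_FIELDS = {"Keywords", "Requires-Python"}


def expand_compact(name, value):
    """Return the field body as a list of values.

    Compact fields are split on commas and trimmed; every other field is
    returned as a single-item list, unchanged.
    """
    if name not in COMPACT_FIELDS:
        return [value]
    return [item.strip() for item in value.split(",") if item.strip()]


print(expand_compact("Keywords", "packaging, metadata ,pep"))
print(expand_compact("Requires-Python", ">=3.6, <4"))
```

Author-email and Maintainer-email are left out of the set on purpose: a display name may contain a quoted comma, so a naive split is not safe for them.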
Supported-Platform seems to have the same purpose as a “platform tag” in a wheel, which makes it somewhat useless:
Binary distributions containing a PKG-INFO file will use the Supported-Platform field in their metadata to specify the OS and CPU for which the binary distribution was compiled.
Description field mustn’t contain EOLs/multiline strings.
PEP 345 states that
To support empty lines and lines with indentation with respect to the RFC 822 format, any CRLF character has to be suffixed by 7 spaces followed by a pipe ("|") char. As a result, the Description field is encoded into a folded field that can be interpreted by RFC822 parser [2].
In reality, RFC822 and its successors don’t prescribe anything like that for creating a “folded field”: “7 spaces” can just as easily be one space, and the pipe char can be any printable (?) char. Also
This encoding implies that any occurrences of a CRLF followed by 7 spaces and a pipe char have to be replaced by a single CRLF when the field is unfolded using a RFC822 reader.
is wrong too: a CRLF followed by whitespace is replaced by whitespace alone, so the CRLF is lost.
RFC 822 (Standard for the Format of ARPA Internet Text Messages) defines “long header fields” and what folding is:
Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body. For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding".
I.e., “folding” is a means of formatting raw data, rather than the text it represents. Later it defines “unfolding”:
Unfolding is accomplished by regarding CRLF immediately followed by a LWSP-char as equivalent to the LWSP-char.
I.e., “unfolding” drops the CRLF and keeps the whitespace char that follows it, so no EOL survives.
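A small sketch of unfolding, following the quoted RFC 822 sentence literally. The `unfold` helper and the sample header are made up for illustration; real parsers may also normalize the remaining whitespace:

```python
import re


def unfold(raw):
    """Drop every CRLF that is immediately followed by a space or a tab."""
    return re.sub(r"\r\n(?=[ \t])", "", raw)


# A Description folded the PEP 345 way: CRLF + 7 spaces + "|".
folded = (
    "Description: First line\r\n"
    "       |Second line\r\n"
    "       |\r\n"
    "       |Fourth line"
)

unfolded = unfold(folded)
print(repr(unfolded))
print("\r\n" in unfolded)  # False: the line breaks are gone
```

After unfolding, the pipe chars are still there but the CRLFs are not, so a plain RFC822 reader has no line breaks left to restore.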
The only part of RFC822 that mentions anything remotely similar to preservation of EOLs is the section that defines “structured field bodies”:
To aid in the creation and reading of structured fields, the free insertion of linear-white-space (which permits folding by inclusion of CRLFs) is allowed between lexical tokens. Rather than obscuring the syntax specifications for these structured fields with explicit syntax for this linear-white-space, the existence of another "lexical" analyzer is assumed. This analyzer does not apply for unstructured field bodies that are simply strings of text, as described above. The analyzer provides an interpretation of the unfolded text composing the body of the field as a sequence of lexical symbols.
From the last sentence it becomes clear that the “analyzer” provides an “interpretation of the unfolded text”, meaning that by the time the “analyzer” receives the field-body it is already unfolded, therefore EOLs of any kind (or only CRLF?) can’t be preserved.
RFC2822 contains only “minor” changes, e.g., a folded line must contain at least one printable character rather than whitespace only, which doesn’t change the overall picture.
This is important for sdist, which must use v1.2, where the description can’t be stored as the email payload/content to preserve EOLs. That makes this a perfect moment to write a proper PEP for sdist (it seems to be already in the works).
Probably the best way to support multiline strings in “field-body” is to encode them in base64.
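A rough sketch of the base64 idea. The helper names are hypothetical, and no current metadata version defines such an encoding:

```python
import base64


def encode_description(text):
    """Encode a multiline string into a single-line, header-safe ASCII value."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_description(body):
    """Restore the original text, including its line breaks."""
    return base64.b64decode(body).decode("utf-8")


description = "First line\n\n    indented line\nLast line"
body = encode_description(description)

print(body)
print(decode_description(body) == description)  # True
```

The encoded value contains no whitespace at all, so a reader can fold and unfold it freely without losing anything.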
Description-Content-Type accepts charset set only to UTF-8, so what’s the point in using it explicitly? Is it a Py2k remnant?
Keywords should’ve been a multiple-use field. Maybe introduce a multiple-use field Keyword, which BTW won’t conflict with the email to JSON conversion rule that treats Keywords as a multiple-use field, and will be in the singular form like other multiple-use fields? Or treat fields with CSV as “compact” multiple-use fields, as described at the very beginning.
Isn’t Home-page in conflict with Project-URL: Home page, https://hope.page? Maybe define them to be interchangeable?
Download-URL – same issue as with Home-page.
Author and Maintainer should’ve been multiple-use fields, because a project may have multiple authors/maintainers. Core metadata specifications - Python Packaging User Guide doesn’t say anything about it, but judging by the METADATA generated by setuptools, these fields can contain CSV.
Author-email and Maintainer-email must be compatible with the RFC822 From header, and therefore must be able to contain CSV (CSV (?) “target-list”: https://cr.yp.to/immhf/sender.html).
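For illustration, the stdlib already parses such From-style lists. This sketch uses `email.utils.getaddresses`; the names and addresses are made up:

```python
from email.utils import getaddresses

# A made-up Author-email value; note the quoted comma in the second name,
# which is why a naive split on "," is not enough.
author_email = 'Alice Example <alice@example.org>, "Bob, Jr." <bob@example.org>'

# getaddresses() takes a list of header values and returns (name, address) pairs.
pairs = getaddresses([author_email])

for name, address in pairs:
    print(name, "|", address)
```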
License faces a similar issue as Description, but can be stored only in a header.
Requires-Python should’ve been a multiple-use field. It faces the exact same issue as Keywords.
There is no way to define “importable packages”. I think I’ve read somewhere about a notation like Provides-Dist: {dist}:{pkg}, but I can’t find the sources. E.g., ATM pkg_resources is effectively unrelated to setuptools, and its dist metadata, resources, etc. can’t be read with the help of importlib.metadata. It seems that ATM pip emulates this with *.dist-info/top_level.txt, which is likely to be a result of parsing *.dist-info/RECORD.
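A rough sketch of how “importable packages” could be guessed from RECORD alone. The `top_level_names` helper and the sample RECORD are hypothetical, and this is a simplification (e.g., it ignores extension modules), not a description of what any tool actually does:

```python
import csv
import io


def top_level_names(record_text):
    """Guess top-level importable names from the contents of a RECORD file.

    RECORD is a CSV file with rows of (path, hash, size).
    """
    names = set()
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        first, sep, _rest = row[0].partition("/")
        # Skip metadata/data directories, caches and paths outside site-packages.
        if first.endswith((".dist-info", ".data")) or first in ("..", "__pycache__"):
            continue
        if sep:
            names.add(first)       # a package directory
        elif first.endswith(".py"):
            names.add(first[:-3])  # a single-file module
    return sorted(names)


record = (
    "pkg/__init__.py,sha256=abc,10\n"
    "pkg/mod.py,sha256=def,20\n"
    "single.py,sha256=ghi,5\n"
    "pkg-1.0.dist-info/METADATA,sha256=jkl,100\n"
    "pkg-1.0.dist-info/RECORD,,\n"
)
print(top_level_names(record))
```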
Obsoletes-Dist, but for “importable packages”.
It is legal to specify Provides-Extra: without referencing it in any Requires-Dist:.
If this is related to the “virtual” dist from Provides-Dist prior to v2.1, on which Provides-Extra seems to be based, then it implies that the mere fact of a “virtual” feature being mentioned in Provides-Extra must satisfy the requirement dist[virtual]. The problem is that, because of the complexity of “extra” in environment markers, package managers will be forced to check all environment markers to determine which extras are “virtual” (and not just by evaluating them… which is partially caused by branching), and that can be overwhelmingly complicated for something that one concise sentence allows. It’d be nice if that sentence were expanded to explain the meaning of such unreferenced extras.
A new multiple-use field Extends-Package/Extends-Dist is needed to associate extensions with packages/dists, instead of requesting a Classifier each and every time some extensible distro rises to popularity. There might be other types of relations, but I guess most of them are either about extending or replacing packages. This will simplify finding extensions, and possibly revive interest in Keywords.
Requires-External is applicable only to wheels, according to:
Each entry contains a string describing some dependency in the system that the distribution is to be used.
In the context of a wheel it specifies the run-time environment, but in the context of an sdist it will specify the build-time environment. As a result the two metadata files will differ, thus requiring two separate sets of metadata definitions in pyproject.toml (e.g., for PEP 621). That being said, considering the use-case of this field, it makes sense to provide Requires-External for sdist as much as for wheel…
Overhaul the tagging of distros to make finding relevant ones much easier. Classifiers require too much typing and are thus useless for CLI, and keywords don’t seem to be used at all (maybe internally by some packages). The problem is that, considering the role classifiers play, they can’t be replaced by keywords (maybe split into separate keywords in a meaningful way (?))…
Project-URL – standard set of labels?