Good question! As you astutely point out, since the proposal reverts to a previous direction not taken in the current version of the PEP, there wasn’t a single canonical place (at least in the rejected ideas) where I explicitly and cohesively explain this particular element. Instead, the reader has to synthesize a substantial amount of previous discussion or a number of disparate bits from the specification and/or rejected ideas. I’ll make sure to address this in the revised version of the PEP.
This is actually a pretty complex question with several distinct parts (adding an `expression` key versus allowing just a flat string value, deprecating the `text` key, and deprecating the `file` key). The TL;DR is that the existing license table subkeys were already mutually exclusive per PEP 621, and the core metadata fields and source metadata keys they map to are deprecated in favor of the much richer, more powerful and non-mutually-exclusive mechanisms in this PEP, which cover the same ground and more, per the consensus in the previous discussion. Read on for the fully detailed answer, which I could abridge and include in the next revision of the PEP.
The current `text` and `file` table subkeys of the `license` key are stated in PEP 621 to be mutually exclusive, and they map to metadata fields that (per the strong consensus on the previous thread) this PEP deprecates:
The table may have one of two keys. The file key has a string value that is a relative file path to the file which contains the license for the project. Tools MUST assume the file’s encoding is UTF-8. The text key has a string value which is the license of the project whose meaning is that of the License field from the core metadata. These keys are mutually exclusive, so a tool MUST raise an error if the metadata specifies both keys.
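To make the quoted rule concrete, here is a minimal sketch of how a tool might enforce it. The `resolve_pep621_license` helper is hypothetical, not taken from any real build backend. It raises an error if both keys are given and reads `file` as UTF-8, as PEP 621 requires. Resolving the relative path against a caller-supplied project root is an assumption on my part.

```python
from pathlib import Path
from typing import Optional


def resolve_pep621_license(license_table: dict, project_root: Path) -> Optional[str]:
    """Return the license text from a PEP 621 ``license`` table.

    Hypothetical helper: enforces that ``file`` and ``text`` are mutually
    exclusive and reads ``file`` as UTF-8 relative to ``project_root``.
    """
    used = license_table.keys() & {"file", "text"}
    if len(used) > 1:
        # PEP 621: "a tool MUST raise an error if the metadata specifies both keys"
        raise ValueError("license.file and license.text are mutually exclusive")
    if "text" in used:
        return license_table["text"]
    if "file" in used:
        # PEP 621: "Tools MUST assume the file's encoding is UTF-8"
        return (project_root / license_table["file"]).read_text(encoding="utf-8")
    return None  # neither key was given
```

Note that this is exactly the "one or the other" behavior. There is no way to express a license and also point at its file.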
I couldn’t find an explicit justification for this in PEP 621, so the reasons are somewhat unclear. The upshot, though, is that with those two keys it is currently only possible to specify “one or the other”, as you say; more specifically, at most one of:
- Some free-form text describing the license, with unspecified syntax and semantics (`text`), or
- A single license-related file, with unspecified semantics and mapping to core metadata or distribution archive contents (`file`).
This existing mutual exclusivity seems undesirable and overly restrictive, and appears to be one of the core things that bothered you above (and me too, which is why this PEP dramatically improves upon this situation…but I’ll get to that in a minute!).
Furthermore, it appears to be intended that `text`, and possibly `file`, map to the `License` field in core metadata. The clear consensus in the previous Discuss thread for this PEP, both before and after I became involved, was that the `License` core metadata field should be deprecated in favor of, and certainly be mutually exclusive with, the `License-Expression` field this PEP adds. This ensures there is one (and preferably only one) obvious way to concisely specify the license(s) of the project in the package metadata, avoiding user confusion, substantial legal ambiguity, and duplication. It also allows arbitrarily complex licenses, combinations and exceptions to all be described using a standardized, unambiguous, machine-parsable format. Therefore, use of the `text` key (and of the `file` key for this purpose) is correspondingly deprecated and replaced by (and mutually exclusive with) specifying a license expression (and specifying license-related files for special cases, as appropriate).
Similarly, for some time now, Setuptools, wheel (the library) and other packaging tools have deprecated mechanisms that allow specifying only a single license file (`license_file`), which is overly restrictive for many cases (including yours, where you have at least both a license file and a notices file). They have replaced these with mechanisms that allow specifying multiple files (`license_files`), and per the previous consensus, this is also what this PEP specified on the core metadata side well before my revisions. Therefore, for similar reasons as above, `file` is deprecated and replaced by a nearly equally simple but much more flexible way of specifying any number of license files to include. Unlike `file`, this can also be specified alongside a license expression, and it has safe, sensible, and standardized defaults and semantics for including license files in distribution archives and listing them in core metadata.
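As a rough sketch of the “safe, sensible defaults” idea, a tool might expand glob patterns relative to the project root and emit one `License-File` line per matching file. The default patterns below mirror setuptools’ default `license_files` (`LICEN[CS]E*`, `COPYING*`, `NOTICE*`, `AUTHORS*`). Treat them, and both helper names, as assumptions. Real backends may differ in details such as ordering and case sensitivity, and `Path.glob` case sensitivity is platform-dependent.

```python
from pathlib import Path
from typing import Optional

# Assumed defaults, mirroring setuptools' default `license_files` patterns
DEFAULT_LICENSE_GLOBS = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]


def resolve_license_files(project_root: Path, globs: Optional[list[str]] = None) -> list[str]:
    """Expand glob patterns into sorted, de-duplicated relative file paths."""
    patterns = DEFAULT_LICENSE_GLOBS if globs is None else globs
    found = set()
    for pattern in patterns:
        for path in project_root.glob(pattern):
            if path.is_file():  # skip directories such as LICENSES/
                found.add(path.relative_to(project_root).as_posix())
    return sorted(found)


def license_file_headers(paths: list[str]) -> str:
    """Render one core metadata ``License-File`` line per file."""
    return "".join(f"License-File: {p}\n" for p in paths)
```

For a project containing `LICENSE`, `NOTICE` and `README.md`, the defaults would pick up `LICENSE` and `NOTICE` (covering your license-plus-notices case) without any configuration.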
So, in summary, the project source metadata changes in PEP 639 (with the revisions above) allow the license to be stated as easily as practical, with a single SPDX short identifier for most cases (and common license-related files included automagically). At the same time, this PEP also allows much greater richness in license metadata for those who, like you say, are up for the extra work. In particular, it allows them to specify both a full license expression with any number of licenses, exceptions, and relationships, and any number of license files of their choosing, if the clearer and more sensible defaults don’t already cover their use case. Together, these are nearly a strict superset of the expressiveness of the previous two keys, which they would otherwise duplicate.
As for adding an `expression` table subkey to the `license` key, I not only carefully considered it, but (believe it or not!) had the same initial thought as you and in fact implemented exactly that in an earlier draft of the PEP. However, the other two keys are to be deprecated and made mutually exclusive with the new mechanisms, since they are close to subsets of their functionality and map to deprecated metadata fields (for the reasons above), and no further subkeys seemed likely to be added in the future. So, I opted not to add the extra complexity of an `expression` table subkey that would have to be mutually exclusive with the others, and instead simply allowed a string value (which neatly makes a license expression mutually exclusive with both, as a natural and obvious consequence of the basic structure). As I discuss in the license expression as string value rejected idea:
If an expression subkey was added to the license table, it would retain the clarity of a new top-level key, but add additional complexity for no real benefit, with an extra level of nesting, and users and tools needing to deal with the mutual exclusivity of the subkeys, as before. And allowing both (as a table subkey and the string value) would inherit both’s downsides, while adding even more spec and tool complexity and making there more than “one obvious way to do it”, further potentially confusing users.
EDIT: I meant to include this before, but skipped it. There are a couple of possible niche use cases of the existing `License` field that are arguably not handled quite as well by the new `License-Expression` and `License-File` fields: bespoke proprietary licenses, and other arbitrary license-related information. For the former, since such licenses have no well-understood, standardized meaning, it seemed best to minimize ambiguity by covering this case with the `LicenseRef-Proprietary` license expression and including and specifying the `license-file`(s) that describe it. If custom identifiers for such cases are still desired by bespoke/proprietary tooling, the PEP does not prohibit them from allowing such, and if there’s sufficient need, we could (now or later) implement a `LicenseRef-Custom` value or allow arbitrary `LicenseRef-{custom}` identifiers. To cover the second case, the user can simply include any extra info in a new or existing `License-File` that can automatically or explicitly be included in archives and listed in the metadata, or include it in the short/long description; custom `LicenseRef-`s could also help cover that case if really needed. See discussion here and on the previous PR for more on that. END EDIT
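As a small illustration of how `LicenseRef-` identifiers sit inside an ordinary expression, here is a minimal tokenizing sketch. It assumes the SPDX rule that the part after `LicenseRef-` is one or more letters, digits, `-` or `.`. It does not validate the full expression grammar or check other identifiers against the SPDX license list, and the helper names are hypothetical. A tool could compare the returned references against whichever `LicenseRef-` values the PEP ends up permitting.

```python
import re

# SPDX idstring: one or more letters, digits, "-" or "."
IDSTRING = re.compile(r"[A-Za-z0-9.\-]+")


def tokenize(expression: str) -> list[str]:
    """Split an SPDX license expression into parentheses and words."""
    return re.findall(r"[()]|[^\s()]+", expression)


def license_refs(expression: str) -> list[str]:
    """Return the user-defined ``LicenseRef-`` identifiers in an expression."""
    refs = []
    for token in tokenize(expression):
        if token.startswith("LicenseRef-"):
            suffix = token[len("LicenseRef-"):]
            if not IDSTRING.fullmatch(suffix):
                raise ValueError(f"invalid LicenseRef: {token!r}")
            refs.append(token)
    return refs
```

For instance, `license_refs("(MIT OR Apache-2.0) AND LicenseRef-Proprietary")` returns `["LicenseRef-Proprietary"]`, while a purely SPDX-listed expression returns an empty list.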
In case you’re wondering why not add another `files` (and/or `paths`, `globs`, etc.) subkey to the `license` table, see this rejected idea; and for the justification for the syntax and semantics of the `license-files` key, see the relevant rejected idea subsection.
Hopefully this clarifies things; if any parts are still unclear, I’m happy to answer follow-ups!
By the way, this is super cool! For the Spyder scientific environment/IDE, I initially did that manually, but in a strict machine-parsable format, and others later wrote tools to read, parse and update it.