PEP 639, Round 2: Improving license clarity with better package metadata

Hey Paul, I regret saying that and I’m really sorry if that came off as rude or dismissive; it was absolutely not kind or thoughtful of me and I apologize. I’d stayed up well past 7 am in my timezone responding to messages on this, which was a poor decision on my part, and my mind and judgement weren’t where they should have been when replying to the feedback you’d taken the time to offer. And I also hope I didn’t take the gun enthusiast analogy too far; it just seemed to offer a lot of promise as a metaphor for illustrating my point of view as well.

To note, as this PEP tries to make clear in the user scenarios section (added primarily in response to your concerns at the start of the last thread, actually), and as noted before in the “How to teach this” section, the simple cases get simpler with this PEP (just drop in a license file, enter the license short ID and you’re done; no need to look up the right classifiers, figure out what to put in the license field, manually enter the license file path, etc). On the other hand, the complex cases weren’t even possible to express before, but are quite common among larger popular packages; naturally, given their complexity and importance, they are where the PEP spends most of its time, in order to explain and specify them clearly.

Also, for what it’s worth, I didn’t get the time to reply to your thoughtful post on the PR yesterday, but @pradyunsg offered his help reducing the PR’s verbosity. Furthermore, I actually discovered, tested and confirmed a more aggressive strategy for shortening the PEP: moving most of the less critical sections you mentioned last night and others (Appendix 2, Appendix 3, the user scenarios, maybe the advanced example, and perhaps the rejected ideas or all but a summary/most critical) out of the PEP entirely, to separate linked reST documents in the PEP subfolder, which is now possible thanks to the new build system which will become the default in PEP 676. This will cut the PEP’s total length by well over half, on top of the reductions @pradyunsg and I achieve elsewhere.

It’s naturally sad to hear you’re considering stepping down, but I know it must have been a difficult decision and not one you’ve taken lightly, and I hope my own careless words and actions haven’t played a significant part in it. It would certainly be greatly preferable to have a PEP delegate vs the SC, since the former would presumably be someone with packaging expertise, who can actively follow and engage in the discussion, and who can give a level of feedback on what’s needed to reach a state at which the PEP can be approved, as opposed to just having to submit it blind to the SC, cross our fingers and hope we don’t need to start all over with it after the years of effort and so many people’s contributions (if I understand the process correctly). Isn’t @brettcannon an SC member and involved with the packaging community? What about @dstufft? Any other suggestions?

Thanks, I continue to greatly appreciate your support and advice there and elsewhere, in providing many of the major suggestions and recommendations that led to the PEP as it is today.

That sounds very neat!

Did you happen to see my question here? PEP 639, Round 2: Improving license clarity with better package metadata - #14 by ofek

I really think that’s the only critique being expressed by anyone, i.e. that the cost/benefit analysis described in https://www.python.org/dev/peps/pep-0639/#source-metadata-license-key does not justify the rejection of the idea.

Not at all. Your comments were perfectly fine. I’m really grateful that you picked up this PEP and breathed new life into it, and I’m seriously impressed at the amount of work you put into it. My response wasn’t based on any frustration with what you’d said, but simply from a realisation that the PEP needs more time than I currently have available, and I don’t want to do it a disservice.

Far from it. And to be clear, I’m not thinking of stepping down as PEP delegate in general, simply looking for someone else for this PEP.

In fact, I think it would be healthier for the community in general if we had more variety in PEP delegates - the defined process is that any PyPA core reviewer can volunteer to be a PEP delegate, and I’m simply the fallback if no-one volunteers. But in reality, we very rarely get anyone volunteering, and as a result, I’ve ended up covering most PEPs.

Like you, I’m not at all sure that letting the decision pass to the SC is ideal, for all the reasons you mention, and I’d probably still step in if we can’t get a volunteer. But I hope we do (I won’t offer any suggestions - to be honest I feel it needs to be a personal choice and I don’t want to put anyone under any pressure).

Sorry, I was replying but had to step away for a bit to help my roommate (I removed my post saying this above)

I agree that it seems to be the general consensus, my own feelings aside, that the license PEP 621 key should be retained and used for the License-Expression field, and per the discussion with @pradyunsg, the concerns with that specific approach as suggested on the PEP are not as serious as initially thought, while the reasons for it are more compelling. While personally I still lean in favor of making it a separate key, it now ultimately comes down to a matter of preference, and that approach is indeed workable with some compromises. Given that clearly seems to be the way things are going, to expedite things I’m fine with going ahead and proposing a revert to that approach, and implementing it once we sign off.

That said, the specific approach suggested here is not actually the “license expression = string value of license” one being advocated, as I understand, by @pradyunsg, @pf_moore and others, and is not workable for all the much stronger reasons mentioned in the Add expressions and files subkeys to table section (which, since discussion had largely moved on from that approach, has not been updated as recently as the following one to reflect the more salient reasons uncovered), not the less serious ones that @pradyunsg and others responded to in the Define license expression as string value section.

On top of that, it has a much more fatal flaw: it makes it completely impossible to specify both license files and a license expression, as required by the larger and more complex projects (including the relatively moderate-complexity one cited in the example, setuptools), without adding yet another subkey to license and ditching the simple string value, which is what PEP 621 and people here are advocating for. This means license would have at least five separate subkeys, with even more complex mutual exclusivity: paths cannot be used with globs; neither paths nor globs can be used with file (and maybe not text either?); file and text cannot be used together per PEP 621; and expression (or whatever we call it) can be used with one of paths or globs but not text (and not file either?). As you can see, this gets complex very fast, all of which is avoided by just making license a simple string value as envisioned in PEP 621, no more, and moving license-files to a separate key.

Ah, my sweet summer child, if only it were that simple :grinning_face_with_smiling_eyes: That approach would require undoing, at minimum, pretty much everything done in eb2e8740 (which changed the PEP from something very similar to your approach, to the current form), including a full rewrite of the whole PEP 621 section (which is 200 lines long). On top of that, a number of items added after this commit would need major rewrites, including the examples, the user scenarios, the backward compat section, parts of the how to teach this section, much of the conversion specification, and certain elements of the abstract, future PEPs, and a smattering of usages elsewhere, plus redoing many of the rejected ideas. To note, license-files is used 42 times throughout the PEP, and license-expression 28, while `license` (with backticks) is used a further 42, not to mention their subkeys and the other implications.

The approach I propose that is more in line with this discussion, by contrast, still requires significant changes but is substantially less disruptive, and simplifies the PEP overall rather than adding another layer of complex interactions. I will describe it in a followup post, to keep it short, focused and easy to follow so it doesn’t get lost here.

Given the discussion here, to expedite the process and align the PEP more closely with the general consensus and the expectations in PEP 621 as people are requesting, I propose essentially doing what @pradyunsg and others have urged:

  • Remove the separate license-expression key
  • Make the flat string value of the license key map to the License-Expression metadata value, as reserved by PEP 621, and deprecate the table subkeys
  • Update the Converting legacy metadata guidance accordingly, to reflect that legacy license.text metadata cannot be automatically converted during build if it is specified statically in [project], and that tools can warn instead
  • Keep license-files as it is
  • Simplify and update the rest of the PEP accordingly
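As a rough sketch of where this would land, a project's pyproject.toml under the plan above might look something like the following. The project name, file names and the compound expression are illustrative assumptions, and the paths subkey of license-files reflects the draft discussed in this thread, not a finalized spec.

```toml
[project]
name = "example-project"  # hypothetical project name
version = "1.0.0"

# Flat string value: proposed to map to the License-Expression core metadata field.
# Any valid SPDX license expression could go here; this one is only illustrative.
license = "MIT OR Apache-2.0"

# license-files stays a separate top-level key, so it can be combined with the
# expression. The subkey name follows the draft discussed here (an assumption).
license-files.paths = ["LICENSE-MIT.txt", "LICENSE-APACHE.txt"]
```

For the simple case, the license line alone (e.g. a single SPDX short identifier) would be enough.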

In addition, to reduce the total length of the PEP by over half, I propose:

  • Moving the user scenarios and rejected ideas (over 50% of the current body text) to the appendix
  • Once PEP 676 is implemented, moving all appendices except the basic example (over 2/3rds the total length of the PEP) to separate supplementary file(s) in the PEP-639 directory, linked from the PEP; this would make the PEP itself far more focused and manageable while still preserving those resources for posterity for those who need them
  • Merging python/peps#2155, which will reduce the rendered length by another five full pages by eliminating the redundant 100-link references section (which I already converted to inline links)
  • Eliding the PyPA glossary portions cited in the Terminology section and replacing them with links, and hopefully moving at least some of those terms of general interest that aren’t there already to the main PyPA glossary instead
  • Further reducing verbosity and excessive verbiage through the rest of the PEP with the kind assistance of @pradyunsg

That sounds awesome to me. If someone else chimes in with an approval of that new plan I’ll devote time this weekend to updating my implementation of this PEP.

edit: had time today, example usage:

[build-system]
requires = ['hatchling']
build-backend = 'hatchling.build'

[project]
name = 'foo'
license = 'MIT'
dynamic = ['version']

[tool.hatch.version]
path = 'foo/__about__.py'

[tool.hatch.build]
packages = ['foo']

[tool.hatch.build.targets.sdist]
include = ['/tests']

[tool.hatch.build.targets.wheel]
core-metadata-version = '2.3'

What are the next steps here?

Sorry, I’ve been on a trip visiting various family members and wanted to give others an opportunity to chime in before going ahead with the rest of the above plan, to ensure there was consensus before reworking everything again. But as it seems we are all in agreement, if no one has any further objections, and assuming it doesn’t conflict with anything @pradyunsg has already worked on, I can go ahead with the above proposed revisions in the next few days or so.

Yes I am. :slightly_smiling_face:

Sounds good!

Why this (I didn’t find any discussion about this in this topic and the PEP has enough of a different approach that I didn’t see how this decision was reached)? I could understand adding an expression or spdx key to go along with the other keys to support either approach (i.e. not only point to the license file but also classify it). While I fully support making it as easy as possible to state what license a project uses, I also prefer not to toss out information that can be useful if people are up for the extra work (speaking from experience where I wrote a tool to generate a 3rd-party notices file that gathered licenses into a file while also working at a company that supports reading SPDX expressions for legal compliance; darn you, Rust, for only supporting one or the other!).

Any interest in being a PEP delegate on this one? :slightly_smiling_face:

Good question. As you astutely point out, since the proposal reverts to a previous direction not taken in the current version of the PEP, there wasn’t a single canonical place (at least in the rejected ideas) where I explicitly and cohesively explain this particular element, requiring the reader to synthesize a substantial amount of previous discussion or a number of disparate bits from the specification and/or rejected ideas. I’ll make sure to address this in the revised version of the PEP.

This is actually a pretty complex question with several distinct parts (adding an expression key versus allowing just a flat table value, deprecating the text key, and deprecating the file key), but the TL;DR is that the existing license table subkeys were already mutually exclusive per PEP 621, and the core metadata/source metadata keys they map to are deprecated in favor of the much richer, more powerful and non-mutually-exclusive mechanisms in this PEP that cover the same ground and more, per the consensus in the previous discussion. Read on for the fully detailed answer, which I could abridge and include in the next revision of the PEP.

The current text and file table subkeys of the license key are stated in PEP 621 to be mutually exclusive, and map to metadata fields that (per the strong consensus on the previous thread) this PEP deprecates:

The table may have one of two keys. The file key has a string value that is a relative file path to the file which contains the license for the project. Tools MUST assume the file’s encoding is UTF-8. The text key has a string value which is the license of the project whose meaning is that of the License field from the core metadata. These keys are mutually exclusive, so a tool MUST raise an error if the metadata specifies both keys.

I couldn’t find an explicit justification for this in PEP 621, so the reasons are somewhat unclear, but the upshot is that at present, with those two PEP 621 keys, it is only possible to specify “one or the other”, as you say; more specifically, at most one of either:

  • Some free-form text describing the license, with unspecified syntax and semantics (text), or
  • A single license-related file, with unspecified semantics and mapping to core metadata or distribution archive contents (file).
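For reference, the two mutually exclusive PEP 621 forms look roughly like this in pyproject.toml; the project name, file name and text are placeholder values.

```toml
[project]
name = "example-project"  # hypothetical project name

# Option 1: point at a single license file (path relative to pyproject.toml).
license = {file = "LICENSE.txt"}

# Option 2: free-form text with the meaning of the core metadata License field.
# It cannot be combined with `file`; a tool must raise an error if both are given.
# license = {text = "MIT License"}
```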

This existing mutual exclusivity seems to be undesirable and overly restrictive, and seems to be one of the core things that bothers you above (and me too, which is why this PEP dramatically improves upon this situation…but I’ll get to that in a minute!).

Furthermore, it appears to be intended that text, and possibly file, map to the License field in core metadata. The clear consensus in the previous discussion thread for this PEP, both before and after I became involved, was that the License core metadata field should be deprecated by, and certainly mutually exclusive with, the License-Expression field this PEP adds, to ensure there was one (and preferably only one) obvious way to concisely specify the license(s) of the project in the package metadata, avoiding user confusion, substantial legal ambiguity, and duplication, and to allow arbitrarily complex licenses, combinations and exceptions to be described all using a standardized, unambiguous, machine-parsable format. Therefore, use of the text key (and the file key for this purpose) is correspondingly deprecated and replaced by (and mutually exclusive with) specifying a license expression (and specifying license-related files for special cases, as appropriate).

Similarly, for some time now, Setuptools, wheel (the library) and other packaging tools have deprecated mechanisms that allow specifying only a single license file (license_file), which is overly restrictive for many cases (including yours, when you have at least both a license file and notices file), and replaced them with ones that enable specifying multiple (license_files), which, per the previous consensus, was what this PEP specified on the core metadata side well prior to my revisions. Therefore, for similar reasons as above, file is deprecated and replaced by a nearly equally simple but much more flexible way of specifying any number of license files to include, which, unlike file, can also be specified alongside a license expression, and has safe, sensible, and standardized defaults and semantics for including license files in distribution archives and listing them in core metadata.

So, in summary, while the project source metadata changes in PEP 639 (with the revisions above) allow the license to be stated as easily as practical, with a single SPDX short identifier for most cases (and common license-related files included automagically), this PEP also allows much greater richness with license metadata for those who, like you say, are up for the extra work. In particular, it allows them to specify both a full license expression with any number of licenses, exceptions, and relationships, and one or any number of license files that they choose, if the clearer and more sensible defaults don’t already cover their use case. Together, these are nearly a strict superset of the expressiveness of the previous two keys, which they would otherwise duplicate.

As for adding an expression table subkey to the license key, I actually not only carefully considered it, but (believe it or not!) had the same initial thought as you and in fact implemented exactly that in an earlier draft of the PEP. However, given that the other two keys are to be deprecated and mutually exclusive with the new ones, being close to subsets of their functionality and mapping to deprecated metadata fields (for the reasons above), and that there didn’t appear to be likely future keys that would be added, I opted not to add the extra complexity of an expression table subkey and making it mutually exclusive with the others, as opposed to just adding the string key (which neatly makes a license expression mutually exclusive with both as a natural and obvious consequence of the basic structure). As I discuss in the license expression as string value rejected idea:

If an expression subkey was added to the license table, it would retain the clarity of a new top-level key, but add additional complexity for no real benefit, with an extra level of nesting, and users and tools needing to deal with the mutual exclusivity of the subkeys, as before. And allowing both (as a table subkey and the string value) would inherit both’s downsides, while adding even more spec and tool complexity and making there more than “one obvious way to do it”, further potentially confusing users.

EDIT: I meant to include this before, but skipped it. There are a couple of possible niche use cases of the existing License field that are arguably not completely equally handled by the new License-Expression and License-File fields: bespoke proprietary licenses, and other arbitrary license-related information. For the former, since there is no well understood, standardized meaning of such licenses, it seemed best to minimize ambiguity by covering this case with the LicenseRef-Proprietary license expression and including and specifying the license file(s) that describe it; if custom identifiers for such cases are still desired by bespoke/proprietary tooling, the PEP does not prohibit them from allowing such, and if there’s sufficient need, we could (now or later) implement a LicenseRef-Custom value or allow arbitrary LicenseRef-{custom} identifiers. To cover the second case, the user can simply include any extra info in a new or existing License-File that can automatically or explicitly be included in archives and listed in the metadata, or include it in the short/long description; custom LicenseRef-s could also help cover that case if really needed. See discussion here and on the previous PR for more on that. END EDIT

In case you’re wondering why not add another files (and/or paths, globs, etc) subkey to the license table, see this rejected idea, and for the justification for the syntax and semantics of the license-files key, see the relevant rejected idea subsection.

Hopefully this clarifies things, and in case parts are still unclear, I’m happy to answer followups!

By the way, this is super cool; for the Spyder scientific environment/IDE I initially did that manually but in a strict machine-parsable format, which others later implemented tools to read, parse and update.

It’s up to @pf_moore . I can be, although I would probably ask the open source office here at work for input.

Sure, but that doesn’t mean all keys need to stay mutually exclusive.

My key point is I don’t want to lose the ability to specify the license file somehow (and to be clear, that doesn’t preclude embedding the license in the metadata like we do today, just as long as one can continue to programmatically get the license text from a wheel and sdist).

Sure, I was just making mention of the current status quo that you were comparing this PEP to.

Yup; this PEP actually greatly improves your ability to do so. Previously, with PEP 621, you could only specify a single license file, and only if you didn’t specify the license metadata; furthermore, it wasn’t explicitly specified what backends were supposed to do with file (include the path in license? include the full text? include the file in the distribution? some combination, or something else entirely?), and how metadata consumers could access it in a consistent, defined manner.

With this PEP, there is a new license-files key that allows adding multiple license-related files in addition to a license expression, either by full relative path or glob, and a clearly defined mechanism for storing both the path in the metadata and the full text in the .dist-info per what Wheel and Setuptools have implemented, with additional tweaks to avoid conflicts, backward compat issues and clutter in .dist-info, and to allow including licenses from subdirs, e.g. vendored projects, rather than arbitrarily dropping them.
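As a rough sketch of the glob form for a project with vendored dependencies: the globs subkey name follows the draft discussed in this thread, while the project name, layout, patterns and expression are all hypothetical.

```toml
[project]
name = "example-project"  # hypothetical project name

# Illustrative expression covering the project's own code plus vendored code.
license = "MIT AND BSD-3-Clause"

# Glob patterns for license-related files (an alternative to listing full paths).
# The second pattern picks up license files of vendored projects in subdirectories.
license-files.globs = ["LICENSE*", "src/example/_vendor/*/LICENSE*"]
```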

I said in a post above that I’d prefer someone else to volunteer to be PEP delegate for this, so feel free. But in general, the position as I understand it is that anyone can volunteer to be PEP delegate, it doesn’t need my approval for them to do so (although I would get a vote on approving them along with the other PyPA committers).

By the way, if anyone knows whether the process in PyPA Specifications — PyPA documentation

If their self-nomination is accepted by the other PyPA core reviewers, the lead PyPI maintainer and the default PEP-Delegate for package distribution metadata PEPs, then they will have the authority to approve (or reject) that PEP.

means that we need a PyPA vote, or if a simple call for anyone who objects to speak up is sufficient, please let me know!

Have you actually come across a need for this from a top-level perspective?

Huh, interesting use-case!

I volunteer then! Let the regret begin. :wink:

If you’re mirroring how the Python core team does it then you just appoint and see if anyone screams.

We’re not, quite. See the quote above, it needs PyPA committer approval. I’ve posted a request for any objections to the PyPA committers list. Assuming no-one objects, I’d say that the job’s yours :slightly_smiling_face:

Perhaps we can call a vote to make this change to the governance PEP, so in future we can do this. @brettcannon, what timeline do you give people to scream? Do you announce PEP delegate nominations on some list?

We announce the delegate to python-dev so people know who they need to speak to in order to influence the decision. We have actually never had to change a delegate, so it’s honestly hypothetical whether we need to be concerned about it.

Just FYI, in case it was easy to miss, the <details> section in my original reply covers several of the followup questions in, well, detail (maybe too much, which is why I collapsed it) :laughing:

I’m not 100% sure if you’re asking about why it needs to be a top-level key under [project], or the need for multiple license files, so I’ll address both.

Regarding the former, if you’re wondering why not add another files (and/or paths , globs , etc) subkey to the license table instead of making license-files a top-level [project] key, that was what I originally implemented in the PEP, but see this rejected idea for why it proved unworkable in practice, and for the justification for the syntax and semantics of the license-files key, see the relevant rejected idea subsection.

As for the need for multiple license-related files, that is actually surprisingly common: for the many projects that contain code or vendored deps under other licenses (pip, Setuptools, Spyder, etc), for those that have a license and a notice file (such as the projects you mentioned, or several I’ve used), for projects under multiple licenses (packaging), and in a number of other cases, all of which are technically currently in violation of the relevant licenses by not doing so (unless they happen to match the current de-facto defaults of their tool, which this PEP also standardizes). This also matches the support of and for the reasons mentioned by Wheel, Setuptools and others.

Indeed, and it’s more common than you might think! Pip and Setuptools are but two examples (that are currently technically not following the licenses of their vendored deps by not doing so), the latter of which is included in the examples section of this PEP.

Thanks! :smirk:

Thanks! For reference, that discussion can be found at

https://mail.python.org/archives/list/pypa-committers@python.org/thread/TJXEXQHP7XXYC33I5EXCD3KTHWHX2VVO/

I opened a draft PR (with that discussion referenced) to make the change on the PEP, pending the outcome of the approval.

Just to note, it seems (as @pf_moore pointed out) that there have been several instances lately of ambiguity in the PyPA governance document (all, somehow, instigated by me): whether the PyPA GitHub org is the sole canonical PyPA project location or those on Bitbucket, GitLab, etc are equivalent (and whether a vote is needed to move a project from one to another); what specifically “approval” means for a PEP delegate change; and whether my minor change to update references to Python-Dev on the PyPA site to point here required a vote. It might be good to address those together, if so.

By the way, I should have mentioned that the vote is complete and Brett has been accepted as PEP delegate.
