FWIW, I don’t think @pf_moore meant individual files (as in .py files or .txt files or whatever) but rather the distribution files (.tar.gz or .whl). As @pombredanne says, per-source-file licensing is out of scope for this PEP (and I agree with that).
Coming back to this, what are the next steps here? Do we need another round of updates to the PEP for License-Expression?
As a scientific Python developer and docs writer/copyeditor following this proposal with great interest and looking forward to seeing it accepted in a timely manner, I’d be happy to pitch in and help with a PR addressing the relatively mechanical updates, if that would help move things along promptly and wouldn’t duplicate work already done by @pombredanne. It does appear that the change to use the License-Expression field requires several major content revisions, which I assume @pombredanne would want to handle. These seem to be:
Update the prose to reflect that the same field is not being re-used, and explain why adding a new one was chosen
In the Rejected ideas section, replace the entry for adding a License-Expression field with one for re-using the License field, and summarize the rationale as discussed and agreed here
Decide on and document the backward-compatibility implications and the migration plan as described below
The more mechanical changes, which would be more appropriate for someone like me to help with, include:
The referenced version number should be updated to 2.2 and the updated version to 2.3 throughout the PEP, and 2.2 added to the references section
References to the License field should be updated to refer to the License-Expression field, where appropriate
The PEP’s title, structure, metadata and phrasing need to be updated to reflect those used in PEP 643 per @pf_moore, since it no longer represents a canonical spec but a proposal to update the actual spec
We might want to update the “License Files in wheels and setuptools” section a bit to reflect the additional names added in pypa/wheel#251 that provide fairly comprehensive coverage of most common license files (full disclosure: I contributed that PR)
Use consistent case (either Title Case or Sentence case) throughout the headings; they are currently a mish-mash of both
In addition, the following technical changes could be considered, though they are not so clear-cut:
Should the “replaces” field and other links be updated to point to PEP 643 (Metadata v2.2)? That PEP formally updates the metadata spec to 2.2, but unlike the others isn’t titled following the pattern Metadata for Python Software Packages X.Y, doesn’t specify that it “replaces” the previous metadata spec, doesn’t list the previous specs, and the fact that it describes the changes in v2.2 is buried in the prose. Therefore, if that PEP is referenced as-is, it would be a “dead-end” for readers and not describe the remainder of the metadata specification, so it seems either the link to PEP 566 should be retained, or that PR updated to at least link to the previous metadata specs. Clarified per @pf_moore and moved above that this PEP needs to be reframed to reflect the new convention in PEP 643
The title can simply be bumped to reflect the new version, but since the PEP focuses on one specific topic in detail, is there a reason it isn’t titled as on the thread here? For an average developer, the latter would make it much clearer at a glance what the PEP describes than having to remember what a particular version happens to contain, but maybe that has already been discussed and decided. Clarified (I think) per @pf_moore and moved above that the PEP’s title should reflect it as a topical proposal for the metadata spec, not a new spec itself
However, the major changes related to adding a License-Expression do raise several important questions, the answers to each of which affects the others, and which have been briefly discussed in isolation, but without a coherent proposal considering all of them. These include:
Should the License-Expression field be mutually exclusive with License and the license classifiers? Should this exclusivity apply only to user-provided values for both fields, or to the actual fields in the METADATA file (the former would make sense if License is planned for eventual deprecation, and allow build tools to automatically fill License from License-Expression per the third point here).
Should License be deprecated (and if so, on what schedule?), retained indefinitely, or that question addressed in a future PEP? (Perhaps declare it deprecated in v2.3, and scheduled for removal in v3.0?)
Should the value of the License-Expression field in the core metadata be back-filled to the License field, at least during the deprecation period? If so, should it be to the SPDX identifier, or the full human-readable license name? (For v2.3 at least, back-filling would seem prudent, since otherwise tools expecting a license or license classifier would break. Backfilling the SPDX identifier would seem simpler, match current use of that field for such, and keep the use consistent between them.)
Should this new metadata version be 2.3, or 3.0? (Maybe deprecate and back-fill License in 2.3, and plan to remove it in 3.0.)
Based on what has been discussed, a proposed coherent strategy to address these questions might be in the form of the following changes to the current PEP 639 specification for v2.3, following the comments above:
If a package developer inputs a value for the License-Expression field, developer-facing build/publishing tools SHOULD raise an error if it is not a valid SPDX identifier, SHOULD issue a warning if it is a deprecated identifier, and MUST backfill its value verbatim to the License field. User-facing install tools MAY issue a warning in this case.
Developer-facing build tools SHOULD, and user-facing package installation tools MAY, issue a warning if the License-Expression field is not provided
If the License-Expression field is present, developer-facing tools MUST raise an error, and user-facing tools MAY issue a warning, if the License field or license-related classifiers are manually specified
Otherwise, if a package developer inputs a value for the License field or any license-related classifiers, developer-facing build tools SHOULD and user-facing install tools MAY issue a warning recommending the License-Expression field instead.
Publishing tools MAY use the License field or license-related classifiers to infer a value to fill the License-Expression field, but the former MUST be a valid SPDX identifier or the latter MUST unambiguously map to one, and if both are the case, both MUST map to an identical SPDX identifier. If this is done, tools MUST issue a clear warning.
Both developer- and user-facing package build tools SHOULD issue a warning if neither field nor any license classifier is present.
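To make the interplay of these rules easier to follow, here is a minimal sketch of how a developer-facing build tool might apply them. Everything in it is hypothetical: the function and class names are made up, the two identifier sets are tiny stand-ins for the real SPDX license list, and it only handles a single identifier rather than a compound expression. The inference rule for publishing tools is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: tiny stand-ins for the real SPDX license list, for illustration only.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"}
DEPRECATED_SPDX_IDS = {"GPL-3.0", "GPL-3.0+"}


@dataclass
class CheckResult:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    # Value the tool would write to the License field of the built metadata.
    license_field: Optional[str] = None


def check_license_inputs(license_expression=None, license_text=None, classifiers=()):
    """Apply the proposed v2.3 rules to developer-supplied inputs (hypothetical helper)."""
    result = CheckResult(license_field=license_text)
    has_classifiers = any(c.startswith("License ::") for c in classifiers)

    if license_expression is None:
        # License-Expression missing: warn, then warn about what was (or wasn't) used instead.
        result.warnings.append("License-Expression is not provided")
        if license_text is not None or has_classifiers:
            result.warnings.append("prefer License-Expression over License/classifiers")
        else:
            result.warnings.append("no license information is present")
        return result

    # License-Expression present: it must not be combined with the older mechanisms.
    if license_text is not None or has_classifiers:
        result.errors.append("License-Expression used together with License/classifiers")
    if license_expression in DEPRECATED_SPDX_IDS:
        result.warnings.append(f"{license_expression} is a deprecated SPDX identifier")
    elif license_expression not in KNOWN_SPDX_IDS:
        result.errors.append(f"{license_expression} is not a valid SPDX identifier")

    # Back-fill the License field verbatim, but only for input that passed the checks.
    if not result.errors:
        result.license_field = license_expression
    return result
```

For example, `check_license_inputs(license_expression="MIT")` yields no errors or warnings and back-fills `License` with `MIT`, while passing both `license_expression` and `license_text` produces an error. Skipping the back-fill on error is a choice made for this sketch, not something the rules above spell out.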
A notional specification for the field in v3.0 in a future PEP could be the following:
Developer-facing build/publishing tools SHOULD raise an error if the License-Expression field is not present or is not a valid SPDX identifier, and SHOULD issue a warning if it is a deprecated identifier. User-facing install tools SHOULD issue a warning and MAY raise an error in this case.
Developer-facing tools SHOULD raise an error, and user-facing tools MAY issue a warning, if the License field or license-related classifiers are specified
On this point, no. The canonical location for the metadata spec is not a PEP, but is the definition on packaging.python.org. So metadata changes no longer “replace” anything, they stand alone as proposals for modifications to the canonical spec.
I’d not spotted this before, but you’re correct - PEP 639 shouldn’t be written as an incremental change from the previous PEP, but should take the approach of PEP 643:
The PEP states the motivation and details of the change, and notes that a new metadata version will be needed to introduce the change. None of this needs to be substantially different from the existing information, it’s just a rephrasing to reflect the actual process.
Either in parallel, or once the PEP has been accepted, a PR to the core metadata spec that implements the change needs to be submitted. Given how long this PEP has been in development, I’d suggest that creating that PR once the PEP has been finalised and accepted would make more sense - keeping a PR up to date all of this time would have been a major pain.
Thanks for the clarification! That makes a lot more sense than having to traverse an incremental series of PEPs that revise rather than actually replace each other, and focus mostly on motivation and explanation rather than the core spec. An initial draft of my post mentioned this more explicitly, but as I wasn’t sure if PEP 643 represented a new standard or was different in form for another reason, I didn’t want to potentially derail my post too far by proposing a shift to a whole new convention that I wasn’t sure was canonical. I’ve revised it accordingly.
It might be a good idea to still at least mention somewhere visible (e.g. in the abstract) that the changes this PEP proposes will bump the core metadata version to 2.3. As an interested but non-expert reader, it wasn’t clear to me that PEP 643 actually proposed modifying the core metadata spec (as opposed to just the sdist spec) until nearly halfway through the prose, and the actual version was only mentioned in one spot in the middle of the text.
Thanks! Since the reframing changes suggested by @pf_moore seem pretty mechanical, let me know if I can help, though it sounds like you have things covered already.
I would suggest that the new License-Expression field should be a full replacement for the current License field and all license Classifiers.
At the moment there is sometimes ambiguity about which license was used if the ones specified don’t match. The situation won’t improve if an additional field is added, especially when it isn’t clear to devs which one to use. So developer-facing tools should IMO report an error if License-Expression is used together with License or a license Classifier.
Since the License field is so widely used, maybe a soft deprecation might be the way to go. Print a warning for developers, but don’t remove it for the time being. I imagine it shouldn’t be difficult for most to update once they see the warning, and a hard deprecation with removal can always follow later.
Is there a map between {License strings, License trove classifiers} and SPDX {URIs, spdx:licenseId, spdx:name} that could be used to lookup the appropriate License Expression if both are specified and/or in a deprecation warning?
Yup, that’s basically what the above tentative specification proposal implements, in more formal language. For metadata version 2.3, following previous discussion, it proposes making it an error for the developer to input the new License-Expression field together with License or license-related Trove classifiers, while specifying that developer-side tools should issue a warning when License or the classifiers are used on their own. For backward compatibility, as suggested by others, it requires tools supporting that standard to back-fill the License field in the built metadata with the SPDX identifier, so tools currently expecting it don’t break. A notional v3.0 release, as would be proposed by a future PEP, would remove the License field completely and require at least developer-facing build tools to make using license-related Trove classifiers an error.
However, as I’m not a packaging expert unlike most people here, at the very least the details likely need refinement, particularly:
Clarifying the distinction between developer-specified project metadata for build tools to consume (PEP 621, pyproject.toml, setup.cfg, etc), and the actual core metadata in the built packages (PKG-INFO, METADATA, etc), which matters for tools backfilling the License field or forward-filling the License-Expression field
If either of the above is done, do tools need to mark those metadata fields dynamic per PEP 643? Should an explicit note be made of this?
Deciding on and defining more clearly what should be considered a “package developer-facing tool/command” vs. a “package user-facing tool/command”, as specified above, or a “publishing” tool vs. a “build” tool, as specified in the original PEP. @pombredanne was the former intended to mean basically just twine, or was it intended more broadly, closer to the working definition in the proposal above?
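To illustrate the first point, here is a rough sketch of the difference between what the developer inputs and what ends up in the built core metadata once a tool back-fills the License field. The function name and the `example-pkg` project are made up, and metadata version 2.3 with a License-Expression field is only what is being proposed in this thread, not an accepted format.

```python
from email.message import Message


def build_core_metadata(name, version, license_expression):
    """Render METADATA/PKG-INFO style text from developer input (hypothetical helper)."""
    msg = Message()
    msg["Metadata-Version"] = "2.3"  # proposed in this thread, not an accepted version
    msg["Name"] = name
    msg["Version"] = version
    # The only license value the developer supplied:
    msg["License-Expression"] = license_expression
    # Back-filled by the tool so consumers that only read License keep working:
    msg["License"] = license_expression
    return msg.as_string()


print(build_core_metadata("example-pkg", "1.0", "MIT"))
```

Here the developer only ever writes the SPDX identifier once, in their project configuration, while the `License` line exists solely in the tool-generated output.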
Is there a map between {License strings, License trove classifiers} and SPDX {URIs, spdx:licenseId, spdx:name} that could be used to lookup the appropriate License Expression if both are specified and/or in a deprecation warning?
The actual mapping isn’t 1:1, which was what sparked pypa/trove-classifiers#17, which in turn was the genesis of this PEP, as I understand it. I don’t see a programmatic tool that at least maps the unambiguous cases on that issue or on the SPDX org, but as @pombredanne is a member over there, he would likely know far better than a random passerby like me. However, I happened to just be discussing a related matter with them over on spdx/LicenseListPublisher#77 in the context of a potential application of this PEP, updating the FSF data to be useful for potentially validating that a License-Expression is FSF/OSI approved. Perhaps it could take the form of something like an API/database similar to the FSF-API one, except much simpler, without all the XML scraping, and maintained under the PyPA umbrella (I’m not a member or anything, but would be willing to volunteer to help prototype something).
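As a very rough idea of what a lookup covering only the unambiguous cases might look like, here is a small sketch. The table is an illustrative subset I put together by hand, not an official or complete mapping, the candidate lists for the ambiguous classifiers are not exhaustive, and the function name is made up.

```python
# Assumption: hand-picked classifiers treated as mapping to exactly one SPDX identifier.
UNAMBIGUOUS = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}

# Classifiers that could mean more than one SPDX identifier (candidates not exhaustive).
AMBIGUOUS = {
    "License :: OSI Approved :: BSD License": ("BSD-2-Clause", "BSD-3-Clause"),
    "License :: OSI Approved :: GNU General Public License v3 (GPLv3)": (
        "GPL-3.0-only",
        "GPL-3.0-or-later",
    ),
}


def infer_spdx_id(classifiers):
    """Return a single SPDX identifier, or None if the classifiers don't pin one down."""
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    candidates = set()
    for classifier in license_classifiers:
        if classifier not in UNAMBIGUOUS:
            # Ambiguous or unknown classifier: refuse to guess.
            return None
        candidates.add(UNAMBIGUOUS[classifier])
    # Only infer when every license classifier agrees on one identifier.
    return candidates.pop() if len(candidates) == 1 else None
```

With this, `infer_spdx_id(["License :: OSI Approved :: MIT License"])` gives `"MIT"`, while the BSD or GPLv3 classifiers give `None`, which is where a tool would have to fall back to a warning asking the developer to specify License-Expression explicitly.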
As far as I can see, the published PEP 639 is way out of date. So that’s the first step, to create a PR with the various agreed changes submitted, and get that merged. Once that’s done, I think we need another round of discussion, just because it’s so long since there was an update.
I’m slightly surprised at the fact that the PR is so out of date - I’d seen github notifications which I’d assumed were PEP updates, but maybe they were against a “work in progress” repo I’d somehow been subscribed to without realising.
Somewhat off-topic, but I’m getting repeatedly spammed by github notifications from what looks like everyone’s clone of the peps repository, because one of the commits for this PEP included “CC: @pfmoore” in the commit message. I suspect a lot of other people are getting similar notifications.
First of all, please don’t include GitHub tags in commit messages, as this is really annoying.
Secondly, is there any way of fixing this so it stops spamming me (and others)?