I think this PEP looks great and is basically ready.
@pombredanne I suggest that (in about a week or so) you go ahead and request the PEP-delegate (@pf_moore!) to decide on it – assuming the PEP submission doesn’t spark a new track of discussion here.
I reviewed the PEP, overall it looks great! I made a lot of small edits as I reviewed (the majority seem to be minor differences between rST and Markdown syntax and other small clarifications) so I created a PR to capture all of them, feel free to accept/reject any of the suggestions as you see fit: PEP 639: edits from review by di · Pull Request #1628 · python/peps · GitHub
One thing that’s not super clear from the PEP: when discussing the mapping from Classifier fields to the License field, it says:
Publishing tools MAY infer or suggest an equivalent license expression from the provided License or Classifier information if they are able to do so unambiguously
I think it would be good to explain when/how this can be done unambiguously, and when it can’t (e.g. with License :: OSI Approved :: Apache Software License)
I also think we could potentially take this a step further and say something like:
Classifiers that have SPDX identifiers in parentheses can be mapped to the License field unambiguously
Although this isn’t exactly true, (e.g. License :: OSI Approved :: Academic Free License (AFL) shouldn’t have parens, and License :: OSI Approved :: MIT Licenseshould) we could update trove-classifiers to deprecate license classifiers as necessary to make this statement true.
Since publishing tools would otherwise have to homebrew this mapping, it seems to me like this should be something that trove-classifiers provides anyways.
I think I’m still with Donald on this one, but it looks like we’re the minority here so I won’t press it.
Somehow I missed this important comment and in hindsight I start to warm up to the idea of a separate field
The primary reason to reuse License it is to avoid field inflation and the fact that it has been feeling to me a little like “we have two license fields that are messy, let’s add a third one to fix them.” But Donald’s point about never being to validate anything strictly and with warnings being ignored and lost in the noise is a major one.
So let’s recap: The ability to enforce a strict validation on the client side would indeed be improved if we have a dedicated field, say a License-Expression. This means that we should deprecate the License field.
The behaviour for client tools would become:
return warnings when using License or Classifier with a license.
validate strictly the License-Expression field when present (and fail too if a License or Classifier with a license are present)
The behaviour for PyPI would become:
validate strictly the License-Expression field when present (and fail too if a License or Classifier with a license are present)
later, mandate the presence of this field.
Note that for PyPI the validation opportunities are improved as it becomes possible to valdiate things right away.
So after all, I am fine with using a new License-Expression field if we can get consensus on this.
Having a mapping published alongside trove-classifiers of {classifier: license id} would be a great addition. (Side note in License :: OSI Approved :: Academic Free License (AFL), AFL is not a valid SPDX license id and is ambiguous since it does not specify a license version so we likely may not be able to provide a clear mapping there, even though in practice it may likely be v3.0 most of the times.)
This sounds fine, I would probably scope the warnings to author centric clients like twine, setupools, flit. I’m not sure it makes sense to have warnings on pip, just because the people running pip install are rarely going to actually be in a position to do anything about those warnings, so for them it’ll just noise.
To pile on a bit more, I also feel that having a field to explicitly declare the metadata as “yes, I’m actually a valid SPDX thingie” would be nice. It’s not a blocking concern to me, but, yea, I’d prefer that.
If we do agree to add a field, I’m personally in favor of reusing the current License key for the use-provided string and adding an additional field that holds the value of true/false, to denote whether it’s actually a validated SPDX identifier. And tell build backends + publishing tooling (setuptools, flit, twine, poetry etc) to print a warning when building/publishing an artifact in this way.
I agree with @dstufft that pip won’t be the correct place to have any warnings, since those users likely can’t action on those warnings.
Yes, this was exactly my point about the AFL license. However, we can update these classifiers to “fix” all the unambiguous ones that are missing or don’t have valid SPDX identifiers.
To play the other side a bit, the reason I’m not pushing for this harder is because it would be ~trivial for a consumer to attempt to parse a free-form License field and unambiguously determine if it was a license expression or not. It means anyone dealing with the metadata has to do a little more work, but not a lot more work.
So having played on both side, I can be happy with both. But the light I have seen now is that with a field that can take either free form or a valid license expression, there is no way to tell all the times that a parsing error is because the expression is invalid or because this is not an expression.
NB: I might help with some PR in the various tools that would need expression parsing support if this PEP is accepted.
I want to make clear that I would put the weight of any ideas from me at the bottom of priority of what everyone else has said here. My key thing is wanting SPDX expressions, preferably with PyPI requiring that one be provided for all uploads. Past that I’m flexible.
Yes.
Nope, I didn’t say that. I’m fine with it eventually going away.
If possible, yes.
That’s a PEP question.
Transition, but it’s not necessarily that critical as long as we bump the metadata version.
Isn’t it possible for PyPI to mandate a valid SPDX expression in the License field if Metadata-Version is 2.2 (should it then be 3.0 ?) or above while still accepting any string if Metadata-Version is 2.1 or below ?
Yes, but this would have the effect of “breaking” a lot of currently valid uploads that just always use the latest version of our tooling and aren’t aware of metadata versions. We’d need to introduce a deprecation period first if we did this.
Shouldn’t the name of this PEP be updated? It became confusing after PEP-643 has been put forward and accepted, as it already introduces v2.2 of the metadata spec.
Since this PEP is still in Draft state, is it acceptable to provide additional edits to improve clarity and wording (without changing the meaning)? If so, I’ll send a PR with some suggested changes.
I’ll also add my 2 cents indicating that I’m quite happy to see this PEP in its current state and would be happy to help move it along from Draft to Accepted if there’s anything a community member can do to help!