PEP 639: Improving license clarity with better package metadata

I think this PEP looks great and is basically ready.

@pombredanne I suggest that (in about a week or so) you go ahead and request the PEP-delegate (@pf_moore!) to decide on it – assuming the PEP submission doesn’t spark a new track of discussion here. :wink:

2 Likes

I reviewed the PEP, overall it looks great! I made a lot of small edits as I reviewed (the majority seem to be minor differences between rST and Markdown syntax and other small clarifications) so I created a PR to capture all of them, feel free to accept/reject any of the suggestions as you see fit: PEP 639: edits from review by di · Pull Request #1628 · python/peps · GitHub

One thing that’s not super clear from the PEP: when discussing the mapping from Classifier fields to the License field, it says:

Publishing tools MAY infer or suggest an equivalent license expression from the provided License or Classifier information if they are able to do so unambiguously

I think it would be good to explain when/how this can be done unambiguously, and when it can’t (e.g. with License :: OSI Approved :: Apache Software License)

I also think we could potentially take this a step further and say something like:

Classifiers that have SPDX identifiers in parentheses can be mapped to the License field unambiguously

Although this isn’t exactly true, (e.g. License :: OSI Approved :: Academic Free License (AFL) shouldn’t have parens, and License :: OSI Approved :: MIT License should) we could update trove-classifiers to deprecate license classifiers as necessary to make this statement true.

Since publishing tools would otherwise have to homebrew this mapping, it seems to me like this should be something that trove-classifiers provides anyways.

I think I’m still with Donald on this one, but it looks like we’re the minority here so I won’t press it.

I also agree with this; keep the original data and translate it to the old format.

Somehow I missed this important comment and in hindsight I start to warm up to the idea of a separate field

The primary reason to reuse License it is to avoid field inflation and the fact that it has been feeling to me a little like “we have two license fields that are messy, let’s add a third one to fix them.” But Donald’s point about never being to validate anything strictly and with warnings being ignored and lost in the noise is a major one.

So let’s recap: The ability to enforce a strict validation on the client side would indeed be improved if we have a dedicated field, say a License-Expression. This means that we should deprecate the License field.

  1. The behaviour for client tools would become:
  • return warnings when using License or Classifier with a license.
  • validate strictly the License-Expression field when present (and fail too if a License or Classifier with a license are present)
  1. The behaviour for PyPI would become:
  • validate strictly the License-Expression field when present (and fail too if a License or Classifier with a license are present)
  • later, mandate the presence of this field.

Note that for PyPI the validation opportunities are improved as it becomes possible to valdiate things right away.

So after all, I am fine with using a new License-Expression field if we can get consensus on this.

Can you elaborate a bit what you meant there?

Having a mapping published alongside trove-classifiers of {classifier: license id} would be a great addition. (Side note in License :: OSI Approved :: Academic Free License (AFL), AFL is not a valid SPDX license id and is ambiguous since it does not specify a license version so we likely may not be able to provide a clear mapping there, even though in practice it may likely be v3.0 most of the times.)

Thank you++. This looks all A-OK. Please go ahead and merge. I suggested a few extra type tweaks on top.

This sounds fine, I would probably scope the warnings to author centric clients like twine, setupools, flit. I’m not sure it makes sense to have warnings on pip, just because the people running pip install are rarely going to actually be in a position to do anything about those warnings, so for them it’ll just noise.

3 Likes

To pile on a bit more, I also feel that having a field to explicitly declare the metadata as “yes, I’m actually a valid SPDX thingie” would be nice. It’s not a blocking concern to me, but, yea, I’d prefer that. :slight_smile:

If we do agree to add a field, I’m personally in favor of reusing the current License key for the use-provided string and adding an additional field that holds the value of true/false, to denote whether it’s actually a validated SPDX identifier. And tell build backends + publishing tooling (setuptools, flit, twine, poetry etc) to print a warning when building/publishing an artifact in this way.

I agree with @dstufft that pip won’t be the correct place to have any warnings, since those users likely can’t action on those warnings.

Yes, this was exactly my point about the AFL license. However, we can update these classifiers to “fix” all the unambiguous ones that are missing or don’t have valid SPDX identifiers.

To play the other side a bit, the reason I’m not pushing for this harder is because it would be ~trivial for a consumer to attempt to parse a free-form License field and unambiguously determine if it was a license expression or not. It means anyone dealing with the metadata has to do a little more work, but not a lot more work.

1 Like

It is effectively fairly trivial to parse a free-form License field and I have a fairly battle tested library to parse license expressions at GitHub - nexB/license-expression: Utility library to parse, normalize and compare License expressions for Python using a boolean logic engine. For expressions using SPDX or any other license id scheme. so it may not require much work alright.

So having played on both side, I can be happy with both. But the light I have seen now is that with a field that can take either free form or a valid license expression, there is no way to tell all the times that a parsing error is because the expression is invalid or because this is not an expression.

NB: I might help with some PR in the various tools that would need expression parsing support if this PEP is accepted.

Provide a new field for the SPDX expression but then also backfill License.

[quote=“brettcannon, post:112, topic:2154, full:true”]

ok, so if I understand your proposal correctly you are suggesting to:

  1. Use a new dedicated License-Expression field that would contain a validated SPDX license expression
  2. Do not deprecate the existing License field
  3. If the License-Expression field is populated, somehow populate the License field from it

For 3. what would it be? just push the License-Expression value to License? Under which conditions?

And if so, what would be the benefit of having both?

Would this be ability for older tools to get license information using the License that they only know of?

I want to make clear that I would put the weight of any ideas from me at the bottom of priority of what everyone else has said here. :slight_smile: My key thing is wanting SPDX expressions, preferably with PyPI requiring that one be provided for all uploads. Past that I’m flexible.

Yes.

Nope, I didn’t say that. I’m fine with it eventually going away.

If possible, yes.

That’s a PEP question. :wink:

Transition, but it’s not necessarily that critical as long as we bump the metadata version.

Isn’t it possible for PyPI to mandate a valid SPDX expression in the License field if Metadata-Version is 2.2 (should it then be 3.0 ?) or above while still accepting any string if Metadata-Version is 2.1 or below ?

Yes, but this would have the effect of “breaking” a lot of currently valid uploads that just always use the latest version of our tooling and aren’t aware of metadata versions. We’d need to introduce a deprecation period first if we did this.

Or setuptools (and other backends) could detect wether the provided License is a valid SPDX expression

  • if the expression is invalid: warn and use Metadata-Version 2.1
  • if the expression is valid: use Metadata-Version 2.2/3.0
1 Like

Shouldn’t the name of this PEP be updated? It became confusing after PEP-643 has been put forward and accepted, as it already introduces v2.2 of the metadata spec.

1 Like

@MrMino Yes you are right. That was expected and I am going to push an update there.

Since this PEP is still in Draft state, is it acceptable to provide additional edits to improve clarity and wording (without changing the meaning)? If so, I’ll send a PR with some suggested changes.

I’ll also add my 2 cents indicating that I’m quite happy to see this PEP in its current state and would be happy to help move it along from Draft to Accepted if there’s anything a community member can do to help!