Improving license clarity with better package metadata

An important point about proprietary licenses: there is a quasi-infinite number of them, and they are often not public and therefore hard to catalog. They can easily be handled under the “Proprietary” catch-all identifier, and this should not anger anyone IMHO. In contrast, there is a finite and slowly growing number of open source licenses, so using more precise IDs makes sense.

Now, here is the intent of this PEP:

  1. As a package consumer I would like this license thing to be crystal clear so I know quickly if I am dealing with a known open source license or a proprietary license.
  2. As a package author, I would like this to be as simple and non-intrusive as possible.

So what about this alternative take:

  1. we continue to use the License field and have tools warn against the usage of license-related Classifiers

  2. The License field is optional and can be one of three things:
    2.1. a valid SPDX license expression (with extra Proprietary and Public-Domain ids). Everything is fine and jolly. https://spdx.org/licenses/#Deprecated%20License%20Identifiers
    2.2. an invalid license expression string, some other string, an empty string, or no value at all: the license is assumed to be “Proprietary”. Tools are encouraged to provide a warning.

  3. At a later stage, in a new metadata 2.3 version the license-related Classifiers are deprecated entirely.
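
To make the rules above concrete, here is a minimal sketch in Python of how a tool might interpret the License field. Everything in it is an assumption for illustration: `KNOWN_IDS` is a tiny sample rather than the full SPDX list, the names `is_valid_expression` and `interpret_license` are hypothetical, and the check only understands flat `ID AND ID OR ID` expressions (no parentheses, no `WITH` exceptions).

```python
# Illustrative sample only; a real tool would load the full SPDX license list.
KNOWN_IDS = {
    "MIT", "Apache-2.0", "BSD-3-Clause", "GPL-2.0-only", "GPL-3.0-or-later",
    "Proprietary", "Public-Domain",  # the two extra ids proposed above
}
OPERATORS = {"AND", "OR"}


def is_valid_expression(text):
    """Return True for a flat 'ID (AND|OR ID)*' expression over KNOWN_IDS."""
    tokens = text.split()
    if len(tokens) % 2 == 0:  # empty string, or a dangling operator
        return False
    ids, ops = tokens[0::2], tokens[1::2]
    return all(t in KNOWN_IDS for t in ids) and all(t in OPERATORS for t in ops)


def interpret_license(value):
    """Return (effective_license, warning_or_None) for a License field value."""
    if value is not None and is_valid_expression(value):
        return " ".join(value.split()), None
    # Invalid expression, free text, empty string, or missing field.
    return "Proprietary", "License field is missing or not a valid expression"
```

With this sketch, `interpret_license("MIT OR Apache-2.0")` keeps the expression, while `interpret_license("GPL")` and `interpret_license(None)` both fall back to “Proprietary” with a warning.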

And entirely separately, I could later craft another PEP related to PyPI such that only non-empty and valid license expressions are allowed (TBD with or without proprietary licenses).

That’s a fair point, and a topic of recent discussions @ SPDX about creating a “namespace” concept that would effectively allow any license. Today, they effectively take a stance towards open source (and in some cases at least some “source available” proprietary licenses). I agree that Python, the PSF and this PEP should not care about or impose any restrictions on which license can be used for a Python package in general.


@dustin if the license-related “Classifiers” were to be deprecated, how would twine/warehouse behave? would warehouse still accept the deprecated license classifiers? or would they be rejected when you try to upload a package?

@pombredanne There’s more details in https://github.com/pypa/warehouse/issues/2996, but essentially PyPI has the notion of a “deprecated” classifier, and if you try to publish a distribution that uses a deprecated classifier, you get a 400 error which is displayed by twine.

Twine does not have a mechanism right now for allowing an upload to succeed, but printing some warning/notification.


Yes, I think it would be better to follow the process. Otherwise, it gives people the impression that it’s okay to post drafts there while they’re still being prepared. You can still have the draft in a personal branch.

I updated the topic with a link to https://github.com/pombredanne/spdx-pypi-pep/pull/2 instead and closed https://github.com/python/peps/pull/1148 accordingly


Hello,

I originally reported issue #2996 after facing issues with license classifiers. I am mostly interested in making packaging easier for distribution developers (people who work on Fedora, Debian, *BSD, etc.).

The current situation (license classifiers + a “License” field that can be used as a fallback) is a bit confusing for package authors, and also makes it hard for automated tools to figure out what license is used. My workflow when trying to identify a license is the following:

  • Are the classifiers non-ambiguous? If yes, we are done.

  • Is the License field non-ambiguous? If yes, we are done.

  • Is there a LICENSE file in the project’s repo, and can we figure out what it is? If yes, we are done.
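
As a rough illustration of that three-step fallback (this is not the code from my tool), a sketch in Python could look like the following. The mapping `UNAMBIGUOUS_CLASSIFIERS`, the set `KNOWN_SPDX_IDS`, the function name `identify_license`, and the `detect` callback used for LICENSE file text are all hypothetical and deliberately tiny.

```python
# Hypothetical sample: classifiers that map to exactly one SPDX identifier.
UNAMBIGUOUS_CLASSIFIERS = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
}
# Hypothetical sample of SPDX identifiers accepted in the License field.
KNOWN_SPDX_IDS = {"MIT", "ISC", "BSD-3-Clause", "GPL-3.0-or-later"}


def identify_license(classifiers, license_field=None, license_text=None, detect=None):
    """Return an SPDX identifier, or None if no step gives a clear answer."""
    # Step 1: exactly one license classifier, and it is unambiguous.
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    if len(license_classifiers) == 1:
        spdx_id = UNAMBIGUOUS_CLASSIFIERS.get(license_classifiers[0])
        if spdx_id is not None:
            return spdx_id
    # Step 2: the License field already holds a known identifier.
    if license_field and license_field.strip() in KNOWN_SPDX_IDS:
        return license_field.strip()
    # Step 3: try to recognize the LICENSE file text with a caller-supplied detector.
    if license_text and detect is not None:
        return detect(license_text)
    return None
```

For example, a package with only the ambiguous “License :: OSI Approved :: BSD License” classifier but `License: BSD-3-Clause` is resolved at step 2.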

This gives the following crazy code in a tool I maintain. I would love to see non-ambiguous SPDX identifiers make their way into Python packaging.

Rest assured I do not want to turn all Python programmers into license lawyers :slight_smile: There are already lots of things people do not care about, but end up doing anyway: writing tests, writing a setup.py file, etc. Licensing should be no different. An unlicensed package is hard to include in a distribution, since maintainers have strict rules about what they may or may not include in their distro. As @pombredanne pointed out, people who do not care can select a license such as MIT and use it for all of their projects. I am also under the impression that most people choose a license for their packages, but cannot express it in a non-ambiguous way because of the current limitations. For instance, scanning ~180k packages on PyPI, I counted more than 4500 different values for the “license” field.

I like the idea of repurposing the “License” field. Some people already use SPDX identifiers in this field (I do for my BSD-3-Clause projects, and I know other projects do it too). Which means some projects would be compatible with the new semantic without changing a single line.

I understand that changing the semantic of a field might be frowned upon, so I do not have a strong opinion on this and would be fine with a new field as well.

We could probably be quite lenient in the beginning and only enforce stricter rules once we’re confident the new semantic works well enough. It would probably be great to have a fallback option (ie “License: dont-bother-me-your-new-field-is-buggy-as-hell”): this would let us analyze why the author was not able to specify the license they wanted to use and fix issues.

Warehouse can sometimes “refuse” an upload and return a 400 error. In the future, maybe it should do so when the license is not properly defined and the issue can easily be fixed. For instance, if the license is set to “GPL”, the upload could fail and the user could be shown a message listing all the possible GPL identifiers from the SPDX list.
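
Here is a minimal sketch of what such a check could look like, in Python. It is not how Warehouse works today: `SPDX_IDS` is a small sample of the SPDX list, and the names `suggest_ids` and `validate_upload` are hypothetical. It assumes the lenient behaviour discussed above, where the upload is only refused if the value is an easily fixable prefix of known identifiers.

```python
# Small illustrative sample of SPDX identifiers, not the full list.
SPDX_IDS = [
    "MIT", "BSD-3-Clause",
    "GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later",
    "LGPL-2.1-only", "LGPL-3.0-only", "AGPL-3.0-only",
]


def suggest_ids(value):
    """Return known identifiers that start with '<value>-', ignoring case."""
    prefix = (value or "").strip().upper() + "-"
    return [spdx_id for spdx_id in SPDX_IDS if spdx_id.upper().startswith(prefix)]


def validate_upload(license_value):
    """Return (http_status, message) for a hypothetical upload check."""
    if license_value in SPDX_IDS:
        return 200, "OK"
    suggestions = suggest_ids(license_value)
    if suggestions:
        # Easily fixable: refuse and tell the user which identifiers exist.
        return 400, "Ambiguous license %r, did you mean one of: %s" % (
            license_value, ", ".join(suggestions))
    # Not easily fixable: stay lenient and accept the upload.
    return 200, "Accepted, but the license could not be recognized"
```

With this sample list, `validate_upload("GPL")` returns a 400 whose message lists the four `GPL-*` identifiers (and not the `LGPL-*` or `AGPL-*` ones).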

Speaking as someone who is interested in distro packaging, what I really care about is having reliable and useful info from https://pypi.org/pypi//json. I think setuptools/flit/poetry/etc. could probably tell the user “hey, by the way, you should probably drop these classifiers, and use XXX instead”. If I remember correctly, poetry already recommends using SPDX identifiers.

I would indeed like such tools to be community-maintained.


Regarding the change process, the specifications section under packaging.python.org is intended to be like the Python language reference section under docs.python.org: clarifications and correction of errors and accidental omissions in existing specifications don’t require a PEP, but adding new fields or making significant changes to existing fields is likely to still need one in order to fully document the rationale for the related design decisions.

The old process (which didn’t work very well, hence the change in PEP 566) tried to use the PEPs themselves as both the reference document and to provide the rationale for change from the previous iteration, which made it hard to tell what was actually changed and what remained the same relative to the previous version.

https://www.pypa.io/en/latest/specifications/#handling-major-updates (and the preceding section on clarifications and minor updates) attempts to document that distinction, so if there’s wording that could be clarified there, suggestions would be appreciated.


Got it, thanks for the clarification @ncoghlan (and for the confusion @pombredanne). :+1:


Based on https://poetry.eustace.io/docs/pyproject/#license @sdispater is indeed recommending SPDX ids and is listing some. This is not yet expressions but as close as it gets.

@takluyver’s Flit doc is mostly consistent with the Core metadata docs:

license
The name of a license, if you’re using one for which there isn’t a Trove classifier. It’s recommended to use Trove classifiers instead of this in most cases.

@jaraco’s Setuptools also lists the license_file option, which was originally introduced by the wheels tool. This doc section on metadata also lists using both the “license” and the “classifiers” fields, which is not aligned with the PEP 566 doc and the Core metadata license-related texts at packaging.python.org.

Using a single license_file (singular) in wheels “metadata” has since been replaced by the plural license_files list by @agronholm with @njs support (based on some ticket I had originally entered on wheels @ bitbucket).

See also this doc in wheels: https://github.com/pypa/wheel/blob/b8b21a5720df98703716d3cd981d8886393228fa/docs/user_guide.rst#including-license-files-in-the-generated-wheel-file

There used to be an option called license_file (singular). As of wheel v1.0, this option has been deprecated in favor of the more versatile license_files option.

@pf_moore would you agree that handling license files also needs to be addressed in this PEP so we have a clean, consistent and properly documented one single way to handle licensing documentation in packages?

Thanks for chiming in Nick! Much appreciated!

Let’s write this down somewhere in the PyPA Specifications page?


I think it would be acceptable for the proposed change (and the PEP) to not make any comment about license files, but I think that doing so would be better - as you say, it makes the proposal into a complete review of licensing, and a proposal to address the whole area.

If you’re happy to expand the scope to include license files, it seems like a good idea (in general, your approach with the PEP has been very good so far, so I’m happy to trust your judgement on questions like this :slightly_smiling_face:)

Nick did point to here, but maybe that needs to be more discoverable or clearer? Both @di and I missed it, which implies the answer is “yes”, but I’m not sure what could be improved…

I think the problem is actually the opening paragraph on https://packaging.python.org/specifications/, as the link to the process page is the subtle “pypa.io” one at the end.

It would probably be a lot clearer if the subsection titles were repeated as bullet points on the main spec page, with direct links to the relevant part of the process page.

I have always wanted to add more categories to wheel, including a “docs” category. In the wheel you would have a *.data/docs or e.g. *.data/license which could be installed somewhere sensible.

+1 on SPDX metadata


I haven’t read all the posts here, but hopefully, when this is all sorted out, someone will create a ticket against wheel with a link to the spec so I can then implement it. Thx :+1: