File values for license metadata

According to PEP 621, a Python package can be distributed with a license metadata field in pyproject.toml set to either file (a path to the license file) or text (the license text itself).

Bokeh sets file but this results in a very awkward rendering on PyPI:

and additionally a very awkward shields.io badge (that’s not an hrule below, that’s the actual badge with all the text):

Is this intended? I assumed that since we have a file, it makes sense to specify the file as the single source of truth. That is also what the example in the PEP shows (https://peps.python.org/pep-0621/#example). Should we instead just duplicate the file contents and specify text = "BSD-3-Clause" or similar inside pyproject.toml?

We have the same problem in meson-python. The explanation seems to be that PyPI maintainers decided to punt on actually implementing support for the license field in PEP 621, and are waiting for PEP 639: The way license appear on PyPi seems wrong · Issue #129 · mesonbuild/meson-python · GitHub.

Would { text = "BSD-3-Clause" } function as a workaround for now, then? (i.e. has someone tried it? I don’t want to bother with a change if it won’t be useful.)

AFAIU, if you omit the license metadata entry, PyPI takes it from the classifiers “License ::” entries.

FWIW, specifying the license with file works fine for several of my projects.

For example:
PyPI
pyproject.toml

So the behavior you observe is clearly not intended.

PyPI presumably checks the license file against a set of known licenses, and uses the shorthand if a match is found. Bokeh’s LICENSE.txt file is not identical to the OSI’s version of BSD-3-Clause. The only dynamic fields in the OSI’s version are <YEAR> <COPYRIGHT HOLDER> at the top of the license, while bokeh additionally replaces “the copyright holder” with “Anaconda” in the license text. The OSI’s version also uses numbered bullets for the three clauses, which bokeh does not.

The above is just a guess. GitHub is clearly able to figure out the intended license, so if I’m right, PyPI’s matching criteria are probably too strict.

Edit: Actually, reading the linked GitHub thread above, the reason it works for me might be that I use flit for packaging. Apparently it omits the “License:” metadata entry entirely when creating PKG-INFO (possibly to work around this very issue?) and just relies on the classifier instead.

This seems plausible. It seems like the best workaround at present is just to omit the license field altogether.

This is https://github.com/pypi/warehouse/issues/12392. PyPI isn’t doing any fancy matching to determine the license; we just have a simple heuristic: it takes the license from the classifiers and the first line of the license file, and combines them depending on what is/isn’t present.

The issue appears to be that some build tools have recently started removing newlines from the License metadata field, so the ‘first line of the license file’ ends up being the entire file. You can see with Bokeh that it’s taking BSD License from the classifiers, and then putting the entire LICENSE.txt file in the parentheses.
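As a rough illustration of the heuristic described above, here is a minimal sketch. It is not Warehouse's actual code: summarize_license is a hypothetical name, and the "classifier (first line)" format is an assumption inferred from how the Bokeh page renders.

```python
def summarize_license(classifiers, license_field):
    """Hypothetical helper: combine license classifiers with the
    first line of the License metadata field."""
    # "License :: OSI Approved :: BSD License" -> "BSD License"
    short_names = [
        c.split(" :: ")[-1] for c in classifiers if c.startswith("License :: ")
    ]
    # Only the first line of the field is used. If the newlines were
    # stripped upstream, the "first line" is the entire license text.
    lines = (license_field or "").strip().splitlines()
    first_line = lines[0].strip() if lines else ""

    from_classifiers = ", ".join(short_names)
    if from_classifiers and first_line:
        return f"{from_classifiers} ({first_line})"
    return from_classifiers or first_line


CLASSIFIERS = ["License :: OSI Approved :: BSD License"]
with_newlines = "Copyright (c) Example Authors\nAll rights reserved."
flattened = " ".join(with_newlines.splitlines())

print(summarize_license(CLASSIFIERS, with_newlines))
print(summarize_license(CLASSIFIERS, flattened))
```

With newlines intact, only the copyright line lands in the parentheses; once the field is flattened, everything does, which matches the symptom on the Bokeh page.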

I’m not sure what the reasoning behind that change in the build tools would be, because AFAIK newlines in this field have always been valid, and PEP 621 didn’t change anything regarding that.

I created https://github.com/pypi/warehouse/pull/12653 to ‘fix’ this for releases that have long License fields without newlines, but that’s about the best we can do – IMO the right fix in the short term would be for the build tools to ensure that metadata retains newlines, and in the long term it’s PEP 639.

To be clear, the PEP 621 authors punted on defining a proper License field, not the PyPI maintainers on implementing it. There’s just some overlap between the two sets.

Thanks for the additional context @dustin

> The issue appears to be that some build tools have recently started removing newlines from the License metadata field,

Just clarifying, then in this case that would be setuptools [1]? @abravalheri Do you know if this behavior is intentional? At a glance, I didn’t see any open issues on the setuptools tracker that looked like this. I’m happy to open one if it would be useful.


[1] since: build-backend = "setuptools.build_meta"

I don’t recall an intentional change in this direction (but I might be missing something).

I remember that recently we modified pypa/wheel because it was having problems with utf8: Allow `METADATA` file to contain UTF-8 chars by abravalheri · Pull Request #489 · pypa/wheel · GitHub

Not sure if this change might have an undesirable side effect…

I had a look in the inspector, and the License field in the METADATA file for bokeh 3.0.2 seems to have some sort of newline characters (maybe they are not the same as what PyPI expects).

@abessman Actually can you elaborate on why you think this is the case? The OSI license specifies

Copyright <YEAR> <COPYRIGHT HOLDER>

And Bokeh starts with

Copyright (c) 2012 - 2022, Anaconda, Inc., and Bokeh Contributors

In this case the copyright holder is all of “Anaconda, Inc., and Bokeh Contributors”, so I don’t see how that is inconsistent with the OSI version. Are you referring to the (c) or the comma in between, which do seem to be extra?

Hmm, that’s what I’d expect. Seems like build tools are off the hook, and this is coming from upload clients, something about how they’re interacting with PyPI, or PyPI itself (although I’m not aware of any changes here).

Looks like this might be a bug in pkginfo; this strips the newlines:

$ pkginfo bokeh-3.0.2-py3-none-any.whl | grep license:
license: Copyright (c) 2012 - 2022, Anaconda, Inc., and Bokeh Contributors All rights reserved.  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  Neither the name of Anaconda nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

but with the same METADATA file read with the stdlib email module, they’re preserved:

$ python -c 'from email.parser import Parser;print(Parser().parse(open("bokeh-3.0.2.dist-info/METADATA"))["license"])'
Copyright (c) 2012 - 2022, Anaconda, Inc., and Bokeh Contributors
        All rights reserved.

        Redistribution and use in source and binary forms, with or without modification,
        are permitted provided that the following conditions are met:

        Redistributions of source code must retain the above copyright notice,
        this list of conditions and the following disclaimer.

        Redistributions in binary form must reproduce the above copyright notice,
        this list of conditions and the following disclaimer in the documentation
        and/or other materials provided with the distribution.

        Neither the name of Anaconda nor the names of any contributors
        may be used to endorse or promote products derived from this software
        without specific prior written permission.

        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
        ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
        LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
        CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
        SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
        INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
        CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
        ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
        THE POSSIBILITY OF SUCH DAMAGE.
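To make the difference concrete without needing the Bokeh wheel, here is a small self-contained sketch. The METADATA text is made up, and flatten is a hypothetical stand-in for whatever the upload path does: it assumes the flattening amounts to stripping each line and joining with spaces, which is at least consistent with the double spaces in the pkginfo output above.

```python
from email.parser import Parser

# Made-up METADATA with a folded, multi-line License header.
# The whitespace-only line stands in for a blank line in the license text.
METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "License: Copyright (c) Example Authors\n"
    "        All rights reserved.\n"
    "        \n"
    "        Redistribution and use ...\n"
    "\n"
)

# The stdlib parser keeps the newlines of the folded header value.
preserved = Parser().parsestr(METADATA)["License"]


def flatten(value):
    """Hypothetical flattening: strip each line, join with single spaces."""
    return " ".join(line.strip() for line in value.splitlines())


flattened = flatten(preserved)

# "First line" as a newline-based heuristic would see it in each case.
print(preserved.splitlines()[0])
print(flattened.splitlines()[0])
```

In the first case the first line is just the copyright notice; in the second it is the whole license, because there is no newline left to split on.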