PEP 639, Round 3: Improving license clarity with better package metadata

The API to access the data is not the problem. The problem is that PyPI does not currently record the information: no metadata is collected per distribution artifact. In the case of licensing information, making sense of metadata prior to version 2.4 is a lost cause, unless you want to try to interpret free-text licensing information in a gazillion different formats, with a fallback to ambiguously defined classifiers. Metadata 2.4 makes it much easier but still not trivial: the License-Expression metadata field is not mandatory. Licensing information can be expressed by linking to license text via License-File fields, or not be present at all.
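To make the fallback chain concrete, here is a minimal sketch that pulls the license-related fields out of a core metadata document (a wheel's METADATA or an sdist's PKG-INFO). The function name `summarize_license` and the `SAMPLE` text are made up for illustration; only the field names come from the metadata specification.

```python
from email.parser import HeaderParser


def summarize_license(metadata_text: str) -> dict:
    """Collect the license-related fields from core metadata text."""
    msg = HeaderParser().parsestr(metadata_text)
    return {
        "metadata_version": msg.get("Metadata-Version"),
        # Metadata 2.4+: an SPDX license expression; optional.
        "expression": msg.get("License-Expression"),
        # Metadata 2.4+: zero or more paths to license texts.
        "files": msg.get_all("License-File") or [],
        # Legacy fallbacks: free text and trove classifiers.
        "legacy_text": msg.get("License"),
        "legacy_classifiers": [
            c for c in (msg.get_all("Classifier") or [])
            if c.startswith("License ::")
        ],
    }


# Hypothetical metadata used only for this demonstration.
SAMPLE = """\
Metadata-Version: 2.4
Name: example
Version: 1.0
License-Expression: BSD-3-Clause
License-File: LICENSE
"""

info = summarize_license(SAMPLE)
print(info["expression"], info["files"])
```

Even with 2.4 metadata, `expression` can be `None` and `files` can be empty, so a consumer still has to decide what to do when neither is present.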

2 Likes

Except the one reply from Hynek, who hasn’t reacted to your suggestion that his suffering should be endured for the collective good, the only replies are mine and yours, so I don’t know what you are basing your conclusion on.

I was not the one to link time spent to suffering. I don’t consider the time I spend working on open source projects as suffering. I was indeed very surprised when Hynek jumped from time spent to suffering and that motivated my next message on the topic where I lightly suggested that if time spent working on FOSS contributions is perceived as suffering, maybe it is time to revisit something.

I think that the accepted view that maintainers of open source projects have a responsibility toward the users of their code, to the extent that maintaining the project is more important than their well-being, is something that needs to change. Comments like yours, even if intended only as jokes, only reinforce this view.

You are accusing me of mean or inconsiderate behavior on a public forum. I prefer to defend my behavior on the same forum.

The API to access the data is not the problem. The problem is that PyPI does not currently record the information: no metadata is collected per distribution artifact. In the case of licensing information, making sense of metadata prior to version 2.4 is a lost cause, unless you want to try to interpret free-text licensing information in a gazillion different formats, with a fallback to ambiguously defined classifiers. Metadata 2.4 makes it much easier but still not trivial: the License-Expression metadata field is not mandatory. Licensing information can be expressed by linking to license text via License-File fields, or not be present at all.

As has been pointed out repeatedly in prior discussions, license information reported for packages on PyPI is at best a strong hint as to the copyright license(s) covering downloads for that project. Anyone concerned about the actual licenses for all of the files contained within the downloads in each project’s release needs to consult the files shipped in those projects, or their upstream developers’ documentation. Recent metadata changes improve this, but do not address all possible complexities of applying copyright licenses in projects.

1 Like

I intended no such accusation, and I apologise that it was perceived that way. For anyone else reading this, I’ll clearly state that I don’t believe Daniele was being mean or inconsiderate.

That’s fair. They aren’t directly the same thing, though I do understand how they logically link together (primarily because the time spent in this case is time that didn’t need to be spent except that we forced it to be, and that is what maintainers often refer to as “suffering”).

Again, I apologise that my comments were seen that way. I certainly don’t endorse suffering in any real sense. Though it’s important to understand that the term is commonly used in OSS maintainer circles to refer broadly to unnecessary demands being placed upon maintainers, and in that sense most maintainers do feel a responsibility to “suffer” for their projects and their users. We hope the other parts of the experience are rewarding enough to make up for it, and they often are, but we also help by not minimizing the additional work we cause when our own contributions don’t go smoothly.[1]


  1. And for complete clarity, this is not directed at Daniele specifically, but all of us on this forum. ↩︎

3 Likes

@daniele @steve.dower you’re derailing the discussion here. This is not a topic I can sensibly split, because there is no category that would fit your back and forth here. So let’s leave it at that, otherwise I’ll have to hide the off-topic messages that users keep flagging.

11 Likes

Nope, I was quite responsible in fact and waited to release the new version of Hatchling until others confirmed that they would not be broken: Hatchling v1.27.0 changelog dead link + no tag · Issue #1842 · pypa/hatch · GitHub

5 Likes

FTR, I was just checking the state of it and realized that it was merged 3 weeks ago: Switch to packaging for parsing metadata and support metadata 2.4 by dnicolodi · Pull Request #1180 · pypa/twine · GitHub. So it seems like the last bit everybody’s waiting for is a new Twine release, unless I’m missing something.

4 Likes

… and Twine 6.1.0 was released yesterday.

4 Likes

And I updated pypi-publish about an hour ago: @webknjaz.me on Bluesky. So we’re all set here, I think.

5 Likes

Thank you, @webknjaz!
With that, I just sent the final PR to the PEP: PEP 639: Mark Final by befeleme · Pull Request #4227 · python/peps · GitHub

6 Likes

Thanks all, I just made a release with these changes:

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -2,7 +2,7 @@
 build-backend = "hatchling.build"
 requires = [
   "hatch-vcs",
-  "hatchling",
+  "hatchling>=1.27",
 ]

 [project]
 ...
 license = "BSD-3-Clause"
+license-files = [ "LICENSE" ]
 ...
 classifiers = [
-  "License :: OSI Approved :: BSD License",
   "Programming Language :: Python",
   "Programming Language :: Python :: 3 :: Only",
   "Programming Language :: Python :: 3.9",

And it shows up on PyPI with:

This depended on (at least) these updates:

Thanks to everyone for making it happen, and especially @ksurma for all the PEP+spec work!

10 Likes

And pip 25.0, which was cut a few hours ago, supports displaying License-Expression in pip show and pip install --report:

$ pip install prettytable==3.13.0
$ pip show prettytable
Name: prettytable
Version: 3.13.0
Summary: A simple Python library for easily displaying tabular data in a visually appealing ASCII table format
Home-page: https://github.com/prettytable/prettytable
Author: 
Author-email: Luke Maurits <luke@maurits.id.au>
License-Expression: BSD-3-Clause
Location: /home/ichard26/.local/lib/python3.12/site-packages
Requires: wcwidth
Required-by:
6 Likes
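The same fields that pip show prints can also be read from Python for any installed distribution. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_license` is made up here, and what it returns depends on what is installed and on which metadata version the package was built with:

```python
from importlib.metadata import PackageNotFoundError, metadata


def installed_license(name: str):
    """Return (License-Expression, [License-File, ...]) for an installed
    distribution, or None if the distribution is not installed."""
    try:
        md = metadata(name)
    except PackageNotFoundError:
        return None
    # Both fields are optional, so either part may be None or empty.
    return md.get("License-Expression"), md.get_all("License-File") or []


print(installed_license("prettytable"))
```

For packages built with metadata older than 2.4, the first element will be `None` and the legacy License field or classifiers are all there is.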

Could we please have documented somewhere a reference implementation in Python for the glob part that complies with the mandatory requirements of the PEP? (maybe an attachment? Or something in the PyPA docs?)

I feel that we departed from the original intention of “let’s document whatever the stdlib’s glob does, so that we can implement it in other languages” to something that requires a lot more validation, which is not implemented by the stdlib itself.

We received something similar to the following in a contribution to setuptools: Validate license-files glob patterns by cdce8p · Pull Request #4841 · pypa/setuptools · GitHub (thanks @cdce8p)[1]

import os
import re
from glob import glob


def find_pattern(pattern: str) -> list[str]:
    """
    >>> find_pattern("/LICENSE.MIT")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern '/LICENSE.MIT' should be relative...
    >>> find_pattern("../LICENSE.MIT")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern '../LICENSE.MIT' cannot contain '..'...
    >>> find_pattern("LICEN{CSE*")  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Pattern 'LICEN{CSE*' contains invalid characters...
    """
    # Parent directory indicators are rejected anywhere in the pattern
    # (note: this is a substring check, not a per-segment check).
    if ".." in pattern:
        raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
    # Patterns are relative to the project root: reject leading separators
    # and Windows drive prefixes such as "C:\".
    if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
        raise ValueError(
            f"Pattern {pattern!r} should be relative and must not start with '/'"
        )
    # Accept only word characters (\w), "-", ".", "/" and the glob
    # characters "*", "?", "[" and "]".
    if re.match(r'^[\w\-\.\/\*\?\[\]]+$', pattern) is None:
        raise ValueError(
            f"Pattern {pattern!r} contains invalid characters. "
            "https://packaging.python.org/en/latest/specifications/pyproject-toml/#license-files"
        )
    # Every pattern must match at least one file.
    found = glob(pattern, recursive=True)
    if not found:
        raise ValueError(f"Pattern {pattern!r} did not match any files.")
    return found

Is it enough/complete/correct? (at first glance I would say yes by looking at the text of the PEP, but I would like a second opinion)
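One detail that might deserve that second opinion is the `".." in pattern` test: as a substring check it also rejects file names that merely contain two dots, such as `LICENSE..txt`. Below is a sketch of a per-segment alternative. It assumes the intent is to forbid `..` only as a path component, which is my reading and not something the PEP text spells out; the helper name `has_parent_reference` is made up.

```python
def has_parent_reference(pattern: str) -> bool:
    """True if any '/'-separated segment of the pattern is exactly '..'."""
    return ".." in pattern.split("/")


# Compare the per-segment check with the plain substring check.
for p in ["../LICENSE", "licenses/../LICENSE", "LICENSE..txt", "licenses/*.txt"]:
    print(f"{p!r}: per-segment={has_parent_reference(p)}, substring={'..' in p}")
```

The two checks only disagree on names like `LICENSE..txt`, so whether the difference matters in practice is a separate question.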


  1. the example code is a modification of the original contribution ↩︎

Sorry to revive this, but the wording in PEP 639 is breaking some projects which have a need to fetch the LICENSE file from a parent directory (because the Python bindings are part of a larger project).

It is not obvious how to work around this problem in the package configuration. Am I missing something?

See [Request for Reverting Intentional Breaking Change] New license file validation breaks projects with non-standard layout · Issue #4892 · pypa/setuptools · GitHub for the corresponding setuptools GH issue.

1 Like

See also a similar issue in Flit. It seems that relative parent-directory paths in backends were previously causing unspecified & potentially broken behaviour, whereas Core Metadata 2.4 prohibits them. A project I maintain resolved this by adding a cp ../LICENSE LICENSE step just before building – this might be possible to do automatically in a build backend.
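For projects that would rather not rely on a shell step, the copy can be done from Python before invoking the build. This is only a sketch of the workaround described above, not something any backend does today; `stage_license` and the directory layout in the demonstration are made up.

```python
import shutil
import tempfile
from pathlib import Path


def stage_license(project_dir: Path, name: str = "LICENSE") -> Path:
    """Copy ../<name> next to pyproject.toml so that license-files can use a
    plain relative path. Returns the path of the copy."""
    source = project_dir.parent / name
    target = project_dir / name
    shutil.copy2(source, target)
    return target


# Demonstration in a throwaway directory standing in for a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "LICENSE").write_text("license text\n", encoding="utf-8")
    bindings = repo / "python"  # hypothetical location of the Python bindings
    bindings.mkdir()
    copied = stage_license(bindings)
    print(copied.relative_to(repo))
```

With the copy in place, `license-files = ["LICENSE"]` stays a valid pattern, and the copied file would normally be ignored by version control.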

A

2 Likes

Some questions, if we want to support license files held outside the project source tree:

Is it OK to allow pyproject.toml to reference potentially anywhere in the filesystem when looking for license files? I can’t think of a security risk here, but that might just mean I’d make a bad malware developer :slightly_smiling_face:

How would this be handled when building a sdist? The license file would need to be copied into the sdist, and couldn’t remain in the same location as in the source tree (as there is no parent directory in a sdist). That makes this comment in the spec inaccurate:

If the metadata version is 2.4 or greater, the source distribution MUST contain any license files specified by the License-File field in the PKG-INFO at their respective paths relative to the root directory of the sdist (containing the pyproject.toml and the PKG-INFO metadata).

Build backends couldn’t copy the file to a new location and write an altered License-File value, as that would violate the guarantees given by the fact that the pyproject.toml field isn’t dynamic - the metadata field License-File would no longer be the same as the pyproject.toml field. We could make an exception for this field, but that seems likely to be messy at best, and probably a source of bugs and inconsistencies between backends.

1 Like

I think it is, but build backends should strongly consider being more safe, whatever that means to them. You generally can’t build this kind of security into an interop specification - only restrictions.

I mean, the build backend could require it to be marked dynamic if it’s not already included in the sdist? That’s easy enough for the developer to fix up at the same time as they’re setting the path.

I thought “dynamic” didn’t apply to source tree->sdist transformations? If all the published metadata matches, what exactly is dynamic about it?

You can’t supply a value if you mark a field as dynamic. So you’d have to use a tool-specific field to specify a license file if you wanted to mark it as dynamic.

It’s a bit of a grey area, TBH. People expect that they can read pyproject.toml and if a field isn’t marked as dynamic, they can treat the value as canonical. There was a lot of debate at the time of PEP 621 over whether people should be allowed to get metadata values from pyproject.toml without consulting the build backend. The dynamic field came out of that, and was intended to make it so that people could know when they needed to involve the backend.

In addition, it’s technically the license that’s the metadata, not the location of the license file. So this is all a bit secondary anyway.

(This is all something that might need considering in the context of the SBOM PEP as well - cc @sethmlarson).

That’s probably also fine for these cases. Anyone who’s making a code layout work when it’s more complex than any Python template out there is probably sympathetic to the idea that defaults can’t work for everyone.

It’s a bit of an unfortunate specification, though. There were arguments about whether absence should imply dynamic, which is about as extreme as presence implying static even when the (static) dynamic value explicitly says it’s dynamic (when it would’ve been just as easy and safe to allow a default that may be overridden, at least once you exclude 3rd party metadata readers that don’t follow the spec). Every time I read that PEP now I wish I’d spent more time digging into the specification at the time :smiley:

Yeah, well, they can. And if they ignore dynamic then they’re misreading it, except for the line that says to ignore dynamic when it conflicts with the rest of the file.

Hopefully any serious tools trying to do something useful here aren’t short-sighted enough to treat it as canonical when there are actual canonical metadata files available (in an sdist or wheel).

I don’t mean to relitigate the design, especially since I purposefully opted-out of the process at the time, but it seems we do need to properly define how to include files in packages in pyproject.toml, since people keep wanting to standardise it even when we explicitly said that this was the responsibility of the build backend.

2 Likes

Where I recall things ended up is that packaging metadata standards dictate aspects of metadata for packages, and source trees are not packages even if users may wish they were and tools sometimes try to pretend they are.

If an sdist doesn’t say a field is dynamic then you should be able to automatically infer that value for corresponding wheels. If you’re looking at a non-sdist source tree like a git repository, all bets are off. That’s the domain of non-standardized tool UX which could even (granted this is pathological) completely ignore the included pyproject.toml and replace that with its own when creating an sdist.
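For the sdist case, the check looks roughly like this. A minimal sketch, assuming the PKG-INFO uses Metadata-Version 2.2 or later, where the Dynamic field exists; `is_static_in_sdist` and the `PKG_INFO` sample are made up for illustration.

```python
from email.parser import HeaderParser


def is_static_in_sdist(pkg_info_text: str, field: str) -> bool:
    """True if the sdist's PKG-INFO lets us infer `field` for its wheels.

    Metadata older than 2.2 has no Dynamic field, so nothing can be
    inferred from its absence and the answer is False.
    """
    msg = HeaderParser().parsestr(pkg_info_text)
    version = msg.get("Metadata-Version", "0")
    if tuple(int(part) for part in version.split(".")) < (2, 2):
        return False
    # Field names are compared case-insensitively.
    dynamic = {name.strip().lower() for name in (msg.get_all("Dynamic") or [])}
    return field.lower() not in dynamic


# Made-up PKG-INFO used only for this demonstration.
PKG_INFO = """\
Metadata-Version: 2.4
Name: example
Version: 1.0
Dynamic: Requires-Dist
License-Expression: BSD-3-Clause
License-File: LICENSE
"""

print(is_static_in_sdist(PKG_INFO, "License-File"))
print(is_static_in_sdist(PKG_INFO, "Requires-Dist"))
```

Nothing comparable exists for a bare git checkout, which is the point being made above.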

2 Likes