Also, Rust’s glob::Pattern appears to follow this rule.
I just merged the PR.
Thank you. With that I consider the PEP done and want a pronouncement.
I will try to get to it as soon as I can (hopefully this week)!
After reading through the PEP again, I’m happy to say I conditionally accept it with one change!
For the change, I think consuming tools SHOULD reject invalid globs (h/t to @konstin for catching that). If @ksurma is okay with that change, I have a PR ready: "PEP 639: Make the policy around globs tighter" (python/peps pull request #3913, by befeleme). After that we can mark the PEP as conditionally accepted.
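For anyone implementing the "reject invalid globs" side, here is a minimal sketch of what such a check might look like. It assumes the rules as I read them from the PEP: alphanumerics, underscores, hyphens and dots match verbatim; `*`, `?`, `**` and `[]` ranges over those same characters are the only special forms; `/` is the only path delimiter; no leading slash and no `..` segments. The function name `check_license_glob` is hypothetical and this is not taken from any existing tool.

```python
import re

# One or more tokens, where a token is either a verbatim/special character
# or a bracket expression containing only verbatim characters.
# Assumption: this mirrors the character set described in the PEP.
_ALLOWED = re.compile(r"(?:[A-Za-z0-9_.\-/*?]|\[[A-Za-z0-9_.\-]+\])+")


def check_license_glob(pattern: str) -> None:
    """Raise ValueError if `pattern` is not an acceptable license-files glob."""
    if pattern.startswith("/"):
        raise ValueError(f"leading slash is not allowed: {pattern!r}")
    if ".." in pattern.split("/"):
        raise ValueError(f"parent directory segments are not allowed: {pattern!r}")
    if not _ALLOWED.fullmatch(pattern):
        # Covers empty patterns, backslashes, spaces, and anything else
        # outside the allowed character set.
        raise ValueError(f"invalid characters in glob: {pattern!r}")


if __name__ == "__main__":
    for candidate in ["LICEN[CS]E*", "licenses/**/*.txt", "../LICENSE"]:
        try:
            check_license_glob(candidate)
            print(candidate, "accepted")
        except ValueError as exc:
            print(candidate, "rejected:", exc)
```

A real tool would probably collect all invalid patterns and report them together rather than stopping at the first one.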
As for the conditions:
- Two build back-ends implement the PEP (which shouldn’t be too arduous since I know Hatch basically already does, so it’s really just one more)
- PyPI implements it (which will probably require packaging to gain some APIs around this, like @pradyunsg volunteered us for)
Thanks so much to those who participated in the discussions, Philippe for the initial draft, @CAM-Gerlach for their updated version of the PEP, and @ksurma for reviving the PEP again and seeing it through to the end!
Thank you!
I’m okay with this change, please merge the aforementioned PR.
Merged!
And the PEP is now officially marked as provisional!
I know I’m a bit late to say this, but it would be pretty useful to downstream packagers (i.e. Linux distributions) if the association between each license file and its corresponding SPDX identifier were encoded in the pyproject.toml. It’s a standard pattern for the popular licenses to be put in a common-licenses package, after which other packages delete their own copy and merely reference the shared one.
Secondly, Debian/Ubuntu (and to some extent, Fedora) not only like to differentiate between “this package can be used under either x or y license” versus “some of this package is licensed x and some under y” but they, in the latter case, expect a manifest of exactly which files are under which license.
I wouldn’t say that either of these is particularly worth addressing, but I just wanted to flag that there are still gaps.
And in case it provides a starting point for anyone dealing with the legacy classifier → SPDX migration, I have trove-spdx-licenses.json. Ambiguous, unambiguous and no-op classifiers map to lists, strings and "ignore" respectively.
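To illustrate how a mapping with that shape might be consumed, here is a rough sketch. The three entries below are illustrative stand-ins chosen by me, not copied from the actual trove-spdx-licenses.json, and the `classify` helper is hypothetical.

```python
# Illustrative entries only: a string is an unambiguous mapping, a list is
# an ambiguous one (several candidate SPDX identifiers), and "ignore" marks
# a classifier that carries no license information by itself.
TROVE_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: BSD License": ["BSD-2-Clause", "BSD-3-Clause"],
    "License :: OSI Approved": "ignore",
}


def classify(classifier, mapping=TROVE_TO_SPDX):
    """Return (kind, candidates) for a trove classifier.

    kind is one of "exact", "ambiguous", "ignore" or "unknown".
    """
    entry = mapping.get(classifier)
    if entry is None:
        return ("unknown", [])
    if entry == "ignore":
        return ("ignore", [])
    if isinstance(entry, str):
        return ("exact", [entry])
    return ("ambiguous", list(entry))


if __name__ == "__main__":
    for name in TROVE_TO_SPDX:
        print(name, "->", classify(name))
```

Only the "exact" case can be migrated automatically; an "ambiguous" result still needs a human to look at the actual license text.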
Just a bit.
Are you speaking as a Debian representative or user? @ksurma works on Fedora so I have operated under the assumption Fedora specifically was covered and I viewed that as being “picky enough” to cover the vast majority of cases.
If you want to propose a follow-up PEP that updates license-files to take an object that is a file path and an SPDX expression for that license, then you could (e.g., {path = '...', license = '...' }). But since the globbing support was specifically called out as useful, and globs won’t work in a case where you want to specify an SPDX expression per file, this would be an addition instead of a change and thus doesn’t require retracting the PEP, which has been open since Aug 2019.
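To make the "addition instead of a change" point concrete, here is a sketch of how a consumer might accept both forms side by side. Everything in it is hypothetical: no such object form exists in the PEP today, the `path` and `license` keys are simply taken from the example above, and `normalize_license_files` is a made-up helper name.

```python
def normalize_license_files(entries):
    """Normalize a mixed license-files list into dicts.

    Plain strings are treated as the existing glob form (no per-file
    license). Dicts are the HYPOTHETICAL follow-up form and must have
    exactly the keys "path" and "license".
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append({"glob": entry, "license": None})
        elif isinstance(entry, dict) and set(entry) == {"path", "license"}:
            normalized.append({"path": entry["path"], "license": entry["license"]})
        else:
            raise ValueError(f"unsupported license-files entry: {entry!r}")
    return normalized


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    mixed = ["LICEN[CS]E*", {"path": "vendored/LICENSE", "license": "MIT"}]
    for item in normalize_license_files(mixed):
        print(item)
```

Because existing string entries keep their current meaning, projects that only use globs would not need to change anything.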
The remaining technical gaps are dwarfed by the fundamental credibility gap: downstream consumers that genuinely care about licensing accuracy still need to audit the entire package to confirm that the claimed license is actually consistent with the package contents.
As a result, the new metadata doesn’t (and isn’t intended to) let distros drop any of their existing licensing auditing and file classification processes when consuming upstream sources.
What it does do is enable more deterministic “first pass filtering” in license management processes, where components with “definitely not acceptable” licenses can be ruled out before kicking off any more detailed scans. On the positive side of things, the best those first pass scans can say is “Potentially acceptable, assuming the claimed license is accurate” (and then provide pointers on where to start with a more in-depth analysis to determine the accuracy).
But you could also use this to store the audited metadata. Maybe after adjusting it first – with a patch you can now offer to the upstream project.
Just as someone who sometimes ships their code as Linux packages. If I contradict a real representative then just ignore me.
FYI, although the PEP is provisionally accepted, twine can’t upload packages with metadata 2.4 due to pypa/twine issue #1146, "Twine fails to upload packages with latest metadata-version".
That’s to be expected right now. PyPI has stricter validation of the metadata now and the process for rolling this out is gonna be: packaging.metadata gains support for loading and validating this stuff, and then PyPI picks up this newer version.
My expectation was that packaging is what build-backends will use as well to make this a logical ordering, but it sounds like you have a package that was built without waiting on this?
AFAIK, packaging does not implement parsing and validation of project metadata defined in pyproject.toml and its translation into package metadata. A few build backends use pyproject-metadata for this.
That’s correct.
Am I correct in understanding that maintainers of packaging are now amenable to it growing those capabilities?
This is getting off-topic, but it’s a possibility.