PEP 639, Round 3: Improving license clarity with better package metadata

pradyunsg · May 10, 2024, 7:12pm

I disagree. That data file contains a lot of duplicated metadata and a lot of information that is not relevant to parsing a license expression. For example, we don’t need to know the name of the XML file associated with a license ID when parsing a license expression.

Unless I’m misunderstanding, the only information we need for parsing license expressions is the short-form license identifiers (like MIT, BSD-3-Clause, etc.) and the short-form exception IDs (like Asterisk-exception, etc.). These two lists would be very tiny relative to any of those packages. You can see the entire list of license identifiers here: SPDX License List | Software Package Data Exchange (SPDX).
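
To illustrate how little data that is, here is a minimal sketch of checking the identifiers in an expression against two plain sets. The function name `unknown_identifiers` and the tiny example sets are made up for illustration, and this only looks up identifiers; it does not check the full expression grammar (operator placement, balanced parentheses, case handling, and so on).

```python
import re

# Tiny illustrative subsets; the real lists would come from the SPDX License List.
LICENSE_IDS = {"MIT", "BSD-3-Clause", "Apache-2.0"}
EXCEPTION_IDS = {"Asterisk-exception"}

_OPERATORS = {"AND", "OR", "WITH"}
# A token is either a single parenthesis or a run of identifier characters.
_TOKEN = re.compile(r"[()]|[A-Za-z0-9.+-]+")


def unknown_identifiers(expression):
    """Return the identifiers in `expression` that are in neither list.

    Hypothetical helper: an identifier directly after WITH is looked up in
    the exception list, everything else in the license list.
    """
    unknown = []
    after_with = False
    for token in _TOKEN.findall(expression):
        if token in ("(", ")"):
            continue
        if token in _OPERATORS:
            after_with = token == "WITH"
            continue
        if after_with:
            known = token in EXCEPTION_IDS
        else:
            # A trailing "+" means "or later" and is not part of the ID itself.
            base = token[:-1] if token.endswith("+") else token
            known = base in LICENSE_IDS
        if not known:
            unknown.append(token)
        after_with = False
    return unknown
```

Calling `unknown_identifiers("MIT AND Foo-1.0")` would flag only `Foo-1.0`, and nothing beyond the two sets of short identifiers is consulted.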

It would also not be particularly difficult to keep this list up to date, since the SPDX folks maintain versioned machine-readable files with all the metadata (license-list-data/json at main · spdx/license-list-data · GitHub). These files can be used as the source of truth for this process and be regularly regenerated + released.
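
A rough sketch of what that regeneration step could look like is below. It assumes the JSON documents have a top-level `licenses` list with `licenseId` entries and a top-level `exceptions` list with `licenseExceptionId` entries, plus an `isDeprecatedLicenseId` flag on each entry; those key names should be checked against the current files before relying on them. The function name `extract_ids` and the sample data are hypothetical, and whether deprecated IDs should be kept is a separate decision, so it is left as a parameter.

```python
def extract_ids(licenses_doc, exceptions_doc, include_deprecated=True):
    """Reduce parsed SPDX JSON documents to two sorted lists of short IDs.

    `licenses_doc` and `exceptions_doc` are already-parsed dicts (for
    example from json.load); the key names used here are assumptions
    about the layout of the SPDX data files.
    """
    def keep(entry):
        return include_deprecated or not entry.get("isDeprecatedLicenseId", False)

    license_ids = sorted(
        entry["licenseId"] for entry in licenses_doc["licenses"] if keep(entry)
    )
    exception_ids = sorted(
        entry["licenseExceptionId"]
        for entry in exceptions_doc["exceptions"]
        if keep(entry)
    )
    return license_ids, exception_ids


# Made-up sample data in the assumed layout; all other metadata is ignored.
sample_licenses = {
    "licenses": [
        {"licenseId": "MIT", "isDeprecatedLicenseId": False, "name": "MIT License"},
        {"licenseId": "BSD-3-Clause", "isDeprecatedLicenseId": False},
        {"licenseId": "Old-Example-1.0", "isDeprecatedLicenseId": True},
    ]
}
sample_exceptions = {
    "exceptions": [
        {"licenseExceptionId": "Asterisk-exception", "isDeprecatedLicenseId": False},
    ]
}

license_ids, exception_ids = extract_ids(sample_licenses, sample_exceptions)
```

The output is just two lists of strings, which could be written into a generated module and re-released whenever a new version of the SPDX list comes out.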

FWIW, I wouldn’t mind if we ended up with these lists and the corresponding parser stuff living in packaging. There’s already a bunch of parsers in there for specific METADATA fields, so this would be an obvious addition in that regard, and it would fit in well with packaging.metadata’s goals as well.

My suggestion would be that we don’t name a specific package in the PEP for parsing license expressions, and leave it as an implementation detail that we’d hash out separately. For example, not all of the relevant tooling may be implemented such that it can use a Python package.

And we need a baseline implementation for modern packaging PEPs to get to accepted state nowadays. This isn’t really an implementation design problem that we need to solve in the PEP’s text, so we can leave this detail out of the PEP.

I’d prefer mutually exclusive. I think relaxing the strictness around these fields is something we can do in the future, if maintaining these as mutually exclusive is found to be problematic. We can’t become stricter without a backwards incompatible metadata 2.0 release.

The corresponding section in the PEP states (in support of the current position of “optionally? fill both”):

This would improve backwards compatibility and was suggested by some on the Discourse thread.

I don’t understand what the backwards compatibility benefit of this would be, and would appreciate it if someone could clarify this.

I think so.

As I see it, it’s an escape hatch provided by the folks behind SPDX because they understand that their approach doesn’t cover all possible license situations.

I don’t see any reason why someone might not need such an escape hatch in Python projects^[1]. I argue that the potential for user confusion is a (low-cost) tradeoff in exchange for maintaining compliance with the entire SPDX license expression syntax, rather than inventing our own subset of it with all the associated costs. That said, I also won’t mind starting stricter here with a specific allow list of names and expanding it if we get feedback that the restriction is annoying/harmful (which is what @RazerM seems to be suggesting in their reply above).

[1] Unless Python has some special legal loophole/magic/lawyer-repellent properties that I don’t know about.