PEP 639, Round 2: Improving license clarity with better package metadata

Great, thanks Brett! To note, this also resolves a lot of issues that many users have brought up with the current license tag system, and for a couple of years now it seems to have been the de facto intended path forward from PyPI’s end, rather than adding and maintaining lots of new license tags while deprecating many old ones.

I would appreciate feedback as well. They are intended to address some significant issues with the current implementation, as discussed on the relevant linked Wheel and Setuptools issues, and follow what was suggested there, so they should only require small tweaks in the implementation and not pose meaningful backward compatibility concerns. Before standardizing it, though, I’d certainly want to hear others’ thoughts on any unforeseen issues with that approach and on any other potentially viable alternatives.

I’d be curious to hear a concrete implementation proposal for the latter, in order to better compare it as a potential alternative. I considered something like this and explored a few speculative possibilities, but I didn’t find many benefits to that approach that would justify the spec and implementation complexity and other downsides over just making a couple of small tweaks to the existing system, which has already been mostly implemented in many tools.

In particular, we’d have to design, agree on, prototype and implement a mechanism to do this in existing metadata producers and consumers, which includes addressing questions like:
* How to store the original filenames/paths, or how else to identify each set of file contents?
* How to map the file identifiers to the license—some kind of embedded data structure?
* What would the API look like for accessing them? Just dump all the text? Get a particular license?
* How do we handle encoding, escaping and special characters?
* What do the PEP 621 keys look like? Co-opt what we have now, or design something new?
* What about in other tools? Do we suggest tools change the behavior of the existing license_file/license_files in wheel and setuptools, or add yet another config setting?
* Etc…

There are also other potential concerns:

  • For projects with a lot of license files (e.g. Spyder, a medium-to-large project, has a NOTICE.txt file of 200 KB alone, plus a number of other license-related files), this could potentially impose a small but non-trivial performance penalty from reading and parsing METADATA (currently 15 KB, including a large readme) on every access.
  • Embedding the license texts within an existing machine-formatted data file, as opposed to leaving them in their original files, makes them more opaque and less easy and obvious to access, and could potentially conflict with license provisions that require explicitly preserving the files/their names (e.g. Apache with NOTICE) and with making them sufficiently user-visible/discoverable.
  • The current proposed approach doesn’t impose any significant further difficulties on users and tools currently adding and accessing license files from wheels, while any such new embedded approach could raise backward compat concerns, in addition to imposing an additional burden on tool authors who’ve already implemented the current approach.
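
For comparison, here is a minimal sketch of how a consumer can locate license files under the current approach. It assumes that METADATA lists each file in a repeatable `License-File` field while the files themselves stay on disk under their original names. The METADATA excerpt, project name and file names below are made up for illustration, and `license_file_paths` is a hypothetical helper, not part of any existing tool.

```python
from email import message_from_string

# Hypothetical METADATA excerpt; the project and file names are invented.
METADATA = """\
Metadata-Version: 2.1
Name: example-project
Version: 1.0
License-File: LICENSE.txt
License-File: NOTICE.txt

Long description goes here.
"""


def license_file_paths(metadata_text):
    """Return the value of every License-File field, in order of appearance."""
    # Core metadata uses an RFC 822-style header format, so the stdlib
    # email parser can read it without any custom escaping logic.
    msg = message_from_string(metadata_text)
    return msg.get_all("License-File", [])


paths = license_file_paths(METADATA)
print(paths)
```

Only the short file names go through the metadata parser here; the license texts are read from their own files when actually needed, which is the property an embedded approach would have to give up.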

However, I’d appreciate hearing more from anyone advocating an embedded approach about the potential advantages I may be missing, and being able to more fully consider a concrete proposal that tries to answer some of the questions above. I don’t want to get too hung up on this, as the original request in adding a formal specification for storing license files in wheels was just to formally codify and refine the existing implemented behavior, but I also don’t want to dismiss it before hearing from others who may have better ideas. Thanks!
