[Split from PEP 639] Expressing project vs. distribution licenses post-PEP 639 (Mod titled)

First, let me say a huge thanks to @ksurma for stepping up and getting this over the finish line, to @brettcannon for facilitating and reviewing it, and to the community for their input, support and implementation!

Now that what turned out to be serious medical issues plaguing me for the past year are (mostly) resolved, this seems as opportune a time as any to re-engage with the discussion, given that the questions raised surround the intent of portions of the text of which I happen to be the primary or sole author. I’ll address the main points raised here, with replies to more specific cases in one or more followups.

TL;DR:

  • The PEP always intended to and (clarity defects aside) does frame the license expression as that of the distribution packages, not the project.
  • Even in the couple instances where “project” is imprecisely mentioned, the therein-explicitly-referenced definition of such includes vendored dependencies (like Pip’s) and anything else checked in to the source tree.
  • For the edge cases where it does matter, there is no single obvious definition of “project license” that can be useful or programmatically checkable for all, or perhaps even most, uses given all those same edge cases, without per-file license info, i.e. duplicating SBOMs.
  • Accordingly, I think we should:
    • Rectify this clarity defect in the PyPUG to make clear what the PEP intended
    • Also explicitly note the limitations arising from the lack of per-artifact licenses
    • Work toward a PEP defining per-artifact licenses (or perhaps solving the broader underlying problem of per-artifact metadata, viz. dynamic and PyPI as a whole, which is the real issue here)
    • If people are still interested in incorporating a notion of “project license” into the Python packaging metadata, separate from (or perhaps better, just consuming) an SBOM, propose another PEP defining what that actually means
Sidenote: Some personal perspective

FWIW I’m a bit surprised and disappointed at the level of pessimism expressed by a couple of community members here: that the PEP is already a “failure” at its goals, that the authors negligently ignored crippling flaws in clarity, and that the PEP is so little of an improvement, or even a regression, for many projects over the previous deeply flawed and fragmented situation that it is better not to adopt it at all. This is particularly so given that these folks include those who were heavily involved throughout the process and contributed a significant amount of constructive feedback that improved the final version.

For some context, the PEP was first officially posted for review and comment well over half a decade ago, and was finally approved after over 600 comments, dozens if not hundreds of substantive, feedback-motivated changes, as well as at least one full (Hatch) and one partial (Setuptools) production implementation people could test with years before its finalization.

Not once that I can recall during that time did anyone raise the issue of project vs. distribution license. Only once, late in its lifecycle, was even the issue of differing licenses for sdists vs. wheels brought up, and that was prominently acknowledged above the fold in the Non-Goals section, in the first few paragraphs of the text. (This holds notwithstanding criticism of the Rejected Ideas section being in a separate linked file, a split that was itself a response to criticism that the extensive Rejected Ideas section, accumulated over this long history of feedback and modification, made the substantive parts of the PEP hard to navigate.)

Furthermore, the existing scope had already dragged out the standards process as long as it had, and had for years been blocking the solution of a number of related issues. Adding yet another order of complexity to provide even more fine-grained license information would likely have led to even more delays, up to the present day, in not only further debating and iterating on the PEP but also implementing it and educating users on it. Instead, the PEP already solves the fundamental problems for upward of roughly 95-99% of cases, with the potential to iterate further with the benefit of real-world experience.

As the (original or final substantive) author of most or all of the relevant bits of text here, I can confirm as correct the interpretation of the majority of people here (e.g. @brettcannon, @mikeshardmind, @steve.dower, @Liz, @notatallshaw, etc.) as to the intention of the text: that the license expression represents the license of the package, i.e. the distribution artifact(s), not whatever is (somewhat arbitrarily) considered the “project”. The edge case explicitly left to later PEPs to handle was whether it represented just the license(s) of the sdist, the union of all licenses of all distribution artifacts, or something in between.

Specifically:

  • The intent, and in most cases the letter, of the PEP always was to define the license of the distributed package (distribution artifacts) built using the project’s pyproject.toml-specified build system, not the license of whatever one wants to define as the “project”.
  • The few places where the PEP imprecisely alludes to the license expression being of the “Project” are clarity defects in my writing that I didn’t get to fix before passing on the torch and for which I take full responsibility (although as detailed below, the explicitly referenced definition of “Project” does include vendored dependencies, at least those included in the source tree).
  • The original PEP didn’t include the pyproject.toml keys at all, only the Core Metadata fields for the packaging metadata; the former were only added later to provide a standardized way to populate the latter without relying on backend-specific config.
  • The primary intended consumer of this particular class of metadata was and always has been packaging-related tooling, which is fundamentally most concerned with the license of the distribution package (per the title of the PEP), which is what actually matters to the end users using them.
  • The project may be packaged differently, with or without vendored dependencies, by other third-party distribution systems not under the direct control of the project authors (something that PEP author @ksurma is intimately experienced with, being one of the people responsible for Python packaging at Red Hat).
    • However, the project’s authors have no direct control over that, nor can they be expected to anticipate what level of vendoring might or might not be retained or stripped by any given packaging system.
    • And packagers must conduct their own due-diligence manual review of the project’s license files anyway in order to legally package it and set up their specific distribution system’s metadata accordingly, so it is unclear that a “Project License” provides much or any meaningful value in those cases.
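
As a rough illustration of the pyproject.toml-to-Core-Metadata relationship described in the list above, the sketch below renders the PEP 639 `license` and `license-files` keys as `License-Expression` and `License-File` fields. The function name, the example project table, and the pre-expanded file list are all assumptions made for illustration; a real build backend would also validate the SPDX expression and expand the globs against the source tree.

```python
# Hypothetical [project] table, as a backend might see it after parsing
# pyproject.toml. Equivalent TOML:
#   [project]
#   name = "example-pkg"
#   version = "1.0"
#   license = "MIT AND Apache-2.0"
#   license-files = ["LICENSE*", "src/example_pkg/_vendor/*/LICENSE*"]
project = {
    "name": "example-pkg",
    "version": "1.0",
    "license": "MIT AND Apache-2.0",
    "license-files": ["LICENSE*", "src/example_pkg/_vendor/*/LICENSE*"],
}


def render_license_metadata(project, matched_files):
    """Render the license-related Core Metadata lines for one distribution.

    `matched_files` stands in for the result of expanding the
    `license-files` globs; the expansion itself is skipped in this sketch.
    """
    lines = ["Metadata-Version: 2.4"]
    if "license" in project:
        lines.append(f"License-Expression: {project['license']}")
    for path in matched_files:
        lines.append(f"License-File: {path}")
    return lines


metadata = render_license_metadata(
    project,
    ["LICENSE", "src/example_pkg/_vendor/foo/LICENSE"],
)
print("\n".join(metadata))
```

Note that the expression here describes what ends up in the built distribution, vendored code included, which is the reading of the PEP confirmed above.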

A fundamental issue that anyone trying to define a “project license” will have to contend with is trying to draw the somewhat arbitrary boundary between what is and isn’t included in a “project”, and how to handle the many edge cases—the very edge cases for which this distinction matters to begin with, yet which may in turn limit the usefulness of any single definition. For example:

  • Do vendored dependencies count if they are checked in to the source tree?
  • If they are part of the main repo vs. submodules?
  • If they are inline with the rest of the source instead of in a separate _vendor directory—inside or outside of src?
  • If they are modified/forked? To what degree?
  • If they are a single file, or multiple?
  • Or one or more functional unit(s) within a file?
  • What about tools or helper scripts vs modules?
  • Generated files?
  • Images/logo assets under non-code or other licenses?
  • Etc.

So, to define a single “project license”, either:

  • Each of those questions must be answered (which adds complexity and will necessarily limit the usefulness of the result for a substantial number of cases one way or another), or
  • Some or all must be left ambiguous (leaving us not that much better off than we are now), or
  • The notion of a single “project license” must be discarded completely and licenses instead precisely defined per path, file, etc., which you could add to pyproject.toml metadata, but which is already handled much more thoroughly by existing SBOMs and the automated tooling surrounding them.
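
To make the boundary problem concrete, here is a minimal sketch, assuming a hypothetical per-path license map (not a format defined by PEP 639 or any other standard), that derives a combined expression under different answers to the questions above. The paths, licenses, and function name are invented for illustration, and the sketch only handles simple SPDX identifiers joined with AND; real expressions would need a proper SPDX parser.

```python
# Hypothetical per-path license data; every path and license is invented.
PATH_LICENSES = {
    "src/example_pkg/": "MIT",
    "src/example_pkg/_vendor/foo/": "Apache-2.0",
    "tools/gen_tables.py": "BSD-3-Clause",
    "docs/logo.svg": "CC-BY-4.0",
}


def combined_expression(path_licenses, include):
    """AND together the distinct licenses of every path that `include` accepts."""
    ids = {lic for path, lic in path_licenses.items() if include(path)}
    return " AND ".join(sorted(ids))


# Three plausible answers to "what counts as the project?"
everything = combined_expression(PATH_LICENSES, lambda path: True)
first_party_only = combined_expression(
    PATH_LICENSES, lambda path: "_vendor" not in path
)
importable_code_only = combined_expression(
    PATH_LICENSES, lambda path: path.startswith("src/")
)

print(everything)
print(first_party_only)
print(importable_code_only)
```

Each inclusion rule yields a different expression from the same tree, so a single “project license” is only as meaningful as the rule behind it.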

FWIW, the “Project” used in the License Expression definition is explicitly referenced to be the PyPUG definition of the term, which then and now states:

Since most projects create Distributions using either PEP 518 build-system, distutils or Setuptools, another practical way to define projects currently is something that contains a pyproject.toml, setup.py, or setup.cfg file at the root of the project source directory.

By that definition, the vendored dependencies in Pip are part of the “project”, since they are part of the project source tree and checked in to the source repo.
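
The quoted practical definition can be sketched as a simple marker-file check. This is only an illustration under stated assumptions: the function names and the temporary directory layout are hypothetical, and the upward walk to the nearest marker file is my own reading of “at the root of the project source directory”, not something the PyPUG text specifies.

```python
import tempfile
from pathlib import Path

# Marker files named in the PyPUG "practical" definition quoted above.
PROJECT_MARKERS = ("pyproject.toml", "setup.py", "setup.cfg")


def is_project_root(directory):
    """Return True if `directory` contains any of the marker files."""
    root = Path(directory)
    return any((root / name).is_file() for name in PROJECT_MARKERS)


def containing_project(path):
    """Walk upward from `path` to the nearest directory that is a project root."""
    current = Path(path).resolve()
    for candidate in (current, *current.parents):
        if candidate.is_dir() and is_project_root(candidate):
            return candidate
    return None


# Build a throwaway tree with a vendored package checked in under the root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").write_text("[project]\nname = 'example-pkg'\n")
    vendored = root / "src" / "example_pkg" / "_vendor" / "foo"
    vendored.mkdir(parents=True)
    (vendored / "__init__.py").write_text("")
    vendored_is_in_project = containing_project(vendored / "__init__.py") == root

print(vendored_is_in_project)
```

Under this reading, a vendored file checked in below the root resolves to the same project as the first-party code.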
