[Split from PEP 639] Expressing project vs. distribution licenses post-PEP 639 (Mod titled)

I haven’t read this discussion in full but are we saying that if, during the wheel build process, there is extra content shipped that the license field in pyproject.toml must be marked as dynamic? I have a project that copies an extension module from CFFI at build time to remove it as a runtime dependency. I then read its license file based on distribution metadata and copy that into the wheel. Is that sufficient?

If the License-Expression field in the wheel would then be different than the value of license in pyproject.toml, then yes. And furthermore, the value must be removed from pyproject.toml and supplied to the build backend via a tool-specific field (because dynamic fields can’t have a value specified).

That’s not related to anything specific about license metadata, it’s simply how static and dynamic fields in pyproject.toml and core metadata work.

My understanding is that this is sufficient from a legal standpoint. My own implementation of this also includes notes about what exactly the license applies to like “the libfoo*.so file is bundled into the Linux wheels and the following license applies:”.

If you are asking whether it is sufficient from the perspective of PEP 639, then firstly I don’t think the PEP or the metadata field it defines has any legal standing. Rather, it is intended to provide license information in a form that is more helpful to automated tools than the free-text license files. However, as far as I can tell the PEP is underspecified: it does not explicitly specify what the license field means in any case, and especially not in your case, which is explicitly listed as “out of scope” in the rejected alternatives section (confusingly, on a separate page from the main PEP document).

I should clarify here that this depends on what the license is of the thing that you are bundling. You didn’t say what the license was but I’m assuming it essentially has something like “you may redistribute … provided this copyright/license notice is included”. You should read that text to see what it is asking you to do.

CFFI uses the MIT license: cffi/LICENSE at main · python-cffi/cffi · GitHub

So I guess this is fine from a legal standpoint, as I thought, but I should start making the field dynamic. Fortunately this case is easy. It would be quite a hardship if the dependency had a complex licensing setup, though, so much so that I would probably be reluctant to work out what the expression should be and would leave it undefined.

That is not clear. I don’t see anything in the PEP that says so and the PEP explicitly says that this case is out of scope.

It honestly doesn’t matter. My point is you said my view is “backwards” as if that was fact and thus I was factually wrong. What I’m saying is I disagree with that and think what you’re expressing is an opinion instead of fact and I felt a bit attacked by your phrasing.

We all know “SHOULD” and “MUST” are not the same thing. :wink: Unless there’s an SBOM definition that I’m unaware of that outlines minimum data to include, I don’t think we can assume license details will be included.


So I think people have different ideas of why license details should be included in core metadata and pyproject.toml. For me, it’s about providing at least a hint as to whether the license of what I’m going to install is compatible with what I want (e.g. no GPL). Now that doesn’t mean it’s legally binding and you still have to do your due diligence (at least I assume; IANAL), but it’s better than having to crack open every project and inspect every included license by hand to see if the license is compatible with what you want. I will say I signed up to be PEP delegate because this was the use-case I was targeting.
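As a rough illustration of that “hint” use case, here is a minimal sketch of a deny-list check over a license expression. It is an assumption-laden toy rather than a real SPDX parser: it only pulls identifiers out of the string, ignores the difference between AND and OR, and the names `identifiers` and `mentions_denied` and the default prefixes are invented for the example.

```python
import re

# Operator keywords that can appear between identifiers in an expression.
OPERATORS = {"AND", "OR", "WITH"}


def identifiers(expression: str) -> set[str]:
    """Return the license and exception identifiers named in an expression."""
    tokens = re.findall(r"[A-Za-z0-9.+:-]+", expression)
    return {t for t in tokens if t.upper() not in OPERATORS}


def mentions_denied(expression: str, denied_prefixes=("GPL-", "AGPL-")) -> bool:
    """Return True if any identifier starts with a denied prefix.

    Deliberately conservative: an identifier that only appears as one
    side of an OR is still flagged.
    """
    return any(i.startswith(denied_prefixes) for i in identifiers(expression))


print(sorted(identifiers("MIT AND (Apache-2.0 OR BSD-2-Clause)")))
print(mentions_denied("MIT OR GPL-3.0-only"))
print(mentions_denied("MIT AND BSD-3-Clause"))
```

A check like this can only ever be a first filter. Whether it says anything useful about what actually gets installed depends on which of the interpretations discussed in this thread the project followed when filling in the field.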

But it sounds like some people interpreted/want the license expression to only pertain to the source of the project itself. I’m interpreting the motivation of this view to be to make it clear what the source tree is licensed under and to avoid wanting to update license-expression via back-ends if they add in vendored code.

I think we can choose to go with either, as the pyproject.toml specification and the core metadata specifications (both in the Python Packaging User Guide) don’t dictate either view right now. I don’t know if that’s just a spec update that Paul can approve or if it requires another PEP to be very explicit about the purpose of the field, but it sounds like people do want to be explicit.

I think the most coherent strategy is for the field to correspond only to the project and shipping non-project code would only require that the distribution include the associated extra license files (or whatever else is required by law).

Both can be true. pyproject.toml grew out of the distribution side, but has utterly outgrown that tightly delineated context. Nowadays it’s a catch-all for project metadata (it’s in the name…), and that is IMO a much more relevant concern than what the original intention was. The [project] table in pyproject.toml needs to be about the… project! Not some distribution-only thing.

It is late here so I am not going to read and respond in detail but let me just say that I am sorry if you felt attacked. That was not my intention.

IMO clarifying the intended usage of the field would be a spec change only, so I could approve it.

However, changing it to say that it’s distribution metadata would require PyPI to stop displaying it as project metadata, so I’d want confirmation that PyPI were on board with doing that before I’d approve that particular option.

In both cases, I’d expect the clarification to include an explanation of how the PEP intended the other alternative to be recorded.

Not quite. I want the license to be recorded accurately for all things and I want back-ends to update it. I showed how this is currently done in python-flint’s case above (cat wheels/LICENSE_linux_wheels.txt >> LICENSE). I want better tooling that can handle license bundling and I want to record unambiguously what was bundled and what license applies to each bundled file.
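For anyone who wants the `cat ... >> LICENSE` step in a build script rather than a shell one-liner, a minimal Python sketch could look like the following. The helper name `append_license`, the file names, and the placeholder license texts are all hypothetical; the explanatory note mirrors the kind of “this file is bundled and the following license applies” wording mentioned earlier in the thread.

```python
import tempfile
from pathlib import Path


def append_license(main_license: Path, bundled_license: Path, note: str) -> None:
    """Append an explanatory note and the bundled license text to the main file."""
    with main_license.open("a", encoding="utf-8") as out:
        out.write(f"\n\n{note}\n\n")
        out.write(bundled_license.read_text(encoding="utf-8"))


# Demonstration in a throwaway directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp, "LICENSE")
    bundled = Path(tmp, "LICENSE_bundled.txt")
    main.write_text("Project license text (placeholder)\n", encoding="utf-8")
    bundled.write_text("Bundled library license text (placeholder)\n", encoding="utf-8")

    append_license(
        main,
        bundled,
        "The bundled libfoo shared library is covered by the following license:",
    )
    combined = main.read_text(encoding="utf-8")

print(combined)
```

This only handles the free-text license file. It does nothing to keep a license expression in the metadata in sync, which is exactly the gap in tooling being described.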

There are at least three different concepts of the license that are all relevant in different situations:

  1. What I would call the “project license”.
  2. The license that applies to all the contents of the sdist.
  3. The license that applies to all the contents of the wheels.

The situation is not so dynamic that these cannot be written statically in pyproject.toml, although e.g. for python-flint’s wheels it is different for Linux vs macOS vs Windows. In principle the difference could be more fine-grained: say, Linux aarch64 wheels contain different things from the x86-64 wheels, or 3.13t vs 3.13, etc. In general the license for wheel contents is different per wheel, and ideally the tooling that does the bundling could be configured to help make the license information as accurate as possible for each wheel.

There are even more cases than the three I listed above e.g. the license for say the conda package which could be (1) or (2). There are also subtleties not captured here so that e.g. “the license of what I’m going to install” is not necessarily the same as “the license of the contents of the sdist”:

  • The sdist might vendor a build dependency that does not get installed.
  • In principle building from the sdist could download and build other dependencies that do get installed.

I don’t think all cases can be reasonably captured in metadata but the three I enumerated are all needed. For PyPI (2) and (3) are clearly needed. One reason for needing (1) as distinct from (2) or (3) is because of downstream repackaging: PyPI is not the only distributor of Python code and it might be that (1) is the license everywhere else. An important reason for needing (1) is also for the contributor side rather than the consumer side: license clarity is important there as well.

The common case for Python packages is that (1) == (2) == (3), so there is no distinction between these, and that is what the PEP is aimed at. Another common case, though, because of the nature of PyPI wheels, is (1) == (2) != (3). That is the python-flint case and probably applies to most projects that ship binary wheels. The pip case is (1) != (2) == (3), which I think is less common but again happens due to the nature of Python packaging and the need for packaging tools to bootstrap.

The metadata described in the PEP has been designed with the implicit assumption that (1) == (2) == (3) but that is not accurate for many Python packages. In the discussion now it seems clear that the implicit expectation of the PEP was for License-Expression to mean at least (2) if not both (2) and (3) (hence the PEP says that (2) != (3) is out of scope). The way that the metadata field is defined though as implicitly being the “project license” means that maintainers of Python packages are generally going to want it to be (1). Usually that will be the same as (2) but e.g. in pip’s case it isn’t.
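To make the three cases concrete, here is a minimal sketch that records the three expressions separately and reports which pattern a project falls into. Everything in it is an assumption made for illustration: the `LicenseScopes` class, the naive string comparison (two equivalent expressions written differently would compare as unequal), and the placeholder expressions, which are not the real metadata of any project named in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseScopes:
    project: str  # (1) the "project license"
    sdist: str    # (2) everything in the sdist
    wheel: str    # (3) everything in one particular wheel

    def pattern(self) -> str:
        """Name the relationship between the three expressions."""
        if self.project == self.sdist == self.wheel:
            return "(1) == (2) == (3)"
        if self.project == self.sdist:
            return "(1) == (2) != (3)"
        if self.sdist == self.wheel:
            return "(1) != (2) == (3)"
        return "other"


# Placeholder expressions, chosen only to exercise each branch.
pure_python = LicenseScopes("MIT", "MIT", "MIT")
binary_wheel = LicenseScopes("MIT", "MIT", "MIT AND LicenseRef-Bundled")
vendoring = LicenseScopes("MIT", "MIT AND Apache-2.0", "MIT AND Apache-2.0")

for scopes in (pure_python, binary_wheel, vendoring):
    print(scopes.pattern())
```

A single `wheel` field is itself a simplification, since the wheel contents can differ per platform or interpreter.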

For the use case where you want to know the license of all of the contents of the sdist I think that the best thing is to have a metadata field that unambiguously says that that is what it is e.g.:

Sdist-Contents-License-Expression = ...

The metadata is most likely to be recorded accurately if it is clear what it should apply to and if we can avoid conflating it with the “project license” for any cases where that might be different. If we don’t allow these things to be recorded distinctly then there is a risk that projects will simply interpret the field how they want which undermines the “I want to know the license for all sdist contents” use case. If e.g. PyPI, GitHub etc are going to extract the License-Expression field and display it on e.g. the repo landing page then you will see projects recording it as (1) regardless of whether any “specification” says it is supposed to mean (2).
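For context on what a tool actually sees, here is a minimal sketch of reading the license fields from core metadata text with the standard library’s email parser. The metadata content is a made-up example. `License-Expression` and `License-File` are the fields PEP 639 defines, whereas `Sdist-Contents-License-Expression` is only the hypothetical field floated above, so a lookup for it comes back empty.

```python
from email import message_from_string

# A made-up METADATA file for a hypothetical project.
METADATA = """\
Metadata-Version: 2.4
Name: example
Version: 1.0
License-Expression: MIT
License-File: LICENSE
License-File: licenses/LICENSE_bundled.txt
"""

msg = message_from_string(METADATA)

expression = msg["License-Expression"]      # single-use field
license_files = msg.get_all("License-File")  # multiple-use field
proposed = msg["Sdist-Contents-License-Expression"]  # not defined anywhere yet

print(expression)
print(license_files)
print(proposed)
```

Note that nothing in the parsed data says whether the expression was meant as (1), (2) or (3). A consumer only gets the string.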

I agree with you and @oscarbenjamin that it would be simpler if this PEP referred to a project’s license(s) rather than a distribution package license(s), for a variety of reasons already outlined.

However, on reviewing, while the PEP mentions both projects and distributions, the language and framing lean heavily toward distributions, e.g., the rejection of handling different licenses between sdists and wheels assumes license metadata applies to the distribution.

If it is decided that PEP 639 core metadata fields should refer to the project and not the package distribution then a significant amount of the language should be updated or it would seemingly contradict much of the rest of the PEP.

As someone with the same use case that @brettcannon described, evaluating a distribution’s license(s), this PEP might easily be read as supporting that use case, unless it’s made very clear up front that it doesn’t. I think a change to mean the project’s license(s) should update both the spec and historical document to avoid confusion.

I’m concerned that clarifying how to record the alternative could require its own PEP, beyond what this PEP can recommend, and making this a hard requirement could block removing this ambiguity.

I didn’t want to close the door on someone coming up with something that worked as a simple clarification. My expectation, though, is that saying the metadata records the project license could be handled as a clarification, whereas recording the distribution file license in the license field would be complex enough to need a PEP.

I don’t want this to come across as “do it the way I prefer or I’ll make you write a PEP”, though. I’m trying hard to keep my personal (strong) opinions and my position as potential PEP delegate separate here.

Actually, @brettcannon, as PEP delegate for PEP 639, is this something you would rather make the decision on? I’m happy to do so, but I don’t want to tread on your toes if you feel it’s something you’d prefer to handle.

I think it would be best if I divested myself from the decision, as I don’t want my personal use-case to overly influence me (which I can already tell it will).

No worries. I also have strong opinions on the matter (opposite to yours, unfortunately). I will do my best to not let them influence me, but if someone feels I’m not managing to do so, please say so.

At the moment, I don’t see any consensus forming, and @ksurma hasn’t made any comment as the PEP author. That leaves us with no way forward, unfortunately - I can’t see any way I can make a decision in that circumstance without taking my personal view into account.

If I had to make a decision right now, I would say that:

  1. Resolving this issue requires more than a simple “text clarification” of PEP 639.
  2. Projects that don’t have any concerns about the interpretation of the new PEP 639 license data can use it. Projects that do (which include pip, python-flint, and potentially any project that uses auditwheel) should continue using the legacy license = {text = ...} format. It goes without saying that the legacy form must remain supported until we get a resolution that suits those projects’ use cases.
  3. Any proposal for resolving this ambiguity will need to be raised as a new PEP.

That’s about the only solution I can think of that I’d be happy with which clearly doesn’t just mandate my preferred option.

I’ll let the discussion proceed for a little longer, as I don’t like the above decision, but if we don’t get consensus, and no-one comes up with a better option, that’s what I expect to end up with.

I don’t think it’s generally possible, technically speaking, to automatically and accurately construct a license expression from license files containing free-form text.

To be clear the reason python-flint does not already have license = "MIT" in its pyproject.toml is because when I tried that (before this discussion) I found that meson-python didn’t support it yet. Otherwise I would have added that since it seemed to be documented in the packaging guide as the new way to write the license metadata in pyproject.toml. That would have been accurate for the contents of the sdists but not for the contents of the wheels.

I think that leaving the situation ambiguous will result more in a proliferation/continuation of ambiguous/inaccurate metadata than in projects not adopting the new metadata format.

Taking a step back, the vast majority of Python projects aren’t affected by this edge case: their project license, sdist license, and wheel license will all be the same, and the PEP 639 fields can be used without worry.

This edge case was only brought up a few days ago on this thread, and it’s been a typical holiday period in many European countries this last week, so I think it’s more than reasonable to wait a couple of weeks for the PEP author to respond, especially without any prior knowledge of their availability to address this kind of issue.

I think the concerns and desired use cases have been thoroughly expressed and there isn’t a lot of value continuing until either the PEP author responds or someone proposes a new PEP which would be in a new thread.

Thank you for the ping. Holidays and a pile of work, but I’m finally here.

First, let me say that even though the issues we’re currently discussing surfaced with PEP 639, they have always been there: with the legacy way of specifying licenses there was no means to distinguish between projects, sdists and wheels either. The PEP describes a format for the licensing field, but keeps the same ambiguity as the legacy system as to what the format refers to (otherwise, I believe, it would have been much harder to approve). Having it there enables the discussion about the corner cases and needs of the community. How to describe particular artifacts is a subject for another PEP. If pip uses the new format to express the same license it already declares, things won’t get worse (but will gain machine readability). What is the advantage of keeping the legacy license = {text = ...} over the new format? How can the legacy way better accommodate projects with code injected by auditwheel? If different artifacts are differently licensed, isn’t that a case for the dynamic field?

Project vs distribution: the glossary incorrectly defines an SPDX expression as describing a project (the definition of this term shouldn’t make any assumptions about what it refers to). For me, similarly to @brettcannon, it was always the expression of the resulting artifact: it describes what is distributed. If I install pip from PyPI, I don’t only get MIT-licensed code. If we treat the SPDX expression as an abstraction over the included license files, nothing prevents the main license file, or a special notice, from stating: “The project is MIT, vendored dependencies are licensed according to their respective license files”. Is there a need to store the main project’s license as a separate metadata field? What’s the use case for consumers of such a field? Or is the project license more a matter of display (on the git forge, on PyPI)? If deemed necessary, it seems like an issue for a fine-tuning PEP (e.g. adding an optional Source-License-Expression as @brettcannon suggested).

Side note, since repackaging has been mentioned here: from my POV it’s more convenient to get a hint about vendored dependencies by looking at a long license expression, and to delete the unwanted bits as I devendor chosen libraries, than the other way around. In a perfect world I’d scan the files in the package and get an accurate representation of the license expression, but we’re not there yet.
