PEP 639, Round 3: Improving license clarity with better package metadata

Agreed, let’s not revisit PEP 621 and dynamic. But I agree the problem here is that we’re trying to include files[1] - and that’s something that will be more general than just license files (SBOMs will have the same issue).

Here’s a proposal. This is the simplest change I can think of that addresses the issue.

  1. Remove the prohibition on .. in license-files (in pyproject.toml). Retain the prohibition in the core metadata License-File field, though.
  2. Build backends MAY relocate project license files which originate outside the project source tree when storing them in the sdist/wheel. However, if they do, they MUST use the same layout in the sdist and the wheel, and the License-File core metadata field MUST NOT be marked as Dynamic, so that the layout defined by the sdist metadata is preferred over the one in pyproject.toml when building from sdist. Where to place such relocated files is at the discretion of the build backend. Build backends MUST NOT change the location of license files that are within the project source tree.
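A minimal sketch of the relocation step in point 2, assuming a backend stores out-of-tree files under a hypothetical _external_licenses directory (the proposal deliberately leaves the location to the backend). The function name plan_license_layout is illustrative, not any backend's real API. The same mapping would be applied to both the sdist and the wheel, and its values are what the License-File metadata would list.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical location for out-of-tree files; the proposal leaves this to the backend.
RELOCATED_DIR = PurePosixPath("_external_licenses")


def plan_license_layout(license_files: List[str]) -> Dict[str, str]:
    """Map each pyproject.toml license-files entry to its in-archive path.

    In-tree paths are kept exactly as written (the proposal forbids moving
    them). For simplicity, any path containing '..' is treated as out-of-tree
    and relocated under RELOCATED_DIR using only its file name.
    """
    layout: Dict[str, str] = {}
    for entry in license_files:
        path = PurePosixPath(entry)
        if path.is_absolute():
            raise ValueError(f"absolute paths are not allowed: {entry}")
        target = RELOCATED_DIR / path.name if ".." in path.parts else path
        # Two different out-of-tree files can share a name; refuse to clobber.
        if str(target) in layout.values():
            raise ValueError(f"two license files would map to {target}")
        layout[entry] = str(target)
    return layout
```

For example, ["LICENSE.txt", "../LICENSE.txt"] keeps the first entry in place and maps the second to _external_licenses/LICENSE.txt. A real backend would resolve paths against the project root before deciding what counts as out-of-tree.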

I think that’s enough to support cases like ../LICENSE.txt.

There is one problem I can see. If you want to add license files at wheel build time, License-File in the sdist has to be Dynamic. But that’s incompatible with the new rule for when the build backend relocates license files. So it’s not possible to have both out-of-tree license files and licenses added at wheel build time. I think that situation is going to be sufficiently rare that we can live with the limitation, though.
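The conflict described above can be sketched as a check a backend might run under the rule as stated in this proposal. It assumes layout is a mapping from pyproject.toml entries to in-archive paths (as in a relocation step) and dynamic_fields holds the sdist's Dynamic values; the function name is hypothetical, and comparing field names case-insensitively is a lenient choice made for this sketch.

```python
from typing import Dict, Set


def check_relocation_rule(layout: Dict[str, str], dynamic_fields: Set[str]) -> None:
    """Raise if license files were relocated while License-File is Dynamic."""
    relocated = any(source != target for source, target in layout.items())
    dynamic = {name.strip().lower() for name in dynamic_fields}
    if relocated and "license-file" in dynamic:
        raise ValueError(
            "License-File cannot be Dynamic when out-of-tree license files are relocated"
        )
```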


  1. although the metadata is the content of the file, not the file itself


I think the MAY and MUST should be swapped - the files have to be relocated out of necessity (or else omitted), so MUST makes sense. We might want to say that they MUST be moved into the root of the distribution (but what does that mean for a wheel?) or another specific location.

Saying that they MUST NOT relocate files is also unnecessarily strong (what if the user specifies that they want them relocated?). The meaning and consistency of the metadata is what’s under our control here, so saying that License-File entries should only be rewritten when relocating files that originate from outside the project[1], and must still resolve to the relocated file, ought to cover it. (And the second part of that has already been said in PEP 639, I assume.)
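To make "must resolve to the file" concrete, here is a sketch of a check a backend or validator might run. It assumes archive_members is the set of file paths relative to where License-File entries are resolved (the sdist root for an sdist; under PEP 639, the .dist-info/licenses/ directory for a wheel). The function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Set


def check_license_file_entries(entries: List[str], archive_members: Set[str]) -> List[str]:
    """Return a list of problems with License-File values; empty means OK."""
    problems: List[str] = []
    for entry in entries:
        path = PurePosixPath(entry)
        if path.is_absolute() or ".." in path.parts:
            # Core metadata keeps the '..' prohibition even if pyproject.toml drops it.
            problems.append(f"{entry}: must be a relative path without '..'")
        elif entry not in archive_members:
            problems.append(f"{entry}: does not resolve to a file in the archive")
    return problems
```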

Then I don’t see what restricting Dynamic adds. It’s the simpler path to move once and then keep it at the same relative location, sure, but it shouldn’t make a difference to how the metadata is interpreted. Whether the build backend prefers pyproject.toml over PKG-INFO for this particular property is just a decision (or bug) in the backend; if we haven’t explicitly said which file is the source of truth when building from an sdist, we should say that. There shouldn’t be any need to change it just for one property. And if for some reason the file moves between sdist and wheel, better to leave that in the hands of the user than to try to specify it away.
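A sketch of treating the sdist metadata as the source of truth, which is one possible answer to the question raised here rather than anything the spec mandates. It assumes that a PKG-INFO file in the project directory means the build is running from an unpacked sdist, and that its License-File values win unless the field is listed in Dynamic. The function names are hypothetical.

```python
from email.parser import Parser
from pathlib import Path
from typing import List, Optional


def select_license_files(pkg_info_text: Optional[str],
                         pyproject_license_files: List[str]) -> List[str]:
    """Choose which license file list to use for this build.

    pkg_info_text is the content of PKG-INFO, or None when there is no
    PKG-INFO (i.e. building from a plain source tree).
    """
    if pkg_info_text is None:
        return pyproject_license_files
    # headersonly=True: the long description after the blank line is not needed.
    message = Parser().parsestr(pkg_info_text, headersonly=True)
    dynamic = {value.strip().lower() for value in message.get_all("Dynamic", [])}
    if "license-file" in dynamic:
        # The field may legitimately change, so use the project configuration.
        return pyproject_license_files
    return message.get_all("License-File", [])


def license_files_for_build(project_dir: Path,
                            pyproject_license_files: List[str]) -> List[str]:
    pkg_info = project_dir / "PKG-INFO"
    text = pkg_info.read_text(encoding="utf-8") if pkg_info.is_file() else None
    return select_license_files(text, pyproject_license_files)
```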


  1. I dislike assuming the project will be the entire directory, but apparently I’m the only build backend that doesn’t assume that, and I’m already quite happy to ignore requirements that don’t help my users 🙂


Sorry, yes. I reworded my suggestion and missed the fact that that turned the MAY into a MUST.

Regarding Dynamic, you’re right. I was being over-prescriptive, largely because PKG-INFO is the only way to reliably locate a relocated license file in an sdist, and I misremembered the guarantees Dynamic added. I’m too used to dealing with legacy sdists where the metadata is frankly garbage 🙁

Anyway, I’m happy for someone to take my suggestion and refine it. I don’t have any personal stake in how this issue gets addressed, so I’ll let someone else make a formal proposal to update the spec. If it’s based on my idea, that’s fine, but if another approach is better, that’s fine too.


One thing I would say is that any proposal should be reviewed by build backend authors, to ensure it’s implementable in practice. In particular, given that this has come up for flit and setuptools, @takluyver and @abravalheri should probably be involved.


What is wrong with having symlinks in the repo? Cargo crates also have a similar restriction, yet it’s a fairly common pattern to add symlinks in the necessary subdirectory. Yes, there’s the whole issue that Windows does not enable symlinks out of the box, but that’s something to raise with Git for Windows so they finally resolve it.

It’s not an issue with git, it’s an issue with Microsoft, and the repeated statements from their side (e.g. through @steve.dower IIRC) were: symlinks won’t be allowed in a default user setup, which is what we need to cater for.


Yes, it’s an issue with Microsoft w.r.t. setting it up, but git is not a default application on Windows, and the packagers of git for Windows could add a check or a message pointing the user to enable it for that functionality.

On the other hand, what is the failure mode if the user has not enabled it? Does the clone fail, does it create an empty file, or does it copy the content? Those could be addressed on the git side, and if it falls back to copying the content, it wouldn’t even be a problem.

It would be great if there were a central source of information on this issue; so far my only sources are various fragmented Stack Exchange answers, some of which indicate that this is already documented in the first-time installation: Git symbolic links in Windows - Stack Overflow

You need to be an administrator, either to use them by default or to change the setting allowing non-admins to use them.

Many users are not admins and cannot get admin rights; not everyone works on a machine that they own and are responsible for.

Hard linking is also a fine option here, because a git clone is always going to be on a single volume[1]. But this is up to git, not us.

As usual, my position on our specifications is that we shouldn’t legislate away every dumb idea. If you care at all about portability, symlinks in a code repo is a dumb idea. And while I’d strongly discourage anyone from doing it under any circumstance, I wouldn’t build such a restriction into an interoperability spec.


  1. Apart from really degenerate cases, such as mount points added to a local repository and a symlink pulled down in a new commit, which really don’t have a choice but to copy.


For Flit, I would be reluctant to support this, because it means that either the pyproject.toml in the sdist is wrong (still pointing to ../LICENSE.txt), or we have to rewrite it when building the sdist. I like the principle that as far as possible, an sdist is just a tarball of a source directory with some metadata, not a distinct intermediate state we need to deal with. Flit has also always aimed to support the 90% of simple cases, not 100% of what people might want to do.

People can work around that by duplicating the file - IIUC, a git repo will only store the file data once - or with a symlink if Windows support is not a priority.


I share the concerns about allowing .., due to the potential complexity and integration challenges with existing sdist workflows given the existing code base[1]. Additionally, it could set a precedent for other files. However, I am open to discussing this further, especially if someone can provide a proof-of-concept PR for setuptools.

If hard links work (apart from edge cases), they might be a practical solution[2]. Developers who need to reference files outside the project directory could be required to keep their code within a single volume/filesystem. This approach wouldn’t affect end users on PyPI but would impose requirements on the build environment. Moreover, this is a “specialized” requirement and will not affect developers who can afford to either copy the file or move it inside the project folder by rethinking their directory layout. It offers a compromise that simplifies the implementation of backends.


  1. A successful proposal should include clear instructions on how to handle the sdist (e.g., specifying the directory mapping in a standardised way) and how to manage the “build from sdist” scenario.

  2. Developers can choose if they need hardlinks or symlinks depending on their build environment requirements (e.g. packages that are not available on Windows can choose to use symlinks)

My comment specifically said they work for git, within a single git repository. They don’t work any better than a copy for the LICENSE case, and are more likely to cause confusion/issues since they don’t appear to be a link to users.

The problem here is [people being worried about] the pyproject.toml metadata in an sdist not being consistent with the PKG-INFO metadata in the same sdist. Modifying the pyproject.toml added to the sdist is a possible solution; “stop worrying” is also viable. Hard links are irrelevant.

Thanks Steve, I understand the major worries so far have been with pyproject.toml. However my personal concern (prompted by an earlier comment in the thread) is somewhat different.

What I am mostly concerned about is the sdist vs config consistency. I think that the current workflow model that the PyPA ecosystem employs is not very compatible with ../. Considering the existing workflow source tree => sdist => wheel => site-packages, files indicated by paths with ../ become inaccessible in the second stage. I also believe there may be other concerns about how .. could be exploited maliciously.
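The "inaccessible in the second stage" point can be sketched as a pure string check: once an sdist is unpacked into a fresh directory on another machine, any configured path that normalises to something starting with .. points outside the unpacked tree. The function name is hypothetical, and the check ignores symlinks and does not touch the file system.

```python
import posixpath


def escapes_root(relative_path: str) -> bool:
    """True if the path, resolved from the project root, lands outside it."""
    normalized = posixpath.normpath(relative_path)
    return (
        posixpath.isabs(normalized)
        or normalized == ".."
        or normalized.startswith("../")
    )
```

With this, ../LICENSE.txt and a/../../x escape the root, while docs/../LICENSE does not, because it normalises back to LICENSE.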

The suggestions here seem to go in the direction of “patching” the sdist (mapping ../ into a different folder and rewriting pyproject.toml) to support the use case, which introduces complexity. I also don’t like the precedent it opens for other files outside the repo and the open questions that come with it.

In this context, the mention of links (both hard and soft) sounds relevant. These are OS features that exist to solve this kind of problem, right? So if they work reasonably, I think we could give it a try instead of reinventing the sdist (sorry for the pun).

Please find below a proof-of-concept snippet using a hard link that seems to work fine when building:

```powershell
## Windows Machine - PowerShell

mkdir $env:TEMP\myproj\subproj
cd $env:TEMP\myproj

new-item LICENSE.txt
@"
My license
"@ | add-content -Path LICENSE.txt

cd subproj

new-item pyproject.toml
@"
[build-system]
requires = ["setuptools>=77"]
build-backend = "setuptools.build_meta"
[project]
name = "mod"
version = "42"
license-files = ["LICENSE.txt"]
"@ | add-content -Path pyproject.toml

new-item -ItemType HardLink -Path .\LICENSE.txt -Target ..\LICENSE.txt

cd ..

@"
Modified license
"@ | add-content -Path LICENSE.txt

python3 -m venv .venv
.venv\Scripts\python -m pip install -U pip build
.venv\Scripts\python -m build subproj --outdir dist

tar -xOzf .\dist\mod-42.tar.gz mod-42/LICENSE.txt
# My license
# Modified license
```
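For readers on other platforms, roughly the same experiment can be sketched in Python with os.link. The st_dev comparison reflects the single-volume limitation mentioned earlier in the thread; the function names are hypothetical and the temporary directory stands in for myproj.

```python
import os
import tempfile
from pathlib import Path


def link_license(outer: Path, project_dir: Path) -> Path:
    """Hard-link `outer` into `project_dir`, refusing to cross file systems."""
    if os.stat(outer).st_dev != os.stat(project_dir).st_dev:
        raise OSError("hard links cannot span different file systems")
    target = project_dir / outer.name
    os.link(outer, target)
    return target


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        subproj = root / "subproj"
        subproj.mkdir()
        outer = root / "LICENSE.txt"
        outer.write_text("My license\n", encoding="utf-8")
        inner = link_license(outer, subproj)
        # Appending through the outer name is visible through the linked name,
        # because both names refer to the same file data.
        with outer.open("a", encoding="utf-8") as handle:
            handle.write("Modified license\n")
        return inner.read_text(encoding="utf-8")
```

One practical caveat: editors that save by writing a new file and renaming it over the old one replace the link with an independent copy, so the two names can silently diverge.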

Regarding the limitations (e.g. hard links not spanning different file systems, etc.), I think it is acceptable to have build requirements for special use cases (the same way you don’t expect to build a C extension on a machine whose architecture/OS does not support a specifically required library).

We don’t need an OS feature, we need an archive feature, because the defining characteristic of an sdist is that it’s a single archive that can be taken to another machine and has all the files that are needed to build.

It’s no good transferring a link to another machine, because the file it references won’t be there. We can’t force users to use links in their source repos just so that the sdist doesn’t have to figure it out. And we can’t enforce “outside the pyproject.toml root but inside the repo” because we don’t have any definition for “repo” other than “the directory containing pyproject.toml”.

We need a way to copy a ../../../file into a tar.gz and safely extract it on an arbitrary machine. The only idea I’ve got is to rewrite the path where it’s expected to be found so that it can be extracted within the extraction directory, rather than at ../../.... No kind of file system link is going to help with that.
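A sketch of the path-rewriting idea using only the standard library's tarfile module. The _external directory name and the helper names are hypothetical; the point is that every member name stays under the sdist's top-level directory, so extraction never needs to write to ../../...

```python
import io
import posixpath
import tarfile
from typing import Dict, List

# Hypothetical directory name; nothing in the spec defines where relocated files go.
EXTERNAL_DIR = "_external"


def archive_name(sdist_prefix: str, source_path: str) -> str:
    """Rewrite a possibly out-of-tree path into one under the sdist prefix."""
    normalized = posixpath.normpath(source_path)
    if normalized == ".." or normalized.startswith("../"):
        # Out-of-tree: keep only the file name, under EXTERNAL_DIR.
        normalized = posixpath.join(EXTERNAL_DIR, posixpath.basename(normalized))
    return posixpath.join(sdist_prefix, normalized)


def build_sdist_bytes(sdist_prefix: str, files: Dict[str, bytes]) -> bytes:
    """Pack {source path: content} into an in-memory .tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for source_path, content in files.items():
            info = tarfile.TarInfo(archive_name(sdist_prefix, source_path))
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def unsafe_members(tar_bytes: bytes) -> List[str]:
    """List members that would be extracted outside the extraction directory."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as archive:
        names = archive.getnames()
    return [
        name for name in names
        if posixpath.isabs(name) or posixpath.normpath(name).split("/")[0] == ".."
    ]
```

The rewritten location would still have to be recorded somewhere (for example in License-File in PKG-INFO) so that a build from the sdist can find the file again.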

Yeah, I agree with what Steve said.

I haven’t had time to implement it yet, but one of the upcoming features in Hatchling, which will resolve a few issues, is the ability to define file inclusion options based on the type of build. Basically, I’m going to have duplicate-named file inclusion options, suffixed with -from-sdist, that only take precedence when the project directory has a PKG-INFO file. This is how I will remedy file path remappings.
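The precedence rule described above can be sketched as follows. The option names and the config dictionary shape are hypothetical stand-ins; this is not Hatchling's actual configuration API, only an illustration of "the -from-sdist variant wins when PKG-INFO is present".

```python
from pathlib import Path
from typing import Any, Dict


def building_from_sdist(project_dir: Path) -> bool:
    # The trigger described above: a PKG-INFO file in the project directory.
    return (project_dir / "PKG-INFO").is_file()


def effective_option(config: Dict[str, Any], key: str, from_sdist: bool) -> Any:
    """Return `<key>-from-sdist` when building from an sdist and it is set, else `<key>`."""
    sdist_key = f"{key}-from-sdist"
    if from_sdist and sdist_key in config:
        return config[sdist_key]
    return config.get(key)
```

For instance, a config with include = ["../LICENSE.txt"] and include-from-sdist = ["_external/LICENSE.txt"] would resolve to the second list only when building from an unpacked sdist.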


156 posts were split to a new topic: [Mod titled] How to express project vs. distribution vs. artifact license post-PEP 639

At the request of the OP and PEP author @ksurma, I’ve split the follow-up discussion on how to better express project vs. distribution vs. per-artifact license into a new thread where we can focus on how to move forward given this PEP is Final.
