And that should be the last PR!
Thanks for proposing the clarifications, Brett. The text looks good to me, although the second PR seems a little ambiguous where it says:
This key should only be specified if the license expression for any
and all distribution files generated from the pyproject.toml is the
same as the one specified. If the license expression will differ then
it should either be specified as dynamic or not set at all.
I think that “generated from the pyproject.toml” is a bit ambiguous. I can suggest clearer wording on the PR but I just want to be clear here what I mean to make sure we’re on the same page first.
I just updated python-flint to use PEP 639 a few days ago in this PR. That is not yet released but if you want to see the resulting wheels they can be found at the Anaconda scientific python nightly wheels index here.
Specifically we have this in the project section of pyproject.toml:
Lower down though we have this:
What that means is that when building wheels in CI with cibuildwheel we will run auditwheel afterwards to bundle in some libraries and at the same time we run a script that bundles in their licenses and files and makes a combined license expression. The sdist therefore has the same license metadata as shown in the project section of pyproject.toml but inside the wheels there is:
$ tree python_flint.libs/
python_flint.libs/
├── libflint-db910cfb.so.21.0.0
├── libgmp-b288de48.so.10.5.0
└── libmpfr-a43da5fb.so.6.2.2
$ tree python_flint-0.8.0.dist-info/
python_flint-0.8.0.dist-info/
├── licenses
│ ├── LICENSE
│ └── python-flint.libs
│ ├── flint-3.3.1
│ │ ├── COPYING
│ │ └── COPYING.LESSER
│ ├── gmp-6.3.0
│ │ ├── COPYING
│ │ └── COPYING.LESSERv3
│ └── mpfr-4.2.2
│ ├── COPYING
│ └── COPYING.LESSER
├── METADATA
├── RECORD
└── WHEEL
$ grep License- python_flint-0.8.0.dist-info/METADATA
License-Expression: MIT AND LGPL-3.0-or-later
License-File: LICENSE
License-File: python-flint.libs/gmp-6.3.0/COPYING
License-File: python-flint.libs/gmp-6.3.0/COPYING.LESSERv3
License-File: python-flint.libs/mpfr-4.2.2/COPYING
License-File: python-flint.libs/mpfr-4.2.2/COPYING.LESSER
License-File: python-flint.libs/flint-3.3.1/COPYING
License-File: python-flint.libs/flint-3.3.1/COPYING.LESSER
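As a rough illustration of the kind of post-processing described above, here is a minimal sketch of combining license expressions and copying upstream license files into a wheel's `.dist-info/licenses` directory. This is not python-flint's actual script, which is not shown in this thread; the function names `combine_license_expressions` and `copy_license_files` and the directory arguments are hypothetical. A real script would also have to rewrite METADATA and update RECORD, which this sketch leaves out.

```python
from pathlib import Path
import shutil


def combine_license_expressions(base: str, bundled: list[str]) -> str:
    """Join SPDX expressions with AND, dropping exact duplicates."""
    unique: list[str] = []
    for expr in [base, *bundled]:
        if expr not in unique:
            unique.append(expr)
    if len(unique) == 1:
        return unique[0]
    # AND binds tighter than OR in SPDX, so wrap any operand containing OR.
    return " AND ".join(f"({e})" if " OR " in e else e for e in unique)


def copy_license_files(src: Path, dist_info: Path, subdir: str) -> list[str]:
    """Copy license files unmodified and return their License-File values."""
    dest = dist_info / "licenses" / subdir
    dest.mkdir(parents=True, exist_ok=True)
    entries = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dest / path.name)
            # License-File paths are relative to the licenses directory.
            entries.append(f"{subdir}/{path.name}")
    return entries
```

With a base expression of `MIT` and three bundled libraries that are all `LGPL-3.0-or-later`, `combine_license_expressions` gives `MIT AND LGPL-3.0-or-later`, matching the wheel metadata shown above.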
I know the discussion above was pretty tense but let me just say thank you to everyone involved with PEP 639 here. I am a lot happier with this than the previous approach which was (pretty much literally) this:
cat wheel-licences/* >> LICENSE
The advantages are clear:
- Each distribution has a short, machine- and human-readable license expression, so you can see what the licenses are and how they differ between distributions.
- We bundle the unmodified upstream license files and can organise them in an easily understandable way.
The text “generated from pyproject.toml” looks ambiguous here since the configuration that inserts these additional licenses is in pyproject.toml and the pyproject.toml was also used to build the wheels before anything else got bundled. My reading though is that you didn’t intend to disagree with what I have done here but perhaps it is helpful to see how this could look in practice for why the wording is possibly ambiguous.
I would say that in general for all metadata and not just the license the significance of “dynamic” is tied to the PEP 517 build specification and answers this specific question: when building a wheel from an sdist using the PEP 517 interface can it be assumed that some piece of metadata from the sdist will be unchanged in the wheel?
The pyproject.toml file is the configuration that is used to determine what a PEP 517 build would do so I can see why you would say “generated from pyproject.toml” as a proxy for “built via the PEP 517 interface” but also pyproject.toml can be used for many other things as well.
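To make the question "can this sdist value be assumed unchanged in the wheel?" concrete, here is a small sketch that reads an sdist's PKG-INFO text and reports whether a given field can be relied on. It assumes the core metadata rule that, from Metadata-Version 2.2 onwards, a field not listed under `Dynamic` must have the same value in any wheel built from that sdist. The function name `sdist_value_is_reliable` is made up for this example.

```python
from email.parser import Parser


def sdist_value_is_reliable(pkg_info_text: str, field: str) -> bool:
    """Return True if `field` in an sdist may be assumed unchanged in wheels."""
    msg = Parser().parsestr(pkg_info_text)

    # Before Metadata-Version 2.2 there is no Dynamic field, so nothing
    # in the sdist metadata can be treated as a guarantee.
    version = msg.get("Metadata-Version", "1.0")
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (2, 2):
        return False

    # Field names are compared case-insensitively.
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    return field.lower() not in dynamic
```

For an sdist whose PKG-INFO contains `Dynamic: License-Expression`, calling `sdist_value_is_reliable(text, "License-Expression")` returns `False`, which is exactly the signal a repackager would otherwise be missing.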
Also “If the license expression will differ” is ambiguous as well. The reason is that the build backend might be configurable. It might be the case that with the default build options it would not differ but that there is also the possibility to provide non-default options that would potentially affect the license. This could look something like:
pip install . -C setup-args=--download-and-bundle-stuff
I did not know of examples of this before but I recently read something suggesting that matplotlib’s (PEP 517) build can use meson’s wrap file feature to download and build dependencies that are statically linked. I’m not sure if that is something that is done in the default build configuration or if it is only on an opt-in basis. I have certainly contemplated that it would be good for python-flint to be able to do the same, at least on an opt-in basis, but in python-flint’s case I think it needs to be dynamic linking (the LGPL license has a carve-out for dynamic linking) and meson-python does not provide for that currently. I expect more projects will do this sort of thing in future.
If you want to see a more complex example of how PEP 639 works out this was recently added to numpy’s pyproject.toml:
Since setuptools support removal date has been mentioned here, I jumped at it and sent a feature request for this project to postpone the removal for as long as possible: [FR] Support the `license` key table values for as long as they are supported by specification · Issue #5081 · pypa/setuptools · GitHub
I hope the maintainers will see it similarly.
That’s my understanding, and in order to reflect that I’d basically just change “all distribution files generated from the pyproject.toml” to “all distribution files created by a build backend using the pyproject.toml”.
The core metadata Dynamic field used in sdists should be covered by Clarify that the Dynamic metadata field only applies when building from sdist by pfmoore · Pull Request #1901 · pypa/packaging.python.org · GitHub (which I’d forgotten to move from draft status to “ready for review”, so thanks for the reminder).
This turned out to be a breeze BTW. Even all our awkward cross compiling support just neatly slotted into a hatchling build hook without fuss. I’ve had similar good experiences with moving other projects over to hatchling in the last couple of years too. If anyone else is in the same boat as I’ve been in, really wanting an alternative to setuptools that isn’t as opinionated about your project as flit nor as opinionated about your workflow as a workflow manager, then I’d definitely recommend hatchling (just the build backend, you don’t have to use hatch the frontend tool/workflow manager).
This is the default for FreeType and Qhull, and has a build option to opt out of that and use system libraries instead, as explained at matplotlib/meson.options at 5b38f50ae200e09419f929a5f3760ea5ae099d2e · matplotlib/matplotlib · GitHub. Note that this is a significant improvement over the previous behavior from when Matplotlib was using setuptools, where somewhere deep inside a setup.py file there were some make invocations in a subprocess to achieve the same “download and build FreeType and Qhull” thing in a completely ad-hoc fashion.
It should support that now, with auto-vendoring of the built shared library a la auditwheel, see Using shared libraries - meson-python. There are some sharp edges of course (this kind of thing is as hard as it gets for build backends) and another improvement is coming soon to avoid issues with “the subproject being built wants to install more than just the shared library you want”.
Thanks for sharing your very detailed example @oscarbenjamin and working on improving how to deal with licenses of vendored libraries. I think that’s essentially working towards a key part of the “left for a future PEP” determination of this rejected idea from PEP 639.
Nice, thanks Ralf. I will take a look at that.
Separately from the technical aspects the main thing I am unsure about is whether this matches user expectations. For example Steve’s point above:
It is not just a question of licenses but also supply chain security and other things. If you do pip install --no-binary foo foo then maybe you expect that it downloads things from PyPI but having the build backend download things from other places is unexpected?
For most users it would be much better UX for the backend to download and build the dependencies as matplotlib does (if not already available or if wrong version etc). If you can build the package itself then you likely have the pieces to build the dependencies as well. The weird division between Python and non-Python dependencies in PyPI-land that historically prevents this has always been a massive pain for users.
I also don’t want to mark the sdist as having a “dynamic license” though when it is the primary input for repackaging in conda, Linux distros and so on. In those other situations the “Python vs non-Python” distinction is irrelevant and the non-Python things are just normal dependencies that can be installed separately.
I think it depends. As a package author, your alternatives are to do what Matplotlib is doing now, to vendor the downloaded sources (that’s what I’d personally prefer, but it’s also a lot of overhead, and something like FreeType is quite large), to “just let it break”, or not to ship an sdist at all. What Matplotlib does now is download sources through a standard mechanism with an exact hash which is verified by the build system, so it’s pretty secure.
There are obviously bad practices here that are to be discouraged, but I don’t think what Matplotlib does now is one of them. It also implements ways to opt out and build against system libraries instead, in line with Supporting downstream packaging - Python Packaging User Guide.
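The general idea of a hash-pinned download can be sketched in a few lines. This is only an illustration of the technique, not Meson's or Matplotlib's actual code; the function name `verify_sha256` is hypothetical, and the expected digest would normally be pinned in the project's build configuration.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 digest matches the pin."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large source archives do not sit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"hash mismatch for {path.name}: got {actual}")
```

The build would call this on the downloaded archive before unpacking it, so a tampered or truncated download stops the build instead of being compiled in.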
I also don’t want to mark the sdist as having a “dynamic license” though when it is the primary input for repackaging in conda, Linux distros and so on.
Agreed. I think the thing that bothers Steve here (which is very valid, and bothers me too) should be a priority to fix in PyPI, so the UX doesn’t mislead the casual reader about what the actual license is. I won’t comment more, because the phrase from Steve was from 100+ messages up and followed by lots of discussion that we’re bound to otherwise repeat. For the purposes of what “dynamic” means and a PEP 639 clarification I think this thread converged to something reasonable. It just didn’t solve that per-artifact UX issue, which the authors acknowledged and left to a future PEP.
I have updated the PR with this change.
FYI, my dynamic metadata PEP should go live soon. My sponsor read it, so I just have to rebase with a new number and make the PR (probably when I get back from India).
So, it seems like PyPI has started taking a much stricter approach towards license metadata:
I can understand the intent, but getting this type of error at the end of our release procedure (which is, well, highly non-trivial) is a bit frustrating:
Uploading pyarrow-22.0.0.tar.gz
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB • 00:00 • 140.0 MB/s
WARNING Error during upload. Retry with the --verbose option for more details.
ERROR HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
License-File LICENSE.txt does not exist in distribution file pyarrow-22.0.0.tar.gz at pyarrow-22.0.0/LICENSE.txt
It would be nice if there was a way to PyPI-sanity-check a package before upload, such that we would detect such situations ahead of time and not at the last moment.
My understanding is that this is the class of problems twine check is intended to catch. The most sensible approach would be for warehouse and twine to share a common library of checks so they don’t get out of sync.
The (unfortunate) workaround I’ve taken to using is performing a preliminary upload to test.pypi.org first in order to catch similar issues.
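For this particular failure, a local check is easy to sketch: read PKG-INFO from the sdist and confirm that every `License-File` entry exists under the archive's top-level directory, which is where the error message above shows PyPI looking. This is not part of twine or warehouse; the function name `missing_license_files` is hypothetical, and it assumes a `.tar.gz` sdist with a single top-level directory.

```python
import tarfile
from email.parser import Parser


def missing_license_files(sdist_path: str) -> list[str]:
    """Return License-File entries that are absent from a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        names = set(tar.getnames())
        # PKG-INFO sits directly inside the top-level "<name>-<version>/" dir.
        candidates = [
            n for n in names if n.count("/") == 1 and n.endswith("/PKG-INFO")
        ]
        if not candidates:
            raise ValueError("no top-level PKG-INFO found in sdist")
        pkg_info = candidates[0]
        root = pkg_info.split("/")[0]
        member = tar.extractfile(pkg_info)
        if member is None:
            raise ValueError("PKG-INFO is not a regular file")
        metadata = Parser().parsestr(member.read().decode("utf-8"))

    declared = metadata.get_all("License-File", [])
    return [path for path in declared if f"{root}/{path}" not in names]
```

Running this in CI right after the build step, and failing if the returned list is non-empty, would surface the problem well before the upload stage.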