Help testing experimental features in setuptools

I can understand finding packages via the implicit (“flat”) layout by default, as well as in a src directory (which I personally strongly prefer for its explicitness, for avoiding all the heuristics added here, and for avoiding various other issues). But does finding namespace packages really need to be turned on by default? At least in my understanding, their use is not that common (at least intentionally), they are sometimes (mis)used without the author even knowing it, and this seems as likely to run into edge cases like this one, where the behavior is assumed but not intended, as into cases where it was intended.

Given that, is it possible to have find.namespaces default to False (at least initially), so that those who actually need it can simply flip it on? Explicit is better than implicit, after all.

I have mixed feelings about this. While I agree that explicit is better than implicit, we cannot ignore the fact that there is a PEP standardising implicit namespaces… Unless I am wrong, if you fire up an interpreter in this folder you will be able to import those files, right? So the folder is a valid package…
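
A quick way to check that claim with only the standard library (a minimal sketch; the folder name tasks_demo and the module helper.py are made up for illustration): create a folder with no __init__.py, put its parent on sys.path, and import from it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_plain_folder(root: Path, folder: str, module: str):
    """Create root/folder/module.py with NO __init__.py, then import it.

    Under PEP 420 the folder is still importable as an implicit namespace
    package, so a discovery step that honours namespaces may treat any such
    auxiliary folder as a package.
    """
    pkg_dir = root / folder
    pkg_dir.mkdir()
    (pkg_dir / f"{module}.py").write_text("VALUE = 42\n")
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()  # make sure the freshly created folder is seen
        return importlib.import_module(f"{folder}.{module}")
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mod = import_from_plain_folder(Path(tmp), "tasks_demo", "helper")
        print(mod.VALUE)  # the submodule imports fine
        print(getattr(sys.modules["tasks_demo"], "__file__", None))  # None: namespace package
```

The parent folder ends up in sys.modules with a __path__ but no file of its own, which is exactly what makes it look like a legitimate package to discovery.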

The root of the problem seems to be the flat layout combined with an exotic folder structure… The current implementation should exclude tests, scripts, tools, docs, examples, tasks, etc… But it is impossible to anticipate every auxiliary folder name that developers might come up with… (I did leave some escape hatches: folders starting with a _ or . character are considered private/hidden and are therefore automatically excluded.)
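
As a rough illustration of those heuristics (a sketch only; the exclusion set below is a hypothetical subset built from the names listed above, not setuptools’ actual list), any folder that is neither private/hidden nor on the list stays a candidate, which is exactly how an unanticipated name slips through:

```python
# Hypothetical subset of auxiliary folder names, taken from the names
# mentioned above; setuptools' real list is longer and may differ.
AUXILIARY_DIRS = frozenset({"tests", "scripts", "tools", "docs", "examples", "tasks"})


def candidate_top_level_dirs(names):
    """Return the folder names a flat-layout heuristic would still keep, sorted."""
    return sorted(
        name
        for name in names
        # Escape hatch: names starting with "_" or "." count as private/hidden.
        if not name.startswith(("_", "."))
        and name not in AUXILIARY_DIRS
    )


print(candidate_top_level_dirs(["mypkg", "docs", "tests", "_build", ".git", "benchmarks"]))
# → ['benchmarks', 'mypkg']
```

Here benchmarks survives simply because nobody thought to put it on the list.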

Sure, but the fact that it is a valid import package doesn’t necessarily mean that Setuptools is required to include it in a given distribution package by default, just as it already uses a variety of implementation-dependent heuristics (some of which you mention above) to include/exclude possible import packages. Just because it is implicit at runtime doesn’t mean it has to be implicit when packaging; a number of other tools (mypy, linters, etc.) require flipping a switch to enable PEP 420 support, often with similar caveats clearly documented alongside the option itself.

And as I’m sure everyone here is aware, just because the code works when run from a local directory doesn’t mean it will work when packaged (given package data, data files, include/exclude and other config). Of course, this should be reduced as much as practical, which your changes help do, but in cases that induce particularly great “temptation to guess” and carry significant risk of guessing wrong—like with namespace packages—it would seem to make sense to make it explicit, especially given the relative rarity of intentional PEP 420 layouts and the low cost of flipping an option from False to True, as well as the benefit of ensuring users are made aware of the possible side effects first.

Thanks Christopher for the feedback, I will consider this more deeply.

I don’t think that I will make namespaces=False the default, but when doing automatic discovery (with no find configuration) I could leave PEP 420 namespaces out. I don’t know yet, this is complicated… My first approach was different (you can see the discussion in [FR] Default values for `packages` and `py_modules` · Issue #2887 · pypa/setuptools · GitHub).

The only thing I would really like to avoid is making the discovery mechanism even more complex (and with more switches) than it already is (it is hard to keep backwards compatibility while pushing things forward!).

(Side note: I love namespaces and I use PEP 420 all the time, it works like a charm, you just need to use the src-layout).

It’s me who should be (and I am) thankful @abravalheri. PS: a small repo with my experiments is here.

I found a fun case where venv/bin/*.py was still being included in the sdist tar.gz (after venv* was set in the exclude list in pyproject.toml).
To reproduce:

  1. Have venv in exclude and build. Don’t remove dist or egg-info.
  2. Modify pyproject.toml (changing exclude venv to venv*) and build again.

I’m not sure whether this is intentional: the files are removed from pkg.whl but left in the sdist? Also, PKG-INFO claims the license is UNKNOWN, despite license = { file = "LICENSE" }. I guess one has to explicitly provide an SPDX string, apart from just providing a file? Anyway, thanks again!

Hi all,

Thanks a lot for the effort! I also made a small repo with experiments and explanations in GitHub - moshez/pyproject-only-example – I hope that’s useful to anyone who tries to move and/or test.

Hi @kkrolczyk, I tried your repository, and when running from a clean state (i.e. after rm -rf dist* build *.egg-info* *.log) everything seems to work as expected.
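
If you script this, a rough Python equivalent of that clean-up command could look like the following (a sketch; the function name is made up, and it deletes files, so only point it at a project root):

```python
import shutil
from pathlib import Path

# Mirrors the shell clean-up: rm -rf dist* build *.egg-info* *.log
CLEAN_PATTERNS = ("dist*", "build", "*.egg-info*", "*.log")


def clean_build_artifacts(root="."):
    """Delete build leftovers directly under root and return the removed names, sorted."""
    root = Path(root)
    removed = []
    for pattern in CLEAN_PATTERNS:
        # list() so nothing is deleted while the glob is still iterating
        for path in list(root.glob(pattern)):
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
            else:
                path.unlink()  # plain files and symlinks
            removed.append(path.name)
    return sorted(removed)
```

Running it before each build avoids stale dist/ or egg-info state leaking into the next sdist.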

When you change the build configuration, you will likely have to clean first. This is probably related to Updating MANIFEST.in does not correctly update the package sdist creates · Issue #436 · pypa/setuptools · GitHub, but I don’t consider it a blocker.

For differences between sdist and wheel, or which files are included, please check Controlling files in the distribution - setuptools 69.0.3.post20231214 documentation (sdists will always include more files: the wheel is built from the sdist but only picks up the files under the package directory).

Let’s keep this on hold until the new license PEP is approved. The License field in PKG-INFO should then be mapped to the new license-expression in pyproject.toml. Meanwhile, if you want to avoid the UNKNOWN value, you can use:

[project]
dynamic = [..., "license"]

[tool.setuptools.dynamic]
...
license = "..."  # This field will likely be removed after PEP 639
license-files = [...]

(This is a current limitation of PEP 621 that does not allow a file and an expression to be specified at the same time for project.license).

I will check the warning about the url field at some point this week (the warnings about author/maintainer and email are already addressed by a PR to distutils).

Hi @CAM-Gerlach, I am experimenting with this concept: [follow-up `auto-discovery`] Prevent auto-discovery from being too aggressive by abravalheri · Pull Request #3155 · pypa/setuptools · GitHub. The idea is just to look for folders/files that match the name metadata/config. Do you think that would solve the problems you pointed out?

(I was resistant at first because it makes the discovery code even more complex than it already is, but I think this is the safest approach.)


UPDATE: I backed off from that proposal and closed the PR after re-reading the comments from the lead developer of setuptools in [FR] Default values for `packages` and `py_modules` · Issue #2887 · pypa/setuptools · GitHub.

Without doing a survey on GitHub/PyPI, we can’t really say whether namespaces OR multiple packages per distribution are more or less common than flat layouts AND auxiliary folders with custom/exotic names.

If the users want to use a particular directory layout, that is completely fine, but then the best would be to provide specific configuration instead of relying on auto-discovery.

The license.file subkey in PEP 621 is rather under-specified, leading to substantial ambiguity and potential confusion over the intended behavior and mapping (not least for me when writing PEP 639). However, after far too much time carefully combing through the PEP and actual implementations, and discussing it with some of the authors, I believe the intent was that the contents of the file listed under license.file get loaded and dumped as a string into the License metadata field. (I am not entirely sure whether and how all implementations follow this; it seems that @abravalheri's implementation in Setuptools omits this functionality.) That is why it is mutually exclusive with the text subkey: only one or the other is used for the field value.

Therefore, at least per PEP 621, if you want to specify an SPDX expression for your package, you would list it under license.text. Your license file (and any others) should get picked up automatically by the li[cs]ense*, copying*, notice*, authors* glob I helped implement in Wheel some time back, which the main tools (Setuptools, Wheel, etc.) appear to have adopted as a de facto standard. If you have license files that don’t fit that pattern, you’ll have to keep relying on the Setuptools license_files option in the tool section until PEP 639 is accepted. If @abravalheri decides not to implement the license key at all, to allow a clean transition once PEP 639 is approved in final form (which is a fair decision), then you’ll just list the expression under license in the tool section instead.
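
To see which files such a default would match, here is a small sketch using fnmatch (the upper-case spelling of the patterns and the case-sensitive matching are assumptions of this sketch, not a copy of Wheel’s or Setuptools’ code):

```python
from fnmatch import fnmatchcase

# Upper-case spelling of the default glob mentioned above. Case-sensitive
# matching is an assumption of this sketch.
DEFAULT_LICENSE_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def default_license_files(filenames):
    """Return the file names matched by the default license glob, in input order."""
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in DEFAULT_LICENSE_PATTERNS)
    ]


print(default_license_files(["LICENSE", "LICENCE.txt", "NOTICE.md", "AUTHORS.rst", "README.md", "COPYING"]))
# → ['LICENSE', 'LICENCE.txt', 'NOTICE.md', 'AUTHORS.rst', 'COPYING']
```

The [CS] character class is what lets both the British and American spellings through.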

PEP 639 will resolve and simplify this, at least as presently designed: you’ll just put your SPDX license expression as the flat value of license, and list your license files under license-files (if not relying on the default glob, which this PEP also makes de jure rather than just de facto).

Actually, the consensus plan for some time (and as implemented in the PEP) has been to deprecate the old License core metadata field and store the license expression in a new License-Expression field instead, due to the ambiguity that storing it in License would create (which is at cross-purposes to the PEP’s main goal of improving clarity here).

Also, I’m working on finally updating the PEP to reflect this as we speak, but the current consensus approach to the PEP 621 keys is to have a SPDX expression be the flat string value of license, as the PEP 621 authors originally intended, rather than introducing a new top-level license-expression key, and thus naturally deprecating (and being mutually exclusive with) the license.text and license.file keys.

I’m assuming this is a result of choosing not to support the license key entirely until PEP 639 is resolved. I don’t object to that decision, but if you do go that route, please make specifying license an informative error until that’s the case, or people’s license metadata will get silently dropped.

See above, this took many re-readings, analysis and discussion to get clarity on due to substantial ambiguity in the original PEP, but this appears to be ultimately because license.file appears to be intended to do something very different from Wheel and Setuptools license_file and license_files options, namely loading the text of a single file and injecting that as the License core metadata field, rather than including multiple listed files in the distribution. Thus, it is not possible to specify the text and file subkeys to license simultaneously, as they get routed to the same core metadata field.
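
A minimal sketch of that reading of PEP 621, assuming the [project] table has already been parsed into a dict (for example with tomllib on Python 3.11+); the function name is made up and this is not how any particular backend is implemented:

```python
from pathlib import Path


def license_core_metadata(project: dict, root: Path = Path(".")):
    """Return the value for the License core metadata field, or None.

    Sketch of the PEP 621 reading described above: ``text`` is used as-is,
    ``file`` has its contents read, and specifying both is an error because
    they feed the same field.
    """
    license_table = project.get("license")
    if license_table is None:
        return None
    if "text" in license_table and "file" in license_table:
        raise ValueError("project.license: 'text' and 'file' are mutually exclusive")
    if "text" in license_table:
        return license_table["text"]
    if "file" in license_table:
        return (Path(root) / license_table["file"]).read_text(encoding="utf-8")
    raise ValueError("project.license needs a 'text' or a 'file' key")
```

Both branches end in the same single string, which is why accepting both keys at once would have no sensible meaning.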

Right now, if project.license.text is specified, the implementation will add a value to the License core metadata, so no license metadata will be silently dropped. The same applies to project.license.file and License-File. The project.license field is supported.

I think what kkrolczyk was referring to is a different behaviour. When a field is not specified setuptools will fill the value with UNKNOWN in PKG-INFO. So it is currently possible to have a valid License-File: and a License: UNKNOWN in the same PKG-INFO/METADATA file. Probably if someone surveys PyPI right now they will find packages behaving like that. (It does not seem to be problematic, just an eyesore).
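
PKG-INFO/METADATA uses email-style headers, so the standard library’s email parser is enough to spot this combination (a sketch; the sample metadata and the package name example-pkg are made up):

```python
from email.parser import HeaderParser

# Hypothetical PKG-INFO excerpt showing the combination described above.
SAMPLE_PKG_INFO = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 0.1.0
License: UNKNOWN
License-File: LICENSE
"""


def license_headers(pkg_info: str):
    """Return (License value, list of License-File values) from PKG-INFO text."""
    headers = HeaderParser().parsestr(pkg_info)
    return headers.get("License"), headers.get_all("License-File", [])


print(license_headers(SAMPLE_PKG_INFO))
# → ('UNKNOWN', ['LICENSE'])
```

get_all is used for License-File because the field may appear several times, once per file.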

That was a substantially bigger and stricter change than what I was suggesting, which was just to set namespace package auto-discovery to false by default, at least when the flat layout is used. The goal was to avoid silent false positives that get unknowingly baked into the final package and could cause downstream issues, not only with the package in question but potentially with the broader ecosystem (name conflicts, unintended behavior, etc.). The PR you made does solve the practical issues I raised, but being a more expansive change, it runs into the other roadblocks you mentioned, which ended up with it being closed.

These comments do hold for the change you proposed, but they are orthogonal to not picking up implicit namespace packages by default in the flat layout, because they do not discuss namespace packages at all, and the approach I propose above is much more limited in scope.

Indeed, and neither have I (yet) done a proper quantitative survey to support this. But my impression, both from having contributed to hundreds of different FOSS packages large and small (none or nearly none of them intentionally using PEP 420) and from the fact that the other major tools I know of that it affects (Pytest, Pylint, Mypy, other linters, etc.) all have an explicit, off-by-default switch to enable namespace package discovery due to the issues its implicitness can cause, is that it would be wiser to do likewise, at least for “implicit”/flat layouts.

Oh; in that case, I was confused as to why you were recommending that the user use tool.setuptools.dynamic.license = "..." and adding license to dynamic instead of just doing project.license.text = "...", unless you had deliberately not implemented the functionality of the license key.

This is not the intended behavior specified by PEP 621, as the License-File field didn’t even exist when PEP 621 was created; it is introduced in the still-being-finalized PEP 639. Rather, as I explain above, what appears to be intended is for file to specify a file whose text is dumped into the License core metadata field, much like the file: prefix in setup.cfg (which is also why it is mutually exclusive with text). However, it is very understandable that you interpreted it this way, as PEP 621 is vague, underspecified and ambiguous on this point and in no way states this explicitly. In fact, despite poring over that PEP and many others multiple times, as well as various implementations, I myself was confused on that point until very recently.

Yes, it is a different behavior, but what @kkrolczyk is expecting is the behavior intended by the specification in PEP 621. There is not yet a standardized License-File or License-Expression core metadata field, but having License-File contain a value while License/License-Expression does not is perfectly valid (though specifying both of the latter is an error), and would be desired in some cases (e.g. a non-SPDX license). The converse case for Core Metadata 2.3, where License-Expression/License contains a value but no license files are specified/found, is technically valid (i.e. the standard currently doesn’t prohibit it), but should not normally happen (and the PEP specifies that it should generate a warning because no License-File entries were found, regardless of the License-Expression/License field status).
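
Those rules can be sketched as a small check over a dict of core metadata fields (a sketch based only on the draft PEP 639 behavior summarised in this post, which may still change; the function name is made up):

```python
import warnings


def check_license_fields(metadata: dict) -> None:
    """Apply the draft rules summarised above to a dict of core metadata fields.

    Based on the in-progress PEP 639 as described in this post; the final
    rules may differ.
    """
    if metadata.get("License-Expression") and metadata.get("License"):
        raise ValueError("License-Expression and License must not both be set")
    if not metadata.get("License-File"):
        # Valid, but unusual: warn regardless of License/License-Expression.
        warnings.warn("no License-File entries found", UserWarning, stacklevel=2)
```

A License-File with no License/License-Expression (e.g. a non-SPDX license) passes silently, while the converse only produces a warning rather than an error.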

Regarding Setuptools’ and Wheel’s non-standard use of License-File, they are using them roughly following a previous draft of the yet-unstandardized PEP 639, but the semantics have changed somewhat in the current version, and could change further before final acceptance. As such, we cannot condone this behavior, though so long as they do not actually increment the metadata version to 2.3, it is still possible to retain backward compatibility without affecting packages produced with these existing non-standard extensions, and the specification in the PEP explicitly ensures that future tools conforming to it will not assume the standard semantics in older non-standard fields.

I think that in general this could prove to be another difficult task: what if the license expression does not match the license file? Or what if it “nearly” does, but the license file was “slightly” modified? Or someone used a not-quite-standard SPDX license expression? But so far I am happy with this solution, and even more so that this topic is actually getting more and more recognition.

Out of the other dynamic options, I took a look at entry-points. I wonder if I understand it correctly: in the experimental feature commit it parses an external file, but also expects a static list of key/value pairs.
I wonder why someone wouldn’t put them directly in pyproject.toml in that case?

I was looking into it because in my case, the older setup.py had a hook to a utils script that would scan the package dir and return a dynamically created list of entry points, to facilitate “plugins”. Would it make sense to have a third optional argument to the entry_points function, taking a callable? Not now, by any means, as this might complicate reviews and all; just a general idea.

Thank you very much for the detailed explanation Christopher and all the advice.

I think I will adopt a pragmatic approach here and keep the current implementation which is consistent with the existing setuptools behaviour (a non-standard License-File metadata in PKG-INFO, with no increment in the metadata version).

Once PEP 639 lands, we can go ahead and implement License-Expression and do the necessary adjustments to adhere to the final text of the PEP.

My plan is to merge the experimental branch into main soon (just after this first round of tests is completed), but the feature will still be marked as experimental for a while until we collect feedback from the broader community. So we will still have some freedom to change things.

I haven’t read the latest iteration of PEP 639, but I believe there is only so much we can do… At the end of the day we have to trust that package developers will act in their best interest and provide an SPDX identifier that is coherent with the license file.

This was done to mimic the existing behaviour of setuptools. One of the proposed ideas in Discussion: support for pyproject.toml configuration · Issue #1688 · pypa/setuptools · GitHub is that legacy setup.cfg files will eventually be parsed by first translating them into a pyproject.toml equivalent (which is one of the reasons ini2toml exists). The ability to do this 1:1 translation is the motivation for that commit.

In principle, everything is possible, but that has to go through the feature request process in setuptools (in general, an FR with a compelling use case and an associated PR has a better chance of being accepted).

Meanwhile I suggest pre-processing the pyproject.toml file and adding the entry points via an automated script before running python -m build.
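
A minimal sketch of such a pre-processing step, assuming a hypothetical convention where every module in a plugins folder exposes a register callable (the group name, package layout and function name are illustrative, not anything setuptools defines). It only builds the TOML text; appending it to pyproject.toml is left to the script:

```python
from pathlib import Path


def plugin_entry_points_toml(plugin_dir, package: str, group: str) -> str:
    """Build a [project.entry-points."<group>"] table from the plugin modules found.

    Hypothetical convention: each public module <name>.py in plugin_dir
    exposes a ``register`` callable.
    """
    lines = [f'[project.entry-points."{group}"]']
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.stem.startswith("_"):  # skip __init__.py and private modules
            continue
        lines.append(f'{path.stem} = "{package}.plugins.{path.stem}:register"')
    return "\n".join(lines) + "\n"
```

Since TOML does not allow a table to be defined twice, the generated table must not already exist in pyproject.toml, and entry-points should then not be listed in project.dynamic (PEP 621 forbids a field being both static and dynamic).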

Yeah, but this is really more a job for other more appropriate tools and a discussion for another place.

Right, to be consistent with present behavior (for better or worse), handling tool.setuptools.dynamic.license-files the same as the license_files key in setup.cfg is fine. The one thing that does need to be fixed, however, is the handling of project.license.file to have the behavior specified above, as the current behavior is not what PEP 621 specifies (despite it very confusingly sounding like it is, and it being very understandable to initially interpret it that way) nor what, AFAIK, the other backends implement. In addition to those issues, if that is not fixed, it also creates a potential problem with the dynamic key (which is also specified with somewhat inconsistent and confusing terminology in PEP 621).

+1 from me on that, which is also more or less the community consensus on that aspect; it has been discussed at some length there.

Correct. Since PEP 621 was structured to be more-or-less a mapping from TOML to core metadata w/ some convenient flourishes, we did not try to invent anything new/different (beyond dynamic). There was no innovation intended with license (hence no SPDX support and thus CAM’s PEP 639), and since the core metadata only has the concept of a dump of license text, the specified file was expected to be dumped.

If that is not clear by pyproject.toml specification - Python Packaging User Guide then please either open a PR or let me know what needs to be changed to make that clear.

Thank you all again for the feedback.

I have updated the implementation in the experimental branch to reflect the discussion about license and license_files.

After the discussion, my conclusion is that tool.setuptools.dynamic.{license,license-files} are no longer necessary (since backfilling License-File as non-standard core metadata is completely independent of adding license to project.dynamic).

The updated implementation will expand project.license.file “in place” (i.e. add the file contents to the License core metadata). It will also provisionally accept tool.setuptools.license-files (until PEP 639 is approved)[1].

Both validate-pyproject and ini2toml were updated accordingly.

People who were using a pyproject.toml generated by the previous version of ini2toml might find some errors with tool.setuptools.dynamic and will need to update it (or regenerate the file with the latest version of ini2toml) to reflect the changes discussed above.


  1. For now tool.setuptools.license-files is a simple list of glob patterns. Once we implement project.license-files it will accept the same kind of value as described in the final version of the PEP.

Yup, exactly. license-files isn’t a key specified in PEP 621 (nor a core metadata field specified in the Core Metadata spec), so it wouldn’t really make sense to list it under dynamic anyway.

My non-binding suggestion would be to just retain the existing license_files syntax in the tools section rather than attempting to copy whatever behavior is specified in the project section by PEP 639. This retains backward compat and avoids churn while allowing anyone who wants to migrate to use the project table format.

There are also some specific tool behaviors specified or recommended along with the format, which might cause confusion if they were not also duplicated (which might not be desirable). It also won’t back-translate to the old setup.cfg format, which will probably be around for some time, potentially causing further confusion.

But this all won’t happen until PEP 639 is resolved anyway, so no need to decide that now.

Dear all, I have updated the experimental/support-pyproject branch again.

The main difference is that the auto-discovery will now error out if multiple top-level packages are detected in a flat-layout.

This looked to me like a good compromise to try preventing unwanted files and folders from being accidentally included in the distribution.

Users who actively want to have multiple top-level packages in a single distribution are encouraged to do one of the following:

  1. use the src-layout
  2. explicitly define [tool.setuptools.packages.find] with the appropriate where/include/exclude.
  3. manually list packages or py-modules
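
The new check can be pictured roughly like this (a sketch, not setuptools’ implementation; the exclusion set is illustrative and the error wording is only modeled on the behavior described above): scan the project root, keep candidate folders, and refuse to guess when more than one remains.

```python
import os

# Illustrative subset of folder names treated as auxiliary (not setuptools' list).
EXCLUDED_DIRS = frozenset({"tests", "docs", "scripts", "tools", "examples", "tasks"})


def single_flat_layout_package(root):
    """Return the one top-level package folder in a flat layout, or fail.

    Folders count with or without __init__.py (i.e. namespace-style), unless
    they are private/hidden or in EXCLUDED_DIRS.
    """
    with os.scandir(root) as entries:
        found = sorted(
            entry.name
            for entry in entries
            if entry.is_dir()
            and entry.name.isidentifier()
            and not entry.name.startswith(("_", "."))
            and entry.name not in EXCLUDED_DIRS
        )
    if len(found) > 1:
        raise RuntimeError(
            f"Multiple top-level packages discovered in a flat-layout: {found}. "
            "Use the src-layout or configure packages explicitly."
        )
    return found[0] if found else None
```

Failing loudly here trades a little convenience for never silently shipping an unexpected folder.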

I think we are now in a relatively good shape to merge this into the main branch, which I plan to do in the following days unless something I was not foreseeing comes up. If anyone would like to give it a last try and report bugs that would be very appreciated.

There are now some docs in setuptools 60.10.0.post20220322.post-20220322 documentation; the following pages are especially helpful:

Note that once the branch is merged, these features (pyproject.toml metadata support and auto-discovery) will still be marked as experimental, which will allow us to react to feedback from the broader community and also examine the design of the [tool.setuptools] table more thoroughly.
(Naturally, the handling of the [project] table should not change significantly.)
