Should sdists include docs and tests?

barry · March 25, 2022, 4:05pm

Like

$ tox

pf_moore · March 25, 2022, 4:36pm

I don’t recognise that item. Is it specific to a particular backend? I don’t see it in the flit docs, and setuptools has only just got pyproject.toml support, so I assume it’s not from that. But yes, something like that.

Yep. And then there would be the “tests” vs “test” debate, of course.

… and unfortunately, I think the conclusion that the repeated discussions demonstrate is that there’s no consensus on this. Years ago, when I set up the original version of the PyPA sample project, we discussed including tests and documentation, and there were so many opinions that we abandoned the attempt to come up with a “best practice”. Things have improved since then (pytest and sphinx seem pretty ubiquitous) but I still don’t think there’s something we can reasonably endorse.

barry · March 25, 2022, 5:35pm

I guess it’s specific to pdm. The docs don’t specifically say, but it looks like the defaults would include the top level tests/ directory, but not a top-level docs/ directory. Maybe @frostming can say more.

Yep! Ideally, tools would accept both.

Perhaps you’ll never get consensus, but you can be opinionated!

pf_moore · March 25, 2022, 5:47pm

Agreed, someone can. It won’t be me, though. I already said I think including tests should be up to the project, and people didn’t seem to like that opinion.

brettcannon · March 25, 2022, 7:02pm

It appears @barry is having that debate with himself.

That came up in another topic recently (may have even been the one this topic was forked from). I have some ideas on this topic on how we could potentially make this work, but I currently don’t have the bandwidth to pursue them. If people really want to try and tackle this we could start a new topic to discuss it.

CAM-Gerlach · March 25, 2022, 8:08pm

In our case (Spyder), we maintain (fairly sizable, complex and varied) user and developer documentation in separate repos from the code it supports (for multiple reasons), so it’s not (easily) possible to include it in the sdist even if we wanted to.

Tests, on the other hand, are already included by default, at least in Spyder’s case, since by convention they are placed “inline” in a test (or is it tests?) subdirectory at each level of the subpackage hierarchy.

Our best practice across the various projects I maintain is to always run the tests from the source against the installed wheel. For each Python version and platform in our CI matrix, we do a clean sdist and wheel build (with build, of course), install the wheel in a clean env, and then run the tests from the source against it, using a src dir, tox and/or python -I, plus pytest --import-mode=importlib, to ensure isolation from the source tree so that we’re always using the installed copy. It isn’t as important that the tests themselves work when packaged for end distribution, but rather that the code under test works.
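A quick way to double-check that isolation is to ask where the package under test was actually imported from. The following is only a sketch, assuming the tests can import the package by name; `imported_from_source_tree` is a hypothetical helper, not part of tox, pytest or build:

```python
import importlib
from pathlib import Path


def imported_from_source_tree(module_name: str, source_root: str) -> bool:
    """Return True if `module_name` was imported from inside `source_root`.

    Hypothetical helper: in a CI job that installs the wheel into a clean
    environment, this should return False for the package under test.
    """
    module = importlib.import_module(module_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in modules and namespace packages have no single file.
        return False
    module_path = Path(module_file).resolve()
    root = Path(source_root).resolve()
    # Path.is_relative_to requires Python 3.9+.
    return module_path.is_relative_to(root)
```

In a test suite one might assert `not imported_from_source_tree("mypkg", checkout_dir)`, where `mypkg` and `checkout_dir` stand in for the real package name and source checkout path.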

I ran into this limitation myself with some conda-forge packages I’m one of the maintainers of. Is there a reason that the official recommendation is to use the sdist, even though I’m pretty sure git clone/tarballs are supported as well, which would bypass the issue and ensure the full source is available?

I recall this as well, but after some browsing and searching, I can’t seem to find it.

barry · March 25, 2022, 8:15pm

But actually, I use test/ while the pdm default is tests/.

barry · March 25, 2022, 8:18pm

I should also mention that I’m a big fan of doctests, although I prefer putting those in separate .rst files. Mailman went deep down this rabbit hole at first, though I walked back some of that based on experience. Doc tests are great for happy path tests, not so good for unit tests / coverage. Anyway, the point is that in many cases docs are also tests!
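Doctests kept in standalone .rst files can be run with the standard library’s `doctest.testfile`. A minimal sketch, assuming a file named `usage.rst` (a hypothetical name) that is written to a temporary directory here only so the example is self-contained:

```python
import doctest
import tempfile
from pathlib import Path

# Hypothetical narrative documentation containing interactive examples.
RST_SOURCE = """\
Sorting names
=============

Names are sorted case-insensitively:

    >>> names = ["bob", "Alice", "carol"]
    >>> sorted(names, key=str.lower)
    ['Alice', 'bob', 'carol']
"""


def run_rst_doctests(text: str) -> doctest.TestResults:
    """Write `text` to a temporary .rst file and run the examples in it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "usage.rst"
        path.write_text(text, encoding="utf-8")
        # module_relative=False: treat `path` as an ordinary filesystem path.
        return doctest.testfile(str(path), module_relative=False)


if __name__ == "__main__":
    print(run_rst_doctests(RST_SOURCE))
```

Because the examples live in prose rather than in the code, they read as documentation first, which fits the happy-path role described above.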

cameron · March 25, 2022, 10:10pm

By Barry Warsaw via Discussions on Python.org at 25Mar2022 20:28:

I should also mention that I’m a big fan of doctests, although I prefer
putting those in separate .rst files. Mailman went deep down this
rabbit hole at first, though I walked back some of that based on
experience. Doc tests are great for happy path tests, not so good for
unit tests / coverage. Anyway, the point is that in many cases docs
are also tests!

Aye. When feasible, and particularly to illustrate typical use or a
corner case, I like to stick doctests in my docstrings as examples of
use.

Cheers,
Cameron Simpson cs@cskk.id.au
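Docstring examples like those can be collected and run directly. A whole module is normally checked with `doctest.testmod()`; the sketch below narrows that to a single function using `doctest.DocTestFinder` and `doctest.DocTestRunner`. The `clamp` function and the `check_docstring_examples` helper are made up for illustration:

```python
import doctest


def clamp(value, lo, hi):
    """Restrict ``value`` to the closed interval [lo, hi].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(lo, min(value, hi))


def check_docstring_examples(func) -> doctest.TestResults:
    """Hypothetical helper: run only the doctests in `func`'s docstring."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # globs makes `func` available by name inside the examples.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```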

blink1073 · March 26, 2022, 4:35am

The conda docs give some compelling reasons (e.g. bandwidth and security).

They also have docs for running unit tests as part of the build.

CAM-Gerlach · March 26, 2022, 5:01am

Yeah, but that section and its reasons are only about why source tarballs should be used in preference to git clones, not why sdists should be used over true source tarballs—which, as mentioned above, are perfectly possible to obtain from GitHub; in fact, the linked section even includes an example of such:

Therefore use, for example,: [sic]

curl -sL https://github.com/username/reponame/archive/vX.X.X.tar.gz | openssl sha256

It doesn’t, as far as I can find, imply any preference for PyPI sdists over GitHub source tarballs.
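For anyone generating such hashes from Python rather than the shell, a rough equivalent of the `openssl sha256` step is an incremental hashlib digest. This is a sketch; `sha256_of_stream` is a hypothetical name, and downloading the archive is left out:

```python
import hashlib
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hypothetical helper: hash a binary stream, like `... | openssl sha256`."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so large archives need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

For a downloaded archive, pass it a file opened with `open(path, "rb")`. On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` does the same chunked reading.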

Yeah, that’s what I was working off; specifically, it has source_files, but that presumes the source tarball is from the actual source (and thus contains all the source files in the first place), and not a sdist distribution archive with selectively-packaged files.

frostming · March 26, 2022, 11:13am

Correct, it is described here. I think tests should be included by default, since distros may need them when building packages from source, but docs are left opt-in.

fungi · March 26, 2022, 12:03pm

While I don’t have a strong opinion on defaults for sdist
generation, the projects I’m involved in consider sdists to be
“true” source tarballs, “truer” than a naive archive of the Git
worktree since we can bake in relevant Git metadata which would
otherwise be lost. Because we want distributions to be able to use
them as a basis for their own packaging efforts, we make sure any
files tracked by Git (except for things which make no sense outside
revision control, like .gitignore) are included in the sdists we
publish, and we test to make certain that’s the case. If we have
docs or tests or even CI configuration checked into the repo, we
always include it all in the sdist.
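A check along those lines can be sketched by comparing the output of `git ls-files` with the members of the built sdist. This is an illustrative assumption, not the actual test those projects use; the helper names and the `VCS_ONLY` set are hypothetical:

```python
import tarfile
from pathlib import PurePosixPath

# Hypothetical list of files that only make sense inside a Git checkout.
VCS_ONLY = {".gitignore", ".gitattributes", ".gitmodules"}


def sdist_members(sdist_path: str) -> set[str]:
    """Return file paths in an sdist, minus the leading 'name-version/' dir."""
    names = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            if len(parts) > 1:
                names.add(str(PurePosixPath(*parts[1:])))
    return names


def missing_from_sdist(tracked_files, sdist_path: str) -> set[str]:
    """Tracked files (e.g. from `git ls-files`) that are absent from the sdist."""
    expected = {f for f in tracked_files if PurePosixPath(f).name not in VCS_ONLY}
    return expected - sdist_members(sdist_path)
```

The tracked file list could come from `subprocess.run(["git", "ls-files"], capture_output=True, text=True, check=True).stdout.splitlines()` run in the repository root; a non-empty result would fail the check.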

We also usually include tests in our wheels, since we consider them
to be a part of the software itself and so locate them inside the
importable package rather than in some parallel directory tree. That
is, one may import foo.tests.bar (yes, we use “tests” not “test”
too).

sinoroc · March 26, 2022, 1:53pm

For what it’s worth, I put the test code in sdists, but not in wheels.

I use the test directory at the root of the project (not tests), because that is the distutils (setuptools) default, and I do not need to add it to MANIFEST.in.

My test code is usually quite vanilla. Although I use pytest to run it, it can usually be run without it (cd test && python -m unittest). If test dependencies are required, they are in a dev_test extra (which I should rename to dev-test now, I think), so that info is also available in the sdist. At some point I tried quite hard to make ./setup.py test work for my projects, but it never behaved well enough; anyway, that is gone now.
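Keeping tests compatible with plain `unittest` means they can be discovered and run with the standard library alone. A minimal sketch, assuming a root-level `test/` directory as described above; `run_vanilla_tests` and the sample test file content are hypothetical:

```python
import io
import textwrap
import unittest

# Hypothetical content of a file such as test/test_arithmetic.py.
SAMPLE_TEST = textwrap.dedent("""\
    import unittest

    class TestArithmetic(unittest.TestCase):
        def test_addition(self):
            self.assertEqual(1 + 1, 2)
""")


def run_vanilla_tests(test_dir: str) -> unittest.TestResult:
    """Hypothetical helper: discover test*.py in `test_dir` and run them."""
    suite = unittest.TestLoader().discover(start_dir=test_dir)
    # Send the runner's report to a buffer instead of stderr.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

From a shell, the equivalent discovery is `python -m unittest discover -s test`.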

I do not add documentation to sdists (or wheels). But after reading more about how Linux distro packagers work, I now think adding documentation to sdists makes sense (at least what is needed for man pages and similar).

All in all, in the long term, I think something like PEP 517 for test runners and doc builders could make sense.

CAM-Gerlach · March 26, 2022, 11:33pm

Yeah, that makes sense. Since your tooling (PBR, but the same is true of the popular setuptools-scm) not only ensures your sdists include everything checked into your repo, minus VCS-specific files, but also bakes in additional metadata, your sdists are indeed more complete representations of your source than a straight tarball of your worktree. In our (Spyder project) case, by contrast, we still rely on manual MANIFEST.in maintenance, but we also maintain our own tooling (LogHub) that takes care of things like updating the CHANGELOG and AUTHORS from the GitHub issues, PRs and contributors, all of which is checked into source control.

I’ve been suggesting switching to setuptools_scm for automatic version and manifest management, which would result in a setup very similar to yours, but there’s some amount of institutional inertia and reluctance to “fix what ain’t broke” and a number of places that Spyder itself uses the version information that would need to change, as well as higher short-term priorities (e.g. modernizing our packaging config/infra and CI setup) to spend our limited resources and change budget on.

This is our standard practice as well in the Spyder project, where the tests are not only a subpackage, but every directory has a tests subdirectory containing the tests for the modules in that directory, i.e.:

_ spyder
|__ spam
|__ eggs
|__ ham
|  |__ foo
|  |__ bar
|  |__ tests
|     |__ test_foo
|     |__ test_bar
|__ tests
   |__ test_spam
   |__ test_eggs

As some background, Spyder is a 15-year-old project originally developed by a scientist with limited programming experience, and the original “tests” were just regular functions and function/method calls in the if __name__ == "__main__" blocks of the code under test, so this was a natural outgrowth of that.

In some other (typically smaller) projects I’m involved in, we use the src layout and locate our tests outside the package, organized by type (unit, integration, functional) and run from the source against the installed wheel.

I was actually completely unaware of that; for reference, could you point me to where that is mentioned?

Using _ is fine; PEP 685 doesn’t change that. _ is the most common non-alphanumeric character used in extras names, and it is perfectly okay under the PEP 508 rules adopted by PEP 685. It’ll automatically get normalized to - via PEP 503 normalization when written out to core metadata, and consuming tools will also normalize it when comparing, so it doesn’t really make any practical difference (except in one specific scenario, but changing it on your end now won’t help that at all).
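The PEP 503 rule referred to here is a single regular-expression substitution. A sketch, with `normalize_name` as a hypothetical function name:

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project or extra name following the PEP 503 rule.

    Runs of '-', '_' and '.' collapse to a single '-', and the result is
    lowercased. PEP 685 applies the same normalization to extra names.
    """
    return re.sub(r"[-_.]+", "-", name).lower()
```

With it, `dev_test`, `dev-test` and `Dev.Test` all normalize to `dev-test`, so tools that normalize before comparing treat them as the same extra.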

If you do make a change, you might want to consider a test extra, and a dev extra that includes test plus all your development dependencies, at least in terms of standard convention. But no need unless you want to, at least until it is more standardized, though nominally the core metadata spec does reserve the test extra for running tests:

Two feature names test and doc are reserved to mark dependencies that are needed for running automated tests and generating documentation, respectively.
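Whether a given distribution declares those reserved extras can be read from its core metadata (the METADATA or PKG-INFO file), which uses an email-header format. A sketch using only the standard library; the function name and the sample metadata are hypothetical. For an installed distribution, `importlib.metadata.metadata(name).get_all("Provides-Extra")` exposes the same field.

```python
from email.parser import HeaderParser

# Reserved extra names from the core metadata spec quoted above.
RESERVED_EXTRAS = {"test", "doc"}


def reserved_extras_provided(metadata_text: str) -> set[str]:
    """Hypothetical helper: which reserved extras a METADATA file declares."""
    message = HeaderParser().parsestr(metadata_text)
    # get_all returns None when the header is absent.
    provided = message.get_all("Provides-Extra") or []
    return {extra.strip().lower() for extra in provided} & RESERVED_EXTRAS
```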

sinoroc · March 27, 2022, 9:08am

Correction, it says:

anything that looks like a test script: test/test*.py (currently, the Distutils don’t do anything with test scripts except include them in source distributions, but in the future there will be a standard for testing Python module distributions)

– https://docs.python.org/3/distutils/sourcedist.html#specifying-the-files-to-distribute

Thanks. I did not know that. Interesting. I wonder if any tool actually recognizes those extras and handles them in any specific way.

blink1073 · March 27, 2022, 11:51am

No, but the regro-cf-autotick-bot uses PyPI releases to trigger automatic update PRs.

CAM-Gerlach · March 27, 2022, 9:23pm

Gotcha, thanks. I wouldn’t rely on this, though, since that’s an implicit default, distutils is deprecated, things with Setuptools may have changed, and in particular are likely to change much further with

(perhaps @abravalheri might have further comments on this)

Perhaps the legacy, deprecated/removed distutils commands for running tests and building docs automatically install them, but other than that I’m not aware of other tools doing so. Poetry has something like that, but AFAIK that uses its own tool config. There are also currently plans to propose a standard PEP 517-like hook for invoking such tools, but it hasn’t reached the PEP stage yet.

Right. Does it only work with a url source set to PyPI in the recipe, or does it work with GitHub tar sources as well? I couldn’t find any answer in the docs about that.

abravalheri · March 27, 2022, 10:26pm

Hi @CAM-Gerlach, currently the auto-discovery changes in setuptools are restricted to the package files. If the testing code resides within the package, it would be included.

For sdists, I personally recommend for people to use something like setuptools-scm or setuptools-svn whenever they can. I think this is a much better alternative than re-implementing a way to decide which files are transient or not (we all know the fate of the MANIFEST.in format…).

blink1073 · March 28, 2022, 8:43am

Oh nice, it appears the bot does work with GitHub releases: [bot-automerge] libbson v1.19.1 by regro-cf-autotick-bot · Pull Request #4 · conda-forge/libbson-feedstock · GitHub. So it appears there is no strong reason from the conda-forge perspective to include the tests in the PyPI sdist.