Should sdists include docs and tests?

merwok · March 24, 2022, 6:59pm

Sadly many projects are taking the view that wheels are the artifact for
installation, and there’s no value in publishing sdists (or including
docs and tests in sdists) when downstreams can get zip archives from
github (and others). There is often resistance when people request that
tests be included in sdists.

fungi · March 24, 2022, 7:38pm

It really depends on the project. For most of the projects I’m
involved with, we consider sdists to be our real “source tarball”
artifacts because our sdist release process incorporates information
(versions, authors, changes…) extracted from our Git metadata at
build time. If downstream consumers just used tarred-up copies of
the Git worktree, such as those produced by hosting platforms like
Gitlab or GitHub, then that critical information would be absent.

CAM-Gerlach · March 24, 2022, 7:44pm

EDIT: Added the predecessor messages that were relevant to this thread

fungi:

Modernising my packages - am I thinking about this all wrong?

There is often resistance when people request that tests be included in sdists.

Coming from a package author/upstream perspective, I don’t have a problem with doing so, but what I’ve had a hard time understanding is why downstreams don’t just use the source tarballs, as they are the definitive source form of the project, whereas the sdist is nominally for user consumption. I can see why that is the case for special cases like @pf_moore where very restrictive corporate policies are in place, but not for Linux distro downstreams or other open source projects I have more inclination to spend my volunteer time supporting.

our sdist release process incorporates information
(versions, authors, changes…) extracted from our Git metadata at
build time.

I see. Is that with setuptools_scm, or custom tooling?

pf_moore · March 24, 2022, 8:33pm

I’m ambivalent over the matter of tests being included in sdists. My (personal) view is that I want the sdist so that I can build the project on an otherwise-unsupported environment, and I can view the source code of the project (even though pure Python wheels include the source, it’s not in “buildable form”). I don’t see supporting files like documentation sources, tests, release scripts, etc, as necessarily part of the sdist - if projects want to include them in the sdist, then that’s fine, but on the other hand, if they want to point users to a project repository for those things, I’m also fine with that. For my personal projects, I tend not to bundle these sorts of thing.

If we want to make it the norm to include tests, documentation source, or whatever in the sdist, then IMO we should be standardising the details (such as the directory in the sdist where they go, a standard means of running tests, building docs, or whatever).

Consumers like Linux distributions who need a published, well-defined source for the project could reasonably contribute a set of shared requirements here, that we could use to drive a standardisation effort. But without some clear understanding of what the requirements are, I don’t see the packaging community being in a good place to agree a standard for this sort of thing. And so far, there’s been little unified input from the Linux distros on this type of detail.

barry · March 24, 2022, 9:47pm

I’ve gone back and forth about that too @pf_moore and in fact I used to include tests/ and docs/ inside the source directory. You can see this in Mailman, for example, where I went deep down this rabbit hole. Maybe it even makes sense in a big project like that.

These days I still include tests/ and docs/ in my sdists, but I make those siblings of the src/ directory. Big app vs smaller library isn’t apples to apples, but it feels like the right trade-off these days.

I don’t see any upside to omitting tests/ and docs/ from the sdist and a lot of downside. Arguments about tarball size aren’t at all convincing these days, but I want downstream consumers of my sdists to be able to reproduce the build as much as possible (and that includes running the tests and building the docs).

cameron · March 24, 2022, 9:48pm

By Paul Moore via Discussions on Python.org at 24Mar2022 20:43:

merwok:

There is often resistance when people request that
tests be included in sdists.

I’m ambivalent over the matter of tests being included in sdists. My (personal) view is that I want the sdist so that I can build the project on an otherwise-unsupported environment, and I can view the source code of the project (even though pure Python wheels include the source, it’s not in “buildable form”). I don’t see supporting files like documentation sources, tests, release scripts, etc, as necessarily part of the sdist - if projects want to include them in the sdist, then that’s fine, but on the other hand, if they want to point users to a project repository for those things, I’m also fine with that. For my personal projects, I tend not to bundle these sorts of thing.

Just to this, if I’m building something on a platform with no
presupplied version, if there are tests I can run it gives me more
confidence that the build produced a working thing.

If we want to make it the norm to include tests, documentation source, or whatever in the sdist, then IMO we should be standardising the details (such as the directory in the sdist where they go, a standard means of running tests, building docs, or whatever).

There is much room for bikeshedding there, and that might be a PITA if
the author’s layout does not match whatever standard might ensue.

As an example, part way down the rabbit hole I’d entered when I made
this post was looking for the pyproject.toml equivalent to
setup.py’s package_dir setting, on which I was relying (my Python
code is in a lib/python subdir and the hg archive incantation I use
to construct the basis of the build tree preserves that). I’ve been
shipping sdists in that structure for years.

Likewise, a common pattern for tests seems to involve tests/
subdirectories, whereas a lot of my cs/foo.py modules have an
associated cs/foo_tests.py tests file. I ship those in my sdists too.

Just making the point that any standardisation is either going to
restrict author practices or need to take the form of specifying a hook
to “run tests” in a sufficiently flexible way. A can of worms, whose
closure should not be a blocker to encouraging the shipping of tests (and
docs etc etc) in sdists.

Cheers,
Cameron Simpson cs@cskk.id.au

fungi · March 24, 2022, 10:46pm

I see. Is that with setuptools_scm, or custom tooling?

It’s a setuptools plug-in named PBR which implements features
similar to setuptools-scm with a focus on combining SemVer hinting
with support for things like PEP 440 pre-release version calculation
and generated linear development version strings (but does also do
other things setuptools-scm does like handle building the manifest
from the tracked files list). It additionally implements features
similar to setuptools-changelog, can build AUTHORS files, and so on.
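
To make the idea of deriving a version from Git metadata concrete, here is a minimal sketch that turns the output of git describe --tags --long into a PEP 440 development version. It is not the algorithm used by PBR or setuptools-scm; the helper names, the "bump the last component" rule, and the assumption that tags look like 1.2.3 or v1.2.3 are all choices made for this example.

```python
import re
import subprocess

# Matches "1.2.3", "v1.2.3" and the long form "1.2.3-4-gabc1234".
_DESCRIBE_RE = re.compile(
    r"^v?(?P<tag>\d+(?:\.\d+)*)(?:-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def version_from_describe(describe: str) -> str:
    """Convert `git describe` output into a PEP 440 version string."""
    match = _DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unrecognised describe output: {describe!r}")
    tag = match["tag"]
    distance = match["distance"]
    # Exactly on a tag: the tag itself is the version.
    if distance is None or int(distance) == 0:
        return tag
    # Past a tag: bump the last component and mark it as a dev release,
    # recording the commit hash as a local version label.
    parts = [int(p) for p in tag.split(".")]
    parts[-1] += 1
    bumped = ".".join(str(p) for p in parts)
    return f"{bumped}.dev{distance}+g{match['sha']}"


def describe_worktree(repo_dir: str = ".") -> str:
    """Run git in `repo_dir`; requires a Git checkout with at least one tag."""
    completed = subprocess.run(
        ["git", "describe", "--tags", "--long"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return completed.stdout
```

A plain archive of the worktree has no .git directory, so describe_worktree would fail there. That is the information an sdist built this way carries and a hosting-platform tarball does not.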

CAM-Gerlach · March 24, 2022, 11:38pm

Ah, yes, I’m familiar with PBR—that’s what OpenStack uses, IIRC. Thanks.

fungi · March 25, 2022, 12:35am

Ah, yes, I’m familiar with PBR—that’s what OpenStack uses, IIRC.
Thanks.

The same. The irony is that the main reason we originally wrote it
over a decade ago was that, for consistency purposes (thousands of
developers, need I say more), we needed declarative package
configuration so people wouldn’t stuff all sorts of crazy into
setup.py. Years later, setuptools itself replicated the setup.cfg
idea, though with some subtle changes to the keys (we added aliases
for those to keep things backward/forward-compatible).

pitrou · March 25, 2022, 10:20am

My inclination is to include tests in the Python package itself (for example, tests for pyarrow are under pyarrow.tests) so by construction they would be part of the sdist unless specific measures are taken to exclude them.

No idea for documentation, though.

pitrou · March 25, 2022, 10:21am

If the project is non-trivial, you probably need to run the tests to ensure that your build is functional, no?

pf_moore · March 25, 2022, 10:39am

If the project is non-trivial, you probably need to run the tests to ensure that your build is functional, no?

If I’m porting, I’d do so from github, not from a sdist. By “otherwise unsupported” I really mean “one where there’s no wheel”. But it’s not particularly important, this is only my personal view, not any sort of policy or recommendation for others.

hugovk · March 25, 2022, 10:50am

We include tests and docs in Pillow’s sdist; we’ve had downstream distro packagers request tests.

I also tend to include them for other projects using setuptools_scm.

For Pillow the test images make the sdist pretty big, but for other projects it doesn’t make much difference, and wheels are provided anyway.

pitrou · March 25, 2022, 11:07am

Often, when there’s no wheel, it’s also not a continuously tested platform for the package. You can of course just blindly hope it works anyway. But being able to run the tests is useful.

steve.dower · March 25, 2022, 11:57am

There isn’t really much in the way of (public/popular) tooling around pulling an sdist from PyPI and doing anything other than installing it.

So people (like me) who need to do this occasionally have a choice between using curl or git clone and it really doesn’t matter which way to go.

That said, I do like that sdists are an official copy of the source at the time the associated wheels were built. Slightly easier than finding the git tag. I’d be quite happy for the sdist to contain the entire repository snapshot for simple packages, and like that they can be rearranged if you have a more complex repository setup.
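
For readers who want to script the "pull an sdist from PyPI" step, here is a minimal sketch based on PyPI's JSON API, which lists the files of a release under a "urls" key with a "packagetype" field. The function names are made up for this example, and the thread itself does not prescribe this approach.

```python
import json
import urllib.request
from typing import Optional


def pick_sdist(release: dict) -> Optional[dict]:
    """Return the file entry for the sdist of a release, if one was uploaded."""
    for file_info in release.get("urls", []):
        if file_info.get("packagetype") == "sdist":
            return file_info
    return None


def fetch_release(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Calling pick_sdist(fetch_release(name, version)) gives an entry whose "url" and "digests" fields can be used to download and verify the archive; a None result means the project published wheels only.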

blink1073 · March 25, 2022, 1:02pm

For conda-forge, the suggested pattern is to pull the sdist from PyPI and build the conda package from it. This is another example outside of Linux distros where it can be helpful to include tests so the test suite can be run as part of the conda build in a representative conda environment.
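
A downstream packager can check whether a given sdist ships tests before wiring them into a build. The following is a minimal sketch using only the standard library; the directory names and filename suffixes it looks for are assumptions for illustration, since there is no standard location for tests inside an sdist.

```python
import tarfile
from pathlib import PurePosixPath


def find_test_files(sdist_path: str, test_dirs=("tests", "test")) -> list:
    """List archive members that look like tests in a tar-based sdist."""
    found = []
    with tarfile.open(sdist_path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            filename = parts[-1]
            # parts[0] is the top-level "name-version" directory; look at
            # the directories between it and the file itself.
            in_test_dir = any(part in test_dirs for part in parts[1:-1])
            test_like_name = filename.endswith(".py") and (
                filename.startswith("test_")
                or filename.endswith(("_test.py", "_tests.py"))
            )
            if in_test_dir or test_like_name:
                found.append(member.name)
    return found
```

An empty result is only a hint: a project may keep tests under a name this heuristic does not recognise.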

pf_moore · March 25, 2022, 1:20pm

Ultimately, “common practice” will be dictated by what build backends make default, or make easy. Larger projects that are engaged with distributors may well add extra config to include tests, docs or whatever in the sdists, but the majority of projects, I suspect, will just do the minimum needed to get a proper distribution from their backend.

If it’s super-simple to include tests/docs, someone saying “please can you include the tests/docs in the sdist” may be enough to get them added[1], but if it’s hard (hello, MANIFEST.in, I’m looking at you) most projects won’t bother, or will give up.

[1] If the default behaviour is to include them, they will likely be there anyway.

steve.dower · March 25, 2022, 3:10pm

Yeah, I agree. Though there’s a demand chain involved too - build backends will implement (eventually) what projects ask for, and projects will implement (eventually) what users ask for, so if users aren’t even contemplating using an sdist as a way to get a fully buildable/testable set of sources, nobody will ask for it.

This is why PyPA-“endorsed” recommendations are so important: they can bypass that demand chain and essentially tell projects or backends “we know nobody has asked you for this yet, but we know it’s important and you should do it anyway”.

I think that’s why these questions keep coming up here. It’s not just for the discussion, but to find out the “correct” answer.

FFY00 · March 25, 2022, 3:49pm

I think it would make sense if we standardized test invocation, allowing people to blindly run tests from an sdist before building the binary distribution. Otherwise, I’d lean towards not including them, as they increase file sizes and provide almost no benefit. If one needs prior knowledge of the project to run the tests, one might as well just pull a proper distribution tarball or clone the git repo instead.

barry · March 25, 2022, 4:04pm

From one of my pyproject.toml files:

source-includes = [
    'docs/',
    'test/',
    'tox.ini',
    'conftest.py',
]

Could only be easier if these were the default, I guess.