Sadly, many projects are taking the view that wheels are the artifact for
installation, and that there’s no value in publishing sdists (or including
docs and tests in sdists) when downstreams can get zip archives from
GitHub (and others). There is often resistance when people request that
tests be included in sdists.
It really depends on the project. For most of the projects I’m
involved with, we consider sdists to be our real “source tarball”
artifacts because our sdist release process incorporates information
(versions, authors, changes…) extracted from our Git metadata at
build time. If downstream consumers just used tarred-up copies of
the Git worktree, such as those produced by hosting platforms like
GitLab or GitHub, then that critical information would be absent.
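As a rough illustration of that point, an sdist carries its computed metadata in a PKG-INFO file inside its single top-level directory, which a plain worktree archive would not have. The sketch below reads that file; the function name `read_sdist_metadata` is made up for this example, and it assumes a gzip-compressed tarball.

```python
import tarfile
from email.parser import HeaderParser


def read_sdist_metadata(sdist_path):
    """Return the core metadata headers stored in an sdist's PKG-INFO file.

    Hypothetical helper for illustration; returns None if no PKG-INFO is found.
    """
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            # PKG-INFO sits directly inside the single top-level directory,
            # e.g. "example-1.2.3/PKG-INFO".
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                raw = archive.extractfile(member).read().decode("utf-8")
                # Core metadata uses an email-header-like format.
                return HeaderParser().parsestr(raw)
    return None
```

Calling `read_sdist_metadata(path)["Version"]` on a downloaded sdist would then give the version the build computed, without needing the Git history.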
I’m ambivalent over the matter of tests being included in sdists. My (personal) view is that I want the sdist so that I can build the project on an otherwise-unsupported environment, and I can view the source code of the project (even though pure Python wheels include the source, it’s not in “buildable form”). I don’t see supporting files like documentation sources, tests, release scripts, etc, as necessarily part of the sdist - if projects want to include them in the sdist, then that’s fine, but on the other hand, if they want to point users to a project repository for those things, I’m also fine with that. For my personal projects, I tend not to bundle these sorts of thing.
If we want to make it the norm to include tests, documentation source, or whatever in the sdist, then IMO we should be standardising the details (such as the directory in the sdist where they go, a standard means of running tests, building docs, or whatever).
Consumers like Linux distributions who need a published, well-defined source for the project could reasonably contribute a set of shared requirements here, that we could use to drive a standardisation effort. But without some clear understanding of what the requirements are, I don’t see the packaging community being in a good place to agree a standard for this sort of thing. And so far, there’s been little unified input from the Linux distros on this type of detail.
I’ve gone back and forth about that too @pf_moore, and in fact I used to include tests/ and docs/ inside the source directory. You can see this in Mailman, for example, where I went deep down this rabbit hole. Maybe it even makes sense in a big project like that.
These days I still include tests/ and docs/ in my sdists, but I make those siblings of the src/ directory. Big app vs smaller library isn’t apples to apples, but it feels like the right trade-off these days.
I don’t see any upside to omitting tests/ and docs/ from the sdist and a lot of downside. Arguments about tarball size aren’t at all convincing these days, but I want downstream consumers of my sdists to be able to reproduce the build as much as possible (and that includes running the tests and building the docs).
By Paul Moore via Discussions on Python.org at 24Mar2022 20:43:
> I’m ambivalent over the matter of tests being included in sdists. My (personal) view is that I want the sdist so that I can build the project on an otherwise-unsupported environment, and I can view the source code of the project (even though pure Python wheels include the source, it’s not in “buildable form”). I don’t see supporting files like documentation sources, tests, release scripts, etc, as necessarily part of the sdist - if projects want to include them in the sdist, then that’s fine, but on the other hand, if they want to point users to a project repository for those things, I’m also fine with that. For my personal projects, I tend not to bundle these sorts of thing.
Just to this, if I’m building something on a platform with no
presupplied version, if there are tests I can run it gives me more
confidence that the build produced a working thing.
> If we want to make it the norm to include tests, documentation source, or whatever in the sdist, then IMO we should be standardising the details (such as the directory in the sdist where they go, a standard means of running tests, building docs, or whatever).
There is much room for bikeshedding there, and that might be a PITA if
the author’s layout does not match whatever standard might ensue.
As an example, part way down the rabbit hole I’d entered when I made
this post was looking for the pyproject.toml equivalent to setup.py’s
package_dir setting, on which I was relying (my Python code is in a
lib/python subdir and the hg archive incantation I use to construct
the basis of the build tree preserves that). I’ve been shipping sdists
in that structure for years.
Likewise, a common pattern for tests seems to involve tests/
subdirectories, whereas a lot of my cs/foo.py modules have an
associated cs/foo_tests.py tests file. I ship those in my sdists too.
Just making the point that any standardisation is either going to
restrict author practices or need to take the form of specifying a hook
to “run tests” in a sufficiently flexible way. A can of worms, whose
closure should not be a blocker to encouraging the shipping of tests (and
docs etc etc) in sdists.
I see. Is that with setuptools_scm, or custom tooling?
It’s a setuptools plug-in named PBR which implements features
similar to setuptools-scm with a focus on combining SemVer hinting
with support for things like PEP 440 pre-release version calculation
and generated linear development version strings (but does also do
other things setuptools-scm does like handle building the manifest
from the tracked files list). It additionally implements features
similar to setuptools-changelog, can build AUTHORS files, and so on.
Ah, yes, I’m familiar with PBR—that’s what OpenStack uses, IIRC.
Thanks.
The same. The irony is that the main reason we originally wrote it over
a decade ago was that, for consistency purposes (thousands of
developers, need I say more), we needed declarative package
configuration so people wouldn’t stuff all sorts of crazy into
setup.py. Years later, setuptools itself replicated the setup.cfg
idea, though with some subtle changes to the keys (we added aliases
for those to keep things backward/forward-compatible).
My inclination is to include tests in the Python package itself (for example, tests for pyarrow are under pyarrow.tests) so by construction they would be part of the sdist unless specific measures are taken to exclude them.
If the project is non-trivial, you probably need to run the tests to ensure that your build is functional, no?
If I’m porting, I’d do so from GitHub, not from an sdist. By “otherwise unsupported” I really mean “one where there’s no wheel”. But it’s not particularly important, this is only my personal view, not any sort of policy or recommendation for others.
Often, when there’s no wheel, it’s also not a continuously tested platform for the package. You can of course just blindly hope it works anyway. But being able to run the tests is useful.
There isn’t really much in the way of (public/popular) tooling around pulling an sdist from PyPI and doing anything other than installing it.
So people (like me) who need to do this occasionally have a choice between using curl or git clone and it really doesn’t matter which way to go.
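For what it’s worth, locating the sdist without any special tooling can be sketched with PyPI’s JSON API, whose `urls` list describes the files of the latest release and tags each with a `packagetype`. The helper names `find_sdist` and `fetch_sdist_info` below are invented for this example; the second one needs network access.

```python
import json
import urllib.request


def find_sdist(release_files):
    """Return the first entry describing an sdist, or None if there is none."""
    for entry in release_files:
        if entry.get("packagetype") == "sdist":
            return entry
    return None


def fetch_sdist_info(project):
    """Query PyPI's JSON API for the latest release's sdist entry (needs network)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # "urls" lists the files uploaded for the latest release.
    return find_sdist(payload["urls"])
```

The returned entry includes a `url` to download and `digests` to check it against, so this is roughly the curl route with one extra lookup.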
That said, I do like that sdists are an official copy of the source at the time the associated wheels were built. Slightly easier than finding the git tag. I’d be quite happy for the sdist to contain the entire repository snapshot for simple packages, and like that they can be rearranged if you have a more complex repository setup.
For conda-forge, the suggested pattern is to pull the sdist from PyPI and build the conda package from it. This is another example outside of Linux distros where it can be helpful to include tests so the test suite can be run as part of the conda build in a representative conda environment.
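A repackager wanting to know up front whether an sdist ships a test suite could do a quick check along these lines. This is only a sketch: `top_level_entries` is a hypothetical helper, it assumes a gzip-compressed tarball with a single root directory, and it would not notice tests bundled inside the package itself (the pyarrow.tests style mentioned above).

```python
import tarfile


def top_level_entries(sdist_path):
    """Return the names directly beneath the sdist's single root directory."""
    entries = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for name in archive.getnames():
            # Names look like "example-1.0/tests/test_basic.py"; keep "tests".
            parts = name.split("/")
            if len(parts) >= 2 and parts[1]:
                entries.add(parts[1])
    return entries
```

A build recipe could then test `"tests" in top_level_entries(path)` and skip or fail the test phase accordingly.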
Ultimately, “common practice” will be dictated by what build backends make default, or make easy. Larger projects that are engaged with distributors may well add extra config to include tests, docs or whatever in the sdists, but the majority of projects, I suspect, will just do the minimum needed to get a proper distribution from their backend.
If it’s super-simple to include tests/docs, someone saying “please can you include the tests/docs in the sdist” may be enough to get them added[1], but if it’s hard (hello, MANIFEST.in, I’m looking at you) most projects won’t bother, or will give up.
[1] If the default behaviour is to include them, they will likely be there anyway.
Yeah, I agree. Though there’s a demand chain involved too - build backends will implement (eventually) what projects ask for, and projects will implement (eventually) what users ask for, so if users aren’t even contemplating using an sdist as a way to get a fully buildable/testable set of sources, nobody will ask for it.
This is why PyPA-“endorsed” recommendations are so important: they can bypass that demand chain and essentially tell projects or backends “we know nobody has asked you for this yet, but we know it’s important and you should do it anyway”.
I think that’s why these questions keep coming up here. It’s not just for the discussion, but to find out the “correct” answer.
I think it would make sense to standardize test invocation, allowing people to blindly run tests from an sdist before building the binary distribution. Otherwise, I’d lean towards not including them, as they increase file sizes and provide almost no benefit. If one needs prior knowledge of the project to run the tests, one might as well pull a proper distribution tarball or clone the Git repo instead.