14 posts were split to a new topic: Providing a way to specify how to run tests
I split off a bunch of replies to my 3-step process post because they were going into the details of how to know how to run tests, rather than whether having an answer to that changes how we care about this topic.
Back to the topic at hand: it seems like most people like the idea of suggesting people include docs and tests in sdists. The question is where to put that recommendation. Paul had suggested the “Packaging Python Projects” tutorial in the Python Packaging User Guide as a user-focused change. I had suggested the sdist spec, to suggest build tools facilitate doing this.
I personally don’t see a problem with doing both.
Sorry for being the main culprit there, and I appreciate others chiming in to share more details, since I’m by no means an expert at either Tox or distro packaging (and in the process I not only learned new things but, more importantly, discovered that a couple of things I thought were true actually weren’t).
+1 from me; IMO it’s a good idea to have a user-focused recommendation somewhere most users, especially those looking for advice, are likely to see it, while also including general, non-prescriptive guidance in the sdist spec for tooling to keep in mind, and to reference when maintainers or others question whether these files belong in an sdist.
Are we therefore suggesting that build backends (such as setuptools and flit[1]) should change to include tests by default? Have we had any feedback from those projects that they support this? I’d be strongly against adding a recommendation that the main build backends are going to ignore - that damages our credibility for no benefit.
[1] It’s also worth noting that `flit build` creates a different sdist from the `flit_core` build backend, so both should be considered. The main flit command will probably include tests simply because they get checked into VCS, whereas `flit_core` builds a “minimal” sdist, which will not include `tests` (at least by default).
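As a concrete illustration (a hedged sketch, not a recommendation of specific paths): a `flit_core` project can opt tests and docs back into its sdist via the `[tool.flit.sdist]` table in `pyproject.toml`.

```toml
# pyproject.toml -- extra paths for flit's sdist builder
[tool.flit.sdist]
include = ["tests/", "docs/"]
exclude = [".github/"]
```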
It doesn’t directly imply that backends must do this, but it suggests that they should facilitate it. Most of them already do so by default (off the top of my head: setuptools w/ setuptools_scm, Poetry, Hatch/Hatchling, PBR, and Flit; not sure about PDM). The only backends I’m aware of that don’t currently do so are `flit_core` and plain setuptools without PBR, setuptools_scm, or other plugins.
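For reference, the way plain setuptools users do this today is via a `MANIFEST.in`; a minimal sketch (the directory names are just the common convention):

```
# MANIFEST.in -- extends what plain setuptools includes in the sdist
graft tests
graft docs
global-exclude *.py[cod]
```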
At least in terms of the latter, I seem to recall some talk, @abravalheri, related to your work about potentially making setuptools fully “zero-config” in the future, i.e. doing something similar, though I’m not sure if I’m remembering that right? And @takluyver, any perspective from the Flit side?
The effort for making setuptools usable without requiring configs[1] was mostly focused on the `packages`, `package_dir` and `py_modules` fields. The idea was to make it possible to use setuptools in a project that has a `pyproject.toml` file with the bare minimum fields required by PEP 621.
Personally, I would incentivise people to adopt the VCS as the single source of truth for which files to include in the sdist (via the relevant plugin, like `setuptools-scm`)[2].
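For concreteness, a minimal sketch of that setup (the presence of the `[tool.setuptools_scm]` table is what activates the plugin, after which every git-tracked file is included in the sdist):

```toml
# pyproject.toml
[build-system]
requires = ["setuptools>=61", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
# enabling the plugin makes the sdist contents follow `git ls-files`
```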
Remember that the vast majority of packages use setuptools without VCS inspection, and so a change to the default would be a functionality change. Would you need agreement (possibly in the form of a deprecation notice) from a sample of the general package developer base before making this change?
Also, for the user-focused documentation, can we please include some information as to why we recommend this? I’d rather not end up with a bunch of “because it says so in the guide” requests.
To be honest, the logic for wanting this seems to still be unclear (see the discussions above around whether redistributors running tests are going to find anything that wouldn’t have already been picked up by the developer’s testing). It would be good to make sure that we have a consensus here on why this is beneficial before we start making recommendations for the general public.
Personally, I consider all 3 steps to be linked - I don’t see the value of any of them individually; they work together or not at all.
Individual package authors may be willing to support people building their own processes for running the tests. In that case, they may find value in just having some of the steps but not all. That’s fine, but it seems to me that’s the package author’s choice.
Personally, I avoid like the plague any mechanism which ties what’s in the sdist to what’s in the VCS. I don’t want things like `.github` in the sdist, and I routinely check experimental/demonstration code into VCS (so it’s available in all of my development environments) that I wouldn’t want in anything that claims to be a “distribution” of the project.
Clearly, I’m an outlier here, though - so keep that in mind when considering how much weight to give to my preferences.
That is a very good point.
Yesterday when I was writing this, I thought about the same thing: there are files (especially those related to CI) that are very common in code repositories but that don’t make much sense in an sdist. At the end of the day it boils down to the user’s preference.
For example, some users (me included) may believe that the benefit of having a single source of truth for which files to ignore (e.g. transient files/artifacts) outweighs the side effect of having these “useless” files in the sdists (especially considering that they are not part of the final wheel). Other users will prefer fine-grained control over the contents of the sdist.
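Worth noting that the two preferences can be combined: with `setuptools-scm`, git supplies the base file list, and a `MANIFEST.in` can still subtract from it. A hedged sketch (the excluded paths are just examples):

```
# MANIFEST.in -- drop CI-only paths from the file list derived from git
prune .github
exclude .pre-commit-config.yaml
```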
I feel that we’re straying pretty close to a philosophical discussion of “what is the purpose of a sdist” at this point. That’s likely to be a large, and possibly controversial, discussion, as well as likely not resulting in much in the way of actionable decisions. So I’d rather not divert this thread any further for now.
So let’s just say “people have different opinions on what should be in a sdist” and leave it at that for now.
If and when we do have the “what is a sdist” discussion, I think we should start it from the POV of what invariants we’d like to see (such as “any sdist claiming to be foo 1.0, when used to build a wheel, should produce a wheel that’s functionally equivalent to any other foo 1.0 wheel”). Some of these are obvious (like that one), others are not (for example “if you build a sdist from a sdist, you should get back an equivalent sdist”).
Redistributors find stuff all the time that the developers didn’t pick up. Even things as simple as new breaking changes in dependencies may be picked up elsewhere.
I didn’t think I saw much controversy around this particular point - the main value that redistributors add is testing packages in the context of their distribution (whether that value is publicly available or private to an organisation), and many redistributors add further value by recompiling from source in a consistent build environment, rather than the mixed sources that you get from builds on PyPI.
For redistributors that start from sdists rather than git tags, having tests in the sdist seems like an easy ask. But it does depend on what we want sdists to be - if they are not supposed to be useful canonical sources for the package, then everything I just said is irrelevant because no redistributor will want to use them.
(Also, for those who aren’t familiar with my position on this, I consider practically all pip users to essentially be redistributors. Assembling an environment yourself is exactly the same job, and being able to validate that the wheels you just [let the tool] choose work together is as important for those users as it is for those who are going to redistribute the resulting environment.)
At least support it, yes. I don’t know if “facilitate” encompasses that or starts to get into “encourage” territory.
Just to be clear, the particular point I was harping on was not that downstream testing doesn’t add value or is unlikely to find anything the developer didn’t, but that downstream testing in an isolated environment identical to upstream’s (as with Tox by default, without the plugin @encukou mentioned or specific configuration to disable this) adds limited value for both the original project and redistributors, whereas testing in the distribution’s own environment most certainly does. @steve.dower put it well above.
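For reference, assuming the plugin in question is `tox-current-env` (the one used in Fedora), a sketch of what distro-environment testing looks like in practice:

```sh
# run the project's tox test commands against the already-provisioned
# distro/build environment instead of a fresh isolated virtualenv
python -m pip install tox tox-current-env
python -m tox -e py3 --current-env
```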
I wanted to start a similar discussion and found this topic. FWICS there hasn’t been any real progress made here; in particular, the recommendation doesn’t seem to have made it into the guide.
I am one of the more active developers responsible for the maintenance of Python packages in Gentoo, and I’d like to add a few data points. Downstream testing has always been at the heart of Gentoo, and since it is a source distribution, the vast majority of our users deal with sources and are therefore affected by how these sources are distributed. On top of that, there are some very strict restrictions: we need to be able to prefetch all the sources necessary for the build via HTTPS, and no Internet access is allowed during the build.
For a long time, we were replacing PyPI sdist archives with GitHub-generated snapshots as the sources of our packages, primarily because successive projects were not including tests, documentation sources or other files in their sdists.
However, I should note that this is not limited to files intentionally omitted; only yesterday I noticed that simplejson had stopped including the C sources in their sdist. Modern build systems seem to favour a certain kind of tricky minimalism that sometimes causes files to be unexpectedly omitted. When most testing pipelines operate either directly on the repository or use generated wheels, and so do users, it’s easy to miss problems with sdist archives.
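As an aside, this class of regression is fairly mechanical to detect; a rough sketch (not a polished tool) that compares what git tracks against what a freshly built sdist actually ships:

```python
# check_sdist.py -- list git-tracked files missing from a built sdist
import subprocess
import sys
import tarfile
from pathlib import PurePosixPath

sdist = sys.argv[1]  # e.g. dist/mypkg-1.0.tar.gz

# every file git considers part of the project
tracked = set(
    subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
)

# sdist members are prefixed with "<name>-<version>/"; strip that prefix
with tarfile.open(sdist) as tar:
    shipped = {
        str(PurePosixPath(m.name).relative_to(PurePosixPath(m.name).parts[0]))
        for m in tar.getmembers()
        if m.isfile() and "/" in m.name
    }

missing = sorted(
    f for f in tracked - shipped
    if not f.startswith(".git")  # .gitignore, .github/ etc. are expected omissions
)
print("\n".join(missing) or "sdist ships every tracked file")
```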
Using GitHub archives comes with a number of issues. Just to list a few I’ve seen over the years:
- There’s no real guarantee that the archives will be reproducible. Only recently, major mayhem was caused when GitHub upgraded git and all archives started being generated differently. They have reverted the change for now, but we know it is coming back, and it will hit us again in the future.
- GitHub archives are often missing generated artifacts that are non-trivial to regenerate. Just to recall some problems we had recently: packages that required Cython 3.0 (which is a problem when Cython is installed system-wide and we’re not ready to have our users upgrade to alpha versions), packages that used npm to generate some assets (true horror to get working offline).
- setuptools_scm use gets problematic. For a long time, we’ve been working around it by setting `SETUPTOOLS_SCM_PRETEND_VERSION` (see the sketch just below). Only recently someone noticed that this actually causes setuptools_scm to skip installing some package data files that would be installed otherwise (e.g. `py.typed` files in some packages). We still haven’t been able to estimate how widespread the problem is across our packages.
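For context, the workaround in question looks like this (the version is a placeholder supplied by the packager):

```sh
# building from an unpacked tarball with no .git metadata available:
# tell setuptools-scm which version to pretend it derived from the VCS
SETUPTOOLS_SCM_PRETEND_VERSION=1.2.3 python -m build
```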
All things considered, I think using GitHub archives to work around the problem of sdists missing files that we need is not a good long-term solution, and we really need to start urgently looking for a better one. Ideally, one that doesn’t require every downstream packager to repackage everything.
Personally, I’m not convinced that a recommendation would be sufficient to solve the problem but having an official recommendation to use as an argument would certainly help the case.
An extreme option would be to extend PyPI to allow two different kinds of source distributions — “minimal” distributions that focus on size and include only files that are needed by the build system, and “full” distributions that include all files in the repository (or at least a useful subset of them).
In any case, I think we need to do something, and we need to do it urgently because GitHub archives just blew up in our faces, and we can’t just keep pretending it won’t happen again.
As someone responsible for many hundreds of projects on PyPI which rely on a build step to generate important (sometimes even legally important) file content distributed in our signed sdists, content which can’t be reconstructed without the metadata encoded in Git commits and tags, I’ve had to fight for more than a decade to convince downstream distro package maintainers that just tarring up the Git worktree and assuming that’s an accurate representation of the upstream project’s intended distribution is an actively harmful workflow choice.
We see Git repositories as an implementation detail of our development process, not a software release distribution channel, and go to great pains (including testing on every commit) to ensure that when we build an sdist it doesn’t leave anything out which the Git repository contains, other than directly Git-specific files like .gitignore, for obvious reasons. I too feel strongly that incorporating tests and documentation in an sdist is appropriate; the name is a contraction of “source distribution”, after all. If developers want their users not to be required to download those extra files, that’s what wheels are for.
My position hasn’t changed from what I said above. People have different opinions and intentions for their sdists, and right now that’s valid. Some people view them as “a way to build a wheel on platforms where wheels aren’t available”. Some people view them as “a way to ship all of the artifacts I consider part of what’s needed to work on my project”. Some people view them as “a way to ship everything I expect a redistributor to need”.
Getting a policy on a single, unified view of what a sdist is will be a potentially controversial debate. Publishing and promoting the resulting definition, so that everyone publishing sdists knows what’s expected of them, will be an exercise in itself. Writing tools that validate whether a sdist conforms to that definition (checking that tests and docs are included, for example) is another aspect, as is changing the culture to make use of such tools commonplace.
But IMO, this discussion has run its course. We’ve established that people have different opinions, even though for some people a common approach is important. We need to look at the next step, which is probably for someone who cares sufficiently to draft a document that is intended to become a PEP (or some other form of document with the force of a standard) and manage the process of gaining consensus.
Personally, I try to package sdists with the things that might be necessary for the downstreams to do their thing (docs and tests in particular).
With that in mind, I’ve been thinking about how to prevent regressions and make sure this keeps working.
(shameless plug!)
So I came up with an approach of building the artifacts prior to running the tests. To facilitate this in GHA, I’ve made a simple action that is intended to replace the common Git checkout step in the test jobs in my projects, and would hopefully help others too one day. Here it is: checkout-python-sdist · Actions · GitHub Marketplace · GitHub.
I hope that folks like @mgorny would appreciate upstreams emulating some of the downstream processes with almost no effort needed.
Any and all help appreciated. In fact, I was thinking of creating something akin to autotools’ `make distcheck` but never had time for it: basically something that would spawn `python -m build`, unpack the artifact, and run the tests inside.
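In the meantime, the manual version of that idea is only a few lines of shell; a rough sketch, assuming a pytest-based suite that is actually shipped in the sdist:

```sh
# poor man's "distcheck": build the sdist, unpack it somewhere clean,
# and run the test suite from the unpacked tree rather than the git checkout
set -eux
python -m build --sdist --outdir dist/
tmp=$(mktemp -d)
tar -xf dist/*.tar.gz -C "$tmp"
cd "$tmp"/*/
python -m pip install .
python -m pytest
```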
Sorry to necro an old discussion, but I want to push back on this a bit and highlight what I think is the core of the persistent talking past each other that has been going on.
In my view the tag in git (or the same in whatever version control the project uses) is the “level 0” truth of what a version is, the sdist is the “level 1” truth of the release, and any binary artifacts (wheels, conda packages, deb, rpm, exe installers, …) are equivalent “level 2” truth of what the release is.
In this view, “a way to build a wheel on platforms where wheels aren’t available” and “a way to ship everything I expect a redistributor to need” are isomorphic to each other. It just happens that the “redistributor” who builds the wheel is frequently the same person/process who makes the sdist, but I do not see how that is a material detail.
I think the answer to OP’s question is unequivocally “Yes”!
The other point of contention, for me, was the implication that a naïve tarball of the Git worktree is just as good as the actual repository. So while I agree with you that the actual state of the Git data is at least as good as (and generally superior to) the sdist for most projects, the “release tarballs” you get from GitLab or GitHub are not necessarily as good as the actual repositories they were extracted from. In the case of the projects I work on, while our sdists may not be quite as comprehensive as their Git repositories, they’re still closer than a GitLab/GitHub “release tarball” is. Basically, an sdist can be very close to your “level 0”, while a tarball of the content from a Git repository can be quite a bit farther from it.