Providing a way to specify how to run tests (and docs?)

Yeah, exactly.

Once the tests are unpacked and the test deps are installed, from both my (limited) experience in Conda-Forge repackaging and IIRC from skimming the Debian and Fedora packaging guides, in practice this step mostly boils down to just pytest <path/to/tests> (analogous to the pre-PEP 517 de-facto install). This should likely suffice to run the great majority of test suites, given the outsize popularity of Pytest and the fact that it is nearly drop-in compatible as a runner for unittest and mostly so with nose.

True, some projects require specific CLI options, config and environment setup, but those are often (though not always) included in the project’s pytest.ini or equivalent config file (at least, they generally should be). So yes, there are projects that require something more than essentially just pytest . or pytest <path>, but once the deps are installed it’s not nearly as big a hurdle in practice as the above makes it seem.

@FFY00 you’re involved in distro packaging, right? Any input here, or other people you could refer here who would give helpful input?

Agreed. Maybe we can start with some common practices and/or heuristics. For example in my projects:

  1. I do this by adding a source-includes key in the [tool.pdm] section of my pyproject.toml file. I include my top-level docs/ and test/ directories, plus my top-level tox.ini file.
  2. I don’t use a test extra, but have a testing key in the [tool.pdm.dev-dependencies] section of my pyproject.toml file.
  3. $ tox - this is the one implicit part you’d have to heuristically detect.

So pretty much, if you see a tox.ini file, just run tox and that will do everything necessary. I guess nox would have a similar heuristic. I really like this set up.
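A tool implementing this heuristic might look something like the following sketch. The precedence order and the `guess_test_command` name are my own assumptions for illustration:

```python
# Minimal sketch of the "detect the runner from well-known files" heuristic
# described above. Precedence order is an assumption, not a standard.
import tempfile
from pathlib import Path

def guess_test_command(srcdir: Path) -> list[str]:
    if (srcdir / "tox.ini").is_file():
        return ["tox"]
    if (srcdir / "noxfile.py").is_file():
        return ["nox"]
    # Fall back to the near-universal default discussed earlier.
    return ["pytest", str(srcdir)]

# Demonstrate on a throwaway directory containing a tox.ini:
d = Path(tempfile.mkdtemp())
(d / "tox.ini").write_text("[tox]\n")
print(guess_test_command(d))
```

The obvious weakness, as noted below in the thread, is that file presence alone says nothing about which environment or options the runner actually needs.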

The only minor glitch is that both tox and pdm have to be installed for everything to work. Installing pdm isn’t a problem though because I include that in the deps key for each of my tox environments. The tricky bit here is that you can either have tox drive pdm, or pdm drive tox. I’ve experimented with both and prefer the former. So if you get one of my sdists, all you need to do is run tox and it should all Just work.

The big problem I see with this is, as I’ve explained a few times previously, this defeats the purpose of downstream testing (the primary motivating reason for shipping tests with sdists in the first place). Doing this simply unit-tests the code in isolation in a consistent, reproducible environment, which should be no different from the tests that ran and passed before the project was released upstream.

What downstreams require is a way to integration-test that a project, as built, packaged and installed in their environment, still functions as expected. For that, they need to invoke the test runner directly, not through an intermediary environment manager. Usually, this is still pretty simple, though not always, which is where a PEP 517-like hook would be helpful (but out of scope here).

Integration testing always seems to be underutilized, even in corporate environments. The best ecosystem-wide testing of Python libraries that I’m aware of are redistributors which regularly rebuild and run test suites of all applications and dependencies. Although I’m years removed from keeping up with things, I was most familiar with Debian and Ubuntu, and those “rebuild the world” events often caught bugs that weren’t otherwise caught. It’s a difficult problem and requires significant infrastructure to pull off.

Right, but what I’m saying here is that unless I’m gravely mistaken, generally speaking the reason downstreams want upstreams to bundle tests is so they can run them on the project *as packaged and installed in the downstream’s environment*, not the raw code in an isolated environment provided by the upstream. Otherwise, if all the conditions are the same, downstream testing offers little to no value over what is already done upstream, while missing most packaging and integration issues the downstream may actually be responsible for.

Not necessarily. For example, when we pull packages from PyPI into our internal mirror, we like to get the sdists and run the package’s tests mostly as a sanity check on that newer version of the package.

Here’s an example of an issue we typically get from downstream packagers in Jupyter, which details the process they’re using to create the rpm package.

Inlining here:

"So I'm using the typical PEP 517 based build, install and test cycle used when building packages from a non-root account:
- `python3 -sBm build -w --no-isolation`
  - because I'm calling build with `--no-isolation`, only locally installed modules are used during all processes
- install the .whl file in `</install/prefix>`
- run pytest with PYTHONPATH pointing to sitearch and sitelib inside `</install/prefix>`"
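The last step of the quoted workflow, pointing PYTHONPATH at the library directories under an install prefix, can be sketched with the stdlib `sysconfig` module. The prefix and function name here are placeholders, and this assumes the default install scheme lays out purelib/platlib under `base`/`platbase`:

```python
# Sketch: compute a PYTHONPATH for a package installed under an arbitrary
# prefix, mirroring the "sitelib"/"sitearch" step in the quoted rpm workflow.
import os
import sysconfig

def downstream_pythonpath(prefix: str) -> str:
    # Overriding base/platbase re-roots the scheme's purelib/platlib paths
    # under our install prefix.
    paths = sysconfig.get_paths(vars={"base": prefix, "platbase": prefix})
    # dict.fromkeys de-duplicates while preserving order
    # (purelib and platlib are often the same directory).
    return os.pathsep.join(dict.fromkeys((paths["purelib"], paths["platlib"])))

pp = downstream_pythonpath("/install/prefix")
print(pp)
```

The resulting string would be exported as `PYTHONPATH` before invoking pytest, so the tests import the installed package rather than the source tree.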

Sure, I guess, but wouldn’t this ultimately be an indication of an upstream issue they somehow didn’t catch running the tests themselves, or some niche corner case on the platform you’re running them on that wasn’t in their CI matrix, rather than anything on your end (barring some serious misconfiguration)? So perhaps this would have some limited value for a fraction of use cases, but as evidenced by @blink1073 's example, this is not what downstreams in general seem to do.

(Fedora packager here)
tox is very, very nice for packages that use it. It’s the closest we have to a standard way to run tests.
pytest path/to/tests usually works too, but no, the required options are usually not found in config files. You’re more likely to find them in GitHub Actions config, mixed with commands to set up a very specific environment.

An underappreciated problem we have is that people mix non-functional tests (like linting, code formatting, performance) with their unit tests. The tools involved tend to change their output often, so the checks tend to require very precise versions/environments, which might not be available. And they tend to report “issues” like too many blank lines that aren’t worth reporting/fixing.

But Tox can run tests in a pre-created environment, and it can print out the requirements needed to create that environment. See the tox-current-env plugin.
So we can actually test the installed package. (We currently don’t do this exactly in Fedora, but we’re close.)

For integration, running tests of dependent packages are also useful. For those, the dependencies are installed exactly as on user systems.


I have also faced that kind of report before.

In the end it boils down to the user experimenting with a different test methodology than the one officially supported by the package developers… While that is perfectly fine (and might even help to surface interesting/unintended behaviours), it is unfair to shift the onus of supporting such experiments onto the package developers (especially if the intended test workflow is available as code for the CI process and can be reproduced).

(There are problems other than the test methodology also: based on reports from the same user to other projects, it also seems to me that the workflow used to build packages can be incompatible with projects using setuptools-scm, e.g. the build is not triggered from the sdist or from a git checkout)


I just recalled that setuptools used to have a tests_require parameter, which was meant to contain a list of dependencies that would be automatically installed for the python setup.py test command. I had completely forgotten about this, and as far as I know it was actually enforced, whereas the test extra was not.
Combined with setuptools automatically adding test/test*.py files to the sdist, it made for what seems like a complete solution. On the other hand, I don’t think I ever managed to make it work well enough for my needs at the time. And of course most of this is deprecated now, but I thought it was interesting to note.

Yes, and that’s kind of the point :smiley: - if/when we find such issues, we’ll contribute back with bug reports and PRs, so I think it makes the packages better. It helps them and us.



I generally separate them into different tox test environments, so at least in theory they can be isolated. E.g. [testenv:qa] runs my flake8, isort, blue, and mypy checks. In practice you still run these if you just run tox so it’s probably not a super useful distinction for downstreams, even though it’s pretty handy for development.
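The split described here might look something like the following tox.ini sketch. The environment contents are illustrative guesses, not the actual config:

```ini
# Illustrative sketch only: separating functional tests from QA/lint checks
# into distinct tox environments. Tool lists and paths are examples.
[tox]
envlist = py311, qa

[testenv]
deps = pytest
commands = pytest test

[testenv:qa]
deps =
    flake8
    isort
    blue
    mypy
commands =
    flake8 src
    isort --check src
    blue --check src
    mypy src
```

A downstream could then run only `tox -e py311` and skip the `qa` environment entirely, sidestepping the version-sensitive linter output mentioned above.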

Very cool, and I can see how helpful that would be for a Linux distro. One of the things mentioned in that package’s readme:

Well, it turns out that tox became too popular and gained another purpose.

The Python ecosystem now has formal specifications for many pieces of package metadata like versions or dependencies. However, there is no standardization yet for declaring test dependencies or running tests. The most popular de-facto standard for that today is tox, and we expect a future standard to evolve from tox.ini. This plugin lets us use tox's dependency lists and testing commands for environments other than Python venvs.

I couldn’t agree more, and I’ve long dreamed that something like tox would be the specification and interface. Maybe we’re not that far off from that, although maybe with things like nox now we need some other standard that isn’t tool dependent.

I’m assuming you’re referring to things like people using Pytest plugins to run Mypy, Flake8, etc. along with their test suite? I’ve never understood why people do this, honestly, and none of the projects I’m involved with do; if you’re going to integrate your linter checks with a common framework, IMO it makes far more sense to do so with something like pre-commit that’s specifically designed for the task and much more useful, and easy to use for it, while avoiding the pitfalls of trying to shove a round peg into a square hole to make static checks act like dynamic tests.

Ah, well that obviates most of my concerns over it, thanks. Of course, the issue is that those solutions are specific to one environment manager and require extra steps and plugins. Do you think distros would be receptive to standard PEP 517-like hooks to build docs and invoke tests, as distutils used to offer, or is the current modern status quo not burdensome enough to justify this?

Presumably, the GH Actions workflow @blink1073 posted previously avoids this issue by unpacking the sdist and running the tests from there? Naturally, this would lend further credence to the points that were made on the “Include tests and docs in sdists” thread, particularly those in response to my initial statements asking why running them from Git tarballs was not sufficient, as they would presumably have the same problem, unlike true sdists.

Indeed. I was assuming that on the upstream end, the tests were being fully run on a proper matrix, and that the downstream end was using an isolated, reproducible tox environment, so in this ideal world these sorts of tests should never fail downstream but not upstream. But, as I sometimes forget, we are not living in an ideal world :smiley:

Hasn’t the whole odyssey with picking one specific tool (Setuptools) and defining its behaviors, quirks and baggage as the de facto standard for Python packaging that everything else had to emulate taught us that this might not be the best idea, especially long-term? :joy:

That works wonderfully in practice, too. A distro doesn’t typically need to run on multiple Python versions anyway, so we typically only use one environment.

It goes both ways – you can run functional tests from pre-commit, or linters from tox.
And why not? Using two separate tools that are each designed to install and run various other tools seems rather silly.

Fedora would. But it would be a big project – tox config is pretty complicated.

Well, at a minimum, the specification would need to specify the info tox needs, even if other tools do things differently. That’s why I think investing in tox is a good idea.

No testing matrix is complete :)

I think this isn’t actually such a big project - it’s basically what Brett has been trying to achieve with pyproject.toml entries for “command to run to do X”, and then a simple tool that knows how to read and trigger it. So the file would look like:

```
requires = ["tox"]
command = tox -e test ...
```

Or it might be:

```
command = $PYTHON -m unittest
```

But either way, the command is now discoverable directly from the sources in a programmatic way, rather than having to dig into the various CI build definition files to figure it out (at least, that’s how I’ve been doing it recently :wink: ).


It seems that approach either assumes the matrix only has one entry, or the tool handles creation of all the environments itself.

I guess that’s where the bikeshedding comes in, though I think there’s a solid case for the tool to handle creation (as it’s the part with the CLI/GUI, and may need to do installations). “Validate the project in the current configuration” is a fine goal for this.

The biggest problem would be whether a tool like Tox wants to be the runner or the runnee (probably both…), but certainly platforms like GitHub Actions would want to do the matrix themselves and only run the tests once.

Also, it’s just a default. No project is prevented from having additional options or requiring users to use different commands to run them, just as you can still use build backends directly even though PEP 517 exists. It just enables programmatic handling of the default command, which is enough for a lot of scenarios.

I think the catch is here. The environment+command to run in CI, the environment+command the developer runs on their machine, and the environment+command a redistributor wants to use will often be very different.
And once you start adding those additional options, you eventually end up with Tox.