Providing a way to specify how to run tests (and docs?)

Conda has a nice declarative way to handle testing. We could encode essentially the same settings into pyproject.toml, with the limitation that $PYTHON is the only available executable other than console scripts that are installed as part of the test requirements.

Not necessarily. Assuming we can generalise those categories, we could just end up with a variable (e.g. $CONTEXT) that can be used as part of the command. So your command is now tox -e $CONTEXT which becomes tox -e test_ci or tox -e test_dist depending on what the tool chooses to pass in.
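To make the substitution idea concrete, here is a minimal sketch of how a tool might expand such a variable before running the command. The name `$CONTEXT` comes from the post above; the `expand_command` helper and the use of `string.Template` are my own assumptions, not part of any proposal.

```python
from string import Template


def expand_command(command: list[str], context: str) -> list[str]:
    """Substitute $CONTEXT in every argument of a task command."""
    # safe_substitute leaves unknown placeholders (e.g. $PYTHON) untouched.
    return [Template(arg).safe_substitute(CONTEXT=context) for arg in command]


command = ["tox", "-e", "$CONTEXT"]
print(expand_command(command, "test_ci"))    # ['tox', '-e', 'test_ci']
print(expand_command(command, "test_dist"))  # ['tox', '-e', 'test_dist']
```

The calling tool decides which context to pass in; the project only has to declare the command once.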

The main thing this command enables is the person trying to invoke tests doesn’t have to figure out whether the project uses (e.g.) tox or nox. It in no way replaces what those tools do, and doesn’t even prevent project developers from having to know about them and how to invoke them directly.

Since I’m doing this exact task today, here’s a great example of a repository that would benefit from having somewhere to define the test command and dependencies. It’s not difficult to figure it out, but also far from obvious.

That would be fine, wouldn’t it? The objective here is to have a simple mechanism to expose a way of running the tests, right? Or is the objective to take over the UI and work as a frontend for tools like tox/nox?

If we want flexibility without taking over from existing projects, an alternative could be inspired by PEP 517:

[tasks]
requires = ["tox"]
runner = "tox:taskrunner"

+

def get_supported_tasks_names(...) -> list[str]: ...
def run_task(name: str, params: list[str], ...): ...

+ some standardized task names such as test, docs, etc (all being optional) and other non-standardized task names
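As a rough sketch of how a frontend could consume such a `runner` reference: the hook names below are the ones proposed above, but the `load_runner` helper, the `DemoRunner` class and the `demo_runner` module are purely hypothetical stand-ins. As far as I know tox does not expose a `taskrunner` object today.

```python
import importlib
import sys
import types


def load_runner(spec: str):
    """Resolve a "module:object" reference such as "tox:taskrunner"."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr) if attr else module


class DemoRunner:
    """Hypothetical runner object exposing the two proposed hooks."""

    def get_supported_tasks_names(self) -> list[str]:
        return ["test", "docs"]

    def run_task(self, name: str, params: list[str]) -> int:
        if name not in self.get_supported_tasks_names():
            raise ValueError(f"unsupported task: {name}")
        print(f"running {name} with {params}")
        return 0  # exit status


# Register a stand-in module so the example runs without a real task runner.
fake = types.ModuleType("demo_runner")
fake.taskrunner = DemoRunner()
sys.modules["demo_runner"] = fake

runner = load_runner("demo_runner:taskrunner")
if "test" in runner.get_supported_tasks_names():
    exit_code = runner.run_task("test", ["-k", "smoke"])
```

The frontend only needs to install whatever `requires` lists and call the two hooks; everything else stays inside the runner.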

I guess the difficulty, then, is in generalizing the categories (task names), and making tox/nox (or at least their best practices) respect the chosen names.

Yep, that’s what I think we would end up working towards.

For me, it’s the “expose a way of running the tests”.

If you think about what it takes to run a test suite or build your docs, it’s basically:

  1. Get the files (from an sdist, VCS, etc.)
  2. Install the dependencies (either via an extra or hard-coded in nox or tox)
  3. Run the tests (which currently is hard-coded in e.g. nox, tox, or GitHub Actions)
  4. Repeat for the various environments available (which is hard-coded in e.g. nox, tox, or GitHub Actions)

What we are talking about is step 3 and a way to specify this in pyproject.toml so what command you need to run for your tests isn’t buried/hard-coded in a task runner like it is now.

BTW, step 4 could be automated thanks to requires-python in pyproject.toml and appropriate code to discover Python installs (which is doable as a library).
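A minimal sketch of that filtering step, assuming the list of installed interpreters has already been discovered somehow. It only understands simple comparison clauses (`>=`, `<`, `==`, and so on) on dotted numeric versions, which is a small subset of PEP 440; a real tool would more likely rely on `packaging.specifiers.SpecifierSet`. The `satisfies` helper is a made-up name.

```python
import operator

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def _parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple[int, ...], b: tuple[int, ...]):
    """Pad with zeros so that 3.9 and 3.9.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, requires_python: str) -> bool:
    """Check a version against a comma-separated list of simple clauses."""
    candidate = _parse(version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        if not clause:
            continue
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = _parse(clause[len(symbol):].strip())
                if not _OPS[symbol](*_pad(candidate, bound)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


installed = ["3.7", "3.8", "3.9", "3.10", "3.11"]  # assumed discovery result
targets = [v for v in installed if satisfies(v, ">=3.8,<3.11")]
print(targets)  # ['3.8', '3.9', '3.10']
```

Each entry in `targets` would then get its own environment and one run of the declared test command.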

Maybe not. Depending on how we specify things, this could all be reasonably self-discoverable. Let’s say we go with the simple solution of being able to specify in pyproject.toml some [tasks] section which lets you specify a command with any name:

[tasks]
test = ["$PYTHON", "-m", "pytest"]  # or ["$PYTHON", "-m", "nox", "-s", "test"]

We could then say that if there’s a matching extra in pyproject.toml for test, the expectation is that the extra is to be installed before running the task. And with requires-python you can even know which environments are expected to be tested against.

As for names, does anyone use something other than test or tests? I don’t think it’s unreasonable for a tool to assume that people want to run those tests against all the versions of Python available that fit the requires-python constraint.

For nox this could all be done today since it’s just code. I was actually planning on creating a library someday that effectively did all of this as an experiment and then try to get a GitHub Actions workflow to pick up on what’s in pyproject.toml as much as possible. Basically I want to take DRY as far as possible when it comes to setting up testing both locally and in CI (the test matrix is probably the biggest hassle right now).

I believe @bernatgabor said Tox 4 is going to support a nox-style code declaration approach, so that might also work in this scenario in terms of bringing this in w/o massive work on the side of task runners.

That’s the key for me. This pyproject.toml entry doesn’t have to do much, it just has to bootstrap the process of running tests. Everything else, including test environment setup, test matrix, etc. can be handled by the test runner that this entry bootstraps into.

The approach of matching the task name with “extras” seems to work fine for the pytest use case, but how do we reconcile this approach with some existing patterns in the community for tox/nox?

For example, using these articles/projects as reference:

We can see that there is a popular pattern of using extras to centralize lists of dependencies but offloading the responsibility of installing them to the task runner, while the task runner itself is not listed as a dependency…

We can imagine something like:

# pyproject.toml
[project.optional-dependencies]
test = ["tox"]
_test = ["pytest", "pytest-cov", "pytest-xdist", ...]

[tasks]
test = ["$PYTHON", "-m", "tox", "-e", "py"]

# tox.ini
[testenv]
extras = _test
...

But it does look a bit like a workaround for a limitation in the interface.

Well, sorta.
There are functional tests, code formatting, coverage, static typing checks, performance checks, and who knows what else you can call tests, and it would be nice if they were kept separate.
For me the relevant axis is whether a failure means a bug or not (e.g. machine too slow, or slightly different version of mypy/black).

If we agree test/tests is only for the “warranty void if broken” stuff, that would be great. But it’s not an intuitive meaning for everyone.

I use “lint” frequently for running linters (not technically tests, I know). I don’t actually use “test”, as the base tox environment is called “py”, so I use that to run tests - tox -e py.

As I’ve already said, I don’t use extras, but rather requirements files to specify test dependencies (or I include them directly in the tox config).

From the Nixpkgs point of view we really would like to have a test entry point. However, we do generally tend to disable “tests” that are not relevant for us, such as checking whether black formatting passes and coverage. Therefore, I think we’d prefer an entry point that only runs actual tests.

But the question is whether the task runner is required in this potential future to run the tests, or just convenient? Using the RPM example, would Fedora need to use tox/nox as provided by a project to verify something worked, or could their own tooling run the test suite based on the information provided and skip using tox/nox? And if tox/nox is required to run the tests then I would argue a test extra leaving those out is an incomplete extra.

But do you need to be able to lint the code of an sdist? I get wanting all of that while developing, but do you need all of that to validate a build?

I totally expect folks will have lint rules, etc., but I just don’t think in this discussion they are critical to try and potentially standardize if we are focusing on automatically running tests from an sdist. But if we do this right then I assume the community will figure out its own best practices around this and the naming scheme will just fall out of it for e.g. linting when working from source.

Are you using setuptools for the code? If not, then in a world of pyproject.toml is keeping dev-only requirements separate in a requirements file still what we think folks should do? For me, I always viewed the setup.py/requirements.txt dichotomy for library/app development as mostly a way to avoid conflating setuptools with your development workflow (or any flow where you didn’t really need setuptools as a build tool). But now that we have pyproject.toml, you technically don’t need to use your build tool to read your (optional) dependencies to install stuff. For me, that suggests our historically strict separation of dev versus install dependencies into separate files isn’t quite so important.

:thinking: It might make sense to place the task runner in the “convenient” category…
However, if the user decides to use a task runner, it also makes sense to think about it as the only “official”/“supported” way of running the tests.

The redistributors are free/welcome to try running the tests in a different manner, but they cannot expect the authors to support this alternative method…

I can see a lot of sense in this reasoning, but maybe the mechanics of using the “extra” field is not the most appropriate way to express this kind of dependency? When you use tox/nox, you don’t have it installed in the virtual environment accessible to the test code, right? The test code doesn’t need the task runner to execute properly…

If we go for that, the example in “Providing a way to specify how to run tests (and docs?)”, post #17 by steve.dower, seems to make more sense…

I don’t know if I understand this comment…
How is extras_require in setup.cfg/setup.py different from optional-dependencies in pyproject.toml, and why would using setuptools or another backend make a difference in the decision to opt into requirements.txt for the development workflow?

I think that depends on the project.

Actually I do, that way I can control for the version of the task runner. Plus it means I don’t have to assume a global install is available.

One requires setuptools to read the extras, the other doesn’t.

The question is why is someone like Paul using a requirements file for dev dependencies? Is it habit, is it because we have historically said “libraries use setup.py, apps use requirements.txt” and people view dev dependencies more like an app thing, etc.? Would pip-tools still read from a requirements.in file if pyproject.toml existed first? I’m just suggesting we may want to start thinking about where we are storing things now that we have a tool-agnostic way to specify stuff (hence this whole discussion about having a way to say how to run tests in a tool-agnostic fashion).

Yes, but for reasons mostly unrelated to whether I want to put metadata into pyproject.toml.

I don’t know about “we”, but I certainly think so. For me, pyproject.toml is for defining the project and how to build it. It’s not a general place to put all config related to anything to do with development. I know many people (and tools) disagree with me, but that’s my view.

I don’t really know what you mean by this. For me, setup.py or pyproject.toml is for defining runtime dependencies as these are part of how the project is used. That’s entirely different from requirements.txt, which is simply a way of listing a bunch of “stuff to be installed” for a given task. That task might be “run the tests”, or “build the documentation”, or something unusual like “build the demo”, or “do a release”. But what tasks make sense is entirely down to the individual project, and its workflow. We can suggest best practices, or even write standards to make certain approaches preferred, but it’s never going to be universal in the same way that building a project is (not unless we move towards some sort of “one tool to rule them all” approach like some other languages, but this has never been the Python philosophy in the past, and I don’t see why it should be in the future).

That’s exactly the point I’m trying to make. If someone comes to me and says “I tried to run the tests for your project and they failed” then I’d be unwilling to treat that as a bug report unless they followed the documented process for running the tests - just like I wouldn’t accept a bug report when running on an unsupported platform. The user can do the investigation and demonstrate that there is a bug, by providing a reproducer on a supported configuration, but it’s on them to do that work, not on me as project maintainer.

On the other hand, I have global installs of tox and nox using pipx, which means I don’t have to install anything in order to run the tasks for a particular project - so I don’t need a virtual environment for the project for anything other than interactive experimentation. It’s all about trade-offs, and that’s something each project (or developer) should be able to choose for themselves.

Because it’s not project metadata, and requirements files exactly capture what I want to express here, which is “a list of requirements for pip to install”. I’ll have as many or as few requirements files as I need for the project. Or I might not use a requirements file at all if it’s not needed - tox allows me to put the test requirements in tox.ini, so that’s what I do, because that way the requirements are with all the other information on how to run the tests, and I only have one file to manage.

I really don’t see why it’s so hard to understand that this might be a choice that some developers might want to make. It may not match what you prefer to do, but why should that matter? And before you ask, I understand that it means I can’t (for example) just push a button in VS Code to set up a test environment and run tests[1] - but I don’t care about that. Maybe I will in future, and I’ll change my mind (and modify my workflow) but for now I like running tox manually from the command line.

Why do we need consensus here at all? Why do we all have to use the same development workflow?


  1. Although maybe it could, by just letting me specify “what’s the command to run the tests” :slightly_smiling_face:

Maybe there was a misunderstanding about the expression “virtual environment accessible to the test code”.

Let’s say you install tox/nox in a virtual environment, e.g. /tmp/.venv.

When you run the tests, the code will not have access to any package installed inside /tmp/.venv.

The test code will have access to the packages installed in /<project-root>/.tox/<testenv-name>. You will not be able to import tox when writing a test case…

The direct test dependencies and the task runner in this case are two distinct types of dependencies, installed in two completely different virtual environments.


We can make an analogy with the build process:

A backend can be seen as a dependency of a project. The project will need the backend installed somewhere to be built. However, during runtime, the backend is irrelevant.

It is the same for the task runner and the tests. You need the task runner to orchestrate the test, but you don’t need it when the test code is running.


Since setuptools introduced setup.cfg people didn’t need anything fancier than ConfigParser to be able to read extras_require, but some people still decided to use requirements.txt. I have been using extras_require in setup.cfg to store test dependencies because I like the convenience of centralising them in a single file. But I agree with Paul that these dependencies are not really metadata, they don’t allow my packages to expose extra features to my end users[1], so it is perfectly natural that some developers choose to not mix those two things.


  1. Some people might argue that “being able to run the tests” is an extra feature for the end user, but that is debatable, especially considering that it is very common for a wheel to not include tests.

Just to be clear, is this mainly for making repackaging easier for Debian, Conda, Nix, etc.?

Another use case is if you want to run the tests of known downstream libraries as part of your CI to make sure you aren’t accidentally breaking compatibility.

This is a good point, and IMO it actually speaks to the same underlying distinction that @brettcannon alludes to above:

There are really two separate layers (i.e. concerns) here: the testing tool (and its invocation), which collects, runs and reports the results of the project’s test suite, and the task runner, which is responsible for setting up environments and executing arbitrary project-defined tasks within them (tests, docs, linting, etc.).

Simply providing a standard means of declaring the dependencies of, and invocation for, the project’s testing tool (and possibly also the docs builder), which downstream tooling would be responsible for installing and calling, would seem adequate to meet most of the immediate need here while being relatively straightforward to implement. This could either follow the form of @steve.dower's suggestion or pare it down further, defining just the invocation and using standardized extras names to single-source the dependencies for each.

This more or less follows the model of PEP 517 and PEP 518 in defining the build entrypoint and the dependencies it needs (though in this case, it is a single hook rather than several), and makes setting up and invoking the callee in a Python environment with the indicated dependencies the responsibility of the caller, with the callee being responsible for the rest.

However, without clearly defined requirements and guidance, I could foresee two distinctly different usage patterns which could significantly complicate things for callers. The most obvious approach would be for projects to specify their test tool and test-specific deps as test requirements, and the test command as the invocation. Meanwhile, I could see others specifying simply their task runner as a dependency and its main/test entrypoint as the invocation, and deferring the actual test environment creation and setup to it, outside the direct control of the caller.

Both provide value, but they solve somewhat different problems and target different layers in the stack, and I worry that unless we either define both, or are very clear about which one is expected, the result for callers will be worse than either, since the caller will have no reliable way of anticipating which they’re calling, let alone requesting one or the other. I would advise either providing both, or making it very clear it is intended for the former, since that seems to be what downstreams want here. But I’d like to hear more from repackagers about that.

There’s another alternative: allow defining arbitrary tasks in a standardized format in pyproject.toml, with a handful of standardized names (test, docs, lint, etc.), each with their own dependencies, invocation and perhaps other configuration. This would effectively be a generalization of the task runner configuration format, as PEP 621 was for metadata (which would be nice in theory, but perhaps too limited in practice given the diversity of tools and approaches).
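To give a feel for the shape of such a format, here is a rough sketch of the data model a consumer might build from it. Everything here is illustrative: the `requires` and `command` keys, the `Task` class and the `parse_tasks` helper are assumptions of mine, and the set of standardized names is just the one floated in this thread.

```python
from dataclasses import dataclass, field

STANDARD_NAMES = {"test", "docs", "lint"}  # names floated in this thread


@dataclass
class Task:
    name: str
    command: list[str]
    requires: list[str] = field(default_factory=list)

    @property
    def standardized(self) -> bool:
        """True if callers could rely on this name across projects."""
        return self.name in STANDARD_NAMES


def parse_tasks(table: dict) -> dict[str, Task]:
    """Build Task objects from an already-parsed (hypothetical) tasks table."""
    tasks = {}
    for name, entry in table.items():
        tasks[name] = Task(
            name=name,
            command=list(entry["command"]),
            requires=list(entry.get("requires", [])),
        )
    return tasks


table = {
    "test": {"requires": ["pytest"], "command": ["$PYTHON", "-m", "pytest"]},
    "demo": {"command": ["$PYTHON", "demo.py"]},  # project-specific task
}
tasks = parse_tasks(table)
print(sorted(name for name, task in tasks.items() if task.standardized))
```

A repackager would only ever look up the standardized names, while project developers could still use the same table for anything else.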

There appears to be some interest in standardizing this sort of thing, here and certainly elsewhere, as a modern, standardized replacement for the various built-in and custom distutils/setuptools commands, but the scope is much more expansive and I worry it would get bogged down in complexity. Still, it might be something to at least keep in mind in terms of leaving the door open to a future proposal when designing this one.

This is a good analogy, though arguably, a task runner is in some ways more analogous to a frontend, as it orchestrates the environment creation for the backend, the project’s actual testing system; and the two are only loosely coupled—the tests, like a build system, can be invoked with a modified frontend configuration, a different frontend or even directly, depending on the needs of the caller.

For example, upstream projects may use tox, nox, etc. during development, whereas repackagers may use tox with a custom plugin to skip dep management, their own env setup tooling or simply invoke the test tool directly, just like upstream developers typically use pip for installation but downstream distros really need something simpler and more customizable like installer.

Yes, but I think @brettcannon's point is that it is an unstandardized, bespoke format tied to one tool, which downstream tools cannot rely on being present or canonical, unlike pyproject.toml.

Just like any other packaging standard, nothing that is decided, specified and implemented as a result of this discussion would require you as a package author to adopt a certain method of specifying your test, etc. dependencies or invocation. Likewise, no one is required to fill out the various core metadata fields, specify declarative metadata in the PEP 621 [project] table, declare their build backend and dependencies in a PEP 518 build-system table, etc—or even use a task runner framework, write and run a test suite, or document how to build, install and run the code.

However, like providing and encouraging a standard, interoperable, tool-independent way of specifying test, docs, etc. invocation and dependencies, these things all help other people, tools and ecosystems use, distribute and contribute back to a given project, ultimately benefiting everybody. Of course, there is a cost-benefit trade-off, as it requires some amount of effort on the part of the package author, but I don’t see how at least offering and encouraging a standardized mechanism for this is such a bad thing.