Proposal for tests entry point in pyproject.toml


I would like to discuss a proposal for a test backend in pyproject.toml. What I wrote here is how I see it from a Nixpkgs distribution point of view.


This is a proposal for defining the entry point and the test dependencies in pyproject.toml.


In the past it was common for projects to use the same invocation, python setup.py test, to run their tests. This is, however, no longer the case, with many projects using different test runners or custom entry points.

For many end-users this is not relevant, because they only use packages. Distributions, on the other hand, typically also run (parts of) the test suites of packages to check that their build and packaging were correct.

A well-specified method for describing how to run the test suite would ease the process for distributions, and allow for further automation.


As an extension to build-system in the pyproject.toml file, I propose we also add a test-system section for running tests with the current interpreter. The front-end would call whatever is named in the test-backend field, in an environment that contains the package the pyproject.toml belongs to, along with additional test dependencies (the requires field).

An example:

[test-system]
requires = [ pytest mock ]  # PEP 508 specifications.
test-backend = "pytest"

The test-backend should only run tests in the environment (interpreter version, operating system) that is provided by the front-end.


No specification for a test-system

Not standardizing would make automation difficult and lead to further wild growth of entry points.

Use Tox

Some may argue that Tox already handles this, so why bother? Tox takes charge of the environment, and that is something distributions do not like, because they provide the environment both for testing and, ultimately, for the end-user. Using Tox for a single environment is not a problem, however.

Note this was brought up on the setuptools issue tracker:


Maintainer of tox here (note: it should always be written with a lowercase t, just like pytest). By default, tox takes full charge of environment management via virtualenv; however, note this doesn’t have to be the case. The tool is flexible enough to allow any environment management to be plugged in. For example, this could be a simple shim instructing the OS to prepare an environment with the following packages and then run this as a test. I’m currently in the process of reorganizing the internals to make this even easier. Once done, we can try it out in practice, and if it works well we can standardize it. Your current proposal misses tons of nuances needed to make it workable, though (like environment variables, working directory, test dependencies, test commands, parameterization of the commands, etc.).

Yes, I should clarify that in the post. I intend to update the post along the way.

Indeed. After I wrote it I realized, e.g., that the current working directory should be usable in the case of a script or a runtests test-backend. Thank you for your feedback.

Hi! Fedora packager here.
We had a chat about exactly this issue at EuroPython, and since then we’ve been working on a plugin that makes tox use the environment it is running under:
Please read the README there as my reply :‍)

I’m looking forward to standardizing something like [test-system], but it seems that doing it right now would be premature. PEP 517 [build-system] is still provisional, after all.

Could you move it to a GitHub Gist or something equivalent? That way it’ll be easier to keep context as we go, and we can also clearly see what changes were made (unlike in the Discourse UI).

That’s not how TOML works. :upside_down_face:

I think it’ll be useful to figure out “what’s needed” for a test runner and then trying to design an interface with that understanding – instead of the other way around.

I’m totally on board.
My use case is an internal CI/CD service.

pyproject.toml seems like the sensible approach to be able to handle multiple different build tools, but I’m yet to test that.

Additionally I’m running tests, building documentation, running mypy, checking code coverage, and ratcheting code coverage up. I can imagine i’ll be doing some timing stuff in the future as well.

I will look at configuring this via pyproject.toml and report back at some stage.

Maybe pretending PyPI is actually a CI/CD server for all Python packages is a long-term goal?


Some of us did chat a bit about this at the core dev sprint in London this year. I do think that testing in general may be a bit too complex to fully specify, considering the various ways test matrices need to be configured and the various escape hatches people tend to use. Part of the reason the tox.ini file is so complicated is that it is not super tightly scoped to testing, but it’s also not even as complicated as it could be if you included a lot of the tooling that ends up going into CI configuration. I think we will probably want to start working to define the various roles in a test system and specifying what their responsibilities should be, then build a simple API around that - the same way the build-system section has divided builds into “front-end” and “back-end”.

I think it makes some sense to think of things in similar terms - a “front-end” that constructs the system for you and provides test environments and a “back-end” that runs the tests in those environments. One rough sketch of how a PEP 517-like API for tests could look would be something like:

  1. Front end creates an isolated build environment with the requirements specified in test-system.requires.
  2. Front end passes some metadata about the system (platform, possibly type of test being run, etc) to test-system.backend and gets back a dictionary of isolated test environments that it should set up.
  3. Front end sets up each test environment and invokes test-system.backend in it with the specified key.
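To make that division of responsibilities concrete, here is a minimal sketch of the three steps. All hook names (get_test_environments, run_tests) are invented for illustration, and the environment-creation step is stubbed out:

```python
# Hypothetical sketch of a PEP 517-like test API. None of these hook
# names or keys are standardized; they just mirror the steps above.
import sys


def get_test_environments(metadata: dict) -> dict:
    """Back-end hook (step 2): describe the environments to set up."""
    return {
        "default": {
            "python_version": f"~={sys.version_info[0]}.{sys.version_info[1]}",
            "dependencies": ["pytest >= 5.0"],
        }
    }


def run_tests(env_key: str) -> int:
    """Back-end hook (step 3): run the suite in one environment."""
    print(f"running tests for environment {env_key!r}")
    return 0  # exit status, 0 == success


# Front-end driver: step 1 (building the isolated environments) is
# stubbed out; it would normally create a virtualenv per entry.
envs = get_test_environments({"platform": sys.platform})
statuses = {key: run_tests(key) for key in envs}
print(statuses)
```

The point of the split is that the front-end owns environment construction while the back-end only describes what it needs and runs the tests it is handed.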

In this conception the current role of tox would be split into two parts, since it is the virtualenv manager and the test runner in one. An example might be something like this:


pyproject.toml:

[test-system]
requires = ["tox >= 4.0"]
backend = "tox.test_meta"

tox.ini (could also go in pyproject.toml, but that’s irrelevant - it’s up to tox to determine where to store config)

[tox]
envlist = py27, py37

[testenv]
deps = pytest >= 5.0
commands = pytest

If you then invoke some fictional front-end nottox in this directory nottox would invoke tox.test_meta.get_environments(*args, **kwargs), and tox would return a dictionary containing something like:

{
    "py27": {
        "python_version": "~=2.7",
        "dependencies": [
            "pytest >= 5.0",
        ],
    },
    "py37": {
        "python_version": "~=3.7",
        "dependencies": [
            "pytest >= 5.0",
        ],
    },
}

At which point nottox would allocate test environments for py27 and py37 and pass them back to tox in some way.

Open questions

I have at least two open questions with this approach:

  1. How can nottox pass an environment to tox without requiring that tox be invoked from within that environment? I am envisioning something where the “environment” could be anything from a virtualenv to a chroot or a Docker container, but the back-end should only need to be installed in the test-system environment, not in each test environment.

  2. Whose responsibility should it be to decide which environments are run? Front-end? Back-end? Both? The front-end can filter environments by name (so you can do nottox -e py37, for example, and that’s the front-end’s responsibility), but we probably also want some way for the project to specify that certain environments are targeted for certain platforms or other conditions.

    For this one, I’m leaning towards the backend doing per-environment platform specification that is passed back to the front-end.


The front-end calls the back-end and asks “tell me what you can run”; the back-end returns a list with requirements; the front-end builds the environments it actually wants to run and then invokes only those, leaving the back-end to actually invoke the tester and return a status code.

Before coming up with API proposals, maybe the first step is to figure out goals and scope? The OP specifically doesn’t want to support tox-style testing…

The scope would be to make it possible for people who need to repackage something from source to validate the correctness of the operation through a standard interface. Think e.g. Fedora, RHEL, etc.