Pre-PEP: Standardizing test dependency and command specification

Hello Pythonistas. I’d like to start a discussion on an idea that has been brewing in my brain for a while now. Let’s call it a pre-PEP, but nothing here is set in stone. I merely wish to present an idea and see if it sticks. If it does, I am willing to work on it and turn it into a real PEP.

Abstract

This (not yet a) PEP specifies a mechanism for storing test requirements and commands in pyproject.toml files such that test runners can retrieve them and execute tests. It could look like this:

[tests]
extras = ["extra1", "extra2"]
dependency_groups = ["group1", "group2"]
dependencies = ["pytest>5", "tomli;python_version<'3.11'"]
environment = { HOSTNAME = "localhost" }
commands = [["pytest", "-v"]]  # or perhaps ["pytest -v"]

Motivation

In Fedora, we like to utilize Python packaging standards when we build Python packages.

We can get information about the build and runtime dependencies, and about license files and license expressions. One thing that remains “muddy” is testing. When we build Python packages as RPMs, we ask our packagers to:

  • figure out how upstream specifies test dependencies
  • figure out how upstream runs the tests

We have several tools available in our toolbox, so e.g. if upstream uses tox, the packagers can opt in to use tox to figure this out. If upstream uses extras, dependency groups, or requirements.txt files for test dependencies, the packagers need to figure that out and reuse it. I would love for our packagers to have a simple way to get test dependencies and test commands. That way, each of Fedora’s Python RPM packages would not need to be different in this regard.

Beyond Fedora or other distributions, we believe that using this standard would allow casual contributors to easily discover how tests are executed. E.g. a developer could clone a repository and run a well-known tool (e.g. tox or a potential simpler test runner) without examining the project structure and/or CI configuration. It would also allow project maintainers to have a canonical source of test dependencies and commands that could be shared between their local development tools and CI.

Perhaps this could be seen as a replacement for test_suite and tests_require, which have been left out in the PEP 517 transition.

Rationale

When we brought this up in the past in various discussions, we’ve been told: “use tox” – we have tried, but unfortunately not all upstreams use tox, and asking them to use one specific tool to run their tests might not be as well received as asking them to follow a standard.

This is why I’d like to propose a standard that existing test runners could follow. For example, tox or cibuildwheel could gain support for this in addition to the existing support for their native configuration:

Example of relevant existing tox configuration

[tool.tox.env_run_base]
extras = ["extra1", "extra2"]
dependency_groups = ["group1", "group2"]
deps = ["pytest>5", "tomli;python_version<'3.11'"]
set_env = {HOSTNAME = "localhost"}
commands = [["pytest", "-v"]]

Example of relevant existing cibuildwheel configuration

[tool.cibuildwheel]
test-extras = ["extra1", "extra2"]
test-groups = ["group1", "group2"]
test-requires = ["pytest>5", "tomli;python_version<'3.11'"]
environment = {HOSTNAME = "localhost"}
test-command = "pytest -v"

Specification

This pre-PEP defines a new section (table) in pyproject.toml files named tests. The tests table contains keys specified here and MUST contain at least the commands key. It is heavily inspired by the tool.tox.env_run_base table from tox.

tests table keys

commands key (mandatory)

The commands key contains a list of commands. A test runner will execute them one by one, in sequence, until one of them fails (its exit code is non-zero) or all of them succeed. The outcome of the tests is considered successful only if all commands succeeded.

NOTE: A command could be specified either as a list of strings to be passed to subprocess.run() (as tox does it) or as a single string to be passed to subprocess.run(..., shell=True) or shlex.split() (e.g. as cibuildwheel’s test-command). I see benefits in both approaches; perhaps we can support both?
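
The sequential semantics described above can be sketched as a simple runner loop. This is purely illustrative (not part of the proposal) and assumes the list-of-lists command form:

```python
import subprocess
import sys

def run_commands(commands, env=None):
    """Run each command in order; stop at the first failure.

    Returns 0 if all commands succeeded, otherwise the failing
    command's exit code (mirroring the runner requirements in the
    specification: fail fast, overall success only if all succeed).
    """
    for command in commands:
        result = subprocess.run(command, env=env)
        if result.returncode != 0:
            return result.returncode
    return 0

# Example: both commands succeed, so the overall outcome is success.
exit_code = run_commands([
    [sys.executable, "-c", "print('first')"],
    [sys.executable, "-c", "print('second')"],
])
print(exit_code)  # 0
```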

extras key (optional)

A list of names of “extras” from the package to be installed. For example, extras = ["testing"] is equivalent to pip install .[testing]. This key is only allowed when the pyproject.toml also specifies a Python package, which is determined by the presence of either the project or build-system table in the same pyproject.toml. A test runner will ensure the given extras are installed in the test environment before the commands are executed.

dependency_groups key (optional)

A list of names of dependency groups (as defined by PEP 735). A test runner will ensure the given dependency groups are installed in the test environment before the commands are executed.

dependencies key (optional)

A list of Python dependencies. Each value must be one of:

  • a dependency specifier as defined by PEP 508 (e.g. pytest>5),
  • a requirements file when the value starts with -r (followed by a file path),
  • a constraints file when the value starts with -c (followed by a file path).

A test runner will ensure the given dependencies are installed in the test environment before the commands are executed.
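
Translating these entries into installer arguments is mechanical; a hypothetical sketch (the helper name is made up, and it assumes a space separates the -r/-c flag from the file path, as in the examples in this thread):

```python
def dependency_args(dependencies):
    """Map tests.dependencies entries to pip-style install arguments.

    A plain entry is a dependency specifier; entries starting with
    "-r"/"-c" reference requirements/constraints files.
    """
    args = []
    for dep in dependencies:
        if dep.startswith(("-r", "-c")):
            # Split "-r path" into the flag and the file path.
            flag, _, path = dep.partition(" ")
            args += [flag, path.strip()]
        else:
            args.append(dep)
    return args

print(dependency_args(["pytest>5", "-r tests/requirements.txt"]))
# ['pytest>5', '-r', 'tests/requirements.txt']
```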

environment key (optional)

A dictionary of environment variables to be set by a test runner when executing the commands.

Specifications for the test runners

The kind of environment used to execute the tests.commands is up to the test runner. It MAY be a fresh or re-used virtual environment, a container, a current Python environment where the test runner is installed, etc.

A test runner MUST ensure that the defined Python dependencies (via tests.extras, tests.dependency_groups, and tests.dependencies) are installed before it executes tests.commands.
If the pyproject.toml file also has a project or build-system table, the test runner MUST also ensure the very same package is installed (either via a wheel or editable install); if it doesn’t and the tests.extras key is present, the test runner MUST error.
If the defined Python dependencies and/or the tested package cannot be installed, the test runner MUST error.

A test runner MUST set all environment variables from the tests.environment dictionary. The test runner MAY preserve or clean the existing environment as it deems appropriate (e.g. it MAY allow users to configure this behavior).

A test runner MUST execute tests.commands one by one in a sequential fashion until one of them fails (its exit code is non-zero; in that case the test runner MUST also fail with a non-zero exit code) or all of them succeed (in that case the test runner MUST succeed with a zero exit code).

A test runner MUST ensure that executing python or python3 in tests.commands works and executes the same Python for which it installed the dependencies. It MUST also ensure that scripts from the specified test dependencies have preference on PATH.

A test runner MUST execute tests.commands from the Project Root Directory of an unpacked Project Source Tree, that is, the directory containing the pyproject.toml with the executed tests table.

A test runner MAY support running tests specified in a Distribution Archive, but in that case, it MUST extract the archive before executing tests.commands.

Recommendation for projects using this

The definition of “tests” is intentionally left out of this PEP. However, it is assumed by the PEP authors that the specified tests.commands will run unit tests (or similar) of the project. As such, the recommendations are:

  • Do not include tests that require complex setup.
  • Do not include code linters or type checking tests or test coverage – include tests that ensure the software functions.
  • Do not include tests.commands that will have unexpected side effects.
  • Do not include tests.commands that would (un)install Python packages.
  • Do not include tests.commands that are platform specific.
  • Do not run test runners (e.g. tox) from tests.commands.
  • Do not assume a specific test runner is used.

Package Building

Build backends MUST NOT include the data from the tests table in built distributions as package metadata. This means that PKG-INFO in sdists and METADATA in wheels must not include any fields containing the test dependencies or commands.

Out-of-scope ideas

Multiple test envs and different Python versions etc. – I don’t want to standardize that. Tools like tox can still use the information from the tests table for the “default” testenv.

What needs to be solved

Passing positional arguments – I find the way tox does it in the TOML configuration a bit clumsy. But if we go with string commands, we can have {posargs}:

[tests]
commands = ["python -m unittest {posargs}"]
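
With string commands, a runner could substitute {posargs} before splitting the command. A sketch assuming tox-like syntax, including the {posargs:default} fallback form (the function name is made up):

```python
import re
import shlex

def expand_posargs(command, posargs):
    """Replace {posargs} or {posargs:default} in a string command.

    If posargs is empty, the default (if any) is used instead.
    """
    def repl(match):
        if posargs:
            return " ".join(shlex.quote(a) for a in posargs)
        return match.group(1) or ""  # fall back to the default, if given

    expanded = re.sub(r"\{posargs(?::([^}]*))?\}", repl, command)
    return shlex.split(expanded)

print(expand_posargs("python -m unittest {posargs}", ["-v"]))
# ['python', '-m', 'unittest', '-v']
```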

Default dependency group – if a dependency group called tests exists, it might be used as a default. That would make the simplest use case simpler, but it might be too magical:

[dependency-groups]
tests = ["pytest>=8"]

[tests]
commands = ["pytest"]

Unspecified keys in the tests table – Should we explicitly forbid those or allow the test runners to read them? E.g. tox could read other keys and use them, but there is a risk if a future standard adds them with different meaning:

[tests]
pass_env = ["FOO"]  # only used by tox, but not specified in the standard
...

…and if the PEP forbids this, would it require test runners to ignore additional keys, or to error if such keys are found?

Only allow dependency groups? – Perhaps this new standard does not need to cover all possibilities of specifying dependencies. But if it does, the simple use case is simple:

[tests]
dependencies = ["pytest>=8"]
commands = ["pytest"]

…hence, not sure.

Platform specifics – How do we deal with path separators, etc.?


Something similar to this was discussed quite recently: Idea: Introduce/standardize project development scripts.

That discussion is very relevant to this proposal–it’s hard to define the limits of this in a way that is both useful and general. It sounds like this is maybe a subset of that more general proposal, but that doesn’t necessarily make it simpler.


I plan to check that out as well. Perhaps it overlaps a bit. My understanding from some of the examples in that discussion was that people would use that to run tox while I want to standardize what tox should run. Does that make sense?


This might be interesting for other distributors, such as Linux distros (@mgorny, @mcepl, @stefanor) or conda folks (not sure who to tag, perhaps @h-vetinari).


Why not? Wouldn’t it be a simple case of making it [[tests]] rather than [tests]?

It’s not something I personally have motivation for. I desire a single point of entry that can be used to “run tests”.

This particular recommendation seems problematic. Given that most projects use a test runner, I’d confidently expect many projects to simply do something like

[tests]
dependencies = ["nox"]
commands = [["nox", "-s", "tests"]]

I know I would.

Expecting projects to migrate their existing testing mechanisms to this new approach (which will clearly be less featureful than any actual test runner) seems like it would be a bit much.

Because nox manages test environments and installs stuff into them. As such, we would be unable to use this e.g. in an offline environment like we do in Fedora (where the “fetch dependencies” and “run tests” are separate steps).

EDIT: Your example is an equivalent of using e.g. tox to run nox.


Sure but it’s very easy to make it an array of tables rather than a single table, and you’re not personally obliged to use more than one.

If it’s an array of tables, how do we expect to know which one to run?

About nox: nox is specific in that it has imperative rather than declarative configuration. However, if nox were to support this standard anyway, it would allow nox users to run tests specified in this standard way.

I.e. you could just clone a repo and run nox. And even if the repo you cloned does not use nox, your local nox would still be able to run some tests, assuming they are specified this way. Does that make sense?

In the ideal (albeit unlikely) scenario where everybody follows this standard, you could execute tests of an arbitrary project in a way that you prefer (without examining the project first to figure out what is their preferred way of running tests), in the Python environment you care about.

Well I suppose communication. I can think of many options, maybe Fedora would have a specific policy. You could assume that the first test in the array is the default, or you could make it clear you’ll run only a test named “default”, or you could request that upstream provide a test called “fedora”, or you could ask upstream which to run (which is still easier than the status quo right?) or maybe you’d figure that if upstream runs multiple it must be for a good reason and just run them all, I don’t know.

You just made it sound like multiple test environments is something that some tools currently support and some packages use and some users want, and then it seemed like a natural thing to include, that’s all.

Pip’s tests have a fairly complex noxfile to ensure that (for example) we don’t use the possibly-broken development version of pip to install dependencies. I wouldn’t want to trust that this mechanism would handle that sort of situation.

But :person_shrugging: - the fact that some projects wouldn’t use this mechanism isn’t in itself a reason that the feature shouldn’t exist.

I’m personally largely indifferent to the feature - I like nox, so I’d likely just ignore this new approach.


Perhaps I could show this as an example.

Imagine an upstream project that uses tox. They could have this configuration:

...

[tests]
dependency_groups = ["tests"]
commands = [["pytest", "-v"]]

[tool.tox]
requires = ["tox>=4.21"]
env_list = ["3.13", "3.12", "type"]

[tool.tox.env.type]
dependency_groups = ["typing"]
commands = [["mypy", "."]]

They use tox and can benefit from its multiple environments.

Somebody else enters the project and runs nox --python 3.14. Even though the project authors do not use nox, nox would still be able to know it needs to install a dependency group and run pytest in a Python 3.14 venv.

Or perhaps it could be a simple pyproject-run-tests command rather than nox. It wouldn’t matter.

I do not wish to design a standard for an interpolable multi-env multi-purpose runner. I simply wish for a standard that communicates: install this and run this to run the tests. I am not saying that the multi-thing standard is a bad idea, but I do not have a use case for it myself.


For pip’s case, perhaps pip could have:

[tests]
dependencies = ["-r tests/requirements.txt"]  # or better a dependency group
environment = {LC_CTYPE = "en_US.UTF-8"}
commands = ["pytest {posargs:-n auto}"]

And keep the noxfile configuration as is, yet replace:

run_with_protected_pip(session, "install", "-r", REQUIREMENTS["tests"])
...
session.run("pytest", ...)

With something like:

run_with_protected_pip(session, "install", *pyproject_tests.dependencies())
session.run(pyproject_tests.commands(), env=pyproject_tests.environment())

That way, you could still benefit from the features of nox and we could run the tests in a standardized way when we build pip for Fedora. (I understand there is no added value for pip here – you don’t get anything you didn’t already have.)


Hmm, I feel like these two contradict each other. For many maintainers, the canonical test requirements+command they want new contributors to use are the ones with all the extra linting/formatting/coverage/report-generating hoo-hahs rather than the strictly functional setups that downstream packagers prefer (and often require). I don’t know how to fix this though. Either we say the new configuration is specifically for downstream packagers, or we try and shepherd people into providing two configurations – one that they use and one that downstream packagers should use. Neither sounds to me like it would actually be followed.


Somebody else enters the project and runs nox --python 3.14. Even though the project authors do not use nox, nox would still be able to know it needs to install a dependency group and run pytest in a Python 3.14 venv.

With my upstream software maintainer hat on, I think this situation would just result in me adding a lot more contributor documentation saying that if you don’t use the recommended test running tool then your problems are your own and you should not report them upstream because I don’t personally have time to support people who either don’t read the project’s contributor documentation or, worse, choose to ignore it.

Or perhaps it could be a simple pyproject-run-tests command rather than nox. It wouldn’t matter.

This sounds like an argument for reintroducing a setup.py test equivalent. Perhaps it’s a good opportunity to revisit why that was removed in the first place.


I’m very curious, why would you want to include this when you already have dependency groups. And why should it support requirements files (which, remember, aren’t standardized) and constraints files?

Sure, I think this is a fine goal. But I don’t think all projects have or want a single test invocation, which is a big part of why tox and nox are popular. So you need to account for this and make sure you are meeting the needs of package maintainers. Otherwise, I can’t imagine anyone using this unless someone asks them – and even if they’re nice and friendly maintainers (like I try to be), this will probably go over poorly. I’ve been burned enough times by drive-by contributions to ask for motivation/justification before making project infrastructure changes, and to generally be reticent to make such changes without a very good grasp of what I’m signing up to do.

Let me take a realistic package example.[1] I have a tool which is some kind of data validator. It supports data formats as follows:

  • it supports JSON always
  • it prefers orjson when orjson is installed
  • it supports JSON5 if json5 or pyjson5 is installed
  • it supports TOML on 3.11+ or if tomli is installed
  • it supports YAML if pyyaml or ruamel-yaml is installed

Each of these format support scenarios is described by a tox factor. Furthermore, tests measure coverage with coverage[toml]. In the special case of testing py3.10 without tomli, it is necessary to disable the installation and use of coverage in order to accurately test that the missing dependency is handled.

So the test invocations are dual (coverage run -m pytest and pytest), and the “test requirements” are in the form of a matrix.

Why shouldn’t such a project, if it chooses to use this mechanism, simply declare tox>4 in its requirements and tox run as its invocation? Wouldn’t that be better for any non-repackaging usage, since it actually runs all of the tests?

I’m not against the basic idea – I think the desirability is there, given that repackaging is a thing – but I don’t think this proposal as it stands solves the basic impedance mismatch between the library maintainers’ needs and the repackagers’ needs.

One thing which I think is a prerequisite for me to be an active advocate for this idea is for it to fit in with IDE workflows, as the way that vscode, pycharm, and related tools invoke tests. IDEs have a very similar desire to be told “how do I run the tests” and I think a solution which works only for repackaging or only for IDEs is missing the bigger picture.


  1. this is based on check-jsonschema, some minor details altered


This was inspired (read: copied) from tox configuration. It is not a necessity. Only supporting dependency groups is also an option (included at the very end).

(I read the rest of your comment as well, thanks for the input.)

In a way, this could be a replacement for setup.py test for the post-setup.py era.

It could run all of them, one after the other?


Thanks for the ping. The thrust of the effort is very much applicable to what we’re doing in conda-forge. Currently we need to do much of the same work (analyse test dependencies & execution, encode that in our own metadata, etc.), and it would be great overall if that were reflected in pyproject.toml.

Although I haven’t gone over things with a fine-toothed comb, one thing that seems worth considering from my POV is that many projects have some in-repo test data that needs to be present to run tests. If running tests in-tree this is obviously not an issue, but for testing the final package as installed (where the tests and supporting data aren’t present), it would be good if the package author had a way to make this requirement explicit.
