Pre-PEP: Standardizing test dependency and command specification

bwoodsend · November 23, 2024, 8:21pm

Sorry to beleaguer this point but surely, whether it’s a %run_tests macro or py2pack or something else, something still has to tokenize arbitrary shell script to find which instances of the strings python/pytest/some-other-entrypoint represent executables (to be replaced with versioned equivalents) and which should be treated literally. Who fancies writing a regex that can handle these (admittedly contrived) examples?

python -m pytest -k 'not python2'
xvfb-run pytest
pytest python/tests
DATETIME_IMPLEMENTATION=python-datetime pytest
my-cli-entrypoint --self-test

You could probably use a heuristic of replacing python only if it matches (\s|^)python(\s|$) and get most of the way there but other entrypoints would still be impossible.

What I think need is some light templating or special syntax to highlight the per-python-environment commands. For example (exact choice of templating syntax TBD):

[tests]
...
commands = [
    "%pytest%",
    "xvfb-run %python% -m unittest",
    "%my-entrypoint% --sanity-check",
]

hroncok · November 24, 2024, 1:26pm

Now that’s a problem that isn’t relevant to anything other than distros

Anyway, it only works directly with the commands. If anything uses subprocess, a shell script etc. You are back at the original problem.

stefanor · November 24, 2024, 3:59pm

I don’t think so at all. It’s relevant to the tox-like workflows where you’re testing with a specific Python.

e.g. if you have to bisect cpython to find what introduced a regression, it’s very useful to be able to easily run a package’s test suite with a particular python executable.

pitrou · November 24, 2024, 9:41pm

That’s surprising, because you need some sort of build+install step (editable or not) as soon as your package is not pure Python.

mcepl · November 24, 2024, 11:45pm

python -m pytest -k 'not python2'
That perfectly works for us %pytest -k 'not python2'
pytest python/tests
Again, no problem %pytest python/tests
DATETIME_IMPLEMENTATION=python-datetime pytest
Still easy:
```
export DATETIME_IMPLEMENTATION=python-datetime
%pytest
```
%check section of RPM Spec runs in a subshell, so I don’t care that much what variables are exported to the environment there.

So, what remains is:

xvfb-run pytest
my-cli-entrypoint --self-test

From 3479 python-* SPEC files in the project openSUSE:Factory, there are 2664 packages with %check section in them and 2074 of them match RE ^%py(test|unittest). That’s pretty high proportion for me (and some packages just plainly were not cleaned up yet). So, yes, it doesn’t work for all our needs, but it makes our situation significantly easier to manage.

bwoodsend · November 25, 2024, 12:31am

But what I’m really asking is how are you going automate that translation? Sure, you yourself can manually map any command to the the appropriate spec syntax but doesn’t that defeat the purpose of a standardised machine readable location for a test command (instead of say dumping it in a README)?

steve.dower · November 25, 2024, 2:01pm

Build-in-place works just fine, if it’s a Python project, or already being referenced by a search path (e.g. an embedded project). More complex projects likely aren’t using a venv, but are producing a complete layout for the app (e.g. building a Docker container), or are using such a mix of build systems that it’s easier to copy outputs around manually.

I’m not saying these approaches are better or worse than editable installs, just that they are what I see being used by people outside of our bubble here.

brettcannon · November 25, 2024, 10:55pm

Yes to a one command being an array of strings.

And I don’t understand what you mean by “solve passing positional arguments”? subprocess.run() takes a list of strings so I don’t see why this is an issue here.

But you could also come from the other angle and say, “there’s nothing here named “extra” with this thing I’m referencing. What’s it talking about?”

I would just stop at python and let it act as a stand-in for what Python interpreter will be executed.

You can do it above in whatever is act as the runner for this if the path is its own string in your command array if it’s necessary. Otherwise I expect most Python test runners already support this situation.

I don’t think the proposal prevents that if it sticks to just a python placeholder in the command.

mcepl · December 2, 2024, 7:45am

I am not sure whether I understand your question. If you are asking whether I would prefer some universal successor of python setup.py test, then the answer is yes, that would be lovely. However, what I was trying to convey in my previous message was that it isn’t that big deal even for us, because most ordinary situations are covered and what remains are mostly corner cases (like testing against xvfb and similar weirdness). And yes, I would prefer people not writing their own test runners (e.g., that my-cli-entrypoint --self-test thingy).

What am I missing?

bwoodsend · December 2, 2024, 11:12am

I don’t know how I can phrase my question differently. Let’s just drop it.

ncoghlan · December 8, 2024, 1:38pm

With some of the conflicting requirements being raised here, this is starting to feel a bit like a frontend/backend problem akin to the PEP 517 build system one, only for task management systems like tox/nox/hatch/etc, rather than for building release artifacts.

Suppose the pyproject.toml addition looked like this:

[test-system]
requires = ["tox"]
test-backend = "tox.backend:tests"

Command execution hooks:

run_tests: run the test suite in an externally prepared environment
run_static_checks: run static checks (linting, typechecking, etc) in an externally prepared environment

Optional environment preparation hooks:

get_requires_for_run_tests: report the requirements that need to be available in the run_tests prepared environment
prepare_files_for_run_tests: export the test data files from the source tree that are needed when executing the tests
prepare_env_for_run_tests: set up the environment for run_tests the way to backend normally would
get_requires_for_run_static_checks: report the requirements that need to be available in the run_static_checks prepared environment
prepare_env_for_run_static_checks: set up the environment for run_tests the way to backend normally would

(Hook naming scheme intentionally based on the one used in PEP 517)

tox and nox should probably stay as pure backends for this approach, but could optionally decide to support acting as frontends as well. Project management tools like hatch, pdm, etc might decide to offer test subcommands, with --static-checks-only and --no-static-checks options to decide on the scope of the tests to execute. Distro build systems would be frontends (mapping requirements to system packages rather than upstream ones), as would IDEs.

The advantage I see to the narrower proposal focused on just testing is that it means any backend API definition can be more semantically meaningful (just like PEP 517), rather than being so absurdly generic as to be meaningless (which is a problem affecting my corresponding task-runner suggestion in the task management thread).

If we went down this path, future expansion out to also define a generic docs building interface would be via a docs-backend table, again for the semantic benefits that brings to the backend API design process (such as allowing the environment setup to be handled by the frontend, as distros would prefer, even if the normal build process uses a separate automatically created virtual environment).

ncoghlan · December 13, 2024, 5:44pm

An additional complication that occurred to me for the test execution use case:

any consumer which wants to display test result breakdowns won’t want to just execute the tests, it will want a way to request exporting of the results in a structured format (such as Junit-compatible XML, or a Test Anything Prototocol report). This may be for a HTML report, or for a local IDE windows (like the one for the native C# testing framework in Visual Studio).

A similar structured data export problem arises for test coverage data, where the frontend would want a way to request a coverage report in a well-known format (such as Cobertura XML or LCOV).

The C#/Visual Studio case feels like both an example of the developer UX that can be achieved when the IDE understands exactly how tests are executed and their results are reported, as well as the difficulty in achieving those UX benefits in a way that is independent of the specific choice of testing framework.

steve.dower · December 16, 2024, 11:45am

Without standardising this universally, what IDEs would likely do is set every environment variable setting they know about (to choose format, output files, etc.) and if they don’t find anything there at the end, they’ll display regular console output to the user. Probably with a link to their own documentation on what needs to be configured for integration to work. Environment variables are perfect for this, because tools will ignore those that don’t apply, and IDEs still control the launch environment.

That said, there’s a movement towards only showing console output to users anyway, largely because there’s no realistic way to integrate everything. This is disappointing for users, but unless the tool makers find some alignment, it’s the only feasible approach.

Most IDE teams have the same or fewer developers dedicated to tests as each test frameworks has contributors, so despite the (common, not universal) big company backing, in practical terms there’s less effort there to help.

brettcannon · February 12, 2025, 9:33pm

As another example of a project that would benefit from having a standardized way to know how to run tests, SWE-bench/swebench/harness/constants/python.py at main · swe-bench/SWE-bench · GitHub is how SWEBench does it, which is hard-coding the commands to run for each project they benchmark against.

ofek · February 14, 2025, 3:42am

That’s a quite poor example IMO because as you can see from the commands, the execution context depends on a specific environment. If you scroll down that page you can see a bunch of associated metadata without which the commands themselves are quite useless. Unless I’m significantly misunderstanding the project, they would have precisely zero use for such a standardized mechanism.

steve.dower · February 14, 2025, 11:38am

Even so, someone had to spend time figuring out the first 15 lines of that file. The number of lines does not reflect the amount of effort it takes to do that.

There are other proposals that would help with projects not specifying their dependencies reliably enough. We don’t have to solve that problem here to get value from a consistent “run my standard set of tests” command.

ofek · February 14, 2025, 2:52pm

I understand the rationale for standardization (even if I disagree with certain aspects) although what I’m saying is that the project would not use the field because it wouldn’t make sense to read partial data from another file when all of the data is already defined and third parties would be unable to make use of the commands without knowledge of the environments they define.

brettcannon · June 17, 2025, 4:31pm

I was talking with @jefftriplett and he mentioned that having something like this would benefit Django in the same way it could benefit CPython: running the test suites of a collection of projects against the latest version/build/whatever to see if unexpected breakage has occurred, how bad a change might break things, etc.