Pre-PEP: Standardizing test dependency and command specification

I am looking at this, but it will take me some time to get through it. After a first pass I would say +1, but you have traditionally been much more optimistic than I am about automatic use of this upstream data, so I have less experience with some of the issues you are dealing with.

Thank you very much for writing this.

I’d expect openSUSE, with its way of building for multiple Python versions in one go (all installed at once into the same prefix), would need to do some pretty hairy conversions on the test command?

  • pytest → pytest-$python_version
  • python → python3.x (python on its own is not present)
  • arbitrary-entry-point --help → arbitrary-entry-point-$python_version --help
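
To make that conversion concrete, here is a minimal sketch. The suffixing rule and the version are assumptions for illustration, not how openSUSE’s tooling actually works:

```python
# Hypothetical rewriting of an argv-style test command for a layout where
# every tool is installed with a per-version suffix (pytest-3.11, python3.11).
def rewrite_command(argv, ver="3.11"):
    known = {"pytest": f"pytest-{ver}", "python": f"python{ver}"}
    head, *rest = argv
    # Unknown entry points are assumed to get a version suffix as well.
    return [known.get(head, f"{head}-{ver}"), *rest]

print(rewrite_command(["pytest", "-v"]))
# ['pytest-3.11', '-v']
print(rewrite_command(["arbitrary-entry-point", "--help"]))
# ['arbitrary-entry-point-3.11', '--help']
```

Note that this kind of mechanical rewriting only works for argv-style commands; a free-form shell string would need real parsing.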

Depending on exactly what all the %pytest, %__python, … macros that Fedora uses are for, I’d be surprised if Fedora doesn’t hit a similar “can’t just use the command as-is” issue?

That may be possible with argument list commands but sounds impossible for the arbitrary shell script case.

We are already doing that, and we have RPM/Lua macros for it, so we have: %pytest (run pytest for all versions of Python, with $PYTHONPATH set appropriately for the installed modules), %python_exec (run the following command with all interpreters), %pyunittest (like %pytest but with $python -m unittest), and %python_expand (run the following command in an environment where the appropriate interpreter and directories are in the right place).
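
As a rough model only (the real macros are RPM/Lua, not Python), %python_expand behaves like substituting each installed interpreter into a command template:

```python
# Toy model of openSUSE's %python_expand: expand one command template once
# per installed interpreter; the interpreter list here is illustrative.
def python_expand(template, interpreters=("python3.10", "python3.11")):
    return [[arg.replace("$python", py) for arg in template]
            for py in interpreters]

print(python_expand(["$python", "-m", "unittest"]))
# [['python3.10', '-m', 'unittest'], ['python3.11', '-m', 'unittest']]
```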

Generally, we don’t like running the whole tox machinery (and that’s probably a difference from Fedora); we usually look inside tox.ini and run just the commands there (most often pytest these days).

See for example our SPEC file for setuptools.

I don’t have time to do a detailed review right now, but I’ll just say that I really like the way hatch lets you set up different environments, one of which can be a test environment. I do things a little differently in my own libraries, partly for historical reasons, but this makes it really nice to just run hatch test (or, in the former case, hatch run all to run my tests, lints, and doc builds recursively).

My suggestion is that if we’re aiming to standardize this, we should look at other tools as well. I used to be a tox user, but now with hatch I don’t need it.


But would this pre-PEP proposal actually help you do that? Would you be able to automate going from test-command = "arbitrary shell script" to the appropriate RPM/Lua macro, or would this just be yet another place to manually look for a test command to approximately copy/paste?

To avoid quoting issues across OSs I would want this to be an array of strings.

This name isn’t the same as what’s used in the [project] table. I think it would be worth matching it and using optional-dependencies.

I would make it only for python3 as you have it; otherwise we’re standardizing ourselves out of Python 4, and that’s not our call.

Yes; my immediate reaction was “ditch dependencies”.

You could require that POSIX-style slashes be used since Python will translate accordingly.

Nox may also gain helper code to run the tests from the [test] table.

I think the way to view this is: for simpler test-command scenarios this works, and if it doesn’t, then you may want to wrap Nox.

You could also add labels to distinguish between them. It may also allow setting a single test as the default; that is the one users like Fedora would run, thus skipping e.g. the coverage test.


Absolutely. If I could have a macro %run_tests which would on its own add appropriate BuildRequires: and run appropriate commands, I would have it immediately.

Based on context I assume you want one command to be an array of strings, correct?
How would you solve passing positional arguments?

This is a very good point. However, I am afraid that having optional-dependencies in the tests table would sound like “those dependencies are optional for tests”.

I have a bit of trouble understanding this sentence, but my intention was not to “standardize ourselves out of Python 4”. I merely assumed there will be no Python 4. But I guess this PEP does not need to make that assumption; it can say “the python and python{major} (e.g. python3) commands”.

On which level? E.g. will it translate slashes in subprocess.run?

Suppose we do this:

  • tests is an array of tables
    • one of them is required to have the “default” label (or perhaps can have no labels at all?)
    • others are not permitted to have the “default” label
    • (if we allow the “no label means default”) others are required to have at least one label

The recommendations from my proposal apply to the default test table only.

Distributors (or users who want to run tests before installing the package) now have a single entry point. Developers can now run linters, coverage, and complex integration tests as well.

This also opens a can of worms (or possibilities) about data inheritance (e.g. one test table extending the list of dependencies from the default test table).

I assume this just meant to have PATH/environment configured when launching tests such that python and python3 both call through to the target interpreter, rather than whatever the system normally has configured.


All of my thoughts are on the standardised project development scripts thread, where I was very much taking this use case into account. In short, generalise it to arbitrary command name and packaged command executors/backends (like for build systems), then let distros simply specify which command(s) they’ll launch (e.g. they might announce they’ll try distrotest-fedora or else distrotest or else test, so that projects can provide as specific a test configuration as is needed).

A lot of the discussion here is too specific for an interoperability specification updating pyproject.toml, but would be entirely suitable for a design spec that implementers could follow.


Why is this limited to only tests? What about documentation, etc.? IMO this is really just standardizing a generic task runner. I think it would be better as something like this:

[[tasks]]
name = ["tests", "coverage"]
dependency-groups = ["tests"]
commands = [
  ["python", "-m", "unittest", {posargs="extend"}],
]
[[tasks]]
name = ["coverage"]
commands = [
  ["coverage", "run", "-m", "unittest", {posargs="extend"}],
]

I’ve added several features that might or might not be good ideas, but the key ideas are: multiple top-level names; extending existing environments (that’s optional, but I would have liked it with tox’s TOML config, so I’m including it here); commands as lists of lists (seen above too, and tox uses this); and posargs as a TOML inline table rather than a special string, like dependency-groups’ include-group and the way tox handles it.
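
A runner consuming that format could splice user-supplied positional arguments in place of the inline-table marker. A sketch, with nothing here standardized:

```python
# Sketch: replace {posargs = "extend"} inline tables with the user's extra
# arguments when resolving a command from the [[tasks]] format above.
def resolve(command, posargs):
    out = []
    for part in command:
        if isinstance(part, dict) and part.get("posargs") == "extend":
            out.extend(posargs)  # splice user arguments at this position
        else:
            out.append(part)
    return out

print(resolve(["python", "-m", "unittest", {"posargs": "extend"}], ["-v"]))
# ['python', '-m', 'unittest', '-v']
```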

There should be specific names for common tasks - that’s something missing from the dependency-groups proposal, but was actually in extras - the extra “test” is reserved for testing dependencies, and “doc” is reserved for documentation (though most people ignored this and called the documentation extra “docs”). You want to avoid test vs. tests.

I think this should be heavily informed from what tox has done in supporting TOML configuration (also pixi and hatch support task configuration in TOML).

If something was developed, I’d be happy to work on getting support for it in nox.


I have a problem I want to solve: I want a single entry point to run tests. So that’s what I proposed. (In no way am I saying that my proposal is better; I am just trying to explain why it is proposed the way it is.)

Anyway, if folks would rather have a general “commands” specification I guess Idea: Introduce/standardize project development scripts is the way. And if we have that and I manage to standardize a script name for my use case, it might also be a solution to my problem.


As I was the one who brought nox up, I’ll note that my point was the other way round - not that I want nox to be able to handle this type of task definition, but rather that if I have something in nox already, I’d probably leave it there rather than migrating it to this sort of task definition (even if I could still run it using nox either way).

I’m not against nox having support for this, it just isn’t related to the point I was making.

How would you feel about it if it covered both of these?

  • this is a thing which nox can use if you want to do it that way
  • this is a way of declaring how to install and invoke nox, if you want to do it the other way around

The outcome could be that

  • pip declares in this config how it uses nox
  • users may eventually invoke nox via some other tool (e.g., pycharm)

There are good and bad things about this. Good is that the test runner is more discoverable. Bad is that you might get spurious bug reports of “pip test config is wrong” due to external tool bugs or user confusion.

I want to acknowledge the downside risk of this hypothetical up front. I think it’s real but I don’t think it’s much worse than the current possibility of user reports that “nox config is broken” because they’ve done something wrong. Perhaps you disagree.


Personally, I’d be OK with that but I simply wouldn’t bother using it. At best, if adding some boilerplate that said “run nox -s test” was sufficient, I’d add that if someone asked. But if I started getting complaints that it ran the linter, or it didn’t run the linter, or it ran tests against multiple versions of Python, or it didn’t…, then I’d likely simply delete the boilerplate again and say sorry, it’s too much bother.

I don’t have any problem with this as a feature, in any form that people agree on. I’m just a little bit concerned that it’s being presented as some sort of “best practice” before it’s even been accepted. I get that it makes life easier for distribution repackagers, and I have no wish to deliberately make things harder for them, but conversely if a key benefit of the feature requires that everyone uses it, then I don’t think we can assume that will be the case.


Likewise, which troubles me a little. If I’m not using this new config for anything, will I keep it current? Even if I am okay with it and I have some notion that it’s there for “IDE support for my coworkers”, I think I need to see some benefit as a maintainer.

What if pipx added pipx test as a command, which is simply wrapping nox/tox? I might use such a thing. It is definitely nice to run cargo test when I try to contribute to rust projects.


One thing I noticed is that the idea as stated doesn’t have a form of include-package = true or skip-install = true control.

I’m also not seeing other important knobs, like declaring which Python version should be used. So I’m not sure I can imagine using it without wrapping an existing environment manager or invoking it from an environment manager.

I think this might be a problem for usage from the IDE context, in which we definitely need to decide which test environments need the local package installed, and we need to pick which interpreter version to use.


From Debian’s point of view, yes, this would probably be interesting.
If there was a single entry-point for executing tests, and a well-specified list of test dependencies, we could use that to automatically create/update our packages.

We currently have heuristics to determine which test-runner to use. A well-defined test execution mechanism would be more reliable than heuristics.

I’d have the same concern about this that was raised for openSUSE:

We’d also want to run this on multiple python versions, when multiple versions are supported. We’d ideally be doing this outside a virtualenv, so we’d want to specify the correct python executable to run the tests. Tox’s built-in {envpython} is nice for this.
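
tox’s {envpython} placeholder expands to the path of the environment’s interpreter, and a standardized equivalent could work the same way. A toy illustration (the interpreter path is a placeholder):

```python
# Toy version of tox-style {envpython} substitution: the runner swaps the
# placeholder for the concrete interpreter it wants to test with.
def substitute(argv, envpython):
    return [arg.replace("{envpython}", envpython) for arg in argv]

print(substitute(["{envpython}", "-m", "pytest"], "/usr/bin/python3.12"))
# ['/usr/bin/python3.12', '-m', 'pytest']
```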

Something I don’t see specified is whether the package needs to be “installed” for the tests, i.e. $PKG.dist-info available on sys.path and entry points available on PATH.

For the distro package building use-case, it’s marginally easier if packages aren’t installed. But upstream developers probably expect them to be installed in a virtualenv, when running tests.

It’s a mix. I sure don’t expect my own packages to be installed when I’m testing them, and the vast majority of non-tool-obsessed[1] projects I encounter (mostly at work) don’t seem to either. Install the dependencies and run from the source tree.

The “editable install” workflow is popular among the people who post a lot here, but I really haven’t seen it that much in the outside world.


  1. Not meaning to denigrate the projects that do try to automate everything. I just don’t have a better term for projects that really just want to get their job done and don’t care to be on the most up-to-date tools/standards. E.g., most of these projects are figuring out how to migrate off distutils right now (or how to install setuptools so they don’t have to). ↩︎


For distro packaging, we would point PYTHONPATH to the buildroot which contains the installed dist-info. That works.