Providing a way to specify how to run tests (and docs?)

CAM-Gerlach · April 18, 2022, 12:22am

I just want to emphasize again that there are really two different layers of the stack intersecting here, and they need to be distinguished: the testing tool (pytest, unittest, etc.) and the task runner (tox, nox, etc.). If a careful distinction is not made between them, the proposed functionality may not be very usable for either most upstream developers or most downstream distros and repackagers.

And it’s worth keeping in mind that the primary motivation here is, as it says on the tin, to provide a standardized way to specify how to invoke the project’s test suite, particularly for downstreams that need to run the project’s tests in their own environment. It is not necessarily to standardize task runner configuration (which certainly has distinct value too).

This proposed syntax would seem to require the project to use a Python task runner (never mind one compatible with the new hooks it would need) and to work on the basis of targets, which already excludes the unfortunately large majority of Python projects.[1] This creates a much larger barrier to entry for projects to adopt it, particularly when many maintainers are already hesitant to bundle tests and expose the appropriate config, given it’s something that benefits downstream packagers, users and tools more than their own development.

Moreover, such an approach would seem to require additional complexity to actually solve the motivating problems for downstreams:

There would need to be a hook to invoke a task in the current environment, without (at least Python-level) environment isolation, so that downstreams could actually test the project as packaged for their distribution.
There would need to be sufficient standardization of at least the core tasks that downstreams need (tests, docs, etc.) so they can be invoked consistently and programmatically.
Both downstream tooling and task runners would need to implement support for these interfaces, and project authors would need to adopt compatible versions (or switch task runners).

All of these latter things are likely doable, but they add complexity to this approach, while only working for the relatively small fraction of upstream projects that use a Python task runner, rather than providing a generalizable solution.

This syntax is closer to what others were suggesting and would bypass many of the problems above (though it is still framed in terms of tasks and targets, without any clear indication of what the target actually does, which may simply be a copy/paste oversight). However, it runs into the issue discussed above: it conflates two levels of the stack, since either a testing tool or a task runner may be invoked here.

This has significant implications for what actually gets tested, and where, for downstreams, who want to test the installed package in their own environment (and, as others mention, it needs to be clearly defined that this should only run the project’s actual tests, not linting checks). So it’s unclear whether this would actually be useful as-is without being more strictly defined.

That said, I’m far from an expert like you all, and I’m still giving more thought myself to what specifically to propose. I like the flexibility, extensibility and DRYness of defining a more generalized task-based interface and configuration, rather than specific ones for tests, docs and whatever else is needed. On the other hand, that dramatically increases the scope and scale of this effort well beyond the original motivation, while being less well suited to fulfilling it (or requiring additional complexity to do so), and I fear it may result in the great being the enemy of the good.

[1] Based on the disappointing fact that only 7% of Python developers in the Python Developers Survey 2020 used tox for any of their projects, compared to around 50% usage for pytest and 30% usage for unittest (for reference, it was a multiple-selection question; 63% of participants used some form of testing, and no other test/task runners were mentioned, with “other” at 1%, presumably including nox).