Providing a way to specify how to run tests (and docs?)

That will depend on what you mean by being present or canonical… Let’s consider the following:

  1. If you decide to specify test requirements as the test extra, whoever is interested in running your test suite can create a virtual environment and install yourpkg[test]. This is accessible independently of the backend or configuration file, and it is not bespoke to any tool. The key point here is that the interface for consuming extras is stable, backend independent and standardised.

  2. If a tool just wants to “inspect” which dependencies are included in the test extra, even today the only reliable, standardised way that covers all cases and all backends is to read the core metadata (as sketched below). That was already possible before pyproject.toml came into the picture.
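
For illustration, here is a minimal sketch of that core-metadata route, assuming the package has already been built so a PKG-INFO (sdist) or METADATA (wheel) file is available; it uses the third-party packaging library and keeps only the requirements that are gated on the given extra:

    from email.parser import Parser
    from packaging.requirements import Requirement

    def extra_requirements(metadata_path: str, extra: str) -> list[str]:
        """Return the Requires-Dist entries that are gated on the given extra."""
        with open(metadata_path, encoding="utf-8") as f:
            metadata = Parser().parse(f)
        requirements = []
        for spec in metadata.get_all("Requires-Dist") or []:
            req = Requirement(spec)
            if req.marker is None:
                continue  # unconditional requirement, not tied to any extra
            # Keep it only if it is enabled with this extra but not without it.
            if req.marker.evaluate({"extra": extra}) and not req.marker.evaluate({"extra": ""}):
                requirements.append(str(req))
        return requirements

    print(extra_requirements("PKG-INFO", "test"))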

I don’t think the decision to use extras for test dependencies depends heavily on the configuration format… As Paul previously mentioned, some developers simply don’t like the idea at the conceptual level (and that is fair).

People who like the idea of using extras could have been doing it since before pyproject.toml existed (I definitely was).

Repackagers and other downstreams (the main case we’re discussing here) aren’t going to want to (and often, their policies would not allow them to) create a separate virtual environment and use pip to install packages from the internet into it, just so they can introspect it to determine what gets installed and figure out how to replicate it in their own environment. Of course, they can build the package at least to an sdist and then extract the extra’s dependencies from the core metadata in the PKG-INFO, but that’s option 2.

Previously, yes, the only reliable standardized way to get an extra’s dependencies was to actually build the package and inspect it. But thanks in part to your work implementing PEP 621 support for Setuptools, tools now have a second way, which is reading it from project.optional-dependencies in pyproject.toml. That is much easier and cheaper to do, as the dependencies can be read statically from a file rather than having to build the whole project, unpack it and then parse them out of the appropriate RFC 822 headers.
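
As a rough sketch of that static route (assuming Python 3.11+ for the standard-library tomllib, and that the extra is named test):

    import tomllib

    # Read the test extra straight out of pyproject.toml, without building anything.
    with open("pyproject.toml", "rb") as f:
        pyproject = tomllib.load(f)

    test_deps = pyproject.get("project", {}).get("optional-dependencies", {}).get("test", [])
    print(test_deps)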

Sure, fair, but I believe @brettcannon’s point was that it is now much more accessible and practical for downstream consumers, thanks in no small part to your hard work. :smile:

Thanks @CAM-Gerlach, all the restrictions you mentioned exist and shape the workflow of repackagers/other downstreams. However, I don’t think they affect anything I was discussing before (specifically, the fact that the interface for consuming extras exists, is stable and is backend independent).

If repackagers/other downstreams want to use extra dependencies to run the tests, at that point in time they already need the package to be built (and thus have access to the core metadata in (2)). Moreover, doesn’t it also mean that they will have to obtain these test dependencies somehow and install them anyway? It doesn’t matter if the dependencies come from PyPI or if they use their own installer/repositories instead of pip.

Going back to your original point, tools cannot rely on project.optional-dependencies being present… Even if the chosen backend does support PEP 621, there is always the possibility that it is specified as dynamic. The only truly universal way of inspecting the extras is via the core metadata.
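
A small sketch of the failure mode being described (again assuming tomllib is available): if the table is listed as dynamic, the static read tells you nothing and the only option left is building the project and reading the core metadata.

    import tomllib

    with open("pyproject.toml", "rb") as f:
        project = tomllib.load(f).get("project", {})

    if "optional-dependencies" in project.get("dynamic", []):
        print("extras are dynamic; fall back to building and reading core metadata")
    else:
        print(project.get("optional-dependencies", {}))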

2 Likes

Other stuff that you most likely also need to support for a minimal viable product before running the tests:

  • Discovering the targets defined (and selecting the ones to run)
  • Altering the current working directory
  • Setting environment variables
  • Passing through environment variables (which ones you should always pass through and which you should remove)

And then the not-must-have, but probably nice-to-have, concepts for more robust/powerful usage:

  • Per-target temporary folders
  • Setup/teardown commands
  • Environment reuse between runs

Most Django projects do, because the Django version is included in the target name. Similarly, many people like having with-coverage and without-coverage variants (often with a -cov suffix). I’ve seen a few projects that also separate unit and integration tests into separate targets, so you have a quick test env and a slower but more robust one.

There’s a plan to add that. The interface hasn’t been groomed and implemented just yet, though it might be a reality next year. (PS: also, tox is always all lowercase.)

This is likely the easiest path ahead. It likely has to tell OS repackagers just the dependencies and the default target(s) to call. E.g. it could specify that for OS repackaging the style checks are not needed, so only the py target should be called (and the target can be interpreted by the tool). Something like:

[project.tasks]
requires = ["tox>=4"]
test-target = ["py-unit", "py-integration"]

This does imply that we only need test runners to support a PEP 517-style API that can take the target list. Alternatively, we could make the interface CLI bound:

[project.tasks]
requires = ["tox>=4"]
target = ["tox", "-e", "py-unit", "py-integration"]

I prefer the PEP 517-style interface though, because we can then add an endpoint such as get_valid_targets that could return not just test targets but lint targets too.
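
To make the shape concrete, here is a purely hypothetical sketch of what such a hook module could look like; neither these hook names nor the [project.tasks] table are standardized, and the tox call is only for illustration:

    import subprocess
    import sys

    def get_valid_targets():
        """Return the names of the targets this task runner knows how to run."""
        return ["py-unit", "py-integration", "docs", "lint"]

    def run_targets(targets, config_settings=None):
        """Run the given targets in order; return True if all of them succeeded."""
        for target in targets:
            # For illustration we shell out to tox; a real backend would call
            # its own internals rather than a subprocess.
            result = subprocess.run([sys.executable, "-m", "tox", "-e", target])
            if result.returncode != 0:
                return False
        return True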

3 Likes

I just want to emphasize, again, that there are really two different layers in the stack intersecting here that need to be distinguished—the testing tool (Pytest, Unittest, etc.) and the task runner (tox, nox, etc.). If a careful distinction is not made, this proposed functionality may end up not very usable for either most upstream developers or most downstream distros and repackagers.

And it’s worth keeping in mind that the primary motivation here is—as it says on the tin—to provide a standardized way to specify how to invoke the project’s test suite, particularly for downstreams that need to run the project’s tests in their own environment, not necessarily to standardize task runner configuration (which certainly has distinct value too).

This proposed syntax would seem to require the project to use a Python task runner (never mind one compatible with the new hooks it would need) and to work on the basis of targets, which already excludes the unfortunately large majority of Python projects. [1] This creates a much larger barrier to entry for projects to adopt it, particularly when many maintainers are already hesitant to bundle tests and expose the appropriate config, given it’s something that benefits downstream packagers, users and tools more than their own development.

Moreover, such an approach would seem to require additional complexity to actually solve the motivating problems for downstreams:

  • There would need to be a hook to invoke a task without (at least Python) environment isolation, in the current environment, so that downstreams could actually test the project as packaged for their distribution.
  • There would need to be sufficient standardization of at least the core tasks that downstreams need (tests, docs, etc) so they can be consistently programmatically invoked.
  • Both downstream tooling and task runners would need to implement support for these interfaces, and project authors would need to adopt compatible versions (or switch task runners).

All of the latter are likely doable, but they add complexity to this approach, while it only works for the relatively small fraction of upstream projects that use a Python task runner, rather than providing a generalizable solution.

This syntax is closer to what others were suggesting and would bypass many of the problems above (though it is still framed in terms of tasks and targets, without any clear indication of what the target actually does, which may simply be a copy/paste oversight). However, it runs into the issue discussed above—there’s conflation between two levels of the stack, where either a testing tool or a task runner may be invoked here.

This has significant implications for what actually gets tested, and where, when it comes to downstreams, who want to test the installed package in their environment (and, as others mention, it needs to be clearly defined that this should only run the project’s actual tests, not linting checks), so it’s unclear whether this would actually be useful as-is without being more strictly defined.

That said, I’m far from an expert like you all and am still giving more thought myself to what specifically to propose. I like the flexibility, extensibility and DRY-ness of defining a more generalized task-based interface and configuration rather than specific ones for tests, docs, and whatever else is needed; but on the other hand, that dramatically increases the scope and scale of this effort well beyond the original motivation, while being less well-suited to it (or requiring additional complexity to actually fulfill it), and I fear it may result in the great being the enemy of the good.


  1. Based on the disappointing fact that only 7% of Python developers in the Python Developers Survey 2020 used Tox for any of their projects, compared to around 50% usage for Pytest and 30% usage for unittest (for reference, it was a multiple-selection question; 63% of participants used some form of testing, and no other test/task runners were mentioned, with “other” at 1%, presumably including Nox) ↩︎

1 Like

It’s already the case with package build backends and frontends, so if anything we would just remain consistent and not deviate.

Task runners can/should provide a mode to run in the host environment; e.g. for tox see GitHub - fedora-python/tox-current-env: tox plugin to run tests in current Python environment, which makes this a non-problem.

It would be fairly trivial to standardize those by defining your test and docs targets for the task runner:

[project.tasks]
requires = ["tox>=4"]
targets.test = ["py-unit", "py-integration"]
targets.docs = ["docs"]

This would be fairly simple though on both ends. For task runners we only really need tox, nox and pyinvoke. There’s precedent here: adopting PEP 517 was fairly easy for flit/setuptools. They just had to expose what they already did under a new common API.
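
For the consumer side, a rough sketch of how a downstream build might use such a (still hypothetical) table, assuming the declared runner is tox and its environments map onto the declared targets:

    import subprocess
    import sys
    import tomllib

    with open("pyproject.toml", "rb") as f:
        tasks = tomllib.load(f).get("project", {}).get("tasks", {})

    # A repackager would typically only care about the test targets, not docs or lint.
    test_targets = tasks.get("targets", {}).get("test", [])
    if test_targets:
        subprocess.run([sys.executable, "-m", "tox", "-e", ",".join(test_targets)], check=True)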

Hence why I don’t like it that much. If the user sets up a testing (or documentation) tool here, it can easily fail downstream or on another machine, because you never addressed all the other factors at play here.

I think it’s similarly important that whatever we come up with can live together with task runners and not cannibalize them. It would be a bad place to end up in where some of your test setup/teardown logic is in the tasks section and the rest is in nox/tox configuration files (ini, toml or Python file).

1 Like

Sorry for the delay!

IMO this would indeed be useful, but not that much, and I am not convinced it is worth the trouble for everyone.

I think distros like Fedora would benefit the most, so please take their feedback into account.

I don’t think anyone is finding it hard to understand that there are differing opinions. Personally, my questions are around understanding and not stating that you or anybody else is doing anything wrong or poorly.

But we are an association of people who exist to help standardize stuff to make it easier for things to work together. Now most of our standards are optional, and I don’t view this entire topic as any different, so no one is being told they have to do something. But I think we are discussing whether there is a pattern here of people wanting a way to write down how to run their test suite, maybe build their documentation. And if so can we agree on something that covers the 80% case for those folks where it makes sense?

Yes, I think this covers well where I’m coming from. There are plenty of tools that potentially want a way to execute your test suite directly and know the results without all the nice extras that nox/tox provide.

To be even more concrete, VS Code needs access to the command used to run pytest. Why? Because we need to have pytest tell us what tests there are to populate the test explorer. Right now we either have to hope the command is nothing more than pytest or that folks fill in VS Code-specific settings to tell us what flags to pass. We could try to read your tox.ini file, or your noxfile.py file, or some other bespoke way of specifying your tests, but not everyone uses those tools (goes back to Paul’s point about not requiring folks to follow a specific workflow). But if we can give a carrot for getting help to Linux distros like Fedora (which also help test CPython prior to every release), potentially make a single command specification work across tools, etc., then I think it’s worth having this discussion.
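
As a purely illustrative sketch (not how VS Code actually does it), a tool that knows the project uses pytest could ask pytest itself for the list of tests:

    import subprocess
    import sys

    def discover_tests(project_root: str) -> list[str]:
        """Return the pytest node IDs discovered under project_root."""
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "--collect-only", "-q"],
            cwd=project_root,
            capture_output=True,
            text=True,
        )
        # With -q, pytest prints one node ID per line followed by a summary;
        # filtering on "::" is a crude way to keep only the node IDs.
        return [line for line in result.stdout.splitlines() if "::" in line]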

From a Fedora perspective, you’re right. But from a VS Code perspective, where I may want to help you install your dev dependencies, it isn’t reasonable to expect core metadata to be available via PKG-INFO or METADATA.

1 Like

I completely agree with Bernát, perhaps in part due to both of us maintaining such a tool.

I think what most posters in this conversation are missing is that tox, hatch, nox, etc. should not be thought of as task runners but rather as environment managers, which is totally different and far more complex.

Through that lens, what seems to be happening here is distributions like Fedora & Conda want a universal way to map the config of such managers (since most projects use one) to their own build system’s format.

As such, I’m quite against standardization on this one. Perhaps tox and the like could offer a command that outputs the JSON config of the default or base environment that distributions could consume and translate to their liking.

Right, they both create environments (environment manager) and run user-configured tasks within those environments (task runner). Your comments raise the important point, which mine did not make clear, that it is primarily the former (not the latter) that makes them conceptually distinct from test runners: they exist at a higher layer of the “stack” and behave fundamentally differently.

And indeed, it seems to be this environment management functionality that is the main potential pitfall for the primary consumers of the metadata proposed here (distros, packaging ecosystems and other downstreams who want to test the package in their respective real-world environments, not in the isolated one produced by the upstream’s particular tooling), unless there is a standardized way for consumers to determine that a test command will trigger it and to signal that it should be bypassed.

I may certainly be wrong, but that’s not the impression I got from what downstream people and others have shared here and in many previous discussions, and that’s not how I would describe my needs in my (limited) role as a Conda-Forge package maintainer.

The primary thing package consumers seem to want is a standardized way of being able to programmatically extract the dependencies of and invoke the project’s test suite (and, to a somewhat lesser extent, build the project’s docs), as a modern, tool-independent replacement for the deprecated and to-be-removed test and build_sphinx/upload_docs commands, bespoke hacks or manual guesswork (see the comments on those issues for an example of some of those requests). Any further environment setup is the responsibility of the consuming tooling.

While, given the challenges, it’s possible that it may simply not be practical to standardize this in a way sufficiently useful to enough of the downstream ecosystem, for enough of the upstream ecosystem, to justify the effort and complexity, I don’t think we should flatly reject exploring possible approaches before having fully considered the problem space and potential solutions to the issues raised with the current proposals.

Could you share a source on that? As I cited above, per the official 2020 Python Developer survey, only 7% of Python developers used Tox in any of their projects, whereas nearly 70% used a testing framework of some kind.

My point is that we shouldn’t require upstream projects to adopt a whole environment management and task running tool just to be able to expose their test dependencies and invocation in a standardized, programmatically accessible way, or adoption is likely to be very limited, and mostly concentrated in the projects that are already (per @encukou) the least effort for downstreams to test.

This sounds to me like you want to standardize the test runners’ interface, not define generic task runners. How do you get from how to run the test suite to what tests would be run? E.g.:

    pytest \
      --cov "{envsitepackagesdir}/tox" \
      --cov-config "{toxinidir}/tox.ini" \
      --junitxml {toxworkdir}/junit.{envname}.xml \
      {posargs:.}

For pytest you can hardcode some flags in there, but what if the user uses another test runner? Or, more importantly, how do you know which flags are needed for running the test suite (e.g. coverage flags) versus which ones influence test discovery (which I feel is what you really care about)?

I think for a task runner it’s critical to define not just how to run the task but what environment it should be run in. tox, for example, might use this information to create a runtime environment and run the task in it, while downstream build systems (like Fedora/Debian) might use it to make sure their current environment setup satisfies it and fail hard otherwise. See for example this feature addition by the Fedora team to achieve this: Ability to disable provisioning · Issue #1921 · tox-dev/tox · GitHub
I don’t think downstream people don’t care about the environment the task needs to run in; it’s just that they’d use it as a check rather than as a setup step.

You’re comparing apples to oranges here. The survey wasn’t conducted among maintainers of projects that get repackaged downstream. I’m pretty sure that for projects that end up being repackaged by various distributions, the number is more like 70% using tox, nox, hatch or escons; and 95% having test suites. E.g. the data science community (included in that survey) tends not to write tests due to the explorative nature of their work.

We can have a default task runner that basically implements your built-in assumptions about how the environment is set up (pip install the project and pip install requirements.txt if present, run the task with the cwd set to the project root and inherit all env-vars from the host), but I truly believe that defining how to call targets without specifying what environment to run them in will have very limited benefit.
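
A rough sketch of that assumed default behaviour (nothing here is standardized; the helper below is only illustrative):

    import os
    import subprocess
    import sys

    def run_default_task(project_root: str, command: list[str]) -> int:
        """Install the project (plus requirements.txt if present) and run the task."""
        pip = [sys.executable, "-m", "pip", "install"]
        subprocess.run([*pip, project_root], check=True)
        requirements = os.path.join(project_root, "requirements.txt")
        if os.path.exists(requirements):
            subprocess.run([*pip, "-r", requirements], check=True)
        # cwd set to the project root, env-vars inherited from the host.
        return subprocess.run(command, cwd=project_root).returncode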

1 Like

What you actually need, surely, is a way to ask the project “what tests are there?” After all, the project might not even be using pytest.

I’m not trying to be difficult, just trying to pin down the actual requirements (something I’ve been trying to get clarity on for ages here). It sounds like your real requirement is “have a way to get a machine readable list of tests from the project” rather than “know how to run the tests (without any requirements on what, if any, output is produced)”. This may well be different from the requirements redistributors have.

We have a lot of this sort of problem with pip as a build frontend, where the PEP 517 interface is fairly tightly specified, but doesn’t allow any sort of introspection of backend output for error or progress reporting, for instance. It sounds like it would be good not to repeat that mistake here.

2 Likes

I asked that already once and was swiftly shot down, so I’m explicitly not asking that.

Right now we run pytest for you, so we can inject the appropriate flags for our discovery (which will be a pytest plug-in when we rewrite that bit of code). See Testing Python in Visual Studio Code for the configuration, but we have you specify what to add on after pytest, not the full command to run.

Then VS Code won’t support it. Not much I can do about that w/o an API and, as I said, I got shot down for that idea already. Plus, let’s be honest, I can support the vast majority of people bothering with testing by setting up pytest, and a bit more for unittest. But basically it’s my problem to worry about that detail.

Because most people don’t set up their tests to also run coverage with us. My bet is most people make coverage a separate task, for faster test execution in the REPL developer loop they are typically in, rather than something they run quickly and frequently.

As mentioned above, pytest didn’t like that idea.

To be frank, there are not enough folks who fall outside of pytest and unittest for me to worry about that case by trying harder to get a common test API. Only so much time in the day. :wink:

I would say that’s ideal, but not a requirement because …

… I can at least work with this scenario. Pragmatically, we could check the command for pytest and then do the right thing in our case (same goes for unittest).

Considering pytest supports unittest, why not always just try assuming pytest and bail out otherwise? :thinking: If those two are the main types you want to support and nothing else.

“Almost all unittest features are supported”. The only limitation I’ve run into in practice was subtests. unittest.TestCase Support — pytest documentation