Conda has a nice declarative way to handle testing. We could encode essentially the same settings into pyproject.toml, with the limitation that $PYTHON is the only available executable other than console scripts that are installed as part of the test requirements.
Not necessarily. Assuming we can generalise those categories, we could just end up with a variable (e.g. $CONTEXT) that can be used as part of the command. So your command is now tox -e $CONTEXT which becomes tox -e test_ci or tox -e test_dist depending on what the tool chooses to pass in.
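A minimal Python sketch of that substitution step, assuming the calling tool fills in a placeholder named $CONTEXT. The placeholder name and the values test_ci/test_dist are only the examples from this post, not an agreed convention, and expand_command is a hypothetical helper. string.Template.safe_substitute leaves any placeholder it does not know about, such as $PYTHON, untouched:

```python
from string import Template


def expand_command(command: list[str], context: str) -> list[str]:
    """Fill the hypothetical $CONTEXT placeholder in every argument.

    safe_substitute() leaves unknown placeholders such as $PYTHON as-is,
    so a later step can still resolve them.
    """
    return [Template(arg).safe_substitute(CONTEXT=context) for arg in command]


declared = ["tox", "-e", "$CONTEXT"]  # what the project would declare
print(expand_command(declared, "test_ci"))    # ['tox', '-e', 'test_ci']
print(expand_command(declared, "test_dist"))  # ['tox', '-e', 'test_dist']
```

The project declares the command once; only the calling tool decides which context string to pass in.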
The main thing this command enables is that the person trying to invoke tests doesn't have to figure out whether the project uses (e.g.) tox or nox. It in no way replaces what those tools do, and doesn't even prevent project developers from having to know about them and how to invoke them directly.
Since I'm doing this exact task today, here's a great example of a repository that would benefit from having somewhere to define the test command and dependencies. It's not difficult to figure it out, but it's also far from obvious.
That would be fine, wouldn't it? The objective here is to have a simple mechanism to expose a way of running the tests, right? Or is the objective to take over the UI and work as a frontend for tools like tox/nox?
If we want flexibility without supplanting existing projects, an alternative could also be inspired by PEP 517:
I guess the difficulty, then, is in generalizing the categories (task names), and making tox/nox (or at least their best practices) respect the chosen names.
Yep, that's what I think we would end up working towards.
For me, it's the "expose a way of running the tests".
If you think about what it takes to run a test suite or build your docs, it's basically:
1. Get the files (from an sdist, VCS, etc.)
2. Install the dependencies (either via an extra or hard-coded in nox or tox)
3. Run the tests (which is currently hard-coded in e.g. nox, tox, or GitHub Actions)
4. Repeat for the various environments available (which is hard-coded in e.g. nox, tox, or GitHub Actions)
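To make the four steps concrete, here is a dry-run Python sketch that only builds the command lines a tool might run; it executes nothing. plan_test_run and the .venv-<interpreter> directory naming are hypothetical, and the bin/python path assumes a POSIX virtual environment layout (Windows uses Scripts\python.exe):

```python
from pathlib import Path


def plan_test_run(source, extras, command, interpreters):
    """Return (not run) the commands for steps 2-4.

    Step 1 is assumed done: `source` is already an unpacked sdist or checkout.
    """
    target = f"{source}[{','.join(extras)}]" if extras else source
    steps = []
    for python in interpreters:  # step 4: repeat per environment
        env_dir = f".venv-{Path(python).name}"
        env_python = str(Path(env_dir) / "bin" / "python")  # POSIX layout
        steps.append([python, "-m", "venv", env_dir])
        steps.append([env_python, "-m", "pip", "install", target])  # step 2
        # step 3: the part that is currently hard-coded in nox/tox/CI config
        steps.append([env_python if arg == "$PYTHON" else arg for arg in command])
    return steps


for step in plan_test_run(".", ["test"], ["$PYTHON", "-m", "pytest"], ["python3.12"]):
    print(" ".join(step))
```

Only the command passed as the third argument is project-specific here; everything else could come from generic tooling.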
What we are talking about is step 3, and a way to specify it in pyproject.toml so that the command you need to run for your tests isn't buried/hard-coded in a task runner like it is now.
BTW, step 4 could be automated thanks to requires-python in pyproject.toml and appropriate code to discover Python installs (which is doable as a library).
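A rough Python sketch of that discovery step, assuming interpreters are on PATH under POSIX-style names like python3.12 (on Windows the py launcher is the usual route instead). requires_python_allows and discover_interpreters are hypothetical helpers; the former is a deliberately simplified stand-in that only understands comma-separated clauses with >=, <=, ==, !=, > and < against plain X.Y versions. A real tool would use packaging.specifiers.SpecifierSet instead:

```python
import operator
import shutil

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
              ("!=", operator.ne), (">", operator.gt), ("<", operator.lt)]


def _version_tuple(text):
    return tuple(int(part) for part in text.strip().split("."))


def requires_python_allows(version, requires_python):
    """Simplified check of an 'X.Y' version against e.g. '>=3.8,<3.13'."""
    candidate = _version_tuple(version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = _version_tuple(clause[len(symbol):])
                # Compare only as many components as the bound specifies.
                if not compare(candidate[:len(bound)], bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def discover_interpreters(requires_python, minors=range(8, 14)):
    """Return paths of python3.N executables on PATH that fit the constraint."""
    found = []
    for minor in minors:
        version = f"3.{minor}"
        if requires_python_allows(version, requires_python):
            path = shutil.which(f"python{version}")
            if path is not None:
                found.append(path)
    return found


print(discover_interpreters(">=3.9"))  # result depends on what is installed
```

Versions are compared as integer tuples rather than strings, so 3.10 correctly sorts after 3.9.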
Maybe not. Depending on how we specify things, this could all be reasonably self-discoverable. Let's say we go with the simple solution of a [tasks] section in pyproject.toml that lets you specify a command under any name:
```toml
[tasks]
test = ["$PYTHON", "-m", "pytest"]  # or ["$PYTHON", "-m", "nox", "-s", "test"]
```
We could then say that if there's a matching extra in pyproject.toml for test, the expectation is that the extra is installed before running the task. And with requires-python you can even know which environments are expected to be tested against.
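Under those assumptions, resolving a task could look roughly like the Python sketch below. The [tasks] table is the hypothetical one from the snippet above, resolve_task is an invented helper, and the input is a dict such as tomllib would produce from pyproject.toml. It accepts either test or tests, substitutes the running interpreter for $PYTHON, and reports the matching extra (if any) to install first:

```python
import sys


def resolve_task(pyproject, names=("test", "tests")):
    """Return (extras_to_install, argv) for the first task name found.

    Assumes a hypothetical top-level [tasks] table mapping names to argv lists.
    """
    tasks = pyproject.get("tasks", {})
    name = next((n for n in names if n in tasks), None)
    if name is None:
        raise KeyError(f"none of {names} found in [tasks]")
    extras = pyproject.get("project", {}).get("optional-dependencies", {})
    to_install = [name] if name in extras else []
    argv = [sys.executable if arg == "$PYTHON" else arg for arg in tasks[name]]
    return to_install, argv


example = {
    "project": {"name": "example", "optional-dependencies": {"test": ["pytest"]}},
    "tasks": {"test": ["$PYTHON", "-m", "pytest"]},
}
extras, argv = resolve_task(example)
print(extras)    # ['test']
print(argv[1:])  # ['-m', 'pytest']
```

A tool would install the project with those extras (e.g. .[test]) into the target environment before running argv there.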
As for names, does anyone use something other than test or tests? I don't think it's unreasonable for a tool to assume that people want to run those tests against all the versions of Python available that fit the requires-python constraint.
For nox this could all be done today since it's just code. I was actually planning on creating a library someday that effectively did all of this as an experiment, and then trying to get a GitHub Actions workflow to pick up on what's in pyproject.toml as much as possible. Basically I want to take DRY as far as possible when it comes to setting up testing both locally and in CI (the test matrix is probably the biggest hassle right now).
I believe @bernatgabor said Tox 4 is going to support a nox-style code declaration approach, so that might also work in this scenario in terms of bringing this in w/o massive work on the side of task runners.
That's the key for me. This pyproject.toml entry doesn't have to do much; it just has to bootstrap the process of running tests. Everything else, including test environment setup, test matrix, etc. can be handled by the test runner that this entry bootstraps into.
The approach of matching the task name with "extras" seems to work fine for the pytest use case, but how do we reconcile this approach with some existing patterns in the community for tox/nox?
For example, using these articles/projects as reference:
We can see that there is a popular pattern of using extras to centralize lists of dependencies but offloading the responsibility of installing them to the task runner, while the task runner itself is not listed as a dependency...
Well, sorta.
There are functional tests, code formatting, coverage, static typing checks, performance checks, and who knows what else you can call tests, and it would be nice if they were kept separate.
For me the relevant axis is whether a failure means a bug or not (e.g. machine too slow, or slightly different version of mypy/black).
If we agree test/tests is only for the "warranty void if broken" stuff, that would be great. But it's not an intuitive meaning for everyone.
I use "lint" frequently for running linters (not technically tests, I know). I don't actually use "test", as the base tox environment is called "py", so I use that to run tests: tox -e py.
As I've already said, I don't use extras, but rather requirements files to specify test dependencies (or I include them directly in the tox config).
From the Nixpkgs point of view, we really would like to have a test entry point. However, we do generally tend to disable "tests" that are not relevant for us, such as checking whether black formatting passes, and coverage. Therefore, I think we'd prefer an entry point that only runs actual tests.
But the question is whether the task runner is required in this potential future to run the tests, or just convenient? Using the RPM example, would Fedora need to use tox/nox as provided by a project to verify something worked, or could their own tooling run the test suite based on the information provided and skip using tox/nox? And if tox/nox is required to run the tests then I would argue a test extra leaving those out is an incomplete extra.
But do you need to be able to lint the code of an sdist? I get wanting all of that while developing, but do you need all of that to validate a build?
I totally expect folks will have lint rules, etc., but I just don't think in this discussion they are critical to try and potentially standardize if we are focusing on automatically running tests from an sdist. But if we do this right, then I assume the community will figure out its own best practices around this and the naming scheme will just fall out of it for e.g. linting when working from source.
Are you using setuptools for the code? If not, then in a world of pyproject.toml, is keeping dev-only requirements separate in a requirements file still what we think folks should do? For me, I always viewed the setup.py/requirements.txt dichotomy for library/app development as mostly a way to avoid conflating setuptools with your development workflow (or any flow where you didn't really need setuptools as a build tool). But now that we have pyproject.toml, you technically don't need to use your build tool to read your (optional) dependencies to install stuff. For me, that suggests our historically strict separation of dev and install dependencies into separate files isn't quite so important.
It might make sense to place the task runner in the "convenient" category...
However, if the user decides to use a task runner, it also makes sense to think about it as the only "official"/"supported" way of running the tests.
The redistributors are free/welcome to try running the tests in a different manner, but they cannot expect the authors to support this alternative method...
I can see a lot of sense in this reasoning, but maybe the mechanism of using the "extra" field is not the most appropriate way to express this kind of dependency? When you use tox/nox, you don't have it installed in the virtual environment accessible to the test code, right? The test code doesn't need the task runner to execute properly...
I don't know if I understand this comment...
How is extras_require in setup.cfg/setup.py different from optional-dependencies in pyproject.toml, and why would using setuptools or another backend make a difference in the decision to opt into requirements.txt for the development workflow?
Actually I do, that way I can control the version of the task runner. Plus it means I don't have to assume a global install is available.
One requires setuptools to read the extras, the other doesn't.
The question is why someone like Paul is using a requirements file for dev dependencies. Is it habit? Is it because we have historically said "libraries use setup.py, apps use requirements.txt" and people view dev dependencies more like an app thing, etc.? Would pip-tools still read from a requirements.in file if pyproject.toml existed first? I'm just suggesting we may want to start thinking about where we are storing things now that we have a tool-agnostic way to specify stuff (hence this whole discussion about having a way to say how to run tests in a tool-agnostic fashion).
Yes, but for reasons mostly unrelated to whether I want to put metadata into pyproject.toml.
I don't know about "we", but I certainly think so. For me, pyproject.toml is for defining the project and how to build it. It's not a general place to put all config related to anything to do with development. I know many people (and tools) disagree with me, but that's my view.
I don't really know what you mean by this. For me, setup.py or pyproject.toml is for defining runtime dependencies, as these are part of how the project is used. That's entirely different from requirements.txt, which is simply a way of listing a bunch of "stuff to be installed" for a given task. That task might be "run the tests", or "build the documentation", or something unusual like "build the demo" or "do a release". But what tasks make sense is entirely down to the individual project and its workflow. We can suggest best practices, or even write standards to make certain approaches preferred, but it's never going to be universal in the same way that building a project is (not unless we move towards some sort of "one tool to rule them all" approach like some other languages, but this has never been the Python philosophy in the past, and I don't see why it should be in the future).
That's exactly the point I'm trying to make. If someone comes to me and says "I tried to run the tests for your project and they failed", then I'd be unwilling to treat that as a bug report unless they followed the documented process for running the tests, just like I wouldn't accept a bug report when running on an unsupported platform. The user can do the investigation and demonstrate that there is a bug by providing a reproducer on a supported configuration, but it's on them to do that work, not on me as project maintainer.
On the other hand, I have global installs of tox and nox using pipx, which means I don't have to install anything in order to run the tasks for a particular project, so I don't need a virtual environment for the project for anything other than interactive experimentation. It's all about trade-offs, and that's something each project (or developer) should be able to choose for themselves.
Because it's not project metadata, and requirements files exactly capture what I want to express here, which is "a list of requirements for pip to install". I'll have as many or as few requirements files as I need for the project. Or I might not use a requirements file at all if it's not needed: tox allows me to put the test requirements in tox.ini, so that's what I do, because that way the requirements are with all the other information on how to run the tests, and I only have one file to manage.
I really don't see why it's so hard to understand that this might be a choice that some developers might want to make. It may not match what you prefer to do, but why should that matter? And before you ask, I understand that it means I can't (for example) just push a button in VS Code to set up a test environment and run tests[1], but I don't care about that. Maybe I will in future, and I'll change my mind (and modify my workflow), but for now I like running tox manually from the command line.
Why do we need consensus here at all? Why do we all have to use the same development workflow?
Maybe there was a misunderstanding about the expression "virtual environment accessible to the test code".
Let's say you install tox/nox in a virtual environment, e.g. /tmp/.venv.
When you run the tests, the code will not have access to any package installed inside /tmp/.venv.
The test code will have access to the packages installed in /<project-root>/.tox/<testenv-name>. You will not be able to import tox when writing a test case...
The direct test dependencies and the task runner in this case are two distinct types of dependencies, installed in two completely different virtual environments.
We can make an analogy with the build process:
A backend can be seen as a dependency of a project. The project will need the backend installed somewhere to be built. However, during runtime, the backend is irrelevant.
It is the same for the task runner and the tests. You need the task runner to orchestrate the tests, but you don't need it when the test code is running.
Since setuptools introduced setup.cfg, people didn't need anything fancier than ConfigParser to be able to read extras_require, but some people still decided to use requirements.txt. I have been using extras_require in setup.cfg to store test dependencies because I like the convenience of centralising them in a single file. But I agree with Paul that these dependencies are not really metadata, they don't allow my packages to expose extra features to my end users[1], so it is perfectly natural that some developers choose to not mix those two things.
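As an illustration of that point, the [options.extras_require] section of a setup.cfg can be read with nothing but configparser. This is a simplified sketch, not setuptools' own parser: extras_require is a hypothetical helper, the sample file is made up, and it assumes one requirement per line:

```python
import configparser

SAMPLE_CFG = """
[metadata]
name = example

[options.extras_require]
test =
    pytest>=7
    coverage
"""


def extras_require(text):
    """Return {extra: [requirement, ...]} from a setup.cfg string."""
    # interpolation=None so a literal '%' in a requirement is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("options.extras_require"):
        return {}
    section = parser["options.extras_require"]
    return {
        extra: [line.strip() for line in value.splitlines() if line.strip()]
        for extra, value in section.items()
    }


print(extras_require(SAMPLE_CFG))  # {'test': ['pytest>=7', 'coverage']}
```

The indented lines are continuation lines of the test key, which is why the value has to be split on newlines.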
Another use case is if you want to run the tests of known downstream libraries as part of your CI to make sure you aren't accidentally breaking compatibility.
This is a good point, and IMO it actually speaks to the same underlying distinction that @brettcannon alludes to above:
There are really two separate layers (i.e. concerns) here: the testing tool (and its invocation), which collects, runs and reports the results of the project's test suite, and the task runner, which is responsible for setting up environments and executing arbitrary project-defined tasks within them (tests, docs, linting, etc).
It would appear that simply providing a standard means of declaring the dependencies of, and invocation for, the project's testing tool (and possibly also the docs builder), which downstream tooling could be responsible for installing and calling, would be adequate to meet most of the immediate need here while being relatively straightforward to implement, either following the form of @steve.dower's suggestion or paring it down further (just defining the invocation and using standardized extras names to single-source the dependencies for each).
This more or less follows the model of PEPs 517 and 518 in defining the build entrypoint and the dependencies it needs (though in this case it is a single hook rather than several), and makes setting up and invoking the callee in a Python environment with the indicated dependencies the responsibility of the caller, with the callee being responsible for the rest.
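A minimal Python sketch of that division of labor, assuming a hypothetical [test-system] table shaped like [build-system] (a requires list plus a command). caller_steps is invented for illustration; it only builds command lines, and leaves creating the environment, and installing the project itself, to the caller:

```python
def caller_steps(test_system, env_python):
    """Commands a caller would run inside an environment it has already created.

    test_system is the parsed hypothetical [test-system] table; the caller owns
    setup and dependency installation, the declared command owns the rest.
    """
    steps = []
    requires = list(test_system.get("requires", []))
    if requires:
        steps.append([env_python, "-m", "pip", "install", *requires])
    steps.append([env_python if arg == "$PYTHON" else arg
                  for arg in test_system["command"]])
    return steps


table = {"requires": ["pytest>=7"], "command": ["$PYTHON", "-m", "pytest"]}
for step in caller_steps(table, "/tmp/env/bin/python"):
    print(step)
# ['/tmp/env/bin/python', '-m', 'pip', 'install', 'pytest>=7']
# ['/tmp/env/bin/python', '-m', 'pytest']
```

Nothing in such a table stops a project from naming a task runner instead, e.g. requires = ["tox"] and command = ["$PYTHON", "-m", "tox"]; the caller cannot tell the two apart from the table alone.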
However, without clearly defined requirements and guidance, I could foresee two distinctly different usage patterns which could significantly complicate things for callers. The most obvious approach would be for projects to specify their test tool and test-specific deps as test requirements, and the test command as the invocation. Meanwhile, I could see others specifying just their task runner as a dependency and its main/test entrypoint as the invocation, and deferring the actual test environment creation and setup to it, outside the direct control of the caller.
Both provide value, but they solve somewhat different problems and target different layers in the stack, and I worry that unless we either define both or are very clear about which one is expected, the result for callers will be worse than either, since the caller will have no reliable way of anticipating which they're calling, much less requesting one or the other. I would advise either providing both or making it very clear it is intended for the former, since that seems to be what downstreams want here. But I'd like to hear more from repackagers about that.
There's another alternative: allow defining arbitrary tasks in a standardized format in pyproject.toml, with a handful of standardized names (test, docs, lint, etc.), each with its own dependencies, invocation and perhaps other configuration, effectively a generalization of task runner configuration formats, much as PEP 621 did for metadata (which would be nice in theory, but perhaps too limited in practice given the diversity of tools and approaches).
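For comparison, a Python sketch of what validating such a general task table might involve, assuming a hypothetical [tasks.<name>] layout where each task has a required cmd list and an optional requires list. load_tasks and missing_standard_tasks are invented helpers, and the standardized names are just the examples mentioned here (test, docs, lint), not an agreed set:

```python
STANDARD_NAMES = {"test", "docs", "lint"}  # examples from the discussion only


def load_tasks(table):
    """Normalize a parsed hypothetical [tasks] table into {name: {requires, cmd}}."""
    tasks = {}
    for name, spec in table.items():
        if "cmd" not in spec:
            raise ValueError(f"task {name!r} has no 'cmd'")
        tasks[name] = {
            "requires": list(spec.get("requires", [])),
            "cmd": list(spec["cmd"]),
        }
    return tasks


def missing_standard_tasks(tasks):
    """Standardized names that a project has not defined."""
    return sorted(STANDARD_NAMES - set(tasks))


table = {
    "test": {"requires": ["pytest"], "cmd": ["$PYTHON", "-m", "pytest"]},
    "docs": {"requires": ["sphinx"], "cmd": ["$PYTHON", "-m", "sphinx", "docs", "build"]},
}
print(missing_standard_tasks(load_tasks(table)))  # ['lint']
```

Even this small schema already has to decide what is required versus optional per task, which hints at how quickly the general version could grow.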
There appears to be some interest in standardizing this sort of thing, here and certainly elsewhere, as a modern, standardized replacement for the various built-in and custom distutils/setuptools commands, but the scope is much more expansive and I worry it would get bogged down in complexity. Still, it might be something to at least keep in mind when designing this one, in terms of leaving the door open to a future proposal.
This is a good analogy, though arguably a task runner is in some ways more analogous to a frontend, as it orchestrates the environment creation for the backend, the project's actual testing system; and the two are only loosely coupled: the tests, like a build system, can be invoked with a modified frontend configuration, a different frontend or even directly, depending on the needs of the caller.
For example, upstream projects may use tox, nox, etc. during development, whereas repackagers may use tox with a custom plugin to skip dep management, their own env setup tooling or simply invoke the test tool directly, just like upstream developers typically use pip for installation but downstream distros really need something simpler and more customizable like installer.
Yes, but I think @brettcannon's point is that it is an unstandardized, bespoke format tied to one tool, which downstream tools cannot rely on being present or canonical, unlike pyproject.toml.
Just like any other packaging standard, nothing that is decided, specified and implemented as a result of this discussion would require you as a package author to adopt a certain method of specifying your test, etc. dependencies or invocation. Likewise, no one is required to fill out the various core metadata fields, specify declarative metadata in the PEP 621 [project] table, declare their build backend and dependencies in a PEP 518 build-system table, etc., or even use a task runner framework, write and run a test suite, or document how to build, install and run the code.
However, these things all help other people, tools and ecosystems use, distribute and contribute back to a given project, ultimately benefiting everybody, and a standard, interoperable, tool-independent way of specifying test, docs, etc. invocation and dependencies would do the same. Of course, there is a cost-benefit trade-off, as it requires some amount of effort on the part of the package author, but I don't see how at least offering and encouraging a standardized mechanism for this is such a bad thing.