Idea: Introduce/standardize project development scripts

I’m not really sure this is of any real benefit. IDEs are already moving in the opposite direction, with devcontainers that are isolated and preconfigured, and I believe you yourself said earlier that you thought automatically running scripts was an unnecessary security risk.

There are also a lot of other places where this already has some level of ad-hoc standardization (in the sense that the existing solutions are so common that people know how they work and can jump right in, not that there’s a rigid specification language), from pre-commit hooks to makefiles/justfiles to runners and entry points defined by workflow tools like pdm. This proposal wants to do something new instead, rather than meet users where they already have solutions.

Partly that’s because it’s the approach that works, since we don’t have standardised development tools, but it’s also solving a slightly different part of the problem (acquisition). Once you’re in one of these containers, you still need to know how to launch the task.

The closest we have to standardisation is a GitHub Actions workflow. That is, you read the YAML file to see how they do it there, and then do the same thing locally.

VS Code configuration files (which can be checked into a repo) offer the kind of standardisation we’re talking about, though primarily through extensions (TypeScript, preinstalled) or shell commands (specified per-shell, IIRC). So I think you have to accept that IDEs like this direction, and perhaps the right place to standardise is between IDEs rather than within a single ecosystem.

It wouldn’t be automatic; it would run through a command explicitly triggered by the user. Those commands today typically run python -m unittest or python -m pytest unless you configure them, and anyone who writes their tests outside of these frameworks is likely going to force users to reconfigure every time.

Okay, that’s a fair set of points when framed that way, though I still think this isn’t quite the right way to go about standardizing this as presented, and that standardizing somewhere else, at a level above the Python ecosystem, is actually correct here if IDEs need this. (For the record, I don’t know what the right way is here. I’ve never really had an issue with just reading a contributing.rst file and following project-specific instructions on the things they want for formatting, tests, and docs, so my perspective is that this is a non-issue or a small issue.)

I don’t see the kind of language here that allows requiring certain tools that are expected to exist and then just invoking those tools, and I don’t really think we want pyproject.toml to expand into knowing how to specify a requirement on non-Python tooling right now (for instance, specifying a set of compilers needed for native dependencies).

I’m not sure what the right solution is here, but this feels too limiting compared to what exists currently. If we go this route, I think it’s going to result in the wrong kind of pressure to adopt, or it will just be ignored by those who see it as too limiting. It may even lead to people doing things we don’t want here just to shoehorn their workflow into “well, it works how users wanted to use it”, like downloading a compiler in a Python script just so that it runs here.

Some of the proposals (including mine) suggested installing a PyPI package and invoking that. That tool can then do whatever it likes, such as checking for certain tools, potentially installing them before executing, or whatever else is needed (like a build backend for wheels).

I agree it would be poor standardisation to somehow require arbitrary tools already be present on the user’s machine.

The only two problems I’ve had are (a) when no such file exists, and (b) when I’m trying to get through as many projects as possible as quickly as possible (e.g. when I was setting up test runs on ARM64 machines a couple of years ago).

Having to read and understand prose just isn’t as efficient when your goal is “run the basic build and tests”. python -m build is a huge step forward on the first part, and having something equivalent for arbitrary named tasks (with a subset of recommended standard names) would be equally useful.

2 Likes

Some of the proposals (including mine) suggested installing a PyPI package and invoking that. That tool can then do whatever it likes, such as checking for certain tools, potentially installing them before executing, or whatever else is needed (like a build backend for wheels).

Right, this is already quite doable today, and has been for about a decade, in projects that ship a https://pypi.org/p/bindep config:

Have your automation create a venv and pip install bindep into it if it isn’t already present, then invoke it (optionally specifying a profile name corresponding to the tests you plan to run). It either quietly succeeds, at which point you proceed to run your tests, or errors out with a list of the distro packages you’re missing. You can optionally feed the returned package list directly to your package manager as part of your automation; we do that extensively in CI jobs, for example.
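
In rough outline, that automation can be as small as this sketch (the POSIX venv layout and the “test” profile name here are assumptions for illustration):

    # Rough sketch of the flow described above; the POSIX venv layout and
    # the "test" profile name are assumptions for illustration.
    import subprocess
    import sys
    import venv
    from pathlib import Path

    env = Path(".bindep-venv")
    if not env.exists():
        venv.create(env, with_pip=True)
        subprocess.run([str(env / "bin" / "pip"), "install", "bindep"], check=True)

    # "bindep -b <profile>" prints missing distro packages (one per line)
    # and exits nonzero when anything is missing.
    result = subprocess.run([str(env / "bin" / "bindep"), "-b", "test"],
                            capture_output=True, text=True)
    if result.returncode:
        print("Missing distro packages:", " ".join(result.stdout.split()))
        sys.exit(1)
    # ...otherwise proceed to run the tests.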

There are probably other similar tools available these days, but my colleagues and I couldn’t find a canned solution at the time which was why we created our own.

I can understand the motivation here, but I’m not sure it would work out in practice. And what bothers me is precisely this: in the enthusiasm for the idea, we don’t look closely enough at the pitfalls, and we standardise something that in the end fails to deliver.

Let’s take it as given for the moment that we do standardise something roughly like the proposal @noirbizarre made (just for the purposes of having something concrete). The first thing we would need is for everyone to start using it. Otherwise, your workflow just becomes “for each project, check if it uses standard task scripts; if so, run a command to execute the standard task; otherwise, check what they do use and run that”. Which is basically “check what they use and run that”, just with “standard task scripts” as an extra option.

So let’s assume the new standard is popular enough that everyone uses it. That’s a big expectation (why would stable projects in low-maintenance mode bother?) but we can come back to that.

Now, we have the problem that there will be people who take the simplest possible approach and just set up a “test” task that does “nox -s test” or “tox -e test”. So you still need nox or tox installed. And worse, people will quite likely assume that, because they are using a standard, all they have to do is document “to run the tests, just use the standard test task”. And now you have to work out the dependency on nox for yourself. Which isn’t hard, but wasn’t the point to have to do less work than “read the docs, install what they say, and run the command they tell you to run”?

In addition, the project’s requirements for running tests have just increased, from “install nox and run nox -s test” to “install nox and pytask, and run pytask test”. Maybe pytask will be uv, or some other existing tool. But it’s an additional dependency, unless every test runner becomes capable of consuming the new task definitions (and I doubt that, for example, pytest will…)
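
To make the indirection concrete, a minimal “pytask”-style runner might amount to little more than this sketch (the [tool.tasks] table name is invented purely for illustration, not part of any actual proposal):

    # Toy sketch of a hypothetical task runner; the [tool.tasks] table
    # is invented for illustration and not from any concrete proposal.
    import subprocess
    import sys
    import tomllib  # Python 3.11+

    with open("pyproject.toml", "rb") as f:
        tasks = tomllib.load(f)["tool"]["tasks"]

    # e.g. tasks == {"test": "nox -s test"}: the "standard" task still
    # needs nox installed before it can do anything.
    subprocess.run(tasks[sys.argv[1]], shell=True, check=True)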

Having said all of the above, I do actually like the idea of having some form of standardised task definition[1]. I just don’t think we should claim benefits that we can only achieve if the standard is universally adopted.

My fear is that we create a standard which generates external pressure on project maintainers to conform, without offering sufficient benefits to those project maintainers. Similar to the pressure we see to “add typing” without considering whether the project benefits from adding (and maintaining) type annotations.


  1. although I wish we could have it without the requirement that anything that wants to use it must be structured as a Python packaging project with a pyproject.toml ↩︎

2 Likes

You could have chosen this example (which is more along the lines of what I proposed, but I found this one before finding my own):

The requires metadata specifies what to install. Obviously things that aren’t installable from PyPI are just as unreachable as they’ve always been, but nox and tox are perfect examples of libraries that I would expect to be used here, probably (hopefully) with a Python interface rather than a shell interface.

So the requirements go back to “run pytask test”. Is this not okay?

Yes, this is somewhat unavoidable, and it’s why I allow for the idea of a task that takes its arguments (specified in the tasks file) and passes them to subprocess.run. You can still list additional dependencies to pre-install, and if the tool doesn’t have a native interface then you can still call it via the shell.
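
For illustration, the split being described might look roughly like this (the task-dict shape here is hypothetical, not from any concrete proposal):

    import importlib
    import subprocess
    import sys

    def run_task(task):
        """Hypothetical task shape: "entry" names a native Python
        interface (module:callable); "commands" is the shell fallback."""
        if "entry" in task:  # assumes any task["requires"] deps were installed first
            mod, _, func = task["entry"].partition(":")
            getattr(importlib.import_module(mod), func)()
        else:
            for cmd in task.get("commands", []):
                subprocess.run(cmd, check=True)

    # A task whose listed arguments are simply handed to subprocess.run:
    run_task({"commands": [[sys.executable, "-m", "pytest"]]})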

There’s also really no way around this anyway. Projects already face pressure to support certain non-standard tools (have you heard about pre-commit? :upside_down_face: ), and changing a development process is never a benefit to the people who designed the existing process. Those who want to ignore it will continue to ignore it, no matter what tool or process comes up. That isn’t our fault, and we don’t have to refuse to design something else just because some people won’t use it.

If anything, we owe the community a replacement for python setup.py test, which (while I believe it was on the way out already) was completely removed by moving everyone away from setup.py.

1 Like

OK, so task runners have to do environment management as well. Which (as we found with PEP 517) is a lot messier than it seems at first. It’s probably OK (I can’t see an immediate way to achieve the forkbomb that you can create with PEP 517) but it’s non-trivial. And potentially slow (uv notwithstanding…).

But these are details, which can be deferred until we have an actual PEP.

I think “owe” is a bit strong. After all, setup.py test was far from universally adopted. And tox (or nox) is a perfectly good replacement; certainly they aren’t standardised, but neither was setup.py test[1].


  1. it’s not like it was supported by distutils, which was at least in the stdlib ↩︎

1 Like

Please, can we stop suggesting users install yet another mysterious tool on their machine just to get simple things done?

The fact that almost nobody seems to use bindep in Python communities (apart perhaps from OpenStack, which is a very peculiar corporate-dominated ecosystem), despite it being “about a decade” old, should perhaps tell you how reasonable your suggestion is.

The amount of pushback against the idea of standardizing something as simple and common as “please run the test suite for this project” is just flabbergasting. In the meantime, Rust users can happily run “cargo test” on virtually any project without having to check per-project docs and paste a bunch of obscure commands.

3 Likes

But in that case, the tests are written in Rust and run by cargo. We don’t have a “one tool” ecosystem, and people apparently aren’t content to leave it as “pdm run …”, “hatch run …”, etc., based on each project’s tooling of choice, which would be the obvious analog.

1 Like

Yeah, I think this is the upstream issue: there’s no cargo for Python, in the sense of a unified tool that sets the de facto standard simply by being official. Standardizing behavior across a bunch of disparate projects is way harder than the core team just deciding on how things should work, and that standardization isn’t obviously better in the way having one tool is better.

I believe “cargo for Python” is the stated goal of Astral, but that’s a 3rd-party tool and I doubt it will become the “blessed” official Python workflow manager.

The amount of pushback against the idea of standardizing something as simple and common as “please run the test suite for this project” is just flabbergasting. In the meantime, Rust users can happily run “cargo test” on virtually any project without having to check per-project docs and paste a bunch of obscure commands.

Sorry if I touched a nerve, and I’m not pushing back against the suggestion at all (it seems like a pretty reasonable desire on the face of it, even if implementation details may get hairy).

But is it really the case that Rust users can happily run “cargo test” on virtually any project, without having to install things like the non-Rust libraries those projects might need to integrate with?

How do I know what I should replace “…” with?

It’s certainly not an “obvious analog”, because the role of the various tools and frontends in Python-land has become extremely murky and confusing. Should I use pip? hatch? Something else? It might depend on whether I’m a developer of the project or a user of the project, but running the tests should be the same command in both cases. So which one should it be?

If there are non-Rust libraries involved without any existing Rust bindings, then perhaps; I’m honestly not sure. But for most projects, which don’t have such dependencies, that should not be an issue.

(also, it’s a bit weird to bring up non-Python dependencies, which is a use case the official Python packaging ecosystem has always refused to address, and which led to the creation of conda and the related software distributions Anaconda and conda-forge)

1 Like

I don’t dispute that, as a result, the end command you as a user or developer might run isn’t obvious, but this is the cost of leaving these things out of core tooling that can be assumed to be available. A Rust project is obviously using cargo, so you use cargo. For a Python project, you check what they are using and use that.

There are so many ways of doing things, and not all of them in the Python ecosystem are even going to be supportable if the assumption is “available on PyPI”. Build systems aren’t restricted this way, and we have wheels so that users don’t need to compile things themselves and go get dev/build-time dependencies, but dev scripts can, and sometimes do, need to interact with things that aren’t appropriate to distribute on PyPI. I have several projects at work that build Python wheels where the build system is just Zig.

If there are non-Rust libraries involved without any existing Rust bindings, then perhaps; I’m honestly not sure. But for most projects, which don’t have such dependencies, that should not be an issue.

(also, it’s a bit weird to bring up non-Python dependencies, which is a use case the official Python packaging ecosystem has always refused to address, and which led to the creation of conda and the related software distributions Anaconda and conda-forge)

I wasn’t the one bringing it up; I was trying to provide a current example of an existing solution within the Python ecosystem addressing Michael H’s and Steve Dower’s comments about identifying project-specific tools and (presumably non-Python) dependencies that need to be installed first in order to run tests.

Steve Dower specifically mentioned “…installing a PyPI package and invoking that. That tool can then do whatever it likes, such as checking for certain tools, potentially installing them before executing…” and I just wanted to say that exactly this does already happen in some projects.

You seem to have some negative personal opinion about the Python projects where it’s happening (or of me personally? I can’t tell which, but I truly hope it’s not the latter), but that doesn’t change the fact that this exact sort of tooling has been run constantly by hundreds of projects for many years already, indicating that the need for such solutions is real, and making me suspect it’s not the only instance of a community trying to solve that exact challenge.

I don’t think I’ve ever worked on a project which didn’t fall into one of two categories:

  • It uses the almost ubiquitous test command+setup:
     pip install -r tests/requirements.txt  # (or possibly .[test])
     pytest
    
  • The project doesn’t have one single blindly-runnable test command. I’ve had C binaries that needed recompiling with special debug-auditing flags enabled for some tests to be runnable, but disabled for benchmarks; Qt-variant-agnostic code that requires specifying which Qt variant to use; fuzz testing which you don’t run every time due to its (possibly infinite) duration and limited scope; and tests that are slow but modularise well, so that selecting only the tests corresponding to changed source code is the expected (and the only sanity-preserving) developer flow.

In the first case, changing that to pip install pytask; pytask test doesn’t do me any favours, but it does reduce transparency and customizability (as I’ve alluded to before: given the choice between hunting down the virtual environment that pytask creates for itself, in order to install pbdp or a local copy of some dependency into it, or just ignoring pytask, I’d ignore the task runner).

In the second case, pytask would just be a misleading distraction. I’d probably need to create a commands = ["python -c 'raise SystemExit(\"No! Just read the darn README!\")'"] task to stop people from telling me my project is broken.

1 Like

Obviously this depends on the context, but assuming this is a library, maybe there should be one “standard” test command for the project. It shouldn’t cover everything that is e.g. tested in CI, but providing a command to run standard tests against the installed library is a reasonable feature for users/downstream. I would want e.g. distros to be able to run the tests with their patched version of Python etc. Many users might want to do the equivalent of make test, especially if building from source. If a user reports a crash etc., then it is nice if there is an easy way for them to run the test suite and report the results.
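
One existing pattern along these lines (a sketch; “mypackage” is a placeholder, and some scientific Python packages have exposed helpers like this) is a test() function that wraps pytest so the installed library’s tests can be run from anywhere:

    # Sketch of a test() helper for running the *installed* package's
    # tests; "mypackage" is a placeholder name.
    def test(extra_args=None):
        import pytest  # deferred import: only needed when tests are run
        args = ["--pyargs", "mypackage"]  # collect tests from the installed package
        if extra_args:
            args.extend(extra_args)
        return pytest.main(args)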

2 Likes

You might be surprised how many people use unittest if you think pytest is “almost ubiquitous”.

I also don’t think the distros care about which tests failed as part of this, as they can dig into that later; they just need to know whether there’s an issue to begin with. It’s not like Nox or Tox know why anything failed, just that something did.

I’ll also say that it seems to me we’re looping back to focusing only on the test case and not so much on general task running.

2 Likes

Pytest supports running unittest suites too. A test suite using purely unittest will usually work fine with pytest as the runner.
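
For example, a plain unittest suite like this one is collected and run by pytest unchanged:

    # Runs under both "python -m unittest" and "python -m pytest":
    import unittest

    class TestAddition(unittest.TestCase):
        def test_add(self):
            self.assertEqual(1 + 1, 2)

    if __name__ == "__main__":
        unittest.main()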

My experience has been that, for mostly pure-Python projects, it’s usually either pytest- or unittest-based, and pytest works either way. If it’s not pure Python, then things may get messier and you start to see other tooling appear.