Idea: Introduce/standardize project development scripts

Not at all. I think none of the pyproject.toml-related PEPs had any impact (or only a minor one) on the Python interpreter itself. They're about bringing a shared convention/standard to the tooling ecosystem around it.

This doesn't force you into anything; it's about the parts of the ecosystem that would like such a standard being able to leverage it.

I've worked for many companies over the last few years; only once was a devcontainer in use, and that was for compatibility with a library that wasn't installable on modern OSes. The general case I saw was:

  • n projects, each with different tooling and a different way of running tasks
  • a .vscode/tasks.json provided for VS Code users only; PyCharm, Neovim, Emacs, … users, do what you can (thanks to the VS Code task plugins, which by the way perfectly illustrate the benefit of a standard)
  • sometimes a Makefile, other times a Justfile… but every time it was an issue for Windows users, or for users not allowed to install extra tooling on their machine (in the case of just)
  • projects migrating to a single tool, where task management in pyproject.toml was a requirement (for discoverability and to reduce context switching), ending with a migration to pdm or poetry+poe
  • at each newcomer onboarding, we had to spend time helping them just run those tasks (I know, a README should be enough, but not everybody updates it when things change; the same goes for CONTRIBUTING, they are not always up to date)

On the open-source side, contributing starts by reading CONTRIBUTING.{rst,md}; half of the time it's not up to date. When it is, we often fall back on the previous case: platform-specific tooling, mega-scripts that don't allow running just one task…

The point is not about requiring or expecting tools to exist; the point is about allowing tools that want to, to share a standard way of describing tasks. In the gist I shared, I compared 4 tools defining the same thing in slightly different ways. Since then, I have seen 3 more, each with its own definition. So we have 7 different ways of expressing the same general concept, redefined each time because a standard is missing (just as project metadata was before PEP 621).
As a consequence, not a single tool tries to integrate those tasks.
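
To illustrate the divergence, here is the very same task expressed for three of those tools (a minimal sketch using the simplest string-command form each tool accepts; not the exact gist contents):

[tool.pdm.scripts]
test = "pytest"

[tool.poe.tasks]
test = "pytest"

[tool.hatch.envs.default.scripts]
test = "pytest"

Three different tables for one identical concept.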

If you look at the JavaScript world, all the tools, without exception, use package.json scripts (see the snippet after this list):

  • IDEs can discover and execute them
  • general tooling documents how to add itself as a script
  • CI jobs accept a script parameter
  • yarn, pnpm, lerna, and every other project management tool did not redefine tasks; they just use package.json scripts
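
For reference, the shared format all of those tools consume is just a scripts table in package.json:

{
  "scripts": {
    "test": "jest",
    "lint": "eslint ."
  }
}

npm test, yarn test, and pnpm test all run the same entry.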

I can also give a real example, because tox often comes up in this discussion. I use it mostly for testing against multiple Python versions. To do that while avoiding duplicating both dependencies and scripts, I use the tox-pdm plugin (I wrote its pdm script integration). Since dependency groups were standardized by PEP 735, I can use the same dependency declaration because both pdm and tox integrated it (while neither of them was forced to). Since then, I rely on tox-pdm only for the script integration. There are 2 issues there:

  • to do that, you actually need one plugin per package manager/script runner, because without a common spec you have to integrate each tool's task definition model separately
  • it creates extra maintenance with upstream changes on two sides (tox-pdm needs to follow both pdm and tox changes)

Having an official tasks section definition would allow tox to reuse tasks independently of the package manager used.

Note that this duplication of integration effort is not specific to tox; you can see the same thing for VS Code task runner plugins, PyCharm…

On the topic of dependencies, which I tried to keep out of this (and I still think standardizing scripts does not require standardizing their dependencies), I would say the way to go is to reuse dependency groups as much as possible. So I think it should look like one of:

# Only reusing dependency groups
[tasks.my_task]
dependencies = ["group1", "group2"]

# Allowing the full dependency group syntax (PEP 735 includes)
[tasks.other_task]
dependencies = [
    "tox",
    {include-group = "test"}
]

Personally, I prefer the first one, as there is already something entirely defined for that purpose; allowing the full syntax adds complexity and duplicates functionality.
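
To make the first option concrete, here is a hypothetical pairing with a PEP 735 [dependency-groups] table (the cmd key and the task name are illustrative, not part of any spec):

[dependency-groups]
test = ["pytest", "pytest-cov"]

[tasks.my_task]
cmd = "pytest tests/"
dependencies = ["test"]  # references the dependency group above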

Also, I think for this to work, “running tests” shouldn’t be the only use case. In the company I work for, our projects have between 10 and 20 tasks; running tests is only one of them, and most definitely a special one, as people want to run tests quickly in integration with their IDE’s test tooling, and CI often has a dedicated phase for them. Tests are a special case here and shouldn’t be the base for specifying what a development task is (they might need a PEP of their own if testing needs to be standardized).
I can provide lots of use cases where tasks don’t involve testing and where the operator is not necessarily a developer able to write full Python scripts (data science being one of them).

7 Likes

Then please accept that your experience isn’t representative of the range of projects out there. By making a statement that’s effectively “there isn’t as much variance as the people who say there’s variance claim”, you are directly denying other people’s experience rather than contributing valuable data points. It’s not helpful.

I have a repository (at work) with build and test scripts for about 40 widely-used projects. About three of them fell into your two categories. The others all have variations.

Agreed. I see this as important for linting, formatting, type checking, and whatever other development-time processes should be run as part of the “inner loop”[1].

Dependency groups or not is too complex for me to think through. I haven’t needed them, so I’ve never tried to understand them :smiley: But you may be right that that’s going to be the way to specify it, even though I’m not a fan of the indirection (specifying the requirements separately from the command that needs them).

Potentially, though I wouldn’t specify it as tightly as PEP 517 did, or as strictly as pip implemented it (which I personally think is overkill). A task runner can decide how it wants to do it, whether to reuse environments, etc.

But yes, it means that implementing a task runner is a slightly bigger job than moving strings from a TOML file into subprocess.run. That’s fine, there are people who want to do that job.
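
For a sense of scale, here is a minimal sketch of that job, assuming a hypothetical top-level [tasks] table of plain command strings (Python 3.11+ for tomllib):

import subprocess
import sys
import tomllib

def run_task(name: str) -> int:
    # Read the hypothetical [tasks] table; a real runner would validate it
    with open("pyproject.toml", "rb") as f:
        tasks = tomllib.load(f).get("tasks", {})
    if name not in tasks:
        print(f"unknown task: {name}", file=sys.stderr)
        return 2
    # A real runner would also handle environments, dependencies, reuse, etc.
    return subprocess.run(tasks[name], shell=True).returncode

if __name__ == "__main__":
    sys.exit(run_task(sys.argv[1]))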

Right now, every single user has to do that job every time - moving strings from a Markdown file into a terminal - and I have no doubt that many would gladly take a slow tool over whatever existing manual copy-paste process they use.


  1. Treat deployment/releases as the point where you jump off the loop. Everything before that is your inner loop; everything after that (“production”) is the outer loop. ↩︎

1 Like

I’ll repeat what I’ve said before - I’m not against the idea of having a standardised way of writing down the common tasks for a project. But I dispute the idea that the benefits of doing so are obvious.

What I’d really like to see would be a fully-explained example of a case where having a standard would be of significant benefit. There are some claims I’ve seen made which are unconvincing to me - I’m happy to debate my views on these claims, but unless someone changes my mind, these are not “obvious benefits”:

  1. Ease of switching between tools. I don’t think it’s common for people to change what tools they use when working on a project, and even when they do, it’s a significant change in workflow (new subcommands to learn, scripts to update, etc). And I don’t think many projects are willing to support people using arbitrary tools to work on the project - most projects have some form of “developer guide” or “contributor document” which states the supported way of doing things.
  2. Portability. You don’t get portability by standardising how you execute tasks. You get it by writing tasks in a portable way. For us, that probably means “writing a Python script rather than a set of shell commands”.

The one argument that does have some merit is “you’ll be able to set your IDE to run the same commands you do”. But that’s not precisely true. At the moment, I run nox -s test to run my tests. I’d have to change what I run to some_task_runner test (which would run nox -s test under the hood). Yes, if I change my workflow to use some_task_runner, then the benefit applies. And if poetry, pdm, hatch, uv and other workflow tools all supported the new format, then I wouldn’t have to find this new task runner, I’d get my new workflow “for free”. So that’s something, I guess. Is it enough of a benefit? I’m unsure - that’s why I say it’s not obvious.

To be honest, I think this discussion may have run its course. At this point, we need someone to either put together an implementation that becomes sufficiently popular[1] to justify standardising it as existing practice, or to create a PEP that addresses the points made here. Whether that PEP gets approved depends on whether it gets consensus (among other things), and the discussion here suggests that isn’t guaranteed; but IMO it’s likely, as long as the PEP is well written and clear about the benefits.


  1. While it’s technically in violation of PEP 518, the task definitions could be in the [tool.RUNNER_TOOL] section of pyproject.toml, with that tool giving explicit permission for other tools interested in supporting the new format to read that section. ↩︎

2 Likes

Who is “us” in that sentence? Some of the examples given for who could benefit from standardizing this are not Python-first tools: Fedora’s testing infrastructure, editors and IDEs, etc. In those worlds a Python script is not necessarily more portable than a shell command.

There are two points here I think I disagree with. One is that you would “have to change”. Why do you think you will be compelled to change things? Just like anything else, switching is a choice. I know you have made the comparison to type checking, being concerned this will be foisted on you against your will, but this will only come up from people trying to contribute to your project, not all users, which is a much smaller group and one that’s easier to say “no” to.

The other is assuming Nox would run under some hypothetical task runner. If this is standardized, why do you think Nox wouldn’t support it natively? In that instance I would assume you would move some things from Nox to pyproject.toml and then call some Nox API to do the execution for you, making your CLI experience the same (and even if they didn’t, I’m sure someone would write a package to do that; toss in inline script metadata and you would just have to change nox to <tool-that-knows-inline-script-metadata> noxfile.py).

Just to make the point obvious, we’re at the stage where someone needs to write a PEP as the next step. And don’t worry about a sponsor: there are enough interested core devs here that you will get one (just don’t come up with multiple PEPs; if multiple people want to work on this then please coordinate).

4 Likes

It was “Python developers”. I was focusing on the benefits that would be gained by the developer writing the pyproject.toml that contained the task definitions. Sorry for not being clearer.

Distro testing is a different problem as they have stated that they wouldn’t want a test command that ran nox -s test.

And even in an editor/IDE, a command like test.command = ["mv test_data test_data_copy; pytest"] would still be non-portable. Maybe portability isn’t important, but if it is, shell commands are strictly less portable than a Python command (that’s basically my point).
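
To illustrate, with a hypothetical [tasks] table:

# Non-portable: relies on a POSIX shell and the mv command
[tasks.test]
command = "mv test_data test_data_copy; pytest"

versus a variant that delegates the file shuffling to Python (scripts/run_tests.py is hypothetical; it would use shutil.move before invoking pytest):

[tasks.test]
command = "python scripts/run_tests.py"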

I didn’t mean to suggest you’d be compelled to do so. Just that this is simply another feature that you can choose to adopt, and standardising doesn’t give any “free” benefit. After all, I can already get VS Code to run my tests by using its existing task definition capability. But I don’t want to push back too hard on what you said - my point was that there is some merit to this argument, I’m just not sure it’s enough to justify a standard. So I’d like to either hear other benefits, or hear how this is more beneficial than I think it is.

I don’t think that at all. But not all nox sessions are as simple as this proposal would support, so there will be cases where the only way of using this proposal would be to use it to invoke nox, and let nox do the heavy lifting. And if nox did support this proposal, it would need to take this into account, as a user could easily want to have build and test tasks defined in pyproject.toml, while having a docs task defined in Python code in a noxfile and invoked from pyproject.toml as nox -s docs. So yes, I was taking a simplified view, but it wasn’t “nox won’t support this proposal”, it was “how does this proposal support using nox when the nox session is too complex for this proposal to handle natively?”
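
Concretely, that mixed setup, sketched with a hypothetical [tasks] table:

[tasks]
build = "python -m build"
test = "pytest"
docs = "nox -s docs"  # the complex session stays in noxfile.py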

Edit: For context, my mental image is pip’s noxfile.py, where the test session is way more complicated than I’d want to put in a simple task definition of the form being described here (as are most of the other sessions we have).

4 Likes

FWIW, I’m a big fan of hatch’s environment management support, both in its configurability in pyproject.toml and its UX for invoking commands. I tend to add commands like hatch run all or hatch run qa:qa and standardize my own projects on that UX. It’s hard for me to imagine anything better[1].

I think there’s value in exploring standardization here, whether it’s by PEP or convention. It’s nice to be able to walk up to just about any git clone and know how to do some basic stuff, like run the tests, build the docs, or build a wheel/sdist. I’m not sure what the top-level command would be, since about the only thing you’re guaranteed to have is python[2]. Maybe @brettcannon’s Python Launcher would be a good candidate for that, especially if it were official, converged with the standard Windows launcher, and you always got it for free.


  1. and I used to advocate for standardizing on tox as a universal front-end, but now I’m glad that didn’t go anywhere :wink: ↩︎

  2. or maybe python3 really ↩︎

1 Like

Here’s a pyproject.toml example from one of my OSS projects, which is very similar to what we generally use at work (a sketch of this kind of configuration follows the list below). So to answer your bullet points:

  • hatch test runs the tests for the default version of Python
  • hatch test --all runs for all supported Pythons
  • hatch test --all --cover runs all the tests and gives me coverage
  • hatch run qa:qa runs ruff lint and formatting checks, and mypy
  • hatch run qa:fix runs ruff’s safe fixers
  • hatch run docs:docs builds the docs
  • hatch run docs:serve runs the Sphinx autobuilder (i.e. builds and serves the docs)
  • hatch run all does what it says on the tin (runs tests, linting, and builds the docs)
  • hatch build builds the sdists and wheels.
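
For the general shape, here is a sketch of the kind of hatch configuration behind a couple of those commands (assumed for illustration, not the actual project file):

[tool.hatch.envs.qa]
dependencies = ["ruff", "mypy"]

[tool.hatch.envs.qa.scripts]
qa = ["ruff check .", "ruff format --check .", "mypy src"]
fix = ["ruff check --fix ."]

[tool.hatch.envs.docs]
dependencies = ["sphinx", "sphinx-autobuild"]

[tool.hatch.envs.docs.scripts]
docs = "sphinx-build docs docs/_build"
serve = "sphinx-autobuild docs docs/_build"

hatch run qa:qa then runs each command in the qa list inside the qa environment.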

Since I use GitLab for my personal stuff, and an internal instance of GitLab at work, I basically always plumb hatch commands into the .gitlab-ci.yml file. Same with any pre-commit hooks we might use, all in the interest of DRY.

1 Like

I talked about some of my plans at the core dev sprint, but they don’t directly involve this (although I’m sure AI would appreciate the standard instead of having to try and infer everything from READMEs and AGENTS.md).

3 Likes