As the Python packaging ecosystem has grown in great ways over the past couple of years, I feel the need has also grown for a standardized way to define development tasks and scripts in pyproject.toml to streamline workflows.
A standardized approach would improve onboarding, enable better tool interoperability, and maybe even help integration with IDEs and CI systems.
There is tooling that already allows this, but a native entry point (like project-scripts) would remove the need to bring in yet another dependency.
Some prior discussion happened briefly a few years back (relatively recently, in packaging terms), but this is a shot at reviving the idea and seeing through the inclusion of a [project.dev-scripts] entry that users can define but that will not make its way into distributions.
I would be happy to formally bring this into a PEP and codify it if it is accepted.
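For concreteness, a minimal sketch of what I have in mind (the table name, the task names, and the plain-string command format are all illustrative, not a settled design):

[project.dev-scripts]  # hypothetical table; nothing here is standardized yet
test = "pytest"
lint = "ruff check src/"
docs = "sphinx-build docs docs/_build"

The idea is that these entries would be usable locally but, as noted above, would not make their way into built distributions.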
Even with frequently used tools like tox and nox, I always forget the details of how to invoke them (OK, I don’t forget how to invoke tox, I use it too much for that. But I do regularly forget to pass -s before the session name when using nox).
So I’m in favour of having a way for projects to specify the relevant commands to run for operations like:
a full test run
running linting/autoformatting/typechecking
finishing setting up a local clone (for example, there may be extra steps to fetch git submodules)
setting up a virtual environment containing project dependencies that is suitable for an IDE to use for live linting and typechecking
There may be other operations that are common enough to be given suggested names, in addition to projects being able to define their own names for operations that may not be generally needed, but are relevant to that project.
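For example (reusing the table name proposed above, with task names and commands purely illustrative of my own habits), I'd picture something like:

[project.dev-scripts]  # hypothetical; the name and keys are only a sketch
test = "nox -s tests"
lint = "nox -s lint"
typecheck = "mypy src/"
setup = "git submodule update --init --recursive"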
+1. Would love to see this standardized. There's definitely high demand for this feature, as evidenced by the various community solutions. Hatch, PDM, thx, taskipy, and poethepoet all offer a way to do this, but the lack of a standard means there's no interoperability there. I also agree with @ncoghlan that the standard should recommend task names for very common tasks.
I commonly use Makefiles for this purpose. While I like the idea, it seems like there are a bunch of relevant questions raised by such a proposal.
What is the context in which the commands would be invoked? subprocess.run(shlex.split(command))?
Can I include more than one command in a command? (npm run ... IMO sucks specifically because it forces you to chain the commands into one long string.) A theoretical lint dev-script would generally tend to include e.g. ruff check, mypy, pyright, ruff format --check, etc. (see the sketch at the end of this post).
Does the failure of a command stop execution of later commands?
Is this just standardizing the metadata but leaving all of the execution details to package managers?
Would pip implement a pip dev-script foo command?
I guess keeping it limited to a scope similar to that of npm run would maybe cap the complexity ceiling, but I feel like I probably have enough instances of needing something dynamic (reading an env var, running some Python and passing the result to a command) that it probably wouldn't be able to fully replace Makefiles without becoming a full-blown task runner.
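To make the "more than one command" question concrete, one purely hypothetical way a spec could answer it would be to accept a list of commands per task, run in order and stopping at the first failure:

[project.dev-scripts]  # hypothetical; list-valued tasks are not part of any existing proposal
lint = [
    "ruff check .",
    "ruff format --check .",
    "mypy src/",
    "pyright src/",
]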
I doubt that running tasks (in development) is a language-specific problem, and TOML as a grammar is sub-optimal for this. Why not just use a dedicated task runner? It solves exactly the remembering-invocations problem and the onboarding problem, while providing useful facilities. It is a very good replacement for the way make has often been used.
In principle, this seems like a good idea. However, without a specific proposal for how this would work, it’s hard to say anything definite.
There’s no way of answering that in the absence of a spec, but it feels more like something a workflow manager would provide than an installer, so I’d be inclined to say that pip would not provide a command like that.
When something like this is asked I guess it might be worth considering that maybe it is not really about pip but rather about “the tool that is shipped by default with Python”.
OK, but changing that tool would require a PEP of its own. And if the question is really "will this functionality be provided in a default install of Python?", then my response would be "the only place I can see this being likely to be part of the stdlib is in pip, and <insert my comment about pip here>".
Edit: Just to be clear, my comment about whether pip would adopt this was simply my personal view as one of the pip maintainers about whether I felt this would be in scope for pip. The other maintainers might disagree, and ultimately it’s something we’ll need to discuss within the project.
But let’s drop this for now. The main point I was making was that we need a spec to say anything meaningful here.
I was bringing up pip because I wasn't sure if there was any other PEP-defined metadata in pyproject.toml that wasn't used by pip or first-party Python tooling. All the [project] metadata is used to produce distributions, which are installed by pip. The build system stuff is obviously used by pip. I suppose dev-dependencies is not directly relevant to pip, in lieu of lockfiles.
It just (maybe) seems somewhat irrelevant to PEP this feature if Python's own tooling doesn't care about it. Consistency across Python package managers, I suppose, might be nice. But it feels like some sort of pip acknowledgement of the metadata (even if it were just printing out the available commands) ought to be a requirement of the PEP; otherwise I feel like the value of consistency isn't super obvious to me.
I think you will get strong disagreement on this from active folks in the packaging community, who have been (for the last few years at least) pushing away from the "all Python packaging is based on what pip does" mindset, and who would disagree that pip can even be called "Python's own tooling". There are certainly some features standardized in PEPs that pip does not use, or that it only came to support well after other tools had adopted them.
It seems to me like this is a push to standardize a subset of environment manager behaviors, in pyproject.toml, similar to what hatch does. On that premise, the core idea seems nice but the details matter a lot.
My inclination would be that this goes in a non-project table. I don’t know if it’s a strict rule that everything in project maps to published metadata, but it’s a pretty good mental model for what’s in that table.
Suppose I have a project with an existing environment manager, tox.
Should this section list tox invocations or should tox invoke these commands? If the answer is “either way”, that’s okay but then I find it more difficult to understand.
If this standardizes behaviors that tox has today, what does the usage look like once the standard arrives and tox supports it?
What would be contained in this table? Commands, commands with dependency specifications?
Bonus question: how will you distinguish commands which need the project installed (like test suites for packages), vs commands defined for non-package projects (where there is no [project] table), vs commands which don't depend on the current package at all, like autoformatters?
I think it’s unlikely that pip would directly support this, even assuming it becomes an approved standard and existing workflow tools support it.
Mainly, I don’t see how you can make use of this without some kind of environment management, which strikes me as out of scope for pip.
IMO a better goal here is to allow something like pipx or pip-run to be used.
Having spent a lot of time integrating projects into environments other than the developers’ machines and CI setups, I’d appreciate any suggestions, examples and standardized names to maintain clear separation between:
a test run – if this fails, the software is not functioning as intended
lint run(s), like code formatting or type checking – only interesting to the upstream developers; failures here (e.g. due to a different version of a formatter) usually aren’t even reportable as issues
installation/setup for the above (If PyPI is firewalled off, I want to replace this. Also, I don't want to install Black/Ruff only to ignore the issues they find.)
a “CI” run that combines all of the above, for convenience
Agreed. As far as I know, all of the existing workflow manager tools already have some form of script definition capability (with the exception of uv, and they will probably have one by the time I finish writing this post). So this proposal is essentially about providing a standardised replacement for those capabilities. And as such, the key question is whether the new standard is good enough for PDM, Hatch, Poetry, tox, etc., to switch from their custom mechanisms to the new standard.
The point of standardising this would therefore be:
To give everyone a common means of specifying scripts (easier to document, teach, remember, etc.)
To allow easier switching from one workflow tool to another.
To allow standalone tools like pipx or pip-run, or a dedicated tool, to support the same scripts as workflow managers do.
To give downstream repackagers (conda, Linux distros, etc.) a standardised way to invoke the automated test suite on the repackaged version to make sure the repackaging hasn't broken anything.
(This is arguably a special case of “standalone tools”, but I think it’s noteworthy enough to give it a dedicated entry)
Repackaging test execution is entirely manual at the moment, so this is one of the biggest weaknesses in the automated repackaging tools. (We got rid of direct execution of setup.py for a lot of good reasons, but the old ./setup.py test convention also going away was a genuine downside of doing that)
I suspect the starting point would be to define a top level [project-dev] table with its own requires entry that specifies what is needed to run the dev commands.
That way we wouldn’t be trying to replicate everything tox/nox/pdm/poetry/hatch/etc can do, we’d just be trying to define a way for a project to specify which of those tools they use, and how the tool should be invoked for particular common operations.
I agree with this. Standardization is nice, but it’s not clear that a standard is helpful in this case, or even possible, and I think I’d prefer to let new tools innovate on solutions rather than try to constrain what’s possible.
A really important use case to consider here is all of the projects that are a mix of languages beyond Python: if the standard can't handle those projects then it can't become fully standard, in a way that may just add additional confusion to the ecosystem.
I think I agree if it’s meant to be a mini environment manager spec, which tells you how to invoke nox. If it’s instead some standard data for “this is what nox should invoke”, it makes more sense to me. But I see problems in either direction:
if it’s data for nox / tox / hatch to use, what about the customizations and arg passing features of those tools?
if it’s data about how to invoke those tools, don’t we need an environment manager to even consume it?
With PEP 723 support now available in (at least) pipx, pip-run, uv and hatch, development scripts can pretty easily just be written as Python scripts, with dependencies declared inline if needed. And if all you want to do is run a command,
subprocess.run("some command here", shell=True)
is pretty simple.
So I think that the first question should be, why isn’t that sufficient? People do use task runners like make or just, or they use IDE features like VS Code’s “Run task” feature. And requests like this do pop up. So I don’t think we can ignore the fact that there’s something about writing helper Python scripts that isn’t sufficiently ergonomic. But I’ll be honest, I’m not sure what it is (even though I’m one of the people who never writes such scripts…)
My own projects do tend to end up with a “misc” folder full of utility scripts. There’s no common convention even to the level of what setup.py test used to offer, though.
“How to invoke tools” is certainly the direction I was thinking, including platform specific options like make targets and a folder full of shell scripts.
I genuinely don’t know if we could get something along those lines into a fully coherent form, but even something like a project-dev.suggestions table (indicating that the listed commands are just one way of working rather than the only way) that says “run command X from relative folder Y to achieve task Z” might be better than the status quo of “no machine readable hints for common project commands like relocking dependencies, running the test suite, building documentation, etc”.
For Python dependencies, since dependency groups are now standardised, each dev task suggestion could also name dependency groups to install if any are needed to launch the command.
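Roughly like this, where [dependency-groups] is the standardised PEP 735 table but everything under project-dev.suggestions (including the command, cwd and dependency-groups keys) is invented purely for illustration:

[dependency-groups]
test = ["pytest"]
docs = ["sphinx"]

[project-dev.suggestions]  # hypothetical table and keys
test = { command = "pytest", cwd = ".", dependency-groups = ["test"] }
docs = { command = "sphinx-build docs docs/_build", dependency-groups = ["docs"] }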
That was where my mind was going also, not that that's necessarily what a PEP ought to do. When I go define a Makefile in one of my projects, it's going to contain make lock install test lint format build publish docs, or perhaps other project-specific ones. It's not that I personally need those make commands to know how to operate my project; it's so that (in order of descending importance):
contributors/teammates can arrive at a new codebase and be aware of the various project-specific commands that might need to be run, particularly for less often performed tasks (changelog generation, say)
I can put make install and make lint in CI and know that I can locally run the exact commands CI is going to run. I could see this being useful for more standardized CI/platform tooling.
It’s otherwise annoying to run ruff src then ruff format --check src then mypy src then pyright src etc in sequence versus a single command
I really feel like unless the interface through which this can be invoked is tool-agnostic (e.g. pip), then there's kind of no point. $tool run lint (or whatever command name) might already be taken across the various tools that might want to implement this. I assume uv might want to do a lot more than just subprocess.run (lock, sync, "activate" the venv, run). And if the command is ultimately still project-specific, I still need to go searching for what magic command to invoke to tell me what's available (whereas a Makefile is self-evident).
pip <run,scripts,commands,whatever> --list minimally seems required; otherwise I don't see any value over just using the tool-specific script mechanism, if it exists, because you ultimately still need to know what tool the project is using and how to use it in order to do anything.
Whereas complex dependencies and whatever else concern me much less, because anything dynamic seems like it could just revert to being bash/Python scripts that you define somewhere and invoke through this interface.