Idea: Introduce/standardize project development scripts

This idea is clearly never going to go away (nor should it, because reading this stuff from pyproject.toml is better than reading it from .github/workflows/*.yml).

However, while wearing several of my hats (including IDE developer and security reviewer), I would really like to see this based around dedicated interfaces into those tools, rather than arbitrary commands. PEP 518 is a perfectly good inspiration for a “tasks” table:

[tasks.<arbitrary name, probably "test" in this example>]
requires = ["pytest", "pytest-asyncio", ...]
run = "pytest.task"
arguments = ["-x", "--asyncio-mode=strict", "./tests"]

[tasks.lint]
requires = ["pylint"]
run = "pylint.task"
arguments = ["--config", "lint.cfg"] # I'm inventing args here

The idea being that whatever command runs the task will import some interface from a module that it has installed. The front-end tool can decide whether or not to use isolated environments, but specifying the package names gives it that option, whereas arbitrary commands force it to use whatever happens to be available.

We’d have to map out exactly how pytest.task turns into something importable and then callable, but the point is that we aren’t reliant on PATH or system configuration (unless the task backend chooses to just do subprocess.call(), which it totally could).
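
For example, here is a minimal sketch (purely illustrative, not a proposed spec) of how a front-end could resolve a reference like pytest.task into something callable, assuming the existing entry point object reference syntax ("module.path:attribute") were reused:

    import importlib

    def resolve_task_target(ref: str):
        """Resolve a 'module.path:attribute' reference into the referenced object."""
        module_name, sep, attr_path = ref.partition(":")
        obj = importlib.import_module(module_name)
        if sep:
            # Walk the (possibly dotted) attribute path after the colon.
            for attr in attr_path.split("."):
                obj = getattr(obj, attr)
        return obj

With a bare module reference like pytest.task this just returns the imported module, and exactly what then gets called on it (and with which arguments) is the part that would still need to be mapped out.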

3 Likes

Hatch, PDM and Poetry all have their own version of this. If we want to standardise, we should probably create something that they could use to replace their current mechanism. So IMO the best starting point would be to research how those tools configure scripts/tasks, and base any design on that.

I don’t think spin has been mentioned yet in this discussion. It’s yet another tool for automating this sort of thing that is gaining some traction in the scientific python community. Just want to raise it because it’s being actively used already by a few popular projects like NumPy and scikit-learn.

2 Likes

Oh that’s nice, I like that.

Ultimately though, the problem isn’t having a nice implementation. It’s having something “official” so that people who want to use “standard” tools don’t have to make a choice (yes, I’m being a bit facetious, but after watching this play out for over ten years, I think not unfairly). We can invent as many nice tools as we like, but the demands for the One True Tool will continue until it exists, and if it happens to be weaker than all the rest, it won’t matter (though it’ll be a shame).

2 Likes

If we went down this path, then it would make sense to use entry point object references.

The defined callable signature for task runners could then be something like:

    from collections.abc import Mapping
    from typing import Any

    def task_runner(task_name: str, task_details: Mapping[str, Any], project_dir: str) -> int:
        # `task_name` is the name of the `tasks` table entry in `pyproject.toml`
        # `task_details` is the contents of the `tasks` table entry
        # `project_dir` is the folder containing `pyproject.toml`.

        # If the task runner needs info from outside the task entry
        # (e.g. from the `tool` table), it can reopen and read the
        # whole `pyproject.toml` file from the `project_dir` folder.
        # It can also read tool-specific config files (such as `tox.ini`).

        ...  # actually execute the task
        return 0  # or some other return code; alternatively, raise an exception for a non-zero return

That way, tools like tox, nox, etc. could define a generic task runner that maps operations onto their existing config (the following hypothetical examples are based on my current project, which uses pdm for dependency management and tox for task execution):

[tasks.lint]
requires = ["tox", "tox-pdm"]
run = "tox.pytask"  # Automatically invokes `tox -e lint`

[tasks.typecheck]
requires = ["tox", "tox-pdm"]
run = "tox.pytask"  # Automatically invokes `tox -e typecheck`

There should be a reserved key that holds the task config details in a runner-defined format that the task runner understands. config would probably work for that purpose.
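
As an illustration only (tox does not actually ship anything like this), a generic adapter matching the task_runner signature above could be as small as the following, with the hypothetical tox.pytask module shelling out to the tox environment named after the task and passing through an invented "args" entry from the reserved config key:

    # Hypothetical `tox.pytask` adapter; module and key names are invented.
    import subprocess
    from collections.abc import Mapping
    from typing import Any

    def task_runner(task_name: str, task_details: Mapping[str, Any], project_dir: str) -> int:
        # Map the standard task name straight onto the tox environment of the same name.
        extra_args = task_details.get("config", {}).get("args", [])
        return subprocess.call(["tox", "-e", task_name, *extra_args], cwd=project_dir)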

To avoid boilerplate, we could allow defining a default task runner that would be used if a task definition didn’t include a run target reference:

[task-runner]
requires = ["tox", "tox-pdm"]
run = "tox.pytask" # Maps task names to tox environments

Non-Python runners (such as make or a bunch of helper scripts in the project repo) would need a Python shim package to generate the appropriate subprocess invocations.

1 Like

Quickly writing some notes because I don’t have much time.

I'm extremely against such proposals because what Steve says here is the actual issue. People aren't really asking for interoperability, but rather for a single tool to do environment management. This is an attempt to standardize an implementation and UX, rather than what we have historically used standardization for: giving more freedom to tools and, by extension, their users.

I’ll repost my agreement with @bernatgabor’s point of view:

Here he is expressing, with examples, the reality of what people are actually asking for, and then again elsewhere, and then right below that the maintainer of a different tool, Poe, expresses the infeasibility.

The concept of defining an interface for granular functionality (e.g. testing) has been all but rejected because there is no maintainer/tooling buy-in for technical reasons:

That comment from the maintainer I mentioned above expressed a similar idea to Bernát's here. Basically, the only concrete approach that makes sense (although I personally still have doubts) is to standardize interactions with runners, i.e. the highest abstraction possible.

I mention it in passing here but I want to be more explicit now that I’ve had time to think. Anything that is not literally Brett’s proposal I would likely be against and never choose to implement in Hatch.

3 Likes

I think of uv/tox/nox/hatch not as test runners, but as orchestration systems; running commands is just one part of that orchestration flow.

4 Likes

Agreed. Even with that caveat, though, I think there’s potential merit in a spec that allows standardising the following:

  • A way for generic multi-project development tools (such as IDEs) to determine what the expected orchestrator for a project is (either the orchestrator itself if it is written in Python, or an interface adaptor for non-Python orchestrators like make, meson, CMake, etc), as well as any additional supporting libraries that may be needed (such as tox, pdm, and tox-pdm for a project that uses both tox and pdm)
  • A standard way to query the specified task runner to get a list of defined commands and the Python entry point object references to invoke them
  • A standard way to invoke a defined task based on the information returned from the command query

The bare-bones version of that would consist of just a [task-runner] table, intentionally modelled on the way [build-system] works (since we've had plenty of positive experience with that approach):

[task-runner]
requires = ["pdm", "tox", "tox-pdm"]
task-backend = "tox.task_runner" # Hypothetical submodule name!

The initial version of the spec could include a single query API:

from collections.abc import Callable, Mapping
from os import PathLike
from typing import Any, TypedDict

class TaskSpec(TypedDict):
    name: str                  # The short name of the task
    description: str           # An explanation of what the task does
    target_ref: str            # Entry point object ref for the task API
    config: Mapping[str, Any]  # Runner-defined task config details
    project_dir: PathLike[str] # Location of project defining the task

# The `project_dir` is referenced rather than specifically `pyproject.toml`
# because many orchestrators don't use that file for task configuration.
# A mapping is returned to make it clear that duplicate task names are
# not permitted.
def get_tasks(project_dir: PathLike[str]) -> Mapping[str, TaskSpec]:
    """Return task names and specifications for the given project directory"""
    ...

# Expected signature for task API targets
# Return value is expected to be usable as a process exit code
type TaskTarget = Callable[[TaskSpec], int]

# `target_ref` is defined for each task to allow task runners freedom
# to choose between returning different target refs for each task or
# a single standard target ref that looks up the task based on its name.
# Either way, the call signature is just to pass the task spec back in.

# `project_dir` is included to allow the `config` to include values that are
# only valid for that project directory. Even `target_ref` may be project
# specific if it depends on how the command is defined in that project.
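
For concreteness, a front-end (an IDE task panel, say) could consume this hypothetical API along the following lines, assuming target_ref uses the "module:attribute" entry point object reference format and that tox.task_runner is the invented backend module from the table above:

    from importlib import import_module

    backend = import_module("tox.task_runner")           # from [task-runner] task-backend
    tasks = backend.get_tasks("/path/to/project")         # query the defined tasks
    spec = tasks["lint"]
    module_name, _, attr = spec["target_ref"].partition(":")
    target = getattr(import_module(module_name), attr)    # resolve the task API target
    raise SystemExit(target(spec))                         # invoke it; the result is the exit code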

This keeps us out of the business of trying to define the UX of the individual orchestrators, and purely in the space we want standardisation to occupy: facilitating communication between multiple frontends (in this case, mostly IDEs and downstream system integrator build systems), and multiple backends (in this case, task orchestration and project management tools).

I think this starts getting too deep. I'd rather have each command listed in the pyproject.toml, so it's possible to list them without having to load any other tools, and allow providing arguments in the task definition:

[task.lint]
requires = ["pdm", "tox", "tox-pdm"]
task-backend = "tox.task_runner"
args = ["lint"]

[task.test]
requires = ["pdm", "tox", "tox-pdm"]
task-backend = "tox.task_runner"
args = ["test", "./tests"]

This is simple, direct, and makes it easy to mix and match tools for those who’d rather reference (e.g.) pylint, black and pytest directly, rather than having to learn yet another tool that is needed to orchestrate a set of tasks.
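
As a rough sketch of that benefit (assuming the hypothetical [task.*] tables above), a front-end could enumerate the declared tasks with nothing but the standard library, without importing any backend at all:

    import tomllib

    with open("pyproject.toml", "rb") as f:
        data = tomllib.load(f)

    # List each declared task along with its backend and arguments.
    for name, entry in data.get("task", {}).items():
        print(f"{name}: backend={entry['task-backend']} args={entry.get('args', [])}")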

Nobody is prevented from installing and using tox/etc. directly, just as PEP 518 doesn’t prevent anyone from using a build backend directly. But for the simple cases we’d like to be able to automate, a few default tasks would be handy (e.g. the Python template for GitHub Actions would suddenly be able to build, test, lint and publish all projects, regardless of tools used, without modification, and an IDE can bind shortcuts or pre-commit actions that work across all your projects).

2 Likes

tox has a minversion field which I find very useful on teams for keeping everyone in sync. If you run tox v3 and minversion is set to 4, it will bootstrap a new environment and dispatch your command to the new version. (I bet other tools have this too, but I only know the one.)
I would make supporting such bootstrapping usage a goal. If I can’t eventually expect tox to use the new data, I’ll be writing tox>=4.22.0 in two places forever.


I feel like I’m hearing folks agree at length that we don’t want to go too deep, we don’t want to try to standardize UX, and we want to make sure we leave room for the orchestration tools to do their thing.

Both of these last two posts feature some pretty similar tables, drawing what looks like inspiration from build-system. I like that line of thought, but am concerned that it leads in a direction which tox, nox, hatch are not interested in.

If there are task backends, are there task frontends? If so, what is the equivalent of the build package?

The moment that it becomes possible for a generic tool to build an environment and invoke a task, we're entering environment manager territory. I don't think that's off limits – maybe the day has come to think about a mini environment manager which ships with Python (like pip does) and only supports the current Python?! – but if we approach that topic I want to make sure we do so with eyes wide open.

This is a very good question. I don’t see this as something pip would implement, for example. (It’s important to remember that pip is an installer, not an environment manager or a workflow orchestration tool).

2 Likes

Yes, I’d expect the existing task front-ends to adopt this interface, e.g. workflow tools, IDEs, and CI systems.

I see tox, nox, and hatch as solving four problems at once right now [1]:

  1. They define task definition formats (nox is sorta cheating though :wink: )
  2. They define task backends which can “do the work” of environment management
  3. They define task frontends which can invoke those backends
  4. They define task frontends[2] which can examine and present that task definition data

(1) is tox.ini, [tool.hatch.envs] in pyproject.toml, noxfile.py (“just” the data format)
(2) is tox._scary_backend_modules, hatch._also_intense_backend_code, nox._nope_im_scared_of_all_of_these_modules [3]
(3) is the CLI interface provided via tox, nox, and hatch (the commands, not the packages) which lets you do stuff.
(4) is the CLI interface provided via tox list, tox config, hatch env show, and so forth.

If we want to only standardize (1), then we’re probably restricted to a strict subset of what tox, hatch, and nox[4] already support. I’m not clear that there’s value in that if we go with “the minimal subset”. We’ll lose many valuable features that way.

If we standardize (1) with something small plus add a tool to the standard distribution which can do an intentionally very limited version of (2-4), with the note that “if you want more than this, use tox, hatch, nox, or something like that”, I could easily and enthusiastically get onboard.

I think it does solve a real problem, and if (1) is backed by a standard, then, e.g., hatch can support reading from it to be compatible, but still say “you should really use [tool.hatch.envs] if you don’t need to interoperate”.

Imagine such a tool exists, python -m $TOOL.
I can then define a bootstrapping flow alongside simplistic cases:

[task.lint]
requires = ["pre-commit"]
invocation = ["pre-commit", "run", "-a"]

[task.test-all]
requires = ["tox>=4"]
invocation = ["tox", "run-parallel"]

I might never, in practical fact, run tox in this way. python -m $TOOL test-all is not as nice as tox p in my shell, after all.[5]

But someone new to the project can show up, run python -m $TOOL lint and expect it to do something. The invocation is declared somewhere which will introduce them to the project.

I’ve tried to narrow the standardization effort. There’s no notion of a uniform task backend in this suggestion, nor are any of the CLI interfaces or similar standardized. There’s some standardized data format, and a concrete new tool which uses that data.

Trying to discuss this without talking about introducing a tool which is an ultra-simple wrapper over venv, pip, and subprocess (or exec? I think subprocess) feels to me like we're pushing too much from the outside to unify concepts in tox, hatch, uv, etc. without buy-in or a champion amongst their maintainers.
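
To make that concrete, the whole wrapper could plausibly be on the order of the following sketch (table and key names match the invented [task.*] example above; none of the genuinely hard problems are solved here):

    import os
    import subprocess
    import sys
    import tomllib
    import venv
    from pathlib import Path

    def run_task(task_name: str, project_dir: Path = Path(".")) -> int:
        data = tomllib.loads((project_dir / "pyproject.toml").read_text(encoding="utf-8"))
        task = data["task"][task_name]
        env_dir = project_dir / ".task-envs" / task_name
        bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
        if not env_dir.exists():
            venv.create(env_dir, with_pip=True)  # one isolated environment per task
            subprocess.check_call([str(bin_dir / "python"), "-m", "pip",
                                   "install", *task["requires"]])
        env = {**os.environ, "PATH": os.pathsep.join([str(bin_dir), os.environ.get("PATH", "")])}
        # Run the declared invocation with the task's environment first on PATH.
        return subprocess.call(task["invocation"], cwd=project_dir, env=env)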

As we all know, their feature-sets are overlapping but not identical. I think we have to enthusiastically embrace the fact that they’ve solved bigger problems than are in-scope for any centralized community effort.


  1. Maybe more? I tried to keep it coarse enough to talk about. ↩︎

  2. I said “frontends” twice, but it’s still two different jobs. ↩︎

  3. I kid, but it’s notable that all of these are the heart of these tools and are where some of the hardest parts of the problem space are. ↩︎

  4. Okay, again, it gets weird with nox with its constant ability to cheat! I'm going to stop mentioning it. ↩︎

  5. Also, how do I sync this with tox’s minversion field? :thinking: ↩︎

2 Likes

Does anyone else feel that all these layers just make things more and more opaque?

I already avoid contributing to projects that use workflow manager tools like uv/hatch/poetry (or, if that's not an option, I circumvent the build system using pip install -e . or PYTHONPATH and install dependencies manually), because I know I'll just spend all afternoon trying to figure out how to make otherwise trivial customizations, like passing flags to pytest, installing pdbp into the temporary environment, or avoiding the wrong version of Python.

Another layer on top of that doesn’t fill me with eager anticipation.

3 Likes

As a project maintainer, I very much sympathise with the idea that “if you want to contribute to this project, you must run tests/lint/etc in the way described in our contributing guide”. A “standard way to run tests” would be just another way of not running the tests the way I documented, in that context (unless I make it the documented way, which means I now have to use that abstraction layer even though I don’t want to).

I’ve seen far too many cases where someone has raised issues saying “when I run the tests I get a bunch of failures” and it turns out they weren’t running the tests correctly. I feel that the most likely use for a standard interface like this would be to allow distribution maintainers, IDEs, etc., to use one approach for every project. That would therefore result in a significant increase in people “not running tests the way I documented”, and a corresponding pressure on me, as project maintainer, to support that usage.

Ideally, supporting a standard API would be seamless. But it’s still a maintenance cost, and as I said, if I don’t use the standard scripts myself, there’s a risk of bugs creeping in.

I honestly don’t know. I can see some attraction to the idea, but I suspect the underlying motivation is something I wouldn’t be comfortable supporting…

5 Likes

As I said earlier, with my security hat on (and my involvement in multiple tools that would be obvious front-ends), I do not want arbitrary process commands here. If that were to be standardised, it would be blocked by default in Visual Studio and VS Code, in favour of configurations that are easier to detect and control (and yes, I know you could write a backend that does insecure stuff, but it's far easier to create a safelist for known packages than for arbitrary CLI commands).

1 Like

Okay, I missed/misunderstood that but now I think I’m starting to understand.
There's some particular danger in allowing the pyproject configuration to include commands that isn't present when the tasks are defined by the project itself?

The code owned by the project could obviously do all kinds of horrible things, like a test which simply uses shutil to try to blow away the user's homedir, and we aren't going to try to secure against that. And obviously I can set up a task which runs a script in the repo, uses subprocess, etc.

I’m not able to see how this provides an attacker a different avenue to do nasty things via a malicious repo. Is the idea that tasks run off of pyproject.toml config would exist in some other security context (e.g., separate process), vs the python code they eventually invoke?

1 Like

Agreed. And furthermore, why is this any more problematic than (say) PDM scripts, or task definitions in a package.json for Javascript?

I’m not convinced we should do this, but I don’t think it’s productive to hold Python to unreasonably high standards that other ecosystems don’t consider necessary and then try to argue against this proposal because it can’t meet those standards…

2 Likes

There are two aspects here, and it's a delicate balance, but even assuming a non-malicious repo, arbitrary commands either add attack vectors or require unreasonably complex resolution rules. Some examples (I don't want to try to address each one right now, because I don't have to convince you, I have to convince entire security teams who have been burned so many times that they're really hard to convince): do we search PATH, the cwd, or both? Do we install from PyPI, the user's preferred index, or nowhere? Do we require the commands to have been installed by a trusted installer? Do we set/clear/censor environment settings? Do we have a separate environment for task dependencies, or is it shared?

And it is entirely possible that the tasks may be run in some other security context. A CI system might be set up to run the “lint” task in a separate job from the “test” task. Or a Python-based tool may choose whether to run it in-process or as a subprocess (well, not in the case where it’s an arbitrary command of course, since in-process is impossible).

If anyone asked me about PDM scripts then I’d have said the same thing, probably. Task definitions in other files came earlier, and so we’ve seen how they were misused. We can make the same mistakes if we want, but while they got to claim ignorance, we can’t.

1 Like

Cross-platform support is the other reason to not use arbitrary commands. And in particular, cross-platform (or cross-shell) argument quoting rules :wink:

2 Likes