Idea: Introduce/standardize project development scripts

If I’m following you correctly, you’re saying that the current solutions available don’t meet the security standard which would be needed. We can’t just take some amalgam of rules and behaviors from existing tools, call that “the standard”, and focus on making that work, with room for other tools to extend it.

Either that or we’ve somehow miscommunicated.

In my mind, the starting point is something crude, simple, and derivative. At its core, nothing more than

import subprocess
import sys

if user_command[:2] == ["python", "-m"]:
    # Re-run "python -m ..." against the environment's interpreter
    command = [sys.executable, "-m"] + user_command[2:]
else:
    # Otherwise run the named script from the environment's bin directory
    command = [
        sys.executable, f"{venv_path}/bin/{user_command[0]}"
    ] + user_command[1:]
subprocess.run(command, ...)

If something like this is strictly off the table on account of the security concerns, then I’m not sure how the existing tools are okay. And if the existing tools are not okay, then I really don’t think we’re ready to try to build a standard yet.


I don’t need convincing as such. I’m pretty ready to trust that someone with more context can see the problems I can’t see. But I want to learn about what I don’t know. :grin:

That makes sense to me. It would probably be useful to enumerate some of the mistakes that you are referring to, if only because I (and possibly others) aren’t seeing them.

If these are the sorts of mistakes you’re meaning, then I’m still a little confused. Yes, of course the environment in which a task is run needs to be carefully defined. But that’s true whether the standard says how you take an argv array and run a command from it, or the standard says you can only run a Python function located in a specific way. It’s just that in the latter case, defining how the command is run is made the user’s problem. It feels to me that it’s better to have a standard answer that people can rely on, and not “you need to read the code”. After all, the way I imagine tasks working, there will always be a need to (say) run a git command, or something like that.

OK, that’s a good point. I want to say “just use an argv”, but I concede that even that isn’t portable on Windows (assembling an argv into a CreateProcess call is done by the application, and even saying “it should work like subprocess.run” begs the question of how tools written in JavaScript or Rust, for example, can ensure they follow that rule…)
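As a small aside illustrating why even “it should work like subprocess.run” is application-defined on Windows: CPython ships a helper, subprocess.list2cmdline, that approximates the MS C runtime’s quoting rules when turning an argv into the single command-line string CreateProcess expects. A minimal sketch, purely illustrative and not part of any proposal:

import subprocess

# The argv only becomes one string at CreateProcess time, and how the target
# application parses it back out is entirely up to that application.
print(subprocess.list2cmdline(["pytest", "-k", "spam and eggs"]))
# -> pytest -k "spam and eggs"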

Nevertheless, I still think that if we restrict tasks to Python function calls (or entry points, or anything like that) all we’re doing in practice is making users write their own implementations, and by so doing, making it even harder to be sure that what they are doing is safe.

To put it another way, while I agree we don’t want to make the mistakes other ecosystems made, I’m not sure we want to invent our own, potentially worse, mistakes either :slightly_smiling_face:

1 Like

Yeah, as far as I’m aware, none of the existing solutions (broadly speaking, not specifically those for the Python ecosystem) specify anything more than “run this command”. If they did, they’d likely be better in practice, but still unlikely to meet the level of “we [an IDE] can run this for our user without making them read and approve the command manually first”.

Whereas “load API X from module Y that’s been installed from a package on PyPI” can be determined ahead of time to be trusted enough (by the tool’s choice of criteria, which might include past use in other projects, known package feeds, known tools, etc.; we can’t determine these criteria ahead of time). It’s an API that can be defended without needing users to intervene manually each time. And if an API that basically just does subprocess.run() gets approved, then so be it, but at least it’s probably been checked at least once and is attached to a package name on a central repository, rather than whatever the end user happens to have installed/configured that day.
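For concreteness, here is a minimal sketch of how a front end might resolve such an API, assuming (purely hypothetically) that backends advertise themselves via an entry point group called "task-backends"; none of these names come from an actual proposal:

from importlib.metadata import entry_points

def load_backend(name: str):
    # Hypothetical entry point group; the key point is that the API comes
    # from an installed, named distribution that a tool can vet up front,
    # rather than from an arbitrary command string in the project file.
    matches = entry_points(group="task-backends", name=name)
    if not matches:
        raise LookupError(f"no task backend named {name!r} is installed")
    return next(iter(matches)).load()

A tool could then decide whether to trust the backend based on which distribution provides that entry point, before anything is executed.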

1 Like

Hmm, the conflict of requirements might be too big here. I can see why tools might want a tightly locked down interface like that, but as a user I’d find something where I had to go via a trusted package like that too restrictive to be worth using. In practice, I’d either go looking for (or write) a package that ran some sort of arbitrary command line for me (defeating the object) or I wouldn’t use this feature (also defeating the object).

So unless there’s some sort of workable compromise between tool needs and user needs, I’m not optimistic that we can come up with a viable standard here :slightly_frowning_face:

4 Likes

If we allow arguments in the specification, then someone can write a tool that just runs the arguments :wink: That tool will probably find itself on a blocklist/warnlist, and potentially in malware scanners soon enough, but it is a pretty straightforward escape hatch (just like the in-tree hooks for build backends).

I would expect the major code quality/testing tools would provide a main(argv) style interface quickly enough that it wouldn’t be much of an issue. I’m certainly not proposing anything complicated, just that it doesn’t go via shell processing.
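Several of those tools already expose roughly this shape of API today. For example, pytest can be driven in-process with an argv-style list (shown here only to illustrate the “main(argv), no shell processing” idea; the arguments are placeholders):

import sys

import pytest

# pytest.main() takes an argv-style list and returns an exit code;
# no shell is involved at any point.
sys.exit(pytest.main(["-x", "tests"]))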

1 Like

You mentioned that VS Code would block it by default. Perhaps I’m misunderstanding the difference, but VS Code will happily run npm scripts, which from a quick test appear to be able to run arbitrary shell commands. I’m still not certain I see the problem, at least compared to the existing set of arbitrary mechanisms that projects are choosing to author their tasks with.

And as for many of the other questions raised, I feel like the most minimal first solution ought to make zero assumptions about, and take no actions on, the execution environment, including installation of dependencies (e.g. assume you’re already in a VIRTUAL_ENV/correct execution environment). It feels too fraught and likely to conflict with other tools to do much more than run the command.

Certainly I can see the value in being able to execute Python so that you can write (your example) better cross-platform tasks than arbitrary commands might yield, but it also feels like in a lot of cases it’s going to be a much more obnoxious restriction.

1 Like

I’m speaking hypothetically, though as I’m a member of the security team who reviews the behaviour of VS Code (among other Microsoft products) and makes recommendations on how to avoid risks to our users, I believe my hypotheticals probably carry a bit more weight. If or when it becomes a consideration, it will likely become my advice to the dev team, though I can’t force them into any particular design.

What about storing the command as an array and checking that no argument refers to a path outside of the working tree?

I also don’t see the concern here if we are considering allowing a new PEP 517-like API which can execute arbitrary code.

It’s the first argument that’s hard to validate. Are we going to restrict it to running scripts/executables that are in the working tree? Is it allowed to do PATH resolution? What about pipx? What about parameter injection or quoting rules that make it difficult to validate?

Basically, passing arguments with Python semantics makes it easy to validate. Passing arbitrary shell arguments for arbitrary shells is near impossible to validate.
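A rough sketch of what that argument check might look like, and of why it only covers part of the problem (the first element still has to be resolved and trusted somehow); all names here are illustrative only:

import os
from pathlib import Path

def args_stay_in_tree(args: list[str], project_root: Path) -> bool:
    # Reject any argument that resolves to an existing path outside the
    # working tree. This says nothing about what args[0] actually executes
    # once PATH resolution, launcher scripts, or tools like pipx get involved.
    root = project_root.resolve()
    for arg in args[1:]:
        candidate = (root / arg).resolve()
        if os.path.exists(candidate) and not candidate.is_relative_to(root):
            return False
    return True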

The greatest concession I’d make would be to allow subprocess.run to be a “special” backend that lets you provide an arbitrary command (specified as a list, not as a single string). But I’d much rather see tools provide their own backends for being invoked in this context.

See also @hroncok’s proposal:

3 Likes

Hi :wave:

I have contributed a lot to pdm’s scripts handling, and have done a lot of benchmarking and trials on this topic over the past years (lots of unpublished experiments).
I tried something years ago there, but I never reported back with the results of what I did (basically I had an entry-point-based task system working on custom branches of the most common tools), as it was not conclusive.
I wanted to start a new conversation, but I found this thread.
Furthermore, I tried to compare the existing tools and draft an idea of what could be a base for a standard pyproject.toml task management system. This is something I wanted to share and debate, so I’m sharing it here instead of starting another thread.

There are probably plenty of debatable things, and most likely typos, as this is also a condensation of multiple notes I have.
I would love to have some feedback on this and to take part in this discussion too.

6 Likes

Thanks for posting this. It’s an excellent summary and includes some examples I’ve not seen anywhere else in this discussion. The proposed mechanism looks clean and effective - possibly a little too close to PDM’s design for some people, but I’m fine with that personally.

The problem is that this reads far too much like a design document for a tool feature, rather than a standard. That’s not surprising in one way - project development scripts are a tool feature. But reading your proposal made me realise that’s exactly the problem I have with this whole discussion - we’re trying to design a feature, not develop a standard. And the feature is closely tied to tool user interface in a way that will cause problems for standardisation.

For example, a workflow tool that emphasises security would almost certainly have a big problem with allowing arbitrary shell commands as task definitions. And an environment manager that has ways of associating environment variables with a given Python environment could quite reasonably be unwilling to support .env files for tasks as well.

This highlights a problem with the whole “standardise what workflow tools do” approach for me. In order to use a workflow tool, you have to buy into all of the decisions that tool makes. You can’t like the task running capabilities but hate how it manages environments. You can’t love its emphasis on security, but be frustrated that you can’t just define a task as a shell command. People are very keen to use standards to define a uniform way to handle certain features, but that ignores the fact that there are different tools for a reason - we hear lots of talk of “one standard project management tool” but everyone seems to think that means “everyone will use the tool I like” rather than “I’m going to be forced to use a tool that I hate”.

I’d like to take a step back and understand why we even want to standardise how development scripts are defined. Looking at the “Motivation” section of your document:

  • Fragmentation in the ecosystem. Well, yes, but isn’t that just saying “there are a lot of tools competing in this area”? The fragmentation here is because the workflow tools market is fragmented, which in turn is because people want different things…
  • Lack of standardization. We want to standardise because we don’t have a standard? This doesn’t explain why we need a standard, though.
  • Reproducibility challenges. If reproducibility matters, require that everyone uses the same workflow tool. Again, this is simply a consequence of people having different preferences for what tool they use.
  • Tool migration overhead. This is often brought up, but is it honestly an issue? How often do people change what tool they use? And if they do, is the syntax for defining tasks really the big problem? Surely learning new commands and understanding the design differences that made you want to change in the first place are far more work than simply transcribing task definitions to a new format?
  • Limited interoperability. This one is the key for me, as it’s explicitly about interoperability. But for this, we need to work at a lower level - this came up in the thread about test dependencies where it was pointed out that a “test” task that says “run nox -s test” is useless for interoperability because (for example) Linux distros don’t want to depend on nox.

So of these, “interoperability” is the one that matters (to me, at least). But what are the use cases? The average user doesn’t care - they will simply run pdm run test or nox -s test, or whatever the project documentation says to type to run the test task. They have to know what to do to make the pdm/nox command available, but that’s part of setting up the development environment. The proposed solution doesn’t offer environment management, so you can’t write a command that runs the lower level pytest without the user needing to set up an environment (and know what to install in it). Linux distros have expressed a need to not use the project’s preferred workflow tool, and hit exactly this problem - without knowing how to set up the test environment, knowing “how to run tests” is useless.

Conversely, it’s possible today to write a test.py script, with script metadata defining the needed dependencies, which can be run to invoke the tests.

# /// script
# dependencies = ["pytest"]
# ///
import subprocess
import sys

# Propagate pytest's exit code so callers (CI, IDEs) can see failures
sys.exit(subprocess.run(["pytest"]).returncode)

Why is that not sufficient to define an interoperability standard - “Tests must be runnable by invoking a test.py script in the project root directory”? Or if you prefer, “projects MAY have a scripts directory at the top level, containing a set of Python scripts to run tasks; the available tasks can be determined by listing the directory - a task ‘test’ is defined in ‘test.py’”.
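A minimal sketch of what task discovery under that strawman convention might look like (the scripts/ layout and the task-name-from-filename rule are just the convention described above, not anything standardized):

from pathlib import Path

def discover_tasks(project_root: Path) -> dict[str, Path]:
    # Strawman convention: every scripts/<name>.py is a task called <name>
    return {p.stem: p for p in (project_root / "scripts").glob("*.py")}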

Remember - the goal here is interoperability, not “a friendly UI”.

I think we need to understand the use cases, if only to get a clear understanding of why the “named script” approach isn’t acceptable. I’m not pretending it is - it’s a strawman - but I genuinely don’t know what problems it has that the other proposed solutions don’t also have.

8 Likes

Thanks for the feedback :pray:

Let’s start with the “Why do we need a standard?”: interoperability, as you highlighted, is the key. I would go further and say user experience. And I can give some examples:

  • as a newcomer on a project, I want a well-known and quick way to discover the possible actions on a project (dev tasks, sample scripts, data science operations, whatever…)
  • as a developer, I want my IDE to be able to discover and use the same tasks
  • as a DevOps engineer, I want to be able to build a Python pipeline without having to handle a diversity of package managers and script runners
  • as a maintainer, I don’t want to rewrite all task definitions just because I change package managers
  • as a library developer, I would like to be able to reference existing development tasks to be run in tox against multiple Python versions without duplicating the task definition itself (same goes for nox)

And I would take the npm.scripts case as an example of what we can expect from it:

  • whether you use npm, yarn or pnpm, scripts will just work the same from the same definitions
  • migrating from one of those tools to another is almost free
  • whatever IDE I use, tasks are being autodetected and easily runnable
  • most reusable CI actions can launch scripts as hooks

Note that my proposal goes way further, as I have used npm.scripts a lot and find it useful but way too limited.

I think that the goal is not to provide an interoperable execution environment (the spec does not explain how you install dependencies, or those kinds of things), but an interoperable way of:

  • discovering tasks (sketched below)
  • executing them in a preexisting environment
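As a very rough sketch of the discovery half, assuming a hypothetical [tasks] table whose exact shape is precisely what this thread is debating, and using only the standard library:

import tomllib  # stdlib since Python 3.11, so no extra dependency for discovery

pyproject = """
[tasks]
test = "pytest"
doc = "mkdocs build"
"""

# In practice this would be read from pyproject.toml on disk.
tasks = tomllib.loads(pyproject).get("tasks", {})
for name, command in tasks.items():
    print(f"{name}: {command}")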

Setting up the environment is another topic to me; there can be so many ways to do it.
This is the reason why I did not include the notion of dependencies there.
It is possible to add a dependencies property having the same syntax as a dependency-group so it can reuse existing dependency groups or specific dependencies, but I think this deserves another discussion/PEP.

possibly a little too close to PDM’s design for some people

Actually, I also used poe a lot as a reference, as it is, I think, the declarative tool bundling the most features.
But it’s no wonder it is close to pdm: I contributed to this specific part, and pdm already allowed me to try and implement some of the ideas in this document. So it goes a bit both ways: it’s close to pdm, but pdm is also close to what I propose (and I actually proposed an evolution going in the same direction two days ago: Proposal: scripts and hooks improvements · pdm-project/pdm · Discussion #3344 · GitHub).

In order to use a workflow tool, you have to buy into all of the decisions that tool makes. You can’t like the task running capabilities but hate how it manages environments.

I don’t think it is about like or hate; it is actually the whole point of lots of PEPs:

  • PEP 621 gives a project metadata specification without explaining how to process it in a package
  • PEP 621 and PEP 631 give a dependencies specification without saying how to install them
  • PEP 735 does not say how to install dependency groups, but gives guidelines and a format specification

Each time, I think it is sane for those specifications not to step on these topics, as each is either the focus of another PEP or a responsibility delegated to the tools using the PEP.

But each of them tried to handle most if not all known cases.
This is also why I tried and compared many tools in the document: to have the widest possible vision of what is needed, what is used, and what is working (even if I know I haven’t covered all the tools, nor every usage relying on details or specific features, and I saw at least 2 tools in this thread that I didn’t test and am going to).

I want to say the spec is not the limit, but at least it provides a common ground to work on.

PEP 518 defined the tool.* usage; maybe there should be a similar convention for tasks, like “tools can extend, but they should prefix their keys with <tool>- (poe-*, pdm-*, …)”.

For example, a workflow tool that emphasises security would almost certainly have a big problem with allowing arbitrary shell commands as task definitions. And an environment manager that has ways of associating environment variables with a given Python environment could quite reasonably be unwilling to support .env files for tasks as well.

In my opinion, what you describe there is the execution environment, and not the purpose of this discussion (at least not in my vision).
The spec should provide a common ground for describing tasks, and it’s up to each tool to choose what it uses.
Ex: if you think that shell, call, env or envfile are not secure enough for the scope of your tool, just ignore them. The spec doesn’t say that runners must support all the properties; a tool documenting that, for security purposes, only cmd tasks will be executed, or that env will be ignored, doesn’t invalidate their presence in a standard. Users comply with the tool they use, while tools follow the available standards when possible.
It doesn’t mean that the spec should ignore security; actually, it should warn of the known possible security issues (spawning a shell or sourcing environment variables will always be a security risk of some kind; that doesn’t mean they should never be used, but users should be warned and aware of the risks).

I think we need to understand the use cases, if only to get a clear understanding of why the “named script” approach isn’t acceptable.

I see at least 4 reasons:

  • lack of interoperability with non-Python tooling (each one would need a parser for this format, while TOML is well known) (ex: I rely a lot on being able to query TOML from scripts without extra dependencies)
  • fragmentation: it’s not possible to have a list with details of the possible operations without opening each file to read the descriptions, while I can scroll through the same pyproject.toml to see all known tasks
  • overhead: pyproject.toml will most of the time already be loaded for multiple reasons, and the reading process just has to perform what’s in the task definition to run a task, while the named-script approach de facto imposes running a wrapping Python process. Listing the tasks means listing or globbing the directory contents and parsing each file
  • entry cost: while this kind of script is great for complex tasks, if I just want to create 2 tasks, test = "pytest" and doc = "mkdocs build", we already have 2 files to create, with some boilerplate to write (importing and using subprocess is already development; I can’t say to a translator or a designer who doesn’t really know Python “just add your script calling your tools”, and I’ll need to do it for them if I want it to be able to pass arguments, which involves going through sys.argv)

The self-contained scripts approach is great (PEP 722 or any other form), but it’s for single-file scripts, as opposed to pyproject.toml, which is a project descriptor. So I think that from the start both specs are going in different directions.
But I think a goal of this standard is to be able to reference these scripts.
Maybe this standard needs a bridge toward PEP 722 as soon as it handles dependencies, like a script property able to extract the metadata from it, but again, I voluntarily excluded the dependencies topic.

Remember - the goal here is interoperability, not “a friendly UI”.

I think one implies the other. We don’t do interoperability for its own sake but so that tooling can benefit from it, and one of the first and most direct consequences of interoperability is end users being able to use whatever tool they want (the one they think of as “a friendly UI”) while having that same tool able to integrate better (and so making this “friendly UI” more functional, and therefore more friendly :slight_smile: )

This is why I wanted to talk about user experience at the start of this post, and this is what drove me initially to this topic years ago.

4 Likes

Thanks for that summary @noirbizarre.

I recently contributed to a project that did use the “scripts in a folder” approach (invoking poetry run ... as necessary). So I think it would be reasonable to treat an npm.scripts style solution (with a command string to be executed) as the baseline capability since:

  • it can wrap the “run” commands of existing environment managers
  • it can call scripts with inline dependency declarations

A task-runner table to declare the external dependency for the first case may still be useful, but an in-process backend API wouldn’t technically be necessary if we only supported tasks statically listed in pyproject.toml.

Where a backend API would provide a benefit is as a way of bridging different task declaration mechanisms, and it may be better to instead define that as “workflow tools should automate syncing the static list of CLI tasks based on the native task definitions” rather than as something that is done dynamically when invoking the commands.

I use “scripts in a folder” a lot too, but the same as what you describe:

  • I reference each one of them as a task in pyproject.toml (tool.pdm.scripts in my case)
  • most of the time they are mixed with inline task definitions.

Maybe being able to give one or more glob patterns to explicitly list the used scripts could help. Something like tasks.scripts = "scripts/*.py". However, it would be different from the call approach, or maybe complementary, as the glob approach means 1 file == 1 script, while the call approach allows having multiple scripts in a single file since you reference a callable.

For the backend “task runner delegation”, I really like the use keyword used in dependency groups. I reused the same keyword for task composition.
So maybe we can have a root tasks.use that is mutually exclusive with task definitions. If present, it means task definition/handling is entirely delegated to another tool.
I see 2 possible approaches for its content:

  • either a command entry, in which case everything is forwarded to this command
  • or an entry point, which can be an existing scripts entry point or some new kind of entry point if just being runnable is not enough

Writing a Python script just to invoke a third-party command really feels like an antipattern (and a slight waste of time).

Well, as you say, people like to use “well known” ways of invoking tests, etc. Not project-specific scripts that they have to learn about and remember every time they make an occasional encounter with your project.

(this is part of why ./configure && make && make test has been so successful in the past, despite being based on absolutely horrid tooling)

3 Likes

Not once you find you need it to be cross-platform, or to handle the command being in different locations, or any of the variety of things that complicate it.

My experience has been that you’re very lucky to get away with less than 20 lines for something like this (and I’m largely limited to Windows and one Linux distro) :smiley:
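As a rough illustration of where those 20 lines tend to go, here is a minimal sketch of that kind of wrapper (the tool name, lookup order, and error handling are placeholders, not something from this thread):

import os
import shutil
import subprocess
import sys

def run_tool(name: str, *args: str) -> int:
    # Prefer a copy installed alongside the current interpreter
    # (Scripts\ on Windows, bin/ elsewhere), then fall back to PATH.
    bindir = os.path.dirname(sys.executable)
    candidates = [
        os.path.join(bindir, name + (".exe" if os.name == "nt" else "")),
        shutil.which(name),
    ]
    exe = next((c for c in candidates if c and os.path.exists(c)), None)
    if exe is None:
        print(f"error: could not find {name!r}", file=sys.stderr)
        return 1
    # No shell involved: the argv list is passed straight through.
    return subprocess.run([exe, *args]).returncode

if __name__ == "__main__":
    sys.exit(run_tool("pytest", *sys.argv[1:]))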

1 Like

I’m not sure what this is responding to, but the alternative AFAICT is between a per-project Python helper script and a standardized test script configuration in pyproject.toml.

Then you can write a 20-line Python script. For many users, though, this will just be pytest --pyargs mymodule.tests. And being able to put that directly in a pyproject.toml entry would be convenient both for library authors and for downstream users/redistributors of the library.

I don’t see why this needs, or even benefits from, standardizing. Are we proposing that the Python interpreter itself should learn how to run these? If not, then I don’t see why the tool table can’t be used by the runner of choice, or why projects might not use other things like justfiles (which are really just a modernization of makefiles that’s relatively popular because of how simple it is).

1 Like

So that generic front-end tools such as IDEs or CI systems can run your tests without having to be told in their own individual formats. It means you could open any repo in your IDE and hit “Run tests” and get what the author intended.

I was responding to the bit I quoted, where you suggested that writing a script just to call a tool was a waste of time. In my experience, it’s not a waste of time, because the “just to call a tool” usually turns out to be much more complicated than a single command is going to handle.

1 Like