If I’m following you correctly, you’re saying that the currently available solutions don’t meet the security standard that would be needed. We can’t just take some amalgam of rules and behaviors from existing tools, call that “the standard”, and focus on making that work, with room for other tools to extend it.
Either that or we’ve somehow miscommunicated.
In my mind, the starting point is something crude, simple, and derivative. At its core, nothing more than
If something like this is strictly off the table on account of the security concerns, then I’m not sure how the existing tools are okay. And if the existing tools are not okay, then I really don’t think we’re ready to try to build a standard yet.
I don’t need convincing as such. I’m pretty ready to trust that someone with more context can see the problems I can’t see. But I want to learn about what I don’t know.
That makes sense to me. It would probably be useful to enumerate some of the mistakes that you are referring to, if only because I (and possibly others) aren’t seeing them.
If these are the sorts of mistakes you’re meaning, then I’m still a little confused. Yes, of course the environment in which a task is run needs to be carefully defined. But that’s true whether the standard says how you take an argv array and run a command from it, or the standard says you can only run a Python function located in a specific way. It’s just that in the latter case, defining how the command is run is made the user’s problem. It feels to me that it’s better to have a standard answer that people can rely on, and not “you need to read the code”. After all, the way I imagine tasks working, there will always be a need to (say) run a git command, or something like that.
OK, that’s a good point. I want to say “just use an argv”, but I concede that even that isn’t portable on Windows (assembling an argv into a CreateProcess call is done by the application, and even saying “it should work like subprocess.run” begs the question of how tools written in JavaScript or Rust, for example, can ensure they follow that rule…)
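To illustrate the Windows point: CreateProcess takes a single command-line string, so any runner has to re-assemble the argv list itself. Python’s own rule is the `subprocess.list2cmdline` helper (following the MS C runtime quoting conventions); other languages implement their own, not-necessarily-identical versions, which is exactly why “work like subprocess.run” is hard to mandate cross-language.

```python
# On Windows, CreateProcess receives one command-line string, so the
# argv list must be re-assembled by the caller. subprocess.list2cmdline
# is the quoting rule Python itself applies; a Rust or JavaScript task
# runner would have to reimplement the same rule to match.
import subprocess

argv = ["grep", "hello world", "some file.txt"]
cmdline = subprocess.list2cmdline(argv)
# Arguments containing whitespace get double-quoted; plain ones do not.
```

A reviewer can at least point at this one function as the reference behaviour, but nothing forces non-Python tools to agree with it.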
Nevertheless, I still think that if we restrict tasks to Python function calls (or entry points, or anything like that) all we’re doing in practice is making users write their own implementations, and by so doing, making it even harder to be sure that what they are doing is safe.
To put it another way, while I agree we don’t want to make the mistakes other ecosystems made, I’m not sure we want to invent our own, potentially worse, mistakes either.
Yeah, as far as I’m aware, none of the existing solutions (broadly speaking, not specifically those for the Python ecosystem) specify anything more than “run this command”. If they did, they’d likely be better in practice, but still unlikely to meet the level of “we [an IDE] can run this for our user without making them read and approve the command manually first”.
Whereas “load API X from module Y that’s been installed from a package on PyPI” can be determined ahead of time to be trusted enough (by the tool’s choice of criteria, which might include past use in other projects, known package feeds, known tools, etc.; we can’t fix those criteria in the standard ahead of time). It’s an API that can be defended without needing users to manually intervene each time. And if an API that basically just does subprocess.run() gets approved, then so be it, but at least it’s probably been checked at least once and is attached to a package name on a central repository, rather than being whatever the end user happens to have installed/configured that day.
Hmm, the conflict of requirements might be too big here. I can see why tools might want a tightly locked down interface like that, but as a user I’d find something where I had to go via a trusted package like that as too restrictive to be worth using. In practice, I’d either go looking for (or write) a package that ran some sort of arbitrary command line for me (defeating the object) or I wouldn’t use this feature (also defeating the object).
So unless there’s some sort of workable compromise between tool needs and user needs, I’m not optimistic that we can come up with a viable standard here.
If we allow arguments in the specification, then someone can write a tool that just runs the arguments. That tool will probably find itself on a blocklist/warnlist, and potentially in malware scanners, soon enough, but it is a pretty straightforward escape hatch (just like the in-tree hooks for build backends).
I would expect the major code quality/testing tools would provide a main(argv) style interface quickly enough that it wouldn’t be much of an issue. I’m certainly not proposing anything complicated, just that it doesn’t go via shell processing.
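A toy sketch of the kind of `main(argv)` interface meant here. The tool name and behaviour are invented; the point is that the runner calls the function directly with a list of strings, so there is no shell-processing step at all for a reviewer to worry about.

```python
# Toy sketch of a main(argv)-style interface. "exampletool" is an
# invented name; real tools already expose similar console entry
# points. The runner passes argv as a list of strings and calls the
# function directly, so no shell parsing is ever involved.

def main(argv: list[str]) -> int:
    """Entry point a task runner could call directly."""
    if argv and argv[0] == "--version":
        print("exampletool 0.1")
        return 0
    # ... real tool behaviour would go here ...
    return 0

# A runner would invoke it as:
exit_code = main(["--version"])
```

This is the "nothing complicated" shape: same signature a console script already has, minus the process boundary and the quoting questions that come with it.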
You mentioned that VS Code would block it by default. Perhaps I’m misunderstanding the difference, but VS Code will happily run npm scripts, which (from a quick test) appear to be able to run arbitrary shell commands. I’m still not certain I see the problem, at least compared to the existing set of arbitrary mechanisms that projects are choosing to author their tasks with.
And to many of the other questions raised, I feel like the most minimal first solution ought to make zero assumptions about, or changes to, the execution environment, including installation of dependencies (e.g. assume you’re already in a VIRTUAL_ENV/correct execution environment). It feels too fraught, and too likely to conflict with other tools, to do much more than run the command.
Certainly I can see the value in being able to execute Python so that you can write (your example) better cross-platform tasks than arbitrary commands might yield, but it also feels like in a lot of cases it’s going to be a much more obnoxious restriction.
I’m speaking hypothetically, though as I’m a member of the security team who reviews the behaviour of VS Code (among other Microsoft products) and makes recommendations on how to avoid risks to our users, I believe my hypotheticals probably carry a bit more weight. If or when it becomes a consideration, it will likely become my advice to the dev team, though I can’t force them into any particular design.
It’s the first argument that’s hard to validate. Are we going to restrict it to running scripts/executables that are in the working tree? Is it allowed to do PATH resolution? What about pipx? What about parameter injection or quoting rules that make it difficult to validate?
Basically, passing arguments with Python semantics makes it easy to validate. Passing arbitrary shell arguments for arbitrary shells is near impossible to validate.
The greatest concession I’d make would be to allow subprocess.run to be a “special” backend that lets you provide an arbitrary command (specified as a list, not as a single string). But I’d much rather see tools provide their own backends for being invoked in this context.
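A sketch of what that “special” subprocess backend might look like, under my own assumptions: it accepts only a list argv, never a single string, so no shell is ever involved and the command is unambiguous for a reviewer. The function name is invented for illustration.

```python
# Sketch of the "special" backend conceded above: it runs an arbitrary
# command, but only if the command is given as a list of strings
# (never a single string), so there is no shell and no quoting
# ambiguity to validate. The function name is hypothetical.
import subprocess

def run_command_backend(argv: object) -> int:
    if isinstance(argv, (str, bytes)) or not isinstance(argv, list):
        raise TypeError("command must be a list of arguments, not a string")
    if not all(isinstance(a, str) for a in argv):
        raise TypeError("every argument must be a string")
    # shell=False is the default: argv is passed to the OS as-is
    return subprocess.run(argv).returncode
```

Rejecting plain strings outright is the design choice doing the security work here: it removes the "was this meant for a shell?" question entirely, which is the part that is near impossible to validate.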