PEP 722: Dependency specification for single-file scripts

jamestwebber · July 30, 2023, 8:47pm

This part has been nagging at me for a while now but I wasn’t sure how to put it. To me, the current proposal doesn’t totally solve the “single file” requirement for distributing scripts because the user still needs the correct Python version installed. They also need to install a third-party tool, but this could change in the future with e.g. pip run.

The colleagues I’m most likely to share a single-file script with are the sort who don’t know the difference between their macOS system Python and a conda env. So this removes one step from the setup process but there are still a few remaining, and I’m not sure this one is our biggest problem.

That said, it’s clear that other people have quite different workflows and this might work in other environments (like, a place where everyone has a consistent user-space Python installed on their machine).

kknechtel · July 30, 2023, 10:49pm

It doesn’t need to be. However, as I understand it, the motivating idea here is that distributing a single file is convenient in ways that distributing more than one file at a time isn’t; and the goal is therefore to provide a way to distribute single files (and have them “just work” on the receiving end, as long as there is a compatible runner in place) without an explicit packaging step.

FWIW, I proposed a TOML-in-comments approach earlier. It has the advantage that it doesn’t really require any discussion or standardization; it only needs the existence of a tool (or integration into an existing script runner) to detect that comment and make a separate file from it; then all existing tooling works normally. However, it’s noticeably less convenient to write, and implementing the “name and version are optional/inferred” functionality requires some design.

pradyunsg · July 31, 2023, 12:35am

(I definitely don’t want this to be an extended side-track – if someone wants to talk about what PEPs are and how subjective they should be, let’s split that off into a separate thread)

PEPs are ~always subjective + full of “judgement calls” made by the authors. There are ~always tradeoffs in the designs worth putting into PEPs, and the PEP is a proposal for how to deal with the problem at hand.

The question is whether we think it’s overall beneficial given the tradeoffs (with the specific design being proposed based on what the PEP author has written^[1]). Figuring things out and discussing the design details is why we write these PEP-style documents and discuss them.

If someone wants different choices/tradeoffs picked here, they can either make the case for them here (as I did earlier in this thread) or write a competing PEP to cover the usecase^[2]. Usually, explicitly voicing your concern to the PEP author is sufficient, given that it’s done publicly (and isn’t unnecessarily repetitive, as many replies in this thread have been).

How various people feel about this is taken into consideration by the person(s) responsible for making a decision on the PEP (SC or, in our case, the delegates). It’s definitely happened that a decision is deferred because a PEP doesn’t cover the “rejected ideas” sufficiently, though that isn’t a problem here.

[1] The author is supposed to take inputs from the discussion, which Paul certainly has.
[2] The social contract is such that doing this can be perceived as confrontational, so be mindful of that.

ofek · July 31, 2023, 12:54am

Some notes about that option that I couldn’t edit because the thread is in slow mode:

now that I think about it if that were a thing I would definitely use it
if projects have such a desire for random scripts that would be checked in then existing dependency tooling would work out-of-the-box like version upgrade automation and security scanners
tooling that can add/remove dependencies via CLI would have a clear path to supporting script management also

BrenBarn · July 31, 2023, 4:14am

Sure, and I didn’t mean to suggest otherwise. Just saying that I think some of the perceived pushback or unexpected disagreement has to do with the details of how the PEP proposes to do things, and not a rejection of the idea of single-file scripts or of running/distributing them.

facundo · July 31, 2023, 7:14am

If there is a standard to specify in-script dependencies, fades most probably would follow it. The current specification looks simple and flexible enough.

FRidh · July 31, 2023, 8:23am

Consider the FHS and wanting to put a script in */bin: */bin should contain only executables, not non-executable files.

pradyunsg · July 31, 2023, 8:36am

No, it’s not? I’m very curious what makes you say this.

The PEP, at no point, implies that it is meant to be about distributing any script file. Searching for “dist” in the PEP gives me 0 results. The Rationale section has the following sentence…

Having to consider “uses 3rd party libraries” as the break point for moving to a “full scale project” is impractical, so this PEP is designed to allow a project to use external libraries while still remaining as a simple, standalone script.

… which clearly indicates that it’s not about distribution but about usage; and about having a place to put supporting information in a discoverable manner so that running a script becomes easier. It’s even been stated by the author that this isn’t trying to address script file distribution.

pf_moore · July 31, 2023, 12:36pm

As this post is in slow mode, it’s hard for me to respond to individual points here in a manner that keeps individual topics separated, so what I’m going to do is try to pick out some key comments and respond in a single message. However, I will say that I’m currently doing a major overhaul of the PEP to try to incorporate the various comments made here. Just to be clear, the proposal itself is essentially unchanged, but I want to make sure the rationale and motivation are as clear as I can make them, to avoid the misunderstandings that have come up repeatedly in this thread. I also want to make sure the “rejected alternatives” section takes into account the questions that have been raised.

I don’t expect we’re done with discussion yet (although hopefully “slow mode” will allow people to take more time to think before posting - I know I’m taking advantage of that) so I doubt the next revision of the PEP will be the final form, but hopefully it will be a lot closer.

OK, so onto specific points.

This is very much the case, and honestly, the confusion is my fault, because my motivation has always been the “better batch files” scenario. I only think of distribution in the sense that I’d email a .bat file to someone (or post it in a gist) and I’d like to email a .py file in the same way. But I didn’t make that at all clear in the PEP. I’m still struggling to explain that motivation well, as I also want to emphasize the fact that this is simply standardising existing practice, and not proposing any new functionality. The two explanations are difficult to combine without then giving the impression that I expect to see a huge explosion in the use of tools like pip-run (something else that people have misread into the existing PEP…) So yeah, this is a work in progress right now.

This comment (and variations of it) has come up a lot, and it seems like a huge exaggeration. I’m genuinely baffled as to why people think this PEP is going to have such a massive impact on the ecosystem. To me, it’s a relatively minor tidy-up, and the only way it would have a big impact is if it suddenly triggers a lot of interest in an important use case that we’ve so far ignored. So the people arguing against the PEP on the basis that it’s going to be disruptive almost look to me like they are saying “please let us continue ignoring this important use case because we don’t want to deal with the consequences”

While I’d like it if we, as a community, chose to work on improving the “python as a better batch file” workflow, I’m also perfectly happy if we continue to leave it to tools to experiment and innovate for now. This PEP is assuming that the latter is what will happen. I have no personal interest in championing something as big and controversial as a PEP to do the former, but I’ll support^[1] anyone who wants to try.

Thank you for this perspective. I’ve hesitated to say this (because I’m not a typical end user) but I also responded to the survey, and I also agree with the “too complicated” idea, but not with the way it’s being used at times to hinder progress. Interpreting results from a survey like the packaging one is a complicated, specialised skill (I’ve witnessed UX specialists doing precisely that, and I know I couldn’t have done what they did) and while I think it’s important that we heed the results of the survey, I also think it’s critical that we don’t simply use the survey to reinforce our own prejudices - I’m sure I could find arguments in support of this PEP by (selectively) quoting the survey, but I don’t think it’s a useful way of making my points.

We’ve had a number of people present reasons why having everything in a single file is a key requirement. I’ll hunt out as many as I can find, and add them to the PEP. But for now, let’s just say I’ll be adding a rejected option to the PEP of “Store the dependency data in a separate file”. If you (or anyone else) want to propose a solution that uses a separate file, I suggest writing it up as an alternative PEP, and in particular going through the points made here and addressing them.

Nobody’s assuming that. There have been specific reasons given. The most obvious one is people moving (or sharing) a script and forgetting about the extra file (and no, “just don’t forget” isn’t a reasonable solution for this). But there’s also the case of directories like /usr/bin where (by convention or design) everything in that directory is treated as an executable. And pipx allows running a file from a URL (such as a gist), where you can’t even necessarily work out where the associated dependency file would be.

I will add one to the PEP. But to a certain extent it boils down to “no existing tools provide this, so the arguments that it’s essential are weak, and it can be added later in a different PEP if it turns out that there is a critical need that existing tools don’t address”. That, plus I want to keep the PEP focused on “formalising things that already exist” and not “defining new functionality”.

Projects with “random scripts” can already use nox, tox, hatch, or any one of a plethora of environment managers. Or pipx/pip-run. What extra does this add to that mix? You say “existing dependency tooling would work out of the box” - but they’d still need to support all of the existing approaches. And in any case, tooling could support this approach, but they’d still get asked to support single-file scripts.

I’m not the one arguing that the survey demands fewer solutions, but I’d expect a bunch of people here to raise the same objections to this proposal that they do with PEP 722 in terms of “adding more ways of doing things”. At least I’d hope they do - otherwise I don’t see how they can argue that they are being consistent when objecting to the PEP…

[1] Possibly with a lot of feedback on problems with their proposal, but hey, constructive criticism is good too.

kknechtel · July 31, 2023, 12:52pm

I think I’ve been misrepresented here, because in fact I agree that your use case has merit.

What I’m concerned about here is that people will see script runners as part of toolchains, and then everyone’s script runner will be expected to do the thing, because the PEP exists. And the point here is - yes, the use case has so far been ignored, and we could solve problems for people by paying attention to it. But if we do it this way, we lose reuse: since the requirements are now not being specified by the “write a pyproject.toml that includes a requirements specification” method, now there needs to be a separate process to parse those requirements.

But then, I need to consider this in the context of what @pradyunsg pointed out to me:

See, this is the part where I get confused. Because if it isn’t about distribution, then I don’t understand what the “important use case” actually is. It sounds like what you’re worried about is that someone writes a single-file script that needs, say, pandas or requests, and then doesn’t want to have to switch to “a full project” because of the need for that requirement - as described:

…But if the point isn’t to keep everything in a single file for the sake of keeping everything in a single file (and the only reason for that which makes sense to me is what I said earlier:)

in this case, I don’t see why there is a problem with using a separate file. I think that “moving to a full scale project” is a canard here, because there isn’t a demand to go from one file to an entire project template; there’s a demand to go from one file to two (the source and a skeleton pyproject.toml that includes a project.dependencies entry along with the other minimum requirements).

Unless, perhaps, this is purely meant to work around “I don’t think it should be necessary to fill in anything else just to list dependencies, but previous PEPs mean I can’t have that”?

DavidCEllis · July 31, 2023, 2:23pm

Thanks Paul, it makes more sense from that angle.

I was mostly looking for a better “wrong version” experience than deciphering a SyntaxError or ModuleNotFoundError (alongside potentially not doing any work creating a virtualenv which wasn’t going to be useful). Perhaps if we end up with something like PEP 711 (PyBI: a standard format for distributing Python Binaries) providing interpreter binaries on PyPI, it would be worth revisiting, as then it may be easier for tools to provide a solution rather than just a nicer error.
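To illustrate the “nicer error” half of this, here is a minimal sketch of the check a runner could make before creating any environment. It assumes the runner has already obtained a minimum version from somewhere; the PEP as discussed in this thread does not define such a field, and `version_problem` and its parameters are made up for this example.

```python
import sys
from typing import Optional, Tuple


def version_problem(minimum: Tuple[int, int],
                    running: Optional[Tuple[int, int]] = None) -> Optional[str]:
    """Return a readable message if the interpreter is too old, else None."""
    if running is None:
        # Default to the interpreter executing this check.
        running = (sys.version_info[0], sys.version_info[1])
    if running >= minimum:
        return None
    return (
        f"This script needs Python {minimum[0]}.{minimum[1]} or newer, "
        f"but is being run with Python {running[0]}.{running[1]}."
    )
```

A runner could print the returned message and stop, instead of letting the script fail later with a SyntaxError.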

h-vetinari · August 1, 2023, 12:53am

Looking at the language survey posted further up, and assuming the dependencies absolutely cannot be specified in a sidecar file (I can see the */bin argument, for example^[1]), can we follow the rust example of basically putting cargo.toml in a special comment?

#! /usr/bin/env rust-script
//! ```cargo
//! [dependencies]
//! leftpad-rs = "1.2.0"
//! ```

As in: make the magic comment a bit less magical by having something like

#!/usr/bin/env python
# In order to run, this script needs the following 3rd party libraries
# [regular comment line started by `#` only, special lines below started by `#!`]
#! ```toml
#! [requirements]
#! requires-python = ">=3.9"
#! dependencies = [
#!     "numpy>=1.22.4",
#!     "requests",
#! ]
#! ```
import numpy as np
import requests
print("Hello world!")

This would make it:

easy to extract (all lines starting with special comment #!)
easy to parse (after extraction it’s just a regular toml file)
consistent with pyproject.toml

There’s again lots of bikeshedding to be had (the special comment, how to embed the toml file, whether to leave out the main table or how to name it (e.g. just reuse [project] even though it’s only a script), etc.), but basically that would address a lot of the concerns about tooling divergence for me personally.

[1] Even though I shudder to think about having scripts that go off and install something in there, especially if they happen to be executed with the “wrong” wrapper (e.g. without a virtual env).

brettcannon · August 1, 2023, 1:36am

I am literally an “IDE and editor author”, so the answer is yes and I support this PEP (or something like it).

I don’t, and see below as to why …

Because this is about lowering the ceremony around getting a simple script to work that just wants one or two external dependencies to run. Every extra file, extra motion, etc. required to make this use case work waters down the usefulness and increases the leap from “just using the stdlib” to “using one external dependency”. The use case this is meant to tackle is scripts where your testing is, “does the output look right?”, thus you literally may have one, maybe two or three external dependencies. Need to download some CSV file and process it? You probably just want httpx or requests. Need a little data analysis? Might toss in pandas. But you very likely are not going to have a test suite. You are not going to have any CI for this. The point is that adding another file at this workload size feels like overkill when it’s a single .py file sitting in some directory with some files you happen to use to process those e.g. CSV files.

The mental exercise I’m doing around this PEP and the various alternative proposals is what would it take to add requests as a requirement to a script? That’s the use case I think we are targeting here and the one I think people should be optimizing for here.

kknechtel · August 1, 2023, 4:46am

Okay. My stance has shifted.

Between these concrete examples of reasons to have a single file:

, and the fact that sometimes these single-file scripts might share a directory (such as, again, /usr/bin plausibly, supposing that the single-file script has a shebang line that explicitly invokes a script runner)

, and this description:

I am on board, and I want the syntax to look like it is in the proposal, not like ordinary requirements TOML.

Rationale: this syntax is clearly easier to work with for the cases described, and doesn’t make impositions that aren’t required outside a packaging context (name and version specifications, in particular).

As I understand the proposal, it’s entirely up to external tools to determine an interpretation for the comment; this just standardizes a format so that it can become a target for tooling.

A script runner might parse it and directly install requirements in a venv.

An IDE could suggest things that belong, based on its static analysis and knowledge of popular third-party libraries (up to the IDE’s discretion). Perhaps it might even have a database of information about deprecations, so it can warn that certain versions are necessary for the functionality requested in the code.

A full development toolchain might offer a migrate command that parses the same block and uses it to generate a skeleton pyproject.toml with dummy name and version.
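
To make the script-runner case concrete, here is a minimal sketch of reading the block using only the standard library. It assumes the format from the PEP text: a `# Script Dependencies:` comment line followed by comment lines holding one requirement each. It does not validate the specifiers, and `read_dependency_block` and `install_command` are names chosen for this illustration.

```python
import re

# Assumption: the block starts with this marker comment (case-insensitive).
MARKER = re.compile(r"^#\s+script\s+dependencies:\s*$", re.IGNORECASE)

SCRIPT = """\
#!/usr/bin/env python
# Script Dependencies:
#     requests
#     rich  # for pretty output
import requests
"""


def read_dependency_block(source: str) -> list:
    """Return the requirement strings listed in the dependency block."""
    lines = iter(source.splitlines())
    for line in lines:  # skip ahead to the marker, if there is one
        if MARKER.match(line):
            break
    deps = []
    for line in lines:  # continues from the line after the marker
        if not line.startswith("#"):
            break  # the first non-comment line ends the block
        # Drop the leading "#" and any trailing " # inline comment".
        spec = line[1:].split(" # ", maxsplit=1)[0].strip()
        if spec:
            deps.append(spec)
    return deps


def install_command(venv_python: str, deps: list) -> list:
    """Build (but do not run) the pip invocation a runner might use."""
    return [venv_python, "-m", "pip", "install", *deps]


DEPS = read_dependency_block(SCRIPT)
```

A runner would create the venv first and then pass its interpreter path to something like `install_command`; an IDE or a migration tool could reuse the same parsing step.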

Because this format describes something more specialized (only the requirements), and that thing doesn’t have any complex structure (it’s just a plain sequence of requirements specifiers), the TOML format is unnecessary overhead for the writer.

However, conditional on the PEP’s acceptance, I would strongly prefer for the specific task of “bootstrapping” pyproject.toml to be implemented by a standard tool that either ships with Python or at least can trivially be bootstrapped into the Scripts directory. This is about as clear a case of “There should be one-- and preferably only one --obvious way to do it.” as I can imagine. Sometimes projects do take that step, and there’s no good reason not to facilitate doing so.

My preference there is strong enough, actually, that I’d like to volunteer to write such a tool.

Separately, I also like the idea of being able to have other .toml files in the directory that use this format for locally relevant purposes, but the name pyproject.toml is privileged by the packaging ecosystem - i.e., Pip looks for that name, therefore you need to have one to get onto a Pip-compatible package index, and tooling is intended to use it for the process of building a package for such an index. (As such, the tool I describe above should have an option to use a different filename, but default to pyproject.toml.)
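
A minimal sketch of what the core of such a bootstrapping tool might look like, assuming the requirement strings have already been read from the script. The function names, the placeholder name and version, and the use of `json.dumps` for quoting are all choices made for this illustration.

```python
import json
from pathlib import Path


def render_skeleton(dependencies, name="placeholder-script", version="0.0.0"):
    """Build minimal pyproject.toml text; name and version are dummy values."""
    lines = [
        "[project]",
        f"name = {json.dumps(name)}",
        f"version = {json.dumps(version)}",
        "dependencies = [",
    ]
    # json.dumps is used as an approximation of TOML basic-string quoting;
    # it is adequate for ordinary requirement specifiers.
    lines.extend(
        f"    {json.dumps(dep, ensure_ascii=False)}," for dep in dependencies
    )
    lines.append("]")
    return "\n".join(lines) + "\n"


def write_skeleton(directory, dependencies, filename="pyproject.toml"):
    """Write the skeleton; `filename` defaults to the privileged name."""
    path = Path(directory) / filename
    path.write_text(render_skeleton(dependencies), encoding="utf-8")
    return path


SKELETON = render_skeleton(["requests", 'tomli; python_version < "3.11"'])
```

The `filename` parameter corresponds to the “option to use a different filename” mentioned above; everything beyond the `[project]` table is left for the user to fill in.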

thejcannon · August 1, 2023, 12:55pm

I’m not sure it’s quite one-or-the-other. Just like how tools already support the requirements block this PEP proposes, at least one tool already supports import mapping ^[1].

And, bikeshedding on implementation aside, the big pros of scraping imports are that the metadata can’t drift from reality as far (remove an import and you removed a dependency, vs having to remember to remove the line at the top), and less duplication.

In the context of this PR, however, I do think it’s worth adding a section to the alternatives that mentions that even if such a tool were widespread, the proposal would still exist because …

[1] I’ll try and work on getting it standalone in the next week or so, to prove it’s easy and fast (thanks Rust!)

ofek · August 1, 2023, 1:05pm

I would be a soft +1 if the proposal was changed to use this and if we say that the comment MUST be at the end of the file.

pf_moore · August 1, 2023, 1:35pm

I’m -1 on putting the block at the end of the file, so that’s not going to be in the PEP I’m afraid (no reason other than personal preference). If you prefer putting it at the end, you can, but it’ll be user choice, not mandated.

As to the other part of your point, what precisely do you mean by “this”? Presumably not cargo.toml. Do you want a full pyproject.toml embedded in the file? What’s the use case? If you need a full pyproject.toml, why wouldn’t you be able to use a project directory? The whole point of this proposal is that it’s aimed at “better batch file” simple scripts, where almost everything in the pyproject.toml makes no sense. If we say that we allow an “embedded pyproject.toml”, we’re bound to get users getting confused because not everything works (“I tried to set up a hatch environment for my script using tool.hatch.envs and hatch doesn’t recognise it”, or “I put a custom ruff config in foo.py and ruff is ignoring it”, or “I embedded the pyproject.toml for mylib.py, and pip won’t build it”, …).

The rust example is slightly misleading, because as a compiled language Rust has to do a build (even for running a single-file script) so having a build config file is reasonable.

So what exactly are you proposing we embed here?

epage · August 1, 2023, 4:29pm

For context, I’m the author of Rust eRFC #3424 for integrating single-file package support (including dependencies) into cargo.

The rust-script syntax that was given is for embedding the manifest / Cargo.toml directly into the file by using a markdown code fence in the module’s doc comment. Using #! would not be the equivalent in Python; the equivalent would be putting it in the module’s doc comment and using … rST? markdown? syntax for a literal block. However, we are concerned about taking that approach because we want to allow people to do this with libraries, and then we would dirty up people’s documentation with implementation details.

My Pre-RFC contains a lot of details for different trade offs for different ways of embedded manifests or lockfiles (more equivalent of this proposal) within files, particularly the Unresolved Questions section.

Some relevant thoughts

Bug reports and educational material are very big use cases in my mind.
We’ve also gotten a lot of feedback about process overhead with having smaller packages, so we are looking at supporting this for libs and supporting publishing them. In Python terms, this would be to have pyproject.toml support
- However, we have a much more opinionated build system, which makes it easier to default a bunch of fields so that the syntax overhead is low
Most likely, tooling will want to edit and not just read, so consider that workflow as well

daylinmorgan · August 1, 2023, 8:54pm

I have a similar app in this space, viv (repo), that helps me solve this one-off dependency scenario for python scripts.

If PEP 722 is accepted and the format remains simple to parse (without the need for a format parser like TOML) I will add support for it as well.

brettcannon · August 1, 2023, 8:56pm

Yes, I think that’s true and your summary of what various tools would do with the information is good.

It is for the PEP as I don’t see Paul suggesting two different formats be simultaneously standardized and supported.