On top of the virtual environment requirement that was pointed out, this also assumes pip is installed in that environment. I could very easily see this being used with a virtual environment that lacks pip, in order to speed up virtual environment creation and save on disk space (you can do the install externally via --target after the environment is created).
That assumes venv has pip available at creation time. I think you definitely should be creating virtual environments as part of this, but I don’t think they need to be made into a single step.
I don’t think it’s that bad (assuming this untested code works):
import tokenize

with tokenize.open(py_file) as file:
    requirements = None
    for line in file:
        # Start collecting once the header comment is seen.
        if requirements is None and line.strip() == "# Requirements:":
            requirements = []
        elif requirements is not None:
            if line.startswith("#"):
                # Keep non-empty comment lines as requirement specifiers.
                if comment := line.removeprefix("#").strip():
                    requirements.append(comment)
                continue
            # The first non-comment line ends the block.
            break
I think that’s a great point! If you’ve muddled your code in that way then you can simply fix it, since it’s a single script that you can look at and that is directly under your control. This isn’t going to be buried in some 3rd-party package that’s accidentally causing you issues with this.
Ah, okay, it wasn’t clear to me that you were only focused on running scripts “in situ”. Although later you say something that seems a bit different, so I’ll respond to that with another way I sometimes handle this stuff.
For what it’s worth, yeah, I do the same kind of thing and I also find it annoying to deal with, so it would be cool to improve the situation.
Yeah, I certainly do that. Although there’s a bit of a catch-22: because doing that is often so cumbersome (for the reasons you’re trying to address with this PEP), I’ve become once burned, twice shy, and more often create directories just to be on the safe side!
However, in your description of how you use these scripts, I see “copying to a different PC” as really a form of distribution, because if you do that you’re transplanting the script into a different environment. Of course, the goal is to insulate the script from that environment by running it in some kind of virtual environment, but intrinsically things are going to be different on a different system (most obviously, it could have a different version of Python). So to me it seems like a possible happy medium to say that there may be some kind of minimal install process, but that process may effectively just be a short way to type a series of venv-creation commands. But crucially that install process would be totally “local”, in the sense that it wouldn’t be installing anything globally or even into a shared venv; it would only be working with the files you tell it to work with and with an environment (created by the install itself) just for that.
So let me describe how I handle a similar situation, without the “copy to another computer” part, and then handwave a bit about how maybe something like that could be extended to a barebones script “installer”.
I use conda. I have an everyday “kitchen sink” conda environment where I have installed a bunch of libraries I frequently use. But that environment isn’t activated systemwide, so if I write a standalone script, or just have some fragments of stuff that aren’t even worthy of being called a script yet, and I want to run it (often from an editor), I still need to somehow specify that it be run in that environment. And sometimes I’m working on something that needs its own environment (e.g., because for whatever reason it needs a different version of something than what I have in the kitchen sink).
So what I did is I wrote a little glue tool that looks for a line that starts with # conda-env: and expects the name of a conda environment after this. And then I can run this tool on my script and it will figure out the env and then use conda run to run the script in that env.
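Stripped down, the glue tool is essentially just this (a rough sketch rather than my exact code, and untested in this form):

import subprocess
import sys

def run_in_conda_env(script_path):
    # Look for a "# conda-env: NAME" line and run the script in that env
    # via "conda run". (Sketch only; my real tool is stricter about where
    # the line may appear.)
    with open(script_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("# conda-env:"):
                env_name = line.removeprefix("# conda-env:").strip()
                break
        else:
            sys.exit(f"{script_path}: no '# conda-env:' line found")
    subprocess.run(["conda", "run", "-n", env_name, "python", script_path], check=True)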
Now there are some disadvantages to this relative to something along the lines of the proposed PEP:
there’s nothing that actually declares or knows the dependencies, they’re just assumed to be in the specified environment
the script depends on the environment having a particular name
However there are some advantages too:
the script only has to be “embellished” with a single line referring to the environment, rather than a full list of dependencies
because it’s just one line, it’s easier to be strict about where it can occur[1]
Being small and strict drastically reduces the danger of unintended consequences of the type I mentioned in my earlier posts (e.g., with a # inside a string literal). That danger can be reduced further by having a longer and less collision-prone prefix (which is more tolerable here since you only have to type it once).
Now, what I’m thinking is (and this is just pie in the sky dreaming here, I haven’t tried any of this), what if something like this were combined with a “barebones” dependency file similar to requirements.txt or the format proposed in your PEP? So the dependencies would be in a separate file, but they’d still be in a simple-as-possible text format, and then instead of listing the dependencies in the script, we just have the script reference this external dependency file. So it’d be something like this:
# this is script_requirements.txt
great_lib
useful_lib
and
# this is script.py
# script-deps: script_requirements.txt
import great_lib
import useful_lib
def handy_function():
    print("And so on...")
Then you could run some tool like pipx a la pipx --read-env script.py and it would set up the environment and/or run the script in the environment if it was already set up. Copying the script to another computer would then require copying two files, not one, but you could put them anywhere, as long as they remained side by side and didn’t have a name clash with other stuff in the same directory.
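The environment-provisioning side is where all the real work is, but the “link the script to its dependency file” part could be as small as something like this (a rough sketch; the comment marker and behaviour are just what I imagined above):

import pathlib

def read_script_deps(script_path):
    # Find a "# script-deps: <file>" comment and return the requirements
    # listed in that file (one per line, ignoring blanks and comments).
    script = pathlib.Path(script_path)
    for line in script.read_text(encoding="utf-8").splitlines():
        if line.startswith("# script-deps:"):
            deps_file = script.parent / line.removeprefix("# script-deps:").strip()
            return [
                stripped
                for raw in deps_file.read_text(encoding="utf-8").splitlines()
                if (stripped := raw.strip()) and not stripped.startswith("#")
            ]
    return []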
There are some usability questions that opinions might differ on:
How important is it to strictly have to copy just one file (the script file)? For me, this usually isn’t so critical; what’s more critical is not having to carefully maintain some particular directory structure, so “copy two files and keep them together” would probably be okay.
How important is it to be able to run the file directly with python script.py, rather than something like python -m scriptrunner script.py (where “scriptrunner” is a pipx-like tool that handles the environment matters)? It seems like any system like this has to rely on something already being installed, because the dependency info is useless without some tool to provision the environment. So it’s just a matter of having those required tools also be able to link up the script with its dependency file.
At this point, how much is gained by keeping it as a stripped-down text file vs. something like pyproject.toml? With tomllib in the stdlib now, it seems like the main problem with using pyproject.toml is it has to be named pyproject.toml, which means you can’t have more than one in the same directory. But if it could just be an arbitrarily-named TOML file, then it doesn’t seem that much more verbose to use that, and it might be a little more robust (see the tomllib sketch after this list).
There’s still a coupling of two files which could get out of sync (e.g., if you rename the dependency file you have to fix it in the script). My own feeling is that this is sort of unavoidable, as trying to actually pack the dependency info directly into the script file just seems too messy to me. The next simplest thing is to have one file that knows about one other file, but impose no additional structure (e.g., directories).
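To make the tomllib point concrete, reading an arbitrarily-named TOML dependency file really is only a couple of lines (the file name and key here are made up for illustration):

import tomllib

# Hypothetical arbitrarily-named TOML file sitting next to the script.
with open("script_requirements.toml", "rb") as f:
    requirements = tomllib.load(f)["dependencies"]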
Like I say, this is just handwaving. But my goal here is to try to probe a bit at which desiderata are most essential and which can be flexed a bit.
[1] the way I have it set up, it actually requires that the first character of the line be # and that it be preceded (if at all) only by lines whose first character is also #
The PEP example script uses a ‘#!/usr/bin/env python’ shebang. Isn’t this inconsistent with the idea that a script specifying its requirements this way should be invoked with ‘pip-run’ or an equivalent tool?
No good reason, I could change it to that. It may even be what pipx does already, now that I think about it.
Nowadays, the --python option is better than --target for this.
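For concreteness, something along these lines (assumes pip 22.3+ for --python; the interpreter path shown is POSIX-style; requests is just an example package):

import subprocess
import sys
import venv

# Create a pip-less venv (quicker to make, smaller on disk), then install
# into it using the outer interpreter's pip pointed at the new environment.
venv.create(".venv", with_pip=False)
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "--python", ".venv/bin/python", "requests"],
    check=True,
)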
Cool. So how do you ensure the dependencies of these scripts are available? Do you use a shared environment and dump all the dependencies in there? What if there are conflicts? That’s the problem I’m looking at here.
That’s just an example, which can be fixed. I’ll probably just remove the shebang to avoid confusion.
I’m not trying to solve the “how do I run python scripts” problem here. That has much bigger challenges, especially on Windows, which is what I care about. This PEP is purely about “can we standardise the format for specifying dependencies that pipx and pip-run use?”
I’d prefer if it also allowed the requirements to be listed in the module doc-string.
Separate files and directories are just unrealistic in many cases. The real alternatives in practice are then being restricted to the standard library or mystery dependencies. So the comment / doc-string hopefully is already there anyway.
I hope a tool can be bundled with Python that supports this, or an already bundled tool can add support for this. (If not pip, or py launcher, maybe venv: py -m venv -r script.py?) That can be done later of course.
I think all that does (compared to my approach in pipx) is handle encoding detection. You can make it work using tokenize.generate_tokens, but it’s not entirely trivial (you need to take care to ensure you ignore trailing comments, for example). The PEP probably does need to discuss encodings, though - if only to say that tools should detect the encoding the same way Python does.
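For the record, a rough sketch of the generate_tokens approach, including the trailing-comment wrinkle (untested, and not what pipx actually does):

import io
import tokenize

def whole_line_comments(source: str):
    # Yield comments that stand alone on their line, skipping comments that
    # trail code (anything non-blank before the comment disqualifies it).
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and not tok.line[: tok.start[1]].strip():
            yield tok.string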
The bigger question for me is whether we want to require tools to implement a full Python tokenizer. One downside (which I mention in the PEP) is that it means that the spec is tied to the Python version. While it’s unlikely, it is possible that Python 3.13 could add new syntax that meant the 3.12 tokenizer wouldn’t be able to process 3.13 source. I’m pretty sure I’d count that as a deal-breaker for pipx, for example, because pipx has the ability to run different Python versions, and it’s perfectly conceivable for someone to run python3.10 -m pipx run --python 3.13 myscript.py…
And suppose Python 3.13 added multi-line comments of the form /*...*/. Would the spec need to change to cover whether these comments could be used for requirements? Or whether a /*...*/ could “comment out” a dependency block?
For all its limitations, I think that ignoring the Python syntax and treating the file as simple text when parsing for a dependency block has advantages.
There’s also a question of who will write and read this data? The PEP assumes that it is written by the developer (a human) and consumed by a utility (i.e., code). This matches the expected usage in pipx. Based on this assumption, it’s more acceptable to take the view that as long as we’re precise in describing how the data will be read and processed, the developer can write the data in a way that doesn’t trigger any of the edge cases we’re discussing here. But this approach needs a lot more care on the part of tools writing the data, as they have to handle all the edge cases intelligently.
It makes sense to me that this gets added to the “Rationale” section of the PEP. But if I’m going to do that I want to be sure that there’s consensus that it’s a reasonable design decision to take. Specifically, that cases like the one @BrenBarn noted, where something that looks like a dependency block inside a multi-line string gets processed, are something we can legitimately address by just saying “don’t do that, then”.
Which reminds me:
Apart from being more verbose (which you mention), the other reason I dislike this is that you have to parse the whole source file - there’s nothing to indicate that you’ve reached the end of the dependencies except EOF. While I don’t expect people to be writing megabytes-long source files and calling them “simple scripts”, it’s not impossible (imagine a chunk of embedded data, like in get-pip.py). Script startup time is often quoted as a problem for Python, so let’s not make it worse than we have to.
This seems like a better solution to me. It’s not that different from what you sometimes find in Jupyter notebooks which come with the requisite %pip install ... magic at the top.
The installation could even be made conditional on some sort of __pipx__ sentinel, say.
Well, you could have the following “magic sentinel”
#!/usr/bin/env pip-run --
# Requirements:
# requests
... your script here
That doesn’t require anything that isn’t available today, it just needs pip-run available on PATH. It relies on the fact that pip-run already implements the syntax defined in this PEP, plus an optional convention to write the first few lines of the script in this way.
I prefer pipx, because it caches environments (and hence improves script startup time), but its interface isn’t as clean (yet). YMMV - the point here is that we don’t need anything new here, just agreement on stuff that already exists.
Perhaps something akin to # Script Requirements: (# Script Requires:, # Embedded Requirements:, etc) could help here – the additional word lowers the likelihood that a rogue comment matches the format, and provides extra context for a google search.
As an illustrative example, there are currently no recorded uses of # Script Requirements: on grep.app, and only about 10,000 search results for the query.
I honestly have no vested interest in any particular header text, so I’ll just wait for consensus here. But there may be a compatibility issue with any change, as Requirements: is currently implemented by pip-run (the pipx support isn’t released, so I don’t count that). So a different header would need some form of transition.
I’ve no idea how much use this feature of pip-run has. Maybe @jaraco has a view?
I like this idea, and given that tools are already implementing some form of it, I think it’s worth standardising.
As for the multi-line strings question, I don’t think it’s a showstopper. I could live with either of two compromises:
Tools have to know enough Python syntax to look only at comments. This makes non-Python implementations harder, but they can run a bit of Python helper code to do it, or their authors can write a Python tokenizer in another language. It’s a limitation, but not the end of the world.
Alternatively, the requirements block must come before any code, so you stop looking for it at the first non-empty, non-comment line. Simpler to implement tools for, more limiting for the user.
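The second compromise in particular is easy to implement; something along these lines would do it (a sketch, not proposed normative wording):

def leading_comments(source: str) -> list[str]:
    # Collect comment lines from the top of the file, stopping at the first
    # line that is neither blank nor a comment; the dependency block would
    # have to live inside this prefix.
    comments = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            comments.append(stripped)
        elif stripped:
            break
    return comments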
PEP 508 does not allow local directories or files as dependency specifiers.
I think it actually might allow local files. It allows URLs (as in pkg @ https://...), and as far as I can see, it doesn’t specify what schemes are allowed, so I think foo @ file:///home/takluyver/foo-0.1-py3-none-any.whl is valid according to the spec. I haven’t checked what implementations actually do with this.
I agree. My initial reaction was a bit too panicky. I actually think we should apply the “consenting adults” principle here. The rules on how the dependency block is identified and parsed must be clear and unambiguous (I believe they are, but we can fix them if not). But they do not have to prevent users from putting them in dumb places (such as in multi-line strings).
If a developer wants to put something that looks like a dependency block (but isn’t) in a script’s docstring, they have to put the actual dependency block before the docstring, so it gets recognised first. And if they don’t have dependencies, but want to use a tool that looks for a dependency block, they can put in an empty dependency block.
If a developer wants to put a real dependency block in a multi-line string, fine. Let them do it. What’s the harm?
Rules don’t stop people doing stupid things[1]. Common sense (and other people) stops people doing stupid things.
I don’t like the PEP, as I said above, but assuming that it is going to be done, why not borrow the same syntax as encoding comments? Namely # -*- requirements: numpy, scipy, pandas, rich -*-.
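Purely to illustrate, such a line could be picked out with a small regex along these lines (a sketch, not a worked-out proposal):

import re

# Illustrative only: match an encoding-comment-style requirements line like
# "# -*- requirements: numpy, scipy -*-" and split it into specifiers.
REQS_RE = re.compile(r"^#\s*-\*-\s*requirements:\s*(?P<reqs>.*?)\s*-\*-\s*$")

def parse_requirements_comment(line):
    m = REQS_RE.match(line.strip())
    return [r.strip() for r in m.group("reqs").split(",")] if m else None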
I was over-simplifying. It may do, but the usability sucks (no relative paths, and don’t get me started on Windows drive letters). My point here was that tools may reasonably want to allow extended forms of dependency specifier, and I don’t want to disallow that, even though it’s not something I want to try to define in the PEP. The obvious extension is “anything that pip can use as a requirement” (and that’s what pipx and, I believe, pip-run allow).
These ideas are addressed in the PEP (“Why not include other metadata?” and “Why not make the dependencies visible at runtime?”). I imagine you’ll disagree with those sections as well, of course. And it’s perfectly fine for you to disagree - this isn’t going to be something everyone will like. But if you have suggestions or questions that aren’t already covered in the PEP, please do ask them - I’ll be updating the PEP based on the feedback here, so I want to make sure I hear everyone’s views.
(The idea of an encoding comment style is one I’ll think about. I don’t like it, but I need to articulate why if I want to represent it fairly in the PEP).