PEP 722: Dependency specification for single-file scripts

I regularly put small scripts in my C:\Work\Scratch directory. Having to create a subdirectory and a pyproject.toml every time would be a pain. See this thread for more discussion.

But yes, this won’t be useful for everyone, that’s perfectly fine.

3 Likes

I have the same opinion as @sbidoul in that thread:

Not screaming but… I personally use and recommend to use project.dependencies to declare the top level (and not pinned) dependencies of any application, whether or not building a wheel happens in the deployment pipeline.

This feels quite natural and intuitive, and it actually did not even occur to me that it could be controversial until reading this thread.

Moreover, there are already several ways of specifying project dependencies – like pyproject.toml and requirements.txt (not to mention Poetry, etc.). I would be wary of adding more. In my personal experience, the inability to understand what “is the correct method” leads to more frustration than a bit of boilerplate and inconvenience.

So I’d lean towards proposing to make name and version optional in pyproject.toml, refusing to build the project if they’re not present, but allowing Python to be run in an environment defined by project.dependencies.

10 Likes

What I wanted to say in that other thread is that using project.dependencies is good for me… as opposed to inventing a new section in pyproject.toml or a new file format.

But for single file scripts I also think it’s good to have a standard way to embed dependency specifications in the scripts…

So +1 for such a PEP (of which I have only read the title for now).

I feel this crowds the script file a lot, even more so if the user pins both top-level and transitive dependencies.

Without this PEP, tools can still implement other methods to use existing standard dependency files[1] to run scripts. Like this:

mypip run script.py --requirements=requirements.txt
mypip run script.py --requirements=requirements.lock
mypip run script.py --requirements=pyproject.toml  # implies [project.dependencies]

I don’t understand what this brings except yet another way to specify dependencies. If the argument is that this is for very simple single-file scripts, then why not just parse the import statements[2]?


  1. quasi-standard? ↩︎

  2. :man_facepalming: because import names are not distribution package names ↩︎

2 Likes

The mapping between names that can be imported in an import statement and names that represent distributions in PyPI is not deterministic (and not 1:1).

I think there is a lot of value in the proposition. Based on Jupyter’s/Colab’s widespread usage of !pip and %pip, I would say there is appetite for the feature, targeting the simple single-file script scenario.

Also, there is some prior work in other languages on dependencies embedded in single-file scripts (e.g. Elixir’s Mix.install and Ruby’s ‘bundler/inline’, probably others), which is a good sign that we are not going in a weird direction. The choice of non-executable comments is fine, and the links included in the PEP do a good job of explaining the pros and cons.

6 Likes

Just a hypothetical: given that pip is available with most Python distributions, if pip had a minimal, importable API, would this even be necessary? Could this use case be handled by something like:

#!/usr/bin/env python

# In order to run, this script needs the following 3rd party libraries
import pip
pip.install(["requests", "rich"])

import requests
from rich.pretty import pprint
...

And if so, would that be preferable to comments with side effects?

9 Likes

I’m of two minds about the proposal. I’ll sit with my thoughts for a while before chiming in with my perspective.


One thing I suggest (and at the risk of introducing a new format that pipx and pip-run don’t support) is that the # Requirements: comment line include the name of the PEP (PEP 722) in it:

  • It’s much less likely to conflict with a comment in the wild that happens to match the proposed format
  • It’s self-documenting as to what the heck this thing is, which is good for beginners, or anyone who isn’t aware of the format

E.g. (bikeshed on the specific text)

# PEP-722: Requirements:
1 Like

IWBNI this could be combined with venv creation. One option might be to extend venv.create() with an install=[pkgs] argument, or possibly exposing that “minimal” API to also create and install in a venv.
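
Something like this rough, untested sketch (the install= argument doesn’t exist today; this fakes it by creating the venv with venv.EnvBuilder and then shelling out to pip):

import subprocess
import venv
from pathlib import Path

def create_with_install(env_dir, packages):
    # Hypothetical helper: venv.create() has no install= argument today, so
    # approximate it by creating the environment and then running pip in it.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = Path(env_dir) / "bin" / "python"  # Scripts\python.exe on Windows
    subprocess.run([str(python), "-m", "pip", "install", *packages], check=True)

create_with_install(".script-env", ["requests", "rich"])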

2 Likes

I’m in agreement that the packaging/dependency-spec system for small scripts is awkward right now, and I think it’s good to hash out possible solutions to that. But I have some reservations about this PEP.

First, from what I understood on the other threads, requirements.txt is currently not standardized with a PEP. It seems a little premature to PEP-standardize this new inline spec system without also standardizing requirements.txt, because they seem pretty similar and it will get really confusing if they start to diverge. It would make more sense to me to have some kind of “standardized way of specifying dependencies in super-plain text (i.e., not even TOML)” and then that could be used to define both inline dependencies and requirements.txt and maybe other stuff.

Second, it seems like we’re just finishing up finally getting away from setup.py, which put dependencies in a Python file, and moving towards pyproject.toml, which puts them in a code-free data format, and now this PEP is sort of circling back. It’s not a complete reversal, since the PEP tries to thread the needle by keeping the info in comments rather than generating it by actually running Python code, but this type of thing still makes me a bit nervous. Almost anything can be overloaded into comments because they’re totally free-form, but I think we have to be really careful every time we do that: the more things overload comments, the more confusing it gets, and because comments are so free-form it’s hard to predict what kinds of weird stuff might end up in there.

As an example, how does this PEP propose to handle a Python file that has this:

import some_lib
import another_lib
"""This is a string literal

# Requirements:
# some_lib
# another_lib

Here endeth the literal.
"""

# Here are the requirements
# Requirements:
# some_lib
# another_lib

What if the order of the comment block and the triple-quoted string is switched?

Based on my reading of the PEP, the answer is “there can be only one, doesn’t matter if it’s in a string literal, we’re just looking for the requirements block”. But this illustrates the point that, if this is meant to be parsed without parsing Python syntax, the PEP can’t actually even use a term like “a comment line”, because “a comment line” is itself a construct of Python syntax. It needs to say something like “A line beginning with optional whitespace followed by a # character”. (And if some other behavior is desired, it similarly needs to be specified in precise terms referring only to the raw text.)
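
In raw-text terms, the kind of rule I mean would be something like this (just a sketch, not the PEP’s actual wording):

import re

# "A comment line", defined over the raw text rather than Python syntax:
# optional leading whitespace followed by a "#" character.
def is_comment_line(line):
    return re.match(r"\s*#", line) is not None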

Third, for me, the “rejected alternative” about local dependencies raises the question of why this proposal needs to be a PEP rather than just a pip implementation detail. The PEP starts off by specifying the format each listed dependency must follow, but then later says, hey, actually it’s okay for tools not to follow this format and just accept whatever pip accepts. It seems likely that this will lead to confusing situations where pip (and/or other tools) start using almost-but-not-quite-PEP-722 mechanisms to specify inline dependencies. Also, since the proposed format is so barebones, there is no way for the script to declare, or for the consuming tool to check, that it actually is PEP 722 compliant; tools would just have to try parsing every line and/or feeding to an installer and seeing if something breaks. This isn’t in itself the end of the world (it’s in the spirit of how things currently work :slight_smile: ) but it doesn’t seem like having a standard gains much if it doesn’t do anything to make such problems less likely.

More generally, I am not totally convinced that “find a way to put the requirements into the Python source file” is really the way to solve the problem of “the pyproject.toml build system is overkill for single-file scripts”. I’m not even convinced that, just because the script is a single file, there has to be a way to specify script-and-dependencies as a single file.

In other words, if you want to distribute a single-file script, maybe you still have to bite the bullet and distribute another file along with it to list the dependencies. Some of the pain of that could be alleviated by, e.g., loosening the requirement that such a file be named pyproject.toml, so you could have multiple little_script_metadata.toml files lying around, each one of which includes within it a mention of which script file it applies to. That could get messy, but it’s not clear to me it would be messier than things could get inside the script file with this PEP.

Another idea is to create some sort of convention for a command-line option or function to be called to install a script’s dependencies (which would be easier if as @dustin suggested pip had an importable API). So then the way to do it becomes python myscript.py --install-deps or pip --read-deps myscript.py or the like.
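
E.g. a hypothetical hand-rolled version of the first option inside the script itself (--install-deps is made up here, and this installs into whatever environment happens to be running the script):

import subprocess
import sys

REQUIREMENTS = ["requests", "rich"]

# Hypothetical convention: "python myscript.py --install-deps" installs the
# script's own dependencies into the running environment, then exits.
if "--install-deps" in sys.argv:
    subprocess.run([sys.executable, "-m", "pip", "install", *REQUIREMENTS], check=True)
    sys.exit(0)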

Overall this PEP seems to be proposing a solution to apply to “quick and dirty” type scripts. I agree there’s room for improvement for that use case. But it’s tricky to triangulate between “so quick and dirty that it may make things confusing” and “so quick and dirty that maybe it shouldn’t be addressed at the level of a PEP” and “actually not as quick and dirty as it may seem” (e.g., with some of the rejected alternatives at the end). To me this PEP is valuable as a starting point for thinking about this problem but I don’t think it would be a good idea to just adopt it as-is.

6 Likes

I’m still digesting this to work out how I think about it, but I will say that the chosen format makes parsing harder, since you have to be context-aware when parsing individual lines to know whether you’re inside a requirements block, and then you have to answer questions about what causes that block to terminate, what happens if you have multiple blocks, etc.

It’s slightly more to write, but I wonder if treating the syntax as line-based instead would be better? Something like:


#!/usr/bin/env python

# In order to run, this script needs the following 3rd party libraries:
#
#    require: requests
#    require: rich

import requests
from rich.pretty import pprint

resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])

I don’t really like the generic require name here, but we can bikeshed a different name (pkg-install? project.dependency?).

To me the big win this provides is that parsing and logic in general becomes significantly easier.

import re

with open(filename, "r") as fp:
    for line in fp:
        if m := re.search(r"^\s*#\s*require:\s*(.+)$", line):
            ...
5 Likes

Have you considered writing a script to create the subdirectory and the pyproject.toml file?
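
E.g. something as small as this (an untested sketch; the layout and names are just placeholders):

import sys
from pathlib import Path

# Untested sketch: "python newscript.py foo" creates foo/ containing a minimal
# pyproject.toml and an empty foo.py.
name = sys.argv[1]
project = Path(name)
project.mkdir()
(project / "pyproject.toml").write_text(
    f'[project]\nname = "{name}"\nversion = "0.1.0"\ndependencies = []\n'
)
(project / f"{name}.py").touch()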

Without some kind of block restriction, this seemingly simple rule may interact confusingly with ordinary comments that happen to have unlucky line wrapping. For instance:

# The purpose of this library is to do a thing.  Simply call the do_a_thing()
# function to do things.  Of course, there are many things you may want to do.
# The various arguments allow you to easily specify whatever options you may
# require: filenames, timestamps, or other useful information.
#
# See the documentation for more info.

Oops. That comment block accidentally specified a dependency on some invalid library called “filenames, timestamps, or other useful information”.

Things like encoding declarations avoid this by being much more restrictive, insisting that the declaration occur on the first or second line of the file.

This example may seem contrived, but my point is that this kind of thing is risky because comments are totally free-form. Existing code in the wild could have literally anything in comments. Adding new special meaning to comments in general, without very restrictive guards (e.g., positional requirements, or at least a prefix much less likely to occur by accident than “require:”), may cause super befuddling bugs. It may not be a super common problem, but personally I don’t see it as worth it just to put requirements into a script file.

3 Likes

Ouch. That cripples this proposal.

To be clear, the PEP is simply documenting (and proposing as a standard) the behaviour that’s already present in pipx (and in pip-run, although I only checked the spec for that, not the code itself). In pipx, the answer is “it just reads lines and ignores Python syntax”, which is wrong but acceptable if you assume the requirements are written for pipx to use.

So I’m OK with what pipx does, but there’s no way it’s viable as a standard.

It’s not something pip would implement - to be useful, it needs environment management, not just installation. But it is “just” a pipx/pip-run implementation detail at the moment, I wrote the PEP because @brettcannon suggested that if it were standardised, it might be something that editors like VS Code could write into the file (as a sort of “do you want to add a requirement for this?” helper).

I don’t personally have any need for it to be standardised. I’m happy just using it for pipx/pip-run when I want it.

Personally, my interest (and the reason I added the feature to pipx) is not in distribution, it’s in just running local scripts that have dependencies. I have too many scripts to create a virtualenv per script, and too much junk in my “work” directory to keep track of which requirement file relates to which script. Maybe I’m just disorganised, but I’ve encountered plenty of casual Python programmers, and most of them tend to work with this sort of “bunch of scripts in a directory” model. Typically, they are used to working with shell scripts, or Perl scripts, or maybe SQL files, and Python scripts are just another one in that mix.

It’s certainly trying to close the gap between “quick and dirty” and “full project”. Everyone keeps saying it’s easy to depend on 3rd party packages, and so the stdlib is less relevant today than it was years ago. And that’s true for “full projects”. But it’s definitely not true for “quick and dirty” scripts, where depending on (say) requests is a significant step up in complexity. Either that or you end up dumping a bunch of libraries into your main Python environment - but I thought we all agreed that was a bad model?

You don’t need an API for this. Running pip in a subprocess is perfectly fine. The problem isn’t about running pip to do the install, it’s about managing a (temporary) environment for the script to run in. And that’s the bit that has to be done before the script starts - it’s what pipx and pip-run do for you.
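
Very roughly, that pattern looks like this (an untested sketch, not pipx’s or pip-run’s actual code):

import subprocess
import sys
import tempfile
import venv
from pathlib import Path

# Untested sketch: build a throwaway venv, install the script's requirements
# into it with a pip subprocess, then run the script with that interpreter.
def run_with_requirements(script, requirements):
    with tempfile.TemporaryDirectory() as env_dir:
        venv.EnvBuilder(with_pip=True).create(env_dir)
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        python = str(Path(env_dir) / bin_dir / "python")
        if requirements:
            subprocess.run([python, "-m", "pip", "install", *requirements], check=True)
        subprocess.run([python, script], check=True)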

Yes, of course I have. But how do I put 20 directories containing a script and a pyproject.toml file in my ~/.local/bin directory and make them executable? I’m not suggesting this is just about me being lazy or disorganised[1]. There’s a fundamental difference in my mind between a “script” (a single file that I can just run, put on my PATH, copy to a different PC and use, have on a “utilities” disk[2] that I carry with me, etc.) and an “application” that I build and distribute, and which needs to be “installed” or at least set up in a specific way so that I can run it. Do people really not use Python this way any more?

I really don’t know why I’m not getting my point across here. Would shell scripts be as usable if they had to be placed in their own directory with a metadata file alongside them?

Anyway, there have been some significant problems raised with this proposal, to the extent that I think I’ll simply withdraw it (it’s not yet a formal PEP, just a PR against the PEPs repository, so I can just close the PR unmerged). Unless anyone can come up with a way of fixing it - but there’s a lot less value in my mind if we end up trying to design a brand new “solution”, rather than simply formalising a behaviour that is already implemented in tools.

On a broader note, I get the feeling that people really aren’t that convinced that there’s anything wrong with the model of putting your Python code in its own directory. I find that a shame - I don’t like having to write shell scripts, and I’ve always far preferred using Python for the sorts of things I’d use a shell script for. But having to think in terms of “setting up a project” for every script I want to write takes a lot of the joy out of that.

Oh well.


  1. I am, but that’s beside the point :slightly_smiling_face: ↩︎

  2. or a cloud drive, for you youngsters! ↩︎

9 Likes

I don’t have anything useful to say about the exact format, but I did want to chime in with a +1 for the concept. I think it would be very cool to have this capability standardized.

Thinking about this may have triggered some latent software engineering bitterness, so here's some optional snark.

In every corporate environment I’ve worked in, single-file scripts crop up regularly. They’re often meant to support some automation use case, and they proliferate for reasons like:

  • all the engineers tend to know at least a little Python.
  • it would be more difficult to write whatever-it-is in bash
  • you know that Debian VM is going to have at least Python 3.6

They aren’t projects with a pyproject.toml, because they’re scripts. There isn’t a separate dependencies file, because having a second file and/or setup instructions defeats one of the goals of creating a “standalone” script. These things get copied around on network shares and USB drives or copied and pasted between build systems in different projects.

Specifying any non-stdlib dependencies for these is painfully ad-hoc and usually comes down to praying someone reads an in-file comment you wrote.

Is any of that the “right” way to distribute a script? In a beautiful perfect world everyone would know all the latest build system standards, put their tools in git repositories, and distribute wheel files for their coworkers to install with pip. Except nobody does that, because we all learned a little bit of Python in undergraduate university and just write the .py file. :upside_down_face: (or, to Paul’s point, because it makes the script actively less useful!)

Edit: +1 to @pf_moore’s points above, which articulate my feelings much more clearly than my optional snark.

4 Likes

People absolutely do that, even sometimes in a work context!

(I had a very simple data normalization task that I managed with stdlib modules csv, collections, datetime. If I had needed a third-party lib I would probably have used my user-global virtualenv where I install various libs for various needs, but don’t feel like doing a pipx install (or couldn’t because they are modules without script entry points))

Well, there is another aspect that is important not to forget: the user has to “opt-in” and effectively run the given script with a compatible “environment manager” like pipx or pip-run… The idea is not that every comment or docstring ever written in Python will suddenly start being interpreted as a requirement in every context.

I think that makes a lot of difference, and in this context taking the simplest approach should be fine…

4 Likes

I also have the same feeling sometimes when I want to write a “quick script” in Rust (ok, not the primary use case for Rust, though I do sometimes do it), but cargo new foo alleviates that quite a bit.

If I think about myself, I believe the hurdle is largely psychological – it adds another step on the road towards getting your task done (“figure out how to add a dependency to my Python script”), but I think it is wiser to work on the user experience to make setting up a new directory feel instant, normal and not burdensome, rather than seeking to eliminate it.

Don’t get me wrong: yes, it would be a little more convenient not to have to cd into the directory for editing the code. However, the packaging landscape is already way too confusing for beginners and I think having methods everybody agrees upon is more valuable.

2 Likes

I don’t think the point is to distribute a single-file script, but rather how to “run” a single-file script without going through the hassle of manually managing a virtualenv and keeping mental track of the installed dependencies…

If the distribution is the primary goal, then I agree that going through conventional packaging is OK (but I don’t think this is the target of the PEP).

3 Likes

Why the blank line and not “a non-comment line”?

On top of the virtual environment need that was pointed out, this also assumes pip is installed. I could very easily see this being used with a virtual environment that lacks pip in order to speed up virtual environment creation and save on disk space (you can do the install externally via --target after environment creation).
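
E.g. (an untested sketch; it uses the host environment’s pip to populate a pip-less venv):

import subprocess
import sys

# Untested sketch: create a venv without pip (faster, smaller), then use the
# *host* pip to install into the venv's site-packages via --target.
subprocess.run([sys.executable, "-m", "venv", "--without-pip", ".env"], check=True)
env_python = ".env/bin/python"  # .env\Scripts\python.exe on Windows
site_packages = subprocess.run(
    [env_python, "-c", "import sysconfig; print(sysconfig.get_path('purelib'))"],
    check=True, capture_output=True, text=True,
).stdout.strip()
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", site_packages,
     "requests", "rich"],
    check=True,
)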

That assumes venv has pip available at creation time. I think you definitely should be creating virtual environments as part of this, but I don’t think they need to be made into a single step.

I don’t think it’s that bad (assuming this untested code works):

import tokenize

with tokenize.open(py_file) as file:
    requirements = None
    for line in file:
        # Start collecting once the header line is seen.
        if requirements is None and line.strip() == "# Requirements:":
            requirements = []
        elif requirements is not None:
            # Collect subsequent comment lines; stop at the first blank
            # comment or non-comment line.
            if line.startswith("#"):
                if comment := line.removeprefix("#").strip():
                    requirements.append(comment)
                    continue
            break

I think that’s a great point! If you’ve muddled your code in that way then you can simply fix it, since it’s a single script that you can look at and that is directly under your control. This isn’t going to be buried in some 3rd-party package that’s accidentally causing you issues.

5 Likes

Totally agree on this point! And I just recently converted my shell scripts into a single Python script with a Typer CLI, and it’s just beautiful.