PEP 722: Dependency specification for single-file scripts

jeanas · July 20, 2023, 5:24pm

D’oh! Sorry, I did read the PEP but too quickly, and I failed to remember that these were in it. My bad.

I can agree with “Why not use a more standard data format (e.g., TOML)?”.

Regarding __requires__, note that there is a precedent in Hatch for reading a __version__ attribute with a simple regex rather than a full Python tokenizer/parser (Versioning - Hatch).
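As a rough illustration of the regex approach applied to __requires__ (this is not Hatch's code; the patterns and the read_requires name are hypothetical and only handle a flat list of string literals):

```python
import re

# Assumption: __requires__ is a top-level assignment of a flat list of
# string literals. Nothing here is taken from Hatch or from the PEP.
REQUIRES_RE = re.compile(r"^__requires__\s*=\s*\[(?P<items>[^\]]*)\]", re.MULTILINE)
ITEM_RE = re.compile(r"""(['"])(?P<value>.*?)\1""")


def read_requires(source: str) -> list[str]:
    """Return the string literals in a simple ``__requires__ = [...]`` list."""
    match = REQUIRES_RE.search(source)
    if match is None:
        return []
    return [m.group("value") for m in ITEM_RE.finditer(match.group("items"))]


example = '''\
__requires__ = [
    "requests",
    "rich >= 13",
]
import requests
'''
print(read_requires(example))
```

The sketch also shows where a regex gets fragile: the list pattern stops at the first closing bracket, so a requirement with extras such as "requests[security]" would be cut short.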

takluyver · July 20, 2023, 5:42pm

I see. I wonder if this kind of defeats the point of the spec, though? If both the existing implementations allow specifying any requirements pip can install, people are going to use that, and then to build a compatible tool you need to allow those things too. So I’m not sure what the purpose of a spec saying something more restrictive is.

Maybe this PEP should be pragmatic and just allow anything that can be passed to pip? We might not want pip to be special like that, but it already is, so maybe we’re just pretending that it’s not. The broader requirement syntax could be made into a proper spec later. Or maybe if the only practical way to do it is implementation defined, it’s just something the respective projects should document rather than a subject for a PEP.

EpicWink · July 20, 2023, 10:29pm

I propose # py:requires:, which surely won’t conflict with reasonable comments. Alternatively, #pragma: requires:, which has precedent in coverage.py.

Does pip-run support multi-line requirements by escaping the newline? Is that necessary in the proposal?

#   foo ~= \
#    2.0

Why not simply say “tools may support a format to specify (potentially relative) paths to project directories, but this PEP won’t specify a format”? That’s a lot more assertive on path support.

I also can’t imagine there are too many distinct formats for specifying a path, so I’d imagine the community would converge on the same format.

BiteCode · July 23, 2023, 8:48am

I know this proposal wants to keep the scope small. But if you want to list deps, the dependency on a particular version of Python should be part of it.

groodt · July 23, 2023, 10:05am

I really think this will be very useful and could even further increase the utility of Python as a glue language. I use nix-shell scripts extensively at work and the ability to ship self-contained scripts without having to worry much about docker images or installation instructions or distribution archives cannot be overstated.

Instead of a new format of dependencies, what about inline requirements.txt? It lets us punt the standardisation question even further, avoids adding a new format for people to learn and if things become standardised in future, it should still work.

Ruby (bundler) has a mechanism for inline Gemfile dependencies. Bundler: How to use Bundler in a single-file Ruby script

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips', require: 'benchmark/ips'
end

Benchmark.ips do |x|
  x.report('original') { naive_implementation() }
  x.report('optimized') { fast_implementation() }

  x.compare!
end

pf_moore · July 23, 2023, 11:22am

The problem here is that requirements.txt isn’t standardised, and this is, of necessity, an interoperability specification. You can’t assume that the tool which runs the script will just pass the requirements onto pip. Even ignoring the possibility of an installer that isn’t pip, the tool might want to do some pre-processing of the requirements (pipx caches environments based on the list of requirements being installed, for example). Do we want to require pipx to parse nested requirements files (using the ability to include an -r option in a requirements file)?
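To illustrate the kind of pre-processing meant here, a minimal sketch of deriving a cache key from a flat list of requirements. This is not pipx's actual implementation; the environment_cache_key name and the normalisation rules are assumptions made for the example.

```python
import hashlib


def environment_cache_key(requirements: list[str]) -> str:
    """Return a stable key for a set of requirement strings.

    Assumption: surrounding whitespace and ordering are irrelevant, so the
    entries are stripped and sorted before hashing.
    """
    normalized = sorted(req.strip() for req in requirements if req.strip())
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8"))
    return digest.hexdigest()


print(environment_cache_key(["requests", "rich >= 13"]))
```

A key like this only works if the tool can see the complete list up front. A line such as -r other.txt would force it to open and parse further files before it could decide whether a cached environment applies.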

Also, that Ruby example, if I understand it correctly, doesn’t use an isolated environment. So it’s not equivalent to what we (or at least I) want to do in Python. If you expect to run the script in an isolated environment with just the script’s requirements installed, you probably have to process the requirements before the script runs. You could create a temporary, empty environment, and then run a script that auto-installs its own requirements in that. But then you can’t do things like cache environments.

But this is getting off-topic. The point here is to standardise a way for a single-file script to declare its dependencies in a way that lets tools do whatever they like with that information. My primary use case is to run scripts, but others may want to do audit scans on a directory of utilities, or process a set of scripts to determine if they can share an environment without dependency conflicts, or freeze the script and its dependencies into a zipapp. Once we have the data, people can use it in many ways.

As a reminder - we don’t need this standard to just write standalone scripts. If you’re happy with using an implementation-defined format, then pip-run already exists, and the next release of pipx is an alternative with different trade-offs, if you prefer. But if you want your IDE to offer to add dependency data when you type in a 3rd-party import, you stand a much better chance of that if the format is standardised, rather than being tool-specific.

EpicWink · July 23, 2023, 9:47pm

At least for executable scripts, you can specify the version by choosing a version-specific executable:

#!/usr/bin/env python3.10

...

csm10495 · July 24, 2023, 4:46am

I don’t like when people call pip from inside scripts since it can pollute an environment by adding unexpected packages.

jeanas · July 24, 2023, 6:36am

I am hesitating to write an alternative PEP (proposing inline pyproject.toml). Suppose I did; would any core dev be willing to sponsor it?

BiteCode · July 24, 2023, 6:58am

This will not work on Windows. It will also force the exact executable to be present on Unix, while the script could run with 3.11 even if 3.10 is not here.

brettcannon · July 25, 2023, 12:34am

Maybe. It was more to point out to Donald that it isn’t a complicated thing to parse.

Definitely! I could very easily see implementing something in Rust and have it work with the Python Launcher for Unix.

I had been going on the assumption that the tools doing the execution would handle reading and writing any necessary data. Do you think it’s worth coordinating all of the details so the tool handling the execution is entirely interchangeable to the point that they will use the same temp virtual environment?

If we want to avoid ambiguity, this is my preference since it’s the simplest and fastest.

I would prefer not to do this unless it’s backed by a spec. We are trying to explicitly get away from conventions driving things.

I personally would want to avoid that for simplicity.

You can somewhat do that via the shebang line today. And that’s a separate ask since that’s expanding what could be defined as a dependency in any form, let alone within a .py file (i.e. you can’t do that in a pyproject.toml via a project.dependencies array, so it’s out of scope for what this PEP is trying to accomplish).

The Python Launcher for Windows actually reads the shebang line (as does the Python Launcher for Unix).

Not if you use /usr/bin/env for the shebang.

BrenBarn · July 25, 2023, 5:06am

Maybe this is just me, but the way the discussion is going is making me feel it’s going to be a bit dicey to balance the desire for a quick and easy format with the desire for something more clean and standardized for widespread tool interop. In particular as I mentioned before, this seems like an inline requirements.txt in practice, and I don’t see how it’s a good idea to jump to standardize this rather than making a single standard for that type of dependency list, and then saying “you can also specify this kind of information within a python file like so”.

pradyunsg · July 25, 2023, 9:15am

This is not an “inline requirements.txt” format.

The format for requirements.txt allows multiple forms, and comments. From Requirements File Format - pip documentation v23.3.1

The following forms are supported:

[[--option]...]

<requirement specifier>

<archive url/path>

[-e] <local project path>

[-e] <vcs project url>

This format only allows requirement specifiers.

Note that three out of those five are tool-specific, tied to the option parsing logic in pip, and URL/path handling is also tied to however pip decides to do it.

pf_moore · July 25, 2023, 10:28am

I’m not sure what you mean by this (or how it relates to my comment). What I was trying to say was that I was thinking in terms of the data being (primarily) written by a human, and read by a program. In that context, as long as the rules are precise and we can ensure that the program reading the data has well-defined behaviour, we can rely on the (human) writer not doing anything silly [1].

I know your use case is for a program (VS Code) to write the data. I assume some logic like the following will work:

Find an existing requirements block. If there is one, append to it.
If there isn’t one, put a new one in a “sensible place” (such as at the top of the program, after any shebang and docstring, but before any other code).
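The two steps above can be sketched roughly as follows. This is an illustration only: the add_requirement name, the header pattern and the indentation are assumptions, and the sketch uses Python's ast module to find the docstring, which a writer implemented in another language would have to handle differently.

```python
import ast
import re

# Assumption: the block header is a comment line reading "Requirements:".
HEADER_RE = re.compile(r"^#\s+requirements:\s*$", re.IGNORECASE)


def add_requirement(source: str, requirement: str) -> str:
    """Append to an existing requirements block, or create one."""
    lines = source.splitlines()
    new_line = f"#     {requirement}"

    # Step 1: an existing block. Append after its last comment line.
    for index, line in enumerate(lines):
        if HEADER_RE.match(line):
            end = index + 1
            while end < len(lines) and lines[end].startswith("#"):
                end += 1
            lines.insert(end, new_line)
            return "\n".join(lines) + "\n"

    # Step 2: no block yet. Place one after any shebang and module docstring.
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    tree = ast.parse(source)
    if ast.get_docstring(tree, clean=False) is not None:
        insert_at = tree.body[0].end_lineno  # 1-based last line of the docstring
    block = ["# Requirements:", new_line, ""]
    if insert_at > 0:
        block.insert(0, "")  # blank line between the preamble and the block
    lines[insert_at:insert_at] = block
    return "\n".join(lines) + "\n"


script = '#!/usr/bin/env python3\n"""Doc."""\nimport requests\n'
once = add_requirement(script, "requests")
twice = add_requirement(once, "rich")
print(twice)
```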

But if allowing user discretion on where to manually place the requirements block is a problem for you, then we can make the spec tighter.

I actually now think that “users should simply not do that, even though it’s technically allowed” is probably sufficient.

I don’t want tools to have to know how to parse Python code, for reasons I’ve mentioned above. Also, @brettcannon has a use case for parsing this data in Rust, and I don’t want to require someone to write a Python parser in Rust.

I’m not even sure I want to ask consumers to deal with Python encoding cookies. I’m inclined to have the PEP say that consumers SHOULD read the encoding cookie, but they MAY assume UTF-8 (for example, if they aren’t written in Python and so the tokenize module is unavailable).
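For a consumer written in Python, both options are short. A minimal sketch, where the read_script_text name is hypothetical and using errors="replace" on the UTF-8 path is my own assumption rather than anything the PEP says:

```python
import io
import tokenize


def read_script_text(raw: bytes, honour_cookie: bool = True) -> str:
    """Decode script bytes, optionally honouring a PEP 263 encoding cookie."""
    if honour_cookie:
        # detect_encoding() inspects at most the first two lines (and a BOM).
        encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
        return raw.decode(encoding)
    # Assumed fallback: treat the file as UTF-8 and replace undecodable bytes,
    # so that ASCII-only requirement lines still come through intact.
    return raw.decode("utf-8", errors="replace")


raw = "# -*- coding: latin-1 -*-\n# Requirements:\n#     requests\n# café\n".encode("latin-1")
print(read_script_text(raw))
print(read_script_text(raw, honour_cookie=False))
```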

This is reasonable, but it does mean that the requirements block must go before any script docstring. I don’t personally tend to use docstrings for my one-off scripts, but if I did, I think I’d want to put the requirements block after it, not before. If the consensus is to go with this rule, I’ll add it, but I’m personally not in favour.

[1] Or at least, we can tell the human writer “well, don’t do that, then”.

takluyver · July 25, 2023, 11:26am

That seems like a reasonable option too.

Package names and version numbers can only be ASCII anyway, so assuming UTF-8 works for any ASCII-compatible encoding. I guess there might be corner cases with requirements embedding a URL, but not ones worth worrying about.

sirosen · July 25, 2023, 5:22pm

Have we moved past this comment (from a week ago)?

It seems like we have, especially as more people have come out in support of it, but I want to confirm – in part because I really like to see this kind of incremental improvement and don’t think there have been any real “deal breaker” issues raised.

Having a PEP makes it easier for a new tool to come into this space and pick up on the prior art from pip-run and pipx.

Regarding the syntax for the header:

I like py:requires: in part for how it smells like a Sphinx role, which already scans as “executable documentation”.

However, are we really in need of something other than the already-tool-supported Requirements: indicator?

I don’t think it’s important that the comment itself indicate the PEP, be unambiguously non-prose text, or really have any feature other than being “the thing from the PEP”. You can always document what the requirements block means with… a comment.

# this block defines the script requirements
# see PEP 722 for details:
#   https://pep-previews--3210.org.readthedocs.build/pep-0722/
# Requirements:
#   requests

A script which is written for use with pipx and pip-run already soft-requires that it be run with one of those tools, since it explicitly doesn’t have any other supporting infrastructure for installing its requirements.

So what problem are we solving exactly by changing the string used? What is the precise use-case here?

I think it’s worth considering that having to handle a docstring exposes implementers to silliness like

"""
# Requirements:
#    requests
"""
# Requirements:
#    httpx

So the simplification gained by requiring it to come before the docstring is non-negligible – it means you don’t need any python-multiline-string detection.

As someone who does use a docstring in many scripts – I like using argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) as my script helptext – I wouldn’t mind putting the Requirements: ... comment first. With the shebang line it becomes part of my “script execution preamble”.
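A minimal sketch of that layout, with the requirements comment ahead of the docstring. The script contents and the build_parser helper are made up for illustration, and the Requirements: header is the one the existing tools use in this thread.

```python
#!/usr/bin/env python3
# Requirements:
#     requests
"""Fetch a URL and print the HTTP status code.

Example:
    fetch.py https://example.org
"""
import argparse


def build_parser(description):
    """Build a parser whose help text reproduces the description verbatim."""
    return argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )


# The comments above the docstring do not stop it from being the module
# docstring, so __doc__ still works as the help text.
print(build_parser(__doc__).format_help())
```

RawDescriptionHelpFormatter keeps the line breaks and indentation of the docstring, which is why the Example: section survives in the help output.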

thejcannon · July 25, 2023, 6:08pm

Searchability for people “not in the know” is what I had in mind. The link between “this comment which says ‘Requirements:’ and this spec” is very weak. If I typed into <search engine> “Python requirements comment string” or… (oh heck I wouldn’t even know what else to type to try and find out what this odd comment block does) I would hope I find documentation on this magical comment, and ideally the tools that support it.

The fact that it also reduces possible in-the-world collisions to 0 is nice as well.

So, I like the self-documenting nature both in terms of tool support, and in terms of lookup-ability.

What’s the downside to using a different spec? Two tools have to change very slightly?
I didn’t see any docs about this from pipx, and pip-run already supports multiple ways of handling the metadata. So I’d venture to guess that both tools can easily (and happily) support a self-documenting key.

Let’s avoid PEP numbers in user-facing things, it can be confusing to have to remember or look up 621 or 588 and so on.

I’m not convinced we shouldn’t put PEP pointers in user-facing things either. The typing docs have 58 instances of “See PEP”. In fact more than a few Python doc pages have “See PEP XXX for more details”. I only know because I go to those docs very often and then end up on a PEP page.

pf_moore · July 25, 2023, 7:05pm

As far as I’m concerned, yes we have. I over-reacted, my apologies. At the time, it seemed like the consensus was pretty much against the idea, but I guess one benefit of my post was that it brought a number of supporters out into the open.

The code searches people have done indicating that Requirements: is used in existing code comments do suggest that something more distinctive would be useful. I’d prefer it to be something that reads naturally as text, though, so I’m not fond of ideas like py:requires.

I don’t have any specific reasons to think that Requirements: would be a problem - if you currently use that in your comments, you probably don’t run the code with pip-run/pipx anyway, so it’s fine. But I wouldn’t object to a different colour for this bikeshed.

My position now is that the above is completely well-defined according to the PEP - it declares a single requirement of requests. If it confuses you, then don’t write stuff like that. The consenting adults principle applies here: the language (or in this case the PEP) isn’t required to prevent you from doing dumb things, just to enable you to do clever things.
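To make that reading concrete, here is a minimal sketch of a line-based reader that never looks at Python syntax. The header pattern and the rule that the block ends at the first non-comment line are assumptions based on this discussion, not text quoted from the PEP, and read_requirements is a hypothetical name.

```python
import re

# Assumption: the block starts at the first comment line reading "Requirements:".
HEADER_RE = re.compile(r"^#\s+requirements:\s*$", re.IGNORECASE)


def read_requirements(source: str) -> list[str]:
    """Read the first requirements block, treating the file as plain text."""
    lines = iter(source.splitlines())
    for line in lines:
        if HEADER_RE.match(line):
            break
    else:
        return []  # no header anywhere in the file

    requirements = []
    for line in lines:
        if not line.startswith("#"):
            break  # the first non-comment line ends the block
        text = line[1:].strip()
        if text:
            requirements.append(text)
    return requirements


script = '''"""
# Requirements:
#    requests
"""
# Requirements:
#    httpx
'''
print(read_requirements(script))
```

Because the reader has no notion of strings, the header inside the docstring is simply the first match, and the closing triple quote ends the block before the second header is ever reached.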

Noted. I think I’d prefer to put the requirements block next to the imports, but as I say, I don’t typically use a docstring in scripts so that’s more of a conceptual difference than a practical one for me.

That’s a good point, and one I should cover in the “How to teach this” section of the PEP at least. I will say that the fact that you have to use a specific tool to run the script should be a clue as well. And I don’t expect this to be used by absolute beginners (who will use the stdlib). Once someone needs to know how to manage 3rd party libraries, “use pipx and a specially formatted comment block like this” should be explained as a single idea.

Also, there’s nothing stopping someone from adding an explanation:

# The following requirements block allows pipx or pip-run to
# automatically run the script with its dependencies.

# Requirements:
#     requests

Basically none. It’s a classic bikeshedding question. If there’s a consensus, I’ll follow that. Otherwise, I’ll make a decision among whatever options exist at that point, as the PEP author.

In packaging, it’s generally turned out to be a bad idea. (Everyone complains about jargon when we talk about PEP 517 or PEP 440). And honestly, I don’t consider the typing docs to be a paragon of clarity and discoverability.

I’m going to reject using the PEP number. If someone wants that sufficiently, they can write their own PEP (which will, ironically, have a different PEP number).

By the way - as the PEP author, I can’t be the PEP-delegate for this. Is anyone willing to volunteer? @brettcannon are you interested? Or as a SC member would that be complicated?

sirosen · July 25, 2023, 7:49pm

Regarding

"""
# Requirements:
#    requests
"""
# Requirements:
#    httpx

I’m not concerned as a user so much as I am as a potential implementer reading the PEP to write an embedded requirements parser.

I feel a little uncomfortable with this being implicit, in that it’s at least a little ambiguous and conflicts with how I read the spec. If I were trying to implement without the benefit of this conversation, I’d think I need to make sure my above example comes out as httpx (for which I think I’d grab libcst).

Is there room for calling this out explicitly and saying that the above should parse as requests in the PEP? (I’m open to the possibility that I’m being too fussy about this, but I was confused, so obviously it will be confusing for everyone else in the future.)

I think that making it natural text and trying to make it searchable leaves relatively few options.

It probably needs to become a two-word phrase or hyphenate, e.g.

# Required Packages:
#   requests

or

# Package-Requirements:
#   requests

pf_moore · July 25, 2023, 8:15pm

Yes, absolutely! If the PEP isn’t clear enough that you could determine how to implement the parsing without asking, that’s a problem with the PEP that I will fix. Maybe an example would be the best way, but I’m mildly uncomfortable about having examples of “how not to do things” in the PEP. I’ll think some more on this, but I’m very grateful you pointed out the problem.

It may be that it’s just unclear because when I wrote the PEP [1] I hadn’t considered multi-line strings. But that’s not important, what matters is making it clear now.

Indeed. My priority is natural over searchable (because of the fact you’ll only need to know about it in the context of invoking a tool that supports it) but that’s a mild preference. Let’s see what people suggest, and whether there’s a consensus. I’m mildly in favour of Package-Requirements if we are going to change. Although there’s something to be said for Requires-Dist to mirror the metadata spec. That may be too obscure, though.

[1] And for that matter, the pipx implementation…