PEP 722: Dependency specification for single-file scripts

pf_moore · August 16, 2023, 10:14pm

I’ve edited the original post to give the correct URL.

brettcannon · August 16, 2023, 10:35pm

I think that’s more of a discussion between you and @ofek about if he agrees with you and wants to change PEP 723 to align with what you’re suggesting. Otherwise I’m not sure how your PEP would differ from PEP 723 beyond what TOML data is included or some bikeshed on how the TOML is embedded.

ofek · August 16, 2023, 10:48pm

I agree with what Brett said, but Joshua, feel free to message me here, on X (formerly Twitter), or on Discord.

gryznar · August 18, 2023, 2:27pm

I do not like specifying requirements in a comment block. It looks very strange that comments, which should normally be ignored, have a special meaning in this case.
For Python newbies not familiar with packaging, this PEP is not the solution. It takes additional effort to figure out that this kind of syntax is used. Do not forget that if this PEP is accepted (which I would like to avoid), this practice has to spread across the community, which may take a long, long time.

A newbie who would like to package a one-file script will encounter solutions based on requirements.txt, pyproject.toml, etc. on the web much more frequently, because they have existed for a long time. This solution may end up as one more niche: for insiders, yet another way to package (though they may prefer cleaner, more standardized ways such as requirements.txt or pyproject.toml), and for everyone else, an unknown curiosity or something to ignore (because it looks like ordinary comments).

To summarize: please do not accept this, and do not introduce “yet another way” to package.

fungi · August 18, 2023, 2:55pm

This PEP isn’t meant to be a solution specifically for “newbies
unfamiliar with packaging” nor is it a way of packaging anything at
all (much less another way of anything).

Many of us who write quick Python scripts already put comments in
the beginning of them with a list of the non-stdlib dependencies the
script needs preinstalled. There are even tools which will read such
comments and autocreate suitable ephemeral environments before
running the script. This PEP merely standardizes that practice so
that these tools can have a consistent format to collaborate on
rather than each doing it their own way.
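
As a rough sketch of what such a runner does once it has the dependency list, the steps could be planned as below. The function name `plan_ephemeral_run` and the environment path are assumptions for illustration, not the behaviour of any particular tool; the function only builds the commands and does not execute them.

```python
import sys
from pathlib import Path


def plan_ephemeral_run(script, dependencies, env_dir):
    """Return the commands a hypothetical runner might execute, in order."""
    env_dir = Path(env_dir)
    # venv places the interpreter in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        env_python = env_dir / "Scripts" / "python.exe"
    else:
        env_python = env_dir / "bin" / "python"

    # 1. Create the throwaway environment with the current interpreter.
    commands = [[sys.executable, "-m", "venv", str(env_dir)]]
    # 2. Install the declared dependencies, if there are any.
    if dependencies:
        commands.append([str(env_python), "-m", "pip", "install", *dependencies])
    # 3. Run the script with the environment's interpreter.
    commands.append([str(env_python), str(script)])
    return commands


for command in plan_ephemeral_run("plot.py", ["pandas", "matplotlib"], "plot-env"):
    print(" ".join(command))
```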

Not approving this PEP isn’t going to stop those of us who already
do what it documents, nor is it going to make the tools which use
this information suddenly disappear, all it will do is increase the
odds that two different runners will each expect a different syntax.

effigies · August 18, 2023, 2:55pm

It seems there’s still misunderstanding: This is not a way of packaging at all.

This is a way for someone who writes a script to provide a comment to users (“Your environment will need these dependencies to run”), but structure it so that script runners can build an appropriate environment (or verify that the environment has the dependencies).

Critically, this capability already exists, but in different forms in different tools. By agreeing to a common syntax, the script writer who chooses to use it does not have to say “run this with pipx” or “run this with pip-run”, but “If you use pipx or pip-run (or X other runner), it will just work. Otherwise, read the comment and make sure you have these things installed.” I would probably actually make this explicit in a comment:

# This can be run directly with pipx or in any environment with these installed.
# If you need to edit this list, refer to PEP 722 to ensure it remains runnable.
#
# Script dependencies:
#    pandas
#    matplotlib

There’s no requirement for any tool that is not aiming for compatibility to implement this, and there’s no requirement for a user to use those tools.
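
A minimal sketch of how a runner might read a block like the one above, assuming the rules roughly as PEP 722 describes them: a comment line reading `Script Dependencies:` (in any case) starts the block, and the block ends at the first line that is not a comment. The function name `read_script_dependencies` is made up for this example, and it returns raw strings rather than validating them as requirement specifiers.

```python
import re

# Assumed marker: "# Script Dependencies:" alone on a comment line, any case.
BLOCK_MARKER = re.compile(r"^#\s+script\s+dependencies:\s*$", re.IGNORECASE)


def read_script_dependencies(source: str) -> list[str]:
    """Return the raw dependency strings from the first dependency block."""
    lines = iter(source.splitlines())
    # Skip everything up to and including the marker line.
    for line in lines:
        if BLOCK_MARKER.match(line):
            break
    dependencies = []
    # Collect comment lines until the first non-comment line (or end of file).
    for line in lines:
        if not line.startswith("#"):
            break
        # Drop the leading "#" and any trailing " # inline comment".
        entry = line[1:].split(" # ", maxsplit=1)[0].strip()
        if entry:
            dependencies.append(entry)
    return dependencies


script = """\
# This can be run directly with pipx or in any environment with these installed.
#
# Script dependencies:
#    pandas
#    matplotlib

import pandas
"""
print(read_script_dependencies(script))
```

Nothing here needs a Python parser: the same loop could be written in any language that can read lines of text.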

jamestwebber · August 18, 2023, 3:02pm

I think this is true. But it’s probably still worth considering the outside perspective, even if it’s technically incorrect. If this PEP (or 723) is perceived as complicating the ecosystem, that’s not a good outcome.

Not that one person’s comment is representative of anything, but I hope the user studies that Brett’s group is working on will clarify whether this is happening.

gryznar · August 18, 2023, 3:42pm

Despite the misunderstanding, I am not against the idea; I am against the proposed solution. I still think there is a better way to achieve this than a comment block, which can easily be misspelled, etc.

merwok · August 18, 2023, 3:44pm

Can you share what that better way would be?

fungi · August 18, 2023, 4:04pm

This is a way for someone who writes a script to provide a comment to users

Probably also worth reiterating, in many cases the “user” is just
your future self.

I don’t typically give these sorts of scripts to others, it’s purely
so I can remember what the script needs the next time I want to run
it. Having a tool that can automatically create the environment that
script needs from the comments I added to it is a bonus, but as much
as anything they’re notes I’ve made to myself within the script so
that I don’t lose them if I move the script to another directory or
to another one of my systems at some point in the future.

sirosen · August 18, 2023, 5:00pm

I agree with several of the points recently made, that this is not packaging, that it only standardizes an existing practice, etc.

But the note that this may be perceived – whether or not it’s technically correct – as “packaging” or “part of the packaging ecosystem” resonates with me particularly strongly.
Most Python users are probably hazy about the boundary between “packaging” and “workflow tools” and so forth. For them, these are all just “packaging tools for Python”.

This line of reasoning leads me to slightly favor PEP 722 over 723 if we must choose one.
My rationale is that by staying intentionally distant from pyproject.toml data, we better help users analogize the feature and its usage with requirements.txt files, which is more accurate than analogizing it with building a package with dependencies.

I’m very concerned that voices calling for “no new ways of doing things” effectively lead to stalled progress.

Rather than standardizing on a new behavior which enhances, replaces, and improves upon past art – like requirements.txt files – we’ll be stuck with only the current standards and no new tooling.

As for use of special comments vs any other mechanism…
There are mechanical problems with basically any other solution. This ground has been trod pretty thoroughly here and in the PEP 723 thread, but I’ll try to summarize.

it needs to be possible to parse the data in any language, not just python, so it can’t just be some runtime value or attribute
if the value is visible at runtime, the runtime value might not match the verbatim values seen in a file, leading to a misleading discrepancy between runtime information and the spec
shebangs, encoding comments, and other languages’ solutions to similar problems (e.g. embedded cargo.toml proposed for Rust) are a precedent for magic comments
multiline strings introduce additional questions and confusion for users, f-strings would not work and escaping rules become more complex
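
The encoding declaration is one precedent that can be checked with the standard library: `tokenize.detect_encoding` reads that magic comment from raw bytes without executing anything. A small sketch; the sample script bytes are invented for the illustration.

```python
import io
import tokenize

# A script whose first line is an encoding declaration (a "magic comment").
source = b"# -*- coding: latin-1 -*-\nprint('hello')\n"

# detect_encoding only needs a readline callable over the raw bytes;
# it returns the encoding name and the lines it had to read to find it.
encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding, lines_read)
```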

gryznar · August 18, 2023, 6:01pm

For me - PEP 723. It is much better aligned with existing ways and allows for much more

davidism · August 18, 2023, 6:05pm

Please be sure to read the PEP and the existing discussion first. This has already been discussed and decided on. You’re coming in and restarting discussion that has already happened.

gryznar · August 18, 2023, 6:06pm

Sure, sorry about that.

gwerbin · August 19, 2023, 5:59pm

Sorry if I missed this, but I didn’t see any discussion about using a block delimiter, like how Jekyll treats --- as a delimiter between the YAML header/metadata and the Markdown article content.

PEP 722 and 723 are obviously not the same, but they are both proposing the relatively novel feature of a metadata block embedded in comments. Maybe there’s room here to standardize a format for delimiting metadata from other comments?

Hypothetically:

#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...

if __name__ == "__main__":
    print("Hello!")

You can still parse that out of the code with a single (absurd) multi-line regex: https://regex101.com/r/ECOTLu/1

import re

block_pattern = re.compile(r"""(?imx)
# Optional shebang
(?:^\#![^\r\n]+$(?:\r|\n|\r\n))?

# Optional blank and comment lines
(?:^[ \t]*.*$(?:\r|\n|\r\n))*?

# The opening delimiter
^\#[ \t]*-\*-[ \t]*$(?:\r|\n|\r\n)

# Header
^\#(?P<indent>[ \t]*)Script[ \t]*Dependencies:[ \t]*$(?:\r|\n|\r\n)

# Dependencies
(?P<deplines>(?:^\#(?P=indent)[ \t]*(?:[A-Z0-9][A-Z0-9._-]*[A-Z0-9]|[A-Z0-9])[ \t]*$(?:\r|\n|\r\n))+)

# Closing delimiter
^\#[ \t]*-\*-[ \t]*$
""")

line_prefix_pattern = re.compile(r"^[ \t]*#[ \t]*")

text = r"""
#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...

if __name__ == "__main__":
    print("Hello!")
"""

m = block_pattern.match(text)
if m is not None:
    deplines = m.group("deplines")
    deps = [line_prefix_pattern.sub("", line) for line in deplines.splitlines()]
    print(deps)

['requests', 'click']

Hopefully you wouldn’t actually use regex to parse this, but it’s meant to show that this block-delimited format is still amenable to usage with simple tools available to all languages.
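
For comparison, a line-by-line reader for the same hypothetical `# -*-` delimited layout might look like the sketch below. Both the delimiter and the function name `read_delimited_dependencies` are illustrative assumptions, not part of PEP 722 or PEP 723, and the header match is stricter about spacing than the regex.

```python
def read_delimited_dependencies(source: str) -> list[str]:
    """Collect dependency names from the first '# -*-' delimited block."""
    dependencies = []
    state = "outside"  # outside -> opened -> collecting
    for raw_line in source.splitlines():
        if not raw_line.startswith("#"):
            if state != "outside":
                break  # the block must be made of contiguous comment lines
            continue
        content = raw_line[1:].strip()
        if state == "outside":
            if content == "-*-":
                state = "opened"
        elif state == "opened":
            # The header must directly follow the opening delimiter.
            if content.lower() == "script dependencies:":
                state = "collecting"
            else:
                state = "outside"
        elif content == "-*-":
            return dependencies  # closing delimiter reached
        elif content:
            dependencies.append(content)
    return []  # no complete block was found


sample = """\
#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...
"""
print(read_delimited_dependencies(sample))
```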

Edit: I extended this idea in a different post in the PEP 723 thread. Maybe it’s worth drafting a separate PEP?

pradyunsg · August 21, 2023, 12:02pm

As noted in PEP 723: Embedding pyproject.toml in single-file scripts - #141 by pradyunsg

I had a usecase today for “this script is useful to write and run with a few PyPI packages”. I took this opportunity to read both the PEPs and do a drive-through pretending that a magical script-run myscript.py command exists to run my script.

My main bit of feedback on PEP 722 is that it should better justify why it doesn’t have docstring support. The current language in the PEP is:

The most significant problem with this proposal is that it requires all consumers of the dependency data to implement a Python parser. Even if the syntax is restricted, the rest of the script will use the full Python syntax, and trying to define a syntax which can be successfully parsed in isolation from the surrounding code is likely to be extremely difficult and error-prone.

This argument is fairly weak in the context of docstrings. You don’t need to parse the rest of the document – docstrings are guaranteed to be the first bit of “code” in the file. And handling escapes can be optional – it is not necessary for locating a line that’s Script Dependencies: and parsing the indented section after. Sure, rf strings are weird but those are reasonable to exclude – there is a reasonable simplification here.

I say this in part because I had a docstring in my script already and I wrote:

"""[summary line]

[some more info about the script]

Script Dependencies:
    build
    pip
    httpx
"""

… only to realise that this isn’t what the PEP permits. You can argue that this is me being dense and not understanding the PEP, but this was what triggered me diving into both PEPs.
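
To make the simplification concrete, here is a sketch of the step "locate the Script Dependencies: line and read the indented section after it", applied to docstring text that has already been extracted. This is not something PEP 722 specifies; the function name and the rule that the section ends at the first blank or non-indented line are assumptions.

```python
def dependencies_from_docstring(doc: str) -> list[str]:
    """Read the indented lines that follow a 'Script Dependencies:' header."""
    lines = iter(doc.splitlines())
    # Skip everything up to and including the header line.
    for line in lines:
        if line.strip().lower() == "script dependencies:":
            break
    dependencies = []
    for line in lines:
        # Assumed rule: the section ends at a blank or non-indented line.
        if not line.strip() or not line[0].isspace():
            break
        dependencies.append(line.strip())
    return dependencies


doc = """[summary line]

[some more info about the script]

Script Dependencies:
    build
    pip
    httpx
"""
print(dependencies_from_docstring(doc))
```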

It would be useful to either (a) split this heading to cover docstrings separately or (b) clarify the PEP’s position with a slightly stronger argument.

PS: I realise that this PEP is “done”, so it’s OK if this isn’t actually changed – but it is a weak-ish argument in the PEP even in that case.

thejcannon · August 21, 2023, 5:45pm

As a singular datapoint from a tool, I just found out isort also reads from docstring: isort

""" my_module.py
    Best module ever

   isort:skip_file
"""

sirosen · August 22, 2023, 3:47am

I think I’ve been the most – or only – vocal proponent of using comments rather than docstrings in this and the 723 thread.

I’m most strongly against requiring the use of a docstring.

I and others like me (I presume there is such a class of users) use docstrings already as data. Perhaps I have some other tool which parses those strings, or maybe I’m only using it as the help text for a script. But either way, the docstring is a visible string at runtime and if I’m using it, my use could (and in my case, often would) conflict with some other spec using it.

If you accept that argument, then the question becomes one of why the docstring should be allowed as an alternative to a comment. I just don’t find it compelling that we need two ways to do this but perhaps there’s a strong argument in favor.

It would need to be strong enough to outweigh the risk that f-strings usage or other fancy usages could confuse users or spec implementers. To give a simple, perhaps silly, example of the kinds of ambiguities which need to be accounted for:

"""
Script dependencies:
""" + """
  requests
"""

Valid or invalid?

Comments better reduce the differences between the file as text and the file as a parsed AST or CST (although even then, there are differences).

pradyunsg · August 23, 2023, 6:32pm

I wasn’t going to respond here, but I have fallen for xkcd: Duty Calls

The point I made isn’t that docstrings are the right choice here but that the PEP makes a weak argument against them.

There is a lot of non-committal language here and it’s really tricky to engage with this productively. If I drop all the non-committal language here^[1] and break this up, you’re basically saying:

docstrings are already used as data, and
(you have a tool that parses docstrings OR use docstrings for help text), and
docstrings are visible at runtime, and
your use conflicts with some other spec

2 of those are facts (used as data, is visible at runtime), and use for help text is a known pattern of use for docstrings. Visibility at runtime and in help text is, arguably, a reason for using docstrings rather than comments here.

On “your use conflicts with some other spec”^[2] – that argument/risk also applies to special casing any specific format for writing comments. We’re discussing taking content that used to be ignored and giving it semantic meaning. You could argue that it’s less likely with comments to cause issues and, you know what, sure. That’s primarily a judgement call and, arguably, a reasonable one.

But, again, that is not the argument made in the PEP. The argument made in the PEP is a weak one.

Edit: Also, taking a step back, it would be useful to have a concrete example of a usecase that would be broken by allowing docstrings as a format here. “It’s more complexity” is an argument, but that’s not the argument you’ve made in this part AFAICT.

IMO you’re making a strawman argument – ~no one does that and any tool behaving weirdly in this case is the expected outcome for most people.
It’s clearly weird and will have a weird outcome (for anyone wondering, it’s not a docstring but a no-op expression adding two strings – running black will also clearly reflect that).
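
That reading can be checked with the standard `ast` module: `ast.get_docstring` only reports a docstring when the first statement is a plain string literal. A small sketch, with sample sources invented for the illustration:

```python
import ast

# A plain module docstring, and the concatenated variant from the example.
PLAIN = '"""\nScript dependencies:\n  requests\n"""\n'
CONCATENATED = '"""\nScript dependencies:\n""" + """\n  requests\n"""\n'

for label, source in [("plain", PLAIN), ("concatenated", CONCATENATED)]:
    tree = ast.parse(source)
    first = tree.body[0].value  # the expression in the first statement
    has_docstring = ast.get_docstring(tree) is not None
    print(label, type(first).__name__, has_docstring)
```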

The PEP currently rejects that it is feasible even with syntax restrictions because of the rest of the file being difficult to parse, which is not a good argument. You are welcome to argue against allowing declarations in docstrings but that’s separate from whether the existing PEP makes a strong argument – it doesn’t IMO.

[1] That’s my attempt at responding to a steelman argument for this.
[2] I’m assuming “your use” refers to some existing pattern of use of contents of a docstring (I don’t know what you’re doing or what “spec” you’re referring to).

sirosen · August 23, 2023, 7:23pm

It’s definitely not my intent to make the conversation harder, so sorry about that! I only want to avoid presenting my situation as though it’s representative of a huge category of users – I caveated my position a little heavily because I don’t know how many users are leveraging docstrings the way that I do.

I understand this line of argument, but I disagree. If the value is visible at runtime, but the spec is written so as to avoid requiring a python parser (so as to better support non-python tools), then it introduces more possibilities for the runtime value to diverge from what the spec defines.

Maybe? What I had in mind was something like a tool which does something like yaml.load(__doc__.partition("---")[2]).
I know of tools like apispec which do this with wsgi apps and function docstrings – I don’t know of any tools which do this on script docstrings, but it is possible.

Perhaps I’m getting too far afield from my own use-cases with this argument, since it’s purely theoretical.

I chose a bad example of a dynamically created docstring. Here’s one I think is more likely:

f"""
Script dependencies:
{open("requirements.txt").read()}
"""

Interestingly, I just tried this out and it doesn’t work (__doc__ is None on py3.11), which I didn’t expect. You could do it very strangely with

"""
Script dependencies:
"""
__doc__ += open("requirements.txt").read()

But it certainly weakens my argument quite a lot.
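
Both observations can be reproduced without touching the file system. The sketch below swaps `open("requirements.txt").read()` for a literal string and uses a made-up helper, `static_and_runtime_doc`, to compare what a parser sees in the text with what `__doc__` holds after execution.

```python
import ast


def static_and_runtime_doc(source: str):
    """Return (docstring found by parsing, __doc__ after executing)."""
    static_doc = ast.get_docstring(ast.parse(source))
    namespace = {}
    exec(compile(source, "<script>", "exec"), namespace)
    return static_doc, namespace.get("__doc__")


# An f-string in docstring position is an ordinary expression, not a docstring.
FSTRING = 'f"""\nScript dependencies:\n{"requests"}\n"""\n'

# A real docstring that is extended at runtime.
AUGMENTED = '"""\nScript dependencies:\n"""\n__doc__ += "requests\\n"\n'

print(static_and_runtime_doc(FSTRING))
print(static_and_runtime_doc(AUGMENTED))
```

In the second case the text of the file and the runtime value disagree about whether `requests` is listed, which is the divergence a text-only spec would have to live with.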

No disagreement here; I haven’t looked carefully at the language in the last draft of the PEP, and this and the PEP 723 threads are so time consuming to try to keep up with that it’s hard to say with confidence that I know what’s been said where.

My core argument is only:

Because the docstring is a runtime value, users are using it. They may be parsing it, but they definitely are using it in forms like help=__doc__.

PEP 723’s proposal of __pyproject__ = """...""" introduced additional concerns about f-strings and dynamic data, but it seems that these don’t hold or hold more weakly because the docstring is more tightly constrained.
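
To illustrate the `help=__doc__` point: if a dependency block lived in the docstring, it would show up in the script's help output. A sketch with `argparse`, which takes the text as `description`; the constant `SCRIPT_DOC` stands in for a script's `__doc__`, and the docstring layout is hypothetical (PEP 722 does not read docstrings).

```python
import argparse

# Stands in for a script's module docstring (__doc__).
SCRIPT_DOC = """Fetch a URL and print the response status.

Script Dependencies:
    requests
"""

parser = argparse.ArgumentParser(
    prog="fetch.py",
    description=SCRIPT_DOC,
    # Keep the docstring's line breaks instead of re-wrapping them.
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("url")

# The dependency block is now part of what users see with --help.
help_text = parser.format_help()
print(help_text)
```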