PEP 722: Dependency specification for single-file scripts

I agree with several of the points recently made, that this is not packaging, that it only standardizes an existing practice, etc.

But the note that this may be perceived – whether or not it’s technically correct – as “packaging” or “part of the packaging ecosystem” resonates with me particularly strongly.
Most Python users are probably hazy about the boundary between “packaging” and “workflow tools” and so forth. For them, these are all just “packaging tools for Python”.

This line of reasoning leads me to slightly favor PEP 722 over 723 if we must choose one.
My rationale is that by staying intentionally distant from pyproject.toml data, we better help users analogize the feature and its usage with requirements.txt files, which is more accurate than analogizing it with building a package with dependencies.


I’m very concerned that voices calling for “no new ways of doing things” effectively lead to stalled progress.

Rather than standardizing on a new behavior which enhances, replaces, and improves upon past art – like requirements.txt files – we’ll be stuck with only the current standards and no new tooling.


As for use of special comments vs any other mechanism…
There are mechanical problems with basically any other solution. This ground has been trod pretty thoroughly here and in the PEP 723 thread, but I’ll try to summarize.

  • it needs to be possible to parse the data in any language, not just python, so it can’t just be some runtime value or attribute
  • if the value is visible at runtime, the runtime value might not match the verbatim values seen in a file, leading to a misleading discrepancy between runtime information and the spec
  • shebangs, encoding comments, and other languages’ solutions to similar problems (e.g. embedded cargo.toml proposed for Rust) are a precedent for magic comments
  • multiline strings introduce additional questions and confusion for users: f-strings would not work, and escaping rules become more complex
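
To make the "read it as text" point concrete, here is a minimal sketch of a comment-block reader that never parses Python. The header pattern is an assumption modeled on the style PEP 722 describes, and `read_dependency_block` and `EXAMPLE` are hypothetical names used only for illustration.

```python
import re

# Assumed header pattern, modeled on the PEP 722 style of comment header.
HEADER = re.compile(r"(?i)^#\s+script\s+dependencies:\s*$")


def read_dependency_block(source: str) -> list[str]:
    """Return the dependency lines found by scanning `source` as plain text."""
    lines = iter(source.splitlines())
    # Skip everything up to and including the header comment.
    for line in lines:
        if HEADER.match(line):
            break
    deps = []
    # The block ends at the first line that is not a comment.
    for line in lines:
        if not line.startswith("#"):
            break
        # Drop the leading "#" and any inline " # comment".
        spec = line[1:].split(" # ", maxsplit=1)[0].strip()
        if spec:
            deps.append(spec)
    return deps


EXAMPLE = """\
#!/usr/bin/env python3
# Script Dependencies:
#     requests
#     rich  # pretty output

import requests
"""

print(read_dependency_block(EXAMPLE))
```

Nothing here depends on Python's grammar, so the same loop could be ported to any language with basic string handling.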

For me - PEP 723. It is much better aligned with existing ways and allows for much more

Please be sure to read the PEP and the existing discussion first. This has already been discussed and decided on. You’re coming in and restarting discussion that has already happened.


Sure, sorry about that.

Sorry if I missed this, but I didn’t see any discussion about using a block delimiter, like how Jekyll treats --- as a delimiter between the YAML header/metadata and the Markdown article content.

PEP 722 and 723 are obviously not the same, but they are both proposing the relatively novel feature of a metadata block embedded in comments. Maybe there’s room here to standardize a format for delimiting metadata from other comments?

Hypothetically:

#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...

if __name__ == "__main__":
    print("Hello!")

You can still parse that out of the code with a single (absurd) multi-line regex: https://regex101.com/r/ECOTLu/1

import re

block_pattern = re.compile(r"""(?imx)
# Optional shebang
(?:^\#![^\r\n]+$(?:\r|\n|\r\n))?

# Optional blank and comment lines
(?:^[ \t]*.*$(?:\r|\n|\r\n))*?

# The opening delimiter
^\#[ \t]*-\*-[ \t]*$(?:\r|\n|\r\n)

# Header
^\#(?P<indent>[ \t]*)Script[ \t]*Dependencies:[ \t]*$(?:\r|\n|\r\n)

# Dependencies
(?P<deplines>(?:^\#(?P=indent)[ \t]*(?:[A-Z0-9][A-Z0-9._-]*[A-Z0-9]|[A-Z0-9])[ \t]*$(?:\r|\n|\r\n))+)

# Closing delimiter
^\#[ \t]*-\*-[ \t]*$
""")

line_prefix_pattern = re.compile(r"^[ \t]*#[ \t]*")

text = r"""
#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...

if __name__ == "__main__":
    print("Hello!")
"""

m = block_pattern.match(text)
if m is not None:
    deplines = m.group("deplines")
    deps = [line_prefix_pattern.sub("", line) for line in deplines.splitlines()]
    print(deps)
['requests', 'click']

Hopefully you wouldn’t actually use regex to parse this, but it’s meant to show that this block-delimited format is still amenable to usage with simple tools available to all languages.
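
For comparison, a rough line-by-line sketch of the same idea without regex. `parse_delimited_block` is a hypothetical helper, and it is looser than the pattern above: it accepts any non-empty comment lines between the two delimiters and does not insist on the header line.

```python
def parse_delimited_block(source: str, delimiter: str = "-*-") -> list[str]:
    """Collect entries between two `# -*-` comment lines, scanning as text."""
    deps = []
    inside = False
    for line in source.splitlines():
        if not line.startswith("#"):
            if inside:
                break  # block was never closed; treat it as absent
            continue
        body = line[1:].strip()
        if body == delimiter:
            if inside:
                return deps  # closing delimiter reached
            inside = True
        elif inside and body and body.lower() != "script dependencies:":
            deps.append(body)
    return []  # no complete block found


SAMPLE = """\
#!/usr/bin/env python3

# My app!
#
# -*-
# Script Dependencies:
#   requests
#   click
# -*-
#
# Usage: ...
"""

print(parse_delimited_block(SAMPLE))
```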

Edit: I extended this idea in a different post in the PEP 723 thread. Maybe it’s worth drafting a separate PEP?


As noted in PEP 723: Embedding pyproject.toml in single-file scripts - #141 by pradyunsg

I had a use case today for “this script is useful to write and run with a few PyPI packages”. I took this opportunity to read both the PEPs and do a drive-through pretending that a magical script-run myscript.py command exists to run my script.

My main bit of feedback on PEP 722 is that it should better justify why it doesn’t have docstring support. The current language in the PEP is:

The most significant problem with this proposal is that it requires all consumers of the dependency data to implement a Python parser. Even if the syntax is restricted, the rest of the script will use the full Python syntax, and trying to define a syntax which can be successfully parsed in isolation from the surrounding code is likely to be extremely difficult and error-prone.

This argument is fairly weak in the context of docstrings. You don’t need to parse the rest of the document – docstrings are guaranteed to be the first bit of “code” in the file. And handling escapes can be optional – it is not necessary for locating a line that’s Script Dependencies: and parsing the indented section after. Sure, rf strings are weird but those are reasonable to exclude – there is a reasonable simplification here.

I say this in part because I had a docstring in my script already and I wrote:

"""[summary line]

[some more info about the script]

Script Dependencies:
    build
    pip
    httpx
"""

… only to realise that isn’t what the PEP permits. You can argue that this is me being dense and not understanding the PEP, but this was what triggered me diving into both PEPs.

It would be useful to either (a) split this heading to cover docstrings separately or (b) clarify the PEP’s position with a slightly stronger argument.

PS: I realise that this PEP is “done”, so it’s OK if this isn’t actually changed – but it is a weak-ish argument in the PEP even in that case. :sweat_smile:


As a single data point from a tool, I just found out that isort also reads from the docstring: isort

""" my_module.py
    Best module ever

   isort:skip_file
"""

I think I’ve been the most – or only – vocal proponent of using comments rather than docstrings in this and the 723 thread.

I’m most strongly against requiring the use of a docstring.

I and others like me (I presume there is such a class of users) use docstrings already as data. Perhaps I have some other tool which parses those strings, or maybe I’m only using it as the help text for a script. But either way, the docstring is a visible string at runtime and if I’m using it, my use could (and in my case, often would) conflict with some other spec using it.

If you accept that argument, then the question becomes one of why the docstring should be allowed as an alternative to a comment. I just don’t find it compelling that we need two ways to do this but perhaps there’s a strong argument in favor.

It would need to be strong enough to outweigh the risk that f-string usage or other fancy usages could confuse users or spec implementers. To give a simple, perhaps silly, example of the kinds of ambiguities which need to be accounted for:

"""
Script dependencies:
""" + """
  requests
"""

Valid or invalid?

Comments better reduce the differences between the file as text and the file as a parsed AST or CST (although even then, there are differences).


I wasn’t going to respond here, but I have fallen for xkcd: Duty Calls

The point I made isn’t that docstrings are the right choice here but that the PEP makes a weak argument against them.

There is a lot of non-committal language here and it’s really tricky to engage with this productively. If I drop all the non-committal language here[1] and break this up, you’re basically saying:

  • docstrings are already used as data, and
  • (you have a tool that parses docstrings OR use docstrings for help text), and
  • docstrings are visible at runtime, and
  • your use conflicts with some other spec

2 of those are facts (used as data, is visible at runtime), and use for help text is a known pattern of use for docstrings. Visibility at runtime and in help text is, arguably, a reason for using docstrings rather than comments here.

On “your use conflicts with some other spec”[2] – that argument/risk also applies to special casing any specific format for writing comments. We’re discussing content that used to be ignored and giving it semantic meaning. You could argue that it’s less likely with comments to cause issues and, you know what, sure. That’s primarily a judgement call and, arguably, a reasonable one.

But, again, that is not the argument made in the PEP. The argument made in the PEP is a weak one.

Edit: Also, taking a step back, it would be useful to have a concrete example of a usecase that would be broken by allowing docstrings as a format here. “It’s more complexity” is an argument, but that’s not the argument you’ve made in this part AFAICT.

  1. IMO you’re making a strawman argument – ~no one does that and any tool behaving weirdly in this case is the expected outcome for most people.
  2. It’s clearly weird and will have a weird outcome (for anyone wondering, it’s not a docstring but a no-op expression adding two strings – running black will also clearly reflect that).

The PEP currently rejects that it is feasible even with syntax restrictions because of the rest of the file being difficult to parse, which is not a good argument. You are welcome to argue against allowing declarations in docstrings but that’s separate from whether the existing PEP makes a strong argument – it doesn’t IMO.


  1. that’s my attempt at responding to a steelman argument for this. ↩︎

  2. I’m assuming “your use” refers to some existing pattern of use of contents of a docstring (I don’t know what you’re doing or what “spec” you’re referring to) ↩︎


It’s definitely not my intent to make the conversation harder, so sorry about that! I only want to avoid presenting my situation as though it’s representative of a huge category of users – I caveated my position a little heavily because I don’t know how many users are leveraging docstrings the way that I do.

I understand this line of argument, but I disagree. If the value is visible at runtime, but the spec is written so as to avoid requiring a python parser (so as to better support non-python tools), then it introduces more possibilities for the runtime value to diverge from what the spec defines.

Maybe? What I had in mind was something like a tool which does something like yaml.load(__doc__.partition("---")[2]).
I know of tools like apispec which do this with wsgi apps and function docstrings – I don’t know of any tools which do this on script docstrings, but it is possible.

Perhaps I’m getting too far afield from my own use-cases with this argument, since it’s purely theoretical.

I chose a bad example of a dynamically created docstring. Here’s one I think is more likely:

f"""
Script dependencies:
{open("requirements.txt").read()}
"""

Interestingly, I just tried this out and it doesn’t work (__doc__ is None on py3.11), which I didn’t expect. You could do it very strangely with

"""
Script dependencies:
"""
__doc__ += open("requirements.txt").read()

But it certainly weakens my argument quite a lot.

No disagreement here; I haven’t looked carefully at the language in the last draft of the PEP, and this and the PEP 723 threads are so time consuming to try to keep up with that it’s hard to say with confidence that I know what’s been said where.

My core argument is only:

Because the docstring is a runtime value, users are using it. They may be parsing it, but they definitely are using it in forms like help=__doc__.
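
As a small illustration of that pattern, here is a sketch where a script reuses its docstring as help text. `DOC` stands in for the module's `__doc__`, and the program name and argument are made up for the example. Any dependency block placed in the docstring ends up in front of the user.

```python
import argparse

# Stand-in for a script's module docstring (the text __doc__ would hold).
DOC = """Fetch a URL and print the response status.

Script Dependencies:
    requests
"""

parser = argparse.ArgumentParser(
    prog="fetch",  # hypothetical script name
    description=DOC,
    # Keep the docstring's line breaks instead of re-wrapping them.
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("url")

help_text = parser.format_help()
print(help_text)
```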

PEP 723’s proposal of __pyproject__ = """...""" introduced additional concerns about f-strings and dynamic data, but it seems that these don’t hold or hold more weakly because the docstring is more tightly constrained.

This wouldn’t match the specification as set out in PEP 722, which requires static data:

The dependency block is identified by reading the script as a text file (i.e., the file is not parsed as Python source code)

(Emphasis in the original)

In general, only string literals (incl. raw strings) as stand-alone expressions are lifted into docstrings (e.g. Expr(value=Constant(...))), though I might be missing a case.
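
One way to check which forms are lifted is `ast.get_docstring`, used here as a stand-in for what the compiler does (an assumption, though the two agree on every case I know of). The `SOURCES` table and `module_docstring` helper are just names for this sketch.

```python
import ast

# Each value is the complete source text of a tiny module.
SOURCES = {
    "plain literal": '"""deps"""',
    "raw literal": 'r"""deps"""',
    "f-string": 'f"""deps"""',
    "concatenation": '"""de""" + """ps"""',
}


def module_docstring(source: str):
    """Return the module docstring the AST reports, or None."""
    return ast.get_docstring(ast.parse(source))


results = {label: module_docstring(src) for label, src in SOURCES.items()}
for label, doc in results.items():
    print(f"{label}: {doc!r}")
```

The plain and raw literals come back as docstrings; the f-string and the concatenation both give None, because neither is a bare string constant in the AST.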


I think the fairly straightforward argument in favour of docstrings is that users (e.g. Pradyun) would expect it, though it does make parsing harder (", ', """, and ''' are all valid openers for a docstring, before enumerating all of the string prefix possibilities!).

A


I’m not going to update the PEP because (a) I’ve said it’s ready for review and I don’t want to keep changing it after that, and (b) the “Rejected Alternatives” section is already huge, and I don’t think there’s a lot of value in extending it further. However, I will respond here, for the record.

First of all, I do actually think this is covered adequately by Why not use (possibly restricted) Python syntax? That section as written is focused more on the __requirements__ = [...] idea because that was the commonest suggestion of this form at the time, but the argument isn’t exclusive to that form.

You’re misunderstanding my point here. To find the docstring, you need to identify the “first string literal in the file”. And to do that you either need a Python parser or you define a restricted syntax for the docstrings you are willing to parse. For example,

"""
Script dependencies:
    \N{Latin small letter r}equests
"""

is a valid docstring. Would you accept that as declaring a dependency on requests? Yes, you can find it easily enough, but you can’t parse it with just a regex, for example.
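
A short sketch of that mismatch, using `ast` only to obtain the value Python itself would see. `SOURCE` holds the example above as raw text, so the escape is not processed before parsing.

```python
import ast

# Raw string so the backslash escape reaches the parser unprocessed.
SOURCE = r'''"""
Script dependencies:
    \N{Latin small letter r}equests
"""
'''

# What a text-based tool sees: the literal characters in the file.
text_has_name = "requests" in SOURCE

# What Python sees: the docstring after escape processing.
runtime_doc = ast.get_docstring(ast.parse(SOURCE))
runtime_has_name = "requests" in runtime_doc

print(text_has_name, runtime_has_name)
```

A plain text scan never sees the string "requests" in the file, while the evaluated docstring contains it, so the two readings of the same file disagree.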

And if you do need a full Python parser, you hit all of the issues raised in the PEP about tying the client to a particular Python version, being hard to write clients in languages other than Python, the Python AST compiler having no easy to use “incremental” mode, etc, etc.

rf strings aren’t valid docstrings. So you’re safe there. But this emphasises my point that it’s really hard to be precise about the syntax if you define it in terms of Python constructs[1], and really hard to define something that evaluates to the same thing statically and at runtime if you don’t. And I think having something that has different values statically and at runtime is a bad source of potential bugs and confusion.

Yes, this is to some extent about weird edge cases, but IMO that’s the key job of a PEP, to address weird edge cases. Anyone can write something that “usually” works, but once it hits the reality of multiple implementations and users that each interpret the spec their own way, vagueness is a big problem.

Sure, it requires some familiarity to know how to specify your dependencies. But “allowing people to write a dependency block without any prior knowledge” was never a goal here. “You write your dependencies in a specially structured comment” isn’t a big thing to expect people to learn.

Also, people use __doc__ at runtime, and it’s a problem to expect those people to decide whether to put a dependency block in the docstring, where it will affect the existing usage of __doc__ (which is often user-visible). Especially if there’s no alternative - and I hope you aren’t suggesting allowing a dependency block in the docstring as well as in a comment, because having two ways of doing this seems like it’s way too much.

I hope that explains why I don’t think the argument in the PEP is particularly weak, it’s just that it isn’t specifically framed in terms of docstrings.


  1. I searched for the precise specification of a module docstring in the Python docs, and it’s pretty hard to find - hard enough that I gave up after 10-15 minutes of looking. ↩︎


FWIW, I think it’s possible to allow dependencies in module docstrings without requiring a Python parser and without making the two ways to do it egregiously different: make the leading “#\s+” part of the header search regex optional. String escapes are already disallowed in the comment syntax, so it would be reasonable to disallow them even in docstrings.

The one additional restriction required on such a variation is that the dependency list would need to end at the first empty (or whitespace only) line in the docstring case, since “first subsequent non-comment line” wouldn’t be a viable terminator (a single or double quote character as the first non-whitespace character in a line would likely also need to be accepted as a docstring dependency list terminator).
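
A rough sketch of what that variation could look like, still reading the file purely as text. The header pattern is an assumption (a PEP 722-style header with the "#\s+" prefix made optional), and `read_block` is a hypothetical name. It does not handle inline comments or validate requirement syntax.

```python
import re

# Assumed header pattern: comment-style header with the "#\s+" prefix optional.
HEADER = re.compile(r"(?i)^(?:#\s+)?script\s+dependencies:\s*$")


def read_block(source: str) -> list[str]:
    """Read a dependency block from either a comment or a docstring, as text."""
    lines = iter(source.splitlines())
    for header in lines:
        if HEADER.match(header):
            break
    else:
        return []  # no header found
    in_comment = header.startswith("#")
    deps = []
    for line in lines:
        if in_comment:
            if not line.startswith("#"):
                break  # first non-comment line ends the block
            entry = line[1:].strip()
        else:
            entry = line.strip()
            if not entry or entry[0] in "\"'":
                break  # blank line or a quote character ends the block
        if entry:
            deps.append(entry)
    return deps


COMMENT_FORM = "# Script Dependencies:\n#     requests\n#     click\n\nimport requests\n"
DOCSTRING_FORM = '"""Summary line\n\nScript Dependencies:\n    build\n    pip\n    httpx\n"""\n'

print(read_block(COMMENT_FORM))
print(read_block(DOCSTRING_FORM))
```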

That said, I don’t think it’s a good idea to do that immediately. The feature is distinct enough that it could be considered as a separate idea that depends on PEP 722, rather than being part of the baseline proposal.

@brettcannon re-reading this, I realise it may have been unclear, but I do consider this PEP ready for approval. If you hadn’t realised that, sorry and you can consider this my formal request for a pronouncement.

As it’s 2 months since I finalised the PEP, can you give a time frame for when you expect to be able to make a decision? It’s not particularly urgent for me, but I’d like to have an idea if possible, as I plan to submit PRs to implement PEP 722 in pipx and pip-run if it gets accepted, and knowing when to make time for that would help me.


You were clear and I had realized it. :slightly_smiling_face:

We have user studies scheduled for Friday (which also eats into my open source time, so no one expect PR reviews this week from me :sweat_smile:)! It unfortunately takes a lot of coordination to make this happen since we have to plan the structure of the user study, get approvals (this isn’t cheap to do), find the participants, schedule them, do the user study with each participant, and then summarize the overall results. I can’t speak for @courtneywebster as to how long it will take to do the summary, nor do I want to put any public pressure on her to rush things, so I will say we will post a summary as soon as a summary is ready.

I’m expecting public discussions about the results, so we will see where that leads us.

My hope is, worst case, a pronouncement by the end of October.


As @brettcannon mentioned, the user studies are scheduled for Friday, 09/22/2023. I will proactively commit to preparing a summary by the following week and encourage discussion based on any findings. :blush: Thanks for your patience as we continue to organize everything!


I should also mention Courtney may also reach out to some educators to get their feedback, but this is not a certainty and I also don’t know how long that would take. We are trying to be rather thorough on this, hence the long wait. :grin:


I have posted the summary of the user studies in the PEP 722 and PEP 723 User Study Discussion thread. I am happy to provide clarity on anything covered there and answer any questions that come up as a result of the study!


FYI I plan on making a decision between PEPs 722 and 723 the week of October 16. If you want a cross-PEP thread to discuss in, see PEP 722 and PEP 723 User Study Discussion.


I announced my decision as a PEP delegate at PEP 722/723 decision.