PEP 722: Dependency specification for single-file scripts

I also prefer the single hash # of the initial proposal, especially if the whole concept of arbitrary script metadata is dropped. I still haven’t heard any disagreement to accepting both PEP 722 and 723: if both were accepted, then the TOML format can handle arbitrary metadata, comments, etc, while this proposal is a very simple and inflexible way to specify dependencies.

3 Likes

I just noticed that’s also the way comments already work in requirements.txt files in pip: “Whitespace followed by a #”

1 Like

Today I learnt that encoding comments don’t necessarily have these -*- markers according to 2. Lexical analysis — Python 3.11.4 documentation. They’re just usually put in by convention. That doesn’t totally shoot down the idea of using -*- markers, but does reduce the consistency it would bring since being as permissive as encoding comments (where # coding: utf8 is valid) would yield too many false positives. Especially since encoding comments are necessarily at the top of the file (first or second line) and dependency blocks aren’t, in this design.
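For illustration, the language reference only requires a comment on line 1 or 2 matching the regex `coding[=:]\s*([-\w.]+)`; a quick check shows how permissive that is (the Vim-style example matches only because `fileencoding=` happens to contain `coding=`):

```python
import re

# The regex from the language reference for encoding declarations; the
# -*- markers are Emacs convention, not part of the rule.
CODING = re.compile(r"coding[=:]\s*([-\w.]+)")

for comment in (
    "# -*- coding: utf-8 -*-",            # Emacs-style, by convention
    "# coding: utf8",                      # equally valid, no markers
    "# vim: set fileencoding=latin-1 :",   # Vim-style, also matches
):
    m = CODING.search(comment)
    print(comment, "->", m.group(1) if m else None)
```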

Is a mapping between module paths and packages available from PyPI? What I’m thinking is that the information of what is required is (almost) all already available from a script. For example, if

import requests

fails, a command line switch might instruct CPython to look for a package that provides the requests module, instead of directly raising an ImportError. There could be cases where the same module is provided by multiple packages, in which case a special exception could be raised asking the user to annotate the import (only needed once per root module), e.g.

import yaml  # requires: ruamel.yaml

or anything along these lines that is easy to parse.
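A rough sketch of how a tool might read such annotations (the `# requires:` format and the fall-back-to-the-module-name rule are just the idea from this post, nothing standard):

```python
import re

SOURCE = """\
import requests
import yaml  # requires: ruamel.yaml
"""

# Hypothetical annotation: a trailing comment naming the distribution
# that provides the imported module.
ANNOTATION = re.compile(
    r"^import\s+(?P<module>[\w.]+)\s*"
    r"(?:#\s*requires:\s*(?P<dist>\S+))?"
)

def required_distributions(source):
    """Map each top-level import to the distribution to install for it."""
    result = {}
    for line in source.splitlines():
        m = ANNOTATION.match(line)
        if m:
            # No annotation: assume the distribution shares the module name.
            result[m.group("module")] = m.group("dist") or m.group("module")
    return result

print(required_distributions(SOURCE))
```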

This should be re.match(r'^#\s*?\[(.*?)\]', line) to allow either space in-between # and the [ or none, but slow-mode isn’t letting me edit my original post. Sorry for the noise.
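For what it's worth, a quick check that the corrected pattern accepts both spacings:

```python
import re

# The corrected pattern: optional whitespace between '#' and '['.
PATTERN = r'^#\s*?\[(.*?)\]'

for line in ("#[dependencies]", "# [dependencies]", "#    [dependencies]"):
    print(re.match(PATTERN, line).group(1))
```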

I’ve already brought that up above. I’m on mobile so it isn’t easy to find, but you should see it.

That’s also the subject of discussion here: Record the top-level names of a wheel in `METADATA`?

1 Like

It looks like there’s a trade-off between user-friendly syntax and parsing simplicity.
IMO, the syntax should prioritise user-friendliness – most importantly, avoiding syntax that looks like it should work but doesn’t.

One way to go is to limit this to dependencies only. But, IMO, we still need to at least think about how it’ll be combined with future/unofficial additions.


This is something @brettcannon might want to check with user studies, but I’m worried that no matter how well we document the syntax, users will want to combine blocks without blank lines, e.g.:

## Script Dependencies:
##    foo
##    bar
## GUI-Name: foo

Rather than ending a block with an empty line, I suggest using indentation. (The suggestion was part of my first post here, but it was a bit buried.)
Specifically, a line might continue a block if it starts with ## and:

  • it is indented by at least one more space than the header line, or
  • only contains the ## marker

Below is a draft implementation that makes a few more calls:

  • Comments are not part of the general metadata block syntax, but are handled in Script Dependencies, and use space-hash as a separator to allow # in URLs
  • Block names are normalized to avoid common mistakes (lowercased, runs of spaces/tabs are condensed to single space)
  • Tabs are expanded to spaces. (Python2-style indentation. IMO that’s the best we can do for an indentation-based format, especially one without warnings or syntax errors.)
Draft implementation
import tokenize
import re
from packaging.requirements import Requirement

def read_metadata_blocks(filename):
    """Generator for metadata blocks within a file

    Yields 3-tuples (block_type, extra, block_data).
    Whitespace is not stripped from block_data, so internal indentation is
    preserved.
    Tabs are expanded to spaces.
    """
    # Use the tokenize module to handle any encoding declaration.
    with tokenize.open(filename) as f:
        # Prefix of the currently parsed block.
        # `block_type`, `extra`, `block_data` are only valid if `prefix` is
        # set, that is, while parsing a block.
        prefix = None
        for raw_line in f:
            if not raw_line.startswith("##"):
                # End existing block (if any), ignore this line
                if prefix:
                    yield block_type, extra, block_data
                    prefix = None
                continue

            line = raw_line[2:].expandtabs()
            stripped_line = line.rstrip()
            if prefix and (not stripped_line or line.startswith(prefix)):
                # Continue existing block
                block_data.append(line[len(prefix) - 1:])
                continue

            # End existing block (if any), maybe start new one
            if prefix:
                yield block_type, extra, block_data
                prefix = None
            raw_block_type, sep, extra = stripped_line.partition(":")
            if not sep:
                continue
            extra = extra.strip()
            block_data = []
            prefix_len = len(raw_block_type) - len(raw_block_type.lstrip())
            prefix = line[:prefix_len] + ' '
            block_type = re.sub(' +', ' ', raw_block_type.strip().lower())

        # End last block (if any)
        if prefix:
            yield block_type, extra, block_data

def read_dependency_block(filename):
    for block_type, extra, data in read_metadata_blocks(filename):
        if block_type == 'script dependencies':
            for line in data:
                req_text = line.partition(' #')[0].strip()
                if req_text:
                    yield Requirement(req_text)
            break


As for flake8 & Black: If a PEP adds new syntax, linters need to adapt. And users who use the new feature need to update their tools.
Even if the feature is designed to work in older Python versions.


Consider rewriting the reference implementation to avoid the multiple iter trick, to make the algorithm more adaptable to non-Python languages.


FWIW, the reason I’d add this to my PEPs is to remind myself that I don’t want to do any editing for the PyPA PR, and to signal that the rest of the Specification section will be copied to the spec page verbatim.
(And, if there’s any intro or other fluff you want in the spec, I think it’s good to move it into the PEP’s Specification, so we know exactly what we’re getting.)

1 Like

That is a sensible approach. As previously discussed, even if there are script files out there with comments like this, there is no evidence they will be run with a compatible runner (like pipx or pip-run).

On grep.app I cannot find any matches: grep.app | code search
In GitHub search there are a handful of matches, but not all of them match case-sensitively or follow the rule of contiguous lines (the ones that do look like early experiments with PEP 722).

2 Likes

Having taken some time to think about the options, and read the subsequent comments, I have decided to go back to the original proposal of a simple “Script Dependencies” comment block, with just a single # prefix. I’ve added support for inline comments, but dropped the whole “metadata blocks” extensibility idea.

I’m revising the PEP accordingly and will publish the new version in a day or so.

Some specific points:

I’m keeping this here to avoid splitting the discussion, and in fact I don’t think that “slow mode” will be an issue. There’s not really that much to discuss in terms of design - at this point, unless there’s a substantial problem necessitating another redesign[1] there’s not much more than bikeshedding to do. And while I’m open to suggestions for changes, unless a proposal gets a lot of votes, I need to just make a decision and stick with it. Specifically, we could debate forever over whether “Script Dependencies” is the right header text, but it’s unlikely to matter much in the long run. And yes, I went for case insensitive (and whitespace insensitive).

See the PEP (the original version covered this, it’s no different). Basically the block ends at a non-comment line. No indentation dependency.

I went for space-hash-space. I don’t actually think it’s that confusing.
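To illustrate the rule, a hypothetical helper following the space-hash-space convention, which has the nice property that a bare `#` inside a URL fragment (which has no surrounding spaces) is left alone:

```python
def strip_inline_comment(line):
    # " # " (space, hash, space) introduces a comment; a '#' in a URL
    # fragment such as '#sha256=...' has no surrounding spaces and survives.
    return line.split(" # ", 1)[0].strip()

print(strip_inline_comment("requests  # HTTP client"))
print(strip_inline_comment(
    "pkg @ https://example.com/pkg-1.0-py3-none-any.whl#sha256=abc123"
))
```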

The docstring’s been covered in the PEP (under “why not use Python syntax”) from the start, so no, that’s not going to happen, sorry. The docstring is explicitly mentioned (briefly) in that section: “Other suggestions include a static multi-line string, or including the dependencies in the script’s docstring”.

That’s up to Brett, not me, but I’d be surprised if he accepts both. The pushback from the “too many options” crowd would be immense (and probably justified, IMO).

No, I’ve added this as a rejected option to the PEP (thanks to @jeanas for the PR, all I needed to do was copy and paste :slightly_smiling_face:)

I’ve thought about it, we tried an option that allowed it, and IMO it caused more trouble than it was worth. I think a smaller, focused, proposal is better. Others may disagree (and PEP 723 is evidence that some people disagree pretty fundamentally!) but I’d rather not muddy the waters over this.

I’m going to stick with “just don’t do that”. If someone does user studies that clearly indicates that this is a crucial requirement, then we can adjust things, but I don’t have the resources to do those studies, and I’m not sure it’s the best use of whatever resources Brett might have.

A follow-up proposal for a # Gui-Name: foo block can pretty easily add a rule “The gui-name block terminates any preceding dependency block”. And the new block isn’t a valid PEP 508 requirement, so the combination isn’t a valid dependency block either, which means tools that don’t recognise the new block will fail fast rather than silently doing the wrong thing. IMO that’s sufficient.

As regards indentation, sorry I didn’t comment explicitly but I did consider it and I don’t see that it adds anything. Indented requirements look better, but I’m fine with that being a style matter. And the worst that will happen if a following comment gets misinterpreted as a dependency line is likely to be that it gets rejected for not conforming to PEP 508.

I wish I could agree, but again, people writing scripts for whom Python isn’t their core job may not have the luxury of using the latest tools. Why have a feature intended to make their life easier actually make it harder?

I take your point, but I think it’s useful to point out just how easy the algorithm is in idiomatic Python. After all, while I want to ensure that the block can be parsed in other languages, I expect that the majority of tools will be written in Python.

Good point, I’ve added this.


  1. I really hope not! ↩︎

8 Likes

Hi @pf_moore, that sounds like a good plan forward. Let me know if you want help editing the PEP; happy to help. Thanks!

1 Like

Can we have some syntax to specify the expected Python version, and to put explanatory comments in the requirements list? e.g.

# Python-Requires: >=3.10
# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2
3 Likes

I’m unsure whether this needs to be standardised initially. It may be of more value once we have a Python-manager type tool that can install the correct version of Python and all the dependencies, and then run the script; until then I would question its usefulness to tools.

(For readers, the python version could be put into the docstring, I don’t think there’s a requirement to standardise at this point.)

A

1 Like

(first quote below captures key points from @pf_moore’s most recent post rather than quoting the whole thing)

Given the linter issue for ## prefixed lines (and other variants on the same idea), I agree the simple “delimited metadata block without a dedicated line prefix” approach makes sense.

I agree with avoiding tracking the details of indent levels, as I view using indent tracking to detect block termination as adding complexity while introducing minimal clear value:

  • concatenated metadata blocks can still be allowed in the future by making headers terminate the previous metadata block (see next discussion)
  • requiring that lines in the block all be indented by the exact same amount would rule out nesting dependencies under category comments (e.g. indent the comments by 2, the actual dependencies by 4)
  • ignoring comment indenting, or only enforcing a minimum level of indenting would be weird (compared to just ignoring the indentation level entirely and using a different metadata block terminator)

By contrast, terminating the metadata block at the next non-comment line is clear and unambiguous, even for cases like:

# Script dependencies:
#   # Always needed
#     numpy
#   # Retrieving remote data
#     requests
#   # Dumping graphs
#     matplotlib
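A quick sketch of that rule applied to the example above (the header regex and comment handling here are illustrative, not the PEP's exact reference implementation): the block only ends at `print("hello")`, and the category comments are skipped along the way.

```python
import re

SCRIPT = """\
# Script dependencies:
#   # Always needed
#     numpy
#   # Retrieving remote data
#     requests
#   # Dumping graphs
#     matplotlib
print("hello")
"""

def read_dependencies(lines):
    """Yield requirement lines from the first dependency block.

    The block ends at the first non-comment line; comment-only lines
    inside it are skipped rather than terminating it.
    """
    in_block = False
    for line in lines:
        if not in_block:
            if re.match(r"(?i)^#\s*script dependencies:\s*$", line):
                in_block = True
            continue
        if not line.startswith("#"):
            break  # first non-comment line ends the block
        content = line[1:].split(" # ", 1)[0].strip()
        if not content or content.startswith("#"):
            continue  # blank or category-comment line
        yield content

print(list(read_dependencies(SCRIPT.splitlines())))
```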

I mostly agree with Paul on this one - the “each metadata block header terminates any previously opened metadata block” rule can go in a future PEP that adds a second inline metadata header (perhaps the # Python-Requires: block suggested at various points in the discussion), so leaving it out here isn’t hindering forward compatibility significantly.

The one minor benefit I could see to defining that rule up front would be changing the way the following gets processed from an error complaining that Python-Requires: >=3.10 isn’t a valid dependency specifier to simply ignoring the unknown trailing metadata block:

# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2
# Python-Requires: >=3.10

Which would be more consistent with what will happen if the blocks are in the other order (as in @njs’s original example) and the unknown metadata block title gets ignored completely:

# Python-Requires: >=3.10
# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2

(Note: allowing metadata headers to terminate open blocks means that generalised parsing of metadata block headers would have a similar problem with URL syntax as comments did, but the problem is amenable to a similar solution: the block title trailing delimiter becomes ‘: followed by a newline or other whitespace’ rather than accepting any colon appearing anywhere on a line.)

While the PEP could explicitly define the # Python-Requires: block, I don’t think it’s a good idea to do that unless/until some script runners have seen sufficient demand for improved error or warning messages when running a script on a too-old version of Python (vs the status quo where scripts fail based on the actual incompatibility, and may even work without problems for a subset of their functionality if there is only a runtime dependency on the new version rather than a syntactic one)

4 Likes

Once we do live in the world where script runners could actually choose and set up a specific Python version easily (the one promised by PEP 711), I don’t see why the Python version should be treated specially. As far as I can see, it’s a dependency. And “special cases aren’t special enough to break the rules”. There’s no PyPI package just named python, and I’m pretty sure that’s supposed to be a PEP 541 excluded name; so why not just write e.g. python>=3.11 in the same list with everything else?
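For what it's worth, `python>=3.11` already fits the ordinary name-plus-specifier shape; a quick sketch using a simplified version of the PEP 508 name pattern (the regex here is illustrative, not the full grammar):

```python
import re

# Simplified PEP 508 shape: a distribution name followed by an optional
# version specifier. Under this idea, "python" is just another name.
REQ = re.compile(
    r"^(?P<name>[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)"
    r"\s*(?P<spec>[<>=!~].*)?$"
)

m = REQ.match("python>=3.11")
print(m.group("name"), m.group("spec"))
```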

3 Likes

This doesn’t avoid the special casing, it just moves it from part of the format to being the job of tools that consume the format. The standard dependencies and the python version will likely be handled by separate tools or components even with something like PyBi providing standard binaries so at some point they would need to be separated anyway.

This could also lead users to expect to be able to declare their python version requirement like this in other formats where it is not supported.[1]


  1. In something like requirements.txt for instance. ↩︎

1 Like

OK, the revised (and hopefully final!) version of the PEP is now published, and available at PEP 722 – Dependency specification for single-file scripts | peps.python.org.

5 Likes

I don’t know if Petr was objecting to the nested-for-loop part or just reusing the handle (that’s not that weird in other languages, is it?). You could avoid the former with

for line in f:
    if re.match(DEPENDENCY_BLOCK_MARKER, line):
        break

for line in f:
    if not line.startswith("#"):
        break
    line = line.split(" # ", maxsplit=1)[0]
    line = line[1:].strip()
    if not line:
        break
    yield Requirement(line)

Which is just a rearrangement of the current example [1].


  1. obviously not critical to the PEP, just noticed while reading ↩︎

True. The nested loop was (I think) a holdover from when it was possible to have multiple blocks in the file. Having formally said that only the first valid block needs to be parsed does make the code less tricky.

There’s something a little unnerving in the first loop in your version, though. I think it’s because if there’s no dependency block, the second loop still gets executed (although it does nothing because we’re at EOF). Personally, I’d want to add comments to your version whereas I didn’t feel the need with mine. It’s very much just a coding style question, though.

But I don’t think this is what Petr was talking about, because both versions seem to me to be equally translatable (or not) to other languages.

Edit: I decided to go with your version, with a couple of comments, it is cleaner. Thanks. I also fixed a bug with the empty line handling (break rather than continue, terminating the block prematurely).

3 Likes

Running your reference implementation on the example you give yields only:

rich
requests

Is it intended to stop on comment lines/blank lines? Or should that be continue instead of break after the split?

Edit: I think you edited to fix this as I replied.

1 Like

I’ve updated the implementation in viv to use this revised spec.

I did, however, modify the reference implementation in the currently rendered version, since based on my reading of the spec it's supposed to continue rather than break on comment-only lines; I might be misinterpreting the expected behavior, though.

Folks can test locally using python3 <(curl -fsSL viv.dayl.in/viv.py) run --script ./script_w_deps_block.py if they’d like.
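For reference, a minimal sketch of the continue-on-comment behaviour described above (my own reading of the spec, using the space-hash-space comment rule): comment-only lines keep the block open, while a non-comment line ends it.

```python
def parse_block(lines):
    """Collect requirement lines from a dependency block body."""
    deps = []
    for line in lines:
        if not line.startswith("#"):
            break  # a non-comment line ends the block
        text = line[1:].split(" # ", 1)[0].strip()
        if not text:
            continue  # comment-only/blank comment lines keep the block open
        deps.append(text)
    return deps

print(parse_block([
    "#   requests",
    "#   # 1.12.2 had a bug with frobbing the foobars",
    "#   click >=1.10, != 1.12.2",
    "import requests",
]))
```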