PEP 722: Dependency specification for single-file scripts

My requirements.txt files for larger projects are full of comments. Most often I’m commenting things out (they’re for testing, or maybe an alternate implementation), but occasionally notes like “# remove the next two dependencies when we finally drop Oracle support”. So I’d definitely like to see comments allowed.

2 Likes

Hit the nail on the head and I fully agree with everything else you’ve written in this post.

I would also vote that comments would be a useful addition. I know I would like to leave myself comments in a couple cases:

  • Why a dependency was added/needed
  • A reminder of why I pinned a specific version or put an upper/lower bound on a version
  • TODO comments about dropping or swapping to different dependencies
1 Like

I do think allowing comments would be an improvement.

Doesn’t that have to be specified either way? In fact it seems it already is. The PEP says:

The block type is all of the text on the initial line, up to the colon.

To me that means that any whitespace before the colon would be part of the block type. I can see this coming up in practice, since some people like to put whitespace in unexpected places (e.g., if foo == 2 :) and then some tool processing the script deps may fail because it’s looking for a block type of “Script Dependencies” and not "Script Dependencies ". So I think we can either ignore, or parse, or disallow (i.e., error) on whitespace there, but leaving it up to individual tools seems a bit dubious to me.

Again I think that has to be specified, but the good news is that it also already is. :slight_smile: That is, the PEP says the block type has to be Script Dependencies, and that has one space, so it has to have just one space. Again, if that’s not desired I think it’d be better to make it explicit. An alternative is to use a hyphen or underscore, where there’s less scope for ambiguity.

I’m not so sure about that, for instance with the whitespace-before-colon case mentioned above. But in any case “parsing simplicity” can play out in different ways. If this is going to be parsed with a regex, for instance, there’s not much additional complexity either way if the regex has \s* in a certain place or doesn’t.

Also, personally I’d rather think of it in terms of parsing simplicity for the user not the author of the parsing tool. In other words, what is to be avoided is people getting tangled in odd errors because they added a space here or there. One way to achieve that is to explicitly allow looseness in various places (e.g., whitespace before the colon), so that tools must be generous in what they accept. Another way is to explicitly disallow it, so that tools can raise a specific, informative error right away, which usually will still get the user past the problem quickly. But just leaving it up to the tool author to decide what slight variations to accept doesn’t seem best to me.
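For what it’s worth, the cost difference between the two behaviours really is tiny in regex terms. A sketch, using the draft’s `##` comment prefix — these particular patterns are hypothetical illustrations, not anything from the PEP:

```python
import re

# STRICT treats any whitespace before the colon as part of the block
# type (so "Script Dependencies :" is not recognised); LENIENT adds a
# single \s* before the colon to tolerate it.
STRICT = re.compile(r"^##\s*Script Dependencies:\s*$")
LENIENT = re.compile(r"^##\s*Script Dependencies\s*:\s*$")

for header in ("## Script Dependencies:", "## Script Dependencies :"):
    print(bool(STRICT.match(header)), bool(LENIENT.match(header)))
```

The only difference is one `\s*`, which is why I think the choice should be made in the spec rather than left to each tool.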

I agree that this is a good reason for requiring dependencies to be in a single block.

Agreed, another +1 for comments from me, for things like “Pinned to version x until https://example.com/project/pulls/123 is released”.

+1 again. I’d probably prefer writing in sentence case with a lowercase “d”, and there are people who prefer everything in lowercase. Let’s avoid the hassle of case sensitivity.

4 Likes

I am also +1 for allowing comments. Knowing why a dependency should be version N and not N+1 or N-1 is useful. Comments often clarify intent, and this applies to dependency files as well as “regular” code.

All of the examples posted so far are ones that I’ve also seen. Another common one is noting a transitive / diamond dependency conflict, where it’s not obvious why two dependencies wouldn’t be compatible.

That’s also a problem for the dependencies case, as name @ url is a valid PEP 508 requirement (see the second example here).

There’s clearly a lot of support for having comments, but it seems they aren’t as simple as they seem at first glance (specifically, @ncoghlan’s proposal as it stands is flawed because it doesn’t support the URL case).

And worse than that, there’s a different error (E265) that reports any comment that doesn’t start with precisely a hash and a space. As far as I can tell, this is motivated by a ridiculously strict reading of PEP 8. Worse still, it looks like ruff has this rule, too - thankfully off by default, but there’s no guarantee that will remain the case.

And while black doesn’t reformat ## comments (or #!), it does add a space in #%, #=, #[ and #] (and I suspect, anything other than space, # or ! after the first hash). So that blocks basically all of the “obvious” alternatives to ##.

There’s a lot to consider here, and it’s not obvious what the best solution is. One possibility is to revert to essentially the original proposal, which follows pipx and pip-run - a single hash, no comments, no internal blank lines:

# Script Dependencies:
#   requests
#   click

It’s limited, certainly. But it’s free of the flaws identified above, it’s proven in real world use, and it’s simple - which still remains a key goal for me.
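As a rough illustration of how little machinery that format needs — this is a sketch of the idea, not pipx’s or pip-run’s actual code, and it assumes the block ends at the first line that isn’t a comment (which includes blank lines):

```python
import re

def read_dependency_block(lines):
    """Yield requirement strings from a pipx/pip-run style block.

    Sketch only: the block starts at a comment line reading
    "# Script Dependencies:" and ends at the first non-comment line.
    """
    lines = iter(lines)
    # Skip ahead to the header line.
    for line in lines:
        if re.match(r"#\s*Script Dependencies:\s*$", line.strip()):
            break
    # Collect comment lines until the block ends.
    for line in lines:
        line = line.strip()
        if not line.startswith("#"):
            break  # first non-comment line (including blank) ends the block
        req = line[1:].strip()
        if req:
            yield req

script = """\
# Script Dependencies:
#   requests
#   click

import requests
"""
print(list(read_dependency_block(script.splitlines())))
```

About a dozen lines of logic, and no delimiter that linters or formatters will fight over.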

There are other options, but I’m not particularly happy with any of them:

  • Picking delimiters until we get one that seems to work seems both risky (because of that “seems to”) and arbitrary.
  • Treating the flake8 issue as their problem, while it’s what I’d like to do, feels like it’ll just cause confusion and frustration for users.
  • There’s even the possibility of using TOML embedded in a specially formatted comment. But that still has all the problems I raise in the PEP about using TOML, and no-one has come up with a satisfying counter-argument to those so far (I’m discounting the people supporting “embedded pyproject.toml”, as they have mostly been simply saying they disagree, and not actually trying to persuade me to change PEP 722).

I need to think some more about this. I’d appreciate any feedback that people can provide, as long as it’s focused on just finding an alternative “embedded comment” syntax. People who prefer an embedded pyproject.toml have PEP 723, and people who want to debate PEP processes have a separate thread to pursue that discussion in. So it would make my life a lot easier if we could focus feedback on just this issue for now, until it’s resolved in a way that people interested in supporting PEP 722 are happy with.

When I’ve formulated my own thoughts, I’ll post them in a new thread (I think the “slow mode” rule of 8 hours between posts will make it too hard to discuss options, and hopefully by starting a new thread, we can keep things on topic without needing moderator intervention).

2 Likes

(Tone: slightly exasperated at these linting tools, none of which I use myself): would # # work?

Is this dependent on the indentation? Or how do the existing tools decide where the dependency list ends?

(I think TOML for anything other than “embedded pyproject.toml” - or whatever restricted variation thereupon - is a non-starter, the worst of all worlds.)

About comments vs. URLs: Maybe use " #" instead of "#" for comments? Downside: Potentially confusing.

About ## vs. linters: Ignoring the linter question seems acceptable to me. They will adapt. Rules can be disabled.

Using a single # seems preferable to me, I don’t really see the point of a special marker in the first place.
Type checkers like mypy already support directives in comments like # type: ignore, without a special magic super-comment marker and I’m not aware of any problems. (But maybe there are?)

I’d maybe even weakly prefer to put the dependencies in the docstring, avoiding even the single #. This allows copying the output of pip freeze directly without needing to edit each line. This still doesn’t require parsing Python syntax, e.g. just find the special line Script dependencies: and the indented lines that follow.

The less confusing syntax to remember the better.
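To illustrate, a minimal sketch of the docstring idea — the header text and the “indented lines that follow” rule here are my own assumptions, nothing specified anywhere:

```python
def deps_from_docstring_block(text):
    """Sketch: find a line reading "Script dependencies:" and collect
    the indented, non-blank lines that follow it. Purely line-based;
    no Python parsing involved."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.strip().lower() == "script dependencies:":
            break
    deps = []
    for line in lines:
        # An indented, non-blank line continues the block.
        if line.startswith((" ", "\t")) and line.strip():
            deps.append(line.strip())
        else:
            break
    return deps

script = '''"""Demo script.

Script dependencies:
    requests==2.31.0
    click
"""
'''
print(deps_from_docstring_block(script))
```

Since pip freeze emits plain `name==version` lines, they could be pasted in unmodified.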

If the discussion stretches into April next year, we can propose:

# 📦 numpy
3 Likes

Methinks that, in the context of a comment-based approach, the idea above of

# -*- dependencies -*-
#   requests
#   rich

would be the least arbitrary, since it borrows the existing syntax of encoding comments :slight_smile:

(I hope you don’t mind me bringing this up even though you said in the moment that you didn’t like it, because it hasn’t been discussed much while it seems like a logical alternative given its precedent, and because it resonates a lot with “arbitrary”.)

3 Likes

I prefer the single comment over double-comments anyway. Keep this simple and don’t fight the linters/formatters.

Maybe use [ ] for the meta-block heading, like an embedded ini file.

A blank line designates the meta-block ending.

import re

def parse_meta_block(text, section):
    meta_block = []
    lines = text.split('\n')
    in_block = False

    for line in lines:
        line = line.strip()

        if not line:
            if in_block:
                return meta_block
            in_block = False
            continue
        
        match = re.match(r'^#?\s*\[(.*?)\]', line)
        if match:
            if match.group(1).lower() == section.lower():
                in_block = True
            else:
                in_block = False
            continue
        
        if in_block:
            parts = line.split('#')
            value = parts[1].strip()
            if value:
                meta_block.append(value)

    return meta_block

text = '''
# [Script Dependencies]
# requests=1.0 # comment
# #comment
# rich>=0.1
# package[sub]

# other comment. Not part of “Script Dependencies” meta-block.
'''

section = "Script Dependencies"
result = parse_meta_block(text, section)
print(result) ## ['requests=1.0', 'rich>=0.1', 'package[sub]']

I think this is the simplest option. I am, however, partial to starting the block with just a case-insensitive Dependencies: rather than Script Dependencies:.

One option for the spec would be to only support block comments, to prevent # from being an issue in URLs. But I could see this creating confusion for users if they are used to leaving inline comments in a requirements.txt file.

I also prefer the single hash # of the initial proposal, especially if the whole concept of arbitrary script metadata is dropped. I still haven’t heard any disagreement to accepting both PEP 722 and 723: if both were accepted, then the TOML format can handle arbitrary metadata, comments, etc, while this proposal is a very simple and inflexible way to specify dependencies.

3 Likes

I just noticed that’s also the way comments already work in requirements.txt files in pip: “Whitespace followed by a #”

1 Like

Today I learnt that encoding comments don’t necessarily have these -*- markers according to 2. Lexical analysis — Python 3.11.4 documentation. They’re just usually put in by convention. That doesn’t totally shoot down the idea of using -*- markers, but does reduce the consistency it would bring since being as permissive as encoding comments (where # coding: utf8 is valid) would yield too many false positives. Especially since encoding comments are necessarily at the top of the file (first or second line) and dependency blocks aren’t, in this design.

Is a mapping between module paths and packages available from PyPI? What I’m thinking is that the information of what is required is (almost) all already available from a script. For example, if

import requests

fails, a command line switch might instruct CPython to look for a package that provides the requests module, instead of directly raising an ImportError. There could be cases where the same module is provided by multiple packages, in which case a special exception could be raised asking the user to annotate the import (only needed once per root module), e.g.

import yaml  # requires: ruamel.yaml

or anything along these lines that is easy to parse.
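A rough sketch of how such annotations could be picked up — the `# requires:` marker and everything else here is hypothetical, just one easy-to-parse shape the idea could take:

```python
import re

# Hypothetical: map a failed top-level import to a distribution name,
# using an inline "# requires: <dist>" annotation on the import line.
ANNOTATION = re.compile(
    r"^\s*import\s+(?P<module>\w+)\s*#\s*requires:\s*(?P<dist>\S+)",
    re.MULTILINE,
)

def annotated_requirements(source):
    """Return {module: distribution} for annotated imports in a script."""
    return {m["module"]: m["dist"] for m in ANNOTATION.finditer(source)}

src = "import yaml  # requires: ruamel.yaml\nimport requests\n"
print(annotated_requirements(src))
```

Unannotated imports (like `requests` above) would fall back to whatever module-to-package mapping is available, which is the open question in the linked thread.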

This should be re.match(r'^#\s*?\[(.*?)\]', line) to allow either space in-between # and the [ or none, but slow-mode isn’t letting me edit my original post. Sorry for the noise.

I’ve already brought that up above. I’m on mobile so it isn’t easy to find, but you should see it.

That’s also the subject of discussion here: Record the top-level names of a wheel in `METADATA`?

1 Like

It looks like there’s a trade-off between user-friendly syntax and parsing simplicity.
IMO, the syntax should prioritize user-friendliness – most importantly, avoiding syntax that looks like it should work but doesn’t.

One way to go is to limit this to dependencies only. But, IMO, we still need to at least think about how it’ll be combined with future/unofficial additions.


This is something @brettcannon might want to check with user studies, but I’m worried that no matter how well we document the syntax, users will want to combine blocks without blank lines, e.g.:

## Script Dependencies:
##    foo
##    bar
## GUI-Name: foo

Rather than ending a block with an empty line, I suggest using indentation. (The suggestion was part of my first post here, but it was a bit buried.)
Specifically, a line might continue a block if it starts with ## and:

  • it is indented by at least one more space than the header line, or
  • only contains the ## marker

Below is a draft implementation that makes a few more calls:

  • Comments are not part of the general metadata block syntax, but are handled in Script Dependencies, and use space-hash as a separator to allow # in URLs
  • Block names are normalized to avoid common mistakes (lowercased, runs of spaces/tabs are condensed to single space)
  • Tabs are expanded to spaces. (Python2-style indentation. IMO that’s the best we can do for an indentation-based format, especially one without warnings or syntax errors.)
Draft implementation
import tokenize
import re
from packaging.requirements import Requirement

def read_metadata_blocks(filename):
    """Generator for metadata blocks within a file

    Yields 3-tuples (block_type, extra, block_data).
    Whitespace is not stripped from block_data, so internal indentation is
    preserved.
    Tabs are expanded to spaces.
    """
    # Use the tokenize module to handle any encoding declaration.
    with tokenize.open(filename) as f:
        # Prefix of the currently parsed block.
        # `block_type`, `extra`, `block_data` are only valid if `prefix` is set,
        # that is, while parsing a block.
        prefix = None
        for raw_line in f:
            if not raw_line.startswith("##"):
                # End existing block (if any), ignore this line
                if prefix:
                    yield block_type, extra, block_data
                    prefix = None
                continue

            line = raw_line[2:].expandtabs()
            stripped_line = line.rstrip()
            if prefix and (not stripped_line or line.startswith(prefix)):
                # Continue existing block
                block_data.append(line[len(prefix) - 1:])
                continue

            # End existing block (if any), maybe start new one
            if prefix:
                yield block_type, extra, block_data
                prefix = None
            raw_block_type, sep, extra = stripped_line.partition(":")
            if not sep:
                continue
            extra = extra.strip()
            block_data = []
            prefix_len = len(raw_block_type) - len(raw_block_type.lstrip())
            prefix = line[:prefix_len] + ' '
            block_type = re.sub(' +', ' ', raw_block_type.strip().lower())

        # End last block (if any)
        if prefix:
            yield block_type, extra, block_data

def read_dependency_block(filename):
    for block_type, extra, data in read_metadata_blocks(filename):
        if block_type == 'script dependencies':
            for line in data:
                req_text = line.partition(' #')[0].strip()
                if req_text:
                    yield Requirement(req_text)
            break


As for flake8 & Black: If a PEP adds new syntax, linters need to adapt. And users who use the new feature need to update their tools.
Even if the feature is designed to work in older Python versions.


Consider rewriting the reference implementation to avoid the multiple iter trick, to make the algorithm more adaptable to non-Python languages.


FWIW, the reason I’d add this to my PEPs is to remind myself that I don’t want to do any editing for the PyPA PR, and to signal that the rest of the Specification section will be copied to the spec page verbatim.
(And, if there’s any intro or other fluff you want in the spec, I think it’s good to move it into the PEP’s Specification, so we know exactly what we’re getting.)

1 Like

That is a sensible approach. As previously discussed, even if there are script files out there with comments like this, there is no evidence that they will be run with a compatible runner (like pipx or pip-run).

On grep.app I cannot find any matches.
In GitHub search there are a handful of matches, but not all of them match case-sensitively or follow the rule of contiguous lines (the ones that do look like early experiments with PEP 722).

2 Likes

Having taken some time to think about the options, and read the subsequent comments, I have decided to go back to the original proposal of a simple “Script Dependencies” comment block, with just a single # prefix. I’ve added support for inline comments, but dropped the whole “metadata blocks” extensibility idea.

I’m revising the PEP accordingly and will publish the new version in a day or so.

Some specific points:

I’m keeping this here to avoid splitting the discussion, and in fact I don’t think that “slow mode” will be an issue. There’s not really that much to discuss in terms of design - at this point, unless there’s a substantial problem necessitating another redesign[1] there’s not much more than bikeshedding to do. And while I’m open to suggestions for changes, unless a proposal gets a lot of votes, I need to just make a decision and stick with it. Specifically, we could debate forever over whether “Script Dependencies” is the right header text, but it’s unlikely to matter much in the long run. And yes, I went for case insensitive (and whitespace insensitive).

See the PEP (the original version covered this, it’s no different). Basically the block ends at a non-comment line. No indentation dependency.

I went for space-hash-space. I don’t actually think it’s that confusing.
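For concreteness, here’s a sketch of the revised rules as described in this post — single # prefix, case- and whitespace-insensitive header, block ending at the first non-comment line, inline comments introduced by space-hash-space. This is my own reading of those rules, not the PEP’s reference implementation:

```python
import re

def read_dependency_block(lines):
    """Sketch of the revised PEP 722 rules described above.

    Splitting inline comments on " # " means a bare "#" inside a
    PEP 508 URL requirement (e.g. "#sha1=...") is left intact.
    """
    lines = iter(lines)
    # Header match is case- and whitespace-insensitive.
    for line in lines:
        if re.match(r"#\s*script\s+dependencies\s*:\s*$", line.strip(), re.I):
            break
    for line in lines:
        if not line.strip().startswith("#"):
            break  # first non-comment line ends the block
        content = line.strip()[1:]
        req = content.split(" # ")[0].strip()  # strip space-hash-space comment
        if req:
            yield req

script = """\
# Script dependencies:
#   requests  # needed for HTTP
#   pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee
print("hi")
"""
print(list(read_dependency_block(script.splitlines())))
```

The URL line comes through whole because its # has no space before it, while the comment after requests is dropped.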

The docstring’s been covered in the PEP (under “why not use Python syntax”) from the start, so no, that’s not going to happen, sorry. The docstring is explicitly mentioned (briefly) in that section: “Other suggestions include a static multi-line string, or including the dependencies in the script’s docstring”.

That’s up to Brett, not me, but I’d be surprised if he accepts both. The pushback from the “too many options” crowd would be immense (and probably justified, IMO).

No, I’ve added this as a rejected option to the PEP (thanks to @jeanas for the PR, all I needed to do was copy and paste :slightly_smiling_face:)

I’ve thought about it, we tried an option that allowed it, and IMO it caused more trouble than it was worth. I think a smaller, focused, proposal is better. Others may disagree (and PEP 723 is evidence that some people disagree pretty fundamentally!) but I’d rather not muddy the waters over this.

I’m going to stick with “just don’t do that”. If someone does user studies that clearly indicates that this is a crucial requirement, then we can adjust things, but I don’t have the resources to do those studies, and I’m not sure it’s the best use of whatever resources Brett might have.

A follow-up proposal for a # Gui-Name: foo block can pretty easily add a rule “The gui-name block terminates any preceding dependency block”. And the new block isn’t a valid PEP 508 requirement, so the combination isn’t a valid dependency block either, which means tools that don’t recognise the new block will fail fast rather than silently doing the wrong thing. IMO that’s sufficient.

As regards indentation, sorry I didn’t comment explicitly but I did consider it and I don’t see that it adds anything. Indented requirements look better, but I’m fine with that being a style matter. And the worst that will happen if a following comment gets misinterpreted as a dependency line is likely to be that it gets rejected for not conforming to PEP 508.

I wish I could agree, but again, people writing scripts for whom Python isn’t their core job may not have the luxury of using the latest tools. Why have a feature intended to make their life easier actually make it harder?

I take your point, but I think it’s useful to point out just how easy the algorithm is in idiomatic Python. After all, while I want to ensure that the block can be parsed in other languages, I expect that the majority of tools will be written in Python.

Good point, I’ve added this.


  1. I really hope not! ↩︎

8 Likes