PEP 722: Dependency specification for single-file scripts

Can we have some syntax to specify the expected Python version, and to put explanatory comments in the requirements list? e.g.

# Python-Requires: >=3.10
# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2

I’m unsure whether this needs to be standardised initially. Perhaps once we have a Python-manager type tool that can install the correct version of Python, install all the dependencies, and then run the script, it would be of more value; until then I’d question its usefulness to tools.

(For readers: the Python version could be put into the docstring; I don’t think there’s a requirement to standardise at this point.)



(first quote below captures key points from @pf_moore’s most recent post rather than quoting the whole thing)

Given the linter issue for ## prefixed lines (and other variants on the same idea), I agree the simple “delimited metadata block without a dedicated line prefix” approach makes sense.

I agree with avoiding tracking the details of indent levels, as I view using indent tracking to detect block termination as adding complexity for minimal clear value:

  • concatenated metadata blocks can still be allowed in the future by making headers terminate the previous metadata block (see next discussion)
  • requiring that lines in the block all be indented by the exact same amount would rule out nesting dependencies under category comments (e.g. indent the comments by 2, the actual dependencies by 4)
  • ignoring comment indenting, or only enforcing a minimum level of indenting would be weird (compared to just ignoring the indentation level entirely and using a different metadata block terminator)

By contrast, terminating the metadata block at the next non-comment line is clear and unambiguous, even for cases like:

# Script dependencies:
#   # Always needed
#     numpy
#   # Retrieving remote data
#     requests
#   # Dumping graphs
#     matplotlib

I mostly agree with Paul on this one: the “each metadata block header terminates any previously opened metadata block” rule can go in a future PEP that adds a second inline metadata header (perhaps the # Python-Requires: block suggested at various points in the discussion), so leaving it out here isn’t hindering forward compatibility significantly.

The one minor benefit I could see to defining that rule up front is how the following would be processed: rather than an error complaining that Python-Requires: >=3.10 isn’t a valid dependency specifier, the unknown trailing metadata block would simply be ignored:

# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2
# Python-Requires: >=3.10

That would be more consistent with what will happen if the blocks are in the other order (as in @njs’s original example) and the unknown metadata block title gets ignored completely:

# Python-Requires: >=3.10
# Script Dependencies:
#   requests
#   # 1.12.2 had a bug with frobbing the foobars
#   click >=1.10, != 1.12.2

(Note: allowing metadata headers to terminate open blocks means that generalised parsing of metadata block headers would have a similar problem with URL syntax as comments did, but the problem is amenable to a similar solution: the block title trailing delimiter becomes ‘: followed by a newline or other whitespace’ rather than accepting any colon appearing anywhere on a line.)
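(For illustration, a hypothetical sketch of what such a header matcher could look like; the regex and names here are mine, not from the PEP:)

import re

# A block title only counts if its colon is followed by whitespace or
# the end of the line, so colons inside URLs don't open a block.
METADATA_HEADER = re.compile(r"(?i)^#\s+([a-z][a-z0-9 _-]*?):(\s|$)")

assert METADATA_HEADER.match("# Script Dependencies:")
assert METADATA_HEADER.match("# Python-Requires: >=3.10")
assert not METADATA_HEADER.match("# See https://example.com/docs for details")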

While the PEP could explicitly define the # Python-Requires: block, I don’t think it’s a good idea to do that unless/until some script runners have seen sufficient demand for improved error or warning messages when running a script on a too-old version of Python (vs the status quo, where scripts fail based on the actual incompatibility, and may even work without problems for a subset of their functionality if there is only a runtime dependency on the new version rather than a syntactic one).


Once we do live in the world where script runners could actually choose and set up a specific Python version easily (the one promised by PEP 711), I don’t see why the Python version should be treated specially. As far as I can see, it’s a dependency. And “special cases aren’t special enough to break the rules”. There’s no PyPI package just named python, and I’m pretty sure that’s supposed to be a PEP 541 excluded name; so why not just write e.g. python>=3.11 in the same list with everything else?


This doesn’t avoid the special casing; it just moves it from being part of the format to being the job of tools that consume the format. The standard dependencies and the Python version will likely be handled by separate tools or components even with something like PyBI providing standard binaries, so at some point they would need to be separated anyway.

This could also lead users to expect to be able to declare their python version requirement like this in other formats where it is not supported.[1]


  1. In something like requirements.txt for instance. ↩︎


OK, the revised (and hopefully final!) version of the PEP is now published, and available at PEP 722 – Dependency specification for single-file scripts | peps.python.org.


I don’t know if Petr was objecting to the nested-for-loop part or just reusing the handle (that’s not that weird in other languages, is it?). You could avoid the former with

for line in f:
    if re.match(DEPENDENCY_BLOCK_MARKER, line):
        break

for line in f:
    if not line.startswith("#"):
        break
    line = line.split(" # ", maxsplit=1)[0]
    line = line[1:].strip()
    if not line:
        break
    yield Requirement(line)

It’s just a rearrangement of the current example.[1]


  1. obviously not critical to the PEP, just noticed while reading ↩︎

True. The nested loop was (I think) a holdover from when it was possible to have multiple blocks in the file. Having formally said that only the first valid block needs to be parsed does make the code less tricky.

There’s something a little unnerving in the first loop in your version, though. I think it’s because if there’s no dependency block, the second loop still gets executed (although it does nothing because we’re at EOF). Personally, I’d want to add comments to your version whereas I didn’t feel the need with mine. It’s very much just a coding style question, though.

But I don’t think this is what Petr was talking about, because both versions seem to me to be equally translatable (or not) to other languages.

Edit: I decided to go with your version, with a couple of comments; it is cleaner. Thanks. I also fixed a bug in the empty line handling (break rather than continue, which terminated the block prematurely).


Running your reference implementation on the example you give only produces:

rich
requests

Is it intended to stop on comment lines/blank lines? Or should that be continue instead of break after the split?
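For concreteness, the continue version I have in mind is something like the following (just a sketch, assuming the DEPENDENCY_BLOCK_MARKER regex and packaging’s Requirement class from the PEP’s reference implementation):

import re
from packaging.requirements import Requirement

DEPENDENCY_BLOCK_MARKER = r"(?i)^#\s+script\s+dependencies:\s*$"

def read_dependency_block(f):
    # Skip ahead to the dependency block marker line.
    for line in f:
        if re.match(DEPENDENCY_BLOCK_MARKER, line):
            break
    # The block ends at the first non-comment line.
    for line in f:
        if not line.startswith("#"):
            break
        # Strip the leading "#" and any inline " # ..." comment.
        line = line[1:].split(" # ", maxsplit=1)[0].strip()
        if not line:
            continue  # a comment-only line doesn't end the block
        yield Requirement(line)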

Edit: I think you edited to fix this as I replied.


I’ve updated the implementation in viv to use this revised spec.

I did, however, modify the reference implementation from the currently rendered version, since based on my reading of the spec I think it’s supposed to continue rather than break on comment-only lines; but I might be misinterpreting the expected behavior.

Folks can test locally using python3 <(curl -fsSL viv.dayl.in/viv.py) run --script ./script_w_deps_block.py if they’d like.

Nope. See Record the top-level names of a wheel in `METADATA`? - #52 by thejcannon for a discussion about recording at least the top-level names.

I too would be surprised. :wink: But rejection of both PEPs is also possible, so who knows. As of right now I’m trying not to bias myself until we can try some user studies and see what the reactions are (I already have an opinion simply based on personal experience, but I want to avoid as much bias as possible in making the final decision by not unconsciously discounting any feedback we get from the target audience of the PEPs).

Honestly, the August 14th deadline was more to make sure Ofek was serious enough to write a PEP and to try to get overall PEP discussions done without them dragging on for a month. I definitely do not consider that deadline a hard one but more of an aspirational one. It seems this topic, though, is nicely staying on-topic and reaching convergence, so I’m not concerned about PEP 722 (I haven’t read the PEP 723 thread yet, though :sweat_smile:).


OK, the discussion here seems to have died down, and we’ve reached the 14th, so I’m going to say that PEP 722 is ready for approval.

@brettcannon I’m happy if you want to wait to give PEP 723 some additional time, or if you want to delay in case there’s still a possibility of @ofek and me coming up with some sort of merged proposal. There’s no rush here, I simply wanted to formally confirm that PEP 722 is ready when you are.


I’ve been exploring making a basic tool to launch scripts based on this specification, plus a non-standard x-requires-python block that gets used with ‘pyenv’ or ‘py’ to find the appropriate python executable. (I probably won’t make it build the appropriate Python with pyenv if it’s missing, but I may make it output the command you would need).

With respect to a proposed TOML-based format from a merged proposal, I’d note that despite implementing this in Python, I’ve tried to make the time from start to running a script (when a cached venv can be used) as fast as possible,[1] and merely importing a TOML parser library doubles that time before any parsing is done.[2] This probably doesn’t matter if you’ve decided to implement such a thing in Rust, and you may consider the overall time still small enough not to care, but I did want to point it out.


  1. This is somewhat limited by the launch time of Python itself, but it’s easy to make it much slower by importing certain modules. ↩︎

  2. I tested rtoml, pytomlpp, tomllib and tomlkit on my hardware; 2× was the best case. ↩︎


This interests me greatly because responsiveness of the Hatch CLI is something I try to optimize. Do you have stats on the import times of each library that you tested?

The case that needs to be as fast as possible is the frequent one where you iterate on your script and/or reuse it many times without changing the metadata block. You can detect that case by using the dependency block string as a cache key, and skip importing a TOML library entirely if the metadata is the same string as on the previous run.
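A minimal sketch of the idea (the cache location and function names here are hypothetical, and I’m using the PEP 722 block for concreteness; the same approach works for any block format):

import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "script-runner"  # hypothetical location

def block_text(script: Path) -> str:
    # Collect the dependency block as one uninterpreted string:
    # the marker line plus the comment lines that follow it.
    out = []
    lines = iter(script.read_text(encoding="utf-8").splitlines())
    for line in lines:
        if line.lower().startswith("# script dependencies:"):
            out.append(line)
            break
    for line in lines:
        if not line.startswith("#"):
            break
        out.append(line)
    return "\n".join(out)

def cached_env(script: Path) -> Path | None:
    # The raw block text is the cache key: if it hasn't changed since
    # the previous run, reuse that run's environment and never import
    # a TOML (or any other) parser at all.
    key = hashlib.sha256(block_text(script).encode("utf-8")).hexdigest()
    env = CACHE_DIR / key
    return env if env.is_dir() else None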

(Also, you’re comparing code that you have purposefully optimized for startup time with code that might not have received startup time optimization.)


Whether caching the exact text is enough depends on what the TOML block ends up looking like. (I’d like to share the env between scripts with the same dependencies, so I need it not to have any extra unnecessary details.) Perhaps the currently proposed [run] block will be fine, but the proposal seems to have changed every time I look at that thread.

The current code parses the block and compares the parsed details to a cache and can do that before any of the toml libraries I’ve found have finished importing. Skipping the parsing step in the initial comparison is a possibility but it’s not necessary with the PEP 722 format. (I’m not going to write a TOML parser just to optimise it for import time for this one use case but I don’t think that’s what you were suggesting).

It’s hard to say what the impact would be in the context of hatch. For instance, tomllib looks to be somewhere in the region of 2%[1] of your start time, based on python -X importtime -c "import hatch.cli". However, unlike my tool, hatch is already importing some of the dependencies tomllib requires. So, for instance, import tomllib might be a 2ms import for hatch, but a 22ms import for a new project. I don’t think you’d see any noticeable difference with any of the other libraries (except tomlkit, which would probably take longer).


I’m not claiming import time is the most important thing in general, just that I’d like to keep it down, where possible, for something like this that is intended to launch small scripts.


  1. This is just on my development machine, which is not a super stable benchmarking tool. ↩︎


Hm, in my mind the most important case is when the script is not changing at all. That’s the “simple distribution (via email or something)” use case that is a major motivating factor here. I would think that the developer of the script would already have an environment with the dependencies, in many cases.

I guess this depends on workflow, and speed is always nice. I’m just pointing this out because if I were hell-bent on optimizing time-to-launch, I’d consider checking the file’s mtime before parsing anything.
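Something along these lines, say (a sketch; where the cached mtime is stored and how it was recorded is up to the tool):

import os

def can_skip_parsing(script_path: str, cached_mtime: float) -> bool:
    # If the script hasn't been modified since its environment was
    # last set up, skip reading and parsing the metadata entirely.
    return os.stat(script_path).st_mtime == cached_mtime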


Thanks for this; this sort of practical experience is extremely useful for ensuring the final standard is as good as we can make it.

Personally, I do consider startup time important. I’ve been looking at how to design reusable environments so that we don’t need to install anything that’s already available,[1] and I’d really like it if I didn’t lose any time I gain from that to importing a TOML parser…


  1. yes, I know I’m reinventing nix :slightly_smiling_face: ↩︎


@brettcannon I think I need to add two more things based on comments that came up today.

  1. The time taken to parse TOML, to the “Rejected options” entry on using TOML.
  2. “Just have a runtime function to install dependencies” to “Rejected options”.

Neither of these are critical to the approval process, but I’d like to add them for completeness. I’ll try to do them tomorrow.


Out of curiosity, how are you timing this? Maybe I’m doing it wrong but python -m timeit "import tomli" isn’t nearly that slow (on my machine).

Edit: ah yes, per below this is definitely caching the import. timeit is not for this, I guess!
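For what it’s worth, one way to measure it that avoids the caching problem is to start a fresh interpreter per run (a sketch; python -X importtime -c "import tomllib" also gives a useful per-module breakdown):

import subprocess
import sys
import time

def cold_run_time(code: str, runs: int = 10) -> float:
    # Each run starts a new interpreter, so nothing stays cached in
    # sys.modules between measurements; take the best of several runs.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best

# Subtract bare interpreter startup as a baseline:
print(cold_run_time("import tomllib") - cold_run_time("pass"))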