PEP 722: Dependency specification for single-file scripts

ofek · August 4, 2023, 11:00pm

I don’t have much time to articulate a proper response but I think this is a very astute observation that I didn’t see myself until you said it.

I’m not at all saying that those who disagree with me are not doing this, but for me personally I am looking at absolutely everything through that decade or longer lens which is why I am putting a high value on cross tool interoperability/unification and UX. If as a user some task is arduous when there could be a simple workflow or something is not beautiful when it most certainly could be (like IDE syntax highlighting of this new section) I basically de-rank that heavily in my mind.

Perhaps that’s not a good mindset? I don’t know, but that is almost all I care about and I would have to be persuaded to change it.

davidism · August 5, 2023, 12:03am

A post was split to a new topic: My thoughts on the PEP process

pradyunsg · August 4, 2023, 11:27pm

pyflow (GitHub: David-OConnor/pyflow, “An installation and dependency system for Python”) did that quite a while back (at least 2 years ago).

davidism · August 5, 2023, 12:00am

Hey all, I was hoping that slow mode would cause you to reflect on what is on topic here more instead of falling into off topic discussion, but despite that you’re still getting off topic. This is not the place to discuss the PEP process, or what other installers exist or do, or respond to posts that are not about (or directly ignore) what’s stated by the PEP. Provide feedback about this PEP here, not about other things that could be PEPs or meta discussions instead.

davidism · August 5, 2023, 1:39am

A post was split to a new topic: What is the py launcher for Linux?

jamesdow21 · August 5, 2023, 1:02am

Magics are defined by the IPython kernel (which is usually, but not always, the kernel being used in a Jupyter notebook). They can be either single-line magic commands with %, or cell block magics with %%.

Spyder recognizes “code cells” with either #%% or # %% to let you run small chunks of a script in an Interactive mode.
VS Code also recognizes # %% for the same purpose (thanks Brett and team!)

As far as I know though, #% on its own has no meaning anywhere in the Python ecosystem.

My suggestion (and the way I prefer my bikesheds painted) was indeed just to try to be similar to these pre-existing “magic” uses.
It might be best to avoid the similarity though, to avoid any confusion between them.

There are a lot of tools that do more or less the same thing: managing a project that will get built and distributed as an installable package.

That is not this use case. At all. This use case (which is an incredibly common one) is to have a single executable Python script with its own dependencies. All of the complaints about “too many tools” are not complaining about a lot of tools that do a lot of things. The frustration I see more often than not is that there are a lot of tools that all do one thing, but none of them do this very simple thing.

However, there are plans for supporting this use case at last. pip-run does it, pipx soon will (merged but not released), and there are the plans mentioned about adding support to the VS Code Python extension.

The point of the PEP is to define a simple, easily understood and easily parsed format for this narrow, but common, use case: writing the “better batch file” that includes a 3rd party dependency.

Often a lot of these scripts are put into a single directory, and they might have conflicting dependencies, so a single environment won’t work, but they are all individually small enough to not make it worthwhile to make each a full project.

The desire is for each script to be able to stand entirely on its own (and “distribution” is as simple as giving someone a copy of just the .py file). All that is needed for that is a way to declare what dependencies are needed outside of the standard library (and maybe a shebang line if you want to get crazy).

Environment management for a project involves declaring dependencies for the code itself, dependencies for testing, dependencies for building documentation, dependencies for the build environment, additional development tools like linters and type checkers. If the project includes optional dependencies, there may also need to be separate test environments for all of those as well.

If you want a script that can also include all of those features, nothing is stopping you from using the project management system and declaring an entry point for your code.

A single script doesn’t need any of those (and the script author doesn’t even need to be aware of the existence of any of those).

I’ve hesitated to write “scripts and projects” because I view the line between the two as blurrier than the PEP text. ↩︎

jeanas · August 5, 2023, 6:24am

Your reply sounds like I didn’t quite get my point across, so let me use an analogy.

Consider Python itself, as a language. If we assess its complexity, it is huge. It has dozens of statements, complex semantic features like async, hundreds of stdlib modules. Nobody can even know all of it. Hairy stuff, right? Beginners don’t need most of that. An absolute beginner to Python just needs some simple assignments/if/while sort of statements, a few simple types, a few basic functions, and perhaps some turtle graphics. Then how do so many absolute programming beginners manage to use Python? The answer is, because Python does a good job of making simple things simple and hiding complexity from beginners’ eyes.

The packaging tools we have aren’t really like that, famously. They’re more like Java, where you must create a class before you can even print “Hello, World” to the console.

PEP 722 makes an argument, which you repeat above and which is valid, that this is too much complexity for beginners to handle. As a remedy, it proposes adding a second class of tools which would be more like Logo. Only the essentials, only for beginners, no complexity.

That’s a valid proposal, but what I’m saying is that I would rather have the Java-like tools transformed into more Python-like tools that beginners find themselves comfortable with, without creating a split between two types of tools and two metadata minilanguages.

davidism · August 5, 2023, 12:31pm

A post was split to a new topic: Who should approve a Packaging PEP?

pf_moore · August 5, 2023, 10:11am

I think it’s horribly misinterpreted by many people. But that’s offtopic for this discussion so I won’t say more here.

For running scripts in particular. Because running a script is a Python core function. And, the core devs have explicitly stated that they want packaging to be outside the core. So script running, while it may be something packaging tools will offer, is not exclusive to such tools.

Running a script with its dependencies will interact with packaging, but it will still be a core Python function, and therefore will (should) be part of the core feature set. Here, I’m considering the py launcher as core, because I expect that in 10 years it may well be shipped with Python for both Windows and Unix.

Project environments are persistent, and a core and visible part of the environment. They are typically managed explicitly by the developer (in terms of their content and use). They are often stored in the project directory (by choice, not as an implementation detail).

When running a script, an environment^[1] may be needed, but it’s hidden, managed transparently, and transient. It may be cached for performance reasons, but if so, management of that cache (expiry of environments, in particular) should be transparent.

Those two sets of requirements are utterly different, and trying to force them to work the same will mean trade-offs to the detriment of both cases.

Not for beginners. For people who write Python scripts as part of a role that is not that of a Python developer. These are not beginners, they are often highly experienced and senior developers or admins, who simply view Python as a tool for automating important tasks within a single environment.

Honestly, I wish we would stop characterising users as “beginners”. Not everyone who uses Python is on a path that goes from being unfamiliar with more than the basics, through familiarity with the language and its ecosystem, to expert level understanding. People can be experts, but simply not consider parts of the Python ecosystem (for example packaging) to be a priority to them.

Would anyone consider Guido to be a “beginner” in Python? And yet, I’m pretty sure he has very little knowledge or interest in the packaging ecosystem. Would it make my argument stronger if I used him as the example of “someone who has no interest in, or patience for, the complexities of pyproject.toml”?

^[1] in the broadest sense - it may not need to be a virtualenv, for example

kknechtel · August 5, 2023, 12:46pm

Not to speak for James or anything, but I think you got the point across fine the first time - this is just a disagreement.

No, I don’t think this is the argument at all.

First off, workflows like this are not necessarily “for beginners”, and thinking that way is problematic - in the same way that thinking about questions about fundamental tasks as “too easy” is problematic for Q&A sites, documentation writers etc.

But more to the point: what is proposed is not a new class of tools (script runners exist already, and this only advocates a format without saying anything about who will actually use that format), and it is also not an alternative to the packaging process. It serves a different need.

Single-file scripts are unquestionably applications rather than libraries in almost every case. People who write this stuff aren’t expecting to share it with friends, so that their friends can import it. The goal, realistically, is to run it.

Making it possible to run the script is not “packaging”, so it doesn’t make sense to judge the tools for doing so by the standards of the existing packaging framework, or consider the impact on the “ecosystem”. When people set up a script for others to run this way, it’s not just that there is no intent to mark files as belonging together, no intent to put the code in a centralized “index” for distribution, no intent to prepare a “wheel” or interface with code in other languages etc. etc.

It’s that there is no intention for others to “install” the script.

A script distributed this way should be able to just sit anywhere it likes - not put any representation of itself into site-packages. Someone who already happens to have a Python environment set up that includes the necessary dependencies, should be able to just use /path/to/appropriate/python my_friends_script.py and go. People who don’t want to learn about any tools can read the comments, pip install whatever to whichever, and proceed with step 1. People who want runtime isolation (not build isolation!) can do fancy_scriptrunner my_friends_script.py, and let it parse the comments automatically.

But in no case is the script “installed”, therefore it was not “packaged”. Even when a script runner creates a virtual environment (which is likely temporary anyway) to house dependencies, the script itself is not moved or copied; and no actions are taken for the purpose of “advertising” the script to the OS, other programs etc.
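
A minimal sketch of the runner behaviour described above, under stated assumptions: the header spelling and the "##" prefix are taken from the examples posted in this thread rather than from the PEP text, `plan_run` and the `/tmp/hypothetical-env` path are made up for illustration, and the commands are only listed, never executed. It is not how pipx, pip-run, or any other existing tool is implemented.

```python
import sys

HEADER = "## Script Dependencies:"  # spelling assumed from the examples in this thread


def read_dependencies(script_text):
    """Return the requirement strings from the first dependency block, if any."""
    lines = iter(script_text.splitlines())
    for line in lines:
        if line.strip() == HEADER:
            break
    else:
        return []  # no block found: the script only needs the standard library
    deps = []
    for line in lines:
        if not line.startswith("##"):
            break  # the first line without the "##" prefix ends the block
        requirement = line[2:].strip()
        if requirement:
            deps.append(requirement)
    return deps


def plan_run(script_path, script_text, env_dir):
    """List the commands a hypothetical runner would execute, without running them."""
    deps = read_dependencies(script_text)
    env_python = f"{env_dir}/bin/python"  # POSIX layout; Windows uses Scripts\python.exe
    commands = [[sys.executable, "-m", "venv", env_dir]]
    if deps:
        commands.append([env_python, "-m", "pip", "install", *deps])
    # The script runs from wherever it already lives; it is never copied or installed.
    commands.append([env_python, script_path])
    return commands


example = """\
#!/usr/bin/env python3
## Script Dependencies:
##    requests
##    rich
#
import requests
"""

for command in plan_run("my_friends_script.py", example, "/tmp/hypothetical-env"):
    print(" ".join(command))
```

Only the dependencies end up in the environment; the last command points at the script's original location, which is the sense in which nothing is "installed".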

ericvsmith · August 5, 2023, 2:36pm

I agree that this is not a feature targeted solely at beginners. I’ve been around the block a few times. I have a number of packages on PyPI. I understand packaging. And I would absolutely use the feature that PEP 722 is documenting.

davidism · August 5, 2023, 6:21pm

A post was merged into an existing topic: My thoughts on the Packaging PEP process

ncoghlan · August 6, 2023, 12:59am

Overall +1 from me for the “## Script dependencies:” version of the proposal (and I agree with the rationale for not trying to embed TOML or require Python syntax parsing, hence I’d be -1 on the 723 alternative).

Some functional suggestions for the PEP:

- it should explicitly make the block header lines case insensitive. Otherwise “is the ‘d’ capitalised?” will be a recurring source of entirely pointless embedded metadata parsing failures. Precedent exists in the normalisation rules for dependency specifiers: https://peps.python.org/pep-0440/#case-sensitivity (Tangent: the “Package name normalization” page of the Python Packaging User Guide should reference PEP 440 as the origin of the formalisation of the normalisation rules)
- the PEP should nominate an explicit home URL under the PyPA specifications section of the Python Packaging User Guide rather than using the PEP itself as the long term format documentation. Given the prior example of https://packaging.python.org/en/latest/specifications/core-metadata/, my URL suggestion would be https://packaging.python.org/en/latest/specifications/script-metadata/ with the page title and link from the spec page being “Embedding Metadata in Script Files”
- consider allowing comments and clarifying the spec for “blank” lines in the metadata blocks (i.e. still requiring the “##” prefix to keep the metadata block going, but ignoring “#” characters and anything after them, and then amending the blank line handling to cover discarding all lines with no non-comment text on them)
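
To make the first and third suggestions concrete, here is a rough sketch of a parser that compares the header case-insensitively and discards "#" comments plus any line left empty afterwards. The function name and the exact rules are assumptions drawn from the wording of the suggestions above, not from the PEP.

```python
def parse_dependency_block(script_text):
    """Parse a dependency block, tolerating header case and '#' comments."""
    deps = []
    in_block = False
    for line in script_text.splitlines():
        if not line.startswith("##"):
            if in_block:
                break  # a line without the "##" prefix closes the block
            continue
        # Drop the "##" prefix, then drop the first "#" and everything after it.
        content = line[2:].split("#", 1)[0].strip()
        if not in_block:
            # The header comparison ignores case, so "dependencies" also matches.
            in_block = content.casefold() == "script dependencies:"
        elif content:
            deps.append(content)  # lines with no non-comment text are skipped
    return deps


sample = """\
## script dependencies:  # lower-case header on purpose
##    click
##    requests>=2.26.0   # needs brotli support
##
##    # for the data science part
##    numpy
#
## X-Python-Version: 3.11
"""

print(parse_dependency_block(sample))
```

The "##" line that is blank and the one holding only a comment are both dropped by the same `elif content` check, which is the amended blank line handling the third suggestion describes.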

Editorially, I think the PEP would benefit from including the following:

- rationale for preferring a combined “line prefix with block header” over a pure “line prefix” approach (the main benefits I see myself: less visually noisy overall since the line prefix can be much shorter, and it encourages listing all the dependencies in one place rather than scattered throughout the file)
- a summary of the benefits of declarative metadata over runtime package installation APIs (i.e. metadata allows more robust environment caching based on either script hashes or file location details, while runtime config typically incurs more environment setup overhead on every run since any caching becomes the responsibility of the defined runtime API rather than being checked once in the script runner)
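
As an illustration of the caching point, a script runner could derive a cache location from the declared metadata before running anything. The sketch below hashes the requirement list rather than the whole script, which is one possible choice and an assumption on my part; `environment_dir` and the cache path are hypothetical names.

```python
import hashlib
from pathlib import Path


def environment_dir(cache_root, requirements):
    """Map a list of declared requirements to a cache directory."""
    # Sort and de-duplicate so that declaration order does not change the key.
    canonical = "\n".join(sorted(set(requirements)))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Path(cache_root) / digest[:16]


first = environment_dir("/tmp/hypothetical-cache", ["requests>=2.26.0", "rich"])
second = environment_dir("/tmp/hypothetical-cache", ["rich", "requests>=2.26.0"])
third = environment_dir("/tmp/hypothetical-cache", ["rich"])

print(first == second)  # compares two orderings of the same requirements
print(first == third)   # compares two different requirement sets
```

Because the key is computed from static metadata, the runner can decide whether an environment already exists without executing any of the script, which is the check that a runtime installation API cannot make up front.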

Note: I read the first 60 or so replies to this thread as well as the start of the split out PEP 723 thread (and Paul’s reply to that here), so it’s possible I missed folks already making the above comments. There’s a lot of off-topic discussion to skim over though, so I won’t get to that until I re-read the threads on a full computer rather than my phone.

janlarres · August 6, 2023, 2:18am

This has already been mentioned a few times, but I think the survey answer preference for “serving common use cases” is an important one here. The survey’s phrasing unsurprisingly restricted itself to specifically talking about “packaging”, but I think that some of the frustration with the current packaging landscape is specifically the use case that is addressed with this PEP: that the currently available tooling forces users to go down the “packaging” route even if they don’t actually want to package anything, which then exposes them to the somewhat complex landscape.

By addressing this use case in a way that avoids the actual packaging part many of the other survey outcomes become mostly irrelevant because they no longer apply. And this is really true regardless of the experience level of the users.

kknechtel · August 6, 2023, 7:03am

I absolutely agree.

Here I’m more wary (although perhaps there is some standard policy that makes my objection moot). Because of my position that this PEP is at-most-tangential to packaging concerns, I would prefer not to have PyPA directly involved, or house the document under that domain.

I’m not convinced this is especially necessary, given that the metadata blocks are already embedded into block comments. I would rather see the PEP explicitly allow (and require support for) multiple requirements blocks that are functionally concatenated; then one could simply intersperse regular comments if necessary. (And really: is “reading more of the file than necessary” a non-trivial concern, for a tool that is likely to spend time downloading and installing packages?)

I don’t think you overlooked anything relevant to the points you’re making, but I’m trusting my own memory on that.

petersuter · August 6, 2023, 7:34am

Comments seem like a reasonable idea.

## Script Dependencies:  # for pip-run
##    click
##    requests>=2.26.0   # we need brotli support https://github.com/psf/requests/blob/main/HISTORY.md#2260-2021-07-13
##    rich
##
##    # for the data science part
##    numpy
##    matplotlib
#
## X-Python-Version: 3.11 # second (non-standard) metadata block starts here

Maybe not strictly needed, but maybe more convenient than the alternative:

# For pip-run:
# requests: we need brotli support https://github.com/psf/requests/blob/main/HISTORY.md#2260-2021-07-13
# numpy+matplotlib for the data science part
## Script Dependencies:
##    click
##    requests>=2.26.0
##    rich
##    numpy
##    matplotlib
#
# second (non-standard) metadata block starts here:
## X-Python-Version: 3.11

pf_moore · August 6, 2023, 11:03am

I was going to say that I didn’t see the need, because dependency lists aren’t likely (in my experience) to be long enough to warrant the extra complexity. But your example is a good one. I’m still not 100% convinced it’s worth the extra complexity, but I’ll consider it. If others would find this useful, please speak up! If there’s sufficient support for comments, I’ll add them, but the default will be to leave the spec as it stands in this regard.

I’m torn on this. Your arguments are reasonable, but it opens up questions like “is a space allowed between the header name and the colon? What about multiple spaces, or tabs, between the two words?” And I don’t want to make things that complex. For me, parsing simplicity is key, and I don’t think there’s much likelihood for confusion (frankly, the “D” is probably the only place where people might get it wrong). One other possibility is to just make the key a single word - “Dependencies:”.
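
The trade-off can be seen by comparing two header checks. Both are sketches: the strict spelling is assumed from the examples in this thread, and the lenient regular expression is just one hypothetical answer to the whitespace questions raised above.

```python
import re

STRICT_HEADER = "## Script Dependencies:"  # assumed exact spelling
# Hypothetical lenient rule: any case, any run of spaces or tabs between tokens.
LENIENT_HEADER = re.compile(
    r"##[ \t]*script[ \t]+dependencies[ \t]*:[ \t]*", re.IGNORECASE
)


def is_header_strict(line):
    """Accept only the exact header text (trailing whitespace ignored)."""
    return line.rstrip() == STRICT_HEADER


def is_header_lenient(line):
    """Accept case and whitespace variations of the header."""
    return LENIENT_HEADER.fullmatch(line) is not None


candidates = [
    "## Script Dependencies:",
    "## Script dependencies:",
    "## Script  Dependencies :",
    "##\tSCRIPT\tDEPENDENCIES:",
]
for line in candidates:
    print(repr(line), is_header_strict(line), is_header_lenient(line))
```

The strict check is a single string comparison that any tool can reproduce identically; the lenient one accepts more input but every implementation has to agree on each detail encoded in the pattern.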

Again, other opinions would be welcome. But (and I wish I didn’t have to say this, but I think I do) can people who feel the need to use this question as a chance to argue that this “proves” that PEP 722 is not as simple as it claims and therefore TOML is better, please refrain from trying to make that point again. It’s been hard enough to get feedback on the technical details of this PEP, and at this point, reiterating arguments that are already addressed in the PEP is feeling awfully like deliberate sabotage of the process.

It’s the normal process to migrate specifications to the packaging guide once they are approved, and I fully intend to do that (thanks for the suggestion of a precise location, it saves me having to think about it). But because of that, I’m not sure why people think it’s necessary to explicitly mention it in the PEP. Is it a case of having to make it clear because the process isn’t followed often enough?

I’ll add a brief comment to the PEP noting that the final form of the spec will reside in the packaging guide, just because it’s easy enough to add and clearly some people would feel better if it were there (you’re not the only person to have made this suggestion).

Will do. Your list of benefits is basically my reasons for preferring this form, so at least the intent was clear, even if I hadn’t added it to the PEP.

I’m not convinced by this one, as I think it gets too deep into implementation details and tool UX. This PEP doesn’t prevent a runtime installation API, and in fact it’s totally neutral on that matter. A suitably crafted runtime API can be perfectly fine (take, for example, the nox session API), but it’s not the approach that this PEP is addressing. Not least, because an API is by definition tool-specific, and so is not appropriate for standardisation.

Yeah, I concede that “reading more than necessary” is probably an unnecessary attempt at optimisation. But there are other reasons for not allowing multiple blocks which get merged:

- User readability. It would be way too easy to have an extra 1-line dependency block in the middle of a script and not notice it. This could even be considered a security risk (although I’m pretty sceptical of the whole “security” aspect when we’re talking about running some bit of code someone sent you).
- Lack of any good use cases for it. No-one’s requested this so far in the discussion, and neither pipx nor (to my knowledge) pip-run support it.
- Again, simplicity of parsing and processing. Maybe I’m going too far on this point, but I really don’t see any reason to make it harder to parse the data just to handle something that “might be useful”.
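
The readability concern can be illustrated with a small sketch that collects every block in a file and compares "first block only" with "merge all blocks". The block syntax is assumed from the examples in this thread, and `surprise-package` is a made-up name.

```python
def collect_blocks(script_text):
    """Return one list of requirements per dependency block found in the text."""
    blocks = []
    current = None
    for line in script_text.splitlines():
        if line.strip() == "## Script Dependencies:":
            current = []
            blocks.append(current)
        elif current is not None and line.startswith("##"):
            if line[2:].strip():
                current.append(line[2:].strip())
        else:
            current = None  # any other line closes the open block
    return blocks


script = """\
## Script Dependencies:
##    requests
##    rich

def main():
    ...

## Script Dependencies:
##    surprise-package
main()
"""

blocks = collect_blocks(script)
first_only = blocks[0]
merged = [req for block in blocks for req in block]
print(first_only)
print(merged)
```

With merging, the two-line block tucked in after `main` silently adds a requirement that a reader who only looked at the top of the file would never see.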

I can expand on that point in the PEP.

@brettcannon if you want to revise your schedule for when you make a decision based on the fact that these points have come up, I’m fine with that, but FWIW, I will commit to having these comments addressed before Monday 14th.

DavidCEllis · August 6, 2023, 1:11pm

While it’s probably fine for the dependencies block use case, I’m not sure I like comments within the metadata block in the general case. # can have valid meanings other than just to indicate comments and splitting on it may not always make sense.

First example that came to mind:

## X-Relevant-Links:
## https://docs.python.org/3/whatsnew/3.11.html#new-modules

In this case #new-modules may be taken as a comment and removed which was not intended. I don’t have a specific use case where I need this but I thought it was worth bringing up.
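
A short sketch of the problem, plus one possible mitigation: only treating "#" as a comment marker when it starts the value or follows whitespace, similar in spirit to how pip's requirements files handle comments. Neither function comes from the PEP; both are illustrative assumptions.

```python
import re


def strip_comment_naive(value):
    """Treat every '#' as the start of a comment."""
    return value.split("#", 1)[0].strip()


def strip_comment_whitespace_rule(value):
    """Treat '#' as a comment only at the start or after whitespace."""
    return re.sub(r"(^|\s+)#.*$", "", value).strip()


link = "https://docs.python.org/3/whatsnew/3.11.html#new-modules"
with_note = "requests>=2.26.0   # we need brotli support"

print(strip_comment_naive(link))            # fragment is cut off
print(strip_comment_whitespace_rule(link))  # fragment survives
print(strip_comment_whitespace_rule(with_note))
```

The whitespace rule keeps the URL fragment intact while still removing the trailing note, at the cost of one more detail that every parser of the format would have to implement the same way.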

ali · August 6, 2023, 1:27pm

Thanks for writing this up and for improving on this use case that has been mostly ignored in the Python ecosystem.

One thing I think is missing in the PEP is how it may affect / is affected by linters and code formatters. The use of ## as a block identifier may get flagged / auto-formatted by known linting tools. If you run flake8 for instance on the example script in the PEP, it outputs:

$ flake8 script.py
script.py:3:1: E266 too many leading '#' for block comment
script.py:4:1: E266 too many leading '#' for block comment
script.py:5:1: E266 too many leading '#' for block comment

This may cause confusion if a tool were to automatically format the comment block and insert a space between the ##, so that the script suddenly stops working as intended. This could happen because of a ruff autofix rule, an IDE format-on-save, or a code formatter run directly on the script. The good news is that black, which is currently the most used formatter, doesn’t mess up this case, but I am not sure whether other formatters, IDEs, or autofixing linters do. It might be worth mentioning this in the PEP with a recommendation for tool authors and users on how to handle this case.
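
To show how quietly this could fail, the sketch below applies a hypothetical autofix (rewriting "##" as "# ", which is not the documented behaviour of any particular tool) and then checks for the header with an exact-match rule assumed from the examples in this thread.

```python
def has_dependency_block(script_text):
    """Report whether any line is exactly the assumed block header."""
    return any(
        line.rstrip() == "## Script Dependencies:"
        for line in script_text.splitlines()
    )


original = "## Script Dependencies:\n##    requests\n"
# Hypothetical autofix that rewrites "##" as "# " to silence E266.
reformatted = "\n".join(
    "# " + line[2:].lstrip() if line.startswith("##") else line
    for line in original.splitlines()
)

print(has_dependency_block(original))
print(has_dependency_block(reformatted))
print(reformatted)
```

After the rewrite the file is still valid Python and still looks like a dependency list to a human, but a runner following the exact-match rule would no longer see any dependencies.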

ericvsmith · August 6, 2023, 5:04pm

My requirements.txt files for larger projects are full of comments. Most often I’m commenting things out (they’re for testing, or maybe an alternate implementation), but occasionally notes like “# remove the next two dependencies when we finally drop Oracle support”. So I’d definitely like to see comments allowed.