PEP 722: Dependency specification for single-file scripts

jezdez · July 27, 2023, 10:07am

I’m not in favor of this PEP because it simply adds yet another way to specify Python dependencies without taking user feedback into account. Particularly, I find this a big red flag:

…it’s intended to be for single-file scripts only, and is not intended in any way to replace or compete with project metadata stored in pyproject.toml

We’ve heard from many community members (e.g. the packaging survey) that they are tired of having to know the many ways to interact with Python packaging tooling and metadata.

While I believe your intent is, of course, benign, I don’t think end users will understand the subtle use case differences, and instead will have to reflect on the additional option they are provided alone. That’s especially unfortunate since I believe the proposed format doesn’t actually cover best practices like lock files and might have a chilling effect on the great work of standardizing on one file (pyproject.toml).

Essentially, I don’t think end users will know or understand the need for such a PEP, and will again have to expand their understanding of when and how to use which format for dependency specification.

As a reminder, respondents of the survey primarily

found Python packaging too complex,
did NOT prefer to use several Python packaging tools,
preferred a clearly defined, official workflow.

But they also said, PyPA should

focus on supporting a wider range of use cases,
support more interoperability between Python packaging and packaging tools for other languages.

Given these key takeaways, I don’t see how this PEP would help users, as it would just add one more way to specify dependencies, without removing other options or integrating better in real-world scenarios.

BiteCode · July 27, 2023, 10:52am

Oh, then the proposal doesn’t solve the problem for the people that needs it the most. We already have tools for this, so python experienced devs don’t need the proposal. If it’s not officially adopted, the people that are not saavy will not know it exists or will mess up installing the tool in the first place.

pf_moore · July 27, 2023, 12:19pm

Thanks for the sanity check on this. I will point out that my interest in this is very much as a user, so to that extent at least I am taking user feedback into account

My intent here is very much focused on the “supporting a wider range of use cases” point. Specifically, Python packaging currently has very bad support for the common use case of a set of single file scripts, often stored in a “utilities” directory on $PATH, which rely on dependencies from PyPI.

I honestly don’t see how we can improve the situation here without something like this PEP. If you have a suggestion for an alternative way of addressing this use case, I’d be very interested in hearing of it. But please understand that solutions involving “make a project directory” or “put the dependencies in a separate file” directly contradict the key requirement here, which is having a way of writing a single runnable file that can use packages from PyPI.

Like it or not, this is a common requirement for many Python users, and telling them that they “shouldn’t work like that” is not realistic^[1] - they’ve been “working like that” for many years now, and either complaining that Python environment management is hard, or dumping all their 3rd party requirements into their system Python (something that the packaging community chose to discourage without really ensuring that all the reasons people do this were considered).

With regard to your other points from the survey:

These are all related to the tool that allows single-file scripts to be run. I’m absolutely in favour of simplifying the landscape here. The pip-run tool was proposed as a pip subcommand a long time ago (hence the name). It hasn’t happened yet because there’s a lot of UI and organisational issues that we haven’t been able to resolve. There’s also a question of whether this is in scope for pip, but the user survey results suggest (to me, at least) that users would be OK with pip gaining this functionality as part of becoming the core of the “unified PyPA workflow”^[2] so I consider that issue as having been solved at this point.

But none of that is relevant to this PEP. All I’m trying to do here is define where any tool would find dependency data when faced with a single-file script. That problem will have to be solved no matter what the official workflow for running such a script ends up being, and I don’t see the disadvantage in building on existing, working solutions.

I’ve covered “wider range of use cases” point above. As far as interoperability is concerned, surely, by standardising the means of getting data that is currently only available in tool-dependent ways, that has to improve interoperability? And by using the existing PEP 508 standard for dependency specifiers, I’m ensuring that this proposal build on existing interoperability work? I don’t want to make assumptions here, but if you’re concerned that conda (for example) can’t use this data, surely that’s about how well PEP 508 maps to conda packages rather than being about this proposal?

So while I wouldn’t want to try to present this PEP as some sort of massive step forward in addressing the user concerns expressed in the survey, I don’t see how it’s harming whatever work we do in that area. And I absolutely don’t think that the correct response to the survey is to stop making any sort of progress out of fear that we’ll make things (temporarily) more complex in the process of working on long-term simplification.

PS Apologies if my frustration is showing through here. I’ve been arguing for literally years that by ignoring the “run my script with some dependencies” use case, we’re failing to consider an important user requirement. It’s difficult for me to know how to address a complaint that when I finally try to make some progress in this area, I’m not considering user feedback…

Again, putting this in the context of the survey, there was a strong flavour users feeling that Python packaging does not listen to what users say, and IMO saying “that’s not the way you should do things” is a strong contributing factor to giving that impression. ↩︎
Although within the packaging community, there’s no consensus yet on whether pip, specifically, should be the core tool, rather than something else ↩︎

pf_moore · July 27, 2023, 12:30pm

So your view is that to be “officially adopted” something like pip-run must be part of pip? Or are you referring to the idea mentioned by @ntessore of being able to extract a script’s dependencies and install them into an existing environment? (Sorry, the way your response appears on the web interface doesn’t make it clear which comment you were responding to).

If it’s the former, there’s a proposal to add pip-run functionality to pip. That’s independent of (but linked to) this proposal - but I’d encourage you to read the full discussion there before commenting in support of the idea, as there are some non-trivial issues that need some work before this can happen, and I don’t think anyone currently has the bandwidth to work on them.

fungi · July 27, 2023, 12:41pm

I would also question whether the results of a Python packaging
survey are applicable to this case, which is basically a sort of
un-package approach. People choosing to take the packaging survey
likely have a selection bias for package-oriented solutions, which
this proposal isn’t really (though it is still relevant to related
topics like environment management).

Put another way, are the users who this solution is trying to
satisfy, i.e. those who don’t want to package their scripts, likely
to bother filling out a survey about packaging?

fonini · July 27, 2023, 1:15pm

Well, you could argue that people who write scripts with dependencies which are Python distribution packages are users of the Python packaging ecosystem. After all, they download the packages for their dependencies from PyPI, and IIRC there was a banner on the PyPI web interface linking to the survey (I believe that was what led me to the survey). You don’t need to write Python packages (instead of standalone scripts) to be a user that regularly downloads and installs packages, and eventually even visits the PyPI website if only because it appeared on search engine results.

NeilGirdhar · July 27, 2023, 1:37pm

Sorry for the late reply, but yes I dump all of the shared dependencies in the global environment.

Is there no common functionality in all of your scripts? How do you share functions between them? I’m surprised that you haven’t collected them into libraries with multiple entry points rather than have loose scripts.

Also, just from a maintenance point of view, do you really prefer one script with an inline pyproject.toml versus a folder with a Python script and a pyproject.toml? In the latter case, at least you’ll have syntax highlighting, and access to any tools that check and update pyproject.toml. It just seems easier to me.

It would be cool if there were a tool to generate a folder/pyproject.toml given a Python script.

pf_moore · July 27, 2023, 2:59pm

Not typically, no.

Generally, I don’t. Or if I do, copy and paste is enough.

Well, doing things the way I do works better for me. Not least because of the point that everyone keeps missing here - I don’t have to install them. I just create them and run them. And if I find something that I want to change, I fix it and I’m done. There’s no “source directory” that I have to keep in line, or anything like that. This isn’t production-quality code, managed as a full-fledged project, it’s helpers, one-offs, and quick wrappers.

Yes. What sort of maintenance do you imagine I do?

Here’s an example of the sort of thing I mean. Note that this uses just the stdlib, but that’s precisely because managing dependencies is a PITA.

from urllib.request import urlopen
import json
from argparse import ArgumentParser
from datetime import datetime

p = ArgumentParser()
p.add_argument("pkg", nargs="*")
args = p.parse_args()

def print_version(pkg):
    with urlopen(f"https://pypi.org/pypi/{pkg}/json") as f:
        data = json.load(f)
    ver = data["info"]["version"]
    reldate = min(datetime.fromisoformat(f['upload_time']) for f in data['releases'][ver])
    print(f"{pkg}: {ver} ({reldate:%Y/%m/%d})")

for pkg in args.pkg:
    print_version(pkg)

I’d have used click and requests if I could.

Or there’s

import importlib.metadata
from packaging.requirements import Requirement
import re

def normalize(name):
    return re.sub(r"[-_.]+", "-", name).lower()

dependencies = set()
all_packages = set()
for dist in importlib.metadata.distributions():
    name = normalize(dist.metadata["name"])
    all_packages.add(name)
    if dist.requires:
        for req in dist.requires:
            dep = normalize(Requirement(req).name)
            dependencies.add(dep)

top_level = all_packages - dependencies
for name in sorted(top_level):
    print(f"{name}=={importlib.metadata.version(name)}")

(which does use dependencies, and needs some annoying infrastructure that I always forget to maintain because of that).

Also, a list of dependencies is not the same thing as an “inline pyproject.toml”. There’s no version for any of these scripts, the only name is the filename, and there’s no docstring or readme. They are purely personal utilities.

It would be of literally no use to me. But having said that, if you could specify a script’s dependencies in the script, like this PEP allows, it would be easier to write such a tool if it’s of use to you.

By the way - please excuse the fact that this reads as if I’m proposing this PEP purely to support my personal workflow. That’s not the case - I know many people who work like this, and plenty of people have commented on this thread in support of this type of approach. But it’s easier to give specific examples if I pick cases from my own usage.

I’ll be honest, reading your response gives me a very strong sense of what the survey feedback means when it says that users feel that the packaging community don’t listen to them. I’m trying to explain what my use case is, and why existing solutions don’t work for me^[1]. And rather than considering the problem I’m describing, you are trying to tell me that I’m going about things wrong. It may be that you’re tying to understand my use case better, but it feels like you’re telling me I shouldn’t be trying to do what I’m doing. It’s one thing to say that the packaging ecosystem shouldn’t support what I’m trying to do, but trying to say that I shouldn’t be doing it in the first place, so there’s no problem to solve, is very different (and not the message I think we should be giving to our users).

But this is pretty much off-topic. The PEP isn’t about something vague and general like “how to build and manage a set of standalone utilities in Python”, it’s about the much more specific topic of “How can a single-file script declare its dependencies”. I guess your view is “that functionality isn’t needed”. OK. But others have confirmed that they would benefit from it, and the existence of tools like pip-run and pipx suggest the same. I’ll review the “Motivation” section of the PEP and strengthen it if I think it will help, and then we can let the PEP-delegate decide if I’ve made the case sufficiently.

And let’s be honest here, I have a lot of experience with Python packaging, so there’s a good chance I’ve tried most, if not all, of the possibilities… ↩︎

tusharc · July 27, 2023, 4:36pm

Paul Moore:

My intent here is very much focused on the “supporting a wider range of use cases” point. Specifically, Python packaging currently has very bad support for the common use case of a set of single file scripts, often stored in a “utilities” directory on $PATH, which rely on dependencies from PyPI.

I honestly don’t see how we can improve the situation here without something like this PEP. If you have a suggestion for an alternative way of addressing this use case, I’d be very interested in hearing of it. But please understand that solutions involving “make a project directory” or “put the dependencies in a separate file” directly contradict the key requirement here, which is having a way of writing a single runnable file that can use packages from PyPI.

Like it or not, this is a common requirement for many Python users, and telling them that they “shouldn’t work like that” is not realistic - they’ve been “working like that” for many years now, and either complaining that Python environment management is hard, or dumping all their 3rd party requirements into their system Python (something that the packaging community chose to discourage without really ensuring that all the reasons people do this were considered).

This is the most compelling part of the proposal to me.

Python is an excellent glue language. (Who’s heard “the second-best language for everything”?) Writing Python scripts is much easier than writing shell scripts. Every team I’ve worked on has had some standalone Python scripts for scattered purposes. (Would it help the PEP’s case if I gave more specific examples?)

Until reading this thread, it didn’t even occur to me that I could create and run single-file scripts with pipx/pip-run. I think that I had always assumed that you had to go through the trouble of building out a full package, which is enough of a barrier that it rarely feels worth it.

My point, I suppose, is to emphasize that this is a real requirement and the PEP should improve the user experience around this.

flyinghyrax · July 27, 2023, 4:46pm

This seems to be a nice listing of how in-line dependency specification is handled in other language ecosystems:

https://dbohdan.com/scripts-with-dependencies

fungi · July 27, 2023, 5:03pm

You’re not alone. I manage quite a few packaged libraries and
applications written in Python, some widely-used, and help maintain
hundreds more which are very widely used, so I have a fairly solid
grasp of packaging concepts (even maintaining semi-popular
Setuptools plugins and pyproject build backends). Still, I automate
much of my day job with random Python scripts because the language
is very natural to me, and those scripts wind up stashed in random
places, not “properly” packaged in any way.

I’ve taken to putting ad hoc comment blocks listing the (Python and
non-Python) dependencies for those scripts to remind me what I need
if I move them to another system or recreate the venv I run them out
of. The friction related to using external libraries in such scripts
does subconsciously compel me to mostly avoid them and do a lot more
stuff with stdlib-only solutions when I could almost certainly save
time by using one or a handful of popular packages from PyPI.

Just an observation from the peanut gallery here, but this seems
pretty useful, and even if I chose to keep making the venvs for my
scripts by hand when I need to run them, I would still probably
adapt my comment blocks to the proposed format should it become
officially recommended, because why not?

NeilGirdhar · July 27, 2023, 5:07pm

Sorry, I didn’t meant to imply anything like that. I was just comparing your workflow to mine and wondering why you’ve chosen to do things this way.

One other difference is that all of my scripts are checked in to version control so that whenever I have to reinstall my machine or I get a new machine, I have access to them. So, for me, all of my scripts are already in a folder with a pyproject.toml and an .editorconfig. The pyproject.toml also configures tools like Ruff and Pyright.

For now, yeah I guess that’s my view–or if it’s going to be one file, I like the inline pyproject.toml that was suggested since it adds no new formats, and is extremely extensible.

Please don’t take offense. I didn’t mean it as any criticism. I’m just reading various comments to get an idea of how other people are doing things, and I was just asking about shared functionality since my scripts do have a lot of shared code.

BiteCode · July 27, 2023, 5:21pm

Ok, thanks for the precision.

pf_moore · July 27, 2023, 5:47pm

Thank you! That’s a great list, I’ll read through it and see how I can incorporate any insights from it into the PEP.

khs · July 27, 2023, 6:07pm

This PEP is useful, wish it was implemented years ago. I know the back-and-forth discussion here, but it tries to solve a customer problem that many of us encounter. I.e. here’s a python script to fix something simple, oh, you need to install packages, make a venv et rest (why which the end user just gives up).

jamestwebber · July 27, 2023, 6:59pm

Although this does make me think of the security implications, particularly for novices. It’s common for people to google a problem and execute some code they found–this isn’t great practice, but at least they can read through the code snippet they find before running it (…ideally). If they find one of these standalone scripts, just running that script will download and execute an unknown amount of code on their machine.

The PEP mentions this as being the same as the status quo, but it feels like a little more expansive in terms of what a bad actor has access to. Perhaps this is mitigated by the ongoing work to prevent dependency confusion, and restricting the index to PyPI.

pradyunsg · July 27, 2023, 7:00pm

What are you thinking of, that’s above and beyond what’s already possible?

jamestwebber · July 27, 2023, 7:02pm

I guess I’m thinking specifically of a dependency confusion attack where a reasonable-looking script is downloading a malicious version of a known dependency. If this specification can only download packages from PyPI then that problem is mitigated.

edit: and it’s true that this isn’t much different from blindly running pip install malicious-package before executing a script you found, but that’s exactly the small barrier that it’s trying to remove. If this feature makes standalone scripts w/ dependencies easier to run, it could make that type of attack more effective as well.

fungi · July 27, 2023, 7:27pm

Note that the proposal isn’t for the script to install things for
you, it just includes comments with a list of its dependencies.
You’d need to run some wrapper tool that does the parsing,
downloading and installing of these dependencies. Yes, executing
any software you download from untrusted sources is already a
problem, but I don’t see how a declarative data structure for
listing dependencies makes that any worse.

And besides, it’s already quite possible today to distribute a
“simple” Python script that downloads and installs things on your
system with whatever privileges you grant it, like you seem to be
worried about, e.g.:

https://bootstrap.pypa.io/get-pip.py

pf_moore · July 27, 2023, 7:43pm

Absolutely no offense taken. It was just an interesting learning experience being in the position of trying to explain my use case and being confronted with helpful questions which, although well-intentioned, came across as expecting me to justify my choices. I’ll certainly be more careful in future about how I talk to people offering use cases as a result.

One thing I know I tend to do is use such “have you tried X, Y or Z” approaches to avoid having to flat-out say “no” to a request. I think that’s a fairly natural thing to do - we wouldn’t be contributing to open source if we didn’t want to help people get their jobs done, so saying “no” to a proposal or feature request goes against that instinct. But I think that sometimes we have to just accept that a decision needs to be made, or an opinion needs to be stated, and we don’t always have to justify ourselves.

One other thought on this point. If I did have shared code, I could just put it in a lib directory^[1] alongside my scripts, and then do import lib.foo. Python adds the script directory to sys.path, after all, and this is precisely what it’s good for.

Personally, I might make lib a “real” project and publish it on PyPI, just because I can, and as a packaging expert that seems logical to me. But for many sysadmins, DBAs and data analysts of my acquaintance, that would be a massive step, and the simple lib directory is a much more appropriate solution for their situation.

But this is now way off topic. Let’s go back to simpler subjects like bikeshedding over whether to use Requirements: or Dependencies: or something else

You don’t even need an __init__.py, thanks to implicit namespace packages ↩︎