PEP 722: Dependency specification for single-file scripts

flyinghyrax · July 28, 2023, 11:35pm

As I’ve been following this discussion, I have been baffled and shocked at how controversial this has turned out to be.

This isn’t a packaging standard, nothing is being packaged. There is no build process (inherent to the use case!) and no separate distributable artifact other than the file itself (again, inherent to the use case!)
It is addressing something that existing tools already do because there is a demand for it.
It is addressing something tools for other languages already do, because there is a demand for it.
Having a format means different runners will be more compatible, rather than the existing situation in which they are not interchangeable.
Having tools like this become more common makes this common single file scripting use case simpler, specifically because it means you don’t have to make an entire project out of a shell script in order to be following best practice.

All objections I have seen this far have either already been addressed, or ignore the use case. I’m wildly impressed with the patience and diligence of @pf_moore in answering everyone’s concerns, even when those voicing the concerns obviously did not read the PEP or the rest of the thread, because he has had to answer the same objections multiple times.

People keep referencing the packaging survey and how respondents indicated that they felt packaging was too complex, or best practice was unclear, or that there were too many options. I am one of those respondents. And watching this has been incredibly frustrating, because I’m witnessing the community argue that a simpler solution to a problem that I have all the time is inferior to the more complicated existing solutions that I can’t use because they just add to the problem - while nominally doing so because packaging is too complicated. And it’s absurd.

groodt · July 29, 2023, 1:47am

I too am surprised. I think this single-file script with dependencies is a fantastic idea. I mentioned before, but I think it will further improve Python as a glue language and make things a lot easier for beginners and experts alike.

In fact, I think this may even improve the situation and perceptions in relation to the so-called “packaging problem”. I’ll bet that there are a lot of beginners who fight with packaging to distribute scripts, but if this PEP existed they wouldn’t need packaging and therefore would complain less.

It also would help with packaging in other very important ways. I can see a path where this even helps remove obstacles in the way of a packaging lockfile spec. There are always circular debates about apps vs libraries vs scripts and their relationship to lockfiles. If we remove scripts from the equation, the solution is far, far, far more tenable.

If this PEP existed I can think of at least 10 different tools at my workplace that could be distributed as scripts instead of complicated zip files or docker images.

EpicWink · July 29, 2023, 2:18am

There seems to be a misunderstanding in some recent posts where the posts’ authors think that this will affect most of all of Python packaging. As far as I can tell, this proposal only impacts high-level Python script executors, such as pipx, pip-run, and potentially conda run, and IDEs via a feature request.

This proposal would be simple to implement in IDEs, with maybe a check-box in run configuration (or a dialogue box on run) and some very simple text parsing; I know I would enjoy implementing it.

I think the N apps for K features concern is partially misleading, as in most cases users want to increase K: my take-away from the packaging survey is only that (a majority of?) users want N to be 1. I don’t think this proposal hinders not furthers that goal (hypothetical pip run would likely implement this).

My main issue with this proposal, which has been somewhat addressed above, is that dependencies are installed inadvertently because an unknowing user happened to specify the dependencies in the valid format, then use a script executor without knowing it has the capability to install those dependencies.

jeanas · July 29, 2023, 3:16am

Your emotions are yours, but staying a bit less emotional in your writing is usually more productive.

I don’t think that’s true of what I wrote in Sketchy and maybe crazy alternative to PEP 722.

Packaging is too complex → This will make it more complex, not for those who will always stick with this solution, but overall, and for those who have to understand both systems.
Best practice is unclear → This adds a new practice, making “best practice” less clear.
There are too many options → There will be one more option.

So it’s not clear to me how you immediately jump to the conclusion that the criticism towards this proposal is absurd.

Yes, from the point of view of someone who will never write anything but quick single-file scripts (which is a lot of people), this will simplify things, though at the cost of introducing yet another option (in any event, it will take a lot of time before all topmost Google search hits for queries like “run Python script with dependencies” point to resources with the new method, and in the meantime there will be some confusion, inevitably).

For those who are both writing quick scripts and more significant projects (which is also a lot of people), this will increase the packaging fragmentation, which everybody agrees is confusing.

That’s why I think it is worth reflecting on how to make single-file scripts at the same time convenient and similar to what already exists.

petersuter · July 29, 2023, 7:38am

Jupyter notebooks already have a popular simple magic inline command for this, as mentioned above. (Examples)

Hopefully Jupyter could instead / also support this PEP?

(A separate toml file would probably not be a viable alternative.)

How would the PEP work for e.g. torch? Typically you have to look up and copy this:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

from https://pytorch.org/

Should scripts be able to specify a package index? It would be possible to implement “some” features, for example being able to add extra index locations. However, it is difficult to know where to draw the line

Maybe it is better to keep it simple. But it is unfortunate that ML use cases would be out of scope then.
Here is a random simple example of a single-file script / Jupyter notebook using torch.

From the Python Packaging User Survey:

What should the PSF and PyPA focus on? → #1 by far: Making Python packaging better serve common use cases and workflows

What should the packaging community do to be “an ecosystem for all”? → #1 by far: Support more interoperability between Python packaging tools

This is it. Simple scripts are a very common use case. Currently there’s no good workflow for them, and no interoperability between tools. This PEP addresses both.

What Python packaging tools do you use? → #1 by far: pip

I prefer to use several Python packaging tools, rather than a single tool → Most disagree

I would love to see support for this PEP integrated in pip as pip run. But the PEP has to be accepted first of course.

What do other packaging managers do better than Python Packaging?

Better deployment of dependencies when installing projects

Better systems for managing dependencies within a project

For example they provide simple one-liner dependecy requirement syntax for simple script “projects”.
For example F# has a popular directive #r "nuget: FSharp.Data" roughly equivalent to e.g. #r "pip: numpy". (Not an embedded .fsproj file.)

Python packaging is too complex

“Simple things should be simple, complex things should be possible.”

A very simple solution is needed. A simple script can be written in seconds to minutes. Specifying its dependencies should not break the flow. Writing a simple declaration with trivial / no syntax should be easy from muscle memory.

There are many viable ways this could be done with minimal syntax complexity:

# Requirements:
# numpy
# Pillow

__dependencies__ = "numpy Pillow"

import numpy  # requires: numpy
import PIL    # requires: Pillow

"""
%pip install numpy
%pip install Pillow
"""

There are only very few matches on Github for #Requirements or #Dependencies, so more complex syntax for disambiguation is not necessary.

Keeping the syntax similar to the existing pyproject.toml would be nice (for experts that already know it) I guess.
But if it’s not possible to keep it simple it doesn’t serve the common use case and becomes pointless.
Learning the simple one-liner syntax doesn’t seem like a real issue for experts.

This would be way too complex:

__pyproject_toml__ = """
[project]
dependencies = [
  'numpy',
  'Pillow',
]
"""

...

Can proponents of a more pyproject.toml-like approach show an example of similar simplicity as the PEP?

Transforming a script containing simple PEP declarations into a full pyproject.toml project would be trivial if it’s ever desired.

maximlt · July 29, 2023, 9:23am

conda used to have support for creating an environment from reading an environment spec embedded in the notebook metadata. Support for that was deprecated a long time ago and removed this year in version 23.3.0. Maybe there’s some lesson to be learnt from that experiment.

sinoroc · July 29, 2023, 9:25am

Peter Suter:

How would the PEP work for e.g. torch? Typically you have to look up and copy this:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
from https://pytorch.org/

Should scripts be able to specify a package index? It would be possible to implement “some” features, for example being able to add extra index locations. However, it is difficult to know where to draw the line

Maybe it is better to keep it simple. But it is unfortunate that ML use cases would be out of scope then.
Here is a random simple example of a single-file script / Jupyter notebook using torch.

As far as I know, this is bad practice to hard-code the package index in the script (or in the package metadata), because it makes redistribution (sharing) more difficult. Each potential user of the script (or packaged application/library) might want to use a different index. I recommend reading the abstract vs. concrete requirements discussion. Sadly, it is true that there is no easy solution for that problem in Python packaging ecosystem yet (that I am aware of). The best current solution is for each user to have their own package index proxy or something like simpleindex. Anyway this seems indeed out of scope for PEP 722 itself, it will be up to the tools implementing support for PEP 722 to offer solutions for this issue, and maybe they already do (I do not know).

As far as I understood, this is what pip-run aims to be in the long term, but it has to be mature before this happens, and something like PEP 722 would surely be a step in the right direction to prove maturity.

You only need to know how to write dependency specifiers, which is the same notation as in the [project] section of pyproject.toml and a bunch of other places as well.

This PEP aims to unify a bunch of practices already existing in the wild: pkg_resources’s __requires__, Jupyter’s, pip-run, and probably others (with pipx upcoming).

Good point. As far as I can tell VSCode is already represented in this thread by Brett Cannon. Maybe this PEP should be presented in Editor/IDE Integration as well to ask for feedback.

abravalheri · July 29, 2023, 12:54pm

Hi @maximlt, do you know what would be these lessons or the reason for the removal?

I tried to follow the links:

The PR you listed mentions issue #12307
Issue #12307 mentions a warning in conda_env/specs/notebook.py#L21. The link is broken, but I guess it refers to https://github.com/conda/conda/blob/159e78566a2e09e90462cea5f5cf0d027383aeb3/conda_env/specs/notebook.py#L21.
The warning references pull #5843
Pull #5843 references issue #5842
Issue #5842 mentions:

These two commands are outside the scope and spirit of conda. The first requires knowledge of a “notebook,” which is not one of conda’s fundamental units of work. The second requires knowledge of the Anaconda API, which is the purview of anaconda-client, not conda.

So it seems that this was project choice related to the scope of the project, or am I missing something?

FRidh · July 29, 2023, 8:04pm

I do very much see the use case, and as mentioned earlier in the thread, nix-shell is often used in such a way. Yet at the same time a risk that we have here is yet another way for specifying dependencies.

Therefore, I think the preferred way would be to embed the pyproject.toml, which is entirely tool agnostic. Tools like poetry or pdm could then opt into embedding lock files as well. Files could have a file: pyproject.toml (or the correct rst notation) as header e.g.

groodt · July 29, 2023, 9:47pm

The risk of creating another way of specifying dependencies is exaggerated in my opinion. They are still specified as described in Dependency specifiers — Python Packaging User Guide

As mentioned previously in this thread, there is evidence in Single-file scripts that download their dependencies · DBohdan.com Most language ecosystems on that page define a specialised compact format or DSL for dependencies single-file scripts. It is true that some support an embedded file form as well. rust-script is interesting because it supports both a minimal syntax and embedded cargo rust-script | Run Rust files and expressions as scripts without any setup or compilation step.

To me, this debate doesn’t seem like a huge blocker. Another PEP in future could come along that specifies support for embedded pyproject.toml, either as an alternative to the compact format from PEP 722, or as a unifying format.

BrenBarn · July 30, 2023, 4:58am

(post deleted by author)

maximlt · July 30, 2023, 8:36am

You already know more than I do! The only thing I can add is that I’ve found this presentation from 2015 when they explain how this was meant to be used:

Working with notebooks

$ conda create -n project
$ conda install -y bokeh pandas jupyter
$ ipython notebook iris.ipynb
$ conda env attach -n iris iris.ipynb
$ anaconda notebook upload iris.ipynb


Reusing your notebook

$ anaconda notebook download malev/iris
$ conda env create iris.ipynb
$ source activate iris
$ ipython notebook iris.

facundo · July 30, 2023, 9:30am

For the record, you already can do the following with fades:

Define which import needs a package install by commenting it

#!/usr/bin/env fades

import math
import requests  # fades

...

Specify it in the script’s docstring:

#!/usr/bin/env fades

"""Super fun script.

The following deps will be handled by fades:

    requests
"""

import math
import requests

...

Or just specify it at run time, but it’s not that fun:

$ fades -d requests myscript.py

In any of these cases fades will create a virtualenv with just requests installed (if not there already to re-use it) and run the script in the context of that virtualenv.

pf_moore · July 30, 2023, 10:23am

Do you know whether fades would be willing to support this PEP if it were accepted? If not, is there a particular reason why?

pradyunsg · July 30, 2023, 1:44pm

I doubt, since https://github.com/PyAr/fades/ hasn’t had changes since this commit in Mar 2022, at the time of me writing this comment.

ofek · July 30, 2023, 4:00pm

I don’t have much time to comment on this but I both acknowledge this as a valid use case and also am a soft -1

I think the solutions I see are as follows, in order of preference:

Create the concept of a “script directory” that would require a single pyproject.toml where the stem of every script corresponds to a key in optional-dependencies and tools would manage the dependencies for a given script’s environment based on the value of that key
Wait for this to be standardized in which case package indices can serve an API for a reverse lookup (sorted by most downloaded)

BrenBarn · July 30, 2023, 6:46pm

This discussion has grown quite a bit! Several people have said things I agree with and some have said things I half-agree with so I’m just going to try to navigate a few of those. I guess the short version though is this: I think maybe what we need is less a discussion of this PEP and more a discussion of “what do people overall want in terms of running single-file scripts and what is the best way to achieve that”?

H. Vetinari:

So I do not buy the “single file” requirement, or at least not that it trumps all of the above. This does not mean that I’m dismissing your use-case, but I do believe the same result (having a script without too much structure or ceremony & a reasonable way to specify its dependencies) could be achieved differently, for example by:
scratch/
  - my_fancy_script.py
  - my_fancy_script.requirements.toml
  - ye_old_workhorse.py
  - ye_old_workhorse.requirements.toml
  - [...]

I suggested something like this earlier in the thread, although I can understand if people didn’t see that because I buried it in a long rambly post about my own approach to this problem. But yes. . .

. . . I do think it means we should be careful not to assume that because the script is one file, the script and dependency information must also be one file. In other words, there’s a difference between “a single file” and “a single runnable file (with an accompanying non-runnable dependency file)”.

That’s appreciated. Like I said a bit earlier in the thread, I agree that single-file scripts are a valid Python use case and I agree they’re not well-supported by current packaging/dependency/environment standards. But I just think we should be a bit careful. Just because that is a need that could usefully be met doesn’t automatically mean this proposal is the best way to meet it.^[1]

In the PEP you have several rejected alternatives. You have good rationales for rejecting them, but they’re essentially judgment calls. On some of those points I would make those calls differently or at least want to consider them more before making them.^[2] And the way I read some of the comments in this thread is basically other people saying they would make some calls differently too^[3]. So I think for some of these it’s not just a matter of saying “see the rejected alternatives section”; the question is whether you rejected an alternative that maybe should actually be accepted instead.

As I see it this is kind of trying to have your cake and eat it too. The PEP is just abstractly about a format for in-script dependencies, and in and of itself has (or should have) nothing to say about backwards compatibility concerns of pip-run or pipx or any other third-party tools. Those tools or any others can use this format, or can use some nonstandard format (just as they’re doing now). If something like the alternative proposal @h-vetinari suggested were to be approved, well, pipx and pip-run and maybe some other third-party tools would not be compliant with it, and they could fix that, or not, but that’s neither here nor there with regard to what the standard is. Maybe the ideal path (or at least a good path) is that some discussion happens and people go “okay yeah actually a slightly different version of this would be better”, and then pipx and pip-run implement that alternative, and it works great, and then we can take another stab at codifying that in a PEP.

With regard to the packaging survey and users’ thoughts on the profusion of tools, I’m a bit ambivalent. As was discussed at length on other packaging threads, part of the problem there is that talking about standards doesn’t magically cause work to be done or tools to be improved. A lot of what users want is tools to do things; standards about how things should be done impact the typical user only indirectly, insofar as someone actually implements those standards.

So, on the one hand, that means this proposal will probably have limited effect on users’ confusion, because it’s just codifying behavior that already exists. The main risk is that having this mechanism blessed with a PEP will increase the number of tools in this space, but they will start to diverge in various ways and users will have a tough time choosing between them. That said, I do think the PEP amplifies this concern by alluding to the possibility that other tools could deviate from the given format (e.g., “accept anything that pip accepts”). That seems to really open the door to potential confusion. If we’re going to specify a format, let’s specify a format.^[4]

On the other hand, though, that’s sort of why I see less upside for this PEP, or even some downside. As I understand it, pipx and pip-run already do this (or soon will). So this PEP doesn’t give users anything they don’t already have in those tools. Is there a worry that, without this PEP, other tools will start to do it in slightly different ways, and that will lead to an increase in user confusion? But if, as I mentioned above, some of the rejected alternatives might actually be better, wouldn’t it actually be bad if we proscribed those other ways of doing things?

In my view (and I realize that not everyone agrees with me on this. ) the real way that interoperability standards can reduce user confusion is when there is a thicket of alternative ways of doing things, but it becomes clear that some of those are just differently colored bikesheds, or some are better than others, and then a standard can come in and say “a bunch of people did a bunch of stuff, but we’ve now decided this is the official way”. I’m a little leery about approving a PEP like this where I see stuff in the rejected alternatives that sounds (at least potentially) better to me.

I agree! I’m not saying we need a full-court-press user survey for every single thing. In fact I think by the time a PEP is proposed, broad-based UX research may not be the right thing; there can a tendency for people to support a proposal because it claims to meet a certain need, and only later realize they don’t like the way it tries to do so.

But rather, like I said at the beginning of this post, what I think is beneficial is more discussion at an earlier stage to find out what users’ goals are and what the obstacles are to those goals, and then that can inform a more specific proposal. Sort of seeking “feed forward” rather than feedback.

More generally, though, like I said repeatedly on some of the other threads, my own take on the sentiment expressed in the user survey^[5] is not so much “there are too many confusing standards” but “Python does not come with an included battery that does everything I want packaging to do in a coherent manner”. It is because of that that users are cast adrift and must navigate through a sea of alternatives. This PEP is really neither here nor there with respect to that user sentiment (if it is a common user sentiment, as I believe) — because users who believe that won’t care if there are two or three or a thousand tools implementing this, they will just say “if this behavior is so useful and great, why doesn’t it come with Python?”

To summarize this long post:

Being able to run single-file scripts without having to engage the full Python packaging mechanism is a real need. But the PEP makes judgment calls about how to do that, and I’m not sure all of those are the right ones, and at the least I think we need a fuller discussion of them.
More user input is good, but I think it’s better to get that input before getting to the point of making a specific proposal like a PEP. This makes it more likely that the proposal will actually make users feel like they got what they wanted.
In my view, a big part of the problem users have with Python packaging is not with standards nor even with tools, but specifically with the default tools that come with Python. So I don’t see the packaging survey as saying too terribly much about this specific proposal; the question is whether the proposal will make it into official included Python batteries.

(Sorry by the way for my earlier empty post. I accidentally hit reply way too early on this post, then deleted it, but then couldn’t post the real one because the thread is in “slow mode”.)

Likewise, I don’t think I’m the only one who thought there was some genuine need that PEP 582 was trying to meet, and I hope I’m not the only one who agreed that that particular proposal wasn’t the best way to meet it. ↩︎
As a concrete example, it’s not clear to me that that it’s better to have this ad-hoc format instead of something like TOML-in-comments ↩︎
although that’s not saying they’d make the same alternative calls I would ↩︎
That makes it all the more important that we specify the right format. ↩︎
and I acknowledge nothing in the survey results said this in so many words ↩︎

DavidCEllis · July 30, 2023, 7:42pm

H. Vetinari:

So I do not buy the “single file” requirement, or at least not that it trumps all of the above. This does not mean that I’m dismissing your use-case, but I do believe the same result (having a script without too much structure or ceremony & a reasonable way to specify its dependencies) could be achieved differently, for example by:
scratch/
  - my_fancy_script.py
  - my_fancy_script.requirements.toml
  - ye_old_workhorse.py
  - ye_old_workhorse.requirements.toml
  - [...]

To me the single file requirement is the entire point. There are already ways to define dependencies once we have a multiple file project, but there is no standard way to define them inside a single file script.

If you come up with another method to define dependencies outside the .py file then anyone who actually needed a solution for the single file script use case will continue to come up with other ways to define requirements, or avoid using third party packages entirely because that ‘solution’ ignored the issue it was supposed to solve.

–

Separately to this, I’m a little disappointed that specifying version of Python is considered out of scope as moving machines and finding out the version installed is missing something I’ve used after wasting time installing dependencies has definitely bitten me before. At the very least I’d like to see a stronger, specific argument as to why it’s considered ‘out of scope’.

fungi · July 30, 2023, 8:01pm

Put another way, people already will (and do) track packaged
dependencies of their scripts inside their scripts. Currently there
is no standardized/blessed syntax for doing it, so everyone who does
it does so differently. Saying the solution to that is not to put
dependency tracking in the script won’t stop people from doing that,
it will just mean that they continue to do so in a fragmented and
uncoordinated way because nobody wants to provide an interoperable
specification for tools to gravitate toward.

If the resulting state of PEP 722 is “script dependencies go in
another file that’s not the script” then I and others will just
ignore its existence and continue to track our dependencies inside
the scripts that need them, so the single file requirement really is
central to the use case, full stop.

sinoroc · July 30, 2023, 8:41pm

It can probably be added via a later PEP.

My take on this this is that the tools in the PyPA-centric packaging ecosystem are not really ready for this kind of feature yet, although there are some new things appearing in this domains (pip --python, posy, the py launchers, and so on). On the other hand it is a feature central to the conda ecosystem.