PEP 722: Dependency specification for single-file scripts

Well, the proposed PEP explicitly says

Of course, declaring your dependencies isn’t sufficient by itself. You need to install them (probably in some sort of virtual environment) so that they are available when the script runs.

And, indeed, that is the way both pip-run and pipx would handle the situation as far as I understand…

Sure, but it doesn’t mandate the use of venvs, so I don’t understand
the point to your rant about venvs. Just because the author of the
proposal, and the authors of the tools already doing something along
these lines, use venvs that doesn’t mean you have to use a venv to
get some value out of the proposal. I get that you don’t like venvs,
so… just don’t use them? And accept that there are others who do
find them useful as actual solutions to broader problems than you’ve
encountered (or perhaps ever will encounter) rather than derisively
declaring them a “non-solution” or pretending the problems some of
us deal with are nonexistent.

I could see, for example, using a tool which checks your preferred
environment for the presence of the packages specified in a PEP 722
compliant comment block in a script before running that script, and
either warning you that you’re missing dependencies (by telling you
what exact packages you want) or simply installing them for you,
whatever your comfort level. That could work equally well in a
non-isolated shared environment or with a persistent venv (and the
latter is how I would use it because I need the additional isolation in
many cases). It’s something I find useful and already do because I
need to rebuild my environments semi-regularly, so welcome the
opportunity for an interoperable specification around it.
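
A minimal sketch of that kind of checker, assuming the dependency block looks roughly like the “Script Dependencies:” comment block from the PEP draft; the helper names and the simplified parsing rules here are illustrative, not anything normative or from pip-run/pipx:

# Sketch only: read a "Script Dependencies:"-style comment block from a
# script and report which requirements are missing from the current
# environment.
import sys
from importlib.metadata import PackageNotFoundError, version

from packaging.requirements import Requirement  # third-party "packaging" project

def script_dependencies(path):
    """Collect requirement strings from a leading comment block."""
    deps, in_block = [], False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.startswith("#"):
                break
            text = line.lstrip("#").strip()
            if not in_block:
                in_block = text.lower() == "script dependencies:"
            elif text:
                deps.append(text)
            else:
                break
    return deps

def unsatisfied(deps):
    """Yield requirements that are absent or too old in this environment."""
    for raw in deps:
        req = Requirement(raw)
        try:
            installed = version(req.name)
        except PackageNotFoundError:
            yield raw
            continue
        if not req.specifier.contains(installed, prereleases=True):
            yield raw

if __name__ == "__main__":
    for dep in unsatisfied(script_dependencies(sys.argv[1])):
        print("missing or unsatisfied:", dep)

From there it is a small step to either warning (as above) or handing the missing items to an installer, whichever matches your comfort level.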

1 Like

Of course, and if my “rant” (which I admit is my own word for what I wrote!) implied otherwise, I certainly should not have done so. And a tool which just ensured that my environment (virtual or otherwise!) had the correct packages installed would indeed be useful!

I could say more about virtual environments but that is not appropriate for this thread.

1 Like

To be clear, those are not PoC implementations of the PEP. They are existing tools that have implemented a “run-with-dependencies” operation for scripts, because there was demand for that feature. They will continue to exist whether or not this PEP is accepted.

All the PEP does is document, in a common place, the format that both tools use for extracting dependencies. It also tries to make the format useful should people write other tools that need this data, and it will almost certainly make slight changes to details of the format based on feedback here (which pip-run and pipx will probably implement, because following standards is a good thing, but they don’t have to, of course).

Think of pip-run and pipx more as existing use cases for this PEP, and it might make more sense.

3 Likes

But this expansion of covered use-cases (great!) should not make the situation worse for everything else, and the packaging survey could not have been clearer about reducing the number of divergent tools in packaging, so adding yet another way to specify dependencies understandably meets resistance – and it’s on the PEP to prove its necessity.

It’s also a really ugly way: magic comments. They break syntax highlighting and much automated tooling (because two different syntaxes have to be parsed in the same file), and are prone to diverging in semantics from requirements.txt / pyproject.toml / poetry lock files, etc.

So I do not buy the “single file” requirement, or at least not that it trumps all of the above. This does not mean that I’m dismissing your use-case, but I do believe the same result (having a script without too much structure or ceremony & a reasonable way to specify its dependencies) could be achieved differently, for example by:

scratch/
  - my_fancy_script.py
  - my_fancy_script.requirements.toml
  - ye_old_workhorse.py
  - ye_old_workhorse.requirements.toml
  - [...]

That would still give a clear approach, without much overhead: Start hacking away in xyz.py, and once you need third-party dependencies, add xyz.requirements.toml (a rough sketch of how a tool might consume such a file follows after this list). The actual suffix for that is bikeshed central, but that way we could:

  • reuse some existing infrastructure (e.g. a reduced form of pyproject.toml), rather than having yet another way for dependency specification.
  • let users who want to “graduate” their script for some reason just rename that file and add some extra metadata to make it a full-fledged project.
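
To make the idea concrete, here is a rough sketch of the runner side, assuming a reduced pyproject-style [project] table inside the sidecar file; the file name and table layout are just this strawman, not an existing standard (tomllib needs Python 3.11+):

# Hypothetical: look for a sibling "<script>.requirements.toml" and read a
# reduced pyproject-style table from it.
import sys
import tomllib
from pathlib import Path

script = Path(sys.argv[1])
sidecar = script.with_suffix(".requirements.toml")

dependencies = []
if sidecar.exists():
    with sidecar.open("rb") as f:
        data = tomllib.load(f)
    dependencies = data.get("project", {}).get("dependencies", [])

print("would install:", dependencies)
# ...then create or reuse an environment, install, and run the script.
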
7 Likes

True. My point was more that as you say, UX research is costly, and it’s not always easy either to ensure the data is representative and unbiased, or to interpret the results accurately. And while the participants in this discussion are certainly self-selected, I don’t imagine that the sort of user research we could reasonably undertake would avoid at least some level of self-selection. Even the “big” user survey that @jezdez has referred to involves a certain amount of self-selection, even if it’s only selecting “people willing to take a survey” (which is likely to be biased towards “people who have a point they want to make”).

Eliminating such bias is a complex, specialist task. I have some background in statistics, so I know enough to know I don’t know how to do it properly, but that’s all :wink:

User research is absolutely a good thing, and we should do more of it. But it’s not a way of avoiding having to make choices based on our experience and knowledge. And sometimes choosing what (in our view) is right over what’s popular.

2 Likes

This PEP adds literally no new tools, and no new data formats. All it does is make one existing format (used by two existing tools) into a standard, so that if we (for example) later replace those two tools with a single new one (reducing number of tools?) then users don’t have to change their code (reducing churn for users).

That’s a fair criticism. I’m open to other suggestions. But many other languages use the “structured comments” approach, so it seems like it isn’t so bad in practice.

… and we’re back here again. How many people stating on this thread that they have a requirement for being able to declare dependencies in a single-file Python script are needed to demonstrate that this is a real-world use case?

OK. Maybe that would work. My gut instinct is that it would be something I’d use reluctantly, and be frustrated by various “papercut-level” annoyances. But I don’t want to reject a reasonable proposal just because it’s not my favourite. Also, none of the other languages mentioned in the survey of languages linked above use a separate file[1], so it feels like it’s going against common practice. Do you have examples of other languages using this approach that you can point to?

If you’re serious about this suggestion, are you willing to get it added to pip-run and pipx? What’s the transition plan from the existing behaviour to this proposal? There’s a whole “backward compatibility” section of the PEP that will need writing if we go down this route.


  1. Yes, I concede that’s at least partly because the survey is of single-file solutions. ↩︎

4 Likes

You snipped my statement in a somewhat unflattering way; I do accept the use-case. Luckily, your PEP is named “dependency specification for single-file scripts”, which I have no problems with as a requirement. My point was that the dependencies do not have to be in the same file to achieve that.

My response was aimed at pointing out the potential solution space between “single-file script” and “single-file script+requirements”, and that it’s possible to support the former in a way that doesn’t (a priori) create yet more UX & teachability problems.

I do care about python packaging (and not increasing divergence further), but between 2 jobs, my FOSS “responsibilities”, and a sliver of social life, I don’t have time to write, much less implement, a PEP, sorry.

1 Like

I know this is addressed and currently rejected in the PEP, but something like __dependencies__ with a restricted syntax (only string literals, for instance) could be a simple solution that doesn’t require a complete parser.
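
For instance (purely illustrative, not a worked-out proposal), a script could carry __dependencies__ = ["requests", "rich>=13"] near the top, and a tool could read it statically with the ast module, never executing the script:

# Sketch: extract a module-level __dependencies__ list of string literals
# without importing the file.
import ast

def read_dunder_dependencies(source: str) -> list[str]:
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.List)
                and any(isinstance(t, ast.Name) and t.id == "__dependencies__"
                        for t in node.targets)):
            # Restricted syntax: only a literal list of string constants.
            return [ast.literal_eval(elt) for elt in node.value.elts]
    return []

print(read_dunder_dependencies('__dependencies__ = ["requests", "rich>=13"]'))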

1 Like

Sorry, you’re right. My posts are getting so long I’m trying to keep my quoting limited, I went too far in this case.

No worries. I’m not trying to say “put up or shut up” or anything like that. But equally, I don’t have the energy to take your suggestion further (I foresee a number of problematic areas that will trigger even more rounds of debate, such as “we can’t standardise the requirements format, and yet we can’t call it a requirements file if it’s not one”). So unless someone wants to pick this up, I’ll put it in the “rejected ideas” with my concerns recorded. I hope that’s OK.

I’m not sure what you want me to say here. Unless you address the issues mentioned in the PEP, I don’t see what you’re suggesting… Even though it’s not stated explicitly, the example syntax in the PEP is restricted, because it has to be something that can be evaluated statically. That’s the point of the 4th problem in the list given. If you want to pursue this, please give a specific proposal.

3 Likes

I do not have anything to add towards a resolution, but this does not really address the existing user frustration, which for me boils down to:

There are N different official ways to do K different things in the Python packaging space.

Your proposal still raises that to:

There are N+1 different official ways to do K+1 different things in the Python packaging space.

1 Like

Thinking outside the box a bit just to see what else we could have (or why nothing else quite fits)

In https://www.pantsbuild.org/ we handle these things by mapping imports back to requirements (Record the top-level names of a wheel in `METADATA`? is kinda relevant in a way… but the other direction). Most package names map to their module names, and then for those that don’t, one big mapping (which users can extend) is the backup.

So what if these tools tried a similar approach? Scrape the imports [1], which gives you module names, then ask for those packages (probably asking some server for module → package). That should work for many cases. I think we’d miss out on optional dependencies and other, less-scrapable dependencies (like using strings with __import__).
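
As a very rough sketch of the scraping half (the override table and stdlib filtering below are simplified stand-ins, not how Pants actually does it):

# Collect top-level imported names with ast, drop stdlib modules, and map
# the rest to distribution names via a small, user-extendable override table.
import ast
import sys

MODULE_TO_PROJECT = {"PIL": "pillow", "yaml": "PyYAML"}  # illustrative overrides

def guess_requirements(source: str) -> set[str]:
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    third_party = modules - set(sys.stdlib_module_names)  # Python 3.10+
    return {MODULE_TO_PROJECT.get(m, m) for m in third_party}

print(guess_requirements("import os\nimport PIL\nfrom requests import get"))
# -> {'pillow', 'requests'}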

Optional dependencies could be handled with a PEP to allow imports with brackets, e.g. import requests["toml"]. I very much expect that to be rejected, however. Alternatively, import the extra, and just don’t use it (meh, but ick).

So, if you wanted to solve it for everyone, at some point you need to parse extra info that isn’t just imports (a la __requires__ from pip-run).


… So, it’s a shame that the 80% case (imports and packages align 1:1) is poisoned by the 20% case, and we can’t get this in a nice, structured way. Parsing imports has some nice benefits (remove an import, and you don’t need to remember to remove it from the Requirements block; no new thing to muddy packaging waters).


  1. And import parsing can be done easily through ast or efficiently through tree-sitter+Rust (what we do in Pants). ↩︎

1 Like

I don’t mind comment-based configuration; it’s done everywhere already, like documentation. Another option, clumsy but one that would please purists, is to have an embedded toml data record up top in the single file.

1 Like

I don’t know if this is a fair assessment in this particular discussion.

The first thing is that I would not be so confident in saying that the problem the PEP is trying to solve lies in the portion of the “Python packaging space” that people have been complaining about. Sure, it involves installing distributions, but it is not related to the process of “packaging” a project into a distribution format that can be shared (which seems to be the point that troubles most people).

As stated previously in this discussion, the PEP focuses on solving the problem of executing domestic/bespoke/personal scripts and alleviating the pain of manually managing virtual environments.
Would we make Python better if we simply refused to solve this problem? For me the answer is no, and since different problems require different solutions, it is also natural that we have different ways of specifying different things (it is not like you can use an automatic can seamer to open a can).

The second thing is that the PEP is informational and only documents practices that are already implemented and available in the ecosystem. If anything, the existence of the PEP will be an incentive for not “reinventing the wheel” (unintended pun) next time a tool developer decides to tackle this particular pain point (which is a real pain point for many devs that chipped in this thread).

8 Likes

As far as I understood the notation expected is the one from “Dependency specifiers” standard specification (first defined in PEP 508). So yes, something like numpy==1.25.1 should be allowed.

4 Likes

See also: Sketchy and maybe crazy alternative to PEP 722

I’m wondering if the differing view/understanding around this for @jezdez stems from this “better bash” scenario compared to the “single file to distribution” scenario that has also been discussed? In the “bash script, but better” scenario, having to take a simple script and compile it into an executable in the end becomes development overhead for something you were probably hoping wouldn’t take more than 10 minutes in total.

The distribution scenario, though, I don’t view it as necessarily the key motivator. I would imagine the sharing aspect of this is between machines you control (e.g., I’m setting up a new machine and I have a couple of helper scripts I use on occasion), or sharing something with a friend (e.g., someone asked how to accomplish something and it’s faster for me to write them a script than explain what they are after). I personally don’t see this as a solution for anything where multiple files would have made sense to begin with (e.g. some kid wrote a game that had graphics stored in some image files).

The papercut that comes to my mind with this suggestion is leaving out the accompanying *.requirements.toml file by accident if you moved the .py file. Right now your only option for moving a project is to move an entire directory which implicitly captures everything. The PEP allows for a simple case of moving a single file. This proposal requires remembering to either move two files or use some * globbing. Either way you can’t just go with a tab-completed command in your terminal to move files.

I will say I used to do that back in the day, but then I got bit too many times by projects which had clashing dependency requirements. It also inherently ties the script to your machine. This also assumes your Python install is not your system Python install and you won’t accidentally break your OS with your dependencies.

This is similar to the “N+1 ways to do things” argument with a similar answer: this isn’t introducing any new tools, just either standardizing what tools are already doing or empowering tools to not reinvent some solution for a use case that appears to exist for folks and those tools.

As someone who will probably have to implement this PEP, my answer is “no way”. Anything that looks like Python means people will inevitably treat it as such and expect Python’s syntax to work, no matter how restrictive you meant to make it. Add on to the fact that unless you define the fully supported grammar and thus require a parser, people will implement it differently which will lead to incompatibility.

But this really can’t be a “many cases” thing; it needs to be an “all cases” thing. This also doesn’t cover the version or marker restrictions you may want to put on your requirements. It also requires something that can parse Python import statements to get the top-level package names (which isn’t too bad; I have written such a regex, although it eats into perf a bit if you were to have to run it over a very large file, which this use case is not exactly aimed at). I think what would need to be seen to consider this is examples of:

  1. The simple case; package name maps to project name and there’s no restrictions.
  2. The Pillow case; how do you map import PIL to installing Pillow?
  3. Restricted install; e.g., I only want to install packaging>=23.1.
  4. Worst case; project name that doesn’t match the import name and has a restriction, e.g. pillow>=10.0.0.
  5. Namespace package case; a single import that requires multiple dependencies to resolve.

I’m assuming the 2nd case would also handle the situation of multiple projects installing the same name. And then there is the question of what the expected algorithm is for resolving all of this to get the actual list of dependencies to install.

I’m not suggesting this couldn’t somehow work, but you do need to solve all of these situations and I don’t see how you don’t end up needing some special comment marker to go along with the imports to resolve these situations. E.g. a strawman that covers all of this is:

# ... stdlib imports

# Dependencies:
import trove_classifiers
import PIL  # requires: pillow
import packaging # requires: >=23.1
from azure import identity, synapse  # require: azure-identity, azure-synapse-artifacts

# ... local imports

But, for instance, how do you handle multi-line imports? Can that # require: show up on any line, only the first line, or only the last line? Is the lack of # requires: for the simple case too cute and not worth it? Is the opening # Dependencies: marker useful for simpler, faster parsing as well as making the simple case work as shown, or would requiring # requires: everywhere so the initial marker could be dropped be better? Is that multiple requirements bit not worth it and you should just have to write out your imports on separate lines? Is leaving the name off in that packaging example too cute/fancy? Do you support local imports as well if you drop the opening marker (which increases the parsing cost even more)?

I think the real question is what do people find more readable: this or what the PEP proposes?

8 Likes

FWIW, I agree – what you’ve described with better words than mine is why I don’t think doing UX research work should be a blocker, but it is certainly useful for guiding effort[1]. :slight_smile:


  1. To the extent we can guide efforts for a group of volunteers today anyway. ↩︎

1 Like

To me this argument (“not introducing any new tools”) does not hold water. The new format needs to be parsed and handled correctly, which – more likely than not – will come with a reference library or tool to do so.

But even if there’s no reference, introducing a new format (that ~everyone needs to implement) is more impactful than a new tool (that no-one is forced to use). Has anyone asked IDE and editor authors how easy it would be for them to support PEP 722? Their users surely will be asking for it. I know of IDEs that still don’t have syntax highlighting for f-strings, for example.

Is that papercut a good enough reason not to reuse existing infrastructure / formats / concepts, and drastically cut down on the implementation complexity of this PEP? I find that a hard sell.

4 Likes

I like the idea that requirements used by packaging tools are always in the same kind of place (a suitably named .toml file) with the same sort of format. I also like what @jeanas suggested, which I understood as basically “expand this to do what pyproject.toml can do”. I dislike the idea that every toolchain now potentially has to be aware of another specification (even if it’s just describing something that a couple tools already came up with) and parse magic comments. However, I like the idea of being able to keep the information in one file purely for distribution, for a single-file project.

I think I have a way to harmonize all of that.

  1. Come to a consensus that we do, in fact, want pyproject.toml to be used for projects that shouldn’t generate a wheel.
  2. Provide, with Python, one simple standardized script in the Tools dir that parses a source file for a single comment block containing text in the pyproject.toml format (it can afford to use quite naive/simple detection, I think), populates any missing required keys with sensible defaults (e.g. taking a project name from the file name and setting version to 0.0.1), and writes that file.

Now, when end users receive a single-file script, they can “install” it and its dependencies by simply running the toml-splitting script and then using their favourite tooling to install dependencies based on the now-existing pyproject.toml. When developers start a one-file project, they can just start typing pyproject.toml contents into the .py source. If it remains a one-file project, they can just distribute that file through GitHub, file sharing networks, social media etc. If the project later becomes more complex, the developer can use the toml-splitting script to create an initial pyproject.toml and go from there. Nobody has to be aware of a new standard or do any implementation work; it’s just a matter of documenting the toml-splitting script.
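
A naive sketch of what such a helper could look like; the leading-comment-block convention, the defaults, and the names below are placeholders for whatever the Tools script would actually standardize:

# Pull a leading comment block out of a single-file script, treat its body as
# pyproject.toml text, prepend a default [project] name/version if none is
# present, and write pyproject.toml next to the script.
import sys
from pathlib import Path

def split_pyproject(script: Path) -> Path:
    block = []
    for line in script.read_text(encoding="utf-8").splitlines():
        if line.startswith("#!"):
            continue  # skip a shebang line
        if line.startswith("#"):
            block.append(line.lstrip("#").strip())
        elif block:
            break
    text = "\n".join(block)
    if "[project]" not in text:  # very naive "fill in missing keys" handling
        text = f'[project]\nname = "{script.stem}"\nversion = "0.0.1"\n' + text
    out = script.parent / "pyproject.toml"
    out.write_text(text + "\n", encoding="utf-8")
    return out

if __name__ == "__main__":
    print("wrote", split_pyproject(Path(sys.argv[1])))
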

1 Like