PEP 722: Dependency specification for single-file scripts

https://packaging.python.org/en/latest/specifications/declaring-project-metadata/

Hi all, as someone who has been learning a ton about how packaging works in Python (and reading a lot of these long threads) in recent months, I’m very excited to see one of these discussions happening live and hoped to chime in with my experience.

As a starting point, I really like the idea of PEP 722 and would desperately like to see it adopted.

I’ve also been a strong convert to Hatch (thank you Ofek!) and use that exclusively for all of my own projects now, but see this use-case as completely unrelated to what I use (and am actively teaching a lot of people to use) Hatch for.

I work at a manufacturing company where I’m the technical lead for a small team doing data analysis for manufacturing lines in Python and presenting a lot of those findings in a custom-built web app (using Django), so I have decent exposure to both the data science and web dev sides. Most of the people on my team do not have a software background at all; they are instead primarily manufacturing engineers coming from plants who are learning Python on the fly.

Also, I’m currently in a master’s program for data science that is pretty heavily targeted at a similar demographic (people who have an undergraduate engineering degree, worked in an industrial or business setting, and now want to pivot into data science/analysis). Again, a lot of smart people coming in who often only have a basic familiarity with Python from a course or two in undergrad maybe a decade ago, but who are now having to learn it again (while also often having to learn R for a different class at the same time).

I think a lot of people here are vastly overestimating the understanding of beginning users, even people with backgrounds in technical fields like engineering.

Most of the people I work with have no idea what a shebang line is, what the $PATH variable is, what virtual environments are, what TOML is, etc.

Almost all of them have a folder (usually named “Python Scripts”) full of random scripts that each do one simple thing, because bash/PowerShell/etc. are much, much, much harder to learn than Python, and the primary way the scripts are shared with other people is by email/Teams/Slack (the actual file if I’m lucky, but often just the entire contents of the file copied and pasted into a message).

Plenty of people who have asked me for help with something in Python could not tell me how they even installed Python in the first place and have no idea how to manage dependencies. No one has the slightest idea how to properly package up and distribute a script as an installable wheel with entry points defined.

Part of what has been confusing for a lot of the people I work with, and why packaging seems complex and not user-friendly, is that there is essentially no on-ramp between:

“Here’s a single-file script using only the standard library that you can run with python my_script.py”
and
“You need to have a project directory with this special file in a format you’ve never heard of before, preferably with a nested src layout that you don’t see the point of, and then your code needs to be built into another format you’ve never heard of before, and then to use it you need to create a virtual environment following these steps and remember to activate that environment each time you want to use it”

The simpler the solution (and I don’t think it could get much simpler than the current draft of PEP 722), the better it will be for all of the people I know.

The opening example in the PEP is just about perfect; my only suggestion would be to use a more “magical”-looking set of beginning characters, maybe #% to somewhat copy the Jupyter world.

# In order to run, this script needs the following 3rd party libraries
#
## Script Dependencies:
##    requests
##    rich

It reads like plain English, it’s incredibly simple to parse, and even if you don’t know that there are special tools that could run this script for you, it’s not a huge leap to look at it and say:
“I know about pip, I should probably do pip install requests and pip install rich”
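
As a rough sketch of how little machinery that parsing needs (this is only an illustration based on the example above, not the PEP’s reference rules, and the exact header/prefix handling in the final text may differ):

    def read_script_dependencies(path):
        """Collect requirement lines from a 'Script Dependencies:' comment block."""
        deps = []
        in_block = False
        with open(path, encoding="utf-8") as f:
            for line in f:
                stripped = line.strip()
                if not stripped.startswith("#"):
                    if in_block:
                        break  # the block ends at the first non-comment line
                    continue
                content = stripped.lstrip("#").strip()
                if not in_block:
                    if content.lower().startswith("script dependencies:"):
                        in_block = True
                elif content:
                    deps.append(content)
                else:
                    break  # a blank comment line also ends the block
        return deps

    # read_script_dependencies("my_script.py") -> ["requests", "rich"]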

I don’t think anything using TOML is going to be beginner-user friendly, since all of the beginners I’ve worked with do not even know what TOML is.

18 Likes

This is true, but do they need to know what TOML is to understand the equally-magical formatting in e.g. this post? It seems like there’s a balance to strike between “human-readable format” and “clearly a format, not just prose”. In my experience[1], it just takes a while for people to understand that exact formatting is important.

It feels like the discussion has bounced back and forth between a desire to standardize and a desire for experimentation. If the tools haven’t been written and the best practices are unknown, why codify the format in a PEP? It just seems like a constraint on future experimentation, even just for yourself. [2]

It makes sense to figure out a standard format to allow multiple tools that do different things with the same info. But when those tools don’t exist and don’t even have an author planning to create them, there aren’t enough use-cases or feature requests to figure out what the right format should be.

I totally get the desire to standardize early, before an ecosystem of different things develop and standardization gets harder, but maybe having pipx out there for a while is necessary to clarify what people want.

FWIW tomli parses all of these correctly and consistently, even the pathological case, provided there isn’t trailing whitespace (and there shouldn’t be).


  1. similar to yours, in that I work with a lot of Python novices ↩︎

  2. and if no one else actually bothers to write a tool for this, then hey, whatever you implement is the standard. :sweat_smile: ↩︎

1 Like

Repeating my earlier comment since it was entirely ignored.

We don’t need all the capabilities (and complexity) of pyproject.toml (or even just the standard project metadata fields), when it’s not clear that anything beyond dependencies (and maybe requires-python) would make sense for quick scripts that will not be distributed on PyPI or installed as a dependency.

If pyproject.toml can be embedded in a single-file script, then all tools that read configuration might now get requests to read their configuration from a script instead of a separate file. If only dependencies can be embedded in scripts, then tools that run scripts will get requests to support that, but otherwise the current configuration method remains the same. By the time you need all the features of pyproject.toml, you may as well write a pyproject.toml next to the script file, but that’s not needed in the case being described here.

11 Likes

If pyproject.toml had the feature of building multiple different binaries from the same directory, that would help, but there’s no such thing. I would prefer that as the solution rather than embedded toml.

Basically, because Brett expressed an interest in the format being standardised, so that VS Code could support it in some sort of “add a dependency to this script” action. So there is an interest from other tools in supporting scripts that declare their own dependencies, but only if it’s standardised (which is something I can understand, pip has a similar policy).

Not really. It’s come up a few times, but people seem to keep missing this point - the target audience is people writing scripts where the important point is explicitly that there’s no need to have a “build” step, they are just “write and run”.

5 Likes

FYI I plan to add some form of support to the Python Launcher for Unix for whichever PEP gets accepted (probably via a py run subcommand, once subcommands even exist and I have had time to verify my in-development py pip subcommand works appropriately). As for the Windows launcher, the tricky bit would probably be how to do the installation for such an official tool (i.e., do you assume pip is installed?). I can cheat with the Unix version since I get to do what I want (which is probably download and cache pip.pyz like I am doing with py pip). :grin:

To be clear, Jupyter doesn’t have some special support for #%, correct? I know about magic commands via %% and so I assume you’re suggesting a take on that and not something already supported?

1 Like

I will only respond to this because I think it actually condenses the core of the issue under discussion: where we want to be in 10 years, what world we’re trying to build.

In my ideal packaging world, there is one tool that

  • most people use, beginners and advanced users alike (but it will never be used by everybody, and that’s fine)
  • can run scripts
  • can run REPLs
  • can install Python and manage Python versions
  • can manage projects
  • can install projects
  • can manage lock files
  • etc.
  • reads one data format (TOML, PEP 621) for all metadata

(That is why I’m fond of Rye, which, in spite of being “yet another tool”, is the first tool to date, AFAIK, to handle the “can install Python and manage Python versions” bullet.)

(And in contrast, there can be as many build backends as useful, though preferably one “canonical” build backend for pure-Python projects, e.g., hatchling.)

I think the “too many tools” mantra is well-accepted by now. What are your reasons to think that in an ideal world of packaging that we arrive at in 10 years, there are still different tools for scripts and for projects[1]? How does that allow better UX?

In a footnote, you write “And by the way, the priorities and trade-offs for managing environments for scripts are very different than those for managing project environments.”. Can you explain what you mean by that?


  1. I’ve hesitated to write “scripts and projects” because I view the line between the two as blurrier than the PEP text. ↩︎

4 Likes

I don’t have much time to articulate a proper response but I think this is a very astute observation that I didn’t see myself until you said it.

I’m not at all saying that those who disagree with me are not doing this, but for me personally I am looking at absolutely everything through that decade-or-longer lens, which is why I am putting a high value on cross-tool interoperability/unification and UX. If, as a user, some task is arduous when there could be a simple workflow, or something is not beautiful when it most certainly could be (like IDE syntax highlighting of this new section), I basically de-rank that heavily in my mind.

Perhaps that’s not a good mindset? I don’t know, but that is almost all I care about and I would have to be persuaded to change it.

4 Likes

A post was split to a new topic: My thoughts on the PEP process

GitHub - David-OConnor/pyflow: An installation and dependency system for Python did that quite a while back (at least 2 years ago).

2 Likes

Hey all, I was hoping that slow mode would cause you to reflect on what is on topic here more instead of falling into off topic discussion, but despite that you’re still getting off topic. This is not the place to discuss the PEP process, or what other installers exist or do, or respond to posts that are not about (or directly ignore) what’s stated by the PEP. Provide feedback about this PEP here, not about other things that could be PEPs or meta discussions instead.

4 Likes

A post was split to a new topic: What is the py launcher for Linux?

Magics are defined by the IPython kernel (which is usually, but not always, the kernel being used in a Jupyter notebook). They can either be single-line magic commands with %, or cell-block magics with %%.

Spyder recognizes “code cells” with either #%% or # %% to let you run small chunks of a script in an Interactive mode.
VS Code also recognizes # %% for the same purpose (thanks Brett and team!)

As far as I know though, #% on its own has no meaning anywhere in the Python ecosystem.

My suggestion (and the way I prefer my bikesheds painted) was indeed just to try to be similar to these pre-existing “magic” uses.
It might be best to steer clear of the similarity, though, to avoid any confusion between them.
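
For anyone who hasn’t seen these side by side, a rough illustration (the cell contents here are just placeholders):

    # %%  Spyder and VS Code both treat this comment as the start of a "code cell"
    import statistics
    data = [4, 8, 15, 16]

    # %%  each cell can be run on its own in the interactive window
    print(statistics.mean(data))

    # By contrast, %-magics only exist inside an IPython/Jupyter session, e.g.:
    #   %time statistics.mean(data)    (a line magic)
    #   %%timeit                       (a cell magic, applied to the whole cell)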

There are a lot of tools that do more or less the same thing: managing a project that will get built and distributed as an installable package.

That is not this use case. At all. This use case (which is an incredibly common one) is to have a single executable Python script with its own dependencies. All of the complaints about “too many tools” are not complaining about a lot of tools that do a lot of things. The frustration I see more often than not is that there are a lot of tools that all do one thing, but none of them do this very simple thing.

However, there are plans for supporting this use case at last. pip-run does it, pipx soon will (merged but not released), and there are the plans mentioned about adding support to the VS Code Python extension.

The point of the PEP is to define a simple, easily understood and parsed format for this narrow, but common, use case: writing the “better batch file” that includes a 3rd-party dependency.

Often a lot of these scripts are put into a single directory, and they might have conflicting dependencies, so a single environment won’t work, but they are all individually small enough to not make it worthwhile to make each a full project.

The desire is for each script to be able to stand entirely on its own (and “distribution” is as simple as giving someone a copy of just the .py file). All that is needed for that is a way to declare what dependencies are needed outside of the standard library (and maybe a shebang line if you want to get crazy).
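
As a concrete (made-up) example of what such a stand-alone script could look like, assuming the dependency-block format from the PEP’s opening example:

    #!/usr/bin/env python3
    # Script Dependencies:
    #    rich

    from rich.console import Console
    from rich.table import Table

    table = Table(title="Line throughput (units/hour)")
    table.add_column("Line")
    table.add_column("Throughput", justify="right")
    table.add_row("Line 1", "118")
    table.add_row("Line 2", "97")
    Console().print(table)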

Environment management for a project involves declaring dependencies for the code itself, dependencies for testing, dependencies for building documentation, dependencies for the build environment, additional development tools like linters and type checkers. If the project includes optional dependencies, there may also need to be separate test environments for all of those as well.

If you want a script that can also include all of those features, nothing is stopping you from using the project management system and declaring an entry point for your code.

A single script doesn’t need any of those (and the script author doesn’t even need to be aware of the existence of any of those).



2 Likes

Your reply sounds like I didn’t quite get my point across, so let me use an analogy.

Consider Python itself, as a language. If we assess its complexity, it is huge. It has dozens of statements, complex semantic features like async, hundreds of stdlib modules. Nobody can even know all of it. Hairy stuff, right? Beginners don’t need most of that. An absolute beginner to Python just needs some simple assignments/if/while sort of statements, a few simple types, a few basic functions, and perhaps some turtle graphics. Then how do so many absolute programming beginners manage to use Python? The answer is, because Python does a good job of making simple things simple and hiding complexity from beginners’ eyes.

The packaging tools we have aren’t really like that, famously. They’re more like Java, where you must create a class before you can even print “Hello, World” to the console.

PEP 722 makes an argument, which you repeat above and which is valid, that this is too much complexity for beginners to handle. As a remedy, it proposes adding a second class of tools which would be more like Logo. Only the essentials, only for beginners, no complexity.

That’s a valid proposal, but what I’m saying is that I would rather have the Java-like tools transformed into more Python-like tools that beginners find themselves comfortable with, without creating a split between two types of tools and two metadata minilanguages.

4 Likes

A post was split to a new topic: Who should approve a Packaging PEP?

I think it’s horribly misinterpreted by many people. But that’s offtopic for this discussion so I won’t say more here.

For running scripts in particular. Because running a script is a Python core function. And, the core devs have explicitly stated that they want packaging to be outside the core. So script running, while it may be something packaging tools will offer, is not exclusive to such tools.

Running a script with its dependencies will interact with packaging, but it will still be a core Python function, and therefore will (should) be part of the core feature set. Here, I’m considering the py launcher as core, because I expect that in 10 years it may well be shipped with Python for both Windows and Unix.

Project environments are persistent, and a core and visible part of the environment. They are typically managed explicitly by the developer (in terms of their content and use). They are often stored in the project directory (by choice, not as an implementation detail).

When running a script, an environment[1] may be needed, but it’s hidden, managed transparently, and transient. It may be cached for performance reasons, but if so, management of that cache (expiry of environments, in particular) should be transparent.

Those two sets of requirements are utterly different, and trying to force them to work the same will mean trade-offs to the detriment of both cases.
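
To illustrate the hidden, transient, maybe-cached model, here is a sketch of the idea (this is not how pip-run, pipx, or any other tool actually implements it; the cache location and helper name are made up):

    import hashlib
    import subprocess
    import sys
    from pathlib import Path

    def ensure_script_env(deps, cache_dir=Path.home() / ".cache" / "script-envs"):
        """Create, or reuse, a cached virtual environment keyed on the dependency list."""
        key = hashlib.sha256("\n".join(sorted(deps)).encode()).hexdigest()[:16]
        env_dir = cache_dir / key
        python = env_dir / "bin" / "python"  # Scripts/python.exe on Windows
        if not python.exists():
            subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
            subprocess.run([str(python), "-m", "pip", "install", *deps], check=True)
        return python  # run the script with this interpreter

The user never sees or manages env_dir, and “expiry” is just deleting old directories under the cache, which is a completely different set of concerns from a project’s checked-in, explicitly managed environment.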

Not for beginners. For people who write Python scripts as part of a role that is not that of a Python developer. These are not beginners, they are often highly experienced and senior developers or admins, who simply view Python as a tool for automating important tasks within a single environment.

Honestly, I wish we would stop characterising users as “beginners”. Not everyone who uses Python is on a path that goes from being unfamiliar with more than the basics, through familiarity with the language and its ecosystem, to expert level understanding. People can be experts, but simply not consider parts of the Python ecosystem (for example packaging) to be a priority to them.

Would anyone consider Guido to be a “beginner” in Python? And yet, I’m pretty sure he has very little knowledge or interest in the packaging ecosystem. Would it make my argument stronger if I used him as the example of “someone who has no interest in, or patience for, the complexities of pyproject.toml”?


  1. in the broadest sense - it may not need to be a virtualenv, for example ↩︎

8 Likes

Not to speak for James or anything, but I think you got the point across fine the first time - this is just a disagreement.

No, I don’t think this is the argument at all.

First off, workflows like this are not necessarily “for beginners”, and thinking that way is problematic - in the same way that treating questions about fundamental tasks as “too easy” is problematic for Q&A sites, documentation writers etc.

But more to the point: what is proposed is not a new class of tools (script runners exist already, and this only advocates a format without saying anything about who will actually use that format), and it is also not an alternative to the packaging process. It serves a different need.

Single-file scripts are unquestionably applications rather than libraries in almost every case. People who write this stuff aren’t expecting to share it with friends, so that their friends can import it. The goal, realistically, is to run it.

Making it possible to run the script is not “packaging”, so it doesn’t make sense to judge the tools for doing so by the standards of the existing packaging framework, or consider the impact on the “ecosystem”. When people set up a script for others to run this way, it’s not just that there is no intent to mark files as belonging together, no intent to put the code in a centralized “index” for distribution, no intent to prepare a “wheel” or interface with code in other languages etc. etc.

It’s that there is no intention for others to “install” the script.

A script distributed this way should be able to just sit anywhere it likes - not put any representation of itself into site-packages. Someone who already happens to have a Python environment set up that includes the necessary dependencies, should be able to just use /path/to/appropriate/python my_friends_script.py and go. People who don’t want to learn about any tools can read the comments, pip install whatever to whichever, and proceed with step 1. People who want runtime isolation (not build isolation!) can do fancy_scriptrunner my_friends_script.py, and let it parse the comments automatically.

But in no case is the script “installed”, therefore it was not “packaged”. Even when a script runner creates a virtual environment (which is likely temporary anyway) to house dependencies, the script itself is not moved or copied; and no actions are taken for the purpose of “advertising” the script to the OS, other programs etc.

11 Likes

I agree that this is not a feature targeted solely at beginners. I’ve been around the block a few times. I have a number of packages on PyPI. I understand packaging. And I would absolutely use the feature that PEP 722 is documenting.

15 Likes

A post was merged into an existing topic: My thoughts on the Packaging PEP process