Sketchy and maybe crazy alternative to PEP 722

(Starting a new thread for this because the “PEP 722: Dependency specification for single-file scripts” thread is already long enough.)

As a disclaimer, I am probably in the “Mount Stupid” part of the Dunning-Kruger effect about packaging knowledge. (At least I’m conscious of that.) This may be unimplementable, or outright ridiculous. Pardon my ignorance. I’m genuinely trying to help by giving a different, though relatively uninformed point of view.

Some people, like me, find PEP 722 to be “yet another way to do things”, when packaging should instead converge towards a more unified user experience rather than a myriad of different tools, workflows and methods.

For my part, I also believe there is a need to change the abstractions exposed to the user from virtual environments to projects. That means I do not have to worry about which environment my code runs in, because some tool will associate one [1] virtual environment with each project I need to use and/or work on. I feel like the packaging space is already moving towards that, with tools like tox, Hatch, Poetry or more recently Rye, which replace manual python -m venv invocations.

At first glance, PEP 722 fits that trend perfectly: it does something similar to all those tools, namely setting up a venv automatically. But there’s a catch: what PEP 722 does is still expressed in terms of virtual environments. Assume PEP 722 is accepted and I am a user reading the documentation of $tool supporting it. It will say something like “using this command, you can run a script and have the dependencies installed for you in the background without interfering with your system”. And I will ask myself: wait, is this not what tox/Hatch/Poetry do already? Why a different format? Why a different tool? How is this all different from what I already know?

Granted, for many users who do not know about tox/Hatch/Poetry, this will suffice, but I fear there will be confusion among those who are in both worlds of “real projects” (with a project directory, version control, a proper pyproject.toml) and “quick scripts” using PEP 722 style dependencies. (Let’s not forget that beginners are taught and assisted by advanced users, so it is beneficial to have advanced users who are familiar with the same workflows that beginners use.)

We already have a “project” abstraction that can be the basis of a unified user experience centered around actions you can take on a project, such as: build the project, install the project (from a package index or Git repository or local folder), create an environment for developing the project, run a command line tool provided by the project, view the project’s metadata, etc. PEP 722 introduces a parallel system with a new abstraction that is different from “project”, since only one of these actions is available on a PEP 722 style script (namely “run the script in an environment containing its dependencies”).

So what could an alternative world with “single-file projects” look like?

In a world with single-file projects, “create a venv with the dependencies” is not the only available action. In particular, build frontends and backends also need to be able to build the project. There is a great diversity of build backends (setuptools, hatchling, flit, meson-python, scikit-build, maturin, pymsbuild, sip, etc.). I think it’s fair to say that this is objectively a good thing for the ecosystem, if only because non-pure-Python projects will always need to interface with a variety of C/C++/Rust build systems. On the other hand, I believe there is a much smaller number of frontends (pip, build, which others?). Also, in an ideal packaging world started from scratch, there would IMHO be just one frontend. Based on these considerations, it seems best to push the responsibility of handling single-file projects to frontends exclusively and leave backends unmodified.

Here are the specifics, then.

The definition of a “Python project” is extended from “directory with a pyproject.toml” to “directory with a pyproject.toml, OR a single file”.

The metadata of a “directory” project is contained in its pyproject.toml file. The metadata of a “single-file” project is contained in an embedded pyproject.toml. Bikeshedding decides the exact format: comments, string literals, or something else.
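To make this concrete, here is one hypothetical comment-based encoding. This is purely illustrative: the # pyproject.toml: marker and the indentation scheme are invented for this sketch, not part of any standard, and string literals or some other syntax would work just as well.

    # pyproject.toml:
    #   [project]
    #   requires-python = ">=3.11"
    #   dependencies = ["requests", "rich"]

    import requests
    from rich import print

    # A supporting tool would read the embedded metadata above, install
    # requests and rich into the project's environment, and only then run this.
    print(requests.get("https://peps.python.org/pep-0722/").status_code)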

If the name key of the [project] table is not specified in the embedded pyproject.toml, it is implicitly set to the file name, stripped of its .py extension if any.
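In code terms, the rule is a one-liner (implicit_project_name is a made-up name for illustration):

    from pathlib import Path

    def implicit_project_name(script: Path) -> str:
        # Default [project] name: the file name, minus a trailing ".py" if any,
        # so "my-script.py" becomes "my-script" and "my-script" stays unchanged.
        name = script.name
        return name[:-3] if name.endswith(".py") else name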

When a build frontend wants to build a single-file project, it prepares a build directory containing the file and a pyproject.toml filled in from the metadata. This ensures that existing build backends do not need to be modified in order to be able to build single-file projects.
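A rough sketch of that staging step, reusing the hypothetical comment-based encoding from above and driving the third-party build package as the PEP 517 frontend. A real implementation would also inject the implicit name, a version, and a default [build-system] table when the embedded metadata omits them, and would do proper error handling.

    import shutil
    import subprocess
    import sys
    import tempfile
    from pathlib import Path

    def extract_embedded_pyproject(script: Path) -> str:
        # Hypothetical parser for the comment-based embedding sketched above:
        # gather the "#   "-prefixed lines after a "# pyproject.toml:" marker.
        # (Raises ValueError if the marker is missing; handling omitted.)
        lines = script.read_text().splitlines()
        start = lines.index("# pyproject.toml:") + 1
        toml_lines = []
        for line in lines[start:]:
            if not line.startswith("#   "):
                break
            toml_lines.append(line[4:])
        return "\n".join(toml_lines) + "\n"

    def build_single_file_project(script: Path, out_dir: Path) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            build_dir = Path(tmp)
            # Stage the script plus a synthesized pyproject.toml so that an
            # existing, unmodified build backend sees an ordinary project.
            shutil.copy(script, build_dir / script.name)
            (build_dir / "pyproject.toml").write_text(
                extract_embedded_pyproject(script)
            )
            # Delegate to any PEP 517 frontend; here, python -m build.
            subprocess.run(
                [sys.executable, "-m", "build", "--wheel",
                 "--outdir", str(out_dir), str(build_dir)],
                check=True,
            )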

Tools that have commands implicitly determining the project from the directory (like rye) are encouraged to make those commands accept a file name as an alternative, to specify the project. For example, if $tool supports $tool repl to launch a REPL in a venv with the dependencies from the project in the current directory, it should also support $tool repl file.py, treating file.py as the project. (In the same vein, commands accepting a directory should be extended to accept a file.)
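A minimal sketch of that dispatch, with invented names, might look like this:

    from pathlib import Path

    def resolve_project(arg: str = ".") -> Path:
        # Accept either a directory project or, under this proposal,
        # a single-file project.
        path = Path(arg)
        if path.is_dir():
            if not (path / "pyproject.toml").exists():
                raise SystemExit(f"{path}: no pyproject.toml, not a project")
            return path
        if path.is_file():
            return path  # the single file *is* the project
        raise SystemExit(f"{path}: no such file or directory")

A command like $tool repl would then call resolve_project() with its optional positional argument, defaulting to the current directory.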

This has the big advantage that the user experience is exactly the same between single-file and directory projects. They will not need separate efforts and user-experience improvements. The only difference between single-file projects and directory projects is that, well, single-file projects can be kept in a single file.

Finally, in addition to the “UX unification” aspect, here are a few cases where the added metadata and capabilities could potentially be useful:

  • Bob has a script provided by his teacher as an exercise for him to modify. The script contains requires-python = "==3.11". He runs $tool run the-demo-script.py. The tool gently responds that he is running Python 3.10 and should change his Python version. (Or it just downloads and executes Python 3.11.) A rough sketch of such a check appears after this list.

  • Bob contacts his friend Alice, who is an expert programmer, because he wants to automate task X in his day job, but he doesn’t know how to do that, and he is a beginner with terminals. Alice sends a script with some entry points defined in the metadata and tells him: just run pip install myscript.py (or pipx install myscript.py), then you can do myscript args .... (As opposed to: Bob asks Alice how he can install the script to have it available as a command because he’s going to use it often; Alice replies that she does that by putting the script under her ~/bin folder; Bob despairs because he’s on Windows and doesn’t know what $PATH is or what an environment variable is.)

  • Carol posts a GitHub gist with a simple script that’s not worthy of a Git repository. She still fills the authors and/or urls and/or license fields so that pip show can be used to remember the info. (The sort of thing you’d often put in a comment at the top of the file.)
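For the first scenario, here is a minimal sketch of the gentle version check, assuming the third-party packaging library and a made-up check_requires_python helper (a real tool could instead offer to fetch a matching interpreter):

    import platform

    from packaging.specifiers import SpecifierSet

    def check_requires_python(spec: str) -> None:
        # spec comes from the script's embedded metadata, e.g. "==3.11".
        # (Under PEP 440, "==3.11" matches only 3.11.0 exactly;
        #  "==3.11.*" would accept any 3.11.x release.)
        running = platform.python_version()
        if running not in SpecifierSet(spec):
            raise SystemExit(
                f"this script requires Python {spec}, but you are running "
                f"{running}; please switch to a matching Python version"
            )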


  1. or several, e.g., with tox


This proposal seems to have a very different (much larger) scope than PEP 722.

As far as I understand, PEP 722 does not care about virtual environments (“virtual environment” is mentioned only once, and the following is clearly stated: “This PEP does not cover environment management”). Whether tools that implement support for PEP 722 do so via virtual environments or not is not PEP 722’s concern.

My understanding is that PEP 722 aims to standardize a technique that already exists (in pip-run, in Jupyter, in pkg_resources’s __requires__, in fades, and probably other places, with support coming soon in pipx). It is not about adding a new way to do things; on the contrary, it is about standardizing existing practices so that new tools do not invent their own way of doing the same thing.

PEP 722 does not care about building anything; it is out of scope (and nonsensical for the use cases that PEP 722 aims at?). PEP 722 is about listing the dependencies required to run a (pure Python) script. There is no build frontend or backend to think of for these usages (at least not directly). There is nothing before or after the script (no source distribution, no build artifact); the script should be self-sufficient for redistribution, and that’s the whole point (at least as I understood it).

I would also note that PEP 722 relies on the “Dependency specifiers” standard specification (first defined in PEP 508), which is what is used in the [project] section of pyproject.toml (and was in use long before pyproject.toml existed), so there is nothing new here; the notation is exactly the same.


See Projects that aren't meant to generate a wheel and `pyproject.toml` for the discussion of using pyproject.toml for projects which are not meant to build a wheel. I suspect you will need to see that conversation resolved in a way that makes pyproject.toml work for that scenario before you can try and generalize it as the format to use regardless of whether it’s stored as a separate file or embedded in a .py file.
