PEP 723: Embedding pyproject.toml in single-file scripts

I expect that people would still try to use f-strings and other constructs and be disappointed and confused when they don't work.

Reading the runtime value of __pyproject__ is a spec violation. The PEP currently states that the regex is canonical, so we’re already there. The module attribute could have been an f-string, or the result of module __getattr__, etc etc. At which point… Why is it being assigned? What value does exposing a string which may or may not match the spec at runtime provide?

Things are specified such that a user shouldn’t be making these mismatch. But tools can’t trust that, so it seems like this value will never be safe to read?

I feel that either I must be missing a key use case or else the value of this has been eroded as the spec is refined. If it’s the latter, it’s worth reassessing the multiline string vs comment decision point.

6 Likes

32 posts were split to a new topic: PEP 723: use a new [run] table instead of [project]?

I have this same concern, assigning to __pyproject__ makes it feel (to me) like we’re just introducing more possibility for confusion.

3 Likes

I haven’t read the full discussion, but neither PEP directly addresses a naive question I have:

Why settle for a script runner at all? If I distribute a self-contained Python script, in an ideal world I’d want the script itself to Just Work (by fetching dependencies from the internet into a temp venv, or whatever it is a script runner will do…).
That is, in some far future :cloud:, one could write a run-time call:

import fictional_package_lib
fictional_package_lib.ensure("""
# TOML or requirement.txt-like or whatever syntax goes here...
requests
other_pypi_dep @ 1.2.3
""")

# And now import path is set up etc. so can immediately use them.
import requests
import other_pypi_dep
...

Now, I understand that (1) in present, it’s much easier to introduce external runners than bring fictional_package_lib to existing python installations (2) this goes against years of packaging best practices migrating off executable setup.py etc. to passive declarative files… (3) #! can make script runners almost transparent to the end user.

Still, I can’t help thinking this is a tempting “end-game” goal. I give you a Python file (not a “hatch file” etc.) and it’ll work. Batteries Available™.

  • does this deserve shooting down in Rationale / Rejected Ideas?
  • is forward-compatibility with such future worth considering?

P.S. the ability to freely mix execution with imports, instead of resolving imports in a separate, strictly declarative stage, is one of the things I really love in Python. It avoids a conceptual split which is hard for exactly those “average person”, “sysadmin” etc. categories that 723 lists.


Yet another alternative which might be execution-friendly is to add an import hook allowing you to “import” a TOML file. (Again, fetching deps & isolating in a venv behind the scenes.) This does require 2 files, but lets you specify the same or different TOMLs between scripts in the same dir.

I see people are stating goals like this, which throw my idea pretty much out…

I trust that’s for good reasons obvious to everyone involved, but it’d be nice if PEP spelled it out. Feel free to ignore me.

A lot of tooling is written in other languages that we must consider, for example, Visual Studio Code is written in TypeScript and Dependabot is written in Ruby.

Longer term, this functionality could be built into the py launcher (which at least for Windows users, is pretty much the definitive way to run Python) or even into the python executable itself - although that’s unlikely, given the SC’s preference for keeping packaging out of the core.

You can do that now. Just run pip in a subprocess. The problem is that a naive implementation installs the dependency in your main environment, which is generally not what people want here. You could run pip install --target to put the dependencies in a separate directory, modify sys.path, and delete the temporary directory on exit.

Edit: Actually, such a library function would make a perfectly reasonable consumer for PEP 722/723. So go ahead and write one if you think it’s a good approach - the PEPs will let you concentrate on implementing the infrastructure to make a list of dependencies available at runtime, without needing to design a data format or parse the dependencies out of your arguments.

The point is, this is available with no need for a standard. Although you could certainly write a function that read the script source and extracted PEP 722/723 metadata from it, meaning you get the best of both worlds - your preferred runtime functionality, and compatibility with runners for people who prefer them.

I could add it to the PEP 722 rejected ideas section. I probably should have, as this has been asked before, but honestly that section is so long already that I simply missed it. @ofek can add it to PEP 723 if he feels it’s worth it. If we come up with a combined PEP, I’ll try to remember to add it - there’s going to be a lot to do on the “rejected ideas” section for a merged PEP though, so no promises!

3 Likes

FWIW this was basically the initial API designed for use in viv, though rather than a newline-delimited string it takes positional arguments as strings passed verbatim to a pip install (inside a cached venv):

__import__("viv").use("requests", "other_pypi_dep @ 1.2.3")

# And now import path is set up etc. so can immediately use them.
import requests
import other_pypi_dep
...

Obviously one still needs to have the viv script available on the PYTHONPATH (could just pip install it too of course) so that it is importable. But viv is a single script to maximize portability for this type of use case.

Since the spec says the value should be a valid TOML string, the implementation could use something like ast.literal_eval to evaluate the Python string’s content; that way it mimics the runtime.

Also, if the TOML string will contain backslashes, the user should probably prefer single quoted TOML strings as they don’t support escaping.

As has been pointed out numerous times now, it is explicitly expected that there will be tools that are not written in Python (e.g. ruff, written in Rust, just to name one).

Triple-quoted string literals absolutely do support escaping:

>>> """\n"""
'\n'
>>> """
... \x40
... """
'\n@\n'

Perhaps you are thinking of raw string literals (with an r prefix)?

I’m talking about TOML strings, x = '\' in a TOML file is equivalent to x = "\\", since single quoted string literals in TOML don’t support escaping.

If I interpret the PEP correctly, it expects the part of the text file’s contents that’s matched by the regex group to be a subset of valid Python string syntax and valid TOML. Luckily they seem to be pretty compatible:

  • Both have \b, \t, \n, \f, \r, \", \\, \uXXXX, \UXXXXXXXX
  • TOML has \/
  • Python has \<newline>, \', \a, \v, \ooo, \xhh, \N{name}

The overlap contains almost all valid TOML syntax (the only exception being uselessly escaping forward slashes) and the behavior seems to be identical.

I’m talking about TOML strings, x = '\' in a TOML file is equivalent to x = "\\", since single quoted string literals in TOML don’t support escaping.

That makes this a bit of a gotcha in the sense that users can’t just put the value of the __pyproject__ variable through tomllib and expect the contents to be identical.

@ofek why didn’t you go with a raw string in the spec/regex? Then tomllib.loads(__pyproject__) and reference_implementation.read(script_path.read_text()) would be identical which I think would be less surprising.

2 Likes

As an alternative to “syntactic” __pyproject__ (which I do not like at all), what about a generic format for embeddable metadata comments?

I suggested something similar in the PEP 722 thread: PEP 722: Dependency specification for single-file scripts - #321 by gwerbin

For example, here’s how one might embed pyproject.toml in a Python script:

#!/usr/bin/env python3

# -*- pyproject:
# [project]
# dependencies = [
#   'sqlalchemy',
#   'click',
# ]
# -*-

if __name__ == "__main__":
    print("hello!")

Perhaps these -*- blocks, which are already in informal use, could be systematized, and thereby adapted for both PEP 722 and 723.

Yet another option (not incompatible with the above) would be to add syntactic support in Python itself for either “front matter” or “back matter” that is not necessarily embedded in a comment. Back matter might be easier to implement, because you can hide it behind some kind of special token like __END__ or __DATA__ like in Perl (see here).

#!/usr/bin/env python3

# Standard "coding" declaration
# -*- coding: utf-8 -*-

# PEP 722 dependencies
# -*- script dependencies:
#   click
#   httpx
# -*-

if __name__ == "__main__":
    print("hello!")

__DATA__

-*- pyproject:
[project]
name = 'all-in-one-demo'
license = { text = "MIT" }
-*-

Edit: A sort of informal “spec” here is that the pattern -*- delimits a metadata region, which can be one line or multiple lines, and the first : therein delimits the name/key of the region from the content/value. So the above script would have metadata that looks something like this (as a Python literal):

{
    "coding": "utf-8",
    "script dependencies": r"""
   click
   httpx
""",
    "pyproject": r"""
[project]
name = 'all-in-one-demo'
license = { text = "MIT" }
"""
}

Beyond that, the content would be the responsibility of the relevant tools to parse. This is a lot like how IPython magics work.
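A tool could handle this with a small scanner; here is a toy parser (my own sketch, not part of either PEP) of the informal spec above: a region opens with `# -*- <name>:`, closes with `# -*-`, and a one-line region puts its value after the colon on the same line.

```python
import re

ONE_LINE = re.compile(r"^#\s*-\*-\s*([^:]+):\s*(.*?)\s*-\*-\s*$")
OPEN = re.compile(r"^#\s*-\*-\s*([^:]+):\s*$")
CLOSE = re.compile(r"^#\s*-\*-\s*$")

def parse_metadata(source):
    # Map each region name to its content; multi-line bodies have their
    # leading "#" and surrounding whitespace stripped.
    regions = {}
    name, body = None, []
    for line in source.splitlines():
        if name is None:
            if m := ONE_LINE.match(line):
                regions[m.group(1).strip()] = m.group(2)
            elif m := OPEN.match(line):
                name, body = m.group(1).strip(), []
        elif CLOSE.match(line):
            regions[name] = "\n".join(body)
            name = None
        else:
            body.append(line.lstrip("#").strip())
    return regions
```

Run over the example script above, this yields something like `{"coding": "utf-8", "script dependencies": "click\nhttpx"}`, after which the content of each region is, as stated, the relevant tool's problem.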

6 Likes

I would tend to nix the __DATA__ option because it requires much bigger changes which are harder to justify.
Unless you or someone else is advocating strongly for that, I’d like to see any discussion of this idea focus only on the comment delimiters.

I kind of like the idea of aligning with the encoding comment format. The fact that it’s not markdown will please some folks, myself included, who are wary of using triple back ticks.

My main worry is that such a simple pattern may match existing comments in files. However, unlike markdown, I think we’re within rights to ask users who have such comments AND want to use new tools which rely on these metadata comments to modify their comments in cases where it might be necessary.

I’m not sure that tools gain a lot because they would still need to check and ignore all such blocks which do not apply to them. Perhaps this can be resolved by simply insisting that the blocks are named in their first line, i.e.:

# -*- python-requires
# 3.9
# -*-

# which is defined to be identical to

# -*- python-requires: 3.9 -*-

If you put names on the blocks, the suggestion suddenly overlaps a lot with PEP 722. I consider that a good thing.

My main questions are

  • does aligning with encoding comments pose issues?
  • it reads nicely to my taste; what do the PEP authors think?

Overall, I’m a soft +1 for this, assuming there aren’t problems which are spotted by more knowledgeable folks.

(Aside: I’m not sure what the best resolution is, but I expect this thread to easily become more relevant to one or the other of the proposed PEPs. Should it be split to a new topic?)

2 Likes

FWIW, I had a use case today for “this script is useful to write and run with a few PyPI packages”. I took this opportunity to read both the PEPs and do a drive-through pretending that a magical script-run myscript.py command exists to run my script.


My main bit of feedback on PEP 723 is that it involves a lot of boilerplate for what I’d call the “base” case – declaring information in the form recommended by this PEP was a bit tedious and I need to be careful around quoting as well as other details that feel unnecessary (also, I can’t get automated tooling’s assistance today to write that :P).

__pyproject__ = """
[project]
dependencies = [
  "build",
  "pip",
  "httpx",
]
"""

Compared to PEP 722’s:

# Script Dependencies:
#   build
#   pip
#   httpx

Yes, that’s the “base-case” and PEP 723 supports more things today like requires-python. However, it’s still somewhat tedious to declare this information in the proposed format. I would prefer a comment-based/docstring-based approach over a dedicated variable.


From the PEP:

This is in a section about the [build-system] table, so it would be useful to tweak the language to avoid mixing the concepts of Python’s current pyproject.toml-based build backends and generic “build systems” – the title talks about build backends, none of which have expressed support for this (based on a quick skim), and the wording here made me double back, confused about why pantsbuild/Pex is being used to show that this is something generic that we should have.

The “Why not limit build backend behavior?” section doesn’t demonstrate any build-backend support for this, and it’s unclear how this would even work – since the existing build-backend mechanisms are all based on the existence of a pyproject.toml.

We use SHOULD NOT instead of MUST NOT in order to allow tools to experiment [2] with such functionality before we standardize (indeed this would be a requirement).

Given that the expectation is that this will be something bespoke and specific to the tool, that there’s no prior art (the Rust RFC is an RFC still) and that there’s no clear path to actually having something standardised here, I think it’s a wrong choice to permit [build-system] here. I’d rather expect hatch make-a-script-package myscript.py (subject to renaming) would serve the underlying needs here and there’s really no concept of a build environment or build mechanism independent of whatever-that-tool-does.

PEP 517 was written after multiple efforts in a similar vein, and involved a lot of consideration/discussion about the various design choices within it. I expect that any similar abstraction mechanism would involve a similar order of scrutiny; or at least a decent amount of discussion. :slight_smile:


The specification section was difficult for me to read – the entire section is composed of paragraphs that are 1-4 lines, with not much in terms of clear structure to figure out what exactly the PEP is proposing. Here’s a suggested restructuring (in Markdown, because it’s easier): https://hackmd.io/@pradyunsg/ByLy0px63/edit

In a similar vein, the backwards compatibility section should be a single paragraph from “For example” to “that extra step of indirection.”.

1 Like

I think the idea here is that you could have something like python -m build --script myscript.py to actually build a wheel out of this script for distribution.

The PEP explicitly says the build frontends should ignore that information though. :wink:

That said, I expect that build isn’t going to add a --script until we know what the interface for that needs to be and it’s premature to lock ourselves to the build-system.requires and build-system.build-backend keys until we have that. There is no story for what the build system API would look like for these (think PEP 517), no established prior art, and it’s certainly not something that is standardised already that tooling has to just go-and-implement.

To simplify a lot – the argument made for permitting [build-system] in the PEP is “makes it possible to experiment” and I’m saying “that’s possible without it, which is how it’s actually going to be done, and we should not lock ourselves into existing concepts”.

1 Like

Here’s the draft of the final version:

Please save comments until I open the final discourse thread

1 Like

Here is the final discussion thread: Final - PEP 723: Embedding pyproject.toml in single-file scripts

3 Likes