PEP 723: Embedding pyproject.toml in single-file scripts

+1 to @jamestwebber’s comment here, I think that’s… just fine? Some of the metadata fields are more useful for single-file scripts than others, but that’s not a real problem.

license for example might not be suitable if you have a very restrictive corporate environment, but for a GitHub gist where I really don’t want to bother with licensing, I can imagine license = {text = "Unlicense"} being handy.

On the other hand, this does make me wonder if [project] fields that genuinely make no sense without a build backend should be disallowed, if only for consistency with how the PEP disallows the [build-system] table. The ones I’m looking at are entry-points, scripts and gui-scripts.

Thinking about the triple-quoting thing more (in the context of “very long readmes” and whatnot), I don’t know if it’s important to forbid this. There’s been a lot of discussion about making the format simple to parse, in part because it avoids parsing the AST and avoids versioning issues. But parsing the file is a totally plausible way to extract this info and some tools will do it–and having access to the correct python version is going to be an issue anyway, if you want to run the script. And tools like ruff exist which parse Python in another language (and support multiple versions).

I guess what I’m getting at is that, while it’d be convenient for tool-builders if the format is restricted, it’s a trade-off with making it “less than what python allows”, which is sort of confusing. If you’re developing in your home environment with all your dependencies, python my_script.py will work fine because your __pyproject__ is a valid string, but pip-run my_script.py will fail on someone else’s machine because you didn’t format this bit of metadata correctly.

So it might be worth softening that language to “should not” [1]. In the end perhaps this is a point in favor of the comment-block approach, although that’s just defining a separate syntax instead of a hybrid one.

I don’t know what the right choice is here, just considering the options. Abstractly, I’d prefer it if “valid python w/ a valid TOML string in it” was all I needed to remember in terms of formatting.


  1. I don’t know the exact meanings of must/should/may in the PEP context, though ↩︎

The first time someone did __pyproject__ = f"""...""" you’d regret that leniency :slight_smile:

Honestly, a lot of this is covered in PEP 722’s rejected alternative “Why not use (possibly restricted) Python syntax?” I’m not trying to say that PEP 723 is doomed because I was right all along, but I do think you should start by making sure you’ve got good arguments as to why the issues described in that section don’t apply to PEP 723. There’s a certain amount of subjectiveness to what I put in PEP 722, certainly, but I tried to be objective enough that a simple “I don’t agree” is likely to be missing something.

That was my abstract preference for the ideal, not what makes the most sense here. I was thinking more about restrictions on the text contents. I did mention earlier that the PEP should specify it must be a string literal. One might hope this would be obvious to people but it probably belongs in the PEP as a requirement. For instance, people will definitely try to concatenate strings together with + and not understand why it fails.
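To make that concrete, here is a rough sketch (my own, not anything from the PEP) of how a tool could enforce the “must be a plain string literal” rule with the ast module, rejecting f-strings and + concatenation:

```python
import ast

def embedded_metadata(source):
    """Return the __pyproject__ value if it is a plain string literal.

    Sketch of the "must be a string literal" rule: f-strings parse as
    ast.JoinedStr and concatenations as ast.BinOp, so anything that is
    not a bare ast.Constant holding a str is rejected with None.
    """
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "__pyproject__"
        ):
            value = node.value
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                return value.value
            return None  # f-string, + concatenation, or other expression
    return None
```

This catches both the f-string case and the + case up front, instead of letting them fail mysteriously in a downstream TOML parse.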

The reasons I view “valid python and valid TOML” as my preferred outcome are

  1. I don’t need to learn new syntax, or special cases for the syntax I know. Those who don’t know TOML will at least learn a standard format
  2. Parsers exist for both formats already and can be used without modification to add restrictions
  3. Simple to transfer metadata between this format and pyproject.toml and vice versa

I can see how this makes other aspects trickier and I’m not saying it’s the only way. But I think these are useful features if they can be feasibly achieved.

I do think the question of python version is relevant to that section as well. A script that requires python version X will only successfully run if the running tool has access to version X [1]. Granted, it’s a little more convenient to not care up front, but on the other hand if you parse the script you can fail faster on incompatible syntax (rather than e.g. creating a venv and installing dependencies and then failing). So I’m not convinced that “needs the right python parser” is such a big obstacle, when in practice it will be a requirement.

I hope that reasoning is more objective than “I don’t agree” :slight_smile:


  1. especially if “install the right Python on demand” is out of scope here ↩︎

Wouldn’t it be better for this proposal to reduce the scope to its minimum at first and then slowly expand in later PEPs if/when necessary? This is one of the things I like about PEP 722, which introduces the concept of generic metadata blocks but only specifies one type of metadata block for now. For example the proposal could say only project.dependencies is allowed. Once it is proven to work (with various implementations), we could then expand with the next most-asked feature(s). I am a bit worried that it is too ambitious to allow (nearly) everything that pyproject.toml allows right from the start, that we will discover issues too late (after the PEP is approved), and that we will need to put band-aids on the specification to fix them. The scope is quite large, and it feels hard to make sure we have thought about all the consequences and side effects.


Update: Ah, I just now read that PEP 722 is dropping the “metadata block” concept. This does not change my point of view for now.


What could possibly go wrong? What do we need to prove “works”? The author of a project and their email, the project license or the version(s) of Python it is compatible with are just inert metadata, they don’t care how they’re used (and the PEP intentionally considers it out of scope to specify whether or how tools use them). If there are bugs, it will be with the way tools use the metadata. I don’t see how the format itself would need any changes.

Brushing up on my lexical analysis I’m surprised to find that f-strings are still called a “string literal” even though it’s actually a complicated expression :upside_down_face: That’s not what I intended to suggest.

You might be right. I guess my worry that this might go wrong may be unjustified. I feel like some users might get the wrong expectations about what will work out of the box. Users might put all kinds of fields in the embedded pyproject.toml and will be unhappy that they are not taken into account by the tools (for example because the tools have not added support for this specification yet, or because the authors do not want to add support).

Maybe it would be good to expand the “How to Teach This” section and add something like “It should be clearly communicated to the users that it is up to the maintainers of each tool to explicitly add support for this specification. What is possible in pyproject.toml does not automatically become possible in the embedded variant”.

I feel like this is exactly the kind of message that can be confusing to users. We do not want users to file bug reports when they add fields to the embedded pyproject.toml and tools do not take them into account. Most likely this is not what you meant, but I am not sure everyone will understand the subtlety here.


It might be, but I don’t really understand your point :slight_smile:

The thing is, if I’m writing a tool to extract metadata from a script, I might write that tool for Python 3.9 (because it’s a general tool, and writing it for the lowest common denominator makes sense). I install the tool using pipx, which installs it in a virtualenv running Python 3.9.

Now I run that tool against a script that’s written to run on Python 3.11. It uses the match statement. How does my tool get access to a Python parser that understands the match statement? And if it can’t, how does it parse the metadata out of the script?

Solutions that say “just look for an assignment of a literal string to a variable” miss the point. If you don’t parse the full script, how do you know where that assignment is? There’s no incremental parser for Python that I know of, so you can’t just parse “up to the assignment” and ignore the rest. And if you extract it via a regex or something, you’re back to parsing a subset of Python.

Don’t get me wrong, it’s certainly possible to do something that’s good enough for most, if not all, practical purposes. The regex in PEP 723 is probably “good enough”, for example. But the difference between “good enough” and “complete” is where all the bugs and complaints lie :frowning:
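For the curious, a “good enough” extraction in that spirit might look like this. To be clear, this is an illustrative regex I wrote for this post, not the actual one from the PEP 723 draft, and it has exactly the caveats I describe: it is a subset-of-Python heuristic, not a parser.

```python
import re

# Illustrative only -- not the exact regex from the PEP 723 draft.
# (?m) lets ^/$ match at line boundaries; (?s) lets . cross newlines,
# so the non-greedy group captures everything between the triple quotes.
PATTERN = re.compile(r'(?ms)^__pyproject__\s*=\s*"""$(.+?)^"""$')

def extract(source):
    match = PATTERN.search(source)
    return match.group(1) if match else None
```

It works on the common layout and never needs a full Python parser, but it will misfire on adversarial inputs (e.g. a matching-looking assignment inside another string), which is the “good enough” vs “complete” gap.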

What does it mean that the metadata will not be taken into account by the tools?

When you’re building a project, you definitely want the metadata to end up in the sdists and wheels in the standard form. It’s part of the purpose of a build backend. If the build backend does not put all the metadata that it should into the sdists/wheels it generates, that’s a serious bug.

When you just want to run the project without building, that consideration doesn’t apply. That doesn’t mean the metadata is useless; there are still some use cases. Just to give a few examples:

  • If the dependency resolver finds contradictory dependency requirements, it should probably write <name from metadata> depends on foo >= 1.0 and bar == 0.8, but bar == 0.8 depends on foo == 0.5 rather than, say, tmp-unnamed-script-02739a433e07dac85329cffd592929de3bef7942 depends on foo >= 1.0 and bar == 0.8, but bar == 0.8 depends on foo == 0.5.

  • A tool for checking licensing compliance could support single-file scripts in addition to traditional projects, reading their metadata in the same format as traditional projects.

  • And of course, there is plenty of room for tools like Black, Ruff, Mypy, etc. reading useful config in their [tool.toolname] tables.

But, I cannot imagine lots of cases where the user could be surprised and treat it as a bug that a tool does not read metadata X. Maybe they’d be surprised if Black didn’t read an inline pyproject.toml, I don’t know (and let’s just wait for feedback from maintainers of popular tools on this PEP, as Ofek said he has contacted them). Maybe the user expects the script runner to fail (or download a different version of Python automatically) if the current Python version does not match the requires-python value; but then at least failing in the event of an incompatibility is something easy to implement that I would expect all script runners to support.
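That fail-on-incompatibility check really is easy to sketch. A real runner would presumably evaluate the full specifier grammar (e.g. with packaging.specifiers.SpecifierSet); this toy version of mine handles only a single >=X.Y clause, the common case, just to show the shape of it:

```python
import sys

def python_ok(requires_python):
    """Fail-fast check of a requires-python value like ">=3.11".

    Toy sketch: only a single ">=X.Y" clause is handled here; a real
    runner would use a proper specifier parser such as
    packaging.specifiers.SpecifierSet.
    """
    if not requires_python.startswith(">="):
        raise ValueError("sketch only handles a single '>=X.Y' clause")
    major, minor = (int(part) for part in requires_python[2:].split("."))
    return sys.version_info[:2] >= (major, minor)
```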

I guess I was thinking specifically of the “script launcher” tool, not other tools that might also use this data. If I have a script written for 3.11, a script launcher needs access to 3.11 to successfully launch it. So getting a 3.11 parser isn’t a huge obstacle?

The example you give makes sense, but it would also strike me as totally reasonable that a tool written for 3.9 can’t handle the syntax in 3.11 without changes. Using the same version I want to parse is a pretty simple upgrade, compared to some changes needed for new versions.

It could be. pipx supports a --python option, meaning that pipx can be running under Python 3.9 and running a script under Python 3.11…

Remember that the step where the metadata is located and parsed is performance-sensitive, since the cost is paid at every run of the script (while the cost of creating the venv and installing packages is only paid every time the metadata actually changes). While pipx is written in Python, there are other tools that are not, such as the Python launcher for Windows (C), the Python launcher for Unix (Rust) and Rye (Rust), which would probably rather avoid having to pay the cost of spawning a second run of the Python interpreter.

The hypothetical pipx running under Python 3.9 needs to be able to parse the __pyproject__ variable to be able to know that the script requires Python 3.12, but if the script uses Python 3.12-only syntax then pipx cannot parse the script and cannot read __pyproject__. And if instead of pipx the tool is, for example, the py launcher written in Rust (or rye or posy), then it also needs to understand Python 3.12 syntax, meaning updating the parser in the (Rust) tool even if the specification itself (PEP 723) has not changed at all. So that feels like we would be in a bit of a strange situation.

I do not think I have mentioned packaging metadata. And indeed I do not have “packaging metadata” specifically in mind, since [build-system] is not allowed.

Yes, that is (mostly) what I had in mind. I argue that the specification should make it clear that it is up to the maintainer(s) of each tool to implement support for this specification or not. The specification should make it clear that users of a tool that supports pyproject.toml must not expect that the tool also automatically supports the embedded pyproject.toml, unless the tool’s documentation explicitly says so. We want to avoid the “surprise” that turns into a bug report.


Okay okay, don’t parse the AST :sweat_smile: I want to respond to a couple things and then I’ll wrap up.

In that scenario, it’s going to invoke the specified python’s own version of pip, yes? Furthermore, it makes sure pip is up to date. It seems like the obstacle for that working is pip having support for parsing embedded metadata. I certainly hope that happens, regardless of what version of this idea is eventually adopted.

A performance-sensitive tool should only parse the metadata if the file changes, not on every invocation. Outside of development (when the runner isn’t even needed, really) that should be very rare, I’d think?

For that matter there is at least one very fast AST parser out there, which will continue to be updated for new syntax as it is introduced. For tools that need the performance, it’s available without invoking python.

If the tool doesn’t know up-front what version of Python to use, it’s going to fail when the script runs. It’s true that this PEP has support for requires-python, but there isn’t a proposal to install a new binary as far as I can tell. So either way, this ends in an error, and the important thing is to provide a good clue to the cause (like “we found a SyntaxError, are you using the right Python version?”)

I understand the concerns here, but I wanted to hear the various arguments in more detail because it didn’t feel like such an obstacle to me (mostly I was responding to Paul’s post about that section).

I’m satisfied that parsing the AST has enough thorns that it shouldn’t be required for this to work–I brought it up more because I didn’t find the arguments against it so convincing [1].

The main thing I was interested in was fixing the spec for __pyproject__ so that it doesn’t become a pseudo-TOML, and users can freely copy between real TOML files and the metadata without worrying about breaking formatting. I hope that can be done with better parsing. A short regex is nice but it doesn’t have to be the actual answer.


  1. I still don’t think it’s that big a deal but I’m happy to defer to the tool-writers about what they think is reasonable ↩︎


I think I have incorporated all of the feedback so far:

For any new feedback unrelated to the changes please make that explicit so that I can address that here or in a follow-up PR.

Personally, I am quite content with how the document looks now and do not anticipate much changing other than potentially adding a section that documents what maintainers of various tools have said about this PEP (tomorrow Charlie will comment about Ruff).

@brettcannon, I’m certain that we will be able to meet your deadline of the 14th :slightly_smiling_face:


pip is unlikely to ever get support for parsing embedded metadata itself as it isn’t a script runner. The only exception would be if a pip run command was added, and that would almost certainly act like all other pip commands and run the script using the version of python used to run pip, ignoring any requires-python metadata.

Also, the pip-run tool doesn’t use virtual environments, so it has no way of supporting requires-python either…

I was trying to be succinct but clearly I didn’t communicate what I was thinking at all.

My line of reasoning went like this: if you’re using a 3.9 version of pipx to run a 3.11 script with this metadata, you need to provide it an external python 3.11 to use [1]. Then pipx needs to do a few things:

  1. figure out the dependencies
  2. install them using the pip that is associated with the 3.11 python
  3. run the script using that version

One way to accomplish 1 and 2 is for pipx itself to parse the dependencies and then call pip with the result. An alternative way to do that is if pip gained support for this metadata in the form of pip install --from-script-metadata my_script.py. That’s what I was getting at in the above comment. This sidesteps the syntax mismatch issue by always parsing with the correct version.

This would be a nice feature for another reason: someone who isn’t using pipx (or another script runner) can use that option to install the requirements directly. While the metadata has been designed for the “self-contained venv” use-case, it’s still plausible that people will want to install the script in an environment they’re using for other things.

I think that’d be a useful feature regardless of how the metadata is formatted and parsed. But of course the maintainers of pip [2] can make that decision later.


  1. as I understand it from your description, at least ↩︎

  2. whoever they are… :sweat_smile: ↩︎

Ah, I see. But pipx doesn’t just install the dependencies, it reads them itself to determine if there’s a cached environment that matches which can be reused. That basically makes the “pip reads the data” approach a non-starter for pipx at least, and I imagine other runners will want to do a similar thing. It also means that error handling will be a mess - pip can’t know what pipx is trying to do, so the message won’t be ideal, and pipx definitely doesn’t want to start parsing pip’s messages.

I actually like pip install --from-script-dependencies better for PEP 722, where the data is clearly and explicitly only dependencies. There’s an ongoing debate about the semantics[1] of pip install from a pyproject.toml, and making it install the dependency section for a script, but potentially something else for a project, is just going to be a confusing mess.

Edit: Although given that you can write a trivial wrapper as shown below, it seems a bit pointless to add a whole new option to pip for it.

import sys
import subprocess

# Insert the 25-line reference implementation of PEP 722 here

script = sys.argv[1]
pip_args = sys.argv[2:]

subprocess.run(
    [sys.executable, "-m", "pip", "install"]
    + pip_args
    + [str(req) for req in get_script_dependencies(script)]
)

  1. around dynamic dependencies ↩︎