PEP 723: Embedding pyproject.toml in single-file scripts

It could be. pipx supports a --python option, meaning that pipx can be running under Python 3.9 while running a script under Python 3.11…

Remember that the step where the metadata is located and parsed is performance-sensitive, since that cost is paid on every run of the script (while the cost of creating the venv and installing packages is only paid when the metadata actually changes). While pipx is written in Python, there are other tools that are not, such as the Python launcher for Windows (C), the Python launcher for Unix (Rust) and Rye (Rust), which would probably rather avoid paying the cost of spawning a Python interpreter just to parse the metadata.

The hypothetical pipx running under Python 3.9 needs to parse the __pyproject__ variable in order to know that the script requires Python 3.12, but if the script uses Python 3.12-only syntax then pipx cannot parse the script and so cannot read __pyproject__. And if instead of pipx the tool is, for example, the py launcher written in Rust (or rye or posy), then it also needs to understand Python 3.12 syntax, meaning the parser in the (Rust) tool has to be updated even though the specification itself (PEP 723) has not changed at all. So that feels like a rather strange situation to be in.
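To make the mismatch concrete, here is a small illustration (not from the PEP) of what happens to a pure-Python tool that tries the obvious thing and uses the standard library's ast module on a script with newer-than-supported syntax:

import ast

SCRIPT = '''\
__pyproject__ = """
[project]
dependencies = ["rich"]
"""

match command:      # 3.10+ only syntax
    case "run":
        ...
'''

# On Python 3.9, ast.parse() raises SyntaxError on the match statement,
# so the tool never gets a chance to read __pyproject__ at all.
tree = ast.parse(SCRIPT)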

I do not think I have mentioned packaging metadata. And indeed I do not have “packaging metadata” specifically in mind, since [build-system] is not allowed.

Yes, that is (mostly) what I had in mind. I argue that the specification should make it clear that it is up to the maintainer(s) of each tool to decide whether or not to implement support for it. It should also make it clear that users of a tool that supports pyproject.toml must not expect the tool to automatically support the embedded pyproject.toml as well, unless the tool's documentation says so explicitly. We want to avoid the “surprise” that turns into a bug report.

1 Like

Okay okay, don’t parse the AST :sweat_smile: I want to respond to a couple of things and then I’ll wrap up.

In that scenario, it’s going to invoke the specified python’s own version of pip, yes? Furthermore, it makes sure pip is up to date. It seems like the obstacle for that working is pip having support for parsing embedded metadata. I certainly hope that happens, regardless of what version of this idea is eventually adopted.

A performance-sensitive tool should only parse the metadata if the file changes, not on every invocation. Outside of development (when the runner isn’t even needed, really) that should be very rare, I’d think?
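Something along these lines, say (a rough sketch; the cache layout and helper names here are invented, not any tool’s real API):

import hashlib
import json
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "some-runner"

def load_metadata(script: Path, parse_metadata) -> dict:
    # Re-parse the embedded metadata only when the script's contents change;
    # otherwise reuse the result from a previous run.
    source = script.read_bytes()
    digest = hashlib.sha256(source).hexdigest()
    cached = CACHE_DIR / f"{digest}.json"
    if cached.exists():
        return json.loads(cached.read_text())
    metadata = parse_metadata(source.decode("utf-8"))  # the expensive step
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_text(json.dumps(metadata))
    return metadata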

For that matter there is at least one very fast AST parser out there, which will continue to be updated for new syntax as it is introduced. For tools that need the performance, it’s available without invoking python.

If the tool doesn’t know up-front what version of Python to use, it’s going to fail when the script runs. It’s true that this PEP has support for requires-python, but there isn’t a proposal to install a new binary as far as I can tell. So either way, this ends in an error, and the important thing is to provide a good clue to the cause (like “we found a SyntaxError, are you using the right Python version?”).

I understand the concerns here, but I wanted to hear the various arguments in more detail because it didn’t feel like such an obstacle to me (mostly I was responding to Paul’s post about that section).

I’m satisfied that parsing the AST has enough thorns that it shouldn’t be required for this to work–I brought it up more because I didn’t find the arguments against it so convincing [1].

The main thing I was interested in was fixing the spec for __pyproject__ so that it doesn’t become a pseudo-TOML, and users can freely copy between real toml files and the metadata without worrying about breaking formatting. I hope that can be done with better parsing. A short regex is nice but it doesn’t have to be the actual answer.


  1. I still don’t think it’s that big a deal but I’m happy to defer to the tool-writers about what they think is reasonable ↩︎

1 Like

I think I have incorporated all of the feedback so far:

For any new feedback unrelated to the changes please make that explicit so that I can address that here or in a follow-up PR.

Personally, I am quite content with how the document looks now and do not anticipate much changing other than potentially adding a section that documents what maintainers of various tools have said about this PEP (tomorrow Charlie will comment about Ruff).

@brettcannon, I’m certain that we will be able to meet your deadline of the 14th :slightly_smiling_face:

3 Likes

pip is unlikely to ever get support for parsing embedded metadata itself, as it isn’t a script runner. The only exception would be if a pip run command was added, and that would almost certainly act like all other pip commands and run the script using the version of Python used to run pip, ignoring any requires-python metadata.

Also, the pip-run tool doesn’t use virtual environments, so it has no way of supporting requires-python either…

I was trying to be succinct but clearly I didn’t communicate what I was thinking at all.

My line of reasoning went like this: if you’re using a 3.9 version of pipx to run a 3.11 script with this metadata, you need to provide it with an external Python 3.11 to use [1]. Then pipx needs to do a few things:

  1. figure out the dependencies
  2. install them using the pip that is associated with the 3.11 python
  3. run the script using that version

One way to accomplish 1 and 2 is for pipx itself to parse the dependencies and then call pip with the result. An alternative way to do that is if pip gained support for this metadata in the form of pip install --from-script-metadata my_script.py. That’s what I was getting at in the above comment. This sidesteps the syntax mismatch issue by always parsing with the correct version.

This would be a nice feature for another reason: someone who isn’t using pipx (or another script runner) can use that option to install the requirements directly. While the metadata has been designed for the “self-contained venv” use-case, it’s still plausible that people will want to install the script in an environment they’re using for other things.

I think that’d be a useful feature regardless of how the metadata is formatted and parsed. But of course the maintainers of pip [2] can make that decision later.


  1. as I understand it from your description, at least ↩︎

  2. whoever they are… :sweat_smile: ↩︎

Ah, I see. But pipx doesn’t just install the dependencies, it reads them itself to determine if there’s a cached environment that matches which can be reused. That basically makes the “pip reads the data” approach a non-starter for pipx at least, and I imagine other runners will want to do a similar thing. It also means that error handling will be a mess - pip can’t know what pipx is trying to do, so the message won’t be ideal, and pipx definitely doesn’t want to start parsing pip’s messages.
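To illustrate, the reuse check is roughly of this shape (a sketch with invented names, not pipx’s actual code), which is exactly why the runner itself has to read the metadata before any installer is involved:

import hashlib
from pathlib import Path

VENV_ROOT = Path.home() / ".local" / "share" / "some-runner" / "venvs"

def venv_for(dependencies: list[str]) -> Path:
    # Key the cached environment on the (sorted) dependency list so an
    # unchanged script reuses an existing venv instead of rebuilding it.
    key = hashlib.sha256("\n".join(sorted(dependencies)).encode()).hexdigest()[:16]
    venv = VENV_ROOT / key
    if not venv.exists():
        # Only here would the runner create the venv and call an installer.
        ...
    return venv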

I actually like pip install --from-script-dependencies better for PEP 722, where the data is clearly and explicitly only dependencies. There’s an ongoing debate about the semantics[1] of pip install from a pyproject.toml, and making it install the dependency section for a script, but potentially something else for a project, is just going to be a confusing mess.

Edit: Although given that you can write a trivial wrapper as shown below, it seems a bit pointless to add a whole new option to pip for it.

import sys
import subprocess

# Insert the 25-line reference implementation of PEP 722 here

script = sys.argv[1]
pip_args = sys.argv[2:]

subprocess.run(
    [sys.executable, "-m", "pip", "install"]
    + pip_args
    + [str(req) for req in get_script_dependencies(script)]
)

  1. around dynamic dependencies ↩︎

Speaking on behalf of Ruff, I think we can support any of the proposals described here and would plan to do so were the PEP to be accepted.

Tools serving purposes unrelated to packaging (such as linters or code formatters, but not build frontends) which accept both single files and directories, and which can be configured through the [tool.toolname] table of a pyproject.toml file when invoked on a directory, SHOULD also be configurable via [tool.toolname] in the __pyproject__ of a single file.

I honestly want this because it’s much easier to implement and will have less of an impact on performance (since we only need to check for __pyproject__ in very limited contexts), though I do think it sounds like a confusing user experience. (For example, not only would ruff . and ruff /path/to/file.py produce different results, but so would ruff . vs. pre-commit IIUC, since pre-commit passes individual files.) We can support either setup though.
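For example, per-file configuration under that model would presumably look something like this (the settings themselves are just illustrative):

__pyproject__ = """
[tool.ruff]
line-length = 100
select = ["E", "F", "I"]
"""

with the rest of the script following as usual.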

The TOML document MUST NOT contain multi-line double-quoted strings, as that would conflict with the Python string containing the document. Single-quoted multi-line TOML strings may be used instead.

Broadly, from an implementation perspective, the stricter the spec, the better… For example, it would be nice to know that it has to be triple-quoted (as in the PEP), but also that it can’t contain an implicit concatenation (if we’re encouraged to use a regex), that there can’t be multiple __pyproject__ assignments (for correctness and removal of any ambiguity), etc. Again, purely from an implementation perspective, it’d be nice if it had to appear in the first N lines of the file, like the encoding pragma… but, that’s probably not a desirable user experience.

(Relatedly, how does this interact with __future__ imports? I’d assume you want this to be at the top of the file, before imports, by convention (as in the PEP examples), but you can’t have assignments before __future__ imports.)
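That is, something like this is rejected by the compiler outright:

# Only a docstring and comments may precede a __future__ import,
# so the metadata assignment can't come first.
__pyproject__ = """
[project]
dependencies = ["rich"]
"""
from __future__ import annotations   # SyntaxError here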

Anyway, we ship with a full Python parser, so it’s actually not super important to me whether we’re required to parse the AST or not in order to extract the TOML string. It would be nice if we could do it with a regex, but again, not strictly required for us. (Either way, we’d likely do a fast plaintext search to see if __pyproject__ appears in the source, then do a slower search to actually extract it.)

P.S. Apologies if I’ve missed any of the nuance in the discussion, there was a lot to read through.

8 Likes

May I suggest that we reframe this PEP not as a proposal for “single-file scripts” but rather “single-file projects”?
I think that would clarify the difference of purpose and approach between this and PEP 722. The motivation behind PEP 722 appears to be very narrow (I hope that characterization is accurate). Scripts are a particularly strict subset of Python programs, and it’s becoming clear there are other use-cases which motivate this PEP.

If the target use-case were only a single-file script with embedded dependencies, then build-system should be invalid to include in this context, as in the first draft.

I don’t think it’s unfair to say that PEP 722 is specified “for scripts” and considers all other use-cases either future considerations or out of scope. And that PEP 723 is “for projects” and considers any use-case for embedded metadata in a python file to be in-scope.

Is any of this inaccurate? Is there any reason not to retitle 723?

7 Likes

That’s a pretty good idea actually, nice

edit: I’m going to land the existing PR and then that will be I think my final change

1 Like

We can’t make that assumption due to the [tool] tables. Tools can ask to have whatever they want in there, so restricting what subset of TOML can be specified will have an impact. I’m not saying the PEP can’t restrict it, just that we can’t just brush it aside like there are no consequences.

I would personally be fine with a regex that defines exactly how to extract the TOML.
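Something of this shape, roughly (an illustrative pattern, not necessarily what the PEP would end up specifying):

import re

# Match a module-level __pyproject__ = """...""" assignment and capture the
# TOML between the quotes.
PYPROJECT_RE = re.compile(
    r'^__pyproject__\s*=\s*"""$(.+?)^"""$',
    re.MULTILINE | re.DOTALL,
)

def read_embedded_toml(source: str):
    match = PYPROJECT_RE.search(source)
    return match.group(1) if match else None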

I personally had hopes, but I don’t know if I specifically had expectations. Plus I think it was Black that jumped ahead and used [tool] before we opened it to non-build tools (hence why we back-patched PEP 518 to allow it).

It’s because it is a string literal with a prefix modifier. Just because the representation in running code is different doesn’t change the fact that it’s a “literal” just like all dict and set constants are also called “literals”. And in the string case, think of it like having \x or even \n render differently than how you write the string literal.

That answers that response the way I was going to. :grin: Assume someone is going to have to write this in TypeScript. Now what does that do about your requirements?

Why do you assume pip will be used? We have to be thinking in decades, not today. That means you can’t assume pip will be the installer used as it might not be the most popular installer at that point. What if someone wrote an installer in Rust? Or I wrote a new installer that only followed standards (which I have actually been doing)?

:slightly_smiling_face: As I said in the PEP 722 thread, that target is aspirational, so if it slips it’s not the end of the world. I’m hoping both discussions settle down by the end of this week so it won’t have to slip, but it isn’t a problem if it does in order to make sure both PEPs reach a steady state with their proponents.

1 Like

I was discussing an existing pipx feature, so it seemed safe to refer to how it currently works. :sweat_smile:

In any case, the point was just that whatever installer was invoked, it would need to handle the relevant version of python–as long as installing a package might involve running some of its code, I would think that’ll continue to be true?

This seems to indicate to me that PEP 723 falls into the trap of adding another way to do things for an existing use case, since projects are of course already handled by existing tooling. PEP 722 on the other hand is about a so far unaddressed, very narrow use case. I’m not sure single-file projects are common enough to warrant this additional complexity.

5 Likes

Probably, but as @jeanas pointed out, that requires creating a subprocess just to parse this information, which will add to the overhead and probably cause any environment caching to take a costly performance hit. And subprocesses are notoriously expensive on Windows, so the fewer subprocesses the better.

Plus you wouldn’t believe the crazy things I have seen people do to their Python installation (including customizing the build, stuff they do in their sitecustomize.py, etc.), all of which makes relying on the Python interpreter itself to do work on your behalf a slightly riskier thing than just pointing that interpreter at some code and saying, “run it” (and luckily virtual environment creation doesn’t require executing Python code at all).

2 Likes

That did cause a bit of an incident in the community, though (and to be honest it still does nowadays). People who did not fully understand the consequences of adding a pyproject.toml file to their repositories started to experience problems with build isolation when they tried to configure Black. A very similar thing happened recently with PEP 621 and requires-python: the cibuildwheel docs accidentally incentivised people to set project.requires-python to influence the behaviour of cibuildwheel. But then people did not fully understand the consequences of opting into PEP 621 (and all the checks and restrictions it brings, especially regarding dynamic behaviour), which caused a lot of breakages.

I think these are good anecdotes that illustrate that it is important to consider the unintended consequences of reusing a standard/file/format that was not originally designed for a given purpose.

4 Likes

As it is now, PEP 723 contains:

Replaces: 722

Should there be a section dedicated to explaining how one can use PEP 723 to execute the exact same use case(s) as PEP 722? Sorry if it is already in the document and I missed it.

I guess the explanation as to why PEP 723 replaces PEP 722 is in there somewhere in the Rejected Ideas.

You can’t use PEP 723 to “execute” something, nor PEP 722 for that matter. Both only define a way to embed metadata in a single-file script. What tools do with that metadata is up to the tools.

PEP 722 only allows embedding dependencies, PEP 723 allows embedding richer metadata that includes dependencies among other things, so it should be clear that PEP 723 covers a superset of the use cases that PEP 722 covers.

I think the latest preview should address that: PEP 723: Embedding pyproject.toml in single-file scripts - #38 by ofek

1 Like