PEP 723: Embedding pyproject.toml in single-file scripts

Okay okay, don’t parse the AST :sweat_smile: I want to respond to a couple things and then I’ll wrap up.

In that scenario, it’s going to invoke the specified python’s own version of pip, yes? Furthermore, it makes sure pip is up to date. It seems like the obstacle for that working is pip having support for parsing embedded metadata. I certainly hope that happens, regardless of what version of this idea is eventually adopted.

A performance-sensitive tool should only parse the metadata if the file changes, not on every invocation. Outside of development (when the runner isn’t even needed, really) that should be very rare, I’d think?
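
As a sketch of that only-reparse-on-change idea (the cache shape and function names here are hypothetical, not from any particular tool):

```python
import os

# Hypothetical in-process cache: script path -> (mtime, parsed metadata)
_metadata_cache = {}

def get_metadata(path, parse):
    """Re-run the (possibly expensive) metadata parse only when the file changes."""
    mtime = os.path.getmtime(path)
    cached = _metadata_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # cache hit: skip the parse entirely
    metadata = parse(path)
    _metadata_cache[path] = (mtime, metadata)
    return metadata
```

A real tool would likely persist this cache to disk keyed on path and mtime (or a content hash), but the shape of the check is the same.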

For that matter there is at least one very fast AST parser out there, which will continue to be updated for new syntax as it is introduced. For tools that need the performance, it’s available without invoking python.

If the tool doesn’t know up-front what version of Python to use, it’s going to fail when the script runs. It’s true that this PEP supports requires-python, but as far as I can tell there isn’t a proposal to install a new interpreter. So either way this ends in an error, and the important thing is to provide a good clue to the cause (like “we found a SyntaxError, are you using the right Python version?”).

I understand the concerns here, but I wanted to hear the various arguments in more detail because it didn’t feel like such an obstacle to me (mostly I was responding to Paul’s post about that section).

I’m satisfied that parsing the AST has enough thorns that it shouldn’t be required for this to work; I brought it up mainly because I didn’t find the arguments against it all that convincing [1].

The main thing I was interested in was fixing the spec for __pyproject__ so that it doesn’t become a pseudo-TOML, and users can freely copy between real toml files and the metadata without worrying about breaking formatting. I hope that can be done with better parsing. A short regex is nice but it doesn’t have to be the actual answer.


  1. I still don’t think it’s that big a deal but I’m happy to defer to the tool-writers about what they think is reasonable ↩︎

1 Like

I think I have incorporated all of the feedback so far:

For any new feedback unrelated to the changes please make that explicit so that I can address that here or in a follow-up PR.

Personally, I am quite content with how the document looks now and do not anticipate much changing other than potentially adding a section that documents what maintainers of various tools have said about this PEP (tomorrow Charlie will comment about Ruff).

@brettcannon, I’m certain that we will be able to meet your deadline of the 14th :slightly_smiling_face:

3 Likes

pip is unlikely to ever get support for parsing embedded metadata itself as it isn’t a script runner. The only exception would be if a pip run command was added, and that would almost certainly act like all other pip commands and run the script using the version of python used to run pip, ignoring any requires-python metadata.

Also, the pip-run tool doesn’t use virtual environments, so it has no way of supporting requires-python either…

I was trying to be succinct but clearly I didn’t communicate what I was thinking at all.

My line of reasoning went like this: if you’re using a 3.9 version of pipx to run a 3.11 script with this metadata, you need to provide it an external python 3.11 to use [1]. Then pipx needs to do a few things:

  1. figure out the dependencies
  2. install them using the pip that is associated with the 3.11 python
  3. run the script using that version
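
The three steps above could be sketched roughly like this (all names are hypothetical; `parse_dependencies` stands in for whatever metadata parser the runner uses):

```python
import subprocess
import sys

def run_script(script_path, target_python, parse_dependencies):
    # 1. figure out the dependencies from the embedded metadata
    deps = parse_dependencies(script_path)
    # 2. install them using the pip associated with the target interpreter,
    #    so resolution happens under the matching Python version
    if deps:
        subprocess.run(
            [target_python, "-m", "pip", "install", *deps],
            check=True,
        )
    # 3. run the script using that interpreter
    return subprocess.run([target_python, script_path], check=True)
```

Step 2 is the part that a hypothetical `pip install --from-script-metadata` flag would replace: pip itself would do step 1 under the correct interpreter.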

One way to accomplish 1 and 2 is for pipx itself to parse the dependencies and then call pip with the result. An alternative way to do that is if pip gained support for this metadata in the form of pip install --from-script-metadata my_script.py. That’s what I was getting at in the above comment. This sidesteps the syntax mismatch issue by always parsing with the correct version.

This would be a nice feature for another reason: someone who isn’t using pipx (or another script runner) can use that option to install the requirements directly. While the metadata has been designed for the “self-contained venv” use-case, it’s still plausible that people will want to install the script in an environment they’re using for other things.

I think that’d be a useful feature regardless of how the metadata is formatted and parsed. But of course the maintainers of pip [2] can make that decision later.


  1. as I understand it from your description, at least ↩︎

  2. whoever they are… :sweat_smile: ↩︎

Ah, I see. But pipx doesn’t just install the dependencies, it reads them itself to determine if there’s a cached environment that matches which can be reused. That basically makes the “pip reads the data” approach a non-starter for pipx at least, and I imagine other runners will want to do a similar thing. It also means that error handling will be a mess - pip can’t know what pipx is trying to do, so the message won’t be ideal, and pipx definitely doesn’t want to start parsing pip’s messages.
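
A minimal sketch of what that environment-matching might look like, assuming the runner keys its cache on a hash of the parsed dependency list (purely illustrative, not pipx’s actual scheme):

```python
import hashlib

def env_cache_key(dependencies):
    """Derive a stable cache key from a script's parsed dependency list."""
    # Sort first so the same set of requirements reuses the same venv
    # regardless of the order they appear in the metadata.
    canonical = "\n".join(sorted(dependencies))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

The point is that the runner must have the parsed dependencies in hand to compute such a key, which is why delegating all parsing to pip doesn’t work for this use case.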

I actually like pip install --from-script-dependencies better for PEP 722, where the data is clearly and explicitly only dependencies. There’s an ongoing debate about the semantics[1] of pip install from a pyproject.toml, and making it install the dependency section for a script, but potentially something else for a project, is just going to be a confusing mess.

Edit: Although given that you can write a trivial wrapper as shown below, it seems a bit pointless to add a whole new option to pip for it.

import sys
import subprocess

# Insert the 25-line reference implementation of PEP 722 here

script = sys.argv[1]
pip_args = sys.argv[2:]

subprocess.run(
    [sys.executable, "-m", "pip", "install"]
    + pip_args
    + [str(req) for req in get_script_dependencies(script)]
)

  1. around dynamic dependencies ↩︎

Speaking on behalf of Ruff, I think we can support any of the proposals described here and would plan to do so were the PEP to be accepted.

Tools serving purposes unrelated to packaging (such as linters or code formatters, but not build frontends) which accept both single files and directories, and which can be configured through the [tool.toolname] table of a pyproject.toml file when invoked on a directory, SHOULD also be configurable via [tool.toolname] in the __pyproject__ of a single file.

I honestly want this because it’s much easier to implement and will have less of an impact on performance (since we only need to check for __pyproject__ in very limited contexts), though I do think it sounds like a confusing user experience. (For example, not only would ruff . and ruff /path/to/file.py produce different results, but so would ruff . vs. pre-commit IIUC, since pre-commit passes individual files.). We can support either setup though.

The TOML document MUST NOT contain multi-line double-quoted strings, as that would conflict with the Python string containing the document. Single-quoted multi-line TOML strings may be used instead.

Broadly, from an implementation perspective, the stricter the spec, the better… For example, it would be nice to know that it has to be triple-quoted (as in the PEP), but also that it can’t contain an implicit concatenation (if we’re encouraged to use a regex), that there can’t be multiple __pyproject__ assignments (for correctness and removal of any ambiguity), etc. Again, purely from an implementation perspective, it’d be nice if it had to appear in the first N lines of the file, like the encoding pragma… but that’s probably not a desirable user experience.
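
As an illustration of why strictness helps implementers, a tool could cheaply reject ambiguous files up front, e.g. multiple assignments (the pattern here is illustrative only, not anything the PEP specifies):

```python
import re

# Illustrative: count top-level __pyproject__ assignments so a tool can
# reject ambiguous files outright instead of guessing which one "wins".
_ASSIGN_RE = re.compile(r"^__pyproject__\s*=", re.MULTILINE)

def check_single_assignment(source):
    """Return the assignment count; raise if the file is ambiguous."""
    count = len(_ASSIGN_RE.findall(source))
    if count > 1:
        raise ValueError(
            f"expected at most one __pyproject__ assignment, found {count}"
        )
    return count
```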

(Relatedly, how does this interact with __future__ imports? I’d assume you want this to be at the top of the file, before imports, by convention (as in the PEP examples), but you can’t have assignments before __future__ imports.)
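
The `__future__` constraint is real and easy to verify with `compile()`: the parser rejects any module-level statement other than a docstring before a `__future__` import, so an assignment cannot legally precede one.

```python
# An assignment before a "from __future__ import" is a SyntaxError,
# so __pyproject__ = "..." could not legally come first in such a file.
BAD = '__pyproject__ = ""\nfrom __future__ import annotations\n'
GOOD = 'from __future__ import annotations\n__pyproject__ = ""\n'

def compiles(source):
    """Return True if the source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<script>", "exec")
        return True
    except SyntaxError:
        return False
```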

Anyway, we ship with a full Python parser, so it’s actually not super important to me whether we’re required to parse the AST or not in order to extract the TOML string. It would be nice if we could do it with a regex, but again, not strictly required for us. (Either way, we’d likely do a fast plaintext search to see if __pyproject__ appears in the source, then do a slower search to actually extract it.)
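
That two-phase strategy might look roughly like the following sketch; the regex is illustrative only (it ignores string prefixes and single quotes), not the PEP’s normative pattern:

```python
import re

# Illustrative pattern for a triple-double-quoted __pyproject__ assignment;
# a real implementation would use whatever the PEP ultimately specifies.
_PYPROJECT_RE = re.compile(
    r'^__pyproject__\s*=\s*"""\n?(?P<toml>.*?)"""',
    re.MULTILINE | re.DOTALL,
)

def extract_embedded_toml(source):
    # Phase 1: fast plaintext check, since most files lack the marker.
    if "__pyproject__" not in source:
        return None
    # Phase 2: slower extraction of the embedded TOML document.
    match = _PYPROJECT_RE.search(source)
    return match.group("toml") if match else None
```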

P.S. Apologies if I’ve missed any of the nuance in the discussion, there was a lot to read through.

8 Likes

May I suggest that we reframe this PEP not as a proposal for “single-file scripts” but rather “single-file projects”?
I think that would clarify the difference of purpose and approach between this and PEP 722. The motivation behind PEP 722 appears to be very narrow (I hope that characterization is accurate). Scripts are a particularly strict subset of python programs, and it’s becoming clear there are other use-cases which motivate this PEP.

If the target use-case were only a single-file script with embedded dependencies, then it should be invalid to include build-system in this context, as in the first draft.

I don’t think it’s unfair to say that PEP 722 is specified “for scripts” and considers all other use-cases either future considerations or out of scope. And that PEP 723 is “for projects” and considers any use-case for embedded metadata in a python file to be in-scope.

Is any of this inaccurate? Is there any reason not to retitle 723?

7 Likes

That’s a pretty good idea actually, nice

edit: I’m going to land the existing PR and then that will be I think my final change

1 Like

We can’t make that assumption due to the [tool] tables. Tools can ask to have whatever they want in there, so restricting what subset of TOML can be specified will have an impact. I’m not saying the PEP can’t restrict it, just that we can’t just brush it aside like there are no consequences.

I would personally be fine with a regex that defines exactly how to extract the TOML.

I personally had hopes, but I don’t know if I specifically had expectations. Plus I think it was Black that jumped ahead and used [tool] before we opened it to non-build tools (hence why we back-patched PEP 518 to allow it).

It’s because it is a string literal with a prefix modifier. Just because the representation in running code is different doesn’t change the fact that it’s a “literal” just like all dict and set constants are also called “literals”. And in the string case, think of it like having \x or even \n render differently than how you write the string literal.

That answers that response the way I was going to. :grin: Assume someone is going to have to write this in TypeScript. Now what does that do about your requirements?

Why do you assume pip will be used? We have to be thinking in decades, not today. That means you can’t assume pip will be the installer used as it might not be the most popular installer at that point. What if someone wrote an installer in Rust? Or I wrote a new installer that only followed standards (which I have actually been doing)?

:slightly_smiling_face: As I said in the PEP 722 thread, that target is aspirational, so if it slips it’s not the end of the world. I’m hoping both discussions settle down by the end of this week so it won’t have to slip, but it isn’t a problem if it does in order to make sure both PEPs reach a steady state with their proponents.

1 Like

I was discussing an existing pipx feature, so it seemed safe to refer to how it currently works. :sweat_smile:

In any case, the point was just that whatever installer was invoked, it would need to handle the relevant version of Python; as long as installing a package might involve running some of its code, I would think that will continue to be true?

This seems to indicate to me that PEP 723 falls into the trap of adding another way to do things for an existing use case, since projects are of course already handled by existing tooling. PEP 722 on the other hand is about a so far unaddressed, very narrow use case. I’m not sure single-file projects are common enough to warrant this additional complexity.

5 Likes

Probably, but as @jeanas pointed out, that requires creating a subprocess just to parse this information which will add to the overhead and probably cause any environment caching to take a costly performance hit. And subprocesses are notoriously expensive on Windows, so the fewer subprocesses the better.

Plus you wouldn’t believe the crazy things I have seen people do to their Python installation (including customizing the build, stuff they do in their sitecustomize.py, etc.), all of which makes relying on the Python interpreter itself that you are executing to do work on your behalf always a slightly risky thing to do compared to just pointing that interpreter at some code and saying, “run it” (and luckily virtual environment creation doesn’t require executing Python code at all).

2 Likes

That did cause a bit of an incident in the community, though (and to be honest still does nowadays). People who did not fully understand the consequences of adding a pyproject.toml file to their repositories started to experience problems with build isolation when they tried to configure black. A very similar thing happened recently with PEP 621 and requires-python: the cibuildwheel docs accidentally incentivised people to set project.requires-python to influence the behaviour of cibuildwheel. But people did not fully understand the consequences of opting into PEP 621 (and all the checks and restrictions it brings, especially regarding dynamic behaviour), which caused a lot of breakages.

I think these are good anecdotes that illustrate that it is important to consider the unintended consequences of reusing a standard/file/format that was not originally designed for a given purpose.

4 Likes

As it is now PEP 723 contains:

Replaces: 722

Should there be a section dedicated to explaining how one can use PEP 723 to execute the exact same use case(s) as PEP 722? Sorry if it is already in the document and I missed it.

I guess the explanation as to why PEP 723 replaces PEP 722 is in there somewhere in the Rejected Ideas.

You can’t use PEP 723 to “execute” something, nor PEP 722 for that matter. Both only define a way to embed metadata in a single-file script. What tools do with that metadata is up to the tools.

PEP 722 only allows embedding dependencies, PEP 723 allows embedding richer metadata that includes dependencies among other things, so it should be clear that PEP 723 covers a superset of the use cases that PEP 722 covers.

I think the latest preview should address that: PEP 723: Embedding pyproject.toml in single-file scripts - #38 by ofek

1 Like

I forgot to mention, but since other tools have chimed in I think I should on behalf of Hatch as well. Hatch will definitely implement this PEP should it be accepted and will begin experimenting with what building packages from scripts would look like. Additionally, the feature that I’ve been working on in my free time is actually Python management so it would be one such tool that would actually use the Python version requirement that users set to automatically set everything up for users’ scripts.

5 Likes

Right. A better choice of word could have been “cover the use case”, maybe? Sorry for the confusion.

+1 to this, but also a note:
It’s not just a matter of people’s preferences, but also of what the tooling ecosystem provides out of the box. I would bet that most pre-commit users are blind to some or all of the details of its invocation pattern.
Let’s not assume that users can make informed choices regarding use of the new metadata. More likely they’ll try things which work with one tool (e.g. ruff) and then be surprised when they don’t work with another tool or have different semantics.

pre-commit’s behavior also poses an issue in describing directory-oriented vs file-oriented behavior. Because filenames are always explicitly passed, there’s no such thing as directory-based behavior. I would be concerned that ruff src/ and pre-commit run -a could have different results, confusing users.

I think this needs some plan in order to minimize surprises.

However, I’m concerned that trying to specify anything in the PEP about precedence or precise tool behaviors will lead to ambiguities or outright conflicts with the ways that tools are implemented.

I would argue for the following approach, although I’m still scratching my head about exactly what’s right:

  • recommend (“SHOULD”) that tools warn or error if there is automatically discovered configuration (pyproject.toml/setup.cfg/toolnamerc) and per file configuration with no explicit option chosen
  • suggest (“MAY”) that tools provide configs or options for tuning that behavior (e.g. a config for “prefer_file_settings = true”)

The thought here is that as a tool author I want to know what I’m supposed to do, but the PEP should not assume any particular invocation pattern or knowledge of the end user. This sets up a situation in which tools can experiment and converge on sensible implementations, but if a user does something ambiguous, the PEP says “tell the user about it, don’t silently accept it with some built in behavior”.

I’m sure lots of users would ask for one behavior or the other as a default for any tool/use-case combination. And the PEP could always be revised later if some behavior becomes a de facto standard. But this approach starts by trying to forbid confusing mixtures of tool defaults altogether.

1 Like

The latest version of the PEP is up: PEP 723 – Embedding pyproject.toml in single-file scripts | peps.python.org

I think that is a good compromise between guidance and tool choice, I will add that in the final PR later tonight.

3 Likes