PEP 723: Embedding pyproject.toml in single-file scripts

Speaking on behalf of Ruff, I think we can support any of the proposals described here and would plan to do so were the PEP to be accepted.

Tools serving purposes unrelated to packaging (such as linters or code formatters, but not build frontends) which accept both single files and directories, and which can be configured through the [tool.toolname] table of a pyproject.toml file when invoked on a directory, SHOULD also be configurable via [tool.toolname] in the __pyproject__ of a single file.

I honestly want this because it’s much easier to implement and will have less of an impact on performance (since we only need to check for __pyproject__ in very limited contexts), though I do think it sounds like a confusing user experience. (For example, not only would ruff . and ruff /path/to/file.py produce different results, but so would ruff . vs. pre-commit IIUC, since pre-commit passes individual files.) We can support either setup though.

The TOML document MUST NOT contain multi-line double-quoted strings, as that would conflict with the Python string containing the document. Single-quoted multi-line TOML strings may be used instead.
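To make the conflict concrete, here is a minimal sketch using only the standard-library ast module. The helper name pyproject_value and both sample scripts are made up for illustration; they are not taken from the PEP.

```python
import ast


def pyproject_value(source: str) -> str:
    """Return the string assigned to __pyproject__, as Python itself sees it."""
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "__pyproject__"
        ):
            return ast.literal_eval(node.value)
    raise LookupError("no __pyproject__ assignment found")


# Hypothetical script: the TOML multi-line basic string (""") closes the
# surrounding Python string early, so the assigned value is cut short.
BROKEN = (
    '__pyproject__ = """\n'
    "[project]\n"
    'description = """\n'
    "multi\n"
    '"""\n'
    '"""\n'
)

# Hypothetical script: the TOML multi-line literal string (''') does not
# collide with the Python delimiter, so the whole document survives.
WORKING = (
    '__pyproject__ = """\n'
    "[project]\n"
    "description = '''\n"
    "multi\n"
    "'''\n"
    '"""\n'
)

broken_value = pyproject_value(BROKEN)
working_value = pyproject_value(WORKING)
```

In this sketch the broken script still parses as Python, which is arguably worse than a syntax error: the value is silently truncated after "description = ".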

Broadly, from an implementation perspective, the stricter the spec, the better… For example, it would be nice to know that it has to be triple-quoted (as in the PEP), but also that it can’t contain an implicit concatenation (if we’re encouraged to use a regex), that there can’t be multiple __pyproject__ assignments (for correctness and removal of any ambiguity), etc. Again, purely from an implementation perspective, it’d be nice if it had to appear in the first N lines of the file, like the encoding pragma… but that’s probably not a desirable user experience.

(Relatedly, how does this interact with __future__ imports? I’d assume you want this to be at the top of the file, before imports, by convention (as in the PEP examples), but you can’t have assignments before __future__ imports.)
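The ordering constraint can be checked with the built-in compile(). This is only a sketch: the helper name compiles and the three sample scripts are invented for the demonstration.

```python
def compiles(source: str) -> bool:
    """Return True if the source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<script>", "exec")
    except SyntaxError:
        return False
    return True


# Hypothetical script: metadata placed at the very top, before the future import.
ASSIGN_FIRST = (
    '__pyproject__ = """\n'
    "[project]\n"
    '"""\n'
    "from __future__ import annotations\n"
)

# Hypothetical script: future import first, metadata second.
FUTURE_FIRST = (
    "from __future__ import annotations\n"
    '__pyproject__ = """\n'
    "[project]\n"
    '"""\n'
)

# A module docstring is the one string allowed ahead of a future import.
DOCSTRING_FIRST = (
    '"""Module docstring."""\n'
    "from __future__ import annotations\n"
    '__pyproject__ = """\n'
    "[project]\n"
    '"""\n'
)

results = {
    "assign_first": compiles(ASSIGN_FIRST),
    "future_first": compiles(FUTURE_FIRST),
    "docstring_first": compiles(DOCSTRING_FIRST),
}
```

So a script that uses __future__ imports would have to put __pyproject__ after them, which is at odds with a "metadata at the top" convention.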

Anyway, we ship with a full Python parser, so it’s actually not super important to me whether we’re required to parse the AST or not in order to extract the TOML string. It would be nice if we could do it with a regex, but again, not strictly required for us. (Either way, we’d likely do a fast plaintext search to see if __pyproject__ appears in the source, then do a slower search to actually extract it.)
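For illustration, the two-phase approach could look roughly like the sketch below. It is written in Python with the standard-library ast module and is not how Ruff actually implements anything; the function name extract_pyproject and the decision to look only at top-level assignments are assumptions.

```python
import ast
from typing import Optional

MARKER = "__pyproject__"


def extract_pyproject(source: str) -> Optional[str]:
    """Return the embedded TOML text, or None if the file has none."""
    # Fast path: a plain substring search rules out most files cheaply.
    if MARKER not in source:
        return None

    # Slow path: parse the file and look for a top-level assignment of a
    # string constant to the marker name.
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if MARKER not in names:
            continue
        value = node.value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return value.value
    return None


no_marker = extract_pyproject("print('hi')\n")
comment_only = extract_pyproject("# __pyproject__ is only mentioned here\n")
embedded = extract_pyproject('__pyproject__ = """\n[tool.ruff]\n"""\n')
```

The substring check can produce false positives (a comment or docstring mentioning the name), which is why the slower pass still has to confirm that a real assignment exists.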

P.S. Apologies if I’ve missed any of the nuance in the discussion, there was a lot to read through.

May I suggest that we reframe this PEP not as a proposal for “single-file scripts” but rather “single-file projects”?
I think that would clarify the difference of purpose and approach between this and PEP 722. The motivation behind PEP 722 appears to be very narrow (I hope that characterization is accurate). Scripts are a particularly strict subset of python programs, and it’s becoming clear there are other use-cases which motivate this PEP.

If the target use-case were only a single-file script with embedded dependencies, then build-system should be invalid to include in this context, as in the first draft.

I don’t think it’s unfair to say that PEP 722 is specified “for scripts” and considers all other use-cases either future considerations or out of scope. And that PEP 723 is “for projects” and considers any use-case for embedded metadata in a python file to be in-scope.

Is any of this inaccurate? Is there any reason not to retitle 723?

That’s a pretty good idea actually, nice

edit: I’m going to land the existing PR and then that will be, I think, my final change

We can’t make that assumption due to the [tool] tables. Tools can ask to have whatever they want in there, so restricting what subset of TOML can be specified will have an impact. I’m not saying the PEP can’t restrict it, just that we can’t just brush it aside like there are no consequences.

I would personally be fine with a regex that defines exactly how to extract the TOML.

I personally had hopes, but I don’t know if I specifically had expectations. Plus I think it was Black that jumped ahead and used [tool] before we opened it to non-build tools (hence why we back-patched PEP 518 to allow it).

It’s because it is a string literal with a prefix modifier. Just because the representation in running code is different doesn’t change the fact that it’s a “literal” just like all dict and set constants are also called “literals”. And in the string case, think of it like having \x or even \n render differently than how you write the string literal.

That answers that response the way I was going to. :grin: Assume someone is going to have to write this in TypeScript. Now what does that do about your requirements?

Why do you assume pip will be used? We have to be thinking in decades, not today. That means you can’t assume pip will be the installer used as it might not be the most popular installer at that point. What if someone wrote an installer in Rust? Or I wrote a new installer that only followed standards (which I have actually been doing)?

:slightly_smiling_face: As I said in the PEP 722 thread, that target is aspirational, so if it slips it’s not the end of the world. I’m hoping both discussions settle down by the end of this week so it won’t have to slip, but it isn’t a problem if it does in order to make sure both PEPs reach a steady state with their proponents.

I was discussing an existing pipx feature, so it seemed safe to refer to how it currently works. :sweat_smile:

In any case, the point was just that whatever installer was invoked, it would need to handle the relevant version of Python. As long as installing a package might involve running some of its code, I would think that’ll continue to be true?

This seems to indicate to me that PEP 723 falls into the trap of adding another way to do things for an existing use case, since projects are of course already handled by existing tooling. PEP 722 on the other hand is about a so far unaddressed, very narrow use case. I’m not sure single-file projects are common enough to warrant this additional complexity.

Probably, but as @jeanas pointed out, that requires creating a subprocess just to parse this information which will add to the overhead and probably cause any environment caching to take a costly performance hit. And subprocesses are notoriously expensive on Windows, so the fewer subprocesses the better.

Plus you wouldn’t believe the crazy things I have seen people do to their Python installation (including customizing the build, stuff they do in their sitecustomize.py, etc.), all of which makes relying on the Python interpreter itself that you are executing to do work on your behalf always a slightly risky thing to do compared to just pointing that interpreter at some code and saying, “run it” (and luckily virtual environment creation doesn’t require executing Python code at all).

That did cause a bit of an incident in the community, though (and to be honest it still does nowadays). People who did not fully understand the consequences of adding a pyproject.toml file to their repositories started to experience problems with build isolation when they tried to configure black. A very similar thing happened recently with PEP 621 and requires-python: the cibuildwheel docs accidentally incentivised people to set project.requires-python to influence the behaviour of cibuildwheel. But then people did not fully understand the consequences of opting into PEP 621 (and all the checks and restrictions it brings, especially regarding the dynamic behaviour), which caused a lot of breakages.

I think these are good anecdotes that illustrate that it is important to consider the unintended consequences of reusing a standard/file/format that was not originally designed for a given purpose.

As it is now PEP 723 contains:

Replaces: 722

Should there be a section dedicated to explaining how one can use PEP 723 to execute the exact same use case(s) as PEP 722? Sorry if it is already in the document and I missed it.

I guess the explanation as to why PEP 723 replaces PEP 722 is in there somewhere in the Rejected Ideas.

You can’t use PEP 723 to “execute” something, nor PEP 722 for that matter. Both only define a way to embed metadata in a single-file script. What tools do with that metadata is up to the tools.

PEP 722 only allows embedding dependencies, PEP 723 allows embedding richer metadata that includes dependencies among other things, so it should be clear that PEP 723 covers a superset of the use cases that PEP 722 covers.

I think the latest preview should address that: PEP 723: Embedding pyproject.toml in single-file scripts - #38 by ofek

I forgot to mention, but since other tools have chimed in I think I should on behalf of Hatch as well. Hatch will definitely implement this PEP should it be accepted and will begin experimenting with what building packages from scripts would look like. Additionally, the feature that I’ve been working on in my free time is actually Python management so it would be one such tool that would actually use the Python version requirement that users set to automatically set everything up for users’ scripts.

Right. A better choice of word could have been “cover the use case”, maybe? Sorry for the confusion.

+1 to this, but also a note:
It’s not just a matter of people’s preferences, but also of what the tooling ecosystem provides out of the box. I would bet that most pre-commit users are blind to some or all of the details of its invocation pattern.
Let’s not assume that users can make informed choices regarding use of the new metadata. More likely they’ll try things which work with one tool (e.g. ruff) and then be surprised when they don’t work with another tool or have different semantics.

pre-commit’s behavior also poses an issue in describing directory-oriented vs file-oriented behavior. Because filenames are always explicitly passed, there’s no such thing as directory-based behavior. I would be concerned that ruff src/ and pre-commit run -a could have different results, confusing users.

I think this needs some plan in order to minimize surprises.

However, I’m concerned that trying to specify anything in the PEP about precedence or precise tool behaviors will lead to ambiguities or outright conflicts with the ways that tools are implemented.

I would argue for the following approach, although I’m still scratching my head about exactly what’s right:

  • recommend (“SHOULD”) that tools warn or error if there is both automatically discovered configuration (pyproject.toml/setup.cfg/toolnamerc) and per-file configuration, with no explicit option chosen
  • suggest (“MAY”) that tools provide configs or options for tuning that behavior (e.g. a config for “prefer_file_settings = true”)

The thought here is that as a tool author I want to know what I’m supposed to do, but the PEP should not assume any particular invocation pattern or knowledge of the end user. This sets up a situation in which tools can experiment and converge on sensible implementations, but if a user does something ambiguous, the PEP says “tell the user about it, don’t silently accept it with some built in behavior”.
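As a rough sketch of what that could mean for a tool author: the code below refuses to pick a winner unless the user has chosen one. Everything in it is hypothetical, including the names resolve_config and AmbiguousConfigError, and the option name is simply borrowed from the “prefer_file_settings” example above. A tool could just as well warn instead of raising.

```python
from typing import Any, Dict, Optional

Config = Dict[str, Any]


class AmbiguousConfigError(Exception):
    """Both configuration sources exist and the user has not chosen one."""


def resolve_config(
    discovered: Optional[Config],
    per_file: Optional[Config],
    prefer_file_settings: Optional[bool] = None,
) -> Config:
    """Pick the configuration to use, refusing to guess when it is ambiguous."""
    # Only one source (or none) is present: nothing to decide.
    if discovered is None:
        return per_file if per_file is not None else {}
    if per_file is None:
        return discovered

    # Both are present: require an explicit choice instead of a silent default.
    if prefer_file_settings is None:
        raise AmbiguousConfigError(
            "found both project-level and per-file configuration; "
            "set prefer_file_settings to choose one"
        )
    return per_file if prefer_file_settings else discovered


only_project = resolve_config({"line-length": 100}, None)
only_file = resolve_config(None, {"line-length": 80})
explicit = resolve_config(
    {"line-length": 100}, {"line-length": 80}, prefer_file_settings=True
)
```

The same check gives identical results whether the tool was handed a directory or a list of files, which is the property that matters for the pre-commit case.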

I’m sure lots of users would ask for one behavior or the other as a default for any tool/use-case combination. And the PEP could always be revised later if some behavior becomes a de facto standard. But this approach starts by trying to forbid confusing mixtures of tool defaults altogether.

The latest version of the PEP is up: PEP 723 – Inline script metadata | peps.python.org

I think that is a good compromise between guidance and tool choice, I will add that in the final PR later tonight.

Another potential parsing problem I thought of. If some code wants to use backslash-escapes in the embedded pyproject.toml, for instance

[project]
summary = "A short description containing a \\ character"

then the regex given in the PEP will parse the following code correctly:

__pyproject__ = """
[project]
summary = "A short description containing a \\ character"
"""

However, at runtime, the __pyproject__ variable will contain an unescaped backslash, making it invalid TOML. And because the PEP doesn’t allow raw strings, this isn’t fixable by the user. Even if the PEP did allow raw strings, I imagine it would be a very common error for users to forget to add the r""". If the PEP required raw strings, I imagine there would be an awful lot of bug reports from people who forget the r and don’t understand why their __pyproject__ is getting ignored (remember - a significant proportion of the target user base will have a relatively superficial knowledge of Python).

And if the PEP allowed tools to parse the assignment using the AST, the two approaches (regex and AST) would give different results. Given the way the PEP is worded, I expect tools that like the idea of using the AST might well do so anyway, treating the description of the regex as “canonical” as meaning it’s the reference implementation, not that it’s mandatory.

This all feels horribly like the sort of nested quoting nightmares that everyone loves to complain about with shell scripting. And I’m afraid I can’t think of an effective resolution, short of making unreasonable restrictions like “the TOML may not contain backslashes or backslash escapes” or “the __pyproject__ variable cannot be referenced at runtime” :slightly_frowning_face:
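Here is a small sketch of the mismatch. The pattern is a simplified stand-in written for this demonstration, not necessarily the exact regex in the PEP, and the script text is the example from above.

```python
import ast
import re

# Simplified, assumed pattern: a line starting the assignment with """,
# then everything up to a line consisting only of """.
PATTERN = re.compile(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$')

# The script exactly as it would appear on disk (two backslash characters).
SCRIPT = "\n".join(
    [
        '__pyproject__ = """',
        "[project]",
        r'summary = "A short description containing a \\ character"',
        '"""',
        "",
    ]
)

# What a regex-based tool sees: the raw source text between the delimiters.
match = PATTERN.search(SCRIPT)
regex_text = match.group(1) if match else ""

# What the running script sees: Python has already processed the escape.
assignment = ast.parse(SCRIPT).body[0]
runtime_text = ast.literal_eval(assignment.value)

regex_backslashes = regex_text.count("\\")
runtime_backslashes = runtime_text.count("\\")
```

The regex view keeps both backslashes (a valid TOML escape), while the runtime value has only one, followed by a space.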

Also,

When there are multiple __pyproject__ variables defined, tools MUST produce an error.

How are tools meant to do this? What about

'''This script processes "embedded pyproject" files. We look for text of the form

__pyproject__ = """
... something here
"""

and extract it.
'''

__pyproject__ = """
[project]
dependencies = ['tomli; python_version < "3.10"']
"""

# etc.

Actually, how is a tool meant to parse this code correctly? You can’t do what PEP 722 does and say multiple blocks are allowed but you ignore the second and subsequent ones, because you can’t reorder the code to put an assignment before the docstring. You have to simply say that this isn’t allowed, which I guess is acceptable but not ideal.
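To illustrate with the script above: a line-oriented regex and the Python parser disagree about how many __pyproject__ definitions it contains. The pattern is again a simplified assumption rather than the PEP’s exact regex.

```python
import ast
import re

# Simplified, assumed pattern (not necessarily the one in the PEP).
PATTERN = re.compile(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$')

SCRIPT = "\n".join(
    [
        "'''This script processes \"embedded pyproject\" files. "
        "We look for text of the form",
        "",
        '__pyproject__ = """',
        "... something here",
        '"""',
        "",
        "and extract it.",
        "'''",
        "",
        '__pyproject__ = """',
        "[project]",
        "dependencies = ['tomli; python_version < \"3.10\"']",
        '"""',
        "",
    ]
)

# The regex cannot tell that the first occurrence sits inside a docstring.
regex_matches = PATTERN.findall(SCRIPT)

# The parser sees a docstring followed by exactly one real assignment.
ast_assignments = [
    node
    for node in ast.parse(SCRIPT).body
    if isinstance(node, ast.Assign)
    and any(
        isinstance(target, ast.Name) and target.id == "__pyproject__"
        for target in node.targets
    )
]
```

A regex-only tool following the “MUST produce an error” rule would reject this file, even though Python itself sees a single assignment.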

Also, I hope I got the quoting of that requirement with a marker correct. I’m pretty sure I did, but I had to check PEP 508, which suggests that this might be another case where people find the (nested) quoting rules tricky and/or frustrating. Or you can assume I’m simply dumber than the average user over things like this - that may well be true :slightly_smiling_face:

Rereading, the PEP is clear that using the regex is mandatory. So tools parsing using the AST are simply wrong. Apologies for misreading the spec. I don’t know if it’s worth trying to improve the wording - to be honest, few people will actually read the spec, so to an extent the real question here is whether tool writers might guess wrong about how to do the parsing. But we can’t prevent people from writing software with bugs, so I wouldn’t worry too much about this.

It’s a side issue anyway - the main point is that runtime behaviour is different, and that people will interpret the embedded pyproject based on their expectation of the runtime behaviour.

I think this might not be an issue for most people, not because they will know the syntax offhand, but because the change in difficulty isn’t noticeable.

As a moderately experienced user, but not the maintainer of any packaging tools, I’m constantly checking docs and examples when I need non-trivial PEP 508 strings. The existence of some nonstandard extensions and other syntaxes (poetry) makes it hard to keep it all in my head at once.


At the very least, supporting raw strings seems like a good idea. Anything else leads to nightmare fuel.

I need to read the current proposal again to see what it says on the matter, but there’s a lot of benefit to this all going into a comment, rather than a special variable. Some of the confusing cases for that were addressed in the 722 discussion, and users can always call a PEP 723 parser if they want the data at runtime.

(As I was writing this, @sirosen got there first :slight_smile: )

At the risk of crossing the streams a bit: there isn’t much justification for using a python variable in the first place. It would be accessible at runtime, but it’s unclear what anyone would use that information for. I’m sure there are possibilities but I can’t think of them and the PEP doesn’t suggest any.

So an alternative is to put the TOML in a comment block similar to PEP 722. This would address many of the concerns @pf_moore raises above, but maybe it muddies the waters a bit.

Embedding TOML in a comment block seems very similar to the 722 proposal, it’s almost at the level of bikeshedding the format. Instead of # Script Dependencies you look for # [project] [1]. Instead of listing packages like it’s a requirements.txt file you have a list as in pyproject.toml–there’s more syntax but it’s a standardized format that can be reused.

# [project]
# dependencies = [
#     "requests",
#     "rich",        # Needed for the output
#     
#    # Not needed - just to show that fragments in URLs do not
#    # get treated as comments
#    "pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686"
# ]

With any text editor or IDE that can handle block commenting, this is easy to copy/paste in and out of a pyproject.toml
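A tool could do the same uncommenting mechanically. The sketch below is only one possible reading of the idea: the function name uncomment_block is invented, and treating “# [project]” as the start marker and the first non-comment line as the end are assumptions.

```python
from typing import List


def uncomment_block(source: str, start_marker: str = "# [project]") -> str:
    """Return the TOML text of a comment block, or "" if there is none."""
    lines = source.splitlines()
    try:
        start = lines.index(start_marker)
    except ValueError:
        return ""

    block: List[str] = []
    for line in lines[start:]:
        if not line.startswith("#"):
            break  # the first non-comment line ends the block
        body = line[1:]  # drop the leading "#"
        # Drop the single space that conventionally follows the "#".
        block.append(body[1:] if body.startswith(" ") else body)
    return "\n".join(block) + "\n"


# Hypothetical script using the comment-block form.
SCRIPT = "\n".join(
    [
        "# [project]",
        "# dependencies = [",
        '#     "requests",',
        '#     "rich",  # Needed for the output',
        "# ]",
        "",
        "import requests",
        "",
    ]
)

toml_text = uncomment_block(SCRIPT)
```

The resulting text can be handed to any TOML parser (for example tomllib on Python 3.11+), and inline TOML comments such as “# Needed for the output” come through unchanged.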


  1. In this scenario I’d argue to just require this format rather than allow the various alternatives. Maybe a higher-level marker would be useful if people want to use [tool].
