PEP 723: Embedding pyproject.toml in single-file scripts

Probably, but as @jeanas pointed out, that requires creating a subprocess just to parse this information which will add to the overhead and probably cause any environment caching to take a costly performance hit. And subprocesses are notoriously expensive on Windows, so the fewer subprocesses the better.

Plus you wouldn’t believe the crazy things I have seen people do to their Python installation (including customizing the build, stuff they do in their sitecustomize.py, etc.), all of which makes relying on the Python interpreter itself that you are executing to do work on your behalf always a slightly risky thing to do compared to just pointing that interpreter at some code and saying, “run it” (and luckily virtual environment creation doesn’t require executing Python code at all).

That did cause a bit of an incident in the community, though (and, to be honest, still does nowadays). People who did not fully understand the consequences of adding a pyproject.toml file to their repositories started to experience problems with build isolation when they tried to configure black. A very similar thing happened recently with PEP 621 and requires-python: the cibuildwheel docs accidentally incentivised people to set project.requires-python to influence the behaviour of cibuildwheel. But then people did not fully understand the consequences of opting into PEP 621 (and all the checks and restrictions it brings, especially regarding the dynamic behaviour), which caused a lot of breakages.

I think these are good anecdotes that illustrate that it is important to consider the unintended consequences of reusing a standard/file/format that was not originally designed for a given purpose.

As it is now PEP 723 contains:

Replaces: 722

Should there be a section dedicated to explaining how one can use PEP 723 to execute the exact same use case(s) as PEP 722? Sorry if it is already in the document and I missed it.

I guess the explanation as to why PEP 723 replaces PEP 722 is in there somewhere in the Rejected Ideas.

You can’t use PEP 723 to “execute” something, nor PEP 722 for that matter. Both only define a way to embed metadata in a single-file script. What tools do with that metadata is up to the tools.

PEP 722 only allows embedding dependencies, PEP 723 allows embedding richer metadata that includes dependencies among other things, so it should be clear that PEP 723 covers a superset of the use cases that PEP 722 covers.

I think the latest preview should address that: PEP 723: Embedding pyproject.toml in single-file scripts - #38 by ofek

I forgot to mention, but since other tools have chimed in I think I should on behalf of Hatch as well. Hatch will definitely implement this PEP should it be accepted and will begin experimenting with what building packages from scripts would look like. Additionally, the feature that I’ve been working on in my free time is actually Python management so it would be one such tool that would actually use the Python version requirement that users set to automatically set everything up for users’ scripts.

Right. A better choice of word could have been “cover the use case”, maybe? Sorry for the confusion.

+1 to this, but also a note:
It’s not just a matter of people’s preferences, but also of what the tooling ecosystem provides out of the box. I would bet that most pre-commit users are blind to some or all of the details of its invocation pattern.
Let’s not assume that users can make informed choices regarding use of the new metadata. More likely they’ll try things which work with one tool (e.g. ruff) and then be surprised when they don’t work with another tool or have different semantics.

pre-commit’s behavior also poses an issue in describing directory-oriented vs file-oriented behavior. Because filenames are always explicitly passed, there’s no such thing as directory-based behavior. I would be concerned that ruff src/ and pre-commit run -a could have different results, confusing users.

I think this needs some plan in order to minimize surprises.

However, I’m concerned that trying to specify anything in the PEP about precedence or precise tool behaviors will lead to ambiguities or outright conflicts with the ways that tools are implemented.

I would argue for the following approach, although I’m still scratching my head about exactly what’s right:

  • recommend (“SHOULD”) that tools warn or error if there is automatically discovered configuration (pyproject.toml/setup.cfg/toolnamerc) and per file configuration with no explicit option chosen
  • suggest (“MAY”) that tools provide configs or options for tuning that behavior (e.g. a config for “prefer_file_settings = true”)

The thought here is that as a tool author I want to know what I’m supposed to do, but the PEP should not assume any particular invocation pattern or knowledge of the end user. This sets up a situation in which tools can experiment and converge on sensible implementations, but if a user does something ambiguous, the PEP says “tell the user about it, don’t silently accept it with some built in behavior”.
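To make the SHOULD/MAY split concrete, here is a minimal sketch of how a tool might resolve the two sources of settings. Everything in it is hypothetical: `resolve_settings`, `AmbiguousConfigError`, and the `prefer_file_settings` option (borrowed from the example in the list above) are illustrations only, not something the PEP or any existing tool defines.

```python
class AmbiguousConfigError(Exception):
    """Raised when two config sources exist and the user made no explicit choice."""


def resolve_settings(project_settings, file_settings, prefer_file_settings=None):
    """Pick between discovered project config and per-file embedded config.

    project_settings: dict from pyproject.toml/setup.cfg/etc., or None if absent.
    file_settings: dict from the script's embedded metadata, or None if absent.
    prefer_file_settings: True/False if the user chose explicitly, None otherwise.
    """
    # Only one source (or none) is present: nothing ambiguous to report.
    if project_settings is None or file_settings is None:
        if file_settings is not None:
            return file_settings
        return project_settings if project_settings is not None else {}

    # Both sources are present and no explicit option was chosen: tell the
    # user instead of silently applying a built-in precedence rule.
    if prefer_file_settings is None:
        raise AmbiguousConfigError(
            "both project-level and per-file configuration were found; "
            "set prefer_file_settings explicitly"
        )

    return file_settings if prefer_file_settings else project_settings
```

A tool that prefers a softer landing could emit a warning at the same point instead of raising; the sketch errors only because that is the stricter of the two recommended behaviors.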

I’m sure lots of users would ask for one behavior or the other as a default for any tool/use-case combination. And the PEP could always be revised later if some behavior becomes a de facto standard. But this approach starts by trying to forbid confusing mixtures of tool defaults altogether.

The latest version of the PEP is up: PEP 723 – Embedding pyproject.toml in single-file scripts | peps.python.org

I think that is a good compromise between guidance and tool choice, I will add that in the final PR later tonight.

Another potential parsing problem I thought of. If some code wants to use backslash-escapes in the embedded pyproject.toml, for instance

[project]
summary = "A short description containing a \\ character"

then the regex given in the PEP will parse the following code correctly:

__pyproject__ = """
[project]
summary = "A short description containing a \\ character"
"""

However, at runtime, the __pyproject__ variable will contain an unescaped backslash, making it invalid TOML. And because the PEP doesn’t allow raw strings, this isn’t fixable by the user. Even if the PEP did allow raw strings, I imagine it would be a very common error for users to forget to add the r""". If the PEP required raw strings, I imagine an awful lot of bug reports from people who forget the r and don’t understand why their __pyproject__ is getting ignored (remember - a significant proportion of the target user base will have a relatively superficial knowledge of Python).

And if the PEP allowed tools to parse the assignment using the AST, the two approaches (regex and AST) would give different results. Given the way the PEP is worded, I expect tools that like the idea of using the AST might well do so anyway, treating the description of the regex as “canonical” as meaning it’s the reference implementation, not that it’s mandatory.

This all feels horribly like the sort of nested quoting nightmares that everyone loves to complain about with shell scripting. And I’m afraid I can’t think of an effective resolution, short of making unreasonable restrictions like “the TOML may not contain backslashes or backslash escapes” or “the __pyproject__ variable cannot be referenced at runtime” :slightly_frowning_face:

Also,

When there are multiple __pyproject__ variables defined, tools MUST produce an error.

How are tools meant to do this? What about

'''This script processes "embedded pyproject" files. We look for text of the form

__pyproject__ = """
... something here
"""

and extract it.
'''

__pyproject__ = """
[project]
dependencies = ['tomli; python_version < "3.10"']
"""

# etc.

Actually, how is a tool meant to parse this code correctly? You can’t do what PEP 722 does and say multiple blocks are allowed but you ignore the second and subsequent ones, because you can’t reorder the code to put an assignment before the docstring. You have to simply say that this isn’t allowed, which I guess is acceptable but not ideal.
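As a rough illustration of the problem, the sketch below counts blocks two ways on a script shaped like the one above. The pattern is again an assumed simplification of the PEP's regex, and both helper names are hypothetical.

```python
import ast
import re

# Assumed, simplified pattern; the PEP's actual regex may differ.
PATTERN = re.compile(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$')

# Built line by line because the script mixes ''' and """ delimiters.
SCRIPT = "\n".join(
    [
        "'''This script looks for text of the form",
        "",
        '__pyproject__ = """',
        "... something here",
        '"""',
        "",
        "and extracts it.",
        "'''",
        "",
        '__pyproject__ = """',
        "[project]",
        "dependencies = ['tomli; python_version < \"3.10\"']",
        '"""',
        "",
    ]
)


def count_regex_blocks(source: str) -> int:
    """Count what a purely regex-based tool would treat as metadata blocks."""
    return sum(1 for _ in PATTERN.finditer(source))


def count_real_assignments(source: str) -> int:
    """Count genuine top-level assignments to __pyproject__ using the AST."""
    return sum(
        1
        for node in ast.parse(source).body
        if isinstance(node, ast.Assign)
        and any(
            isinstance(target, ast.Name) and target.id == "__pyproject__"
            for target in node.targets
        )
    )


print(count_regex_blocks(SCRIPT), count_real_assignments(SCRIPT))
```

The regex sees two blocks while Python sees a single assignment, so a regex-only tool following the "MUST produce an error" rule would reject a script that defines the variable only once.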

Also, I hope I got the quoting of that requirement with a marker correct. I’m pretty sure I did, but I had to check PEP 508, which suggests that this might be another case where people find the (nested) quoting rules tricky and/or frustrating. Or you can assume I’m simply dumber than the average user over things like this - that may well be true :slightly_smiling_face:

Rereading, the PEP is clear that using the regex is mandatory. So tools parsing using the AST are simply wrong. Apologies for misreading the spec. I don’t know if it’s worth trying to improve the wording - to be honest, few people will actually read the spec, so to an extent the real question here is whether tool writers might guess wrong about how to do the parsing. But we can’t prevent people from writing software with bugs, so I wouldn’t worry too much about this.

It’s a side issue anyway - the main point is that runtime behaviour is different, and that people will interpret the embedded pyproject based on their expectation of the runtime behaviour.

I think this might not be an issue for most people, not because they will know the syntax offhand, but because the change in difficulty isn’t noticeable.

As a moderately experienced user, but not the maintainer of any packaging tools, I’m constantly checking docs and examples when I need non-trivial PEP 508 strings. The existence of some nonstandard extensions and other syntaxes (poetry) makes it hard to keep it all in my head at once.


At the very least, supporting raw strings seems like a good idea. Anything else leads to nightmare fuel.

I need to read the current proposal again to see what it says on the matter, but there’s a lot of benefit to this all going into a comment, rather than a special variable. Some of the confusing cases for that were addressed in the 722 discussion, and users can always call a PEP 723 parser if they want the data at runtime.

(As I was writing this, @sirosen got there first :slight_smile: )

At the risk of crossing the streams a bit: there isn’t much justification for using a python variable in the first place. It would be accessible at runtime, but it’s unclear what anyone would use that information for. I’m sure there are possibilities but I can’t think of them and the PEP doesn’t suggest any.

So an alternative is to put the TOML in a comment block similar to PEP 722. This would address many of the concerns @pf_moore raises above, but maybe it muddies the waters a bit.

Embedding TOML in a comment block seems very similar to the 722 proposal, it’s almost at the level of bikeshedding the format. Instead of # Script Dependencies you look for # [project] [1]. Instead of listing packages like it’s a requirements.txt file you have a list as in pyproject.toml–there’s more syntax but it’s a standardized format that can be reused.

# [project]
# dependencies = [
#     "requests",
#     "rich",        # Needed for the output
#     
#    # Not needed - just to show that fragments in URLs do not
#    # get treated as comments
#    "pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686"
# ]

With any text editor or IDE that can handle block commenting, this is easy to copy/paste in and out of a pyproject.toml


  1. In this scenario I’d argue to just require this format rather than allow the various alternatives. Maybe a higher-level marker would be useful if people want to use [tool].

I didn’t get a chance to open a PR last night but when I do today I will also add support for raw strings (r"""...""") in the regular expression. I will also add an explicit “why not” section about storing metadata in a comment. Before I do that however please note that I talk about that in the latest version:

The concept of regular comments that do not appear to be intended for machines (i.e. encoding declarations) affecting behavior would not be customary to users of Python and goes directly against the “explicit is better than implicit” foundational principle.

Users typing what to them looks like prose could alter runtime behavior. This PEP takes the view that the possibility of that happening, even when a tool has been set up as such (maybe by a sysadmin), is unfriendly to users.

If that is still unclear then I can add the section as I mentioned and reiterate that point.

I guess “appear to be intended for machines” is in the eye of the beholder to some extent. I don’t think a TOML comment looks like prose, but maybe I’m an aesthete. :sweat_smile:

But obviously it’s your PEP. If you don’t want to propose such a thing that’s totally fine.

I think we have different takes on what novice user expectations would be. (Or maybe sysv init scripts have melted my brain too much!)

In particular, I don’t think a magical dunder variable is in any way more explicit than a magical comment. For a novice user, this, __name__ == "__main__", and all manner of other things are unintuitive. But they’re learned by example first and foremost, and most users don’t worry about how the magic works until they’ve already tried using it.

There may be a weak link between dunder vars and a vague sense of “here be the magic!” But I’m not personally super-duper convinced. It’s hard for me to imagine anyone opening a file and seeing a big block of toml in a comment (with some leading delimiter, like Rust is doing) and thinking that it’s prose rather than data.

That wording is quite strong – shebangs and encoding declarations are well established standards!

(And regular reminder that PEP 20 is a serious joke musing about language design for Python, not a set of laws that must bind every decision ever in the whole ecosystem)

That is fair, this is a distinctly different idea and by prose I was indeed referring to 722 and “Script Dependencies”.

If we go with the comment approach we couldn’t have a leading delimiter for the same reason that 722 reverted that idea, so I’ll have to think about this during lunch

FWIW, I think “is the metadata put in machine-readable comments or in a special variable” and “does the metadata have the form of a TOML string with the structure defined by PEP 621 or a simple newline-separated list of dependencies” are orthogonal questions.

In other words, it would make sense for @brettcannon to choose PEP 722 but request that it be changed to a special variable, or choose PEP 723 but request that it be changed to comments. I don’t know if such situations have happened in the past and how they were dealt with on the process level.

(With that being said, I personally view the choice of metadata format as more important than how it’s embedded.)
