PEP 723: Embedding pyproject.toml in single-file scripts

In your specific plugin system use case, I guess you could just let the tool add [build-system] while converting to a source tree, just like you might already have to add name and version?

If you wanted to experiment with building PEP 723 style scripts and you wanted the build backend to be specifiable, you could just use a [tool.$tool] table where you put the same information. [1]

Edit: I think the advantage of this point of the PEP (“MUST NOT contain a build-system table”) is that there is a strong expectation that metadata outside the tool table is standardized. That seems like a useful aspect to keep.


  1. I doubt that would provide much value though, because there aren’t 100 ways to build a single-file project. The major build backends for pure Python code differ in things like how they specify included/ignored files, or whether they allow plugins. Ok, the latter could be useful in the abstract, but I can’t imagine a use case where there is also a strong motivation to keep the project single-file. ↩︎

But this does make me wonder - is the following valid?

__pyproject__ = """
[project]
""" \
"""
dependencies = ["requests"]
"""

And if not, then where precisely in the PEP does it say it’s not allowed? Yes, I know nobody should ever do this. It’s a pathological edge case. But every time we’ve accepted a PEP on the basis that “people should be sensible” we’ve had problems, because someone hasn’t been sensible.

My point here is simply that the spec as written isn’t sufficiently precise. It’s not a fatal problem, and in fact it could be fixed by the simple expedient of declaring the regex given in the PEP as the formal definition of how to parse the script for the data. (Well, no, for example it still needs to state what happens if there are two valid assignments to __pyproject__, but that’s a separate issue…) But it does need to be fixed if the PEP is to be usable as a specification (IMO).

Another comment:

Non-script running tools MAY choose to read from their expected [tool] sub-table.

I wonder if it would make sense to add language like:

“Tools serving purposes unrelated to packaging (such as linters or code formatters, but not build frontends) which accept both single files and directories, and which can be configured through the [tool.toolname] table of a pyproject.toml file when invoked on a directory, SHOULD also be configurable via [tool.toolname] in the __pyproject__ of a single file.”

Basically strengthening MAY to SHOULD, excluding build frontends.

Otherwise, it’s not fully clear from the PEP whether Black, Mypy, Ruff, Pylint, etc., are officially encouraged to read inline __pyproject__ config (I personally think they should be), or whether it’s just an option at their discretion.

Has anyone asked any of those tools whether they support this idea? It seems rather important to make sure that the intended users of this option are interested. As well as asking them how they would feel about the question of “only use the embedded data if you’re looking at just the file containing it and not if it’s part of a directory”. If I were a tool maker, I’d be very reluctant to support something like that, so assuming it’s fine just because the PEP says so seems optimistic at best.

Yes, maintainers will respond soon!

Speaking on behalf of Pex and Pantsbuild, we will support whatever decision is accepted, and neither option seems technically infeasible.

This includes if, later on, additional facilities are introduced on top of these (e.g. other metadata to treat single scripts as packages, or tool configs embedded in the TOML, or replacing the dependencies with a locked set).

(as a maintainer of mypy)

For mypy, I’d be supportive of using embedded metadata to configure mypy for a single script. I think this would create a more consistent configuration experience for mypy. The set of CLI options or inline comments is not as expressive as what you can do with a config file (and you can’t check CLI options into the same file).

Like others, I was surprised by the “only use the embedded data if you’re looking at just the file containing it and not if it’s part of a directory”. This would solve some issues for mypy (there are some things you can only configure globally in a single mypy invocation), but will certainly cause surprises for users and limits the applicability to just a single script case. On net, I think this prescription is probably undesirable — in the code quality ecosystem, people seem to really like having a single invocation of a tool running on all the things, and integrations like pre-commit expect this.

I definitely like having more structured per-file configuration though. The current state of the art is special comments, and I think this could work better. Special comments are often not expressive enough, and they’re not special enough to avoid common mistakes: for instance, if you accidentally delete some code and end up with a module-level # type: ignore, you’re going to be unhappy. Here’s a recent example of a similar issue in ruff: Ruff v0.0.281

(as a maintainer of Black)

I’m new to maintaining Black and Black is a project that actively dissuades its users from configuration, so I need to think more before I say anything.

(general comments)

Overall, I was surprised by how much people like putting tool-specific configuration in pyproject.toml. Reducing the number of configuration dialects and extra files seems to have been valuable; I’m not sure the authors of PEP 518 anticipated this. PEP 518 prescribes almost nothing, and certainly did not coordinate with non build tools in the manner we’re doing now, but it created a Schelling point and the community ran with it.

Since I already prefer the embedded TOML format for the core dependencies use case, I think it’s a bonus that PEP 723 allows for further serendipity in this space.

Thanks for writing this draft so quickly, I like it! My preference is still somewhat for a block comment, as I thought it would be easier for tools to store their lock files that way as well, but I understand the motivation against it. Tools that want to store their lock file could support doing that in the same pyproject.toml as well, although they would lose a level of nesting.

The risk here is part of the functionality of the tool being used to run the script, and as such should already be addressed by the tool itself. The only additional risk introduced by this PEP is if an untrusted script with embedded metadata is run, when a potentially malicious dependency might be installed. This risk is addressed by the normal good practice of reviewing code before running it.

It may be worth mentioning here that further locking could be done by specific tools that would add additional (optional) metadata.

Two other pyproject.toml fields that might need a rethink for the embedding case are readme and license, both of which (in the existing case) refer to external files. If scripts choose to store this data, they will almost certainly want it to be embedded in the script (for all the same reasons they don’t want a separate pyproject.toml file).

How will this proposal support embedded readme and license data?

readme = {text = "Foo bar baz read me.", content-type = "text/markdown"} and license = {text = "BSD 3-clause license"} are valid per the spec.

Ah, I’d missed that (I did check, honest!)

Even so, readme text is often substantial, and I’d imagine people would either want to reference the script docstring, or use TOML triple-quoted strings, to include substantial blocks of text.

Licenses (if present) are typically added as a big chunk of boilerplate comment - especially if this is some sort of corporate environment (“All rights reserved, you can’t use this for other than the stated purpose without permission, …”) I don’t imagine a legal department would be too happy with summarising that as a one-liner, and even reformatting as anything other than a comment block might be problematic.

Certainly in the environments I worked in, I’d be very wary of adding a license like this.

They can use single-quoted string literals, at least. If their README is getting so complicated that they want to nest multiline strings inside it, or do something else that isn’t allowed in a string literal, that is probably a sign that they shouldn’t be trying to keep everything in one file.

More broadly on that point, it’s probably good if any solution for single-file metadata has some idea of when it’s potentially harmful to use it. It would be a shame if enabling this reduced the use of real packaging tools in favor of ten-thousand-line monstrosities.

This seems like something that’s totally up to the user, though, as it is now with pyproject.toml.

For many single-file scripts I’ve seen, the license is embedded in a comment at the top. One could still do that, and then use the OSI shorthand in the toml.

+1 to @jamestwebber’s comment here, I think that’s… just fine? Some of the metadata fields are more useful for single-file scripts than others, but that’s not a real problem.

license for example might not be suitable if you have a very restrictive corporate environment, but for a GitHub gist where I really don’t want to bother with licensing, I can imagine license = {text = "Unlicense"} being handy.

On the other hand, this does make me wonder if [project] fields that genuinely make no sense without a build backend should be disallowed, if only for consistency with how the PEP disallows the [build-system] table. The ones I’m looking at are entry-points, scripts and gui-scripts.

Thinking about the triple-quoting thing more (in the context of “very long readmes” and whatnot), I don’t know if it’s important to forbid this. There’s been a lot of discussion about making the format simple to parse, in part because it avoids parsing the AST and avoids versioning issues. But parsing the file is a totally plausible way to extract this info and some tools will do it–and having access to the correct python version is going to be an issue anyway, if you want to run the script. And tools like ruff exist which parse Python in another language (and support multiple versions).

I guess what I’m getting at is that, while it’d be convenient for tool-builders if the format is restricted, it’s a trade-off with making it “less than what python allows”, which is sort of confusing. If you’re developing in your home environment with all your dependencies, python my_script.py will work fine because your __pyproject__ is a valid string, but pip-run my_script.py will fail on someone else’s machine because you didn’t format this bit of metadata correctly.

So it might be worth softening that language to “should not” [1]. In the end perhaps this is a point in favor of the comment-block approach, although that’s just defining a separate syntax instead of a hybrid one.

I don’t know what the right choice is here, just considering the options. Abstractly, I’d prefer it if “valid python w/ a valid TOML string in it” was all I needed to remember in terms of formatting.


  1. I don’t know the exact meanings of must/should/may in the PEP context, though ↩︎

The first time someone did __pyproject__ = f"""...""" you’d regret that leniency :slight_smile:

Honestly, a lot of this is covered in PEP 722’s rejected alternative “Why not use (possibly restricted) Python syntax?” I’m not trying to say that PEP 723 is doomed because I was right all along, but I do think you should start by making sure you’ve got good arguments as to why the issues described in that section don’t apply to PEP 723. There’s a certain amount of subjectiveness to what I put in PEP 722, certainly, but I tried to be objective enough that a simple “I don’t agree” is likely to be missing something.

That was my abstract preference for the ideal, not what makes the most sense here. I was thinking more about restrictions on the text contents. I did mention earlier that the PEP should specify it must be a string literal. One might hope this would be obvious to people but it probably belongs in the PEP as a requirement. For instance, people will definitely try to concatenate strings together with + and not understand why it fails.

The reasons I view “valid python and valid TOML” as my preferred outcome are

  1. I don’t need to learn new syntax, or some special-cases for the syntax I know. Those who don’t know TOML will at least learn a standard format
  2. Parsers exist for both formats already and can be used without modification to add restrictions
  3. Simple to transfer metadata between this format and pyproject.toml and vice versa

I can see how this makes other aspects trickier and I’m not saying it’s the only way. But I think these are useful features if they can be feasibly achieved.

I do think the question of python version is relevant to that section as well. A script that requires python version X will only successfully run if the running tool has access to version X [1]. Granted, it’s a little more convenient to not care up front, but on the other hand if you parse the script you can fail faster on incompatible syntax (rather than e.g. creating a venv and installing dependencies and then failing). So I’m not convinced that “needs the right python parser” is such a big obstacle, when in practice it will be a requirement.

I hope that reasoning is more objective than “I don’t agree” :slight_smile:


  1. especially if “install the right Python on demand” is out of scope here ↩︎

Wouldn’t it be better for this proposal to reduce the scope to its minimum at first and then slowly expand in later PEPs if/when necessary? This is one of the things I like about PEP 722, which introduces the concept of generic metadata blocks but only specifies one type of metadata block for now. For example, the proposal could say only project.dependencies is allowed. Once it is proven to work (with various implementations), we could then expand with the next most-asked feature(s). I am a bit worried that it is too ambitious to allow (nearly) everything that pyproject.toml allows right from the start, and that we will discover issues too late (after the PEP is approved) and need to put band-aids on the specification to fix them. The scope is quite large, and it feels hard to make sure we have thought about all the consequences and side effects.


Update: Ah, I just now read that PEP 722 is dropping the “metadata block” concept. This does not change my point of view for now.

What could possibly go wrong? What do we need to prove “works”? The author of a project and their email, the project license or the version(s) of Python it is compatible with are just inert metadata, they don’t care how they’re used (and the PEP intentionally considers it out of scope to specify whether or how tools use them). If there are bugs, it will be with the way tools use the metadata. I don’t see how the format itself would need any changes.

Brushing up on my lexical analysis I’m surprised to find that f-strings are still called a “string literal” even though it’s actually a complicated expression :upside_down_face: That’s not what I intended to suggest.

You might be right. I guess maybe it is an unjustified worry from me that this might go wrong. I feel like maybe some users might get the wrong expectations about what will work out of the box. Users might put all kinds of fields in the embedded pyproject.toml and will be unhappy that they are not taken into account by the tools (for example because the tools have not added support for this specification yet or because the authors do not want to add support).

Maybe it would be good to expand the “How to Teach This” section and add something like “It should be clearly communicated to the users that it is up to the maintainers of each tool to explicitly add support for this specification. What is possible in pyproject.toml does not automatically become possible in the embedded variant”.

I feel like this is exactly the kind of message that can be confusing to users. We do not want users to file bug reports when they add fields to the embedded pyproject.toml and tools do not take them into account. Most likely this is not what you meant, but I am not sure everyone will understand the subtlety here.
