PEP 723: Embedding pyproject.toml in single-file scripts

ofek · August 11, 2023, 2:54pm

Not really, just use tools that do what is desired and then based on usage we can come to a consensus on a possible standard. The build backend expansion happened and we made a choice to allow users and tools to experiment because there was basically only one way to do things.

If we are using standards to experiment with things that can already happen then I am an extremely hard -1 on us writing any more standards.

sinoroc · August 11, 2023, 6:22pm

Offload the experimental features into plugins. That is probably what I would do.

As interesting as it is, I can’t help but feel like the discussion “what is a project?” is out of scope. As I mentioned earlier, if I understood PEP 723 correctly then it is possible to embed pyproject.toml into any Python file, even a single importable module in the middle of a library. Meaning the PEP allows embedding in a file that is probably not executable (not a script), a file that is not a full project of its own.

I am a bit worried about this. PEP 621’s [project] table has a specific purpose, is meant to be located in a specific location (a pyproject.toml file), and for specific kinds of projects (say: projects that are meant to be built as a wheel; in other words: packaging). And now we want to reuse [project] nearly as-is in possibly very different contexts without much caution.

PEP 621 says a tool has to take all metadata from [project] and place it in Core Metadata fields. Now it seems like tools such as pipx, pip-run, hatch, and so on will be free to pick whatever fields they want from PEP 723’s [project] table and do whatever they want with them.

Maybe there is no reason to be worried, but I can’t shake the feeling that it does not seem exactly right. Maybe the [project] specification (the one resulting from PEP 621) needs to be amended?

I do not have a solution to offer.

[Off-topic: Many times I have wished docutils was part of Python’s standard library. Too bad…]

ofek · August 11, 2023, 6:44pm

That’s actually a really interesting idea and would I think assuage many of the concerns expressed about that. The differences would be minimal I think:

That table is for reading by any tool that needs metadata about the project and its runtime requirements, where project is defined as Python code that is executable or importable.
The name and version fields would be optional and only required when build backends in particular are the consumer, since they must write core metadata.

Something like that would be great and I would love if I could get people’s thoughts about this!

brettcannon · August 11, 2023, 7:06pm

Interestingly enough, I was just talking with someone today who asked whether it was desirable for linters or something to help make single-file scripts more isolated. I don’t think the PEPs should prescribe this, but tools could choose to support such a helper feature if they wanted to (e.g. symlink/copy the script to somewhere so that sys.path doesn’t pick up the local directory). But I think that’s a tools question as to whether they want to make the single-script portability an important use case.
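
One way a tool could offer that kind of isolation, sketched under assumptions: copy the script into a fresh temporary directory before running it, so that sys.path[0] points at that otherwise empty directory rather than the script's original location. `run_isolated` is a hypothetical helper name, not a feature of any existing tool.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(script: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a copy of `script` from an otherwise empty temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / script.name
        shutil.copy2(script, copy)
        # sys.path[0] is the directory of the script being run, so modules
        # sitting next to the original file are no longer importable.
        return subprocess.run(
            [sys.executable, str(copy), *args],
            capture_output=True,
            text=True,
            check=False,
        )
```

A script that relies on sibling modules would fail with an ImportError under this scheme, which is the point: the failure shows the script was never portable on its own.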

Agreed. I’m personally ignoring it as I find it tangential to either PEP’s contents as it doesn’t change what conceptually the PEPs are each proposing (and I’m aware of the differences in scope between them).

It’s all a balancing act. The key point is you have to accept you may get it wrong. You can let tools experiment endlessly, but unless you’re willing to stop and choose something and be willing to get over your fear that it might not be perfect, you will end up with no standards in which case you end up with no interoperability and everyone doing everything differently because it’s all defined by the tools (and I do not want to go back to a convention-based world).

Another possibility is to define a new [run] table à la the “Projects that aren't meant to generate a wheel and `pyproject.toml`” thread, and that’s the only thing allowed in a script (i.e., really lean into the idea that this is replacing requirements.txt for the simple case and then scale up). And to be clear, I’m not trying to guide you or anyone else towards this, but this is an option that is sitting in the back of my head if [project] becomes the stumbling block while embedding TOML is not.

brettcannon · August 11, 2023, 7:18pm

Related to this, an option is to also flat-out forbid [tool] tables and say if you want to go to that level of “production”, then please make a directory with a pyproject.toml. That would do away with the per-tool precedence question and also potentially simplifies explaining what the metadata is for and how it will be treated. I think this ties into the question/concern some people have expressed that folks are going to (ab)use this for way longer than they should before taking the time to create a directory and a separate pyproject.toml.

davidism · August 11, 2023, 7:25pm

I would split out the off-topic “what’s a project” discussion, but it’s mixed in enough that I’m not clear how to do it. Given that “all of pyproject” essentially means “a project”, it’s hard to distinguish what’s actually meaningful to the PEP compared to what’s outside the immediate topic. If @ofek wants the conversation to be more focused, message me with how you want to split it.

I’m requesting people stick to the specifics of this PEP in this topic, and create a new topic if they feel they have things to say about projects.

pf_moore · August 11, 2023, 7:28pm

Obviously such an amendment would need a PEP.

I’d be concerned that we are normalising the idea of reading metadata directly from pyproject.toml, rather than reading it from the core metadata fields in an actual metadata file (PKG-INFO in a sdist, and METADATA in a wheel or installed project). The pyproject.toml file is by definition less reliable than those places, because fields can be dynamic in pyproject.toml and filled in later, for those other locations. I don’t have a specific issue here, just a general feeling that we’re taking a risk, and we should be cautious about assuming everything will be OK.

Even just looking at dependencies, tools can’t reliably get a project’s dependencies without invoking the build backend unless they are willing to reject any project that declares its dependencies as dynamic. And editable installs are explicitly allowed to inject additional dependencies even if the pyproject.toml states that the dependencies are static. How would a PEP 621 spec change address that?

The idea of making name and version optional would be quite problematic, unless it was tightly constrained. Many tools (for example, pip) rely on the idea that a package is uniquely identified by its name and version. If we combine making those fields optional with the idea of tools reading metadata from pyproject.toml, we could end up with tools that can’t tell if two projects are identical or not.

Basically, I think this would be quite a complex and risky PEP to write with sufficient precision to ensure we don’t cause problems because people misinterpret the spec, or read it in different, incompatible ways.

And I’m sorry to go on about this again, but this still seems to be motivated mostly by a sense of “it would be nice if we could…” and not by actual user requirements or use cases. This is one of my biggest reservations with PEP 723, and it sounds like you’re now simply proposing to push that problem a step further back, and apply it to the definition of pyproject.toml as well.

I’m not against amending PEP 621 if we need to. There’s an ongoing discussion in the “Projects that aren't meant to generate a wheel and `pyproject.toml`” thread, which may well result in a proposal for a change to that spec. But that discussion needs to run its course and get some sort of consensus, and then someone needs to write a PEP proposing the agreed changes to the spec. If PEP 723 relies on modifications to PEP 621, then I don’t see how we can reasonably call PEP 723 ready for approval before that happens. And conversely, if it allows embedding of something that looks like pyproject.toml, but to which different rules apply, it’s both misleading and harmful [1] to claim it’s proposing an “embedded pyproject.toml”.

[1] In the sense of further damaging the packaging community’s credibility over “complicated and confusing rules” and “too many similar but different ways of doing things”.

pf_moore · August 11, 2023, 7:36pm

This is very close to a TOML-based variant of PEP 722, with run.dependencies as the dependency block data, and all other sub-keys of run as “for future expansion”. I’d support exploring this as a combined version of 722/723, if we could address our other differences of opinion over format.

But I’m not sure how this links in at all to pyproject.toml, except in the sense that “Projects that aren’t meant to generate a wheel and pyproject.toml might end up with something similar, but we don’t know that for sure yet”, so if we do go down that route someone would have to explain that to me.

ntessore · August 11, 2023, 8:23pm

Is there a real need for the PEP to specify the format in terms of a regex instead of simply saying something to the effect of tomllib.loads(__pyproject__) being equivalent to tomllib.load(open("pyproject.toml", "rb"))? It seems unnecessarily strict to ask the PEP to produce an airtight specification that third party tools can read with minimal effort. If a tool cannot deal with e.g. __pyproject__ in a docstring, let that be a limitation of the tool.

ofek · August 11, 2023, 8:24pm

Yes this is a hard requirement so other languages can implement the spec.

ntessore · August 11, 2023, 8:28pm

Tools in other languages can still implement it, it’s just more work on their end. The first thing that comes to mind is Ruff, of course, and we already heard that it’s not an issue there.

thejcannon · August 11, 2023, 8:59pm

I’d honestly prefer this the most too.

I love the simplicity of PEP 722. I love the structured data approach of PEP 723.

Combining both like this would be such a simple thing for us to support both in Pants and PEX.

I don’t have many thoughts on where it goes. The backticks-in-a-comment approach seems the easiest middle ground for us to support. I’d hope that the spec wasn’t too prescriptive about whether we had to use a regex parser, because we already build on top of a Rust-based tree-sitter parser.

So, I really do think this is the right middle ground and grabs the best of each, while ALSO solving many of the cons in each.

ofek · August 11, 2023, 9:22pm

The new [run] table approach would preclude, for example, the possibility of any standard for building distributions from single files since any backend defined in [build-system] mostly depends on [project] in order to write core metadata appropriately.

I am okay with that situation if we continue to allow the [tool] table for extra functionality. If we are okay with that then I am comfortable adjusting the PEP or collaborating with Paul for a new one.

thejcannon · August 11, 2023, 9:51pm

I think each of the other top-level keys (if that’s the right toml term) could be its own PEP. Could it not? Does it need to be specified in a PEP scoped back down to “running single file scripts”?

The first PEP introduces the embedded TOML metadata, then future PEPs get to add tables (like project or tool). That seems ideal (at least to me).

sinoroc · August 11, 2023, 9:51pm

[project] and [run] would serve different purposes. [project] would keep being for packaging (building wheels, which might also require the presence of [build-system]), and [run] would be for whatever we specify it should be (running single-file scripts and/or whatever that other thread ends up deciding). The two tables could be in the same embedded TOML if that ever makes sense. PEP 723 could focus on “embed TOML for metadata and/or configuration”, and PEP 723+n would be the actual specification for [run].

pf_moore · August 11, 2023, 10:24pm

Sorry, but I’m personally not OK with that. So I don’t think it’s a viable route for combining PEPs 722 and 723. I think there’s too much potential for confusion and/or implementation-defined variations if we allow [tool] config to be embedded in files as well as in a pyproject.toml. Does the embedded data take precedence over the standalone one? Does the embedded get ignored if there’s a standalone one, or vice versa? Do they get merged somehow? If tools can choose which they prefer, users won’t be able to infer anything by analogy with tools they already know, and we’ll just have more complaints of Python packaging being confusing.

IMO, as soon as we allow pyproject.toml data to appear in more than one location in a project, we have to address these sorts of question.

PEP 722 avoids this confusion because it defines data unrelated to pyproject.toml. PEP 722 dependency blocks only define the dependencies needed to run the file as a script, which is completely independent of any data in pyproject.toml. So there’s no overlap, and hence no need to worry about questions of precedence.

ofek · August 11, 2023, 10:57pm

I don’t understand why packaging is being brought up, can you please explain?

pf_moore · August 11, 2023, 11:09pm

Packaging as in “people complain Python packaging is confusing”. Making the behaviour of pyproject.toml more confusing by introducing a new way of writing a pyproject.toml, with ill-defined interactions if both the existing (standalone file) and new (embedded) ways are used together strikes me as bound to cause people to complain more that “packaging is confusing and too complicated”.

None of this is about “packaging” in the sense of building wheels, if that’s what you mean. PEP 722 is clear that the “scripting” use case is distinct from building wheels. PEP 723 muddies the water a bit by talking about build backends and single-file builds, so it’s harder to be sure that building wheels isn’t affected. But my point is about people’s perception of “the packaging ecosystem”, which is a much broader thing.

If I’m not getting my point across, don’t worry too much. The key thing is that I’m not willing to support a variant of PEP 722/723 that introduces per-file [tool] config. Per-file tool config is a complex enough issue (IMO) that it should be handled with its own, standalone, PEP.

brettcannon · August 11, 2023, 11:26pm

Once again, to belabour the point, I’m not trying to direct any of this; I’m just trying to be open about thoughts I have in my head and they are not necessarily fully formed (as you all are well aware).

But what I was thinking was the hypothetical [run] table would work in both pyproject.toml and inside a file. So while it might be simple in terms of what it contains (i.e., requires-python, dependencies), it would be the same regardless of where it was written down.

…

It’s actually way more work to implement it without putting restrictions on the Python syntax (hence the regex). Ruff can read the variable if it’s given a string literal, and that’s only because it has a complete Python parser. But if someone tried to use an f-string, suddenly Ruff wouldn’t be able to process the string without executing arbitrary Python code. This also would mean any other tool that wanted Ruff’s level of support would also need to use a full Python parser to support, e.g., implicit string concatenation, different string styles, etc. And all of this is assuming users can debug why things suddenly didn’t work appropriately because they used some Python feature that worked for a parser but doesn’t resolve to the string they wanted.

Because I can guarantee you that if we are not very precise on how tools are supposed to get this value, someone is going to say “separation of concerns” and want the magic of pointing e.g. pipx run at a file to handle the environment for them, and they will do:

import pathlib
__pyproject__ = pathlib.Path("requirements.txt").read_text(encoding="utf-8")

And if we say, “it needs to be a string literal”, they will do:

# Requires Python 3.12 for the nested quotes.
__pyproject__ = f"{pathlib.Path("requirements.txt").read_text(encoding="utf-8")}"

… and then argue, “that’s a string literal!” So unless we want to mandate any tool that does the execution can run arbitrary Python code, we have to be very explicit about what’s allowed in the variable-assignment case.
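
A short sketch of why “string literal” is not precise enough on its own. It uses the standard `ast` module to classify what is assigned to __pyproject__ without executing anything; the function name `classify_pyproject_assignment` is hypothetical. A plain literal parses to `ast.Constant`, an f-string to `ast.JoinedStr`, and a method call to `ast.Call`, and only the first can be read without running code.

```python
import ast


def classify_pyproject_assignment(source: str) -> str:
    """Describe how __pyproject__ is assigned, without executing the source."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "__pyproject__" not in targets:
            continue
        value = node.value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return "plain string literal"
        if isinstance(value, ast.JoinedStr):
            return "f-string (needs evaluation)"
        return "arbitrary expression (needs evaluation)"
    return "not found"


print(classify_pyproject_assignment('__pyproject__ = "a = 1"'))
print(classify_pyproject_assignment('__pyproject__ = f"{1 + 1}"'))
print(classify_pyproject_assignment(
    'import pathlib\n__pyproject__ = pathlib.Path("r.txt").read_text()'
))
```

Note that this already requires a full Python parser, which is exactly the cost a tool written in another language would have to pay.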

ofek · August 11, 2023, 11:32pm

I don’t have much time to comment today but the concerns about formatted strings wouldn’t work based on the way the PEP is written. The regular expression is canonical and it does not allow that.