PEP 723: Embedding pyproject.toml in single-file scripts

ofek · August 10, 2023, 2:18pm

I didn’t get a chance to open a PR last night but when I do today I will also add support for raw strings (r"""...""") in the regular expression. I will also add an explicit “why not” section about storing metadata in a comment. Before I do that however please note that I talk about that in the latest version:

The concept of regular comments that do not appear to be intended for machines (i.e. encoding declarations) affecting behavior would not be customary to users of Python and goes directly against the “explicit is better than implicit” foundational principle.

Users typing what to them looks like prose could alter runtime behavior. This PEP takes the view that the possibility of that happening, even when a tool has been set up as such (maybe by a sysadmin), is unfriendly to users.

If that is still unclear then I can add the section as I mentioned and reiterate that point.

jamestwebber · August 10, 2023, 2:20pm

I guess “appear to be intended for machines” is in the eye of the beholder to some extent. I don’t think a TOML comment looks like prose, but maybe I’m an aesthete.

But obviously it’s your PEP. If you don’t want to propose such a thing that’s totally fine.

sirosen · August 10, 2023, 2:32pm

I think we have different takes on what novice user expectations would be. (Or maybe sysv init scripts have melted my brain too much!)

In particular, I don’t think a magical dunder variable is in any way more explicit than a magical comment. For a novice user, this, __name__ == "__main__", and all manner of other things are unintuitive. But they’re learned by example first and foremost, and most users don’t worry about how the magic works until they’ve already tried using it.

There may be a weak link between dunder vars and a vague sense of “here be the magic!” But I’m not personally super-duper convinced. It’s hard for me to imagine anyone opening a file and seeing a big block of toml in a comment (with some leading delimiter, like Rust is doing) and thinking that it’s prose rather than data.

merwok · August 10, 2023, 2:36pm

That wording is quite strong – shebangs and encoding declarations are well established standards!

(And regular reminder that PEP 20 is a serious joke musing about language design for Python, not a set of laws that must bind every decision ever in the whole ecosystem)

ofek · August 10, 2023, 2:42pm

That is fair, this is a distinctly different idea and by prose I was indeed referring to 722 and “Script Dependencies”.

If we go with the comment approach we couldn’t have a leading delimiter for the same reason that 722 reverted that idea, so I’ll have to think about this during lunch

jeanas · August 10, 2023, 2:47pm

FWIW, I think “is the metadata put in machine-readable comments or in a special variable” and “does the metadata have the form of a TOML string with the structure defined by PEP 621 or a simple newline-separated list of dependencies” are orthogonal questions.

In other words, it would make sense for @brettcannon to choose PEP 722 but request that it be changed to a special variable, or choose PEP 723 but request that it be changed to comments. I don’t know if such situations have happened in the past or how they were dealt with at the process level.

(With that being said, I personally view the choice of metadata format as more important than how it’s embedded.)

jamestwebber · August 10, 2023, 3:04pm

I agree with that, which is why having the two proposals differ in both respects seemed like a distraction.

After all the discussions here about formatting, I think I’d only want a special variable if parsing the python was required for reading the data. Otherwise there are too many gotchas about escape characters and syntax. If the data will be parsed via regex (or similar pattern-matching of the raw text) then a comment makes more sense.

And given that parsing the python file has other issues for tools, it seems like a comment makes more sense. The pertinent question is the format.

pf_moore · August 10, 2023, 3:09pm

I think this is fair. I also think that “is the data just dependencies or a full pyproject structure?” is yet another axis. While it makes little sense to do “full pyproject but not TOML”, I think that “just dependencies but using TOML” is a reasonable option. I’ve argued my choices for all of these questions in PEP 722, but another PEP could reasonably choose any other set of answers for the 3 questions.

Let’s hope we don’t need 8 different PEPs, one for each combination, though!

ofek · August 10, 2023, 3:25pm

As far as parsing goes, I have restricted what is allowed so much that regular expressions would in all cases be easier to implement for the current proposal than for comments. I’m not sure where that thought is coming from precisely. For example, if regular expressions worked best on comments then 722 would have that in the reference implementation.

In any case, it’s conceivable today that I may give in to the storage being comments. To be honest, I can’t think of a great delimiter so if that happens I might straight up copy what Rust has chosen with its embedded Markdown code block:

# ```toml
# [project]
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ```

import requests
from rich.pretty import pprint

resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])

There are some downsides obviously which I will document but on the bright side there doesn’t exist an easier format for IDEs to provide syntax highlighting lol

jeanas · August 10, 2023, 3:30pm

I would personally like that a little more than the current __pyproject__ = proposal (though I’d pick a different bikeshed color for ```toml, perhaps ```pyproject or ```metadata). It’s not easy to say why, but it might have to do with the question that __pyproject__ = raises over what escaping is valid or not.

jamestwebber · August 10, 2023, 3:42pm

I think what makes me uneasy about the __pyproject__ version is the choice between these options:

1. the text extracted via regex can be parsed as TOML, but the python value might not be valid
2. the value of the python string is valid TOML, but the raw text isn’t
3. the syntax is restricted so that both 1 and 2 are true, which means valid TOML from a pyproject.toml file can’t always be used here

Going with #3 is a reasonable choice. But lacking any use for the __pyproject__ variable, at that point a comment block seems better. A benefit of using TOML is the ability to move config back and forth easily between this and other projects, and restricting the syntax makes that harder, even if it’s only in some edge cases.

ofek · August 10, 2023, 3:47pm

Okay, I will definitely switch to comments today. Before I do, however, I’m curious whether someone with IDE experience (maybe @brettcannon?) could say if reusing the existing ```toml``` language is preferable or if it makes sense to fully copy Rust and go with something new like ```pyproject```.

pf_moore · August 10, 2023, 4:43pm

I think this is an important point which hasn’t been fully addressed yet. The existing section in PEP 723 covering “why not just use a separate pyproject.toml file” seems to rest heavily on the idea that the PEP is about running scripts. But it doesn’t cover the “single file project” concept at all, and there’s a very strong (IMO) argument that if you’re already considering your single file as a project, it’s time to use a full project directory structure.

So I’d argue that PEP 723, as currently written, doesn’t address all of the arguments against the “single file project” idea. That may well be because many of those questions were raised in the PEP 722 discussion, where the response “because we’re targeting scripts not projects” was the straightforward answer. But that doesn’t carry over to PEP 723, because the other implication of that answer in the PEP 722 thread was “… and that’s also why we don’t need the full power of pyproject.toml”.

ofek · August 10, 2023, 5:12pm

This PEP doesn’t mean to go against the idea because it is in support of that idea. Do you think I should make that more explicit somehow? Like a slight pivot by changing the word “scripts” to “projects”?

jamestwebber · August 10, 2023, 5:22pm

Well hold on a second

I think the overloaded definition of “project” has led to a merging of pyproject with “a project I’m working on” and PEP 723 has the potential to support the above quote better than a list of requirements alone.

ofek · August 10, 2023, 5:41pm

No need to answer actually, I’m going with the latter to be even more explicit about “where the magic is”. I assume this is also the reason for what Rust chose.

Good job @epage it seems your proposal has thought of everything

brettcannon · August 10, 2023, 6:23pm

This has happened before, and as the PEP delegate I can request a PEP be changed in order to be accepted. In this situation it could be asking Paul and Ofek to work together on one of their PEPs (or a new PEP) that meets certain criteria and melds ideas from both PEPs or something.

pf_moore · August 10, 2023, 7:49pm

Oops. Good catch. Although to be fair, as you say below, there’s a lot of different ways people can mean “project”…

I disagree, but because I think that pyproject.toml is very much focused on a specific type of project right now, and until the thread that you took my quote from is resolved, I don’t think it’s a given that pyproject.toml is the right solution for everything that we might call a “project” - even if we restrict it to “python projects”.

I find it really hard to explain my point of view here, because everyone seems to have different ideas of what common terms mean, and for use cases that seem obvious to me, other people seem not to know what I’m talking about.

To put this in concrete terms, there are a huge number of things I’d call a “project”, and which I’d use Python as part of. Only a very small minority would I describe as “python projects”, and for the rest, structuring them as if the Python code was the key aspect makes no sense to me.

- A statistical analysis project that looks at the probability of various outcomes in different games. Uses some Python, mostly in a Jupyter notebook, but also some one-off scripts. Also uses batch files running a custom program (an executable not written by me). And files of data, etc.
- A data download project that has a number of small Python scripts for downloading different data sets, some SQL scripts for data loading and analysis, a bunch of data files, some final output and some examples of “bad data” I need to analyse. Plus some “experimental” scripts trying out approaches that didn’t work, but have ideas I want to keep.
- Data analysis projects looking for trends in production job runtimes. A mixture of SQL (for getting the data), shell scripts, Python (mostly Jupyter notebooks) and Excel.
- A project that runs automated builds of a different project, where I use some Python scripts to automate bits of the build, but which is basically a series of github actions at the core.
- Obviously a bunch of Python projects like pip, pipx, etc.

I think I’d classify only the last of these as “Python projects” where I’d want to use a workflow tool that’s focused on Python. For the others, all I’d want is a means to “run a Python script”. In most of them, I could create a virtual environment in the project directory and include a requirements file or batch file to rebuild it. So a “script runner” saves me having to remember to activate the virtual environment, and could save me having to maintain it, but that’s about all. I’m not going to have conflicting dependencies, or anything complicated, so I want my workflow to focus on everything except the Python parts of the project.

I also use Python in a number of contexts that I’d never call a “project”:

- Some utilities I wrote and put on my path. Only one of them is over 60 lines long, and most are under 20.
- A lot of one-off experiments, many of which use libraries I was just trying out, or playing with.
- Some automation scripts for monitoring systems that I used to support.
- A number of Jython scripts to automate Oracle middleware.

For none of those would I remotely consider them as a project, or want anything more than a one-file script. I certainly wouldn’t need any of the features of a pyproject.toml beyond somewhere to record dependencies for a script runner. And importantly, none of them is ever likely to “grow” into a project.

Anyway, this is way offtopic (sorry @ofek!). The only real point is that I genuinely don’t know what people are thinking of when they refer to a “single file project” with the expectation that such a thing would need any of the extra capabilities of a pyproject.toml. And I’d find it much easier to decide whether I liked PEP 723 [1] if it explained that for me…

[1] Not that persuading the author of the competing PEP to like your PEP is crucial, but I’d still prefer consensus over picking one option and disappointing everyone who preferred the other…

pf_moore · August 10, 2023, 7:52pm

I’d certainly be willing to do that. The one key decision that I don’t see us being able to resolve, that I think will need a PEP-delegate decision, is “dependencies only or embedded pyproject.toml”?

jamestwebber · August 10, 2023, 8:03pm

I don’t know if this is that off-topic, as I think this question (and the associated thread) are strongly linked to both this PEP and 722. I think that decision is up to @ofek.

I think the confusion has already happened in the larger python community, and it’s why the popularity of PEP 518 surprised people.

It makes total sense that folks embedded in the packaging community have kept the distinction clear, while a lot of other people have never read the definition of “project” in the PyPA glossary. For those people [1], a “python project” is a project they write in python, and so “pyproject.toml” seemed like a good place for config stuff, and here we are.

[1] including myself, until I read it today