I am literally an “IDE and editor author”, so the answer is yes and I support this PEP (or something like it).
I don’t, and see below as to why …
Because this is about lowering the ceremony around getting a simple script to work that just wants one or two external dependencies to run. Every extra file, extra motion, etc. required to make this use case work waters down the usefulness and increases the leap from “just using the stdlib” to “using one external dependency”. The use case this is meant to tackle is scripts where your testing is “does the output look right?”, so you literally may have one, maybe two or three external dependencies. Need to download some CSV file and process it? You probably just want httpx or requests. Need a little data analysis? Might toss in pandas. But you very likely are not going to have a test suite. You are not going to have any CI for this. The point is that adding another file at this workload size feels like overkill when it’s a single .py file sitting in some directory alongside the files it happens to process, e.g. those CSV files.
The mental exercise I’m doing around this PEP and the various alternative proposals is what would it take to add requests as a requirement to a script? That’s the use case I think we are targeting here and the one I think people should be optimizing for here.
Between these concrete examples of reasons to have a single file:
, and the fact that sometimes these single-file scripts might share a directory (such as, again, /usr/bin plausibly, supposing that the single-file script has a shebang line that explicitly invokes a script runner)
, and this description:
I am on board, and I want the syntax to look like it is in the proposal, not like ordinary requirements TOML.
Rationale: this syntax is clearly easier to work with for the cases described, and doesn’t make impositions that aren’t required outside a packaging context (name and version specifications, in particular).
As I understand the proposal, it’s entirely up to external tools to determine an interpretation for the comment; this just standardizes a format so that it can become a target for tooling.
A script runner might parse it and directly install requirements in a venv.
An IDE could suggest things that belong, based on its static analysis and knowledge of popular third-party libraries (up to the IDE’s discretion). Perhaps it might even have a database of information about deprecations, so it can warn that certain versions are necessary for the functionality requested in the code.
A full development toolchain might offer a migrate command that parses the same block and uses it to generate a skeleton pyproject.toml with dummy name and version.
Because this format describes something more specialized (only the requirements), and that thing doesn’t have any complex structure (it’s just a plain sequence of requirements specifiers), the TOML format is unnecessary overhead for the writer.
However, conditional on the PEP’s acceptance, I would strongly prefer for the specific task of “bootstrapping” pyproject.toml to be implemented by a standard tool that either ships with Python or at least can trivially be bootstrapped into the Scripts directory. This is about as clear a “There should be one-- and preferably only one --obvious way to do it” case as I can imagine. Sometimes projects do take that step, and there’s no good reason not to facilitate doing so.
My preference there is strong enough, actually, that I’d like to volunteer to write such a tool.
Separately, I also like the idea of being able to have other .toml files in the directory that use this format for locally relevant purposes, but the name pyproject.toml is privileged by the packaging ecosystem - i.e., Pip looks for that name, therefore you need to have one to get onto a Pip-compatible package index, and tooling is intended to use it for the process of building a package for such an index. (As such, the tool I describe above should have an option to use a different filename, but default to pyproject.toml.)
I’m not sure it’s quite one-or-the-other. Just like how tools already support the requirements block this PEP proposes, at least one tool already supports import mapping.
And, bikeshedding on implementation aside, the big pros of scraping imports are that the metadata can’t drift from reality as far (remove an import and you removed a dependency, vs having to remember to remove the line at the top), and less duplication.
In the context of this PR, however, I do think it’s worth a section in the alternatives section that mentions even if such a tool was widespread, the proposal would still exist because …
I’ll try and work on getting it standalone in the next week or so, to prove it’s easy and fast (thanks Rust!)
I’m -1 on putting the block at the end of the file, so that’s not going to be in the PEP I’m afraid (no reason other than personal preference). If you prefer putting it at the end, you can, but it’ll be user choice, not mandated.
As to the other part of your point, what precisely do you mean by “this”? Presumably not Cargo.toml. Do you want a full pyproject.toml embedded in the file? What’s the use case? If you need a full pyproject.toml, why wouldn’t you be able to use a project directory? The whole point of this proposal is that it’s aimed at “better batch file” simple scripts, where almost everything in the pyproject.toml makes no sense. If we say that we allow an “embedded pyproject.toml”, we’re bound to get users getting confused because not everything works (“I tried to set up a hatch environment for my script using tool.hatch.envs and hatch doesn’t recognise it”, or “I put a custom ruff config in foo.py and ruff is ignoring it”, or “I embedded the pyproject.toml for mylib.py, and pip won’t build it”, …).
The rust example is slightly misleading, because as a compiled language Rust has to do a build (even for running a single-file script) so having a build config file is reasonable.
For context, I’m the author of Rust eRFC #3424 for integrating single-file package support (including dependencies) into cargo.
The rust-script syntax that was given is for embedding the manifest / Cargo.toml directly into the file by using a markdown code fence in the module’s doc comment. Using #! would not be the equivalent in Python; the equivalent would be putting it in the module’s docstring and using … reST? Markdown? syntax for a literal block. However, we are wary of taking that approach because we want to allow people to do this with libraries, and we would then dirty up people’s documentation with implementation details.
My Pre-RFC contains a lot of detail on the trade-offs of different ways of embedding manifests or lockfiles (the closer equivalent of this proposal) within files, particularly in the Unresolved Questions section.
Some relevant thoughts
Bug reports and educational material are very big use cases in my mind.
We’ve also gotten a lot of feedback about process overhead with having smaller packages, so we are looking at supporting this for libs and supporting publishing them. In Python terms, this would be to have pyproject.toml support for these single-file cases.
However, we have a much more opinionated build system, which makes it easier to default a bunch of fields and keep the syntax overhead low.
Most likely, tooling will want to edit and not just read, so consider that workflow as well
Thanks a lot for the context here @epage! I had no idea that this was such a recent change in cargo.
I agree (e.g. auto-populate that section from imports in the file), which is why I think it should be in a structured format.
Yeah, not recognizing //! as a doc comment is my lack of Rust familiarity showing. I don’t see the library case for python though (certainly not in the context of this PEP), and I think a module comment would actually be great for a number of reasons.
There was a concrete example @ofek was responding to…? But to update this in view of @epage’s inputs, how about something à la:
```python
"""
This is the docstring of my script, which does X.

It has the following requirements:

requires-python = ">=3.9"
dependencies = [
    ...
]
"""
import numpy as np
```
This would kill several birds with one stone:
Still easy to extract (some specially marked section of the docstring)
Still easy to parse and insert values into programmatically (because it’s a toml file)
Still consistent with pyproject.toml
Self-documenting in an already established, canonical place (plus, the docstring of a script sounds like a great place for putting requirements)
Easy to copy & paste to or from actual toml files, because no extra characters (# etc.) in front
Solves the “which Python version does this script need” problem that came up further upthread
This still has about the same list of things up for bikeshedding as above (e.g. what’s the marker around the toml file, is it [project] or [requirements] or …, etc.), but I hope it’s concrete enough to communicate the intent?
Which is why we’re not talking about the full pyproject.toml, but some reduced form of it. Though the ruff config is a great example IMO of why following existing patterns is better than creating new ones, because people will still want to lint their single-file scripts, and that way we’d have a canonical place to put the config (as an extension in the future, not for this PEP).
Overall I like the suggestion and I appreciate that it plans for extensibility. I absolutely agree with the above points, but there are a couple I’ve singled out…
Here I’m more skeptical. Yes, it’s easy to get a __doc__ attribute, split it into lines, find the right section, and feed it to a TOML parser. But on the return trip, we’re talking about modifying the data, feeding it to a TOML formatter, wrapping it, restoring the rest of the docstring content, putting the result into a legible source code format and editing it back into the original. The italicized step is where I see the most potential complication. There could be string escaping issues; there’s a goal to ensure that a triple-quoted representation is used; and after all of that, the result could differ from the original formatting in a lot of different ways. I think we already discussed it ITT, but TOML parsers generally don’t try to preserve exact details about the input so that they can be reflected in later formatted data.
And, yes, of course one can also treat the data as if it weren’t TOML and just look for the dependencies = [ and ] lines and trust that everything is formatted in a way that won’t cause problems. That… obviously isn’t as robust.
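For comparison, a sketch of that naive line-scanning approach (function name hypothetical). It is fragile by construction: it assumes one quoted specifier per line and exactly this bracket style, and breaks on inline arrays, comments, or strings that happen to contain a closing bracket.

```python
def naive_dependencies(doc):
    """Scan lines between 'dependencies = [' and ']' without parsing TOML.

    Trusts the formatting entirely; any deviation from the expected
    layout silently produces wrong results.
    """
    deps, collecting = [], False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "dependencies = [":
            collecting = True
        elif collecting:
            if stripped.startswith("]"):
                break
            deps.append(stripped.rstrip(",").strip('"'))
    return deps
```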
There’s definitely merit to this, but it raises the small issue of whether we want such sections to get dumped unchanged into Pydoc output, or of what Sphinx should do with them.
Thank you for the PEP! It looks useful – even if it’s a limited use case. Even if it was just for pipx and VS Code, it’d be great to have it.
Here are some comments I don’t think were raised yet; apologies if I missed something:
AFAICS, the meat of the document should move to the PyPA standards page after acceptance, to make later revisions easier. Is that right? You might want to note that in the PEP.
[specifying the Python version] is not something I plan on including
And what about specifying the interpreter? pyproject.toml rightfully doesn’t let you do that, but if pipx or pip-run occupies the shebang, it could be nice if you could add something like
# Python: ~/custom-pypy-build/pypy
Perhaps that’s better left as a tool-specific extension.
But judging from this thread, people want such extensions, and will add them. It (sadly) might be good to consider how they would work.
I can see people wanting to write:
# GUI-Name: Download latest training data
Did you consider requiring the entries to be indented more than the header, rather than (or in addition to) ending with an empty line?
Besides making the header searchable and unique, did you consider using an extra sigil to mark the comment as special – something that you should search for if you don’t know it? E.g. something like:
Encoding declarations and type: comments don’t do this, but IMO it would be a good idea to start marking machine-readable comments.
Yes, this will be added to the PyPA specifications section of the packaging guide. I didn’t mention that explicitly because it’s simply the normal process (even if it’s not always done as promptly as we’d like).
I like the syntax suggestions here. I agree that while I don’t want to extend this PEP beyond declaring dependencies, there’s clearly a possibility that people will want to add further data at some point, and so having something that’s extensible is important. So here’s a concrete proposal. My plan is for this to be the spec that goes into the next revision of the PEP, so feedback is appreciated - but at some point I have to draw a line and say “this is what I propose”, so any suggestions for particularly radical changes will at this point likely just get told “thanks, but no thanks” and go into the rejected alternatives.
A script can contain one or more “metadata blocks”, each consisting of a block of one or more consecutive lines starting with ##. Blocks end at the first line that doesn’t follow this format.
Leading and trailing whitespace are ignored in a block (so no significant indentation), as are lines with nothing but the ## marker.
The first line of a block is a header, and consists of a block name (which defines the block type), followed by a colon, optionally followed by data.
Interpretation of the data on the header line, and of the rest of the block, depends on the block type.
The only block type defined in this PEP is Script Dependencies. This block cannot have data on the header line, and every other non-blank line in the block is required to be a PEP 508 dependency specifier.
(I’ll tighten up the specification in the actual PEP, but that should be clear enough to explain the idea).
Under this proposal, this would be

```
## Script Dependencies:

## GUI-Name: Download latest training data
```

The blank line is required, to separate the two metadata blocks.
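For what it’s worth, the read side of this format stays very short. A sketch (function name hypothetical; block-name matching simplified, and no validation that the entries are PEP 508 specifiers):

```python
def script_dependencies(source):
    """Extract the 'Script Dependencies' block from script source.

    Per the sketch above: a block is a run of consecutive '##' lines,
    the first being the header; it ends at the first line that doesn't
    match, and blank '##' lines are ignored.
    """
    deps = []
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("##"):
            if in_block:
                break  # block ended
            continue
        content = stripped[2:].strip()
        if in_block:
            if content:  # skip bare '##' lines
                deps.append(content)
        elif content == "Script Dependencies:":
            in_block = True
    return deps

print(script_dependencies("## Script Dependencies:\n##     requests\n"))
# ['requests']
```

Note this doesn’t need a TOML parser, a regex engine, or anything beyond string methods, which was part of the point.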
To everyone who’s been proposing TOML in one form or another, I’m sorry, but that won’t be the proposed syntax. If someone wants to, they can propose a competing PEP that uses TOML. Or, given that I generally think that “competing PEPs” is an unhealthy approach, if someone wants to persuade @brettcannon (as PEP delegate) to declare that the spec should use TOML, then I’ll happily hand over authorship of this PEP to someone else to make that change. But I don’t think I could fairly represent a proposal that used TOML myself. I do intend to expand my reasoning in the relevant “rejected alternative” section of the PEP - what’s currently there relies too heavily on the argument that the PEP doesn’t need the complexity, and I think that’s insufficient once we start considering future extensions. But I don’t expect to change people’s minds - I’m just intending to document my reasons for the choices I’ve made.
I really didn’t expect putting something else on my plate but I don’t mind writing the PEP for the embedded pyproject.toml proposal, if that’s okay with you and Brett is not yet persuaded. I think that way is much better long term for the community with regards to user expectations and interoperability (dependency management tooling, IDEs & linters, security scanners, etc.).
I’m fine with it as long as you can get your alternative PEP done within a week of when Paul finishes the rewrite of his PEP. That way it doesn’t greatly delay me, @courtneywebster , and our team from evaluating the PEPs, lining up user studies, etc.
I will say, though, that with Projects that aren't meant to generate a wheel and `pyproject.toml` unresolved I’m reluctant to lean into a TOML solution. I also think it will be more confusing for beginners/occasional Python developers, who are going to ramp up to this way sooner than to pyproject.toml (and I’m willing to bet a lot of them will never go past whichever solution is chosen, as they just don’t need to; Python is still the glue language of the internet, and I bet this group by sheer quantity makes up the largest user base of Python). But if user studies disprove this hunch then I’m open to a TOML approach.
I don’t understand why this is so time-sensitive. Now that @ofek announced he’s willing to write an alternative PEP, shouldn’t that get a fair chance? Sure, we shouldn’t stall forever on a promise, but a week seems rather short. Has a similar condition been attached for other competing PEPs before?
Thank you so much for picking up this mantle! If you want some review/support, please feel free to reach out.
Not being able to use a few third-party dependencies in simple scripts has been my main pain point in Python for years, so I’m really glad that there is finally going to be a good solution for it.
I think the currently proposed simple syntax is the right choice for this use case. It’s easy to remember and parse, especially by other simple scripts. If it used TOML on the other hand I would most likely have to look up the exact syntax almost every time I write a script, which is exactly the kind of hurdle that this proposal is trying to address. It would also mean that any tool trying to read or write it would need a full-blown TOML parser. So I am very much in favour of keeping it as simple as possible.
I know many here will disagree, but I think there’s a universe where both embedded pyproject.toml and pip-run’s requirements could be standard.
The simple requirements initially proposed in the PEP are quick to type, and therefore easier to remember the format of: introducing more words, more structure (e.g. indentation, separation from other blocks), and a new syntax (the double hash ##) makes the format harder to remember.
The embedded pyproject.toml allows the full extensibility of the existing file for free, with no new syntax (outside of the embedding syntax). It would be a half-way point between the basic requirements and a full Python project.
The main counter-argument here is “there are multiple ways to do the same thing, increasing mental burden on new users”, but I don’t think the embedded pyproject.toml really is more burden than the original distinct file, so I would say only one new format (which already exists, non-standard) is being introduced.