PEP 722: Dependency specification for single-file scripts

h-vetinari · August 1, 2023, 10:41pm

Thanks a lot for the context here @epage! I had no idea that this was such a recent change in cargo.

I agree (e.g. auto-populate that section from imports in the file), which is why I think it should be in a structured format.

Yeah, not recognizing //! as a doc comment is my lack of Rust familiarity showing. I don’t see the library case for python though (certainly not in the context of this PEP), and I think a module comment would actually be great for a number of reasons.

There was a concrete example @ofek was responding to…? But to update this in view of @epage’s inputs, how about something à la:

#!/usr/bin/env python
"""
This is the docstring of my script, which does X.

It has the following requirements:
```toml
[requirements]
requires-python = ">=3.9"
dependencies = [
    "numpy>=1.22.4",
    "requests",
]
```
"""
import numpy as np
import requests
print("Hello world!")

This would kill several birds with one stone:

Still easy to extract (some specially marked section of the docstring)
Still easy to parse and insert values into programmatically (because it’s a toml file)
Still consistent with pyproject.toml
Self-documenting in an already established, canonical place (plus, the docstring of a script sounds like a great place for putting requirements[1])
Easy to copy & paste to or from actual toml files, because no extra characters (# etc.) in front
Solves the “which Python version does this script need” problem that came up further upthread

This still has about the same list of things up for bikeshedding as above (e.g. what’s the marker around the toml file, is it [project] or [requirements] or …, etc.), but I hope it’s concrete enough to communicate the intent?

Which is why we’re not talking about the full pyproject.toml, but some reduced form of it. Though the ruff config is a great example IMO of why following existing patterns is better than creating new ones, because people will still want to lint their single-file scripts, and that way we’d have a canonical place to put the config (as an extension in the future, not for this PEP).

[1] If it has to be within that script.

kknechtel · August 2, 2023, 4:46am

Overall I like the suggestion and I appreciate that it plans for extensibility. I absolutely agree with the above points, but there are a couple I’ve singled out…

Here I’m more skeptical. Yes, it’s easy to get a __doc__ attribute, split it into lines, find the right section, and feed it to a TOML parser. But on the return trip, we’re talking about modifying the data, feeding it to a TOML formatter, wrapping it, restoring the rest of the docstring content, putting the result into a legible source code format and editing it back into the original. The italicized step is where I see the most potential complication. There could be string escaping issues; there’s a goal to ensure that a triple-quoted representation is used; and after all of that, the result could differ from the original formatting in a lot of different ways. I think we already discussed it ITT, but TOML parsers generally don’t try to preserve exact details about the input so that they can be reflected in later formatted data.

And, yes, of course one can also treat the data as if it weren’t TOML and just look for the dependencies = [ and ] lines and trust that everything is formatted in a way that won’t cause problems. That… obviously isn’t as robust.
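To make the fragility concrete, here is a minimal sketch of that line-based shortcut. The helper name `replace_dependencies` is hypothetical, and the code deliberately assumes the multi-line layout from the example upthread, with the opening and closing bracket each on a line of their own.

```python
def replace_dependencies(source: str, new_deps: list[str]) -> str:
    """Swap the entries between 'dependencies = [' and the next ']' line."""
    lines = source.splitlines()
    try:
        start = next(i for i, line in enumerate(lines)
                     if line.strip() == "dependencies = [")
        end = next(i for i in range(start + 1, len(lines))
                   if lines[i].strip() == "]")
    except StopIteration:
        raise ValueError("no multi-line dependencies list found") from None
    # No escaping is done here: a specifier containing a double quote
    # (for example an environment marker) would produce broken TOML.
    entries = [f'    "{dep}",' for dep in new_deps]
    return "\n".join(lines[: start + 1] + entries + lines[end:])


BLOCK = "\n".join([
    "[requirements]",
    "dependencies = [",
    '    "numpy>=1.22.4",',
    '    "requests",',
    "]",
])

updated = replace_dependencies(BLOCK, ["httpx"])
print(updated)
```

It works on this input, but a perfectly valid one-line form such as `dependencies = ["requests"]` is not recognised at all, and comments inside the list are silently dropped.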

There’s definitely merit to this, but it raises the small issue of whether we want such sections to get dumped unchanged into Pydoc output, or of what Sphinx should do with them.

encukou · August 2, 2023, 8:15am

Thank you for the PEP! It looks useful – even if it’s a limited use case. Even if it was just for pipx and VS Code, it’d be great to have it.
Here are some comments I don’t think were raised yet; apologies if I missed something:

AFAICS, the meat of the document should move to the PyPA standards page after acceptance, to make later revisions easier. Is that right? You might want to note that in the PEP.

[specifying the Python version] is not something I plan on including

And what about specifying the interpreter? pyproject.toml rightfully doesn’t let you do that, but if pipx or pip-run occupies the shebang, it could be nice if you could add something like

# Python: ~/custom-pypy-build/pypy

Perhaps that’s better left as a tool-specific extension.
But judging from this thread, people want such extensions, and will add them. It (sadly) might be good to consider how they would work.

I can see people wanting to write:

# Script-Dependencies:
#     click
#     httpx
# GUI-Name: Download latest training data

Did you consider requiring the entries to be indented more than the header, rather than (or in addition to) ending with an empty line?

Besides making the header searchable and unique, did you consider using an extra sigil to mark the comment as special – something that you should search for if you don’t know it? E.g. something like:

## Script-Dependencies:
##     click

Encoding declarations and type: comments don’t do this, but IMO it would be a good idea to start marking machine-readable comments.

pf_moore · August 2, 2023, 3:42pm

Yes, this will be added to the PyPA specifications section of the packaging guide. I didn’t mention that explicitly because it’s simply the normal process (even if it’s not always done as promptly as we’d like).

I like the syntax suggestions here. I agree that while I don’t want to extend this PEP beyond declaring dependencies, there’s clearly a possibility that people will want to add further data at some point, and so having something that’s extensible is important. So here’s a concrete proposal. My plan is for this to be the spec that goes into the next revision of the PEP, so feedback is appreciated - but at some point I have to draw a line and say “this is what I propose”, so any suggestions for particularly radical changes will at this point likely just get told “thanks, but no thanks” and go into the rejected alternatives.

Proposed Syntax

A script can contain one or more “metadata blocks”, each consisting of a block of one or more consecutive lines starting with ##. Blocks end at the first line that doesn’t follow this format.
Leading and trailing whitespace are ignored in a block (so no significant indentation), as are lines with nothing but the ## marker.
The first line of a block is a header, and consists of a block name (which defines the block type), followed by a colon, optionally followed by data.
Interpretation of the data on the header line, and of the rest of the block, depends on the block type.

The only block type defined in this PEP is Script Dependencies. This block cannot have data on the header line, and every other non-blank line in the block is required to be a PEP 508 dependency specifier.

(I’ll tighten up the specification in the actual PEP, but that should be clear enough to explain the idea).

Under this proposal, this would be

## Script Dependencies:
##     click
##     httpx

## GUI-Name: Download latest training data

The blank line is required, to separate the two metadata blocks.
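As a rough illustration of how little code a reader for this syntax needs, here is one possible sketch. It is an interpretation of the draft rules above rather than a reference implementation: the names `Block` and `parse_metadata_blocks` are made up, raising an error for a header without a colon is an assumption (the draft does not say), and dependency lines are not validated against PEP 508.

```python
from typing import NamedTuple


class Block(NamedTuple):
    name: str          # block type, e.g. "Script Dependencies"
    header_data: str   # text after the colon on the header line ("" if none)
    lines: list[str]   # remaining non-blank lines, whitespace stripped


def _make_block(content: list[str]) -> Block:
    name, colon, data = content[0].partition(":")
    if not colon:
        # Assumption: the draft does not define a header without a colon.
        raise ValueError(f"metadata header without a colon: {content[0]!r}")
    return Block(name.strip(), data.strip(), content[1:])


def parse_metadata_blocks(source: str) -> list[Block]:
    blocks: list[Block] = []
    current: list[str] | None = None  # None means "not inside a block"
    for raw in source.splitlines() + [""]:  # trailing "" closes a final block
        stripped = raw.strip()
        if stripped.startswith("##"):
            if current is None:
                current = []
            content = stripped[2:].strip()
            if content:  # lines holding only the marker are ignored
                current.append(content)
        elif current is not None:
            if current:
                blocks.append(_make_block(current))
            current = None
    return blocks


SCRIPT = """\
## Script Dependencies:
##     click
##     httpx

## GUI-Name: Download latest training data
import click
"""

blocks = parse_metadata_blocks(SCRIPT)
for block in blocks:
    print(block)
```

A consumer that only cares about dependencies would pick the block named Script Dependencies, check that its header carries no data, and hand each remaining line to a PEP 508 parser.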

To everyone who’s been proposing TOML in one form or another, I’m sorry, but that won’t be the proposed syntax. If someone wants to, they can propose a competing PEP that uses TOML. Or, given that I generally think that “competing PEPs” is an unhealthy approach, if someone wants to persuade @brettcannon (as PEP delegate) to declare that the spec should use TOML, then I’ll happily hand over authorship of this PEP to someone else to make that change. But I don’t think I could fairly represent a proposal that used TOML myself. I do intend to expand my reasoning in the relevant “rejected alternative” section of the PEP - what’s currently there relies too heavily on the argument that the PEP doesn’t need the complexity, and I think that’s insufficient once we start considering future extensions. But I don’t expect to change people’s minds - I’m just intending to document my reasons for the choices I’ve made.

ofek · August 2, 2023, 4:04pm

I really didn’t expect putting something else on my plate but I don’t mind writing the PEP for the embedded pyproject.toml proposal, if that’s okay with you and Brett is not yet persuaded. I think that way is much better long term for the community with regards to user expectations and interoperability (dependency management tooling, IDEs & linters, security scanners, etc.).

sinoroc · August 2, 2023, 7:11pm

If we want to allow the embedding of a whole pyproject.toml inside a script, will we want the dependencies to be listed under project.dependencies (à la PEP 621)? Shouldn’t we get the discussion “Projects that aren’t meant to generate a wheel and pyproject.toml” to a conclusion first?

jeanas · August 2, 2023, 7:22pm

Yup, that’s been the reaction on Sketchy and maybe crazy alternative to PEP 722 as well.

ssweber · August 2, 2023, 8:12pm

I see your color Dependencies and suggest this shed be painted Needs

## Script Needs:
##     click
##     httpx

brettcannon · August 2, 2023, 9:53pm

I guess that’s official now.

I’m fine with it as long as you can get your alternative PEP done within a week of when Paul finishes the rewrite of his PEP. That way it doesn’t greatly delay me, @courtneywebster, and our team from evaluating the PEPs, lining up user studies, etc.

I will say, though, that with Projects that aren't meant to generate a wheel and `pyproject.toml` unresolved I’m reluctant to lean into a TOML solution. I also think it will be more confusing for beginners/occasional Python developers who are going to ramp up into this way sooner than to pyproject.toml (and I’m willing to bet a lot of them will never go past whichever solution is chosen as they just don’t need to; Python is still the glue language of the internet and I bet by sheer quantity that makes it the largest user base of Python). But if user studies disprove this hunch then I’m open to a TOML approach.

h-vetinari · August 2, 2023, 10:41pm

I don’t understand why this is so time-sensitive. Now that @ofek announced he’s willing to write an alternative PEP, shouldn’t that get a fair chance? Sure, we shouldn’t stall forever on a promise, but a week seems rather short. Has a similar condition been attached for other competing PEPs before?

Thank you so much for picking up this mantle! If you want some review/support, please feel free to reach out.

janlarres · August 2, 2023, 11:44pm

Not being able to use a few third-party dependencies in simple scripts has been my main pain point in Python for years, so I’m really glad that there is finally going to be a good solution for it.

I think the currently proposed simple syntax is the right choice for this use case. It’s easy to remember and parse, especially by other simple scripts. If it used TOML on the other hand I would most likely have to look up the exact syntax almost every time I write a script, which is exactly the kind of hurdle that this proposal is trying to address. It would also mean that any tool trying to read or write it would need a full-blown TOML parser. So I am very much in favour of keeping it as simple as possible.

EpicWink · August 2, 2023, 11:46pm

I know many here will disagree, but I think there’s a universe where both embedded pyproject.toml and pip-run’s requirements could be standard.

The simple requirements initially proposed in the PEP are quick to type, and therefore are easier to remember the format of: introducing more words, more structure (eg indentation, separation from other blocks), and a new syntax (the double hash ##) makes the format harder to remember.

The embedded pyproject.toml allows the full extensibility of the existing file for free, with no new syntax (outside of the embedding syntax). It would be a half-way point between the basic requirements and a full Python project.

The main counter-argument here is “there are multiple ways to do the same thing, increasing mental burden on new users”, but I don’t think the embedded pyproject.toml really is more burden than the original distinct file, so I would say only one new format (which already exists non-standard) is being introduced.

jamestwebber · August 3, 2023, 12:50am

One way to merge the two even further is to use the same names in this PEP as in pyproject.toml (which might be a good idea to reduce confusion, anyway).

Doing this also reduces the amount of bikeshedding since the answer to “what to call X” is always “the same thing it’s called now”, e.g. dependencies or description or what have you.

This makes switching a script to a toml format a little easier–you’d still need to reformat a bit but at least the sections are clearly one-to-one.

kknechtel · August 3, 2023, 3:22am

As far as I’m aware, competing PEPs are rare. Aside from that, Python’s current release cadence is a fair bit tighter than it was several years ago, and 3.12 stable is on the horizon which is keeping a lot of people busy.

johnthagen · August 3, 2023, 8:13am

I also support the embedded subset of pyproject.toml approach for the “there should be one and preferably only one” sense of it.

TOML provides a nice existing syntax, tomllib is in the standard library now, TOML provides a syntax to naturally expand the metadata over time, and transitioning a project to full pyproject.toml is a simple cut and paste into a new file without having to learn yet another packaging format.

Thanks @ofek for championing this!

pf_moore · August 3, 2023, 9:47am

Oops! I’m sorry, I thought I’d responded previously to that effect. Too many posts, too easy to forget what I’ve responded to. My bad.

To be fair, I don’t have any say in whether anyone submits a PEP, but honestly, this is not what I was suggesting. My intention was that if someone wanted the functionality of PEP 722, but using TOML syntax, that would need to be a different PEP. But if you want my approval for a completely different “embedded pyproject.toml” proposal, then I’m sorry, but no, I don’t think that’s a good idea (either in general, or more specifically as a “competitor” to PEP 722). I’d rather just have Brett reject PEP 722 in isolation, because my reasons for rejecting the “embedded pyproject.toml” (as stated in the PEP) didn’t address the community feedback properly.

Speaking as someone who’s been in this situation, choosing between two “competing” PEPs is a horrible experience. There’s an expectation that you choose one or the other, which makes it very hard to consider the PEPs on their individual merits, and in particular, makes “reject both” a very unpleasant option - given that you know how much effort and passion people have put into a discussion that ends up in this situation. I really don’t want to put Brett, or the community, in that position if I can avoid it.

I’ll leave my objections to the “embedded pyproject.toml” approach for the PEP 722 “rejected items” section (which I’m hoping to get completed very soon - sorry for the delay!) But in much the same way that I was persuaded to stop referring to the dependencies in PEP 722 as “requirements”, because it implied a closer link to requirements files than was intended, I strongly recommend that if you want a TOML format for script dependencies, you avoid talking about it in terms of being related to pyproject.toml, because people will assume similarities that you don’t intend.

I think there’s still way too much uncertainty about the whole idea of pyproject.toml for anything other than projects being built as wheels (i.e., the use case the file was introduced for in PEPs 517 and 518).

Personally, if PEP 722 isn’t accepted (or maybe a variant that just differs in syntax), then I’d rather see nothing standardised at this point, in preference to rushing any sort of “complete solution”. Tools are still experimenting with the workflow for scripts, and if we try to lock down ideas around how people “should” write or share scripts, we’re bound to make mistakes because we don’t have enough information. That’s why PEP 722 is being so strict about defining nothing except a standard way of storing dependencies. I genuinely believe we don’t have enough experience with the use case yet to go any further than that. The only reason I even considered standardising this much was because Brett suggested it, as VS Code is looking for a way to store that data and they didn’t want to use a tool-specific format.

Long term, I certainly hope we come up with a clear and effective “official solution” for writing scripts with non-stdlib dependencies. But I don’t yet know what form that will take. The community has ignored that use case for a long time now, and we’re not going to solve it just based on a sudden burst of interest triggered by one PEP. So let’s take our time on designing tools and workflows, and stick to what we know is beneficial for now.

h-vetinari · August 3, 2023, 10:13am

It’s become wildly popular for essentially all configuration a python project might need, due to ease of extensibility & syntax.

Saying that it was only intended for wheels IMO misses the point of how broadly this has been embraced. I find it a stretch to believe that all this momentum will be (or even can be) undone.

What you consider beneficial (though a weighty opinion indeed) is not necessarily everyone’s point of view. I find another syntax (even a minimal one) strictly worse than reusing the toml-approach.

And we don’t have to rush to a complete solution, we can easily stipulate for now that any TOML-tables other than [dependencies] (or whatever $allowlist) MUST raise an error.

But that approach would have built-in extensibility (if/once we choose to enable it), rather than having to go through a multi-year long cycle to potentially, eventually deprecate the minimal syntax that PEP 722 introduces, once we “long term” come up with something better.

I think it’s highly unrealistic to come up with something more coherent, popular & extensible than the existing pyproject.toml approach, and from my POV, spending the very limited time of the overall Python packaging ecosystem on such an endeavour would be hubris.

We have a successful format, the doc-embedding has been pioneered successfully in cargo, it solves additional problems for single-file scripts (like requires-python) compared to PEP 722, and it does not require implementing or teaching additional tools or formats. In front of us, today, basically for free. How are we supposed to ever out-design that?

jeanas · August 3, 2023, 10:14am

I don’t know in what sense you mean “uncertainty”, but there certainly seem to be widely differing opinions on what pyproject.toml should be for in the first place.

Maybe this is a case where the recently proposed idea of a packaging council would help? With the PEP process, there is an SC or PEP delegate who decides in a way that is based on rough community consensus, but not necessarily full community consensus (there will always be a few people disagreeing). I don’t think “setting the direction for pyproject.toml” is something that can be really embodied in a PEP, so I don’t know how else we could “resolve” broad discussions like Projects that aren't meant to generate a wheel and `pyproject.toml`.

petersuter · August 3, 2023, 12:12pm

If there is going to be a proposal for embedding pyproject.toml, it should probably come much later: a prototype tool should first be implemented and tried out, to see whether it’s even feasible.

What if it turns out it just doesn’t make sense to embed (most sections of) a pyproject.toml? You propose the other tables can remain forever as “MUST raise an error”? Then what’s the big advantage? Will that not be even more confusing?

None of the tools that use pyproject.toml can currently use an embedded one. What if they don’t (can’t / want to) support that? Will that not be even more confusing?

If a tool is required to extract the embedded file, that tool can also trivially parse the PEP 722 syntax and convert to pyproject.toml. So again there’s not really a big advantage.

On the other hand there’s not a big downside with starting now with the minimal PEP 722 approach which has been proven to work in multiple existing tools. And even if much later a full embedded TOML syntax is added, just support both forever.

ofek · August 3, 2023, 12:29pm

I would be quite interested in hearing from maintainers of other projects that would potentially have to implement support for single file scripts like Poetry, Dependabot, etc.

Perhaps it’s because I maintain such a tool so I see the pragmatic implementation and user workflows pretty clearly, but I would be willing to bet precisely zero of them would want to adopt a new format and would rather prefer the embedded pyproject.toml in order to reuse their existing code and also to not lose features like the ability to edit on behalf of the user.

I don’t know all the ways in which IDEs do syntax highlighting, but I would assume that once they detect the embedded indicator they could pretty easily render the block as TOML.