PEP 722/723 decision

DavidCEllis · October 22, 2023, 8:57pm

I’ve modified my implementation so now it will only report ‘nested’ blocks as a possible cause when toml parsing fails:

Example and error

from ducktools.pep723parser import PEP723Parser

src = """
# /// pyproject
# [run]
# requires-python = ">=3.11"
# /// pyproject
# [run]
# dependencies = ["ducktools-lazyimporter>=0.1.1"]
# ///
"""
parser = PEP723Parser.from_string(src)

print(parser.pyproject_raw)  # returns the full block
print(parser.pyproject_toml)  # TOMLDecodeError

Error example:

tomli.TOMLDecodeError: Invalid statement (at line 3, column 1); Possible PEP723 Errors: New 'pyproject' block encountered before block 'pyproject' closed.

petersuter · October 22, 2023, 9:49pm

run.dependencies = ["""
///"""]

I think this is valid TOML (confirmed by https://www.toml-lint.com/).

DavidCEllis · October 22, 2023, 10:46pm

Or if you want to give me a headache…

# /// pyproject
# run.dependencies = ["""
# ///
# """]
# ///

Which made me realise that the regex matches up to the final # /// in any comment block, as it grabs the whole block here.

This includes examples like this:

# /// pyproject
# [run]
# requires-python = ">=3.11"
# ///
#
# /// newblock
# <content>
# ///

The regex parses this as one pyproject block. If this is intended, is it worth mentioning in the spec text that a non-comment line is required between blocks and at the end of a block?
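A quick way to reproduce this is to run re.finditer with the reference regex (as quoted later in the thread) over the example. Because the final + is greedy, the content group runs on to the last "# ///" line, and the "newblock" opener ends up inside the pyproject content:

```python
import re

# Reference regex as quoted later in this thread (greedy final "+").
REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'

script = """\
# /// pyproject
# [run]
# requires-python = ">=3.11"
# ///
#
# /// newblock
# <content>
# ///
"""

matches = list(re.finditer(REGEX, script))
print([m.group("type") for m in matches])  # ['pyproject']
print("/// newblock" in matches[0].group("content"))  # True
```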

h-vetinari · October 22, 2023, 11:14pm

I’m glad about how the decision process turned out (and not just the result!).

W.r.t. the parsing complexity, I think it would be good to keep it as simple as possible. So before we climb down the rabbit hole of fixing all possible corner cases with ///, I just wanted to recall that there’s some prior art in Rust (also mentioned in the PEP), which uses a special comment (//! rather than the normal //).

Translating that to Python and keeping everything else from PEP 723, we’d have:

#! /// pyproject
#! [run]
#! requires-python = ">=3.11"
#! dependencies = [
#!   "requests<3",
#!   "rich",
#! ]
#! ///

That would make it unambiguous[1] as well as dead-simple[2]. It’d also be trivial to adapt the reference regex.

[1] Take only (consecutive) lines starting with #! and then strip off the first and last lines with ///.
[2] E.g. the occurrence of other # /// lines in the same comment would be sidestepped completely.

ofek · October 22, 2023, 11:19pm

IIRC from earlier discussions, that is not an option because of linting.

Jelle · October 22, 2023, 11:28pm

For what it’s worth, Black doesn’t insert a space after the # in @h-vetinari’s proposed syntax. I think that’s because of some other tool that treated #! comments specially but I don’t recall the details.

Other linters or autoformatters might still complain, of course.

h-vetinari · October 22, 2023, 11:34pm

Isn’t that putting the cart before the horse? Python defines what the syntax is, then linters help enforce it. If Python introduces something that linters currently complain about, the linters have to change. I get that it’s annoying to ensure the linters are updated everywhere, but I don’t think we should pessimize the design because of something like that.

pf_moore · October 22, 2023, 11:48pm

As the author of the rejected proposal, I would object very strongly if substantive changes were allowed to PEP 723 after the pronouncement has been made. And as a matter of process, I think that any such changes would need a new PEP - they are emphatically not “textual clarifications” at this point.

At the moment, I consider @brettcannon to have authority over what’s allowable here - let’s wait for his input before proposing any changes to the PEP.

ofek · October 22, 2023, 11:48pm

I take the view of Paul (if I understand the last comment correctly) that nothing needs to be done here.

If changing the grammar of Python were possible (like they are doing with Rust), then I probably would have come up with a slightly better design proposal. That won’t happen, and I won’t be making changes to the content of the PEP other than to make things more explicit if requested.

h-vetinari · October 23, 2023, 12:34am

I don’t see how that would require changing the grammar – #! is legal today, we’d just imbue it with additional meaning.

In any case, if people prefer PEP 723 as-is, that’s fine of course. Just thought I’d recall an easy[1] escape hatch for the parsing issues that were brought up.

[1] Technically, not procedurally…

jamestwebber · October 23, 2023, 1:35am

Is there any reason anyone would use # /// for anything else, or include it in a pyproject.toml at some point? This seems like a pathological example and specifying “between # /// pyproject and the next # ///” would be sufficient to cover all realistic uses of this PEP.

edit: I think this change would satisfy that description:

- REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'
+ REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| [^(///)]+)$\s)+)^# ///$'

edit: or just making that last + into a +?. I thought I had tested that and found it lacking but it seems to work fine.
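For comparison, here is the reference pattern with the final + made lazy (+?), run on the two-block example from earlier in the thread and on the triple-quoted-string example. Note that the first suggested change, [^(///)], is a character class. It excludes the individual characters (, / and ) rather than the string ///, so it would also reject any content line containing a slash. The lazy version splits the two blocks correctly, but a "# ///" line inside a TOML string now ends the block early:

```python
import re

# Reference pattern with the final "+" made lazy: a block ends at the FIRST "# ///".
LAZY = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+?)^# ///$'

two_blocks = """\
# /// pyproject
# [run]
# requires-python = ">=3.11"
# ///
#
# /// newblock
# <content>
# ///
"""
print([m.group("type") for m in re.finditer(LAZY, two_blocks)])
# ['pyproject', 'newblock']

# Trade-off: the multi-line TOML string is cut off at its inner "# ///" line.
headache = '''\
# /// pyproject
# run.dependencies = ["""
# ///
# """]
# ///
'''
first = next(re.finditer(LAZY, headache))
print(repr(first.group("content")))  # '# run.dependencies = ["""\n'
```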

ofek · October 23, 2023, 2:31am

I would prefer to not change anything please if that’s okay.

jamestwebber · October 23, 2023, 3:50am

I agree! To me this is just useful debugging before the PEP is finalized, to make sure the definition and implementation are unambiguous. The spec as written is fine under the “we’re all reasonable adults” assumption. But making it a little more strict would probably save implementers a little bit of time with corner cases.

petersuter · October 23, 2023, 6:25am

I guess that’s because finditer is used, which iterates over “non-overlapping matches”.
So nested blocks are not possible. This is just one big block:

# /// pyproject
# [run]
# nested="""
# /// pyproject
# [run]
# inner=1
# ///
# """
# ///

I agree this should not trigger even more changes to the PEP now. (Beyond fixing the broken code regarding AttributeError: 're.Match' object has no attribute 'replace'.)

As long as I can write a simple reminder in the script to use that tool, maybe:

# Script Dependencies:
#    pep723

pitrou · October 23, 2023, 8:17am

Well, this is certainly an amusing case of forgetting the good old wisdom about regular expressions.

barry-scott · October 23, 2023, 8:46am

In the IETF world, an RFC is not accepted until there are successful implementations; I do not recall whether one is sufficient.

@DavidCEllis’s work on doing that is obviously valuable, and if it leads to changes in the PEP text or an SC reconsideration, then so be it; but surely it is better that this work is done before finalising?

pf_moore · October 23, 2023, 10:01am

No, because I’d expect at that stage that PEP 722 would be open for reconsideration. It’s not a fair choice if the syntax that is finally accepted is not the one that PEP 722 was rejected in favour of.

ssweber · October 23, 2023, 12:02pm

Yeah I get that. I think what’s been lost is the free-form simplicity of its original use-case, script-running.

For instance, fades has (apparently) always allowed a simple inline marker:

import rich # fades

No tooling needed for that.

Then Paul independently implements a similar format in pipx and wants to formalize it, in a format that seemed like it could gain support amongst script runners:

# Script Dependencies:
# rich

I’d just need to remember “Script Dependencies:”

And now that 723 has been picked, I’ll need to do:

# /// pyproject
# [run]
# dependencies = [
#   "rich"
# ]
# ///

(Did I get it right?) Hopefully that’s helpful. I’ll stop pressing the point now.

pf_moore · October 23, 2023, 1:16pm

Yes, that’s correct. Obviously, I agree with your view that PEP 722 provided a simpler approach, and that “tools will fill in the syntax for you” is not the right trade-off, but the PEP process is about allowing us to reach a decision when people have different opinions, and that’s what’s happened here. So I don’t think we should be going over old arguments at this point - PEP 723 is Python’s syntax for declaring dependencies in scripts, and we need to look at how we can get the ecosystem supporting it.

To that end, I’d rather see people working on helping pipx and pip-run support the format, and working out how we handle the transition. And yes, adding PEP 723 support to new tools like hatch, but to complement existing tools, not replace them (we don’t want a split in the ecosystem between “old” and “new” approaches). Working out how tools can provide user-friendly support for malformed input is another example of how we can move forward - not “the spec doesn’t support this, let’s change the spec”, but “this is invalid, how do we inform the user what’s wrong in a helpful manner?”

Hopefully, we’ll now see the supporters of PEP 723 demonstrating why it was the right choice, and we can move on to making more improvements for people wanting to share and reuse simple scripts without needing to build a full package or installer every time.

DavidCEllis · October 23, 2023, 1:35pm

While that’s partly what I’m looking at, there’s also an element of “the text of the spec is ambiguous on this; can the intention be made clear?”, and I don’t agree that “just do what the regex does” is acceptable, given that the text is supposed to take precedence.

The issue that I’d like to see resolved is this:

The spec states:

Any Python script may have top-level comment blocks that start with the line # /// TYPE where TYPE determines how to process the content, and ends with the line # /// .

Ambiguous example:

# /// pyproject
# [run]
# requires-python = ">=3.11"
# ///
#
# <Additional comments>
#
# /// newblock
# <content>
# ///

From the text, is this two blocks or one? (a pyproject and a newblock block or one large pyproject block.)

In both cases the block(s) ‘end with’ # ///, as this is a description of the contents of the blocks. What is unclear is whether encountering a # /// line should immediately cause the block to terminate (in other words, be lazy instead of greedy). I would argue that it should, but the text is unclear and the regex treats this as one block.
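To make the lazy reading concrete, here is a hypothetical line-based reader (iter_blocks is an illustrative name, not the PEP 723 reference implementation). It closes each block at the first "# ///" line. Raising on a non-comment line inside an open block is an assumed design choice. Under this reading the ambiguous example is two blocks. It would, however, truncate the triple-quoted-string example from earlier in the thread.

```python
import re

# Opening line "# /// TYPE", reusing the TYPE character set from the reference regex.
OPENER = re.compile(r"# /// (?P<type>[a-zA-Z0-9-]+)")


def iter_blocks(script: str):
    """Yield (type, content) pairs, ending each block at the FIRST "# ///" line."""
    block_type = None
    lines: list[str] = []
    for line in script.splitlines():
        if block_type is None:
            # Outside a block, only an opening line matters.
            if (m := OPENER.fullmatch(line)):
                block_type, lines = m["type"], []
        elif line == "# ///":
            # Lazy: the first closing line ends the current block.
            yield block_type, "\n".join(lines)
            block_type = None
        elif line == "#" or line.startswith("# "):
            lines.append(line[2:])  # strip "# " (or a bare "#")
        else:
            raise ValueError(f"Block {block_type!r} not closed before a non-comment line")
    if block_type is not None:
        raise ValueError(f"Block {block_type!r} not closed at end of script")


example = """\
# /// pyproject
# [run]
# requires-python = ">=3.11"
# ///
#
# <Additional comments>
#
# /// newblock
# <content>
# ///
"""
for block_type, content in iter_blocks(example):
    print(block_type, repr(content))
# pyproject '[run]\nrequires-python = ">=3.11"'
# newblock '<content>'
```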