PEP 722/723 decision

brettcannon · October 24, 2023, 11:03pm

Yes and yes.

Probably.

I personally trust you not to let your personal preference for PEP 722 get in the way. But if you want me to be PEP delegate for a “[run] in pyproject.toml” then I can do that as well.

Yep, that’s my feeling. If I’m wrong then the condition for PEP 723’s acceptance is made invalid and the acceptance gets nullified. As the rationale for PEP 723 says, "the user can simply copy and paste the metadata into a pyproject.toml file and continue working without having to learn a new format", and if that isn’t true that changes its appeal. PEP 723 also says, "Any future PEPs that define additional fields for the [run] table when used in a pyproject.toml file MUST include the aforementioned fields exactly as specified", which to me suggests that [run] is expected to be in pyproject.toml. I don’t think we want to end up with a weird dichotomy between # /// pyproject and pyproject.toml which will simply confuse users.

pradyunsg · October 25, 2023, 8:06am

How do we want the tooling to handle this then, while the condition is in an undecided state? Wait? Roll out a provisional-ish implementation (with all the caveats associated with trying to change it later)?

My reading of the situation is that @ofek is eager to implement and ship PEP 723 in hatch now-ish to end users, whereas settling on an answer for the “[run] in a pyproject.toml file” story is something that should happen/resolve before accepting PEP 723 (i.e. the condition in a conditional acceptance).

If hatch does ship this, it means we’d end up with one-more-draft-PEP-implemented-and-exposed in hatch (PEP 639 is the other one) which, well, goes against the spirit of what going through the PEP process for these things is meant to do.

pf_moore · October 25, 2023, 9:18am

Agreed. Also, pipx currently has support for the precursor syntax to PEP 722 in an unreleased version. I would like to see PEP 723 support added before the now-superseded syntax gets released.

DavidCEllis · October 25, 2023, 11:36am

Ok, just to clarify a few points from the text of the specification:

First from this line in the PEP:

When there are multiple comment blocks of the same TYPE defined, tools MUST produce an error.

Is the intention then that to be compliant all parsers MUST parse the entire file and must not return early?

My implementation currently does return early, but this is a one-line change so not a huge issue; I just want to know that the intention here was to force parsing of the entire file [1].
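A minimal sketch of why the duplicate rule forces a full scan, assuming a block-matching regular expression of the shape discussed in this thread. The function name `find_blocks` is hypothetical; the point is only that every match has to be collected before duplicates can be ruled out.

```python
import re
from collections import Counter

# Assumption: the block regex as proposed in this thread, not a final spec.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"
)

def find_blocks(script: str) -> dict[str, str]:
    """Return {TYPE: raw comment lines}, raising if any TYPE appears twice."""
    # finditer walks the whole file, so there is no early return here.
    matches = list(BLOCK_RE.finditer(script))
    counts = Counter(m.group("type") for m in matches)
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"multiple blocks of type: {', '.join(duplicates)}")
    return {m.group("type"): m.group("content") for m in matches}
```

An early-returning parser would accept a file with two pyproject blocks and silently use the first one, which is the behaviour the quoted MUST appears to rule out.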

The text description does not give any guidelines on what a TYPE consists of other than being pyproject explicitly. The regex picks out only a single word [a-zA-Z0-9-] match (alphanumeric plus -). Should this be considered an implementation detail or should TYPE be more rigorously defined?

Note that I intend to at least warn in the case somebody has written this and doesn’t understand why the block doesn’t work:

# /// pyproject.toml
# ...
# ///
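One way such a warning could be produced, as a sketch only: compare each line against the strict TYPE pattern and against a looser, purely hypothetical pattern that catches anything resembling an opening line. Both `LOOSE_OPEN_RE` and `suspicious_openers` are names invented for this illustration.

```python
import re

# Assumption: TYPE is limited to the characters the PEP's regex accepts.
VALID_OPEN_RE = re.compile(r"^# /// [a-zA-Z0-9-]+$")
# Hypothetical looser pattern: any line that looks like an attempted opener.
LOOSE_OPEN_RE = re.compile(r"^# /// (?P<type>\S.*)$")

def suspicious_openers(script: str) -> list[tuple[int, str]]:
    """Return (line number, TYPE) for opener-like lines the strict pattern rejects."""
    found = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        loose = LOOSE_OPEN_RE.match(line)
        if loose and not VALID_OPEN_RE.match(line):
            found.append((lineno, loose.group("type")))
    return found
```

A tool could turn each returned pair into a warning such as "line 1: 'pyproject.toml' is not a valid block type".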

From the discussion, here are examples and what I understand to be the correct behaviour:

Long block of examples

No closing line

example:

# /// pyproject
# [run]
# dependencies = ["requests"]

This block must not be parsed. It is acceptable to either ignore it or raise an exception as it is outside of the scope of the specification to state what to do when encountering something that is not a valid block.

Multiple closing lines

example:

# /// pyproject
# [run]
# dependencies = ["requests"]
# ///
# Additional comment
# ///

This must be parsed as:

[run]
dependencies = ["requests"]

Parsers must stop at the first # ///.

Multiple opening lines

# /// pyproject
# [run]
# dependencies = ["requests"]
# /// pyproject
# requires-python = ">=3.11"
# ///

This must be parsed as:

[run]
dependencies = ["requests"]
/// pyproject
requires-python = ">=3.11"

This will cause an error when passed to the TOML parser at which point additional information can be given if available, but it is not considered an error of the basic block parsing.

Please correct me if I’ve understood any of this incorrectly. I think the behaviour in the case of ‘nested’ blocks intentional or otherwise will be covered by these examples.

[1] I did a performance test by adding a block to turtle.py, and per run it’s about 0.3ms for early return vs 1ms for a full read with my implementation (the regex full parse from the updated PEP PR is very slightly slower still). For context, import re takes 8ms on the same machine.

petersuter · October 25, 2023, 11:43am

Maybe a repository of “conformance tests”, containing a set of example scripts and parse results (e.g. as JSON), could be created and linked in the PEP?

DavidCEllis · October 25, 2023, 11:52am

I was planning on writing something along those lines for my tests, although I was going to do it in the form of python code along the lines of:

Slightly long example

# /// pyproject
# [run]
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

import textwrap

output = {
    "pyproject": textwrap.dedent(
        """
        [run]
        requires-python = ">=3.11"
        dependencies = [
          "requests<3",
          "rich",
        ]
        """
    )
}

So testing would consist of importing the script, parsing __file__ and comparing against script.output.

The output could be exported as JSON for each file for other language implementations.

Note: I haven’t tested this one yet.
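A rough sketch of what a language-neutral check along these lines might look like, using the JSON variant. Everything here is an assumption: `parse_blocks` stands in for whichever parser is under test, and `check_case` takes the script text and the expected JSON as strings so that reading files from disk is left to the test runner.

```python
import json
import re

# Assumption: the block regex as proposed in this thread.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"
)

def parse_blocks(source: str) -> dict[str, str]:
    """Hypothetical parser under test: map each TYPE to its de-commented text."""
    blocks = {}
    for m in BLOCK_RE.finditer(source):
        lines = m.group("content").splitlines(keepends=True)
        # Drop the leading "# " (or a bare "#") from every comment line.
        blocks[m.group("type")] = "".join(
            line[2:] if line.startswith("# ") else line[1:] for line in lines
        )
    return blocks

def check_case(source: str, expected_json: str) -> bool:
    """Compare the parser's output with the expected result stored as JSON."""
    return parse_blocks(source) == json.loads(expected_json)
```

Storing the expected result as JSON keeps the same case files usable from Rust or JavaScript implementations.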

pitrou · October 25, 2023, 12:40pm

I’ll go further and suggest that a reference implementation is created that other tools can freely vendor or depend on. The conformance tests can be part of the unit tests for that implementation.

(yes, the regex is supposed to play the role of a reference implementation, but evidently things are not so simple)

pf_moore · October 25, 2023, 1:36pm

As there is a strong likelihood that tools will be written in languages other than Python (Rust and Javascript are known cases), having the regex be accurate is a huge benefit. If for some reason that’s genuinely not possible, then having a clear, precise reference implementation is indeed the next best thing. And I agree that conformance tests would be a useful addition.

To be fair, though, I’m typically very strict on precision when reviewing PEPs - Brett may be inclined to be a little more lenient. I think the important thing here is that we have an assurance that all consumers of this data will parse things the same, and that the handling of errors and edge cases is consistent.

holdenweb · October 25, 2023, 4:13pm

I’d like to think that ultimately the format might be extended so that any line beginning `# /// followed by a name is permitted. This would allow further sections with their own semantics to be added at a later date, should a convincing case be made for it, as the opening of a second section would (or could be made to) close the first.

ofek · October 25, 2023, 4:14pm

I made that statement before I realized that another PEP is required so I won’t be implementing that until that one is accepted I think. Then again, people truly want such a feature so maybe I should implement even if the other PEP is rejected, who knows…

I really want to get this sorted by Friday but I’ve gotten conflicting feedback in comments. Can someone please, like I asked before, condense precisely what I have to change to a few bullet points? My understanding at this point is to make the regular expression take precedence over the text specification and also to prescribe an error for unclosed blocks, is that correct?

edit: sorry for requesting such clarification but if I don’t have assistance with that I simply don’t have the time this week to implement the suggestions and also read everything here

pf_moore · October 25, 2023, 4:54pm

I’ll do my best.

I believe what Brett is asking for is that an unclosed block is an error, and nested blocks are illegal. So once you get # /// name you must get a # /// before you get another # /// name. And the last marker in a file must be # ///. Any other combination is illegal. That’s the semantics that is wanted.
The regex needs to be updated to parse those semantics.
The reference implementation needs to be updated to parse those semantics (which should be automatic as it uses the regex).
The text needs to be rewritten to describe those semantics, precisely enough that someone who wants to write a procedural implementation of the parser without using the regex directly doesn’t have to reverse-engineer the regex.
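For comparison, a procedural sketch of the strict reading described in these bullets. It is not taken from the PEP: the name `parse_strict` is hypothetical, and treating a stray closing marker or a non-comment line inside a block as an error is my assumption about what "any other combination is illegal" would mean in practice.

```python
import re

OPEN_RE = re.compile(r"^# /// (?P<type>[a-zA-Z0-9-]+)$")
CLOSE = "# ///"

def parse_strict(script: str) -> dict[str, list[str]]:
    """Map each TYPE to its raw comment lines; raise on any illegal marker order."""
    blocks: dict[str, list[str]] = {}
    current = None  # TYPE of the block currently open, if any
    for lineno, line in enumerate(script.splitlines(), start=1):
        opener = OPEN_RE.match(line)
        if opener:
            if current is not None:
                raise ValueError(f"line {lineno}: block opened inside {current!r}")
            current = opener.group("type")
            if current in blocks:
                raise ValueError(f"line {lineno}: duplicate {current!r} block")
            blocks[current] = []
        elif line == CLOSE:
            if current is None:
                raise ValueError(f"line {lineno}: closing marker without a block")
            current = None
        elif current is not None:
            # Assumption: inside a block only "#" or "# ..." lines are allowed.
            if line != "#" and not line.startswith("# "):
                raise ValueError(f"line {lineno}: {current!r} block was never closed")
            blocks[current].append(line)
    if current is not None:
        raise ValueError(f"end of file: {current!r} block was never closed")
    return blocks
```

Under this reading the "multiple opening lines" and "multiple closing lines" examples from earlier in the thread would both be rejected rather than parsed.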

Is that enough? If you want someone to fix the regex for you, then I’m sorry I can’t help there. I’m not good at writing complex regular expressions, that’s why I expressed PEP 722 procedurally. If you want precise text, then again, I’m sorry - I know how hard it is to write precise text for things like this, as I did it for PEP 722, but I really don’t have the time or inclination to do the same for PEP 723. I will, however, proof-read whatever you write, if you need that.

I think it would be nice if someone wrote the proposed conformance suite, but I don’t think that should be on you.

DavidCEllis · October 25, 2023, 5:03pm

I’ve started on this, but my understanding - and what I mentioned in my earlier post - doesn’t quite match what you’ve suggested here.

My understanding was:

Once you get a # /// TYPE any following # /// name will just be considered a part of the TYPE block as will everything that follows unless either:

You reach a plain # /// line at which point the block ends immediately and is returned/added
You reach a non-comment line/EOF at which point the block ends and is ignored

I think this can be implemented in the regex by the change suggested earlier:
(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$
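As a quick check, the regex above can be run against the three examples from my earlier post. This is a sketch: the helper `extract` is a hypothetical name, and the comment stripping (drop "# " or a bare "#") is an assumption about how the content group would be turned into TOML text.

```python
import re

BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"
)

def extract(script: str) -> dict[str, str]:
    """Map each matched TYPE to its content with the comment prefix removed."""
    blocks = {}
    for m in BLOCK_RE.finditer(script):
        lines = m.group("content").splitlines(keepends=True)
        blocks[m.group("type")] = "".join(
            line[2:] if line.startswith("# ") else line[1:] for line in lines
        )
    return blocks

unclosed = '# /// pyproject\n# [run]\n# dependencies = ["requests"]\n'
double_close = unclosed + "# ///\n# Additional comment\n# ///\n"
double_open = unclosed + '# /// pyproject\n# requires-python = ">=3.11"\n# ///\n'

for name, script in [("unclosed", unclosed), ("double_close", double_close),
                     ("double_open", double_open)]:
    print(name, extract(script))
```

The unclosed example yields no block, the second stops at the first `# ///`, and the third keeps the inner `/// pyproject` line as content, which matches the behaviour described in the list above.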

Edit: My test examples can be found here.

pf_moore · October 25, 2023, 6:07pm

OK, so I guess add a further bullet point:

@brettcannon needs to confirm which of those two behaviours he wants (or if he doesn’t care, then @ofek as PEP author needs to decide).

Nice. As I say, I’m not enough of a regex expert to comment, so I’ll take your word on it.

While I’m in standards-defining nitpick mode, some questions (to which “the spec is fine” is a perfectly acceptable answer, I’m absolutely not asking for any changes unless someone else thinks it’s worthwhile):

The regex requires explicit single space characters between # and ///, and between /// and the name. Is that right, or should arbitrary whitespace be allowed? The text isn’t particularly explicit here.
There are two places in the regex that have $\s. Does that not mean that the following line must start with a space? Doesn’t that clash with the ^# requiring a # at the start of the line? It’s from the original regex in the PEP, so maybe I’m missing something? It looks like it might be needed to match the newline, if so wouldn’t \n be more explicit?
^#(| .*)$ doesn’t match #<TAB>something. Should the space be \s here, as disallowing tabs seems a bit over-strict? The text says “MUST be a space” - maybe it should be explicit that it means a space character, not just whitespace? (A careless parser might just do .strip(), for example).
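These three questions can be probed directly with Python's `re` module. The sketch below assumes the regex exactly as posted above. In multiline mode `$` matches only just before a newline (or at the end of the string), so the `\s` that follows it can only consume that newline and does not require the next line to start with a space.

```python
import re

BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"
)

def matches(script: str) -> bool:
    """True if the regex finds at least one block in the script."""
    return BLOCK_RE.search(script) is not None

plain = "# /// pyproject\n# [run]\n# ///\n"
extra_space = "#  /// pyproject\n# [run]\n# ///\n"  # two spaces after '#'
tab_line = "# /// pyproject\n#\t[run]\n# ///\n"     # tab instead of space
bare_hash = "# /// pyproject\n#\n# [run]\n# ///\n"  # empty comment line

print(matches(plain), matches(extra_space), matches(tab_line), matches(bare_hash))
```

With the regex as written, only the single-space forms and the bare `#` line are accepted; extra spaces in the opening line and a tab after `#` both prevent a match.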

ofek · October 25, 2023, 6:12pm

I will respond a bit later today, but just FYI I prefer David’s interpretation, not only because that is what I already had in mind but also because we can express it in the regular expression. If we want error checking, such as for unclosed blocks, then the reference implementation will be much more complex.

ngie · October 25, 2023, 6:17pm

Hello!

I’m not sure how close PEP-723 is to potential adoption, but if possible could the format be Markdown-based instead of using “/// pyproject”, similar to what Cargo does in Rust code?

Allowing Markdown could help with existing third-party tooling and syntax checking.

Also, what about expressing the requirements in optimized byte code (.pyo)? If memory serves me correctly, hash-style comments are removed in -O mode.

DavidCEllis · October 25, 2023, 6:20pm

Me either, I had to read the regex101 description to check it.

I’ll make sure I quote that in the commit where I fix this (if needed - there’d have to be a compliance test).

pf_moore · October 25, 2023, 6:20pm

It’s too late now, PEP 723 has been (provisionally) accepted, so no major changes to the syntax are allowed. Markdown format was discussed and rejected before the PEP was finalised, I believe, if you want to go and hunt for the reasons.

ngie · October 25, 2023, 6:22pm

Thank you for the reply @pf_moore !
Was that discussion added to the PEP, or should I look through the archives for more information?

brettcannon · October 25, 2023, 8:24pm

This is what I meant since it can be expressed in a regex directly and is the simplest to implement (i.e., find a line that is # /// pyproject then keep scanning until you find # /// as a line; everything in between is the data).
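The scan described here could be written procedurally along these lines. This is a simplified sketch, not the reference implementation: `read_pyproject` is a hypothetical name, a block interrupted by a non-comment line or the end of the file is skipped, and edge cases such as an empty block may behave differently from the regex.

```python
from typing import Optional

def read_pyproject(script: str) -> Optional[str]:
    """Return the text between '# /// pyproject' and the next '# ///' line, if any."""
    lines = iter(script.splitlines())
    for line in lines:
        if line != "# /// pyproject":
            continue
        data = []
        for line in lines:  # keep scanning from the line after the opener
            if line == "# ///":
                return "\n".join(data)
            if line == "#":
                data.append("")
            elif line.startswith("# "):
                data.append(line[2:])
            else:
                break  # not a comment line: the block never closed, so skip it
    return None
```

A later `# /// pyproject` line inside the block is simply kept as data, so any complaint about it comes from the TOML parser rather than from this scan.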

brettcannon · October 25, 2023, 8:27pm

It looks like it’s missing.

You can find it in the discussions, but basically a Markdown block seemed out of place since Python doesn’t lean into Markdown like Rust does for rustdoc. We also didn’t want to add custom syntax for this which Rust is doing by letting you embed a code block directly in code without comment markers.