PEP 722/723 decision

h-vetinari · October 26, 2023, 4:33am

One of the examples that @DavidCEllis gave doesn’t work with that algorithm, because it could conceivably appear in the TOML.

Note that it does work with plain regex, because that’s greedy and will keep going until the last # ///. That however leads to other issues (mentioned in the same comment), if such a line comes after what’s supposed to be the end of the embedded pyproject.toml.

If changing the comment marker is off the table (e.g. #!), then both of those issues could be solved by requiring that the embedded pyproject.toml starts and ends with # /// pyproject.

Intermezzo: I suggest we should allow trailing whitespace at the end of the reference regex, because it’d be a real pain for users who managed to accidentally get a space on the relevant line, to debug why their script doesn’t work.

In terms of the reference regex, that would mean changing the last component as follows^[1]

- REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'
+ REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# /// \1[ \t]*$'

This has the slight disadvantage that not all regex-engines support references to previous capture groups, but even before that change, popping the current reference regex into https://regex101.com/ shows that it doesn’t work on many other engines either^[2], so that doesn’t seem to have been a relevant issue so far^[3].
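To make the proposed change concrete, here is a minimal sketch that runs both patterns from the diff above against a block whose closing line carries trailing spaces. The sample script text and the variable names are illustrative assumptions, not part of the PEP.

```python
import re

# Reference regex as quoted in the diff above: the closing line must be exactly "# ///".
ORIGINAL = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'
# Proposed variant: the closing line repeats the type via \1 and may end in spaces or tabs.
PROPOSED = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# /// \1[ \t]*$'

# Illustrative script; note the two trailing spaces on the closing line.
script = (
    '# /// pyproject\n'
    '# [run]\n'
    '# requires-python = ">=3.11"\n'
    '# /// pyproject  \n'
    'print("hello")\n'
)

original_match = re.search(ORIGINAL, script)
proposed_match = re.search(PROPOSED, script)

print(original_match)                     # no line is exactly "# ///", so no match
print(proposed_match.group('type'))       # the type captured from the opening line
print(proposed_match.group('content'))    # the commented lines between the markers
```

Under these assumptions the original pattern finds nothing, while the proposed one closes the block on the `# /// pyproject` line despite the stray whitespace.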

PS. A final regex-refinement would be to make all unnamed groups explicitly non-capturing (?:, which is compatible with all relevant engines), to avoid them showing up in the match result at all:

- REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# /// \1[ \t]*$'
+ REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(?:^#(?:| .+)$\s)+)^# /// \1[ \t]*$'

^[1] We don’t want to use \s here: with multi-line processing switched on, \s would also consume newlines and let the match run past the closing line.
^[2] For some languages you need to manually escape / as \/, as the r'' string would do in Python.
^[3] Also, engines without support for back-references (or named groups) can simply hard-code pyproject in the regex, rather than leaving it variable as the PEP does with ?P<type>.
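As a rough illustration of the PS above, the sketch below compares what Python's `re` reports for the capturing and non-capturing variants. The sample script is an assumption made up for the demonstration.

```python
import re

# Variant with plain (capturing) inner groups, as in the first diff.
CAPTURING = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# /// \1[ \t]*$'
# Variant with the inner groups made non-capturing via (?:...).
NON_CAPTURING = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(?:^#(?:| .+)$\s)+)^# /// \1[ \t]*$'

# Illustrative script using the proposed "# /// pyproject" closing line.
script = (
    '# /// pyproject\n'
    '# [run]\n'
    '# requires-python = ">=3.11"\n'
    '# /// pyproject\n'
)

capturing_groups = re.search(CAPTURING, script).groups()
non_capturing_groups = re.search(NON_CAPTURING, script).groups()

# The capturing variant also exposes the last repeated line and its tail.
print(len(capturing_groups), capturing_groups)
# The non-capturing variant exposes only the two named groups.
print(len(non_capturing_groups), non_capturing_groups)
```

Both variants match the same text; the only difference is how many groups show up in the result.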

DavidCEllis · October 26, 2023, 9:05am

Personally I’d concede that this specific case is probably fairly contrived and unlikely. If it did come up at least it would be fairly clear to work out why based on the rules.

I think changing the start/end markers is off the table at this point, but even if it wasn’t this just moves the issue to # /// pyproject having the same unexpected behaviour as # /// did before, except that it’s unclear if it’s intended to start or end a block. Also the original regex did work on some of the other engines if not all, while this change only works on the python engine. If anything I’d prefer the regex to be written to work on more engines rather than becoming more specified.

I would prefer to drop the closing marker entirely if the option was still open but as Brett stated earlier any changes here would need a new PEP:

ntessore · October 26, 2023, 9:39am

I’m not sure it’s a contrived example, since the [tool] table might contain arbitrary strings containing ///. ~~This breaks the current ducktools parser:~~ This seemingly reasonable config cannot currently be parsed with the implementation offered above: ^[1]

# /// pyproject
# [run]
# requires-python = ">=3.11"
# [tool.pep723lint]
# markers = """
# ///
# """
# ///

^[1] Edited to make it clear this is not a slight on the implementation.

DavidCEllis · October 26, 2023, 9:50am

I would disagree that that example ‘breaks the parser’. This example is invalid under the parsing rules as # /// is defined as closing the block and so it would have to be expressed in a different manner.

edit: But thanks for testing it. I’d also say that you’d express it as:

[tool.pep723lint]
markers = "# ///"

But in this specific case the PEP rules insist on # /// being the closing marker so this should never be necessary.
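A minimal sketch of the reading described above, in which the first `# ///` line after the opening marker closes the block. The `read_block_first_marker` helper is hypothetical and written only for this illustration; it is not the ducktools implementation or the PEP's reference code.

```python
from typing import Optional


def read_block_first_marker(script: str) -> Optional[str]:
    """Return the embedded text, closing the block at the FIRST '# ///' line."""
    lines = script.splitlines()
    try:
        start = lines.index('# /// pyproject')
    except ValueError:
        return None  # no opening marker
    collected = []
    for line in lines[start + 1:]:
        if line == '# ///':
            # Strip the "# " (or bare "#") prefix from each collected line.
            return '\n'.join(
                item[2:] if item.startswith('# ') else item[1:]
                for item in collected
            )
        if line != '#' and not line.startswith('# '):
            return None  # left the comment block without a closing marker
        collected.append(line)
    return None


# The multi-line example from the thread: the inner "# ///" closes the block early.
multiline_example = (
    '# /// pyproject\n'
    '# [run]\n'
    '# requires-python = ">=3.11"\n'
    '# [tool.pep723lint]\n'
    '# markers = """\n'
    '# ///\n'
    '# """\n'
    '# ///\n'
)
# The single-line rewrite suggested above: no line is exactly "# ///" until the end.
single_line_example = (
    '# /// pyproject\n'
    '# [run]\n'
    '# requires-python = ">=3.11"\n'
    '# [tool.pep723lint]\n'
    '# markers = "# ///"\n'
    '# ///\n'
)

truncated = read_block_first_marker(multiline_example)
rewritten = read_block_first_marker(single_line_example)
print(truncated)   # ends with an unterminated TOML string
print(rewritten)   # complete TOML text
```

With this first-marker rule the multi-line form yields truncated TOML, whereas the single-line form survives intact.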

ntessore · October 26, 2023, 10:16am

This example was only to illustrate the point that the tool table, which the PEP explicitly permits, might contain arbitrary multiline strings, and there may be good reasons for such strings to start with three slashes. Which to me looks like a point in favour of requiring the end of the PEP723 block to coincide with the end of the comment block.

DavidCEllis · October 26, 2023, 10:39am

I agree to an extent, but I believe the # /// block was chosen as it is not a commonly used block - tools that would require such a string would likely be PEP-723 aware or would be made PEP-723 aware and know that this would be invalid and have to work around it. (That said I agree that I would prefer it to end with the comment block as it both simplifies the implementation and also resolves some of the current ambiguities. However, my understanding is that the start/end block values are final at this point.)

pf_moore · October 26, 2023, 1:54pm

Speaking from my experience of trying this sort of metadata embedding for PEP 722, I think you’re looking at it from the wrong perspective. Expecting to be able to embed completely arbitrary data in a Python file is always going to fail, because that’s essentially what “arbitrary” means.

The sensible approach to take is to define a set of embedding rules that are very clear and precise, so understanding them and writing constructs that follow them is easy. Once you have that, you should know precisely what values can’t be expressed in an embedded block, and you can ensure that you have workarounds for those cases. And ideally those cases are uncommon enough that they have a minimal impact on normal use.

PEP 723 does this, with the exception that we’re still working on making sure the rules are sufficiently “clear and precise”

And as @DavidCEllis notes, the problem cases in TOML have easy workarounds, so no-one should ever need to write something that can’t be embedded. If we add other block types, we can make sure the same is true of them when we add them (“use TOML” is a nice simple answer that should suffice for anything I can imagine).

brettcannon · October 26, 2023, 9:41pm

I will also back this up and say that I am not concerned about the syntactic limitation that you can’t put the literal # /// in the middle of your TOML when it happens to align to the first column. If you want that, move to pyproject.toml.

ncoghlan · October 28, 2023, 12:02am

It looks like things have settled down to this anyway, I just wanted to clarify a detail of the PEP process: while PEP 1 documents functional changes to Accepted PEPs as “the Steering Council (or delegate) decides”, the historical precedent has been that if implementation reveals problems that weren’t fully considered prior to acceptance, then clarifications and fixes for those issues will typically be accepted. Competing PEPs just add the requirement that the proposed changes be reviewed to see if they alter the rationale for choosing the accepted PEP as the path forward.

That means the clarification process for the PEP 723 regex isn’t unique to this PEP, it aligns with past practice in similar situations: ambiguity was identified, resolution proposed, and the PEP delegate is reviewing it (and has been clear it doesn’t affect the chosen PEP).

(Note: this comment only covers the marker regex clarification, the conditional acceptance is a separate concern)

ofek · October 29, 2023, 4:27pm

I will be updating the PEP in a few minutes but I wanted to address two things:

That’s not exactly correct. Forward slashes have no significance in regular expressions but rather may be used as a delimiter in determining the start/end of patterns in for example the sed command and JavaScript. The regular expression is compatible with everything right now.

Nicolas Tessore:

I’m not sure it’s a contrived example, since the [tool] table might contain arbitrary strings containing ///. ~~This breaks the current ducktools parser:~~ This seemingly reasonable config cannot currently be parsed with the implementation offered above: [1]
# /// pyproject
# [run]
# requires-python = ">=3.11"
# [tool.pep723lint]
# markers = """
# ///
# """
# ///

The reference implementation does parse this properly because the regular expression is greedy and will keep going until the final end delimiter without the next line being a comment. I will update the text to describe this semantic.
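The greedy behaviour described here can be checked with a short sketch. The regex is the reference pattern quoted earlier in the thread; the surrounding script text and the prefix-stripping step are assumptions for illustration.

```python
import re

# Reference regex as quoted earlier in the thread.
REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'

# The example from the quoted post, followed by a blank line and ordinary code.
script = (
    '# /// pyproject\n'
    '# [run]\n'
    '# requires-python = ">=3.11"\n'
    '# [tool.pep723lint]\n'
    '# markers = """\n'
    '# ///\n'
    '# """\n'
    '# ///\n'
    '\n'
    'print("hello")\n'
)

match = re.search(REGEX, script)

# Strip the leading "# " (or bare "#") from every content line.
content = ''.join(
    line[2:] if line.startswith('# ') else line[1:]
    for line in match.group('content').splitlines(keepends=True)
)
print(content)
```

Because the content group is greedy, the inner `# ///` line ends up inside the extracted text and the block closes on the last `# ///` of the comment run.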

ofek · October 29, 2023, 5:48pm

The PR has been updated:

DavidCEllis · October 30, 2023, 10:45am

From the preview:

Precedence for an ending line # /// is given when the next line is not a comment or EOF is encountered.

With the regex this is also the case when the next line is a comment, but the second character is not a space or newline, for example the line ## will also force the block to close in the same way.
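A small sketch of this edge case, using the reference regex quoted earlier in the thread. The two sample scripts and the `block_content` helper are made up for the demonstration.

```python
import re

# Reference regex as quoted earlier in the thread.
REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .+)$\s)+)^# ///$'


def block_content(script):
    """Return the raw (still commented) content of the first block, if any."""
    match = re.search(REGEX, script)
    return match.group('content') if match else None


# The first "# ///" is followed by an ordinary "# ..." comment line.
with_plain_comment = (
    '# /// pyproject\n'
    '# a = 1\n'
    '# ///\n'
    '# separator\n'
    '# b = 2\n'
    '# ///\n'
)
# Same script, but the line after the first "# ///" starts with "##".
with_double_hash = with_plain_comment.replace('# separator', '## separator')

greedy_content = block_content(with_plain_comment)
early_content = block_content(with_double_hash)
print(repr(greedy_content))   # runs on to the last "# ///"
print(repr(early_content))    # stops at the first "# ///"
```

The `##` line cannot match the content pattern, so the regex has to close the block at the first `# ///` even though the following line is still a comment.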

brettcannon · November 1, 2023, 6:16pm

FYI, “PEP 723: Mark as Provisional” (Pull Request #3505 by ofek, python/peps) LGTM, so I plan to merge it on Friday unless any other comments are left on it (the PEP is still going to be provisional if we need to tweak it, but I would just rather avoid multiple PRs if possible).

pradyunsg · November 5, 2023, 7:42pm

I’m gonna poke on this question again, now that the dust around the regex semantics has settled & now that there’s a pipx PR for PEP 723 as well.

Are we OK with rolling out a conditionally-accepted/provisional PEP to users, before we have clarity that the condition^[1] is something we’re all OK with?

From experience gained through PEP 411 (Provisional packages in the Python standard library), we know that people will rely on this and changes to it will need to be treated as breaking changes…

[this PEP] has also not helped prevent people from relying too heavily on provisional modules, such that changes can still cause significant breakage in the community.

^[1] i.e. [run] support in pyproject.toml files, which will require clear semantics and all that specified for them.

pradyunsg · November 5, 2023, 7:57pm

I guess this is a related question to the one I’ve asked above… If we do move forward and implement this PEP in tooling and the future PEP for [run] in pyproject.toml goes nowhere, do we expect that we’ll rip out the [run] support from script runners and/or mark PEP 723 as rejected?

I think the two questions I’ve just poked (in this and the above post) are kinda important to have clear answers for, to avoid backing ourselves into an uncomfortable corner here in terms of user-facing functionality rollouts and to avoid erosion of end-user trust in standards-backed functionality which would be… suboptimal.

BrenBarn · November 5, 2023, 9:00pm

It’s something I’ve been wondering about as well, but I’m not sure there’s really a solution.

Whether any standard specifies something or not, tools can still implement anything they want. This is what we already see to varying degrees with conda and poetry: people are using these tools (in part) precisely because their authors decided to just go ahead and add functionality that there wasn’t a standard for.

And this isn’t necessarily an entirely bad thing. Several times in various packaging discussions, different people have mentioned that (in at least some cases) standards can be written to formalize existing practice after it’s already arisen in third-party tools.^[1]

In my view there has already been significant erosion in end-user trust in standards-backed functionality, simply because the standards-backed functionality is insufficient for what users want to do, and because it is perceived as a confusing jumble (in some cases due to layering on top of legacy mechanisms). As long as the tools provided with Python don’t meet users’ needs, third-party tools will develop behaviors that aren’t standards-backed. So I don’t think there’s much use in worrying about whether tools will get out in front of standards — because there’s no question that they have, and do, and will continue to do so.

The only thing I think that PyPA or the SC or other official Python bodies can realistically do is make official recommendations and warnings telling people to use or not use certain tools. So if Tool X is doing something outside a standard and we don’t think that’s a good idea, we can add something to some kind of “list of PyPA recommendations” that says “we don’t recommend this tool”. Or if Tool Y is doing a similar thing but in a way that is standards compliant, we can say “we recommend this tool because it follows the standards”. And I guess maybe there can be official pressure from PyPA or the SC in the form of telling a package author “Please don’t claim that your package implements this standard when it doesn’t” or “Please don’t claim that this is a standard when it isn’t”, although for the most part I don’t think tools are making any deceptive claims in that regard.^[2]

So with regard to this PEP, I wouldn’t see it as a huge disaster if Hatch were to just implement this functionality regardless of PEP approval. And if pipx implemented a different version (like PEP 722 instead), oh well. It wouldn’t be the first time tools diverged in behavior. It won’t stop us from deciding on a standard later. It will mean we have a potpourri of tools providing similar but not identical behavior, thus creating a confusing landscape for users to navigate, but that’s essentially the status quo anyway.

^[1] I do have some reservations about this, especially in that I think when formalizing things it’s often necessary to explicitly clean things up, trim off loose ends, and thus officially un-support certain things that were previously unofficially supported, and that sometimes upsets some people.
^[2] Okay, there is one other thing we could theoretically do, which is actually provide more powerful tools with Python so people don’t feel the need to reach for noncompliant third-party tools. But that seems out of reach for now. . .

pf_moore · November 5, 2023, 9:14pm

Personally, I’m very definitely not OK with rolling out PEP 723 before it’s finally accepted. @brettcannon has already said that if [run] support doesn’t get accepted into pyproject.toml then PEP 723 will be rejected as a consequence. The experience of tools implementing PEP 582 (Python local packages directory) before it was accepted is IMO good evidence that we shouldn’t implement PEPs when they are in this state.

Not all provisional acceptances fall into this category, though - for example, PEP 708 (Dependency confusion mitigation) is provisional until it is implemented in two indexes (one of which is PyPI) and in pip. That clearly must be implemented before it can be made final, but my point in making it provisional was that I didn’t want to see more accepted PEPs that don’t get implemented^[1].

^[1] One omission in that provisional acceptance is that I didn’t set up any sort of process for saying “this has gone on too long with no activity, so the PEP is going to be rejected”. I need to follow up on that.

pf_moore · November 6, 2023, 10:04am

One thought for @brettcannon, prompted because this reminded me about the situation with PEP 708 - do you have a view on how long it’s OK for PEP 723 to remain in provisional status? Particularly given that the “projects that are not intended to produce a wheel” discussion seems to be quite a long way from consensus^[1].

^[1] as well as drifting away from the idea of a [run] section

ofek · November 6, 2023, 3:07pm

I don’t think that’s such a large issue; rather, the issue is that no one has come forward to take ownership of that PEP and drive that effort, and unfortunately I don’t have the time to do so right now.

henryiii · November 6, 2023, 3:16pm

One note for pipx: it already implemented PEP 722; the PR is just changing that existing, not-yet-released feature to follow PEP 723.

Also, PEP 582 was never provisionally accepted, so I’m not sure that counts as a valid comparison.

Is someone working on the PEP for run? I think it could be extremely useful for tools like Poetry and PDM.

run.requires-python finally fixes the (IMO glaring) issue with making a lock file and using the metadata to decide the range to lock for. For example, you could set project.requires-python=">=3.8" and run.requires-python=">=3.8,<3.12"; that would set the metadata to >=3.8 and make the locking solve work for 3.8 through 3.11. Currently, you have to either add the cap to the metadata too (forcing this issue on all downstream users), or you have to cap your dependency that does this, forcing the next Python update to break your users. This gives a nice solution to this mess by allowing these two separate concepts (metadata and locking solver version ranges) to be separate values.
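As a rough sketch of the separation described above, the snippet below treats the metadata range and the locking range as two independent bounds. The [run] table is only a proposal, and the tuple-based bounds and the `in_range` helper are hypothetical stand-ins for real version specifiers.

```python
# Hypothetical bounds mirroring the example in the post; plain tuples stand in
# for real version specifiers.
METADATA_RANGE = ((3, 8), None)       # ">=3.8": lower bound only
LOCKING_RANGE = ((3, 8), (3, 12))     # ">=3.8,<3.12": exclusive upper bound


def in_range(version, bounds):
    """Check a (major, minor) tuple against (lower, exclusive_upper) bounds."""
    lower, upper = bounds
    return version >= lower and (upper is None or version < upper)


candidates = [(3, minor) for minor in range(7, 14)]   # 3.7 through 3.13

# Versions the published metadata would allow installing on.
installable_on = [v for v in candidates if in_range(v, METADATA_RANGE)]
# Versions a locker would have to solve for.
locked_for = [v for v in candidates if in_range(v, LOCKING_RANGE)]

print(installable_on)
print(locked_for)
```

The metadata stays uncapped for downstream users while the lock solve covers only the capped range.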

run.dependencies is what most tools call “dev-dependencies” now. Running pdm/poetry install could automatically install these, but they are not part of the package metadata. IMO, run.optional-dependencies would be useful too.

Finally, a lot of projects aren’t intended to ever be installed, but are only a poetry/PDM project. The current solution is to add a dummy name, dummy version, and add a Private :: Do Not Upload classifier. These tools could allow a user to skip project/build-backend tables and just use the run table, since name/version/etc. are only needed when making a redistributable package.

@ofek, would this be useful for hatch too?