Improving Python Language Reference (PLR) documentation

I decided that I’ll dedicate a bit of time each Wednesday to push this forward.
Last week I was on vacation, but I’ll be back at it tomorrow. There’ll be another stream on YouTube (click for the time in your timezone), and I’ll also stream to the Docs Discord (if you join, your voice will be on the public recording).

The plan is to start consolidating Python’s actual grammar and the docs.

5 Likes

Generating just diagrams from the ground truth wouldn’t make sense if the text of the docs doesn’t match. So I’ll need to generate the text too.

I spent most of yesterday’s session learning Sphinx/docutils. I can now write a bare-bones extension that generates production lists (the grammar snippets).
The next step will be generating them from the data :‍)
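For the curious, a bare-bones directive of that kind can be sketched in a few lines. This is only an illustration of the Sphinx/docutils machinery involved – the directive name, module path and behaviour here are made up, not the actual extension:

```python
# docs/tools/extensions/grammar_snippet.py -- hypothetical name/location.
# A minimal Sphinx directive that renders its body as a literal block,
# standing in for a real production-list generator.
from docutils import nodes
from docutils.parsers.rst import Directive


class GrammarSnippet(Directive):
    """Render the directive body verbatim; a real version would parse
    grammar data and emit linked nonterminals instead."""
    has_content = True

    def run(self):
        text = "\n".join(self.content)
        node = nodes.literal_block(text, text)
        node["language"] = "none"   # no Pygments highlighting for now
        return [node]


def setup(app):
    app.add_directive("grammar-snippet", GrammarSnippet)
    return {"version": "0.1", "parallel_read_safe": True}
```

Once registered via setup(), it can be used from reST as `.. grammar-snippet::` with an indented body.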

It looks like rule names in the grammar file are local implementation details; technically, they could be changed to the ones the docs use (or vice versa).

1 Like

Cool. Is there a reason the docs use ::= instead of =, like in Python.gram? The rest of the notation seems very similar.

Also, beware that Python.gram sometimes uses lookaheads (e.g. `&'return'`). Usually those are just optimizations, but a few of them are required to disambiguate things. This is an unfortunate side effect of using PEG instead of a context-free (LR(k)) grammar.

The ::= vs. : is a Sphinx thing. It has a dedicated directive type for these: it linkifies nonterminals but leaves special characters alone.
The source .rst file uses colons :)

There are lookaheads and negative lookaheads, but also:

  • Cuts (~): are these optimizations, or necessary for correctness?
  • Forced tokens (&&): these look unimportant since the && is removed for the docs. But I haven’t seen a description of them (and didn’t delve into the code). Should they be mentioned in Python.gram’s comment, or is that comment mainly for the docs?

Anyway, I’m thinking I’ll hide lookaheads &c. in the initial implementation. If someone wants the precise grammar they really should look at the whole file, not piece it together from examples scattered around in prose. And the diagramming library will need tweaks to support lookaheads.
(Of course, if I do this the docs should say the snippets are approximations.)

Okay, let Sphinx be Sphinx. (Also I misremembered what python.gram uses – it uses :, not = – sorry.)

The main problem with PEG is that with a true context-free grammar, if a particular input can match two different rules, that’s an ambiguity and this is generally considered a bug in the grammar (though most pragmatic tools also have out-of-band ways to disambiguate common cases). In PEG, however, there is no ambiguity: PEG by definition says that the first rule that matches is what the PEG grammar defines. (There are some subtleties around how “first match” is interpreted too.)
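To make the “first match wins” behaviour concrete, here’s a toy illustration (it has nothing to do with CPython’s actual parser): with ordered choice, swapping two alternatives changes what a rule matches, whereas in a CFG the same pair of alternatives would simply make the grammar ambiguous.

```python
def peg_choice(text, alternatives):
    """Toy ordered choice: return the first alternative that matches a prefix
    of *text*; later alternatives are never even considered."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None

# The shorter alternative "a" shadows "ab": the longer match is never tried.
assert peg_choice("ab", ["a", "ab"]) == "a"
# Reordering the alternatives changes the language the rule accepts.
assert peg_choice("ab", ["ab", "a"]) == "ab"
```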

I just looked over all 6 occurrences of “cut” (~) in python.gram, and none of them are involved in disambiguation – all of them occur in places where there is no other grammar rule (in current Python) that could match. Four of them lock in for ... in after the in keyword, two lock in alternate assignment operators – augmented assignment (+= etc.) and the walrus (:=).
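As a rough mental model of what a cut does (a toy sketch, not the real parser): before the ~, a failed match just means “this alternative doesn’t apply, try the next one”; after the ~, the parser is committed, so a failure is reported as a syntax error instead of backtracking.

```python
class ToySyntaxError(SyntaxError):
    """Raised when a match fails after a cut: no backtracking happens."""

def parse_for(tokens):
    """Toy version of  'for' NAME 'in' ~ NAME  -> (target, iterable) or None."""
    if tokens[:1] != ["for"]:
        return None                     # not this alternative; try the others
    if len(tokens) < 3 or tokens[2] != "in":
        return None                     # still before the cut: just no match
    # --- the cut (~) sits here: we are now committed to this alternative ---
    if len(tokens) < 4:
        raise ToySyntaxError("expected an expression after 'in'")
    return tokens[1], tokens[3]

assert parse_for(["for", "x", "in", "seq"]) == ("x", "seq")
assert parse_for(["while", "x"]) is None   # other alternatives still get a chance
# parse_for(["for", "x", "in"]) raises ToySyntaxError instead of backtracking.
```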

Regarding &&, it was introduced here as an optimization and can definitely be ignored.

I think I remember there are a few positive or negative lookaheads that meaningfully affect disambiguation. Maybe Pablo or Lysandros can help out here. Agreed that this isn’t very important for a first cut.

1 Like

Thanks Guido! Will keep that in mind.


The next stream will be 2023-11-15T13:00:00Z on YouTube and Discord.

1 Like

I have a conflicting event today, so no stream, but the current stage of the project – adding directives to ReST files – isn’t that exciting anyway.

Example of current state (top is the existing hand-written grammar, bottom is taken from python.gram*):

[screenshot comparing the two grammar listings]
It then needs formatting (e.g. [statements] rather than statements?), cross-linking to rules defined elsewhere, reorganization, and simplification. But it should make a good PR on its own, leaving diagrams as the next step.

*⁾ I’ve also tried renaming the file rule to file_input, which the current docs use – mostly to see what a change like that would break. Nothing broke, tests still pass.

1 Like

Things came up; I’ll start today’s stream an hour later than usual: 2023-11-29T16:00:00Z on Discord & YouTube

3 Likes

Rather than plan around various end-of-year gatherings, I’m putting this on hold until January. See you in the new year :‍)

Meanwhile, here’s some current thinking:

If grammar docs are generated, then any grammar change will need a docs review to ensure the snippets still match the surrounding prose.
How to ensure that?
So far it looks like the best solution would be to put generated snippets directly in the .rst files, à la Argument Clinic.

I don’t like files that mix hand-written and auto-generated content. It makes things messy, confuses tools that want to ignore autogenerated content or track changed files, often needs ugly “start/end generated section” markers, etc…
But, here it would make changes in the grammar docs show up in review diffs, with surrounding prose as context. That is, IMO, worth the downsides.
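For a concrete picture of what that could look like (the marker comments, file name and helper below are all invented for illustration – the actual branch may do this differently): a small script owns only the region between two markers, so regenerating the snippet produces an ordinary diff in the .rst file, right next to the prose that reviewers need to re-check.

```python
import re
from pathlib import Path

# Hypothetical reST comment markers; the real convention may differ.
START = ".. begin generated grammar: {name}"
END = ".. end generated grammar"

def replace_generated(rst_text: str, name: str, new_snippet: str) -> str:
    """Replace only the region between the start/end markers for *name*."""
    pattern = re.compile(
        re.escape(START.format(name=name)) + r"\n.*?" + re.escape(END),
        flags=re.DOTALL,
    )
    replacement = (
        START.format(name=name) + "\n\n" + new_snippet.rstrip() + "\n\n" + END
    )
    # A callable replacement keeps backslashes in the snippet literal.
    return pattern.sub(lambda _m: replacement, rst_text, count=1)

# Regenerate in place, so a grammar change shows up as a normal docs diff:
path = Path("Doc/reference/compound_stmts.rst")        # example file
snippet = "   for_stmt: 'for' star_targets 'in' ..."   # whatever the generator emits
path.write_text(replace_generated(path.read_text(), "for_stmt", snippet))
```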

4 Likes

PyCon 2024 update

  • @encukou has been doing the livestreams since January and I have enjoyed every single one (even the ones that are painfully early when I’m in California)
  • The Diagram Generator at bottlecaps seems unavailable today
  • There is a generator with a python-specific option at DrawGrammar (YMMV)

If you’re less interested in using the Python grammar spec and instead looking to render beautiful diagrams using pure Python, I found Syntrax — Syntrax 1.0 documentation, which uses a simple CLI for describing the diagrams.

2 Likes

If you are still interested, I would appreciate it, because much of the work that Petr does in Improving Python Language Reference (PLR) documentation - #22 by encukou is just beyond my comfort zone, so it is hard for me to take over leadership if something happens to him.

I forgot to mention, the work in progress is in GitHub - encukou/cpython at grammar-in-docs and it is a lot of work. Today felt like a master class in tokenization, along with some very clever dev tricks.

3 Likes

The rabbit hole runs deep, but we’re slowly progressing.

As of now, grammar listings in the main branch docs no longer use Sphinx’s productionlist directive, but a custom one. Actually there are two: a backwards-compatible one, which took over the productionlist directive name, and a new one that allows more flexible formatting: grammar-snippet (currently demoed in two very simple cases).

The visible changes are rather small:

  • string literals are syntax-highlighted
  • rule names use the : symbol rather than Sphinx’s ::=

See Assignment statements: before, after.

The new directive could make it upstream into Sphinx; the best way to do that is to put it through its paces in CPython first.

The next step is to go through the docs and adjust the listings and surrounding prose to better match what’s actually in the grammar file. If we can get close enough to something that can be auto-generated, we can then add a generator – and generate diagrams as well.

6 Likes

Cool!

How close is the (displayed) rule syntax now to the syntax used in the PEG grammar file? (Not that I recommend copying the rules from there; they are often more complicated than users need, for various reasons.)

1 Like

The new directive itself stays very close to its input – it links text in backticks as tokens (and discards the backticks); otherwise it only acts as a syntax highlighter.

To render the grammar reference correctly it will need support for comments.
(For the actual grammar file it would need to support annotations and actions. I don’t think we ever want to show those in docs.)

We’re free to make the displayed syntax as close to the grammar as we want. I think we want the alternative form for top-level alts, where | prefixes each alternative rather than separating them; we’ll also want Gather, and lookaheads for the cases where they’re not just optimizations.


We could theoretically even introduce new grammar syntax, if it would make things clearer for the reader. For example, there’s a bunch of rules of the form x ["," y] | y, like kwargs, where the duplicated y looks unnecessarily complicated when converted to a diagram:

[railroad diagram where y appears twice]

and would be more readable like this:

[railroad diagram where y appears only once]
The text grammar works around this by always splitting the duplicated part out into a named rule, which works rather well for text. But, like Gather making s.e+ short for (e (s e)*), perhaps there’s also some syntax for this that could simplify the text enough (to be worth the high cost of new syntax).
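For readers who haven’t run into Gather before, the expansion is purely mechanical – a throwaway helper (illustration only) spells it out:

```python
def expand_gather(sep: str, elem: str) -> str:
    """Expand the PEG shorthand  sep.elem+  into its plain equivalent."""
    return f"{elem} ({sep} {elem})*"

# ','.expression+  means "one or more expressions separated by commas":
print(expand_gather("','", "expression"))   # expression (',' expression)*
```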

2 Likes

Going in order of the language reference – bottom up – means there’s a bit of a detour first, with parts that aren’t strictly Grammar (although with f- and t-strings, they’re more intertwined than ever): tokens & tokenization.
As of now, proper docs for tokens are in (compare main to 3.12); the automation that generated the docs (with empty descriptions) has been updated to check that each token is documented. Docs for the trivial symbol tokens are still autogenerated.
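The check itself doesn’t need much: the stdlib token module already knows every token name, so it’s mostly a matter of confirming each name shows up in the reST source. A rough sketch (the file path and the exact rules are assumptions; the real script in the repo does more):

```python
import sys
import token
from pathlib import Path

def undocumented_tokens(rst_path="Doc/reference/lexical_analysis.rst"):
    """Return token names that never appear in the given docs file.

    Illustration only: the real tooling also generates stub entries
    for the trivial symbol tokens.
    """
    text = Path(rst_path).read_text(encoding="utf-8")
    skip = {"N_TOKENS", "NT_OFFSET"}   # bookkeeping values, not real tokens
    return sorted(name for name in token.tok_name.values()
                  if name not in skip and name not in text)

if __name__ == "__main__":
    missing = undocumented_tokens()
    if missing:
        sys.exit(f"Undocumented tokens: {', '.join(missing)}")
```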

Now that we can link to tokens with a clean conscience, it’s time to start on the lexical analysis docs. I think I found a style that works: each section will start with the most obvious things, expand toward a complete & precise description, with examples along the way, and end with the formal grammar. (Currently the grammar is usually listed first, and the author of the prose is tempted to assume that the reader has parsed and understood it.)

A PR is up for NAME tokens: https://github.com/python/cpython/pull/131474
We’re skipping STRING for now, and working on NUMBER (see the WIP branch).