Improving Python Language Reference (PLR) documentation

I decided that I’ll dedicate a bit of time each Wednesday to push this forward.
Last week I was on vacation, but I’ll be back at it tomorrow. There’ll be another stream on YouTube (click for time in your timezone), and I’ll also stream to the Docs discord (if you join, your voice will be on the public recording).

The plan is to start consolidating Python’s actual grammar and the docs.


Generating just diagrams from the ground truth wouldn’t make sense if the text of the docs doesn’t match. So I’ll need to generate the text too.

I spent most of yesterday’s session learning Sphinx/docutils. I can now write a bare-bones extension that generates production lists (the grammar snippets).
The next step will be generating them from the data :)
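To give a flavor of what that generation step could look like, here’s a hypothetical sketch (function name, data shape, and group name are all made up for illustration) of rendering rule data as the text of a reST `productionlist` directive:

```python
# Hypothetical sketch: render (name, definition) rule pairs as a
# reST "productionlist" directive. All names here are invented for
# the example, not taken from the actual extension.
def emit_productionlist(rules, group="python-grammar"):
    lines = [f".. productionlist:: {group}"]
    for name, definition in rules:
        # The directive uses one "name: definition" line per rule.
        lines.append(f"   {name}: {definition}")
    return "\n".join(lines)

snippet = emit_productionlist([
    ("file_input", "(NEWLINE | `statement`)* ENDMARKER"),
])
print(snippet)
```

A real extension would plug this into Sphinx’s directive machinery rather than emitting text, but the data-to-directive mapping is the same idea.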

It looks like rule names in the grammar file are local implementation details; technically they could be changed to the ones the docs use (or vice versa).


Cool. Is there a reason the docs use ::= instead of =, like in Python.gram? The rest of the notation seems very similar.

Also, beware that Python.gram sometimes uses lookaheads (e.g. `&'return'`). Usually those are just optimizations, but a few of them are required to disambiguate things. This is an unfortunate side effect of using PEG instead of a context-free (LR(k)) grammar.

The ::= vs. : is a Sphinx thing. It has a dedicated directive type for these: it linkifies nonterminals but leaves special characters alone.
The source .rst file uses colons :)
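For illustration, a source snippet might look like this (the group name and rule are made up for the example):

```rst
.. productionlist:: python-grammar
   file_input: (NEWLINE | `statement`)* ENDMARKER
```

In the rendered output the `:` after the rule name appears as `::=`, and the backquoted `statement` becomes a link to that rule’s definition.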

There are lookaheads and negative lookaheads, but also:
Cuts (~): are these optimizations, or necessary for correctness?
Forced tokens (&&): These look unimportant since the && is removed for the docs. But I haven’t seen a description of these (and didn’t delve into the code). Should it be mentioned in Python.gram’s comment, or is that comment mainly for the docs?

Anyway, I’m thinking I’ll hide lookaheads &c. in the initial implementation. If someone wants the precise grammar they really should look at the whole file, not piece it together from examples scattered around in prose. And the diagramming library will need tweaks to support lookaheads.
(Of course, if I do this the docs should say the snippets are approximations.)

Okay, let Sphinx be Sphinx. (Also I misremembered what python.gram uses – it uses :, not = – sorry.)

The main problem with PEG is that with a true context-free grammar, if a particular input can match two different rules, that’s an ambiguity and this is generally considered a bug in the grammar (though most pragmatic tools also have out-of-band ways to disambiguate common cases). In PEG, however, there is no ambiguity: PEG by definition says that the first rule that matches is what the PEG grammar defines. (There are some subtleties around how “first match” is interpreted too.)
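A toy illustration of that “first match wins” rule, with regexes standing in for real PEG alternatives (names invented for the example):

```python
import re

# Toy model of PEG ordered choice: alternatives are tried in order
# and the first one that matches wins. Two overlapping rules are
# therefore not an "ambiguity" -- the grammar simply defines the
# first match as the answer.
def ordered_choice(alternatives, text):
    for pattern in alternatives:
        m = re.match(pattern, text)
        if m:
            return m.group(0)
    return None

# Both alternatives can match "ab"; a CFG tool would flag the
# overlap, but ordered choice just commits to the first alternative.
print(ordered_choice([r"a", r"ab"], "ab"))  # -> prints "a", not "ab"
```

(This also hints at why alternative order matters so much when writing or reading a PEG grammar.)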

I just looked over all 6 occurrences of “cut” (~) in python.gram, and none of them are involved in disambiguation – all of them occur in places where there is no other grammar rule (in current Python) that could match. Four of them lock in for ... in after the in keyword, two lock in alternate assignment operators – augmented assignment (+= etc.) and the walrus (:=).
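For reference, the for-statement alternative reads roughly like this (simplified from python.gram; actions and error-handling alternatives omitted). The `~` means that once the parser has seen `for ... in`, it commits to this alternative and reports a syntax error rather than backtracking:

```
for_stmt:
    | 'for' star_targets 'in' ~ star_expressions ':' block [else_block]
```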

Regarding &&, it was introduced here as an optimization and can definitely be ignored.

I think I remember there are a few positive or negative lookaheads that meaningfully affect disambiguation. Maybe Pablo or Lysandros can help out here. Agreed that this isn’t very important for a first cut.


Thanks Guido! Will keep that in mind.


The next stream will be 2023-11-15T13:00:00Z on YouTube and Discord.


I have a conflicting event today, so no stream, but the current stage of the project – adding directives to ReST files – isn’t that exciting anyway.

Example of current state (top is the existing hand-written grammar, bottom is taken from python.gram*):


It then needs formatting (e.g. `[statements]` rather than `statements?`), cross-linking to rules defined elsewhere, reorganization, and simplification. But it should make a good PR on its own, leaving diagrams as the next step.
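As a hypothetical example of one such formatting step, rewriting PEG-style optionals into the square-bracket notation the docs use could start as simply as this (toy code, not the actual tooling):

```python
import re

# Hypothetical formatting pass: rewrite PEG optionals ("x?") into
# the square-bracket notation used in the docs ("[x]").
# This toy version only handles bare rule names, not parenthesized
# groups or quoted tokens.
def bracket_optionals(production: str) -> str:
    return re.sub(r"(\w+)\?", r"[\1]", production)

print(bracket_optionals("statements? ENDMARKER"))
# -> "[statements] ENDMARKER"
```

The real pass would need to handle groups, repetition operators, and quoted literals too.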

*) I’ve also tried renaming the file rule to file_input, which the current docs use – mostly to see what a change like that would break. Nothing broke; tests still pass.


Things came up; I’ll start today’s stream an hour later than usual: 2023-11-29T16:00:00Z on Discord & YouTube


Rather than plan around various end-of-year gatherings, I’m putting this on hold until January. See you in the new year :)

Meanwhile, here’s some current thinking:

If grammar docs are generated, then any grammar change will need a docs review to ensure the snippets still match the surrounding prose.
How to ensure that?
So far it looks like the best solution would be to put generated snippets directly in the .rst files, à la Argument Clinic.

I don’t like files that mix hand-written and auto-generated content. It makes things messy, confuses tools that want to ignore autogenerated content or track changed files, often needs ugly “start/end generated section” markers, and so on.
But, here it would make changes in the grammar docs show up in review diffs, with surrounding prose as context. That is, IMO, worth the downsides.
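A minimal sketch of what that regeneration could look like (marker text and function name are invented here; Argument Clinic uses a similar checked-in generated-section pattern in C files):

```python
import re

# Sketch of the "generated content checked into the .rst file"
# approach: a regeneration script replaces everything between a
# pair of marker comments, so grammar changes show up as ordinary
# diffs in review, with the surrounding prose as context.
START = ".. generated-grammar-start"
END = ".. generated-grammar-end"

def replace_generated(rst: str, new_snippet: str) -> str:
    # DOTALL lets ".*?" span multiple lines between the markers.
    pattern = re.compile(
        re.escape(START) + r"\n.*?" + re.escape(END),
        flags=re.DOTALL,
    )
    return pattern.sub(f"{START}\n{new_snippet}\n{END}", rst)

doc = f"Prose before.\n\n{START}\nold snippet\n{END}\n\nProse after."
print(replace_generated(doc, "new snippet"))
```

A CI check could then regenerate and fail if the committed file differs, which is what forces the docs review on grammar changes.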
