I have to quote the section below about BNF to make my point:
The descriptions of lexical analysis and syntax use a modified [Backus–Naur form (BNF)] grammar notation. This uses the following style of definition:
name ::= lc_letter (lc_letter | "_")*
lc_letter ::= "a"..."z"
Here, it says it uses a “modified” BNF grammar notation. But it never states exactly how far it departs from BNF, apart from the examples that follow.
Furthermore, when browsing the grammar notations around PLR, I encountered many that are not compatible with standard BNF/EBNF notation. Of course, you may understand them better with the verbal explanation that follows. But since there is a notation, why not get most of the points across through it?
I even found this thread, Railroad diagrams for Python Grammar, in which a core committer offered a fully BNF-compatible notation for “some” of the Python grammar. That notation can then be visualized as railroad diagrams.
You don’t know how much railroad diagrams could help people like me who have a reading disorder (dyslexia). I can grasp the meaning of a diagram in seconds, but reading a long sentence that describes the same notion would “kill” me. So this is not about BNF itself; it is about visualizing the grammar with existing tools.
BTW, I searched and could not find any existing visualization tool for PEG notation. So that would be a painful path to take.
So the only and final question is: Can we make all Python grammar notation in PLR as compatible with BNF as possible?
I have already shown that most of the “modified” notations can be converted to a BNF-compatible form. And using the Railroad Diagram Generator visualization tool I can draw some really lovely diagrams, e.g.:
I would love to see railroad diagrams in the reference! I’m willing to contribute work on this if I can be helpful in some way.
I have heard that the Python parser now uses a Parsing Expression Grammar. Because selecting alternatives in a PEG rule is order dependent, I don’t know whether a diagram generated from EBNF would capture the meaning accurately? It would be up to the tool used to generate the diagram, I guess.
I love this! Nearly 50 years ago I learned Pascal mostly from a few pages of railroad diagrams photocopied from the library’s Pascal book.
IIRC we gave up on the RR diagramming tool that was used for early Python versions because the grammar was somewhat contorted due to the limitations of the original LL(1) parser generator. I believe we had to put various hints as comments in the grammar to make the diagrams more readable. As the grammar evolved we failed to maintain those hints because the tool was not widely available.
It’s 30 years later now and I’m sure we can do better. The PEG grammar should actually be more suitable, as long as we can adopt a visual convention regarding the order dependency.
The diagram as displayed above is a little hard for me to see in dark mode because of the scant contrast between the thin brown lines and the darker brown background. Were the lines in the original (with which background) dark brown, to go with the yellow, rather than true black?
That aside, I don’t see any inherent problem with displaying PEG grammar. The main issue I see is that the alternatives must be displayed top down in their defined order rather than possibly reordered for esthetic reasons. (I don’t know if any particular RR software does this, but I can imagine it.)
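To illustrate why the displayed order of alternatives matters, here is a minimal sketch of PEG-style ordered choice. The helper names `ordered_choice` and `matches_whole` are made up for this example, and the alternatives are plain string literals rather than real grammar rules.

```python
def ordered_choice(alternatives, text, pos=0):
    """Try each literal in order; return the end position of the first match."""
    for literal in alternatives:
        if text.startswith(literal, pos):
            # PEG commits to the first alternative that succeeds
            # and never tries the later ones.
            return pos + len(literal)
    return None


def matches_whole(alternatives, text):
    """True if the first matching alternative consumes the entire text."""
    return ordered_choice(alternatives, text) == len(text)


# Same alternatives, different order, different result on the input "ab".
print(matches_whole(["a", "ab"], "ab"))  # False: "a" wins and leaves "b" unread
print(matches_whole(["ab", "a"], "ab"))  # True: "ab" is tried first
```

A diagram that reordered these two branches for looks would describe a different language, which is why a top-down, definition-order layout seems necessary for PEG.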
I agree that diagrams generated from PEG are the best solution.
But how long does it take to create a stable PEG diagram generator?
And if it takes a couple of months or even longer, would it still be worthwhile to “milk” the remaining value of the BNF-like notations in PLR? After all, there are existing, proven tools that generate RR diagrams from EBNF notation. Besides, all chapters in PLR except chapter 10 currently use BNF-like notation.
The aim of this discussion is to pick up some low hanging fruits to visualise syntax. My rough plan is:
1. Convert as many BNF-like notations as possible to standard EBNF, so that RR diagrams can be generated with said tool. The notations total about 1000 lines. Since people are interested, I can devote my spare time to this and expect to have a first draft in a week.
2. This is the hard part: someone from the documentation team needs to review the draft, and maybe the generated charts too. But I presume PLR is generated from md files, whose markdown syntax somehow masks the BNF syntax, e.g.
   Would that make the review less straightforward? Of course, I can provide a BNF diff, and also a PDF file of the charts if needed.
3. Automate the process: I can hook the RR diagram tool up to the PLR HTML documentation generator. After the first round of content generation, a collector script will search for syntax sections and feed them to the RR tool to create diagrams. Then, in a second scan, the doc generator will append the diagrams to the corresponding sections.
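The collector step could be sketched roughly as follows. This is only an illustration under assumptions that are not confirmed in this thread: it supposes the doc sources mark grammar snippets with Sphinx's `productionlist` directive, that continuation lines start with `:`, and that a blank or unindented line ends a block. The function name `collect_productions` is hypothetical.

```python
def collect_productions(rst_text):
    """Return (name, definition) pairs found in productionlist blocks."""
    rules = []
    in_block = False
    for line in rst_text.splitlines():
        if line.startswith(".. productionlist::"):
            in_block = True
            continue
        if not in_block:
            continue
        # Assumption: a blank or unindented line closes the block.
        if not line.strip() or not line[0].isspace():
            in_block = False
            continue
        # Drop the backticks used for cross-references (this would also
        # mangle a literal backtick, which the sketch ignores).
        body = line.strip().replace("`", "")
        if body.startswith(":") and rules:
            # Continuation line: append to the previous rule.
            name, definition = rules[-1]
            rules[-1] = (name, definition + " " + body[1:].strip())
        else:
            name, _, definition = body.partition(":")
            rules.append((name.strip(), definition.strip()))
    return rules


SAMPLE = '''\
Some prose.

.. productionlist:: notation
   name: `lc_letter` (`lc_letter` | "_")*
   lc_letter: "a"..."z"

More prose.
'''

for name, definition in collect_productions(SAMPLE):
    print(name, "::=", definition)
```

The collected pairs would then be handed to whatever converter and RR tool end up being used; that part is left out here.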
Is this plan meaningful as an intermediate step, before PEG replaces all the existing BNF-like notations?
The PLR Chapter 10 full grammar specification (Python 3.12.0 documentation) doesn’t cover all the grammar notations, e.g. those in the lexical_analysis chapter. On the other hand, would the BNF-like notations in chapters 1 to 9 cover all the syntax, including what chapter 10 re-notates?
I absolutely LOVE this idea, but am worried that the diagrams could get out of sync with the text, which would make them worse than useless. Can they be completely 100% automatically generated from the fundamental syntax files? They would be a huge help when I’m walking someone through something.
Yes, that is the plan, and it should be so. Ideally, all the notations in the docs would be extracted from the CPython source code instead of being edited manually. But right now BNF-like notation is used in every PLR chapter except the last one.
[quote=“Matt Seiler, post:3, topic:36709, username:flyinghyrax”]
I don’t know whether a diagram generated from EBNF would capture the meaning accurately? [/quote]
AFAIK, it is impossible. As I said in previous posts, using BNF to generate RR diagrams can only be an intermediate step. The ultimate target would be to use PEG for this, since the parser was rewritten to consume PEG rules in 2019.
It’s only worth anybody’s time if it uses the standard PEG grammar file somehow, and if the process of producing the RR diagrams can be automated as part of rendering the documentation (like we currently produce HTML using Sphinx etc.).
Note that we have (of course) existing tooling to read the PEG grammar, which you should be able to use to convert the PEG grammar to the input format accepted by the RR tool you plan to use.
(AFAIK there is no “standard” definition of “EBNF” – Wikipedia describes it as a “family of metasyntax notations”.)
Oh, I do owe this thread a clear definition of “standard” itself. The standard EBNF I was referring to in my previous post is W3C’s EBNF notation, which is used by Railroad Diagram Generator.
That tool is elegant but complicated. Wiring it up for this purpose will take a stranger like me some time. But I would like to give it a try, if the tool owner doesn’t have the bandwidth for this.
On the other hand, my initial attempt was to adjust the Python BNF notation to the W3C EBNF standard so that it could be rendered by said tool. This is the easiest path for my own personal use, but it might not be solid enough to add to the official docs.
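As a rough idea of what such an adjustment involves, here is a small sketch that rewrites two constructs of the docs' notation into W3C-style EBNF: optional phrases in square brackets become `( ... )?`, and character ranges such as `"a"..."z"` become `[a-z]`. It assumes straight quotes and a three-dot range, it does not claim to cover every difference between the two notations, and `to_w3c_ebnf` is a name invented for this example.

```python
import re

# A single-character range written as "a"..."z" in the docs' notation.
_RANGE = re.compile(r'"(.)"\.\.\."(.)"')
# Quoted literals, captured so that re.split keeps them in the result.
_LITERAL = re.compile(r'("[^"]*"|\'[^\']*\')')


def to_w3c_ebnf(definition):
    """Rewrite [optional] phrases and "a"..."z" ranges into W3C-style EBNF."""
    parts = _LITERAL.split(definition)
    # Even indexes are the text between literals; odd indexes are the
    # literals themselves, which are left untouched so "[" stays "[".
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace("[", "(").replace("]", ")?")
    # Ranges are converted last so their new brackets are not rewritten.
    return _RANGE.sub(r"[\1-\2]", "".join(parts))


print(to_w3c_ebnf('"a"..."z"'))                # [a-z]
print(to_w3c_ebnf('"if" expr ["else" expr]'))  # "if" expr ("else" expr)?
```

Rules that need no change, such as `lc_letter (lc_letter | "_")*`, pass through as they are.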
Regardless of how we convert the PEG grammar to EBNF rules (for input to the RR tool only), how easy would it be to automatically invoke that RR tool in our CI process? I don’t want anyone to spend time on a tool that must be manually run. It must be fully scriptable as part of a GitHub Action.
And now all notations from Chapter 1 to Chapter 9 are modified BNF. Only Chapter 10 is using modified PEG notation.
So will all chapters be unified to PEG in the near future? In that case, a patch to PeGen could cover them all.
Otherwise, if the modified BNF notations stay as they are, would we still need another tool to render diagrams from BNF notation?
I don’t think placing a diagram generated from PEG notation after its corresponding BNF notation in chapters 1-9 is a good idea, since the two vary quite widely.
Oh, I’m sorry. The reference manual uses its own EBNF but there is no guarantee that this truly corresponds to the grammar accepted by the parser – and when there are discrepancies the parser tends to win (but this has to be litigated on a per-case basis). If you extract the grammar from chapters 1-9 I have no idea if you’ll even end up with a full grammar, or if you do, whether it matches the PEG grammar in chapter 10.
I’m not sure what to recommend – maybe someone else who works on these docs has a specific opinion?
Yeah, I have the same impression after reading more about PEG: some rule needs to win, in an ordered way.
Here is the dilemma: as a non-hardcore programmer, I started learning the Python language by occasionally reading chapters 1 to 9. When I saw those EBNF notations, I went with my gut feeling: it could work this way…
As for chapter 10, I never opened it until you mentioned it recently.
So the intention of this discussion is to add diagrams after each EBNF notation section to ease the learning curve, especially for people like me who prefer diagrams to text.
I would say that even if the EBNF notations no longer describe the parser behaviour accurately, they are still quite useful for people learning how to use Python syntax to make things work elegantly.
Visualisation of the PEG notation rules in chapter 10 deserves a separate discussion.
Let’s do this. I feel nerd-sniped.
I know how to generate SVG within Sphinx, having done it for the devguide. I “just” need to find the time.
There are several caveats to work through, but if our grammar and Sphinx maintainers are half as enthusiastic as Guido, adding the diagrams looks quite possible.
This looks like a good opportunity to share insights, so I’ll try to live-stream my process tomorrow. This YouTube link has time in your timezone, and a recording should appear there afterwards.
DM me if you want to join a voice chat.
Note that the syntax snippets in the docs (lexical, expressions, statements) are themselves not generated from the fundamental syntax files. They’re substantially simplified, and even use a (very) slightly different notation.
AFAICS, the notation differences are:
There’s much work to be done simplifying the diagrams. The current docs remove invalid_ rules, lookaheads, cuts and forced tokens. Nowadays we can also remove TYPE_COMMENT.
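The removals listed above could be sketched like this. It is a simplification, not the docs' actual tooling: rules are assumed to be already parsed into a mapping from rule name to a list of alternative strings, tokens are split on whitespace (so a lookahead on a parenthesized group is not handled), and the sample rules are illustrative rather than copied from the real grammar file. All function names are hypothetical.

```python
def simplify_alternative(alt):
    """Return the simplified alternative, or None if it should be dropped."""
    kept = []
    for token in alt.split():
        if token.startswith("invalid_"):
            return None               # alternative exists only for error reporting
        if token == "~" or token == "[TYPE_COMMENT]":
            continue                  # cut, or optional type comment
        if token.startswith("&&"):
            kept.append(token[2:])    # forced token: keep the token itself
        elif token[0] in "&!":
            continue                  # lookahead on a single atom
        else:
            kept.append(token)
    return " ".join(kept)


def simplify_rules(rules):
    """Drop invalid_ rules and simplify the alternatives of the rest."""
    simplified = {}
    for name, alternatives in rules.items():
        if name.startswith("invalid_"):
            continue
        candidates = [simplify_alternative(alt) for alt in alternatives]
        simplified[name] = [alt for alt in candidates if alt is not None]
    return simplified


# Illustrative input only; not the real Python grammar.
SAMPLE = {
    "for_stmt": [
        "invalid_for_target",
        "'for' star_targets 'in' ~ star_expressions &&':' [TYPE_COMMENT] block",
    ],
    "invalid_for_target": ["'for' star_expressions"],
}

print(simplify_rules(SAMPLE))
```

Working on the parsed grammar from the existing PEG tooling would be more robust than this string-based approach.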
And then, there’s duplication to be removed – for a simple example: