Where should I add "significant comments" in the grammar?

Hi everyone,

I’d like to make some experimental modifications to the language (locally only; I’m not talking about opening a PEP or anything) by adding “comment directives”: a pound comment such as # __some_directive, placed at the end of a line or on a line of its own, that tells the interpreter to do something. For now I’m just trying to add something trivial called # checkpoint, which would cause the runtime to print a message when it executes that line.
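Before touching the C side, the intended runtime behavior can be prototyped in pure Python. The sketch below assumes a directive is any comment whose text is exactly # checkpoint; it finds those lines with the stdlib tokenize module and uses sys.settrace to print a message whenever one of them is about to execute. The helper names checkpoint_lines and run_with_checkpoints are hypothetical, and this is an emulation, not a change to the interpreter.

```python
import io
import sys
import tokenize


def checkpoint_lines(source: str) -> set[int]:
    """Return the line numbers that carry a '# checkpoint' comment."""
    return {
        tok.start[0]
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT and tok.string.strip() == "# checkpoint"
    }


def run_with_checkpoints(source: str, filename: str = "<checkpoint-demo>") -> list[int]:
    """Execute source, printing a message each time a checkpoint line runs.

    Returns the checkpoint line numbers in the order they were reached.
    """
    marked = checkpoint_lines(source)
    hits: list[int] = []

    def tracer(frame, event, arg):
        # Ignore frames that were not compiled from the demo source.
        if frame.f_code.co_filename != filename:
            return None
        # A "line" event fires just before a new source line executes.
        if event == "line" and frame.f_lineno in marked:
            hits.append(frame.f_lineno)
            print(f"checkpoint reached: line {frame.f_lineno}")
        return tracer

    code = compile(source, filename, "exec")
    sys.settrace(tracer)
    try:
        exec(code, {"__name__": "__checkpoint_demo__"})
    finally:
        sys.settrace(None)
    return hits


demo = """\
total = 0
for i in range(3):
    total += i  # checkpoint
print('Hello, world!')  # checkpoint
"""
print(run_with_checkpoints(demo))
```

A trace-function prototype like this is slow and only sees lines, not AST nodes, but it gives a reference behavior to compare the real interpreter change against.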

I was able to modify the tokenizer to emit a new CHECKPOINT_DIRECTIVE token, which now shows up in the token stream properly: for example, print('Hello, world!') # checkpoint is tokenized as NAME, LPAR, STRING, RPAR, CHECKPOINT_DIRECTIVE, NEWLINE. Then, just so that the parser would accept the new token at all, I modified the simple_stmts rule in the grammar to:

simple_stmts[asdl_stmt_seq*]:
    | a=simple_stmt !';' NEWLINE { (asdl_stmt_seq*)_PyPegen_singleton_seq(p, a) } # Not needed, there for speedup
    | a[asdl_stmt_seq*]=';'.simple_stmt+ [';'] [CHECKPOINT_DIRECTIVE] NEWLINE { a }
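The stock tokenize module reports such a comment as an ordinary COMMENT token. Assuming a directive is any comment whose text is exactly # checkpoint, a small Python sketch can reclassify it to mimic the modified C tokenizer and confirm that the resulting sequence lines up with the simple_stmts rule above (optional directive right before NEWLINE). CHECKPOINT_DIRECTIVE here is only a label string for illustration; no such token exists in the stdlib.

```python
import io
import tokenize
from token import tok_name

# Hypothetical token name mirroring the one added to the C tokenizer;
# the stdlib tokenize module has no such token and reports a COMMENT instead.
CHECKPOINT_DIRECTIVE = "CHECKPOINT_DIRECTIVE"


def token_names(source: str) -> list[str]:
    """Token names for source, with '# checkpoint' comments reclassified."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.strip() == "# checkpoint":
            names.append(CHECKPOINT_DIRECTIVE)
        else:
            # exact_type turns the generic OP type into LPAR, RPAR, SEMI, ...
            names.append(tok_name[tok.exact_type])
    return names


print(token_names("print('Hello, world!') # checkpoint\n"))
print(token_names("a = 1; b = 2  # checkpoint\n"))
```

The second call shows the case the [';'] [CHECKPOINT_DIRECTIVE] NEWLINE tail of the rule has to cover: several semicolon-separated statements sharing one directive.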

So now I can parse Python with my checkpoint directives in it (they do nothing for now). I’m moving on to the second phase: getting the directives into the AST. It seems I’ll need to add a “directive” field to the _stmt struct in pycore_ast.h (in practice to Parser/Python.asdl, since that header is generated from it), probably an enum of some type like directive_kind. What I’m struggling with is that I don’t see any “single point of entry” in the grammar where the core statement node is built. For example, look at the statement rule:

statement[asdl_stmt_seq*]:
    | a=compound_stmt { (asdl_stmt_seq*)_PyPegen_singleton_seq(p, a) }
    | a[asdl_stmt_seq*]=simple_stmts { a }

It returns an asdl_stmt_seq*, not a stmt_ty, and it represents either an entire compound statement or a series of semicolon-separated simple statements on one line. So it isn’t the right place to attach directives: the AST would record nothing about where precisely inside an if block a directive appeared.
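One alternative that avoids touching every rule relies on the fact that each stmt node already records its lineno: keep directives in a side table keyed by line number and join them to the AST afterwards. The Python-level sketch below assumes a directive belongs to the statement that starts on its line; the helper names are hypothetical. In this sketch a directive on a line of its own, or on an else: line, has no statement starting there and is simply not attached.

```python
import ast
import io
import tokenize


def directive_lines(source: str) -> set[int]:
    """Line numbers carrying a '# checkpoint' comment."""
    return {
        tok.start[0]
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT and tok.string.strip() == "# checkpoint"
    }


def attach_directives(source: str) -> dict[int, ast.stmt]:
    """Map each directive line to a statement node that starts on that line."""
    tree = ast.parse(source)
    wanted = directive_lines(source)
    found: dict[int, ast.stmt] = {}
    # ast.walk visits nodes breadth-first, so if several statements start on
    # one line, the last (most deeply nested) one visited overwrites the others.
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt) and node.lineno in wanted:
            found[node.lineno] = node
    return found


src = """\
if cond:  # checkpoint
    x = 1
    y = 2  # checkpoint
else:
    z = 3
"""
for line, node in sorted(attach_directives(src).items()):
    print(line, type(node).__name__)
```

The directive on the if header lands on the If node and the one inside the block lands on the y = 2 Assign, which is the positional precision the statement rule alone cannot provide.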

Let me explain my problem by walking through what I think I would have to do to allow a directive at the end of an if statement’s header, as in if condition: # checkpoint. I would have to modify the if_stmt rule to say:

if_stmt[stmt_ty]:
    | invalid_if_stmt
    | 'if' a=named_expression ':' d=[CHECKPOINT_DIRECTIVE] b=block c=elif_stmt {
        _PyAST_If(a, b, CHECK(asdl_stmt_seq*, _PyPegen_singleton_seq(p, c)), d, EXTRA) }
    | 'if' a=named_expression ':' d=[CHECKPOINT_DIRECTIVE] b=block c=[else_block] { _PyAST_If(a, b, c, d, EXTRA) }

I would also have to change _PyAST_If() to accept an optional directive argument. The problem is that I would need to do the same for while loops, for loops, function definitions, every type of simple statement, and so on, because those individual rules are the only places where the optional CHECKPOINT_DIRECTIVE can go. I would likewise need to modify every _PyAST_* constructor to accept the extra argument.
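CPython already handles a similar case, type comments, and it appears to use both designs. If I read Grammar/python.gram correctly, the per-node type_comment fields are produced the same way as the if_stmt change above, by rules carrying an optional item such as tc=[TYPE_COMMENT] (e.g. in the assignment and for_stmt rules), while # type: ignore comments are collected into a side table on the Module node. Both are observable from Python with ast.parse(..., type_comments=True); the snippet only inspects existing stdlib behavior.

```python
import ast

# Per-node precedent: with type_comments=True, nodes such as Assign, For and
# FunctionDef expose a type_comment field filled in by their grammar rules.
assign = ast.parse("x = []  # type: list[int]\n", type_comments=True).body[0]
print(assign.type_comment)

# Side-table precedent: '# type: ignore' comments are not stored on each
# statement; they are collected once on the Module node as TypeIgnore entries.
module = ast.parse("x = f()  # type: ignore\n", type_comments=True)
print([(ti.lineno, ti.tag) for ti in module.type_ignores])
```

The type: ignore route is the closer match for a directive that carries no payload beyond its position, since it needs one new Module-level field rather than an extra argument on every statement constructor.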

Could someone with more experience with the grammar please advise on the cleanest way to make this change?