Abstract:
Currently, the Python standard ast module discards all comments (#) during parsing. The only exceptions are type comments and # type: ignore markers, which are retained (the latter in Module.type_ignores) when ast.parse() is called with type_comments=True. This is a major limitation for code formatters, linters, and refactoring tools, forcing them to build heavy custom CST (Concrete Syntax Tree) parsers. We propose adding a comments attribute to ast.Module to store inline comments as lightweight metadata objects.
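The current behaviour can be checked with a short sketch using only the standard library. The source string is an arbitrary illustration; ast.unparse requires Python 3.9 or later.

```python
import ast

SOURCE = "x = 1  # the answer\ny = 2  # type: ignore\n"

# Default parse: comments leave no trace in the resulting tree.
tree = ast.parse(SOURCE)
roundtrip = ast.unparse(tree)

# With type_comments=True, only the `# type: ignore` marker is recorded,
# and only as a line number in Module.type_ignores.
typed_tree = ast.parse(SOURCE, type_comments=True)
ignored_lines = [ti.lineno for ti in typed_tree.type_ignores]

print(roundtrip)      # both comments are gone after the round trip
print(ignored_lines)  # line numbers carrying a type: ignore marker
```

Neither tree keeps the text of "# the answer", which is the gap the proposed comments attribute is meant to close.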
Specification:
We propose a new AST node specifically for inline comments. Triple-quoted docstrings (""") are string literals rather than comments and are already represented as ast.Constant nodes, so this new node targets only single-line comments starting with #.
To maximize memory efficiency and eliminate redundant allocation, the node omits end_lineno (a single-line comment always ends at the newline of the line it starts on) and records only its column boundaries.
The underlying C implementation would look like this:
typedef struct {
    const char *comment;   /* The literal string content of the comment */
    int lineno;            /* Line number where the comment resides */
    int col_offset;        /* Starting column byte offset */
    int end_col_offset;    /* Ending column byte offset */
} Comment;
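To make the shape of the data concrete, here is a rough Python-level approximation of what a populated comments list might contain, built with the existing tokenize module. The Comment dataclass and collect_comments helper are hypothetical names for illustration only, not part of the proposal or of any existing API, and keeping the leading # in the stored text is an assumption. Since tokenize reports character columns while ast offsets are UTF-8 byte offsets, the sketch converts between the two.

```python
import io
import tokenize
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Hypothetical Python-level mirror of the proposed node."""
    comment: str          # comment text, leading '#' kept (assumption)
    lineno: int           # 1-based line number
    col_offset: int       # UTF-8 byte offset of the '#'
    end_col_offset: int   # UTF-8 byte offset one past the last byte


def collect_comments(source: str) -> list[Comment]:
    """Approximate the proposed Module.comments using tokenize."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        (lineno, start_char), (_, end_char) = tok.start, tok.end
        # tokenize reports character columns; convert to byte offsets.
        start = len(tok.line[:start_char].encode("utf-8"))
        end = len(tok.line[:end_char].encode("utf-8"))
        found.append(Comment(tok.string, lineno, start, end))
    return found


comments = collect_comments("x = 1  # first\nname = 'é'  # second\n")
for c in comments:
    print(c)
```

On the second line the byte offset of the comment is one larger than its character column, because é occupies two bytes in UTF-8.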
Rationale:
- Memory Efficiency: Unlike other full-scope AST nodes, an inline comment always spans exactly one line. Storing end_lineno is completely redundant. Omitting it prevents unnecessary memory bloat when parsing large codebases with millions of comments.
- True Literal Nature: Comments are essentially non-mutating string constants. Storing the text as a constant pointer (const char *) fits seamlessly into CPython’s literal management.
- Exact Tooling Boundaries: By keeping col_offset and end_col_offset, modern IDEs and formatters can easily calculate the exact visual range of the comment for syntax highlighting, auto-formatting, and precise user selection.
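As an illustration of the tooling point, the sketch below shows how an editor might turn the two column fields into a visual range. It assumes the offsets are UTF-8 byte offsets, as for existing ast nodes. The helper comment_char_span and the values 13 and 21 are hypothetical; they stand in for what a node for this line would carry.

```python
def comment_char_span(line: str, col_offset: int, end_col_offset: int) -> tuple[int, int]:
    """Convert UTF-8 byte offsets into character columns for display."""
    raw = line.encode("utf-8")
    start = len(raw[:col_offset].decode("utf-8"))
    end = len(raw[:end_col_offset].decode("utf-8"))
    return start, end


line = "name = 'é'  # second"

# Hypothetical values a Comment node for this line would hold.
start, end = comment_char_span(line, col_offset=13, end_col_offset=21)

# Underline the comment the way an editor might highlight it.
underline = " " * start + "^" * (end - start)
print(line)
print(underline)
```

No line information beyond lineno is needed for this calculation, which is consistent with the decision to drop end_lineno.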