Revisiting attribute docstrings

pawamoy · October 22, 2023, 11:51am

I’ll comment as a maintainer of a tool that uses both static and dynamic analysis to extract information from source code.

I like this proposal. Anything that reduces the need for docstring style parsing (Google-style, Numpydoc-style) is very welcome. I expanded on this in this other thread. That is also why I like PEP 727. This proposal is less extensible than PEP 727 (it won’t be able to handle all the information we can currently encode in docstring styles, such as documentation for raised exceptions, returned values, etc.), but it is still a good step forward. I wouldn’t mind having both solutions available.

One thing that could become ambiguous with this proposal is that attributes could now kinda have two docstrings:

class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""

a.__doc__  # Class docstring.
module.__docstrings__["a"]  # Attribute docstring.

This could be confusing, and standard utilities like inspect.getdoc would probably need to account for that, or we would need new utilities in the standard library.
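To make the ambiguity concrete, here is a minimal sketch of what such a new utility might look like. The helper name `get_attribute_doc` is hypothetical, and `__docstrings__` is the mapping proposed in this thread, which does not exist in Python today; a `SimpleNamespace` stands in for the module.

```python
import inspect
import types


def get_attribute_doc(owner, name):
    """Prefer the attribute docstring, fall back to the value's own docstring."""
    # `__docstrings__` is the proposed mapping, not an existing Python feature.
    docstrings = getattr(owner, "__docstrings__", {})
    if name in docstrings:
        return inspect.cleandoc(docstrings[name])
    return inspect.getdoc(getattr(owner, name))


class A:
    """Class docstring."""


# Stand-in for a module that would carry the proposed mapping.
module = types.SimpleNamespace(
    a=A(),
    b=A(),
    __docstrings__={"a": "Attribute docstring."},
)

doc_a = get_attribute_doc(module, "a")  # documented attribute
doc_b = get_attribute_doc(module, "b")  # falls back to the class docstring
```

Which of the two should win is exactly the kind of decision the standard library would have to make.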

Now about the suggested alternatives/changes to the proposal:

Docstrings above attributes. I’m against this for two reasons.
First, because this would create ambiguity with module and class docstrings. A docstring-attribute pair at the top of a module/class would be undecidable, given its AST. Python itself would have trouble with it (should it attach the docstring to module.__doc__ or into module.__docstrings__?). Unless we start to give meaning to spacing (blank line between docstring and attribute), which I feel is not a good idea: static analysis tools would need to build CSTs (Concrete Syntax Trees) instead of ASTs (abstract ones). That’s a huge requirement.
Second, because a lot of code is already using docstrings below attributes. That would create a lot of confusion if docstrings are suddenly expected to appear above the attribute assignment.
I understand that developers are used to writing comments before a section of code, but it also makes sense to me to write docstrings after the thing they document: “here’s an attribute, here’s what it’s used for”.
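A small sketch of the first point, using only the standard `ast` module: ignoring position attributes such as `lineno`, a leading string followed by an assignment yields the same tree with or without a blank line in between, and today that string is taken as the module docstring.

```python
import ast

with_blank = '"""Docstring."""\n\na = 1\n'
without_blank = '"""Docstring."""\na = 1\n'

tree_with = ast.parse(with_blank)
tree_without = ast.parse(without_blank)

# ast.dump omits line/column attributes by default,
# so this compares only the structure of the two trees.
same_structure = ast.dump(tree_with) == ast.dump(tree_without)

# In both cases the string is currently the module docstring.
module_doc = ast.get_docstring(tree_without)
```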
Using #: comments. I’m against this, because comments do not appear in ASTs. Unless there are plans to include comments in ASTs built with the standard ast module, this is a no-go to me. Picking up such comments requires tokenizing the code in addition to building an AST of it. Or, if we store the source in memory, we can reuse the line numbers of attribute assignments / parameter declarations (obtained through the AST) to look for comments on the line(s) above. I consider both solutions to be hacks, and would much prefer to have attribute/parameter docstrings be part of the AST. Also, comments in general do not appear at runtime, so these comments would become a special kind of comment with specific behavior (included in __docstrings__ at runtime), which IMO feels weird and inconsistent.
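For illustration, a short sketch of why a second pass is needed: the `#:` comment is absent from the tree built by `ast.parse`, and only shows up when the same source is run through `tokenize`.

```python
import ast
import io
import tokenize

source = "#: Doc comment for a.\na = 1\n"

# The AST only contains the assignment; the comment is gone.
tree = ast.parse(source)
node_types = [type(node).__name__ for node in tree.body]

# A separate tokenization pass is needed to recover the comment.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]
```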
Using ; as delimiter. I find this interesting. I can see one thing that might cause issues though, or at least unexpected things. ; can already be used to delimit statements when running code with python -c. Currently, in python -c 'a = ClassA(); """String."""', the string would be ignored (no-op). With this proposal, the string would now appear in the module __docstrings__, with potential additional effects (see my comment above about standard utilities like inspect.getdoc).
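A quick sketch of why this would happen, assuming the proposal keys on "a string expression statement right after an assignment" in the tree: the `;` form and the two-line form already parse to the same structure today (`object()` stands in for `ClassA()` to keep the snippet self-contained).

```python
import ast

one_line = 'a = object(); """String."""'
two_lines = 'a = object()\n"""String."""\n'

tree_one = ast.parse(one_line)
tree_two = ast.parse(two_lines)

# Both sources produce an assignment followed by a bare string expression.
statement_types = [type(node).__name__ for node in tree_one.body]

# Without position attributes, the two trees are indistinguishable.
same_tree = ast.dump(tree_one) == ast.dump(tree_two)
```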