Revisiting attribute docstrings

EpicWink · October 17, 2023, 2:50am

PEP 224 (Attribute Docstrings) proposed a syntax for class attribute docstrings:

class A:
    b = 42
    """Some documentation."""

    c = None

This was rejected because of ambiguity for readers about which attribute the docstring referred to.

With the prevalence of Sphinx, it is now understood that the docstring refers to the immediate prior symbol (see the docs).
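As a rough illustration of how a documentation tool could recover such attribute docstrings today, here is a minimal sketch built on the standard `ast` module. The helper name `attribute_docstrings` is hypothetical, and this is not a description of how Sphinx itself is implemented; it only pairs each assignment in a class body with a string literal that immediately follows it.

```python
import ast

SOURCE = '''
class A:
    b = 42
    """Some documentation."""

    c = None
'''


def attribute_docstrings(class_node: ast.ClassDef) -> dict[str, str]:
    """Map attribute names to the string literal that directly follows them."""
    docs = {}
    body = class_node.body
    # Walk consecutive statement pairs: (assignment, possible docstring).
    for stmt, nxt in zip(body, body[1:]):
        if isinstance(stmt, ast.Assign):
            targets = stmt.targets
        elif isinstance(stmt, ast.AnnAssign):
            targets = [stmt.target]
        else:
            continue
        is_doc = (
            isinstance(nxt, ast.Expr)
            and isinstance(nxt.value, ast.Constant)
            and isinstance(nxt.value.value, str)
        )
        if is_doc:
            for target in targets:
                if isinstance(target, ast.Name):
                    docs[target.id] = nxt.value.value
    return docs


class_node = ast.parse(SOURCE).body[0]
docs = attribute_docstrings(class_node)
print(docs)  # {'b': 'Some documentation.'}
```

Here `c` has no entry because no string literal follows it.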

Some in the community don’t like the approach introduced in PEP 727 (Documentation in Annotated Metadata), where a symbol’s documentation is a field in its Annotated annotation, and wish to introduce more Pythonic syntax to address the problems raised in that PEP. Below is a proposal which does just that.

I propose an extended form of PEP 224 to document symbols: module attributes, class attributes, and function parameters.

module_attribute = "spam"
"""A string."""

class AClass:
    class_attribute = 42
    """An integer."""

def foo(
    bar,
    """A required parameter."""

    baz: int = None,
    """An optional parameter."""
):
    ...

A docstring will always document the immediately-prior symbol at the same indentation level. Note how the comma , following a parameter definition must go before the docstring to prevent string-concatenation with string-typed parameter defaults.

The docstring’s value will be stored in a new attribute __docstrings__, defined only on usage of this proposal (injected into parent’s __dict__ after the parent is defined and processed). For module and class attributes, __docstrings__ is set on the module and class respectively. For function parameters, __docstrings__ is set on the function. [1]
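To make the storage concrete, the following sketch emulates by hand what the proposal might produce for a module and a class. Nothing here exists in Python today: the `__docstrings__` values are assigned manually, and the `lookup` helper is a hypothetical convenience, not part of the proposal.

```python
import types

# Hypothetical emulation: under the proposal the interpreter would fill
# these mappings in; here they are assigned by hand.
module = types.ModuleType("example")
module.module_attribute = "spam"
module.__docstrings__ = {"module_attribute": "A string."}


class AClass:
    class_attribute = 42


AClass.__docstrings__ = {"class_attribute": "An integer."}


def lookup(owner, name):
    """Return the docstring recorded for `name` on `owner`, or None."""
    return getattr(owner, "__docstrings__", {}).get(name)


print(lookup(module, "module_attribute"))
print(lookup(AClass, "class_attribute"))
```

Because the attribute is defined only when the syntax is used, a consumer would presumably need the `getattr` fallback shown in `lookup`.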

I’ve found one usage on GitHub that already uses the name __docstrings__.

inspect.Parameter would gain a new instance attribute docstring which has the parameter’s corresponding docstring value. A search shows some potential conflict with existing code.
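A sketch of how a consumer might pair parameters with their docstrings follows. It assumes a `__docstrings__` mapping on the function (set by hand here, since the syntax does not exist) and uses a hypothetical helper, `parameter_docstrings`, in place of the proposed `inspect.Parameter.docstring` attribute.

```python
import inspect


def foo(bar, baz: int = None):
    ...


# Hypothetical: under the proposal this mapping would be populated
# automatically from the parameter docstrings.
foo.__docstrings__ = {
    "bar": "A required parameter.",
    "baz": "An optional parameter.",
}


def parameter_docstrings(func):
    """Pair each parameter name with its docstring, or None if undocumented."""
    docs = getattr(func, "__docstrings__", {})
    return {name: docs.get(name) for name in inspect.signature(func).parameters}


param_docs = parameter_docstrings(foo)
print(param_docs["baz"])
```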

[1] Perhaps type, ModuleType and FunctionType could learn a __docstrings__ getter-property; this is an implementation detail.

NeilGirdhar · October 17, 2023, 4:47am

I’d definitely use this if it were accepted since I prefer having docstrings closer to the variable, and I dislike repeating parameter names.

A few questions:

Is there any way to put the docstring on the same line as the variable or parameter? That ability alone makes comments a close competitor.
Are you making any recommendation about where the comma should go with parameter docstrings (your example shows two places)?
If we have programmatic access to parameter docstrings, will they be available in inspect.Parameter?

csm10495 · October 17, 2023, 5:14am

I personally can’t stand when the docstring is below a non-function-like variable. Especially a lot of the time these would be one line comments. I’d say either do them on the same line or above.

EpicWink · October 17, 2023, 5:21am

I can’t think of a good solution which looks legible, so I’d prefer to stick to the existing forms with class and module attributes.

I don’t mind what the final solution is, but the proposal in my original post explicitly allows for either

Yes, I’ll update the original post to make a comment on Parameter

The docstring is parsed by Sphinx (and other tools, such as the PyCharm IDE) and used as the documentation for that symbol, which a comment currently can’t do (nor do I think it should). This proposal would additionally make that documentation available at runtime.

csm10495 · October 17, 2023, 5:25am

I understand but I personally prefer the other way: have the docstring above the definition.

Even if they can be parsed by tooling, I have to look at it, and don’t like the way docstring below variable looks.

I don’t think I’ve ever seen docstring below variable besides class/function declaration.

If we decide to formalize a format, I’d prefer it be above or on the same line.

Edit:

I think I’ve seen tools parse

V = 'hello' #: this is the docstring for V

as a way of doing a one-liner.

Similarly also for two lines:

#: this is the docstring for V
V = 'hello'

EpicWink · October 17, 2023, 5:30am

Charles Machalow:

If we decide to formalize a format, I’d prefer it be above or on the same line.

Edit:

I think I’ve seen tools parse
V = 'hello' #: this is the docstring for V
as a way of doing a one-liner.

Similarly also for two lines:
#: this is the docstring for V
V = 'hello'

You’re probably referring to Sphinx’s autoattribute, which also supports docstrings after the attribute
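For comparison, the `#:` comment forms quoted above can be collected with the standard `tokenize` module. This is a deliberately limited sketch, not Sphinx's implementation: the helper name `doc_comments` is hypothetical, it only handles simple single-line assignments, and it keeps only the last line of a multi-line `#:` block.

```python
import io
import tokenize

SOURCE = '''\
#: this is the docstring for V
V = 'hello'
W = 'world'  #: this is the docstring for W
'''


def doc_comments(source: str) -> dict[str, str]:
    """Map simple assigned names to the text of their `#:` comment."""
    docs = {}
    pending = None    # text of a `#:` comment on its own line, awaiting a name
    line_name = None  # first NAME token seen on the current logical line
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith("#:"):
            text = tok.string[2:].strip()
            if line_name is not None:
                docs[line_name] = text  # trailing comment on the same line
            else:
                pending = text          # comment above the next statement
        elif tok.type == tokenize.NAME and line_name is None:
            line_name = tok.string
            if pending is not None:
                docs[line_name] = pending
                pending = None
        elif tok.type == tokenize.NEWLINE:
            line_name = None            # logical line ended
    return docs


comment_docs = doc_comments(SOURCE)
print(comment_docs)
```

Note that the comment text is only visible to tools that read the source; nothing is attached to `V` or `W` at runtime.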

BrenBarn · October 17, 2023, 6:23am

I’d consider that a showstopper. The comma is an explicit separator. Having a docstring after a comma grouped with a parameter before the comma seems red-alert ultra-confusing to me.

barry · October 18, 2023, 12:22am

Same, for parity with where I would normally write a comment. I’ve never seen a comment below a line refer to the line above.

NeilGirdhar · October 18, 2023, 2:03am

Would it be worth contrasting this PEP’s proposed notation with the #: notation in the PEP?

fonini · October 18, 2023, 2:27am

Placing the comma after the docstring introduces a syntax ambiguity:

def foo(
    param: str = "some default value"
    """Some documentation""",
):
    ...

Is that a string concatenation? Or is the second one a docstring? With today’s syntax, it’s a string concatenation, so if it’s now a docstring then that’s a backwards-incompatible syntax change.

On the other hand, I would agree with @BrenBarn that separating the parameter from its docstring with a comma would be egregious.
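The concatenation behaviour described above can be checked in current Python. The snippet below is valid today, and the two adjacent string literals are joined into a single default value:

```python
def foo(
    param: str = "some default value"
    """Some documentation""",
):
    return param


# Adjacent string literals are concatenated at compile time,
# so the "docstring" becomes part of the default.
default_value = foo()
print(default_value)  # some default valueSome documentation
```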

EpicWink · October 18, 2023, 8:29am

No PEP yet, but I likely will if and when I make one. The original post is simply motivation and syntax.

From GitHub searches:

symbol documentation using special comments #: is in 33 900 files (search)
symbol documentation using multi-line strings """ is in 68 200 files (search) (although this has an indeterminate number of false-positives)

I’m personally not a fan of the #: syntax as I consider all parts of the code starting with hash # to be stripped from the runtime (but potentially used by some tools: the most popular ones I can think of are black and coverage). I also prefer the consistency of having docstrings after declarations (ie the case right now with functions, classes and modules).

Pedro Fonini:

Placing the comma after the docstring introduces a syntax ambiguity:
def foo(
    param: str = "some default value"
    """Some documentation""",
):
    ...
Is that a string concatenation? Or is the second one a docstring? With today’s syntax, it’s a string concatenation, so if it’s now a docstring then that’s a backwards-incompatible syntax change.

That’s a problem; I don’t think we should change string-concatenation semantics. I’ll update the original post to remove that option.

Perhaps there’s opportunity to add a delimiter, eg semicolon ;.

tmk · October 18, 2023, 8:46am

Would it be stored as a dictionary with the attribute/parameter name as key and the doc string as value? Or what did you have in mind?

NeilGirdhar · October 18, 2023, 11:30am

A delimiter would be ideal since it would allow you to put the docstring on the same line for parameters and variables. E.g.,

bias: float = 4;  "The bias of the model"
weights: list[float]
"""The weights of model.
Initialized to zero.
"""

and similarly for parameters (with a comma after the docstring). Is that what you have in mind?

So, essentially a parameter x has an optional type annotation indicated by :, an optional default indicated by =, and an optional docstring indicated by (perhaps) ;?
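It may help to see how the semicolon form parses today. In current Python the line is already valid: the semicolon separates two statements, and the string is an expression statement whose value is discarded. A small check using the standard `ast` module:

```python
import ast

tree = ast.parse('bias: float = 4;  "The bias of the model"')

# Two separate statements: an annotated assignment, then a bare expression.
kinds = [type(stmt).__name__ for stmt in tree.body]
print(kinds)  # ['AnnAssign', 'Expr']
```

So a semicolon-delimited docstring would not need new grammar at module or class level, only a new meaning for the trailing string; inside a parameter list, by contrast, a semicolon is currently a syntax error.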

davidism · October 18, 2023, 2:30pm

In Flask and my other projects, I’m moving away from #: above to """ below attributes to document them.

#: has the problem of messing with indentation levels, since it’s 3 characters before you start typing, so if you need to indent further for some reason, you have to manually add an extra few spaces after pressing tab to get things properly indented. #: also isn’t handled by IDEs well, so typing that before every line of a multiline doc is tedious.

""" avoids those two issues, and also matches what documentation looks like and where it’s found for classes and functions already. It’s also easier to modify lines and reflow text later.

I guess this sort of defeats the purpose of this discussion as opposed to the docs-in-annotations PEP, but I’d personally leave out parameters from a proposal. Aside from the ambiguity it can create, I still think parameter documentation looks better in the class/function docstring rather than next to each parameter.

barry · October 18, 2023, 3:27pm

Clearly we need TOCs - triple octothorp comments.

EpicWink · October 18, 2023, 9:23pm

Yes, specifically a mapping from strings to strings (don’t need to require all dict methods).

Neil Girdhar:

EpicWink:

Perhaps there’s opportunity to add a delimiter, eg semicolon ;.

A delimiter would be ideal since it would allow you to put the docstring on the same line for parameters and variables. E.g.,
bias: float = 4;  "The bias of the model"
weights: list[float]
"""The weights of model.
Initialized to zero.
"""
and similarly for parameters (with a comma after the docstring). Is that what you have in mind?

That’s a good option, and was something I was thinking about. I’d imagine many aren’t comfortable with slightly altering the definition of the semicolon ; though.

Parameter documentation is my entire motivation: the module and class attribute was just a bonus.

I agree that parameter docs look better in the docstring, but there are situations where the docs are better suited near the parameter and unambiguous, for example when needed at runtime.

davidism · October 18, 2023, 9:57pm

My main point was a preference towards """ over #: style (or both, as Sphinx does now). If you’re confident about parameter docstrings as well, that’s fine.

pawamoy · October 20, 2023, 1:55pm

Having docstrings above attributes creates an ambiguity with module docstrings:

# ambiguous.py
"""Am I the module docstring, or the docstring of `hello`?"""

hello = "hello"
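Under current semantics the answer is unambiguous to the interpreter: a string literal that is the first statement of a module is the module docstring. A quick check with the standard `ast` module:

```python
import ast

SOURCE = '''# ambiguous.py
"""Am I the module docstring, or the docstring of `hello`?"""

hello = "hello"
'''

# ast.get_docstring returns the leading string literal of the module, if any.
module_doc = ast.get_docstring(ast.parse(SOURCE))
print(module_doc)
```

A docstring-above convention would therefore have to either change this rule or special-case the first statement of a module.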

apalala · October 20, 2023, 7:39pm

We could compromise on syntax that goes in the same line as the attribute. It would also work for function arguments. I didn’t know about #:, and I like it, except that it could interfere with comments aimed at guiding commonly-used linters.

csm10495 · October 20, 2023, 11:06pm

Maybe ambiguous to a computer, but to a human the extra newline tells us.