Revisiting attribute docstrings

tmk · October 22, 2023, 10:45am

I think the definition would only need to be altered within function signatures, right? Because I would think the following two (not within a function) are already identical in meaning? (Not completely sure about this though.)

PI: float = 3.14
"""The mathematical constant pi."""

vs

PI: float = 3.14; """The mathematical constant pi."""
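
One way to check whether the two spellings really are identical is to compare their ASTs. This is a minimal sketch using the standard ast module; the variable names are only for illustration.

```python
import ast

two_lines = 'PI: float = 3.14\n"""The mathematical constant pi."""\n'
one_line = 'PI: float = 3.14; """The mathematical constant pi."""\n'

# ast.dump() leaves out line/column attributes by default,
# so this compares only the structure of the two trees.
tree_a = ast.dump(ast.parse(two_lines))
tree_b = ast.dump(ast.parse(one_line))

same_structure = tree_a == tree_b
print(same_structure)
```

Both sources parse to an annotated assignment followed by a bare string expression, so at module or class level the semicolon form would not need any new grammar.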

And I think it does look clearer with the semicolons:

def foo(
    bar; """A required parameter.""",
    baz: int = None; """An optional parameter.""",
):
    ...

pawamoy · October 22, 2023, 11:51am

I’ll comment as a maintainer of a tool that uses both static and dynamic analysis to extract information from source code.

I like this proposal. Anything that reduces the need for docstring style parsing (Google-style, Numpydoc-style) is very welcome. I expanded on this in this other thread. That is also why I like PEP 727. This proposal is less extensible than PEP 727 (it won’t be able to handle all the information we can currently encode in docstring styles, such as documentation for raised exceptions, returned values, etc.), but it is still a good step forward. I wouldn’t mind having both solutions available.

One thing that could become ambiguous with this proposal, is that attributes could now kinda have two docstrings:

class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""

a.__doc__  # Class docstring.
module.__docstrings__["a"]  # Attribute docstring.

This could be confusing, and standard utilities like inspect.getdoc would probably need to account for that, or we would need new utilities in the standard library.
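
To make the "two docstrings" situation concrete, here is a rough sketch of how a static tool might collect attribute docstrings today, next to what inspect.getdoc reports for the same name. The helper attribute_docstrings is hypothetical, and __docstrings__ itself is only proposed, so the sketch builds its own mapping instead of reading one.

```python
import ast
import inspect

SOURCE = '''
class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""
'''


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map assigned names to the string literal that directly follows them."""
    docs: dict[str, str] = {}
    body = ast.parse(source).body
    # Walk consecutive statement pairs: (assignment, possible docstring).
    for stmt, following in zip(body, body[1:]):
        if not isinstance(stmt, (ast.Assign, ast.AnnAssign)):
            continue
        is_string = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if not is_string:
            continue
        targets = stmt.targets if isinstance(stmt, ast.Assign) else [stmt.target]
        for target in targets:
            if isinstance(target, ast.Name):
                docs[target.id] = following.value.value
    return docs


namespace = {"__name__": "example"}
exec(SOURCE, namespace)

value_doc = inspect.getdoc(namespace["a"])  # found through the value's class
name_docs = attribute_docstrings(SOURCE)    # keyed by the attribute name

print(value_doc)
print(name_docs)
```

The same name ends up with two different pieces of documentation depending on whether the lookup goes through the value or through the name, which is the ambiguity described above.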

Now about the suggested alternatives/changes to the proposal

Docstrings above attributes. I’m against this for two reasons.
First, because this would create ambiguity with module and class docstrings. A docstring-attribute pair at the top of a module/class would be undecidable, given its AST. Python itself would have trouble with it (should it attach the docstring to module.__doc__ or into module.__docstrings__?). Unless we start to give meaning to spacing (blank line between docstring and attribute), which I feel is not a good idea: static analysis tools would need to build CSTs (Concrete Syntax Trees) instead of ASTs (abstract ones). That’s a huge requirement.
Second, because a lot of code is already using docstrings below attributes. That would create a lot of confusion if docstrings are suddenly expected to appear above the attribute assignment.
I understand that developers are used to writing comments before a section of code, but it also makes sense to me to write docstrings after the thing they document: “here’s an attribute, here’s what it’s used for”.
Using #: comments. I’m against this, because comments do not appear in ASTs. Unless there are plans to include comments in ASTs built with the standard ast module, this is a no-go to me. Picking up such comments requires tokenization of the code in addition to building an AST of it. Or, if we store the source in memory, we can re-use line numbers of attribute assignments / parameter declarations (obtained through the AST) to look for comments on the line(s) above. I consider both solutions to be hacks, and would much prefer to have attribute/parameter docstrings be part of the AST. Also, comments in general do not appear at runtime, so these comments would become a special kind of comment with specific behavior (included in __docstrings__ at runtime), which IMO feels weird and inconsistent.
Using ; as delimiter. I find this interesting. I can see one thing that might cause issues though, or at least unexpected things. ; can already be used to delimit statements when running code with python -c. Currently, in python -c 'a = ClassA(); """String."""', the string would be ignored (no-op). With this proposal, the string would now appear in the module __docstrings__, with potential additional effects (see my comment above about standard utilities like inspect.getdoc).
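
The first two objections can be checked with the standard library. This is a small sketch, with made-up sample sources, showing that a blank line between a leading string and an assignment is invisible in the AST, and that a #: comment is dropped by ast.parse but still reported by tokenize.

```python
import ast
import io
import tokenize

tight = '"""Doc."""\nx = 1\n'
spaced = '"""Doc."""\n\nx = 1\n'

# Both spellings produce the same tree, and the string is the module docstring.
same_tree = ast.dump(ast.parse(tight)) == ast.dump(ast.parse(spaced))
module_doc = ast.get_docstring(ast.parse(tight))

commented = "#: Timeout in seconds.\nTIMEOUT = 30\n"

# The comment leaves no trace in the AST...
in_ast = "Timeout in seconds" in ast.dump(ast.parse(commented))

# ...but the tokenizer still sees it.
doc_comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(commented).readline)
    if tok.type == tokenize.COMMENT and tok.string.startswith("#:")
]

print(same_tree, module_doc, in_ast, doc_comments)
```

A tool that wants to tell "module docstring" apart from "docstring above the first attribute" would therefore need something more concrete than the AST, and a tool that wants #: comments needs a second pass over the tokens or the raw source.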

con-f-use · October 22, 2023, 1:48pm

def foo(
    bar,
    """A required parameter."""

Hm… you’ll get a lot of errors from the unwanted comma after someone (re)moves the docstring in a refactor or something. (Sorry if that was already mentioned, I might have missed it in the giant wall of text that is this thread.)

pf_moore · October 22, 2023, 4:04pm

When you say you like the proposal, does that mean you’ll stop supporting existing approaches in your tool? If so, this is precisely the sort of “social pressure” to use the feature that’s hard to resist - follow the new (but controversial) approach, or you lose functionality that you previously had.

If that is what you (or any other tools) intend to do, I think the proposals (this and PEP 727) need to be very clear that this is likely to be the impact on projects that prefer the existing approaches. And no, I don’t think “we can’t dictate what tools can do, that’s outside the scope of the proposal” is a reasonable response - how the proposal affects existing practices (directly or indirectly) is very much an important aspect of any PEP.

pawamoy · October 22, 2023, 4:22pm

Not at all! I’m not planning to stop supporting any existing approach that I already support. I’m not even closing the door to adding support for existing approaches that I don’t yet support. For example, we currently don’t support #: comments, but we have a request in our backlog to support them, and this will likely be implemented (without tokenization, because we store source code in memory for easy access), even though I personally would never use this feature. So even if PEP 727 or this proposal gets accepted, users who don’t wish to use them won’t be left out. Of course I understand this concern, you were right to ask.

EpicWink · October 22, 2023, 10:55pm

Python has been agnostic about the number of newlines for the most part (I can’t think of any examples outside of escaped newlines and within strings). I don’t think this proposal is important enough to change that.

I suppose. It is currently very invalid to have a semicolon ; within a function declaration as it needs the trailing closing parentheses ) and colon :, so maybe that works.

Before this proposal, attributes don’t have docstrings: their values do. With this proposal, you can assign b = a and now both a and b have the same docstring, stored in module.__docstrings__, while a.__doc__ is different. It’s similar to __annotations__.
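
The comparison with __annotations__ can be illustrated with what Python already does today. In this sketch (the Config class is hypothetical), the annotation belongs to the name a, while __doc__ belongs to the value that both names share.

```python
class A:
    """Class docstring."""


class Config:
    a: A = A()
    b = a  # same object as a, but b has no annotation of its own


annotated_names = list(Config.__annotations__)
same_object = Config.a is Config.b
shared_doc = Config.b.__doc__

print(annotated_names)  # only the annotated name is recorded
print(same_object)
print(shared_doc)       # __doc__ comes from the value's class
```

A name-keyed __docstrings__ mapping would presumably behave like annotated_names here: it would describe names, independently of whatever __doc__ the bound value happens to carry.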

That’s my primary reason for not suggesting this form.

I’m only proposing semicolons ; within function declarations (maybe even only for one-liners).

I don’t understand this. The comma is currently syntactically valid (and stylistically recommended). Could you please provide a before and after example which causes an error you mention?

I know this isn’t exactly what you’re saying here, but I don’t think the possibility of change requests to switch to the new syntax is a strong argument against accepting these proposals, as otherwise you could say that about every new feature added to Python and its ecosystem.

I don’t think this proposal will affect most users of libraries (the location of the parameter docstring almost never affects those users at runtime, even if they read the source), rather only the library’s developers, and devs of specific applications.

I think it makes sense to recommend that tools keep their existing docstring parsing however, and I think many will prefer the existing form as it is usually easier to read.

Rosuav · October 22, 2023, 11:01pm

Note that there is already one exception to that rule, and that’s type comments. (Though you have to specifically ask for them with ast.parse(..., type_comments=True) or they are stripped.)
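
For reference, a short sketch of that exception using only the standard ast module. The sample source line is made up for illustration.

```python
import ast

source = "x = []  # type: list[int]\n"

default_tree = ast.parse(source)
typed_tree = ast.parse(source, type_comments=True)

# Without the flag the comment is dropped; with it, it is kept on the node.
default_comment = default_tree.body[0].type_comment
kept_comment = typed_tree.body[0].type_comment

print(default_comment)
print(kept_comment)
```

Only comments of the form # type: ... are kept this way; an ordinary comment or a #: comment on the same line would still be discarded.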

EpicWink · October 24, 2023, 9:27pm

I had forgotten about class __slots__, which can be a mapping from slot names to their attribute docstrings ^[1]. I can think of some options:

Keep them separate, and have two mechanisms to document class attributes (perhaps slots must be documented through __slots__). inspect will have to retrieve both (if we add the functionality)
Have __slots__ inject into __docstrings__. Choose between:
- Updates to __slots__ will automatically update __docstrings__ (or vice versa)
- __slots__ becomes read-only after class definition (I think this is a breaking change, because I believe __slots__ can be reassigned)

I’m leaning toward the first option, as it’s the simplest.

In addition, collisions on attribute names need a decision: overwrite, skip, or error. I prefer the attribute docstrings to supersede the slot docstring, as it is more visual, but it is technically not backwards compatible if a class currently has slot docstrings and uses the attribute docstring syntax.
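
For readers who have not used the mapping form of __slots__, here is a small sketch of the existing behaviour. The Point class is hypothetical; the lookup relies on inspect.getdoc being able to find slot docstrings when __slots__ is a dict, which it has done since Python 3.8.

```python
import inspect


class Point:
    """A 2D point."""

    # Keys are the slot names, values are their documentation strings.
    __slots__ = {
        "x": "Horizontal coordinate.",
        "y": "Vertical coordinate.",
    }

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


slot_docs = dict(Point.__slots__)
x_doc = inspect.getdoc(Point.x)  # resolved through the class's __slots__ dict

print(slot_docs)
print(x_doc)
```

This is the second documentation mechanism that any __docstrings__ design would have to coexist with or absorb.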

^[1] Is this a misnomer in the Python documentation? Should it be called “attribute documentation” instead of “docstring”?

webknjaz · December 1, 2023, 4:16pm

Oh, good to know. I actually quite like the attribute docstrings.
Though, I was considering moving to Sphinx’s #: due to linting issues, as the attribute docstrings aren’t exposed at runtime at the moment (check-docstring-first false positive on attribute docstrings · Issue #159 · pre-commit/pre-commit-hooks · GitHub).

Would love to see PEP 224 resurrected!

abatea · December 13, 2023, 1:05am

Edit: I was off-topic. I migrated this comment to the original megathread

mauve · December 2, 2024, 10:34am

How about new up-arrow syntax?

TIMEOUT = 30
^ "Timeout for HTTP requests"

Any string literal is valid after the caret, including multiline (but not an f-string).

In an expression, caret is the bitwise xor operator, but it is not valid at the start of a statement. At the start of a statement it would define a docstring statement, which must follow a single-name assignment/type annotation statement, otherwise a SyntaxError is raised during compilation. Visually it is not ambiguous because bitwise xor is not defined for strings.
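
Both claims in that paragraph can be checked against current Python. This is a minimal sketch; the proposed source is simply fed to compile to confirm that it is rejected today, so adopting it could not change the meaning of existing code.

```python
proposed = 'TIMEOUT = 30\n^ "Timeout for HTTP requests"\n'

# A caret at the start of a statement is not valid syntax today.
try:
    compile(proposed, "<example>", "exec")
    currently_valid = True
except SyntaxError:
    currently_valid = False

# Bitwise xor is not defined between two strings.
left, right = "a", "b"
try:
    left ^ right
    xor_defined = True
except TypeError:
    xor_defined = False

print(currently_valid, xor_defined)
```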

In parameter lists there is syntactic ambiguity with the infix ^ if you don’t leave the trailing comma, so it would need to be special cased that caret-strings in parameter lists document the previous parameter, e.g.

def length(x, ^"x coordinate", y, ^"y coordinate"):

but that would look more sensible if folded (perhaps always by black/ruff) to

def length(
    x,
    ^ "x coordinate",
    y,
    ^ "y coordinate",
):

If the original reason for rejecting PEP 224 was ambiguity about which value was documented (and possibly changing the semantics of existing code), a docstring operator would relieve this.

Edit: typos

pawamoy · December 2, 2024, 11:34am

Interesting!

I like to push things further and think not only about attributes and parameters, but also about returned/yielded values, raised exceptions, emitted warnings.

def do_everything(
    parameter: Any,
    ^ "Docs for parameter. OK.",
) -> Any:
    ^ "Docs for returned value. This location is confusing."
    """Summary."""
    warnings.warn("Deprecated function.", DeprecationWarning)
    ^ "Docs for emitted warning. OK."

    yield 0
    ^ """
        Docs for yielded value. Not working (missing information like type).
        Besides, we can have multiple yield statements in the function body,
        so after which one(s) do we write the docs?
    """

    yield 1

    received = yield 2
    ^ (
        "Docs for received value. This is confusing: "
        "are we documenting the received or yielded value? "
        "Also, missing type information."
    )

    try:
        return can_raise()
        ^ "Docs for returned value. Not working (same reasons as yield)."
    except KeyError:
        return None
    except ValueError as error:
        raise RuntimeError("Message.") from error
        ^ "Docs for raised exception. OK."

Sorry, I’m aware the thread is called “Revisiting attribute docstrings”, it’s just that IMO there’s no point in creating new syntax/standards if they don’t cover all the existing use-cases (for which we’d have to fall back again on docstring styles).

hprodh · February 1, 2025, 12:14pm

I like this proposal.

I refactor frequently. When rearranging the inputs/outputs, I typically focus on the chain of operations performed and do not pay attention to updating the docstrings.

This is why I usually don’t write docstrings, or only very cursory ones. However, the inputs/outputs are usually much more consistent from one refactoring to another. Also, I like functions to be properly documented, especially __init__ and __call__.
I would use these attribute docstrings if available because they would make updating docstrings during refactoring much easier (although I have never strictly needed them). I would also like to have them for outputs as well as inputs.

Now, about the syntax: ; is already used as an instruction (or line) separator, and I find it difficult to read; ^ is also difficult to read when used in a one-liner.
What I would find easier is a new type of string dedicated to docstrings, e.g. d-strings.

Example:
EDIT: I just read PEP 727 after posting this… This syntax is redundant with PEP 727 (although more “lightweight”).

class MyClass:
    """Manages things"""
    def __init__(self,
                 a d"parameter a",
                 b=None d"parameter b (optional)"):
        """Init the class that manages things."""
        ...
    def __call__(self, x d"value x in some units"):
        return (c_a d"value computed from a and x",
                c_b d"value computed from b and x")

Here I didn’t use any operator, assuming the d"..." strings are consistently interpreted as “partial” docstrings associated with the preceding variable (though the use of : or another operator could still be considered).

adiled · June 10, 2025, 6:34am

The proposals may have been rejected because this doesn’t need a long-form proposal with mathematical reasoning; it’s common pythonic sense. The docstring sits below function signatures, so where else can the docstring of its members sit? And of course the same applies to every declaration.