Revisiting attribute docstrings

I think the definition would only need to be altered within function signatures, right? Because I would think the following two (not within a function) are already identical in meaning? (Not completely sure about this though.)

PI: float = 3.14
"""The mathematical constant pi."""

vs

PI: float = 3.14; """The mathematical constant pi."""
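For what it's worth, the "identical in meaning" guess can be checked against the parser as it exists today. This is only a minimal sketch: it compares the two module-level spellings structurally and says nothing about how the proposal itself would treat them.

```python
import ast

two_lines = 'PI: float = 3.14\n"""The mathematical constant pi."""\n'
one_line = 'PI: float = 3.14; """The mathematical constant pi."""\n'

# ast.dump() leaves out line/column attributes by default,
# so this compares only the structure of the two trees.
same_tree = ast.dump(ast.parse(two_lines)) == ast.dump(ast.parse(one_line))
print(same_tree)
```

Both spellings parse to an annotated assignment followed by a bare string expression statement, so a tool working purely from the AST cannot tell them apart.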

And I think it does look clearer with the semicolons:

def foo(
    bar; """A required parameter.""",
    baz: int = None; """An optional parameter.""",
):
    ...

I’ll comment as a maintainer of a tool that uses both static and dynamic analysis to extract information from source code.

I like this proposal. Anything that reduces the need for docstring style parsing (Google-style, Numpydoc-style) is very welcome. I expanded on this in this other thread. That is also why I like PEP 727. This proposal is less extensible than PEP 727 (it won’t be able to handle all the information we can currently encode in docstring styles, such as documentation for raised exceptions, returned values, etc.), but it is still a good step forward. I wouldn’t mind having both solutions available.

One thing that could become ambiguous with this proposal is that attributes could now kinda have two docstrings:

class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""

a.__doc__  # Class docstring.
module.__docstrings__["a"]  # Attribute docstring.

This could be confusing, and standard utilities like inspect.getdoc would probably need to account for that, or we would need new utilities in the standard library.
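To make the two lookups concrete with today's Python: `__docstrings__` does not exist, so the sketch below stands in for it with a static pass over the AST, roughly the way documentation tools already pick up attribute docstrings. The helper name `attribute_docstrings` is made up for illustration, and it only handles simple module-level names.

```python
import ast

SOURCE = '''
class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""
'''


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map simple module-level names to the string literal following their assignment."""
    docs: dict[str, str] = {}
    body = ast.parse(source).body
    # Look at each statement together with the one that follows it.
    for node, following in zip(body, body[1:]):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        is_string = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if not is_string:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                docs[target.id] = following.value.value
    return docs


static_docs = attribute_docstrings(SOURCE)

# At runtime, the only docstring reachable from the value is the class one.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["a"].__doc__

print(static_docs)  # what a static tool sees for the name "a"
print(runtime_doc)  # what a.__doc__ returns
```

The static pass and the runtime lookup give different answers for the same name, which is the ambiguity a utility like inspect.getdoc would have to resolve one way or the other.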

Now about the suggested alternatives/changes to the proposal

  • Docstrings above attributes. I’m against this for two reasons.
    First, because this would create ambiguity with module and class docstrings. A docstring-attribute pair at the top of a module/class would be undecidable, given its AST. Python itself would have trouble with it (should it attach the docstring to module.__doc__ or into module.__docstrings__?). Unless we start to give meaning to spacing (blank line between docstring and attribute), which I feel is not a good idea: static analysis tools would need to build CSTs (Concrete Syntax Trees) instead of ASTs (abstract ones). That’s a huge requirement.
    Second, because a lot of code is already using docstrings below attributes. That would create a lot of confusion if docstrings are suddenly expected to appear above the attribute assignment.
    I understand that developers are used to writing comments before a section of code, but it also makes sense to me to write docstrings after the thing they document: “here’s an attribute, here’s what it’s used for”.

  • Using #: comments. I’m against this, because comments do not appear in ASTs. Unless there are plans to include comments in ASTs built with the standard ast module, this is a no-go to me. Picking up such comments requires tokenization of the code in addition to building an AST of it. Or, if we store the source in memory, we can re-use the line numbers of attribute assignments / parameter declarations (obtained through the AST) to look for comments on the line(s) above. I consider both solutions to be hacks, and would much prefer to have attribute/parameter docstrings be part of the AST. Also, comments in general do not appear at runtime, so these comments would become a special kind of comment with specific behavior (included in __docstrings__ at runtime), which IMO feels weird and inconsistent.

  • Using ; as delimiter. I find this interesting. I can see one thing that might cause issues though, or at least unexpected behavior. ; can already be used to delimit statements, for example when running code with python -c. Currently, in python -c 'a = ClassA(); """String."""', the string would be ignored (no-op). With this proposal, the string would now appear in the module __docstrings__, with potential additional effects (see my comment above about standard utilities like inspect.getdoc).
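On the #: comments point in the list above, here is a small check of the claim that comments are missing from the AST but present in the token stream. It is a sketch using only the standard ast and tokenize modules; the MAX_RETRIES example is made up.

```python
import ast
import io
import tokenize

SOURCE = "#: Maximum number of retries.\nMAX_RETRIES = 3\n"

# The AST holds a single Assign node; the comment text is nowhere in it.
tree = ast.parse(SOURCE)
in_ast = "Maximum number of retries" in ast.dump(tree)

# The token stream does keep comments, as COMMENT tokens.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

print(in_ast)
print(comments)
```

So a tool that wants these comments has to run a second pass over the tokens (or the raw source) and match the results back to AST nodes itself.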

def foo(
    bar,
    """A required parameter."""

Hm… you’ll have a lot of errors with the unwanted comma after someone (re)moves the docstring in a refactor or something. (Sorry if that was already mentioned, I might have missed it in the giant wall of text that is this thread.)


When you say you like the proposal, does that mean you’ll stop supporting existing approaches in your tool? If so, this is precisely the sort of “social pressure” to use the feature that’s hard to resist - follow the new (but controversial) approach, or you lose functionality that you previously had.

If that is what you (or any other tools) intend to do, I think the proposals (this and PEP 727) need to be very clear that this is likely to be the impact on projects that prefer the existing approaches. And no, I don’t think “we can’t dictate what tools can do, that’s outside the scope of the proposal” is a reasonable response - how the proposal affects existing practices (directly or indirectly) is very much an important aspect of any PEP.


Not at all! I’m not planning to stop supporting any existing approach that I already support. I’m not even closing the door to adding support for existing approaches that I don’t yet support. For example, we currently don’t support #: comments, but we have a request in our backlog to support them, and this will likely be implemented (without tokenization, because we store source code in memory for easy access), even though I personally would never use this feature. So even if PEP 727 or this proposal gets accepted, users who don’t wish to use them won’t be left out. Of course I understand this concern, you were right to ask.
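For readers wondering what "without tokenization" could look like, here is a rough sketch of the line-number approach: take the assignment's line number from the AST and scan the stored source upwards for #: lines. This is not the tool's actual implementation; the function name `doc_comments` and the sample source are made up, and only simple module-level assignments are handled.

```python
import ast

SOURCE = """\
#: Maximum number of retries.
#: Applies to every request.
MAX_RETRIES = 3

TIMEOUT = 10
"""


def doc_comments(source: str) -> dict[str, str]:
    """Collect '#:' comment lines sitting directly above simple assignments."""
    lines = source.splitlines()
    docs: dict[str, str] = {}
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name)):
            continue
        collected = []
        # node.lineno is 1-based, so lineno - 2 is the 0-based index of the line above.
        index = node.lineno - 2
        while index >= 0 and lines[index].lstrip().startswith("#:"):
            collected.append(lines[index].lstrip()[2:].strip())
            index -= 1
        if collected:
            # Lines were gathered bottom-up; restore reading order.
            docs[node.targets[0].id] = "\n".join(reversed(collected))
    return docs


found = doc_comments(SOURCE)
print(found)
```

It works, but the pairing of comment and attribute lives entirely in the tool's own conventions rather than in the AST.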


Python has been agnostic about the number of newlines for the most part (I can’t think of any examples outside of escaped newlines and within strings). I don’t think this proposal is important enough to change that.
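A quick way to see this at the AST level, and why a docstring placed above an attribute would be ambiguous at the top of a module. This is a sketch against the current parser only.

```python
import ast

adjacent = '"""Some docstring."""\nVALUE = 1\n'
separated = '"""Some docstring."""\n\n\nVALUE = 1\n'

# Structurally the two modules are the same tree; only line numbers
# (which ast.dump() omits by default) would differ.
same_tree = ast.dump(ast.parse(adjacent)) == ast.dump(ast.parse(separated))

# In both cases the string is already claimed as the module docstring.
module_doc = ast.get_docstring(ast.parse(adjacent))

print(same_tree)
print(module_doc)
```

Telling "module docstring" apart from "docstring for VALUE" would therefore need blank lines to carry meaning, which they do not today.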


I suppose. It is currently very invalid to have a semicolon ; within a function declaration as it needs the trailing closing parentheses ) and colon :, so maybe that works.
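This is easy to confirm with the current parser. The snippet below is a minimal sketch that feeds it the proposed one-liner form and only checks that it is rejected today.

```python
import ast

proposed = 'def foo(bar; """A required parameter."""): ...\n'

try:
    ast.parse(proposed)
except SyntaxError:
    currently_valid = False
else:
    currently_valid = True

print(currently_valid)
```

Since the form is a syntax error today, giving it a meaning could not change the behavior of any existing, working code.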


Before this proposal, attributes don’t have docstrings: their values do. With this proposal, you can assign b = a and now both a and b have the same docstring, stored in module.__docstrings__, while a.__doc__ is different. It’s similar to __annotations__.


That’s my primary reason for not suggesting this form.


I’m only proposing semicolons ; within function declarations (maybe even only for one-liners).


I don’t understand this. The comma is currently syntactically valid (and stylistically recommended). Could you please provide a before and after example which causes an error you mention?


I know this isn’t exactly what you’re saying here, but I don’t think the possibility of change requests to switch to the new syntax is a strong argument against accepting these proposals, as otherwise you could say that about every new feature added to Python and its ecosystem.

I don’t think this proposal will affect most users of libraries (the location of the parameter docstring almost never affects those users at runtime, even if they read the source), rather only the library’s developers, and devs of specific applications.

I think it makes sense to recommend that tools keep their existing docstring parsing, however, and I think many will prefer the existing form as it is usually easier to read.

Note that there is already one exception to that rule, and that’s type comments. (Though you have to specifically ask for them with ast.parse(..., type_comments=True) or they are stripped.)
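A small illustration of that exception, using only the standard ast module:

```python
import ast

SOURCE = "x = []  # type: list[int]\n"

# By default the comment is dropped like any other comment.
default = ast.parse(SOURCE).body[0].type_comment

# With type_comments=True it is stored on the Assign node.
with_flag = ast.parse(SOURCE, type_comments=True).body[0].type_comment

print(default)
print(with_flag)
```

The type comment is kept as a plain string on the node, so any further interpretation is still up to the tool.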


I had forgotten about class __slots__, which can be a mapping from slot names to their attribute docstrings [1]. I can think of some options:

  • Keep them separate, and have two mechanisms to document class annotations (perhaps slots must be documented through __slots__). inspect will have to retrieve both (if we add the functionality).

  • Have __slots__ inject into __docstrings__. Choose between:

    • Updates to __slots__ will automatically update __docstrings__ (or vice versa)
    • __slots__ becomes read-only after class definition (I think this is a breaking change, because I believe __slots__ can be reassigned)

I’m leaning toward the first option, as it’s the simplest.

In addition, collisions on attribute names need a decision: overwrite, skip, or error. I prefer the attribute docstring to supersede the slot docstring, as it is more visual, but it is technically not backwards compatible if a class currently has slot docstrings and uses the attribute docstring syntax.
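For anyone who hasn't used the mapping form of __slots__, this is the existing mechanism being discussed. The Point class is a made-up example; the behavior shown is what current Python does, independent of the proposal.

```python
import inspect


class Point:
    """A 2D point."""

    # Keys are the slot names; values act as per-attribute documentation.
    __slots__ = {"x": "Horizontal coordinate.", "y": "Vertical coordinate."}


# The text is reachable directly through the mapping...
slot_doc = Point.__slots__["x"]

# ...and inspect.getdoc() looks it up for the slot descriptor.
via_inspect = inspect.getdoc(Point.x)

print(slot_doc)
print(via_inspect)
```

So slots already have one runtime-visible documentation channel, which is why merging or prioritizing it against a new __docstrings__ mapping needs an explicit rule.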


  1. Is this a misnomer in the Python documentation? Should it be called “attribute documentation” instead of docstring?


Oh, good to know. I actually quite like the attribute docstrings.
Though, I was considering moving to Sphinx’s #: due to linting issues, as the attribute docstrings aren’t exposed at runtime at the moment (check-docstring-first false positive on attribute docstrings · Issue #159 · pre-commit/pre-commit-hooks · GitHub).

Would love to see PEP 224 resurrected!

Edit: I was off-topic. I migrated this comment to the original megathread