I think the definition would only need to be altered within function signatures, right? Because I would think the following two (not within a function) are already identical in meaning? (Not completely sure about this though.)
I'll comment as a maintainer of a tool that uses both static and dynamic analysis to extract information from source code.
I like this proposal. Anything that reduces the need for docstring style parsing (Google-style, Numpydoc-style) is very welcome. I expanded on this in this other thread. That is also why I like PEP 727. This proposal is less extensible than PEP 727 (it won't be able to handle all the information we can currently encode in docstring styles, such as documentation for raised exceptions, returned values, etc.), but it is still a good step forward. I wouldn't mind having both solutions available.
One thing that could become ambiguous with this proposal is that attributes could now kinda have two docstrings:
class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""

a.__doc__  # Class docstring.
module.__docstrings__["a"]  # Attribute docstring.
This could be confusing, and standard utilities like inspect.getdoc would probably need to account for that, or we would need new utilities in the standard library.
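To make the two-docstrings situation concrete, here is a minimal sketch of how a static tool can already pick up the string under an assignment, next to what `__doc__` returns at runtime today. The helper name `attribute_docstrings` is made up for illustration, it only looks at module-level statements, and it is not how any particular tool does it.

```python
import ast

SOURCE = '''
class A:
    """Class docstring."""

a = A()
"""Attribute docstring."""
'''


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map single-name assignments to the string literal right after them.

    Hypothetical helper: module-level statements only.
    """
    body = ast.parse(source).body
    found = {}
    for node, following in zip(body, body[1:]):
        is_simple_assign = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        is_string_expr = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if is_simple_assign and is_string_expr:
            found[node.targets[0].id] = following.value.value
    return found


# Static view: the string under the assignment is visible in the AST.
static_docs = attribute_docstrings(SOURCE)

# Runtime view: the bare string is a no-op, so a.__doc__ falls back to the class.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["a"].__doc__
```

The static view and the runtime view disagree about "the docstring of `a`", which is the ambiguity any new utility would have to resolve.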
Now about the suggested alternatives/changes to the proposal:
Docstrings above attributes. I'm against this for two reasons.
First, because this would create ambiguity with module and class docstrings. A docstring-attribute pair at the top of a module/class would be undecidable, given its AST. Python itself would have trouble with it (should it attach the docstring to module.__doc__ or into module.__docstrings__?). Unless we start to give meaning to spacing (blank line between docstring and attribute), which I feel is not a good idea: static analysis tools would need to build CSTs (Concrete Syntax Trees) instead of ASTs (abstract ones). That's a huge requirement.
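A small sketch of the spacing point, using only the standard `ast` module: two sources that differ only in blank lines produce the same tree structure, and the leading string is treated as the module docstring in both cases. The only trace of the blank lines is in the position attributes of the nodes.

```python
import ast

WITHOUT_BLANK_LINES = '"""Some string."""\nvalue = 1\n'
WITH_BLANK_LINES = '"""Some string."""\n\n\nvalue = 1\n'

tree_a = ast.parse(WITHOUT_BLANK_LINES)
tree_b = ast.parse(WITH_BLANK_LINES)

# ast.dump() leaves out position attributes by default, so this compares structure only.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)

# Today the leading string is the module docstring, whatever the spacing.
module_doc_a = ast.get_docstring(tree_a)
module_doc_b = ast.get_docstring(tree_b)

# Blank lines survive only as a gap in the line numbers of the statements.
linenos_a = [node.lineno for node in tree_a.body]
linenos_b = [node.lineno for node in tree_b.body]
```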
Second, because a lot of code is already using docstrings below attributes. It would create a lot of confusion if docstrings were suddenly expected to appear above the attribute assignment.
I understand that developers are used to writing comments before a section of code, but it also makes sense to me to write docstrings after the thing they document: "here's an attribute, here's what it's used for".
Using #: comments. I'm against this, because comments do not appear in ASTs. Unless there are plans to include comments in ASTs built with the standard ast module, this is a no-go to me. Picking up such comments requires tokenization of the code in addition to building an AST of it. Or, if we store the source in memory, we can re-use the line numbers of attribute assignments / parameter declarations (obtained through the AST) to look for comments on the line(s) above. I consider both solutions to be hacks, and would much prefer to have attribute/parameter docstrings be part of the AST. Also, comments in general do not appear at runtime, so these comments would become special comments with specific behavior (included in __docstrings__ at runtime), which IMO feels weird and inconsistent.
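For illustration, a rough sketch of the two workarounds described above, using only the standard `ast` and `tokenize` modules. The sample source and variable names are invented for the example.

```python
import ast
import io
import tokenize

SOURCE = "#: Comment-style docs.\nvalue = 1\n"

# The AST has no trace of the comment.
tree = ast.parse(SOURCE)
comment_in_ast = "Comment-style docs." in ast.dump(tree)

# Workaround 1: tokenize the source in addition to parsing it.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

# Workaround 2: keep the source in memory and look at the line above the
# assignment, using the line number recorded on the AST node (1-based).
lines = SOURCE.splitlines()
assignment = tree.body[0]
line_above = lines[assignment.lineno - 2]
```

Both approaches work, but each needs something beyond the AST itself (a token stream or the raw source lines).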
Using ; as delimiter. I find this interesting. I can see one thing that might cause issues though, or at least unexpected things. ; can already be used to delimit statements when running code with python -c. Currently, in python -c 'a = ClassA(); """String."""', the string would be ignored (no-op). With this proposal, the string would now appear in the module __docstrings__, with potential additional effects (see my comment above about standard utilities like inspect.getdoc).
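A quick check of the current behavior, as a sketch: with today's grammar, a semicolon-separated string parses to the same tree as a string on its own line, so whatever meaning is given to one form would apply to the other as well. `object()` stands in for `ClassA()` here so the snippet has no undefined names.

```python
import ast

ONE_LINE = 'a = object(); """String."""'
TWO_LINES = 'a = object()\n"""String."""\n'

one = ast.parse(ONE_LINE)
two = ast.parse(TWO_LINES)

# An assignment followed by a bare expression statement, in both cases.
statement_types = [type(node).__name__ for node in one.body]

# Structural comparison; position attributes are left out of ast.dump() by default.
same_tree = ast.dump(one) == ast.dump(two)
```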
Hm…, you'll have a lot of errors with the unwanted comma after someone (re)moves the docstring in a refactor or something. (Sorry if that was already mentioned; I might have missed it in the giant wall of text that is this thread.)
When you say you like the proposal, does that mean you'll stop supporting existing approaches in your tool? If so, this is precisely the sort of "social pressure" to use the feature that's hard to resist - follow the new (but controversial) approach, or you lose functionality that you previously had.
If that is what you (or any other tools) intend to do, I think the proposals (this and PEP 727) need to be very clear that this is likely to be the impact on projects that prefer the existing approaches. And no, I don't think "we can't dictate what tools can do, that's outside the scope of the proposal" is a reasonable response - how the proposal affects existing practices (directly or indirectly) is very much an important aspect of any PEP.
Not at all! I'm not planning to stop supporting any existing approach that I already support. I'm not even closing the door to adding support for existing approaches that I don't yet support. For example, we currently don't support #: comments, but we have a request in our backlog to support them, and this will likely be implemented (without tokenization, because we store source code in memory for easy access), even though I personally would never use this feature. So even if PEP 727 or this proposal gets accepted, users who don't wish to use them won't be left out. Of course I understand this concern; you were right to ask.
Python has been agnostic about the number of newlines for the most part (I can't think of any examples outside of escaped newlines and within strings). I don't think this proposal is important enough to change that.
I suppose. It is currently very invalid to have a semicolon ; within a function declaration as it needs the trailing closing parenthesis ) and colon :, so maybe that works.
Before this proposal, attributes don't have docstrings: their values do. With this proposal, you can assign b = a and now both a and b have the same docstring, stored in module.__docstrings__, while a.__doc__ is different. It's similar to __annotations__.
That's my primary reason for not suggesting this form.
I'm only proposing semicolons ; within function declarations (maybe even only for one-liners).
I don't understand this. The comma is currently syntactically valid (and stylistically recommended). Could you please provide a before and after example which causes an error you mention?
I know this isn't exactly what you're saying here, but I don't think the possibility of change requests to switch to the new syntax is a strong argument against accepting these proposals, as otherwise you could say that about every new feature added to Python and its ecosystem.
I don't think this proposal will affect most users of libraries (the location of the parameter docstring almost never affects those users at runtime, even if they read the source), rather only the library's developers, and devs of specific applications.
I think it makes sense to recommend that tools keep their existing docstring parsing however, and I think many will prefer the existing form as it is usually easier to read.
Note that there is already one exception to that rule, and that's type comments. (Though you have to specifically ask for them with ast.parse(..., type_comments=True) or they are stripped.)
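As a small illustration of that exception, a sketch using the standard `ast` module (the sample source is invented for the example):

```python
import ast

SOURCE = "values = []  # type: list[int]\n"

# By default the type comment is dropped like any other comment.
default_tree = ast.parse(SOURCE)
default_comment = default_tree.body[0].type_comment

# With type_comments=True the text after "# type:" is stored on the Assign node.
typed_tree = ast.parse(SOURCE, type_comments=True)
typed_comment = typed_tree.body[0].type_comment
```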
I had forgotten about class __slots__, which can be a mapping from slot names to their attribute docstrings [1]. I can think of some options:
Keep them separate, and have two mechanisms to document class annotations (perhaps slots must be documented through __slots__). inspect will have to retrieve both (if we add the functionality)
Have __slots__ inject into __docstrings__. Choose between:
Updates to __slots__ will automatically update __docstrings__ (or vice versa)
__slots__ becomes read-only after class definition (I think this is a breaking change, because I believe __slots__ can be reassigned)
I'm leaning toward the first option, as it's the simplest.
In addition, collisions on attribute names need a decision: overwrite, skip, or error. I prefer the attribute docstrings to supersede the slot docstring, as it is more visual, but it is technically not backwards compatible if a class currently has slot docstrings and uses the attribute docstring syntax.
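For readers who haven't used the mapping form of `__slots__`, here is a minimal sketch. The `Point` class is invented, and `attribute_docs` is a purely hypothetical stand-in for whatever the proposal would collect; the last line only illustrates the "attribute docstring supersedes slot docstring" policy, not an actual implementation.

```python
class Point:
    """A point in 2D space."""

    # A mapping is accepted for __slots__: keys are slot names, values are docs.
    __slots__ = {"x": "Horizontal coordinate.", "y": "Vertical coordinate."}

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(1, 2)

# The mapping stays available on the class after definition.
slot_docs = dict(Point.__slots__)

# Hypothetical: attribute docstrings collected for the same class.
attribute_docs = {"x": "Horizontal coordinate, in pixels."}

# Collision policy sketched here: the attribute docstring wins over the slot one.
merged_docs = {**slot_docs, **attribute_docs}
```

With this policy, a class that already documents `x` through `__slots__` would silently get a different docstring for `x`, which is the backwards-compatibility concern mentioned above.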
Any string literal is valid after the caret, including multiline (but not an f-string).
In an expression, caret is the bitwise xor operator, but it is not valid at the start of a statement. At the start of a statement it would define a docstring statement, which must follow a single-name assignment/type annotation statement, otherwise a SyntaxError is raised during compilation. Visually it is not ambiguous because bitwise xor is not defined for strings.
In parameter lists there is syntactic ambiguity with the infix ^ if you don't leave the trailing comma, so it would need to be special cased that caret-strings in parameter lists document the previous parameter, e.g.
def length(x, ^"x coordinate", y, ^"y coordinate"):
but that would look more sensible if folded (perhaps always by black/ruff) to
def length(
    x,
    ^ "x coordinate",
    y,
    ^ "y coordinate",
):
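A quick way to check how the current grammar treats these forms, as a sketch (the helper names `is_valid_syntax` and `xor_strings` are made up for the example). The `default_form` case is one reading of the ambiguity mentioned above: without a separating comma, a caret after a default value already parses as an infix xor.

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if the source compiles with the running interpreter's grammar."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A caret at the start of a statement is not valid today.
statement_form = 'x = 1\n^ "x coordinate"\n'

# Neither is a caret-string as its own entry in a parameter list.
parameter_form = 'def length(x, ^"x coordinate", y, ^"y coordinate"):\n    pass\n'

# Without the comma, a caret after a default value parses as infix xor.
default_form = 'def length(x=1 ^ "x coordinate"):\n    pass\n'


def xor_strings() -> str:
    """Try xor on two strings and report which error, if any, is raised."""
    left, right = "a", "b"
    try:
        left ^ right
    except TypeError:
        return "TypeError"
    return "no error"
```

So the statement and parameter forms are free to be given a new meaning, while the default-value form is already taken by the xor operator (it compiles, and would fail with a `TypeError` when the `def` is executed).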
If the original reason for rejecting PEP-224 was ambiguity about which value was documented (and possibly changing the semantics of existing code), a docstring operator would relieve this.
I like to push things further and think not only about attributes and parameters, but also about returned/yielded values, raised exceptions, emitted warnings.
def do_everything(
    parameter: Any,
    ^ "Docs for parameter. OK.",
) -> Any:
    ^ "Docs for returned value. This location is confusing."
    """Summary."""
    warnings.warn("Deprecated function.", DeprecationWarning)
    ^ "Docs for emitted warning. OK."
    yield 0
    ^ """
    Docs for yielded value. Not working (missing information like type).
    Besides, we can have multiple yield statements in the function body,
    so after which one(s) do we write the docs?
    """
    yield 1
    received = yield 2
    ^ (
        "Docs for received value. This is confusing: "
        "are we documenting the received or yielded value? "
        "Also, missing type information."
    )
    try:
        return can_raise()
        ^ "Docs for returned value. Not working (same reasons as yield)."
    except KeyError:
        return None
    except ValueError as error:
        raise RuntimeError("Message.") from error
        ^ "Docs for raised exception. OK."
Sorry, I'm aware the thread is called "Revisiting attribute docstrings", it's just that IMO there's no point in creating new syntax/standards if they don't cover all the existing use-cases (for which we'd have to fall back again on docstring styles).
My code requires frequent refactoring. When rearranging the inputs/outputs, I typically focus on the chain of operations performed and do not pay attention to updating the docstrings.
This is why I usually don't write docstrings, or only very cursory ones. However, the inputs/outputs are usually much more consistent from one refactoring to another. Also, I like functions to have proper docstrings, especially __init__ and __call__.
I would use these attribute docstrings if available because they would make updating docstrings during refactoring a lot easier (although I never required them). Also, I would like to have them for inputs and for outputs as well.
Now, what I think about the syntax: the ; is already used as an instruction (or line) separator, and I find it difficult to read. The ^ is also difficult to read when used as a one-liner.
What I would find easier is actually having a new type of string dedicated to docstrings, e.g. d-strings.
Example (EDIT: I just read PEP 727 after posting this… This syntax is redundant with PEP 727, although more "lightweight"):
class MyClass:
    """Manages things"""

    def __init__(self,
                 a d"parameter a",
                 b=None d"parameter b (optional)"):
        """Init the class that manages things."""
        ...

    def __call__(self, x d"value x in some units"):
        return (c_a d"value computed from a and x",
                c_b d"value computed from b and x")
Here I didn't use any operator, assuming the d"..." strings are consistently interpreted as "partial" docstrings associated with variables (yet the use of : or another operator can still be considered).
The proposals may have been rejected because this doesn't need a long-form proposal with mathematical reasoning; it's common Pythonic sense. The docstring sits below function signatures, so where else can the docstring of its members sit? And of course the same applies to every declaration.