PEP 727: Documentation Metadata in Typing

:cry:

Quick, personal summary, with new thoughts:

  • Counter-argument: "it's not typing". Annotated was designed to annotate types, true, but its documentation also says:

    As such, Annotated can be useful for code that wants to use annotations for purposes outside Python's static typing system.

    Unfortunately the typing module is the only meaningful location in the standard library for Doc (next to Annotated). There isn't a generic documentation-related standard module :confused: (maybe we could create one :smiling_imp:?). Even if many don't use or think of typing primarily as documentation, it is still documentation too.

  • Counter-argument: "it's too verbose". Not really. Setting spacing aside, you can alias Annotated as A, and then A[int, Doc("string")] is ~10 characters more than int for each documented parameter. From that we could also subtract the average length of parameter names, since we don't have to repeat them in function/class docstrings.

  • Counter-argument: "it makes big signatures". Big signatures can be collapsed (partly or completely) with adequate tooling. Even without Doc, signatures with annotated types can grow quite big, so it would make sense to me for IDEs and text editors to allow collapsing each parameter in a signature, perhaps just showing the (truncated) unannotated type and (truncated) default value.

  • Counter-argument: "it's not pretty or as pretty as function docstrings". Personal preference :person_shrugging:

  • Counter-argument: "accepting it in the stdlib puts pressure on the ecosystem to adopt it". Recent thought I had: True, but the opposite is true too. By not providing a standard way to structure docs data, pressure is put on the ecosystem to rely on non-standard, divergent, unspecified and sub-optimal docstring styles.
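A minimal sketch of the aliasing point from the verbosity bullet. It tries to import `Doc` from `typing_extensions` and otherwise falls back to a small stand-in class with the same `documentation` attribute (an assumption made only so the snippet runs anywhere); `scale` is a made-up function.

```python
from typing import Annotated as A, get_args, get_type_hints

try:
    from typing_extensions import Doc  # the real PEP 727 class, if available
except ImportError:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Doc:  # stand-in with the same `documentation` attribute
        documentation: str


def scale(factor: A[int, Doc("How much to scale by")]) -> None:
    """Hypothetical function whose parameter is documented in its annotation."""


# Characters added to the annotation, not counting the documentation text.
overhead = len('A[int, Doc("")]') - len("int")
print(overhead)  # 12

# The Doc metadata survives at runtime; plain get_type_hints strips it.
hints = get_type_hints(scale, include_extras=True)
print(get_args(hints["factor"])[1].documentation)  # How much to scale by
print(get_type_hints(scale)["factor"])  # <class 'int'>
```

The overhead comes out at roughly a dozen characters, in line with the "~10 characters" estimate, and tools that call `get_type_hints` without `include_extras=True` still see a plain `int`.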


Anyway, I hope I'm not starting the debate all over again :sweat_smile: I'm actually happy if Doc lives in a third-party library :slightly_smiling_face: I just wanted to share my perspective again as someone who works a lot with docstring styles (and enjoys them less and less). I hope we can keep working on and discussing a standard solution even after this PEP is withdrawn or rejected :smiling_face:


It already does, as typing_extensions.Doc, and you are free to use it. We'll probably keep it in typing_extensions indefinitely, even if the PEP gets withdrawn or rejected, for backwards-compatibility reasons.

You are free to use it in your own code using the typing-extensions version. If usage of typing_extensions.Doc becomes widespread, that will be a good argument for accepting the PEP and putting it in the standard library.


Ah, that's good to know, thanks!

If @ is deemed too cryptic by many, then how about -?

- can intuitively be read as an em dash, which is used in places where a set of parentheses or a colon might otherwise be used:

def some_function(
    some_parameter: SomeType - "Some documentation goes here",
) -> SomeReturn - "Some details about this return type":
    """ Documenting the function itself here """

As a bonus, - is both a binary and a unary operator, so there's the potential to allow a parameter to be annotated without a type:

def some_function(
    some_parameter: SomeType - "Some documentation goes here",
    **kwargs: - "Additional keyword arguments"
) -> SomeReturn - "Some details about this return type":
    """ Documenting the function itself here """

Some might argue that an em dash is really a long dash, usually represented in ASCII as two hyphens, which we could also consider as an alternative for better aesthetics:

def some_function(
    some_parameter: SomeType -- "Some documentation goes here",
    **kwargs: -- "Additional keyword arguments"
) -> SomeReturn -- "Some details about this return type":
    """ Documenting the function itself here """

This would make the syntax more in line with SQL comments.


Doesn't that cause problems?

>>> string = "foo"
>>> -string
Doc("foo")

Surely it would be better to write this?

def some_function(
    **kwargs: Any - "Additional keyword arguments"
) -> None:
    ...

I don't think it's a problem, because it currently produces a TypeError. There's otherwise no intuitive use for unary minus on a string anyway.
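A quick check of the claim above using plain CPython, with nothing from the proposal involved: unary minus on a `str` raises `TypeError`, so giving it a meaning would not change the behaviour of any code that currently works. `try_negate` is just an illustrative helper.

```python
def try_negate(value):
    """Return the TypeError message raised by unary minus, or None if it works."""
    try:
        -value
    except TypeError as exc:
        return str(exc)
    return None


print(try_negate("foo"))  # bad operand type for unary -: 'str'
print(try_negate(3))      # None: numbers support unary minus as usual
```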

Yes, but my point is just that we could keep the type annotation optional.

Even if this were worth pursuing, and I don't think it is, @ was used in that hypothetical because there is no conceivable future need for matmul on types. Using - is problematic because of the potential for difference types in the future (i.e. Iterable[str] - str, meaning any iterable of strings that isn't just a string), combined with strings being used for forward references in typing.

Keep in mind that plenty of other things were brought up in this discussion against this idea that don't depend on what the syntax actually would be.


Good point: type - 'string' is problematic because 'string' can be a forward reference to a type. But I think type -- 'string' can still work, because it parses as type - (-'string'): -'string' can be made to produce Doc('string'), which type.__sub__ can special-case to return Annotated[type, Doc('string')].
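A sketch of the mechanism described above. Python has no `--` token, so `SomeType -- 'text'` parses as `SomeType - (-'text')`, which the `ast` module confirms. Since `str.__neg__` and `type.__sub__` cannot be patched from Python, the hypothetical `DocStr` subclass and `DocMeta` metaclass stand in for them. This only illustrates the idea; it is not a proposed implementation, and a builtin such as `int -- "text"` would still raise `TypeError` here. `Doc` falls back to a stand-in if `typing_extensions` is not installed.

```python
import ast
from typing import Annotated, get_args

try:
    from typing_extensions import Doc
except ImportError:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Doc:  # stand-in with the same `documentation` attribute
        documentation: str

# There is no "--" operator: binary minus is applied to a unary minus.
binop = ast.parse("SomeType -- 'Some docs'", mode="eval").body
print(type(binop).__name__, type(binop.op).__name__)              # BinOp Sub
print(type(binop.right).__name__, type(binop.right.op).__name__)  # UnaryOp USub


class DocStr(str):
    """Hypothetical str subclass playing the role of a patched str.__neg__."""

    def __neg__(self):
        return Doc(str(self))


class DocMeta(type):
    """Hypothetical metaclass playing the role of a patched type.__sub__."""

    def __sub__(cls, other):
        if isinstance(other, Doc):
            return Annotated[cls, other]
        return NotImplemented  # leave other meanings of "-" possible


class SomeType(metaclass=DocMeta):
    pass


# Evaluates as SomeType - (-DocStr(...)) -> SomeType - Doc(...)
hint = SomeType -- DocStr("Some documentation goes here")
print(get_args(hint))
```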

Yes, the syntax just occurred to me and I thought I'd share it here just in case the other arguments against this idea get resolved later.

EDIT: Actually, I think I may have just resolved what I believe to be the biggest argument against this idea, namely that the original proposal makes the annotation too verbose, to the point that it adversely affects readability.

So instead of:

def some_function(
    some_parameter: Annotated[SomeType, Doc("Some documentation goes here")],
    **kwargs: Annotated[Any, Doc("Additional keyword arguments")]
) -> Annotated[SomeReturn, Doc("Some details about this return type")]:
    """ Documenting the function itself here """

It can now be:

def some_function(
    some_parameter: SomeType -- "Some documentation goes here",
    **kwargs: Any -- "Additional keyword arguments"
) -> SomeReturn -- "Some details about this return type":
    """ Documenting the function itself here """

which looks a lot less verbose and more readable to me.

Perhaps it's enough to nudge this proposal back on track for reconsideration?


Just as a refresher on the arguments against: I read the first ~60 comments, and most of the arguments were against the syntax and the social pressure.

I continue to think that the primary benefit of this PEP isn't even the syntax so much as the standardization of any location at all for this information to exist for runtime inspection.

The existence of a standard location for the information enables better syntax/bikeshedding in some future PEP. It enables a standard post-processor for "attribute docstrings", so that you don't need to traverse the AST to obtain them. It enables a post-processor for NumPy/Google docstring parsers to put that information into the standard location.

The point being that it becomes easier for libraries/tools that need runtime inspection of the values to "just" inspect the annotations for Doc instances, rather than handle 5 different, mutually incompatible options.
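To illustrate the "just inspect the annotations" point, here is a minimal sketch of what such a tool could do with the standard `typing` introspection helpers (`get_type_hints(..., include_extras=True)`, `get_origin`, `get_args`). `param_docs` and `greet` are hypothetical, and `Doc` falls back to a stand-in if `typing_extensions` is not installed.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

try:
    from typing_extensions import Doc
except ImportError:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Doc:  # stand-in with the same `documentation` attribute
        documentation: str


def param_docs(func):
    """Map each annotated name (including 'return') to its Doc text, if any."""
    docs = {}
    for name, hint in get_type_hints(func, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # get_args(Annotated[T, m1, m2]) == (T, m1, m2)
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Doc):
                    docs[name] = meta.documentation
    return docs


def greet(
    name: Annotated[str, Doc("Who to greet")],
    excited: bool = False,
) -> Annotated[str, Doc("The greeting")]:
    """Hypothetical example function."""
    return f"Hello, {name}{'!' if excited else '.'}"


print(param_docs(greet))  # {'name': 'Who to greet', 'return': 'The greeting'}
```

Undocumented parameters such as `excited` are simply absent from the result, so a documentation tool can fall back to other sources for them.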


That and the potential to reuse docstrings. Nothing is more demotivating than having to document the same variable in 20 places over the code base.


I'm actually struggling with this in a current project, to the point that I have to copy-paste between functions for fear of being inconsistent. Having to type it only once would be of great use.
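One way the reuse could look, assuming a project defines its shared parameter types once as `Annotated` aliases. `Inplace`, `drop_none` and `sort_list` are hypothetical names, and `Doc` again falls back to a stand-in when `typing_extensions` is unavailable.

```python
from typing import Annotated, get_type_hints

try:
    from typing_extensions import Doc
except ImportError:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Doc:  # stand-in with the same `documentation` attribute
        documentation: str

# Hypothetical shared parameter type: the description is written exactly once.
Inplace = Annotated[bool, Doc("If True, modify the list in place and return it.")]


def drop_none(data: list, inplace: Inplace = False) -> list:
    """Remove None entries."""
    result = [x for x in data if x is not None]
    if inplace:
        data[:] = result
        return data
    return result


def sort_list(data: list, inplace: Inplace = False) -> list:
    """Sort the entries."""
    if inplace:
        data.sort()
        return data
    return sorted(data)
```

Because the alias is a single object, both signatures carry the same `Doc` instance, so there is nothing to copy-paste or keep in sync.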


I'd be cautious about using this to deduplicate parameter descriptions. I've had description deduplication working before and immediately regretted it. Reading an API reference that's full of duplication has all the caveats of reading code that's full of duplication, i.e. it's hard to find what you're looking for past all of the repetitive boilerplate that you've already read a gazillion times.

I found it a lot better to just write a canonical description once, then rely on cross-references. Then all you have to duplicate is the fairly immutable string see :func:`xyz`.


Eh, this seems like such an overgeneralization without having the example source code.

Take pandas, for example: do you know how many methods across Series and DataFrame have an inplace or axis argument? Wouldn't it be more jarring for the same argument to have logically equivalent but differently written descriptions in two methods? And for what reason? Because two different developers wrote the two methods?

AFAIK pandas handles a lot of this with runtime docstring manipulation through decorators.
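For comparison, a minimal sketch of decorator-based docstring templating in the same spirit. This is not pandas' actual code or API; `shared_doc` and `SHARED_DOCS` are hypothetical, and the sketch simply fills `{placeholders}` in `__doc__` with `str.format`.

```python
# Hypothetical registry of reusable parameter descriptions.
SHARED_DOCS = {
    "axis": "axis: The axis to operate along (0 for rows, 1 for columns).",
    "inplace": "inplace: If True, modify the object in place.",
}


def shared_doc(**overrides):
    """Hypothetical decorator: fill {placeholders} in a docstring with shared text."""
    def decorator(func):
        if func.__doc__:  # docstrings are None under `python -OO`
            func.__doc__ = func.__doc__.format(**{**SHARED_DOCS, **overrides})
        return func
    return decorator


@shared_doc()
def dropna(axis=0, inplace=False):
    """Drop missing values.

    {axis}
    {inplace}
    """


@shared_doc(inplace="inplace: Not supported here; always returns a copy.")
def fillna(value, axis=0, inplace=False):
    """Fill missing values.

    {axis}
    {inplace}
    """


print(dropna.__doc__)
```

A real implementation would also need to handle literal braces in docstrings and re-indent multi-line snippets, which this sketch ignores.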

What meaningful piece of information would you put in the description for a dataframe parameter that's applicable to every time it's used and isn't already implied by the type hint? "A pandas dataframe"? "The dataframe to process"? Unless there's some unique property a specific function's dataframe requires, I'd say any description for that parameter is more noisy boilerplate than signal.

My bad, only now do I see you're talking about a dataframe method rather than a parameter.

And to be honest, to me the runtime docstring manipulations that a lot of the scientific packages do are already leaning towards the wrong side of what I'm describing. 100% of the help for numpy.sum(), for example, is just generic numpy.

I suppose I don't consider it impossible for deduplicated descriptions to be useful. I merely think they're more likely to be abused in the name of "don't repeat yourself" and leave us with an ecosystem made up of vague, boilerplatey API references.

I don't consider deduplicating descriptions by making the code less useful to developers a good tradeoff. (I'm not saying that pandas replacements at runtime are this, but I think we have clear examples of how noisy Doc[...] gets in diffs and merge conflicts already, as well as several hundred lines of just parameters for a single function.)

Documentation tools already have a way to handle repeated notes, warnings, and descriptions without repeating them literally: you reference them to be inserted in place. If you want your documentation to improve, the best way is to take the time to set up documentation tooling and write more of it yourself. If you try to assemble more of it programmatically while avoiding writing any surrounding documentation, a reader will always notice that it was stitched together, and documentation generated this way with no adjustments has always felt worse to read and reference.

I'm aware this means that users need to view rendered documentation, and I think that's acceptable. Libraries have gotten significantly more complex over time, and we have the tools not only to render documentation and host it, but also to make downloadable versions of that rendered documentation available, as well as to ensure that users can build it locally themselves.


I disagree. I think Python code should always be fully understandable from its source, including docstrings. Any time I see bespoke documentation includes [1] in docstrings, I get frustrated and my flow is ruined.

I would hope that deduplication can be argued against and taught to be avoided, but still used in specific cases where the benefits outweigh the downsides, as determined by experience. Of course, I have not been on the receiving end of spam PRs.

Also, I would expect basically no users download offline documentation for Python libraries, and even fewer would build the docs from source.


  1. e.g. Sphinx's .. literalinclude:: FILE directive


Because this came up over and over again, I must second this as well as add my point of view. Having taken a look at the oft-cited FastAPI signature, I find it far more readable, in all contexts, than the old, unstandardized docstring parameter-list alternatives.

The docstring approach requires jumping around the source code, whereas this new style keeps the documentation tightly coupled. Arguments against this tight coupling remind me of over-abstracted code bases where the proliferation of tiny functions requires the reader to jump about the text, all while maintaining a "mental call stack".

Both documentation approaches lead to good auto-generated docs (though docstrings are prone to errors whenever a signature changes), but generated docs or other "readable" docs should never be necessary. This PEP improves source code readability, lowers the total LOC, makes linting docs easier, eases maintenance, and reduces the surface area for documentation-caused bugs.

Obviously, function signature readability is a matter of opinion, therefore I am offering a strong one. The source code is the documentation, and this PEP improves the source code.


Is this really the case if you put the Doc[...] element on a separate line, which I presume you would? Then you'd get

def foo(
        ...
        bar: Annotated[
            Bar,
            Doc("Bar to be foo'd"),
        ],
        ...
    ):
    ...

compared to

def foo(
        ...
        bar: Bar,
        ...
    ):
    """
    ...
    bar: Bar to be foo'd
    ...
    """
    ...

Both of these have 6 lines not counting the ellipses, so it's not clear to me that this proposal would reduce total LOC.


That's a good point. Here's what I'd expect in a normal case:

def foo(
    ...
    bar: Annotated[Bar, Doc("A short description of this param")],
    ...
) -> None:
    """Docstring begin...

vs

def foo(
    ...
    bar: Bar,
    ...
) -> None:
    """Docstring begin...

    Args:
       bar (Bar): A short description of this param

So 6 LOC (+ remaining docstring) for the new style and 9 LOC (+ remaining docstring) for Google style. The extra 3 lines are 1) the blank line before the Args: section, 2) the Args: line, 3) the doc line. It becomes more of a wash if the argument descriptions are long.

Worst-case scenario at scale seems to be +1 LOC for every function parameter.

JP

PS: As much as I value explicit defs and behavior, I'd probably do from typing import Annotated as A or similar if this became standard.

def foo(
    ...
    bar: A[Bar, Doc("A short description of this param")],
    ...
) -> None:
    """Docstring begin...

Or untyped Doc annotation option:

def foo(
    ...
    bar: Doc["A short description of this param"],
    ...
) -> None:
    """Docstring begin...

Or a typed Doc annotation option, with the doc as the 2nd positional argument:

def foo(
    ...
    bar: TDoc[Bar, "A short description of this param"],
    ...
) -> None:
    """Docstring begin...