PEP 727: Documentation Metadata in Typing

pawamoy · August 12, 2024, 3:30pm

Quick, personal summary, with new thoughts:

Counter-argument: “it’s not typing”. Annotated was designed to annotate types, true, but also says:

As such, Annotated can be useful for code that wants to use annotations for purposes outside Python’s static typing system.

Unfortunately the typing module is the only meaningful location in the standard library for Doc (next to Annotated). There isn’t a generic documentation-related standard module (maybe we could create one?). Even if many do not use or consider typing primarily as documentation, it is still also documentation.
Counter-argument: “it’s too verbose”. Not really. Spacing aside, you can alias Annotated as A, then A[int, Doc("string")] is ~10 characters more than int, for each documented parameter. But from that we could subtract the average length of parameter names, since we don’t have to repeat them in function/class docstrings.
Counter-argument: “it makes big signatures”. Big signatures can be collapsed (partly or completely) with adequate tooling. Even without Doc, signatures with annotated types can grow quite big, so it would make sense to me that IDEs and text editors would allow collapsing each parameter in signatures, maybe just showing the (truncated) unannotated type and (truncated) default value.
Counter-argument: “it’s not pretty or as pretty as function docstrings”. Personal preference.
Counter-argument: “accepting it in the stdlib puts pressure on the ecosystem to adopt it”. Recent thought I had: True, but the opposite is true too. By not providing a standard way to structure docs data, pressure is put on the ecosystem to rely on non-standard, divergent, unspecified and sub-optimal docstring styles.
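To put a rough number on the verbosity point above, here is a minimal sketch. The `Doc` class is a local stand-in (an assumption, so the snippet runs without third-party packages), `A` is the alias suggested above, and `scale` is a made-up function.

```python
from dataclasses import dataclass
from typing import Annotated as A


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the proposed Doc marker (assumption, not the real class)."""

    documentation: str


def scale(
    value: A[int, Doc("Number to scale")],
    factor: A[int, Doc("Multiplier applied to value")] = 2,
) -> int:
    """Multiply value by factor."""
    return value * factor


# Syntactic overhead of the wrapper itself, excluding the description text.
plain = "int"
wrapped = 'A[int, Doc("")]'
overhead = len(wrapped) - len(plain)
print(overhead)
```

The count only covers the wrapper characters; whether that is offset by not repeating parameter names in a docstring depends on the code base.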

Anyway, I hope I’m not starting the debate all over again. I’m actually happy if Doc lives in a third-party library. I just wanted to share my perspective again as someone who works a lot with docstring styles (and grows to enjoy them less and less). I hope we can continue working and discussing towards a standard solution even after this PEP is withdrawn or rejected.

Jelle · August 12, 2024, 3:54pm

It already does, as typing_extensions.Doc, and you are free to use it. We’ll probably keep it in typing_extensions indefinitely even if the PEP gets withdrawn or rejected, for backwards compatibility reasons.

You are free to use it in your own code using the typing-extensions version. If usage of typing_extensions.Doc becomes widespread, that will be a good argument for accepting the PEP and putting it in the standard library.
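As a rough illustration of what using it looks like today, the sketch below imports `Doc` from typing_extensions and falls back to a small stand-in class when the installed version does not provide it. The fallback and the `greet` function are assumptions made for the example.

```python
from typing import Annotated, get_type_hints

try:
    from typing_extensions import Doc  # available in recent typing-extensions releases
except ImportError:

    class Doc:
        """Fallback stand-in (assumption) exposing the same documentation attribute."""

        def __init__(self, documentation: str, /) -> None:
            self.documentation = documentation


def greet(name: Annotated[str, Doc("Name of the person to greet")]) -> str:
    """Return a greeting."""
    return f"Hello, {name}"


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(greet, include_extras=True)
name_doc = hints["name"].__metadata__[0].documentation
print(name_doc)
```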

pawamoy · August 12, 2024, 4:11pm

Ah that’s good to know, thanks!

blhsing · September 19, 2024, 9:33am

Michael H:

def some_function(
    some_parameter: SomeType @ "Some documentation goes here"
) -> SomeReturn @ "Some details about this return type":
    """ Documenting the function itself here """

If @ is deemed too cryptic by many, then how about -?

- can be intuitively read as em dash, used in places where a set of parentheses or a colon might otherwise be used:

def some_function(
    some_parameter: SomeType - "Some documentation goes here",
) -> SomeReturn - "Some details about this return type":
    """ Documenting the function itself here """

And as a bonus - is both a binary operator and a unary operator so there’s the potential to allow a parameter to be annotated without a type:

def some_function(
    some_parameter: SomeType - "Some documentation goes here",
    **kwargs: - "Additional keyword arguments"
) -> SomeReturn - "Some details about this return type":
    """ Documenting the function itself here """

Some might argue that em dash is really a long dash, usually represented in ASCII as two dashes, which we can also consider as an alternative for better aesthetics:

def some_function(
    some_parameter: SomeType -- "Some documentation goes here",
    **kwargs: -- "Additional keyword arguments"
) -> SomeReturn -- "Some details about this return type":
    """ Documenting the function itself here """

This would make the syntax more in line with SQL comments.

Nineteendo · September 19, 2024, 9:50am

Doesn’t that cause problems?

>>> string = "foo"
>>> -string
Doc("foo")

Surely it would be better to write this?

def some_function(
    **kwargs: Any - "Additional keyword arguments"
) -> None:
    ...

blhsing · September 19, 2024, 9:52am

I don’t think it’s a problem because it currently produces a TypeError. There’s otherwise no intuitive use for it as a unary operator for a string anyway.

Yes, but my point is just so we can keep type annotation optional.
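For reference, the current behaviour of unary minus on a string mentioned above can be checked directly. The `negate` helper is only there to capture the outcome for display.

```python
def negate(value):
    """Apply unary minus and report the outcome as a string."""
    try:
        return repr(-value)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = negate("foo")
print(result)
```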

mikeshardmind · September 19, 2024, 12:32pm

Ben Hsing:

def some_function(
    some_parameter: SomeType - "Some documentation goes here",
) -> SomeReturn - "Some details about this return type":
    """ Documenting the function itself here """
And as a bonus - is both a binary operator and a unary operator so there’s the potential to allow a parameter to be annotated without a type:

Even if this were worth pursuing, and I don’t think it is, @ was used in that hypothetical because there is no conceivable future need for matmul on types. Using - is problematic due to the potential for difference types in the future (i.e. Iterable[str] - str, meaning any iterable of strings that isn’t just a string), paired with strings being used for forward references in typing.

Keep in mind that plenty of other objections to this idea were brought up in this discussion that apply regardless of what the syntax would actually be.

blhsing · September 19, 2024, 1:08pm

Good point, so type - 'string' is problematic because 'string' can be used as a forward reference of a type, but I think type -- 'string' can still work because - 'string' can be made a Doc('string'), which type.__sub__ can special-case and make Annotated[type, Doc('string')].
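One way to see how the existing grammar already reads `type -- 'string'` is to parse it. This sketch only inspects the syntax tree; it does not implement the proposed behaviour, and `SomeType` is just a placeholder name.

```python
import ast

tree = ast.parse("SomeType -- 'Some documentation'", mode="eval")
expr = tree.body

# The expression is a binary subtraction whose right operand is a unary minus.
is_binary_sub = isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Sub)
is_unary_minus = isinstance(expr.right, ast.UnaryOp) and isinstance(expr.right.op, ast.USub)
print(is_binary_sub, is_unary_minus)
print(ast.dump(expr))
```

So no new token would be needed, but since plain `str` has no `__neg__` today, the runtime side would still require the special-casing described above.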

Yes, the syntax just occurred to me and I thought I’d share it here just in case the other arguments against this idea get resolved later.

EDIT: Actually, I think I may have just resolved what I believe to be the biggest argument against this idea, namely that the original proposal makes the annotation too verbose, to the point that it adversely affects readability.

So instead of:

def some_function(
    some_parameter: Annotated[SomeType, Doc("Some documentation goes here")],
    **kwargs: Annotated[Any, Doc("Additional keyword arguments")]
) -> Annotated[SomeReturn, Doc("Some details about this return type")]:
    """ Documenting the function itself here """

It can now be:

def some_function(
    some_parameter: SomeType -- "Some documentation goes here",
    **kwargs: Any -- "Additional keyword arguments"
) -> SomeReturn -- "Some details about this return type":
    """ Documenting the function itself here """

which looks a lot less verbose and more readable to me.

Perhaps it’s enough to nudge this proposal back on track for a reconsideration?

DanCardin · September 20, 2024, 1:40pm

Just as a refresher of the arguments against: I read the first ~60 comments, and most of the arguments were against the syntax and the social pressure.

I continue to think that the primary benefit of this PEP isn’t even in the syntax, so much as it is in the standardization of any location at all for this information to exist for runtime inspection.

The existence of a standard location for the information enables better syntax/bikeshedding in some future PEP. It enables some standard post-processor for “attribute docstrings” so that you don’t need to traverse the AST to obtain it. It enables a post-processor for numpy/google docstring parsers to put that information into the standard location.

The point being that it becomes easier for libraries/tools that need runtime inspection of the values to “just” inspect the annotations for Doc instances, rather than 5 different mutually incompatible options.
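A minimal sketch of what "just inspect the annotations for Doc instances" could look like. The `Doc` class is a local stand-in, and `collect_docs` and `connect` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the PEP's Doc (assumption)."""

    documentation: str


def collect_docs(func) -> dict:
    """Map each annotated name (including 'return') to its Doc text."""
    docs = {}
    for name, hint in get_type_hints(func, include_extras=True).items():
        # Annotated aliases carry their extra arguments in __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Doc):
                docs[name] = meta.documentation
    return docs


def connect(
    host: Annotated[str, Doc("Server hostname")],
    port: Annotated[int, Doc("TCP port")] = 5432,
    timeout: float = 10.0,
) -> Annotated[bool, Doc("True when the connection succeeded")]:
    """Pretend to open a connection."""
    return True


docs = collect_docs(connect)
print(docs)
```

A docstring parser for an existing style could, in principle, populate the same mapping, which is the kind of convergence the post describes.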

rsdenijs · September 21, 2024, 9:12am

That and the potential to reuse docstrings. Nothing is more demotivating than having to document the same variable in 20 places over the code base.

Melendowski · September 22, 2024, 4:23am

I’m actually struggling with this in a current project, to the point that I have to copy paste between functions, for fear of not being consistent. Having to type it once would be of great use.
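One way this "type it once" reuse could look is a shared annotated alias. This is a sketch under assumptions: `Doc` is a local stand-in, and `Axis`, `total` and `largest` are made-up names.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the PEP's Doc (assumption)."""

    documentation: str


# The description is written once and travels with the alias.
Axis = Annotated[int, Doc("0 reduces over rows, 1 reduces over columns")]


def total(values, axis: Axis = 0):
    """Sum a rectangular list of lists along the given axis."""
    if axis == 0:
        return [sum(col) for col in zip(*values)]
    return [sum(row) for row in values]


def largest(values, axis: Axis = 0):
    """Take the maximum of a rectangular list of lists along the given axis."""
    if axis == 0:
        return [max(col) for col in zip(*values)]
    return [max(row) for row in values]


total_doc = get_type_hints(total, include_extras=True)["axis"].__metadata__[0]
largest_doc = get_type_hints(largest, include_extras=True)["axis"].__metadata__[0]
print(total_doc.documentation == largest_doc.documentation)
```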

bwoodsend · September 22, 2024, 12:39pm

I’d be cautious of using this to deduplicate parameter descriptions. I’ve had description deduplication working before and immediately regretted doing it. Reading an API reference that’s full of duplication has all the caveats of reading code that’s full of duplication, i.e. it’s hard to find what you’re looking for past all of the repetitive boilerplate that you’ve read a gazillion times already.

I found it a lot better to just write a canonical description once then rely on cross references. Then all you have to duplicate is the fairly immutable string see :func:`xyz` .

Melendowski · September 26, 2024, 3:22am

Eh, this seems like such an overgeneralization without having the example source code.

Take pandas for example, you know how many methods between a series and data frame have the inplace or axis argument? Would it not be more jarring for there to exist the logically equivalent but differently written description for the same argument in two methods? For what reason of course, because two different developers wrote the two different methods?

Afaik pandas handles a lot of this by having runtime docstring manipulations through decorators.
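The general technique of decorator-based docstring substitution can be sketched as follows. This is not pandas’ actual implementation; `with_shared_docs` and the two functions are hypothetical and only show the idea of filling a shared description into several docstrings at import time.

```python
def with_shared_docs(**shared):
    """Fill {placeholders} in the decorated function's docstring (hypothetical helper)."""

    def decorator(func):
        if func.__doc__:  # docstrings are None when Python runs with -OO
            func.__doc__ = func.__doc__.format(**shared)
        return func

    return decorator


INPLACE = "inplace: If True, modify the list in place and return None."


@with_shared_docs(inplace=INPLACE)
def sort_values(data, inplace=False):
    """Sort the values in ascending order.

    Parameters:
        data: List of comparable values.
        {inplace}
    """
    if inplace:
        data.sort()
        return None
    return sorted(data)


@with_shared_docs(inplace=INPLACE)
def drop_missing(data, inplace=False):
    """Remove None entries.

    Parameters:
        data: List that may contain None.
        {inplace}
    """
    kept = [item for item in data if item is not None]
    if inplace:
        data[:] = kept
        return None
    return kept


print(sort_values.__doc__)
```

The trade-off raised in this thread applies here too: the source shows only the placeholder, so the full text is visible only at runtime or in rendered docs.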

bwoodsend · September 26, 2024, 8:24am

What meaningful piece of information would you put in the description for a dataframe parameter that’s applicable to every time it’s used and isn’t already implied by the type hint? “A pandas dataframe”? “The dataframe to process”? Unless there’s some unique property a specific function’s dataframe requires, I’d say any description for that parameter is more noisy boilerplate than signal.

My bad, only now do I see you’re talking about a dataframe method rather than a parameter.

And to be honest, to me the runtime docstring manipulations that a lot of the scientific packages do are already leaning towards the wrong side of what I’m describing. 100% of the help for numpy.sum() for example is just generic numpy.

bwoodsend · September 26, 2024, 8:31am

I suppose I don’t consider it impossible for deduplicated descriptions to be useful. Merely that I think it’s more likely to be abused in the name of do not repeat yourself and leave us with an ecosystem made up of vague, boilerplatey API references.

mikeshardmind · September 26, 2024, 12:32pm

I don’t consider deduplicating descriptions by making the code less useful to developers a good tradeoff. (I’m not saying that pandas replacements at runtime are this, but I think we already have clear examples of how noisy Doc(...) gets in diffs and merge conflicts, as well as of several hundred lines of just parameters for a single function.)

Documentation tools already have a way to handle repeated notes, warnings, and descriptions without needing to repeat them literally, by referencing them so that they are inserted where needed. If you want your documentation to improve, the best way is to take the time to set up documentation tooling and write more of it yourself. Trying to assemble more of it programmatically while avoiding writing any surrounding documentation produces results that readers will always notice were stitched together, and documentation generated this way with no adjustments has always felt worse to read and reference.

I’m aware this means that users need to view rendered documentation; I think that’s acceptable. Libraries have gotten significantly more complex with time, and we have the tools to not only render documentation and host it, but also to make downloadable versions of that rendered documentation available, as well as ensure that users can locally build it themselves.

EpicWink · September 26, 2024, 11:18pm

I disagree, I think Python code should always be fully understandable from its source, including docstrings. Anytime I see bespoke documentation includes [1] in docstrings, I get frustrated and my flow is ruined.

I would hope that deduplication can be argued against, and taught to be avoided, but used in specific cases where the benefits outweigh the downsides, as determined by experience. I of course have not been on the receiving end of spam PRs.

Also, I would expect basically no users download offline documentation for Python libraries, and even fewer would build the docs from source.

[1] e.g. Sphinx’s directive .. literalinclude:: FILE

JPHutchins · September 29, 2024, 11:14pm

Because this came up over and over again, I must 2nd this as well as add my point of view. Having taken a look at the FastAPI signature oft pointed to, I find it far more readable, in all contexts, than the old unstandardized docstring param list alternatives.

The docstring approach requires a need to jump around the source code whereas this new style keeps the documentation tightly coupled. Arguments against this tight coupling remind me of over-abstracted code bases where the proliferation of tiny functions requires the reader to jump about the text, all while maintaining a “mental call stack”.

Both docs approaches lead to good auto generated docs - though docstrings would be prone to error whenever a signature changes - but generated docs or other “readable” docs should never be necessary. This PEP improves the source code readability, lowers the total LOC, makes linting docs easier, eases maintenance, and reduces surface area for documentation-caused bugs.

Obviously, function signature readability is a matter of opinion, therefore I am offering a strong one. The source code is the documentation, and this PEP improves the source code.
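To illustrate the "makes linting docs easier" claim, here is a minimal sketch of a check that lists parameters lacking a Doc annotation. `Doc` is a local stand-in, and `undocumented_params` and `resize` are hypothetical; a real linter would more likely work on the AST than on imported functions.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the PEP's Doc (assumption)."""

    documentation: str


def undocumented_params(func) -> list:
    """Return the names of parameters that carry no Doc metadata."""
    hints = get_type_hints(func, include_extras=True)
    missing = []
    for name in inspect.signature(func).parameters:
        # Unannotated or plainly annotated parameters have no __metadata__.
        metadata = getattr(hints.get(name), "__metadata__", ())
        if not any(isinstance(meta, Doc) for meta in metadata):
            missing.append(name)
    return missing


def resize(
    width: Annotated[int, Doc("Target width in pixels")],
    height: int,
    keep_ratio=True,
) -> None:
    """Pretend to resize an image."""


missing = undocumented_params(resize)
print(missing)
```

Because the description sits on the parameter itself, renaming or removing a parameter cannot leave a stale entry behind, which is the drift risk mentioned above for docstring parameter lists.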

ajoino · September 30, 2024, 9:12am

Is this really the case if you put the Doc(...) element on a separate line, which I presume is what would happen? Then you’d get

def foo(
        ...
        bar: Annotated[
            Bar,
            Doc("Bar to be foo'd"),
        ],
        ...
    ):
    ...

compared to

def foo(
        ...
        bar: Bar,
        ...
    ):
    """
    ...
    bar: Bar to be foo'd
    ...
    """
    ...

Both of these have 6 lines not counting the ellipses, so it’s not clear to me that this proposal would reduce total LOC.

JPHutchins · September 30, 2024, 8:13pm

That’s a good point. Here’s what I’d expect, in a normal case:

def foo(
    ...
    bar: Annotated[Bar, Doc("A short description of this param")],
    ...
) -> None:
    """Docstring begin...

vs

def foo(
    ...
    bar: Bar,
    ...
) -> None:
    """Docstring begin...

    Args:
       bar (Bar): A short description of this param

So 6 LOC (+ remaining docstring) for new style and 9 LOC (+ remaining docstring) for Google style. The extra 3 lines are 1) newline before Args: section, 2) Args: line, 3) doc line. It becomes more of a wash if the argument docstrings are long.

Worst case scenario at scale seems to be +1 LOC for every function parameter.

JP

PS: As much as I value explicit defs and behavior, I’d probably do from typing import Annotated as A or similar if this became standard.

def foo(
    ...
    bar: A[Bar, Doc("A short description of this param")],
    ...
) -> None:
    """Docstring begin...

Or untyped Doc annotation option:

def foo(
    ...
    bar: Doc["A short description of this param"],
    ...
) -> None:
    """Docstring begin...

Or a typed Doc annotation option with the doc as 2nd positional:

def foo(
    ...
    bar: TDoc[Bar, "A short description of this param"],
    ...
) -> None:
    """Docstring begin...