PEP 727: Documentation Metadata in Typing

davidhalter · September 9, 2023, 9:36am

As a maintainer of Jedi I was asked to provide feedback here. Thanks for reaching out!

I generally agree that it would be nice to have documentation for params in a more structured
way. There are a few things that could use improvement:

I feel like there should be some sort of typing.get_doc() call for annotations at runtime. It’s just very normal that one is able to extract information from the typing module. Unlike PyCharm/mkdocs/VSCode, Jedi isn’t just a static analyzer, it can also be used within IPython/Jupyter and extract runtime information.

>>> def f(a: int): ...
...
>>> a = inspect.signature(f).parameters['a']
>>> a.annotation
<class 'int'>
>>> typing.get_doc(a.annotation)
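
For illustration, a runtime lookup along these lines could be sketched as follows. Note that `typing.get_doc()` does not exist in the standard library; both `get_doc` and the `Doc` class below are hypothetical names defined locally, and the sketch assumes the documentation is stored as `Annotated` metadata.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the documentation marker discussed in the PEP."""
    documentation: str


def get_doc(annotation: object) -> Optional[str]:
    """Hypothetical helper: return the text of the last Doc in an Annotated hint."""
    if get_origin(annotation) is not Annotated:
        return None
    # get_args() returns (underlying_type, *metadata) for Annotated hints.
    docs = [m for m in get_args(annotation)[1:] if isinstance(m, Doc)]
    return docs[-1].documentation if docs else None


def f(a: Annotated[int, Doc("The number of retries")], b: int): ...


params = inspect.signature(f).parameters
print(get_doc(params["a"].annotation))  # documentation attached to `a`
print(get_doc(params["b"].annotation))  # plain `int` carries no documentation
```

A tool running inside IPython/Jupyter could call a helper like this on live objects, without parsing any source code.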

Like others have pointed out: It feels a bit wrong to use Annotated for this. If this is really a Python stdlib-provided feature, I would probably prefer something like Doc[str, "The name of the user"], which is also shorter and more concise.

def foo(
    x: Doc[str, "Lorem Ipsum"],
    y: int,
) -> Doc[bool, "Lorem Ipsum Dolor"]:
    ...

I feel like that’s way more readable and for the tools like Mypy/Jedi/VSCode/etc, this is not a lot of extra work.

pf_moore · September 9, 2023, 11:45am

This feels like the key thing here. If I can indulge in a little history, originally annotations had no defined semantics, and they were explicitly for people to use for whatever they wanted, to encourage experimentation. Typing was always one anticipated purpose, but not the only one. But nobody really did much with annotations - there were a few experiments, but not much mainstream use.

So then the decision was made that annotations were for typing - it was always an intended use case, and no-one else had come up with other compelling uses, so let’s make the decision.

But once types became accepted, and common, people started finding other things that could be attached to variables, parameters, etc, and all those use cases that had never come up before started appearing. And so we have Annotated, marking things as deprecated, and discussions like this, about how we can cram non-typing annotations into the type system.

Maybe what we need to do is take a step back, and think about how we can make non-typing uses of annotations sit more comfortably with type annotations? Not with Annotated, which frankly feels like a hack (and an unappealing, clumsy one at that…), but with some form of first-class way of allowing other use cases to grab a part of the annotation space for themselves, separately from typing.

Something like Doc works a bit like this, although it’s still not obvious how I’d use it to attach a docstring to something I didn’t want to declare a type for.

I guess what I’m really saying is that Annotated doesn’t really work, because it’s based on a presumption that anything that’s not a type is a “second class citizen”. And what we need is to re-think Annotated and produce something that’s less biased towards the “everyone uses types” mindset that tends to prevail in the typing community (for obvious and understandable reasons).

methane · September 9, 2023, 1:18pm

My current opinions:

Documentation is not a type. It would be nice if docstrings were available without the typing module.
It would be really nice if docstrings were available both at runtime and statically (AST or CST).
I am worried that annotations and Annotated are overused. I’d like annotations to be just type hints.

So I prefer one of these approaches:

Add new syntax for function argument docstring.
- e.g. def (a "comment a" : int, b "comment for b" : str)
Add some formalized text structure for function docstrings.
- At least C# has this, although I don’t like XML.
- PHPDoc looks like almost standard.

stereobutter · September 9, 2023, 1:37pm

I’d also like new syntax for this, maybe

def add(a: int ("an integer"), b: int ("another integer")) -> int ("result"):
    return a + b

brettcannon · September 11, 2023, 11:47pm

Those are parsed as function calls and you can’t avoid that due to runtime introspection potentially wanting to use function calls to produce some object that represents the type.
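
This can be checked with the standard `ast` module. The short sketch below parses the signature suggested above and shows that each annotation is already a valid call expression under today's grammar, which is why the syntax cannot be given a new meaning.

```python
import ast

source = (
    'def add(a: int ("an integer"), b: int ("another integer")) '
    '-> int ("result"): ...'
)
func = ast.parse(source).body[0]

# Collect the two parameter annotations and the return annotation.
annotation_nodes = [arg.annotation for arg in func.args.args] + [func.returns]

for node in annotation_nodes:
    # Every node is an ast.Call: the parser reads `int ("...")` as calling int.
    print(type(node).__name__, ast.unparse(node))
```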

UltimateLobster · September 12, 2023, 3:35am

I like the idea of having documentation closer to the definition of the field. It’s easier when you have both of them in the same place.

I also like the idea of having a standardized convention of defining documentation per attribute and thus a standardized way to introspect these.

However, I have a few problems with the proposed syntax:

It feels awkward to use the Annotated feature for this. The need to import something from typing in order to achieve this does not seem fun.
Documentation may be quite long; writing it in the field definition will mostly “force” you to write the annotation over multiple lines (assuming you want to adhere to PEP8).
It may disturb the ease of reading the actual definition.

While I assume it would be a harder implementation to write, may I suggest an alternative like expanding the current __doc__ feature to work in other contexts?

Something that will (roughly) look like so:

def foo():
    """
    Currently, this documentation will be automatically set on foo.__doc__
    """

class Bar:
    a: int
    """
    The new feature allows this documentation to be automatically set on Bar.a.__doc__
    """

ntessore · September 12, 2023, 8:04am

Similar to the class attribute docstrings, “K&R style” declarations could be nice, as they separate the runtime and “static” (type, docstring) information, while being backward compatible. (It would still need a new place to store parameter docstrings, e.g. a new Documented[] in __annotations__.)

def frobnicate(widget, param, value):
    """Set widget's parameter to value."""

    widget: BaseWidget
    """The widget to frob."""

    param: str
    """The parameter to frob."""

    value: int
    """Parameter value to set."""

Jelle · September 18, 2023, 11:41pm

I released typing-extensions 4.8.0 yesterday with support for typing_extensions.Doc: typing-extensions · PyPI

hugovk · September 19, 2023, 5:37am

I’ve updated this PEP 727 example to use typing-extensions:

pawamoy · September 21, 2023, 7:27pm

Collapsing because it's a bit off-topic:

I find @ntessore’s suggestion very interesting.

it makes signatures ultra short and readable
typing information is immediately available at the beginning of the function body
parameters documentation as well

That doesn’t solve the case for documenting return values, or other common things like exceptions, warnings or deprecations. But it makes me wonder if this suggestion could be expanded a bit more:

def frobnicate(widget, param, value):
    """Set widget's parameter to value."""

    widget: BaseWidget
    """The widget to frob."""

    param: str
    """The parameter to frob."""

    value: int
    """Parameter value to set."""

    ...

    warnings.warn(
        "The `value` parameter is deprecated and will be removed in a future version",
        DeprecationWarning,
    )
    """value: When the `value` parameter is used."""
    # This docstring is here to document the warning.
    # The deprecation is detected thanks to DeprecationWarning,
    # and `value:` at the beginning lets the analysis tools know
    # that the subject of the deprecation is the `value` parameter.
    # `frobnicate:` instead would target the function itself.

    if condition:
        raise exceptions.CustomError("message")
        """When a certain condition is met."""

    ...

    return foo(bar)
    """optional_name: A transfooed bar."""

As much as I like it, it’s a static-only solution: none of these docstrings can be picked up at runtime.

adriangb · September 28, 2023, 4:47am

Writing up my thoughts on this after thinking about it for a bit and reading this discussion.

You can always use type aliases:

Users = Annotated[list[User], doc("A paginated list of users")]

def foo(users: Users): ...

As pointed out above this is great for reusability:

Consider the case of APIs that accept an API key or similar, where that type gets used in multiple endpoints. What Annotated lets you do is form something like:

APIKey = Annotated[
    str,
    FromHeader("x-api-key"),  # web framework metadata
    StringConstraint(pattern=r"\w{32}"),  # data validation metadata
    doc("The user's API key"),  # the doc stuff discussed in this PR
]

As long as the various tools can understand each other, the web framework can also use the doc() part and the StringConstraint() part to generate its JSON schema. I often find this beneficial, if nothing else, to give gross large types a meaningful name and to clean up the function declaration.
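
A minimal sketch of that idea, under stated assumptions: `FromHeader`, `StringConstraint`, `Doc` and `json_schema` below are hypothetical stand-ins defined locally, not the API of any real framework. Each consumer reads the metadata it understands from the alias and skips the rest.

```python
from dataclasses import dataclass
from typing import Annotated, get_args


# Hypothetical stand-ins for metadata classes owned by different tools.
@dataclass(frozen=True)
class FromHeader:
    name: str


@dataclass(frozen=True)
class StringConstraint:
    pattern: str


@dataclass(frozen=True)
class Doc:
    documentation: str


APIKey = Annotated[
    str,
    FromHeader("x-api-key"),
    StringConstraint(pattern=r"\w{32}"),
    Doc("The user's API key"),
]


def json_schema(alias: object) -> dict[str, str]:
    """Build a tiny schema fragment from the metadata this function understands."""
    base, *metadata = get_args(alias)
    schema = {"type": "string" if base is str else "object"}
    for item in metadata:
        if isinstance(item, StringConstraint):
            schema["pattern"] = item.pattern
        elif isinstance(item, Doc):
            schema["description"] = item.documentation
        # Unknown metadata such as FromHeader is skipped, not rejected.
    return schema


print(json_schema(APIKey))
```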

I do recognize that it can be strange to have this information before the type is used in a function signature. But it’s not like docstrings were any closer to the function parameter (see comment above about scrolling back and forth). The situation is still not all that grave: unlike docstrings, you can click on the : Users part and be taken to the definition, be it 2 lines above or in a completely different file. In fact sometimes you want to move that somewhere else, like in the case of the APIKey type I showed above.

Nonetheless, I do agree that as it exists right now Annotated is way too verbose. Especially with the import. I wish there was a way to make it a builtin or we could use some valid but otherwise unused syntax to avoid having to type out Annotated all over. I don’t see that as a reason not to use it, rather to the contrary: if it becomes popular and is used a lot we just need to figure out a way to make it less verbose to use.

Regarding standardizing this via a PEP: I empathize with both sides of the argument. I think the answer to this is to experiment as a 3rd party library first but get good buy-in from the ecosystem at the same time. I feel that the ecosystem can fall into this rut of chicken and egg: no one implements anything until it’s “official” but we can’t make it official until there’s extensive usage in the wild. I won’t sit here and say that IDE and tooling developers should all just put in more work to implement experimental proposals like this, but I will tip my hat to the folks that do, like pyright, typing-extensions, mkdocstrings/Griffe and others.

What to me would be the ideal solution (which was somewhat mentioned above) would be to preserve docstrings added to variables and parameters, thus allowing examples like this to work at runtime:

APIKey: Annotated[str, ...]  # or not using Annotated, doesn't matter
"""The API key for the user"""

class Foo:
    key1: APIKey
    key2: APIKey
    """Overridden doc for APIKey"""
    key3: str
    """A brand new doc"""

def foo(
    key1: APIKey,
    key2: APIKey
    """Overridden doc for APIKey""",
    key3: str
    """A brand new doc""",
):
    """A docstring for the function, without documentation for the parameter"""

The class version and free variable version pretty much work and IDEs support them, there’s just no information at runtime so FastAPI, Pydantic, etc. can’t use them. The function version would require syntax changes. I think this option is better overall but harder to implement since it really does need buy in for syntax changes before it can be viable and adopted by IDEs and other tooling. So maybe doc() is a good starting point to build towards this and explore uses of Annotated.
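
To make the static/runtime gap concrete, here is a small sketch of how a static tool might read the class version today using the standard `ast` module. The helper name `attribute_docs` is hypothetical, and the "string literal directly after an annotated attribute" rule is a convention some tools follow, not a language feature.

```python
import ast
import textwrap

source = textwrap.dedent('''
    class Foo:
        key1: str
        key2: str
        """Overridden doc for APIKey"""
        key3: str
        """A brand new doc"""
''')


def attribute_docs(class_node: ast.ClassDef) -> dict[str, str]:
    """Map annotated attribute names to the string literal that directly follows."""
    docs = {}
    body = class_node.body
    for stmt, following in zip(body, body[1:]):
        is_attribute = isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
        is_string = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if is_attribute and is_string:
            docs[stmt.target.id] = following.value.value
    return docs


class_node = ast.parse(source).body[0]
print(attribute_docs(class_node))

# At runtime the strings are evaluated and discarded: only the types survive.
namespace = {}
exec(source, namespace)
print(namespace["Foo"].__annotations__)
```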

JacobHayes · October 2, 2023, 11:58pm

I prototyped a library last year called sigdoc that implemented something very similar to this, but as runtime __doc__ generation (and obviously no static analysis/generation support).

sigdoc uses separate P(arameter) and R(eturn) types within the Annotated sections. Both support a type_hint= arg (to override very verbose runtime resolved hints) and P supports a default= arg (to tidy str representations or describe dynamic/conditional defaults). The class/function is then decorated with @document, which stitches together the main __doc__ + the Annotated metadata into a new __doc__. @document accepts a style argument that determines what format to output (numpydoc, etc). I never got around to adding a Raises annotation.
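
The general "stitching" approach can be sketched roughly as below. This is not sigdoc's actual API: the `Doc` class, the `document` decorator and the output layout are simplified, hypothetical stand-ins meant only to show how `__doc__` could be rebuilt from `Annotated` metadata at runtime.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the documentation marker."""
    documentation: str


def document(func):
    """Hypothetical decorator: append a Parameters section built from Doc metadata."""
    hints = get_type_hints(func, include_extras=True)
    lines = []
    for name, hint in hints.items():
        if name == "return" or get_origin(hint) is not Annotated:
            continue
        base, *metadata = get_args(hint)
        for item in metadata:
            if isinstance(item, Doc):
                type_name = getattr(base, "__name__", repr(base))
                lines.append(f"{name} ({type_name}): {item.documentation}")
    if lines:
        summary = inspect.cleandoc(func.__doc__ or "")
        indented = ("    " + line for line in lines)
        func.__doc__ = "\n".join([summary, "", "Parameters:", *indented])
    return func


@document
def greet(
    name: Annotated[str, Doc("The name of the user")],
    times: Annotated[int, Doc("How many times to greet")] = 1,
) -> str:
    """Return a greeting."""
    return ", ".join([f"Hello {name}"] * times)


print(greet.__doc__)
```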

With the standardization in this PEP, I think sigdoc could either be greatly simplified (to only do the __doc__ generation) or, even better, made obsolete.

I think the current PEP’s decision to punt on additional metadata or standardizing a docstring style is reasonable.

In this PEP, should __doc__, help, or other places docstrings are rendered at runtime include the doc(...) info, as they would normally be with “traditional” docstrings?

That would probably require picking a standard/default docstring style to render and generally be a bit too magical if it were to update __doc__. Though, it might ease:

using doc() with older tools that are unaware of it (probably not worth any constraints on new tools’ ability to render it how they like though)
use in libraries without changing how users inspect/debug (though again, help already includes func sig)

This might bring more complexity than benefit long term.

I think this was partially mentioned above, but doc(...) could be powerful with ParamSpec and Concatenate to allow documentation for even dynamically added/modified parameters. This ability to add/remove would be one other advantage over parsing from standardized (but still static) docstrings.

This could get tricky in the unlikely event this PEP does any of the __doc__ manipulation I mentioned above.

Liz · October 3, 2023, 12:22am

I hope this never becomes the norm for documentation. ParamSpec and Concatenate are for wrapping other functions. Generating something that abstractly says it wraps another function, instead of the author of the wrapper documenting the purpose of the wrapping, seems like documentation becoming for machines and not for people.

JacobHayes · October 3, 2023, 1:15am

I’m not sure where I implied this should just generate something abstract. Isn’t the “author of the wrapper” the one who creates the specific ParamSpecs - and thus able to document what they want?

If you’re using ParamSpec to:

add a parameter like a lock, why shouldn’t the new param be documented for users?
remove a parameter that was written by hand in the """docstring""", then the docstring is now wrong

In other words, this would allow easily documenting ParamSpec params for the user.

Liz · October 3, 2023, 1:29am

I’m sorry if I misunderstood, but the use you described, seemingly in support of this, is definitely stitching together documentation.

This is the behavior I never want to become the norm. It’s not useful to a human reader.

I’m not. ParamSpec is useful for handling user-provided callbacks with arguments and (for the user) ensuring they match. It doesn’t do much more than that; it’s very limited. I use it (and Concatenate) with a decorator pattern for route handling. It doesn’t make sense for me to document anything by type here: I don’t know what the user-provided type is. If I did, or if I was enforcing one, I’d use a protocol, not a ParamSpec. ParamSpec doesn’t make sense to use entirely internally, as it provides worse checks in the case that you know the args and kwargs already.

The best documentation I can add for such a decorator is “This decorator inspects the type hints of the provided function to generate an IPC route and register this function as handling it. The first argument must be of type: IPCContext and will be injected prior to the ipc route arguments being handled” The typing on this just warns the user if they didn’t have a parameter for IPCContext.

mikeshardmind · October 3, 2023, 1:45am

ParamSpec doesn’t do this, as was pointed out already. If you’re looking for this, you actually want one of a few other proposals, and good documentation could be linked to in each relevant function.

There’s the proposed ability to use typing.Unpack on typing.TypedDict for kwargs. Then if you have a bunch of functions with the same kwargs and purpose, the TypedDict can represent them and have an appropriate docstring which is local to the kwargs. This can be done without the proposed typing.Doc but is also just broadly useful even for more expressive typing.

Or to extract and re-use kwargs from functions which is explicitly about direct re-use, and may provide a good solution for that problem.

tiangolo · October 3, 2023, 7:59pm

Thanks everyone for all the feedback and discussion. Quick update:

I updated the title to: “Documentation in Annotated Metadata”
I updated the PEP to use a single class Doc(), no function doc(). It’s already implemented and released this way in typing_extensions.
I removed the section about “Additional Scenarios” as it seems it was creating more confusion than help.
I made it more explicit that “support” comes in at least two ways, editing and rendering.
I added explicit information about using type aliases and re-using documentation strings via type aliases.
I tried to clarify what Annotated documents, and how it would be overridden by wrapping type aliases in Annotated.
I made the argument to Doc() positional only.
I added a simpler bullet point list of the features this would address that are not addressed by current docstring conventions.
I added a simple note to suggest to render the docstring first and the parameter documentation later, or to allow configuring.
I updated several sections based on feedback from several in the discussion.
I updated the rejected ideas.
I added a short survey of how other languages document these things.
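
The item about type aliases and overriding can be illustrated with a short sketch. `Doc` here is a local stand-in with a single `documentation` attribute, and treating the last (outermost) `Doc` as the effective one is an assumption made for this example. What is standard behavior is that nested `Annotated` types are flattened, with the innermost metadata first.

```python
from dataclasses import dataclass
from typing import Annotated, get_args


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the documentation marker."""
    documentation: str


Username = Annotated[str, Doc("A unique name for the user")]

# Wrapping the alias again appends metadata; nested Annotated types are flattened.
AdminName = Annotated[Username, Doc("The name of the administrator")]

base, *metadata = get_args(AdminName)
print(base, [m.documentation for m in metadata])

# Assumption for this sketch: the last (outermost) Doc is the one a tool would show.
effective = [m for m in metadata if isinstance(m, Doc)][-1]
print(effective.documentation)
```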

Now I’ll start using it in FastAPI, Typer, SQLModel, Asyncer, and that will serve as a “lab experiment” to try this proposal in the real world.

ali · October 3, 2023, 9:29pm

Thank you for the effort you put into this.
I would like to share my concerns with this proposal. While I share the concerns about readability, the effect on end-users, and social pressure as many have already pointed out, I want to add two more points that affect my typical workflow.

In my daily workflow, both at work and in hobby projects, I tend to use an IDE (VSCode + Pylance) to hover over code to see both type annotations and function documentation (docstring rendering). When I need to see the implementation of a function, I click on the “go to source” shortcut and the first thing I do is fold (hide) the docstring and jump directly into the code. I am afraid that if this proposal becomes standard, it will very much complicate this use case, as documentation and code become tangled and it becomes impossible to hide one without hiding the other.

My other concern is that by using Annotated and Doc from the typing module for documentation instead of the built-in docstring syntax, even the simplest function would now require importing the typing module. Previous PEPs like PEP 585 and PEP 604, and recently PEP 695, are huge steps towards a better integration of typing into the core language. They brought improvements to the typing ecosystem in that they alleviate the need to import typing in the common case, thus making typing more “natural” for the common Python user. This PEP is, in my opinion, a step backwards in this regard. If this becomes standard, I am imagining Annotated and Doc will become the new “most imported symbols from typing”.

Finally, I think that the fact that none of the popular languages mentioned in the survey section in the PEP uses a similar construct should maybe be an indicator that this may be not the best approach to tackle this issue.

ssweber · October 3, 2023, 9:54pm

How does this work in relation to typing stubs? If griffe (@pawamoy) and IDEs would pick up this Doc() information from a stub file, this could provide excellent separation of code/documentation, while alleviating some of the concerns about readability.

JacobHayes · October 4, 2023, 2:08am

How should runtime rendering/inspection of __doc__ or help(...) handle Docs - if at all? Eg: what are the impacts of runtime debugging or inspection for users if a library uses Doc?

Modifying __doc__ is likely too magical (and might conflict w/ other runtime tools wanting to parse Doc). help(...) at least includes the full function signature, so maybe that is enough?

I think Alex mentioned something similar above (PEP 727: Documentation Metadata in Typing, post #81 by AlexWaygood).

Other Tangents

Yes, sigdoc takes the human written '''docstring''' description and formats in the Parameters and Returns sections (using your selected doc style) using the human written documentation Annotated in-line with the function signature. Stitching yes, but none of the “abstract” or “this wrapper wraps X” added to the docstring that I understood your comment to imply. Everything is written by humans except section headers while users see the exact same docstring (except they’re more likely to be in sync after refactors).

Also, to clarify - I currently do not use sigdoc with ParamSpec, just to document normal funcs/classes. I only mentioned it as an interesting idea.

I’m not saying to document the passed in function’s existing params, but new or removed params from Concatenate. Admittedly I haven’t used ParamSpec much yet, so happy to defer to your and Michael’s use there.

Sorry for missing that, I didn’t and don’t see that in this thread - do you mind linking so I can see the context/limitations?

As far as I could tell from the ParamSpec PEP (def add(x: Callable[P, int]) -> Callable[Concatenate[str, P], bool]:) and some local messing around, it seemed like I could add (positional) args to a function with ParamSpec (and a mypy reveal_type showed the hints for the new + original params). If I’m understanding the draft PEP right, this is even mentioned in there:

Parameter documentation inheritance for functions captured by ParamSpec