PEP 727: Documentation Metadata in Typing

Sorry, what project is that? There’s strawberry on PyPI, but that hasn’t had an update since 2011. There’s also strawberry-graphql and StrawberryFields, but I’m not sure if either of those is being referred to.

From the PEP

firstname: Annotated[str | None, doc("The user's **first name**")] = None,

I imagine that this is meant to read “Optional user name” or “User’s first name, if they have one” or something like this?

After all, what’s annotated is either str or None

This actually brings me to a more specific concern. Sorry if this is a bit long:

I feel that the proposal is a little weak with these inlined examples; however, it gets stronger if the types are hoisted to the module level and reused, for example:

Surname = Annotated[str, doc("Person's family name, canonical spelling in native script. BOM is not allowed")]

async def register_birth(..., surname: Surname): ...

async def register_name_change(..., old: Surname, new: Surname): ...

async def register_historical_record(..., surname: Annotated[Surname | None, doc("Optional family name, if known")]): ...

What are the semantics of applying doc(...) again? Is the doc annotation replaced, or somehow merged?

Edit: I guess my exact case is covered by “nested annotations”, but what if the type is the same, e.g.:

async def register_foreign_document(..., surname: Annotated[Surname, doc("restricted to Latin script")]): ...
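For what it’s worth, nested Annotated gets flattened at runtime with the innermost metadata first, so a tool could resolve repeated doc() markers like this (a sketch using typing_extensions.Doc, since that exists today; the “outermost wins” rule is just my guess, not something the PEP specifies):

from typing import Annotated, get_type_hints
from typing_extensions import Doc

Surname = Annotated[str, Doc("Person's family name")]

def register_foreign_document(surname: Annotated[Surname, Doc("restricted to Latin script")]): ...

hints = get_type_hints(register_foreign_document, include_extras=True)
docs = [m.documentation for m in hints["surname"].__metadata__ if isinstance(m, Doc)]
print(docs[-1])  # "restricted to Latin script", outermost wins under my assumed rule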
2 Likes

I don’t like this idea of blending type hints and docstrings. It’s not elegant at all.

Docstring generation is simple using GitHub Copilot or Codeium. It is easy to keep the docstring in sync with new code. What’s more, we can choose the style of the docstrings with those AI tools.

I think this is a solved problem, and it doesn’t need a new typing solution or standard, thanks.

5 Likes

I personally have mixed feelings about Annotated, and don’t like the mixing of type and other metadata, but annotations aren’t (technically[1]) typing-specific.

I can also understand this needing some standardization. Although Annotated is explicitly for arbitrary metadata and there’s nothing stopping a library from adding doc(...) right now, the usage would be limited to cases like typer where the metadata would appear in things like CLI usage strings. Without some interoperability standard it doesn’t make sense for development tools to add support for doc(...) in all the other places docstrings get used.

What stands out to me is that there is some difference between annotations-as-types and annotations as other stuff. I feel the same way about PEP 702. Why would doc or deprecated go in the typing module? [2] It could just be me, but it would be nice to have some clarification of this in general.


  1. An annotation might be a type hint but it could be any arbitrary data, right? ↩︎

  2. …which makes me wonder about even Annotated being in the typing module… Not relevant since that’s already shipped. ↩︎

1 Like

I have already answered in email. Just this much: I would much prefer a convention which relies on doc-strings and uses e.g. @-markers or perhaps f-string {markers} to let documentation tools know that I’m talking about a parameter from the function/method signature.

Example:


def create_user(lastname: str, firstname: str | None = None) -> User:
    """
    Create a new user in the system, it needs the database connection to be already
    initialized.

    @lastname and @firstname are used to initialize the user record.

    Returns a @User instance.
    """
    pass

Here, the tool would see that @lastname, @firstname and @User are being referenced, check the annotations in the signature for details on types, optionality, default values, etc., and then go ahead with giving me a nicely formatted help screen – well, at least in theory :slight_smile:

Even without such a tool, a human would easily recognize the parameters in plain text and make correct associations.
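For illustration, a rough sketch of the cross-referencing step such a tool could do (the helper name and regex are entirely my own):

import inspect
import re

def documented_params(func):
    """Yield (name, Parameter) pairs for every @name referenced in the docstring."""
    doc = inspect.getdoc(func) or ""
    sig = inspect.signature(func)
    for name in re.findall(r"@(\w+)", doc):
        if name in sig.parameters:
            yield name, sig.parameters[name]

documented_params(create_user) would then yield lastname and firstname together with their annotations and defaults, ready to be formatted into a help screen.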

Some other comments (YMMV):

In my experience, documentation is more concise when it provides a more holistic view of what’s happening. Documentation of individual parameters isn’t always a good way to achieve this, since parameters are often used in combination, rather than on their own.

I also regard the source code doc-strings as information for fellow maintainers of the software, not really for the end user (API user). It’s good to be terse, to the point, and to only include what’s needed, under the premise that the reader already has a lot of context.

For the end user (API user), it’s better to have a hand written version of the documentation, similar to what we have in the Python stdlib docs.

15 Likes

I have great respect for your work with FastAPI and Typer, the latter of which I’m an avid user of, and you’ve inspired me to use type annotations for run-time use in my own code. However, I’m not a fan of this proposal for reasons that have already been brought up (mostly that it adds information in one of the most information dense parts of a function definition). It mirrors my least favorite part of Typer, that the help text goes right next to the parameters in the function header (not saying it’s a bad design choice, I honestly can’t think of any other way to do it). I think it’s ok in Typer, even if it makes working on documentation harder IMO, but I would not want to see that on the regular in other code.

For those wanting to discuss how the stdlib could be used to get the same information out of docstrings, I started a new topic. I see such an API as orthogonal to PEP 727, which would codify the existing approach to documentation.

2 Likes

Thanks for writing this PEP! :slight_smile:

I haven’t double-checked that I’m not repeating something that someone else pointed out already. Apologies if I’m repeating something that’s already been discussed/mentioned. I am commenting on the PEP as written, having done a cursory read of the discussion here.


I want to push back on this statement, albeit because :sparkles: nuances :sparkles:.[1]

pyproject.toml as a whole wasn’t “yet another standard”. It came from a place of needing to determine build-time dependencies generically without needing code execution, a problem for which we had no existing solution.

PEP 518 has an extensive rationale for why a novel approach and new file was proposed for it. PEP 517 has a similarly long rationale for enabling an ecosystem of build-backends, rather than forcing everyone to use setuptools which had proven difficult to evolve.

(I’m likely biased, since packaging changes tend toward really long rationales for design choices, but I was surprised at how short the rationale section is in this PEP :sweat_smile:)

The piece within this that I agree on being “one more way to do things” is the [tool] table within pyproject.toml. It originated as “let’s keep build configuration and allow build tool-specific configuration too”, and it evolved into “let’s allow all tool-specific configuration” since “what is a build tool” is an arbitrary and somewhat useless line of argument to have. I don’t think a similar line of argument applies here.

(and yes, with the power of hindsight, I do wish a better job was done of setting community expectations around this new file and the tool table. :sweat_smile:)

To use the parallel drawn, even though no one outlawed tool-specific configuration files, or non-TOML configuration files, nearly every popular development tool has had its users request, argue for, or contribute pyproject.toml [tool]-based configuration support. The expectation that having this standard would not mean library/tooling authors get pushed to adopt it is at odds with that experience. It’s not a 1:1 situation, obviously, but I think some of the learnings transfer.

PS: The above is all based off of memory, so I might be wrong about some detail on this. It’s been 4-7 years, and I was still a curious teen back when the initial discussions were taking place. Please correct me if I got something wrong here!


I’ll echo this sentiment.

This was my first reaction while reading this PEP as well as the initial discussion here. I think introducing one-more-way, especially as standard syntax/library code, ought to cover more thoroughly why improving existing solutions isn’t a viable option or, at least, have stronger reasoning for that choice than what’s provided. For me, I’d set the bar at “we tried and it’s not feasible because (list of socio-technical reasons)”, but I’m also cognisant that not everyone might think it needs to be that “high”.

To draw (again!) on the referenced parallel of pyproject.toml, the rationale for both PEP 517 and PEP 518 extensively discusses why the options are being considered over the existing solutions. Alternative approaches were considered and tried over the course of years before we got to the point of discussing using a new file for conveying that information.


At the cost of being reductive, the main selling points of this proposal (to me, anyway) seem to be programmatic access to the docstrings and the ability to reuse the arguments.

However, you need to either execute or pseudo-execute the module (i.e. implement all the type system “magic” of handling certain ifs, try-excepts etc., or at least keep logical track of aliases while respecting namespacing) to actually extract accurate type alias information. Compare that to using docstrings, which allows all the information to be fetched directly from an AST, without needing to fully execute the module or emulate any of that execution.

Both of these, of course, have a restriction: dynamic logic doesn’t work with either static-analysis approach. Arguably, there’s a tiny amount of dynamism allowed under the type-system-based model, in exchange for significant complexity.

While Sphinx’s built-in autodoc executes code and could adapt to this with some tractable reworking, the AST approach is how sphinx-autoapi works. (IIUC, mkdocstrings does this too – don’t quote me on that!) As currently written, this PEP would require AST-based API documentation generation tools to take on the complexity of extracting documentation information from types that might be aliased, where those aliases might also be conditionalized, since the type system supports that.
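To make the concern concrete, here is the kind of module an AST-only tool would struggle with (a sketch of my own; the branching condition is made up, and I’m using typing_extensions.Doc as the doc() marker):

import sys
from typing import Annotated
from typing_extensions import Doc

# The alias target depends on runtime state, so the documentation attached to
# `surname` can only be known by (pseudo-)executing the module, not from the AST alone.
if sys.platform == "win32":
    Surname = Annotated[str, Doc("Family name (cp1252-safe)")]
else:
    Surname = Annotated[str, Doc("Family name (any Unicode script)")]

def register_birth(surname: Surname) -> None: ...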

Realistically, IMO this means that AST-based documentation generation tools won’t be supporting this PEP’s proposed model (not fully, anyway).

OTOH, removing support for it (type aliases and/or all the conditional magic that comes with the type system) would mean that the only thing this provides is an alternative syntax that could live outside the standard library and is a strictly equivalent alternative to standardising on a docstring format ecosystem-wide. It’d drastically weaken the argument for adopting this model as the standard model.

This could be, of course, reduced by adding restrictions on how the TypeAlias is assigned and managed, but that’s a symptom that we’re not using the right abstraction model here IMO. :slight_smile:

While this isn’t a showstopper issue, it is certainly an argument against the proposed model IMO.

This actually leads me nicely into…


… The open issue about mixing documentation and typing. From the PEP:

It could be argued that this proposal extends the type of information that type annotations carry, the same way as PEP 702 extends them to include deprecation information.

Yes, but it’s not a strong argument IMO. :wink:

To use PEP 702: it discusses why the type system is a good vehicle to provide that information, type checkers are actually using this information, type checkers using that information results in useful behaviours, and the PEP includes references to prior art in other ecosystems showing that deprecation information leveraging the typing mechanisms isn’t a novel concept.

None of those are true for this PEP, IMO. To be clear: I’m not saying this PEP needs to do the same things or make the same arguments, but that it needs to provide a stronger rationale for its design choices (especially the socio-technical choice of not trying to settle/tackle the lack-of-standardisation problem).


Given that one of the motivating goals of this PEP is to provide a richer errors/IDE experience, with the motivation explicitly calling out the lack of ability to syntax-check in IDEs, it’s interesting that it doesn’t bless a markup format. By not picking one, we’d be punting the problem (invalid markup in the strings you have in Annotated) to a different layer. While “it’s unclear what to pick” is something that I agree with, I think the decision to not pick a markup format is a consequential one.

IMO, it’d be worth calling out in a more visible manner in the rationale (or rejected ideas, or wherever the authors deem appropriate). Even a position like “we don’t think it’s as important to have syntax corrections for the markup in those strings” would be fine here – it’s a judgement call, and I think the PEP makes the right one – but it seems like an important-enough detail for implementors and users to call out more visibly.


  1. I’ve once again failed xkcd: Duty Calls because this statement stuck out to me and it’s now 1:50am. ↩︎

9 Likes

I’ve seen more than one person mention that Annotated isn’t necessarily restricted to typing, and I feel that’s only half-true at best.

You mentioned one component of this – the fact that it comes from the typing namespace – but that’s mostly cosmetic. It could also be aliased into a new home if there were sufficient appetite for that.

The other thing is that Annotated requires a type annotation. So you can’t opt in to using a feature built on Annotated unless you also opt in to applying a type to whatever you’re annotating.


Is there no value in this being a tiny package on PyPI? Most of the participants in this thread could create such a thing in an hour or two – it hardly needs a complex test suite or other difficult infrastructure.

I’m not sure I follow why it should go into typing_extensions or typing, which seems to be a key part of the rollout plan here. A package can run ahead of the stdlib, and then transform into a rolling backport if it is pulled in.
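To be concrete, a minimal sketch of what such a package could contain (the class name and attribute are illustrative, not necessarily the PEP’s exact API):

# doc_metadata.py: a tiny, dependency-free doc() marker
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    """Carries per-parameter documentation inside Annotated[...] metadata."""
    documentation: str

def doc(documentation: str) -> Doc:
    """Convenience factory mirroring the PEP's doc(...) spelling."""
    return Doc(documentation)

Usage would be exactly as in the PEP’s examples, just with a different import.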


Regarding rst vs markdown… As an older millennial (I can’t say I’m a younger developer anymore! :sweat_smile:), markdown is way more comfortable to me. But I would still argue for rst, on the grounds that it is much better defined. Markdown is to rst as yaml is to toml.

If markdown is used, I think that also adds the burden of defining which flavor and subset of markdown is allowed. And I just can’t see all that effort as being worth it, in exchange for, mostly, single backticks rather than double.

4 Likes

Thanks for the PEP, @tiangolo!

I’ve used Annotated as a way of providing per-parameter documentation in the past, and I can see the attraction of standardising it so that third-party tools can understand this kind of pattern. However… I’m afraid I also just find the proposed syntax here much too verbose. It wouldn’t be something I’d consider using in my own code, unfortunately: I think readability of code is pretty important. I also think this sentence in the PEP isn’t entirely true:

Nevertheless, this verbosity would not affect end users as they would not see the internal code using typing.doc().

End users would very much be affected by the increased verbosity if they called help() on a function to get documentation in the interactive REPL (unless the plan is to have inspect.signature or pydoc strip the documentation from the function signature before displaying it in the output of help()).

Similar to Laurie’s comments in PEP 727: Documentation Metadata in Typing - #50 by EpicWink, I wonder if a better solution might be adding new special forms to typing? Even if you don’t like the idea (there are lots of disadvantages – see below), it might be worth discussing in the “rejected ideas” section. Perhaps you could have something like this, where Param and Attr are both treated identically to Annotated when it comes to type-checking (the metadata is completely ignored by type checkers), but at runtime, you could enforce that only a single metadata element is provided, which is a string, with the expectation that people provide a documentation string for tools like Sphinx to use:

from dataclasses import dataclass
from typing import Param, Attr

def foo(
    x: Param[int, "must be less than 5"],
    y: Param[str, "must be four or more characters"]
) -> None: ...


@dataclass
class Foo:
    x: Attr[str, "needs to be at least five characters"]
    y: Attr[int, "Must be less than 10"]

Pros of this alternative idea:

  • It’s less verbose; I find it much more elegant

Cons of this alternative idea:

  • We’d need to add new special forms to typing that type checkers would need to add support for, whereas the current proposal works “out of the box” with type checkers
  • Annotated[] is supposed to be the general-purpose way of adding metadata to annotations that’s irrelevant for type-checking; adding special forms for this purpose would be sort-of an implicit admission that Annotated is too unwieldy for some use cases
  • The specification would probably be more complex – how would these new special forms interact with Annotated[], or with get_type_hints?
2 Likes

As a maintainer of Jedi I was asked to provide feedback here. Thanks for reaching out!

I generally agree that it would be nice to have documentation for params in a more structured way. There are a few things that could use improvement:

  1. I feel like there should be some sort of typing.get_doc() call for annotations at runtime (a rough sketch of what I mean is at the end of this post). It’s just very normal that one is able to extract information from the typing module. Unlike PyCharm/mkdocs/VSCode, Jedi isn’t just a static analyzer; it can also be used within IPython/Jupyter to extract runtime information.
>>> def f(a: int): ...
...
>>> a = inspect.signature(f).parameters['a']
>>> a.annotation
<class 'int'>
>>> typing.get_doc(a.annotation)
  2. Like others have pointed out: It feels a bit wrong to use Annotated for this. If this is really a Python stdlib-provided feature, I would probably prefer something like Doc[str, "The name of the user"], which is also shorter and more concise.
def foo(
    x: Doc[str, "Lorem Ipsum"],
    y: int,
) -> Doc[bool, "Lorem Ipsum Dolor"]:
    ...

I feel like that’s way more readable, and for tools like Mypy/Jedi/VSCode/etc., this is not a lot of extra work.
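Coming back to point 1, a rough sketch of what a get_doc()-style helper could look like today, built on Annotated metadata and typing_extensions.Doc (the name and the “return None for plain types” behaviour are my assumptions, not part of the PEP):

from typing import Annotated, get_args, get_origin
from typing_extensions import Doc

def get_doc(annotation) -> str | None:
    """Return the documentation attached to an Annotated type, if any."""
    if get_origin(annotation) is Annotated:
        for meta in reversed(get_args(annotation)[1:]):
            if isinstance(meta, Doc):
                return meta.documentation
    return None

# get_doc(Annotated[str, Doc("The name of the user")]) -> "The name of the user"
# get_doc(int) -> None, matching the plain-int example above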

1 Like

This feels like the key thing here. If I can indulge in a little history, originally annotations had no defined semantics, and they were explicitly for people to use for whatever they wanted, to encourage experimentation. Typing was always one anticipated purpose, but not the only one. But nobody really did much with annotations - there were a few experiments, but not much mainstream use.

So then the decision was made that annotations were for typing - it was always an intended use case, and no-one else had come up with other compelling uses, so let’s make the decision.

But once types became accepted, and common, people started finding other things that could be attached to variables, parameters, etc, and all those use cases that had never come up before started appearing. And so we have Annotated, marking things as deprecated, and discussions like this, about how we can cram non-typing annotations into the type system.

Maybe what we need to do is take a step back, and think about how we can make non-typing uses of annotations sit more comfortably with type annotations? Not with Annotated, which frankly feels like a hack (and an unappealing, clumsy one at that…), but with some form of first-class way of allowing other use cases to grab a part of the annotation space for themselves, separately from typing.

Something like Doc works a bit like this, although it’s still not obvious how I’d use it to attach a docstring to something I didn’t want to declare a type for.

I guess what I’m really saying is that Annotated doesn’t really work, because it’s based on a presumption that anything that’s not a type is a “second class citizen”. And what we need is to re-think Annotated and produce something that’s less biased towards the “everyone uses types” mindset that tends to prevail in the typing community (for obvious and understandable reasons).

16 Likes

My current opinions:

  • Documentation is not a type. It would be nice if docstrings were available without the typing module.
  • It would be really nice if docstrings were available both at runtime and statically (AST or CST).
  • I worry that annotations and Annotated are being overused. I’d like annotations to be just type hints.

So I prefer one of these approaches:

  • Add new syntax for function argument docstring.
    • e.g. def (a "comment for a" : int, b "comment for b" : str)
  • Add some formalized text structure for function docstrings.
    • At least C# has this, although I don’t like XML.
    • PHPDoc looks like almost standard.
11 Likes

I’d also like new syntax for this, maybe

def add(a: int ("an integer"), b: int ("another integer")) -> int ("result"):
    return a + b

Those are parsed as function calls and you can’t avoid that due to runtime introspection potentially wanting to use function calls to produce some object that represents the type.
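A quick check with ast makes the ambiguity concrete (my own illustration):

import ast

tree = ast.parse('def add(a: int ("an integer")) -> int: ...')
annotation = tree.body[0].args.args[0].annotation
print(type(annotation).__name__)  # Call, indistinguishable from int("an integer")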

I like the idea of having documentation closer to the definition of the field. It’s easier when you have both of them in the same place.

I also like the idea of having a standardized convention of defining documentation per attribute and thus a standardized way to introspect these.

However, I have a few problems with the proposed syntax:

  1. It feels awkward to use the Annotated feature for this. The need to import something from typing in order to achieve this does not seem fun.

  2. Documentation may be quite long; writing it in the field definition will mostly “force” you to write the annotation over multiple lines (assuming you want to adhere to PEP 8). It may disturb the ease of reading the actual definition.

While I assume it would be a harder implementation to write, may I suggest an alternative like expanding the current __doc__ feature to work in other contexts?

Something that will (roughly) look like so:

def foo():
    """
    Currently, this documentation will be automatically set on foo.__doc__
    """

class Bar:
    a: int
    """
    The new feature allows this documentation to be automatically set on Bar.a.__doc__
    """
1 Like

Similar to the class attribute docstrings, “K&R style” declarations could be nice, as they separate the runtime and “static” (type, docstring) information, while being backward compatible. (It would still need a new place to store parameter docstrings, e.g. a new Documented[] in __annotations__.)

def frobnicate(widget, param, value):
    """Set widget's parameter to value."""

    widget: BaseWidget
    """The widget to frob."""

    param: str
    """The parameter to frob."""

    value: int
    """Parameter value to set."""
7 Likes

I released typing-extensions 4.8.0 yesterday with support for typing_extensions.Doc: typing-extensions · PyPI

2 Likes

I’ve updated this PEP 727 example to use typing-extensions:

3 Likes
Collapsing because it's a bit off-topic:

I find @ntessore’s suggestion very interesting.

  • it makes signatures ultra short and readable
  • typing information is immediately available at the beginning of the function body
  • parameters documentation as well

That doesn’t solve the case for documenting return values, or other common things like exceptions, warnings or deprecations. But it makes me wonder if this suggestion could be expanded a bit more:

def frobnicate(widget, param, value):
    """Set widget's parameter to value."""

    widget: BaseWidget
    """The widget to frob."""

    param: str
    """The parameter to frob."""

    value: int
    """Parameter value to set."""

    ...

    warnings.warn(
        "The `value` parameter is deprecated and will be removed in a future version",
        DeprecationWarning,
    )
    """value: When the `value` parameter is used."""
    # This docstring is here to document the warning.
    # The deprecation is detected thanks to DeprecationWarning,
    # and `value:` at the beginning lets the analysis tools know
    # that the subject of the deprecation is the `value` parameter.
    # `frobnicate:` instead would target the function itself.

    if condition:
        raise exceptions.CustomError("message")
        """When a certain condition is met."""

    ...

    return foo(bar)
    """optional_name: A transfooed bar."""

As much as I like it, it’s a static-only solution: none of these docstrings can be picked up at runtime.

1 Like

Writing up my thoughts on this after thinking about it for a bit and reading this discussion.

You can always use type aliases:

Users = Annotated[list[User], doc("A paginated list of users")]

def foo(users: Users): ...

As pointed out above this is great for reusability:

Consider the case of an API that accepts an API key or similar; that type gets used in multiple endpoints. What Annotated lets you do is form something like:

APIKey = Annotated[
    str,
    FromHeader("x-api-key"),  # web framework metadata
    StringConstraint(pattern=r"\w{32}"),  # data validation metadata
    doc("The user's API key"),  # the doc stuff discussed in this PR
]

As long as the various tools can understand each other, the web framework can also use the doc() part and the StringConstraint() part to generate its JSON schema. I often find this beneficial, if nothing else, to give gross large types a meaningful name and to clean up the function declaration.
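As a toy sketch of what “understand each other” could mean on the introspection side (describe() is made up; only the Doc part is inspected here, but a framework would pick out its own markers the same way):

from typing import Annotated, get_args, get_origin, get_type_hints
from typing_extensions import Doc

def describe(func):
    """Print each parameter's annotation plus any attached Doc strings."""
    for name, hint in get_type_hints(func, include_extras=True).items():
        metadata = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
        docs = [m.documentation for m in metadata if isinstance(m, Doc)]
        print(name, hint, docs)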

I do recognize that it can be strange to have this information before the type is used in a function signature. But it’s not like docstrings were any closer to the function parameter (see the comment above about scrolling back and forth). Thus the situation is still not all that grave: unlike docstrings, you can click on the : Users part and be taken to the definition, be it 2 lines above or in a completely different file. In fact, sometimes you want to move that somewhere else, like in the case of the APIKey type I showed above.

Nonetheless, I do agree that as it exists right now Annotated is way too verbose, especially with the import. I wish there were a way to make it a builtin, or that we could use some valid but otherwise unused syntax, to avoid having to type out Annotated all over. I don’t see that as a reason not to use it; rather, to the contrary: if it becomes popular and is used a lot, we just need to figure out a way to make it less verbose to use.

Regarding standardizing this via a PEP: I empathize with both sides of the argument. I think the answer is to experiment as a 3rd-party library first, but get good buy-in from the ecosystem at the same time. I feel that the ecosystem can fall into this rut of chicken and egg: no one implements anything until it’s “official”, but we can’t make it official until there’s extensive usage in the wild. I won’t sit here and say that IDE and tooling developers should all just put in more work to implement experimental proposals like this, but I will tip my hat to the folks that do, like pyright, typing-extensions, mkdocstrings/Griffe and others.

What to me would be the ideal solution (which was somewhat mentioned above) would be to preserve docstrings added to variables and parameters, thus allowing examples like this to work at runtime:

APIKey = Annotated[str, ...]  # or not using Annotated, doesn't matter
"""The API key for the user"""

class Foo:
    key1: APIKey
    key2: APIKey
    """Overridden doc for APIKey"""
    key3: str
    """A brand new doc"""

def foo(
    key1: APIKey,
    key2: APIKey
    """Overridden doc for APIKey""",
    key3: str
    """A brand new doc""",
):
    """A docstring for the function, without documentation for the parameter"""

The class version and free-variable version pretty much work and IDEs support them; there’s just no information at runtime, so FastAPI, Pydantic, etc. can’t use them. The function version would require syntax changes. I think this option is better overall but harder to implement, since it really does need buy-in for syntax changes before it can be viable and adopted by IDEs and other tooling. So maybe doc() is a good starting point to build towards this and explore uses of Annotated.

6 Likes