PEP 727: Documentation Metadata in Typing

Hello all!

I present to you PEP 727: Documentation Metadata in Typing.

This adds an optional way to document things in Annotated, particularly useful for documenting function parameters and similar, complementing docstrings, as in:

def hi(
  to: Annotated[str, Doc("The name of a user")],
) -> None: ...
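
As a rough illustration of what programmatic access to this metadata could look like, here is a minimal sketch. The `Doc` class below is a stand-in written for this example rather than the real implementation, and `param_docs` is a hypothetical helper name.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Doc:
    """Stand-in for the marker proposed by the PEP (an assumption for this sketch)."""

    def __init__(self, documentation: str) -> None:
        self.documentation = documentation


def hi(
    to: Annotated[str, Doc("The name of a user")],
) -> None: ...


def param_docs(func) -> dict:
    """Map each annotated name to the text of its first Doc marker, if any."""
    docs = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(func, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, *metadata); skip the base type.
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Doc):
                    docs[name] = meta.documentation
                    break
    return docs


print(param_docs(hi))
```

Names without a `Doc` marker (including the return annotation here) are simply left out of the result.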

Your feedback is very much appreciated.


I don’t feel that Annotated should be used by other typing functions but rather be for 3rd parties to add additional metadata, leaning on Annotated for this type of feature feels like a hack.

Furthermore I don’t see the appeal of this over a docstring, you are already adding this to your own code so it should be trivial to include this information inside the docstring instead.

This doesn’t feel like a feature which should be included in the stdlib.


While I also appreciate the motivations of the PEP when it comes to programmatic access (And thank you again for the clarification!), I think placing this in Annotated is not the correct place. Have other alternatives for additional syntax that does not place this inside of Annotated been considered?

Off the top of my head, there are a few things that do not currently have a meaning, and likely never will in annotations that could be used to put this into a metadata field that isn’t attached to the typing, but is attached to the parameter.

For example

def some_function(
    some_parameter: SomeType @ "Some documentation goes here"
) -> SomeReturn @ "Some details about this return type":
    """ Documenting the function itself here """

I believe placing this in the type system rather than having syntax specifically for per-parameter documentation in Annotations alongside the type system would be limiting and detrimental long term, and perhaps set a poor precedent for the standard library’s use of typing.
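
For context on the `@` spelling: it already parses today, because `@` is the matrix-multiplication operator. The short check below uses only the standard library and substitutes `int` for `SomeType`. It suggests the spelling is syntactically valid, but that evaluating it eagerly on ordinary classes fails, so it would need new semantics or deferred evaluation.

```python
import ast

source = 'def some_function(some_parameter: int @ "Some documentation goes here"): ...'

# The annotation parses as a binary operation using the matrix-multiply operator.
tree = ast.parse(source)
annotation = tree.body[0].args.args[0].annotation
print(type(annotation).__name__, type(annotation.op).__name__)

# Evaluated eagerly, plain classes do not implement the operator.
try:
    int @ "Some documentation goes here"
    eager_failed = False
except TypeError as exc:
    eager_failed = True
    print("TypeError:", exc)
```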

I also don’t see that much of a benefit over parsing existing docstring formats. If I use Sphinx, my tests can use Sphinx to pull the documentation out as Sphinx sees it (and so on), so I’m lukewarm on the idea of yet another way to spell the same thing, even if I can appreciate the motivations for it.


I hope we don’t end up with syntax like this, it seems very surprising. Annotated may not be the most elegant, but it is self-describing.

I’ll write up my thoughts on the PEP as the maintainer of Sphinx soon (after I’ve written them!)

A


I like the idea, because I feel it makes more sense to have parameter documentation closer to the parameters in larger APIs.

I don’t see this as another way to document parameters, as parameter documentation in the docstring is kind of a hack (I see the docstring as a way to document the function (etc.), not its parameters or return value).


typing.doc could be a subclass of str, reducing the new symbols introduced by the PEP (you get the “value” by taking the str of the instance). This is a minor optimisation, but I don’t see any drawback.
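
A minimal sketch of the `str`-subclass idea, assuming a hypothetical `doc` class defined here for illustration. It shows that the marker can still be told apart from plain string metadata, and also that it behaves like an ordinary string in comparisons.

```python
from typing import Annotated, get_args


class doc(str):
    """Hypothetical str subclass; the text is the string value itself."""


hint = Annotated[int, doc("Number of retries"), "unrelated string metadata"]

# isinstance still tells the marker apart from plain string metadata.
found = [m for m in get_args(hint)[1:] if isinstance(m, doc)]
print(str(found[0]))

# The flip side: the marker compares equal to an ordinary string.
print(found[0] == "Number of retries")
```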


The one alternate use I would like to see implementers implement is the union, as often the different types can mean a substantial difference in the functionality of the function.


There’s also another problem for the adoption of something like this in terms of developer ergonomics.

Prior to any indentation, and with just empty documentation:

parameter: Annotated[AGeneric[SomeType], doc("")]

This is already 49 characters wide, and I didn’t use some longer, real-world types. Part of the major existing benefit to parameter documentation being in the function docstring is it isn’t interspersed with code and vertical space use in a docstring is rather forgiving. Most parameters that aren’t actually self-documenting in the first file I opened in a project I maintain are multiple lines describing what is expected.

Expanding an annotation across multiple lines or past even modern screen widths like 120 characters messes with many things, including git diffs.


I’m fond of this style and have been using this style heavily in internal libraries. For classes that represent configs/schemas similar to pydantic models/protobuf messages it has been very helpful having each attribute have associated documentation.

Docstrings have formats, but it’s simpler to programmatically access Annotated with a simple string than to rely on careful documentation linting of “param: x”. I’m also hesitant about documenting parameters in the docstring, as it can feel like duplicating information. A lot of docstring formats had type information for parameters in them, and I prefer keeping the type defined once, where the type checker can use it, rather than having the type and its explanation live in separate places.

On the tooling side, it’s been on my backlog to explore something similar to a sphinx-autodoc-typehints-style extension that extracts Annotated docstrings and includes them in generated documentation pages.

I’m less sure about the standardization benefits. The biggest question for me is which tooling would be motivated to use this because it’s a PEP, compared with no PEP. The Rationale mentions this, but it would be helpful to have IDE maintainers and other documentation tool authors comment on whether they would want to support this if the PEP were accepted or became used in several popular libraries. Without a PEP, it seems reasonable for documentation generation to be handled by a small Sphinx plugin you maintain. Editor support may be less likely without a parameter documentation standard?

Another question: for libraries that do follow a docstring format strictly, how likely are they to be interested in moving to this? Can some of the migration from numpy/google/etc. docstrings → Annotated doc be automated, and would they want it?

Edit: one benefit of this style is that you can have aliases to reuse the same documentation for a common flag used by many functions/classes. For example, in TensorFlow each optimizer shares the name and learning rate arguments, and they all have the same meaning. With docstrings it’s not clear how to share documentation other than through runtime manipulation. With Annotated you can have:

LearningRateT = Annotated[float | LearningRateSchedule | Callable[...], "Learning rate for an optimizer. Can be a constant number or a function that describes how it changes over training progress."]

class MyOptimizer1:
  def __init__(self, learning_rate: LearningRateT):
    ...

class MyOptimizer2:
  def __init__(self, learning_rate: LearningRateT):
    ...

I have a file full of aliases like these that include the type and documentation (plus extra parameter Annotated metadata).
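
A small sketch of how a tool might recover the shared text from such an alias. The alias is simplified to `float` for this example, and `first_string_metadata` is a hypothetical helper, not part of any library.

```python
from typing import Annotated, get_args, get_type_hints

# Simplified alias for this sketch: the one in the post is a wider union.
LearningRateT = Annotated[float, "Learning rate for an optimizer."]


class MyOptimizer1:
    def __init__(self, learning_rate: LearningRateT):
        self.learning_rate = learning_rate


class MyOptimizer2:
    def __init__(self, learning_rate: LearningRateT):
        self.learning_rate = learning_rate


def first_string_metadata(func, name):
    """Return the first plain-string metadata attached to parameter `name`."""
    hint = get_type_hints(func, include_extras=True)[name]
    return next((m for m in get_args(hint)[1:] if isinstance(m, str)), None)


for cls in (MyOptimizer1, MyOptimizer2):
    print(cls.__name__, "->", first_string_metadata(cls.__init__, "learning_rate"))
```

Both classes report the same text because the documentation lives on the alias, not on either signature.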


Looking at the PR implementing the PEP (Add Doc from PEP 727: https://peps.python.org/pep-0727/ by tiangolo · Pull Request #277 · python/typing_extensions · GitHub), it does feel a little ugly that the doc() function doesn’t do anything other than create a DocInfo instance. Maybe we should combine the two?

I don’t like subclassing str though; subclassing builtins often tends to get confusing, and I don’t see much benefit. In the future we may want to add additional attributes to the DocInfo class (whatever it ends up getting called).


I think it makes sense to create a new package on PyPI (e.g. typing_doc) just providing the proposed doc right now. Implementers may (but don’t have to) use this for isinstance checks for parameter documentation as described in the PEP.

This would allow us to get a feel for the popularity of this idea without any Python version requirement and a lot faster. When the PEP gets accepted, this package would simply re-export doc from typing_extensions.


I don’t think the screen-width issue is much of a concern, as it’s basically the same width as a Google-style docstring line already. Personally, I would put the documentation on a separate line anyway, because otherwise it is too short to justify documenting each parameter individually.


Sphinx supports this syntax:

@dataclass
class Foo:
    """Docstring for class Foo."""

    baz = 2
    """Docstring for field baz."""

So why not make this available at runtime?

It wouldn’t help with function parameters, but as the PEP says, there are existing solutions to that.
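
For reference, a string placed after an assignment is evaluated and discarded at runtime, so tools recover it from the source instead. The sketch below shows one way to do that with the standard `ast` module. It is an illustration of the general approach, not a description of how Sphinx implements it, and `attribute_docstrings` is a hypothetical helper.

```python
import ast

SOURCE = '''
class Foo:
    """Docstring for class Foo."""

    baz = 2
    """Docstring for field baz."""

    qux: int = 3
'''


def attribute_docstrings(class_node: ast.ClassDef) -> dict:
    """Pair each assignment with a string literal that directly follows it."""
    docs = {}
    body = class_node.body
    for stmt, following in zip(body, body[1:]):
        # Only simple `name = value` or `name: type = value` statements count.
        if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
            target = stmt.targets[0]
        elif isinstance(stmt, ast.AnnAssign):
            target = stmt.target
        else:
            continue
        is_string = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if isinstance(target, ast.Name) and is_string:
            docs[target.id] = following.value.value
    return docs


class_node = ast.parse(SOURCE).body[0]
print(attribute_docstrings(class_node))
```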


It’s already possible to do

def hi(
  to: Annotated[str, "The name of a user"],
) -> None: ...

and from what I can tell, the meaning of the string literal here is up to the tooling; it would certainly make sense for it to interpret a string used to annotate the type as documentation. Unless you’re proposing that Pydoc should be able to scan for DocInfo instances and insert their wrapped strings into help results… somewhere/how?, I don’t see how this adds anything. By my reading, the PEP proposes no such thing.
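
One way to look at the trade-off between bare strings and a dedicated marker is what happens when more than one tool attaches metadata. In this sketch the `Doc` class is a stand-in defined for illustration, and `"max_length=50"` is a made-up string that some other hypothetical tool might attach.

```python
from typing import Annotated, get_args


class Doc:
    """Stand-in marker class, defined here only for illustration."""

    def __init__(self, documentation: str) -> None:
        self.documentation = documentation


# Two tools attach metadata to the same type: which part is documentation?
hint = Annotated[str, "max_length=50", Doc("The name of a user")]
metadata = get_args(hint)[1:]

# Treating every string as documentation picks up the unrelated string.
guessed = [m for m in metadata if isinstance(m, str)]
# Checking for a dedicated type leaves no room for guessing.
explicit = [m.documentation for m in metadata if isinstance(m, Doc)]

print(guessed)
print(explicit)
```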

(Quickly edited because I read a little more carefully. I agree that having a separately named function that simply creates instances of the class is generally pointless; why not just use the constructor?)


Because a) the value needs to be put somewhere (which can be bikeshedded quite a bit) and b) there needs to be a rule as to why the line baz = 2 is “special”. What happens to methods (and properties), for example? Should it be possible to give them a docstring after the end of the definition, independently of the usually placed docstring for a function? If not, why not? What about other code that appears within the class body (yes, all sorts of strange things are possible)?

This idea seems very cool and in hindsight I almost feel a bit surprised the Python community did not come up with something like this sooner. Two things:

  1. Is the doc function necessary? Couldn’t a doc param be added to Annotated? (I suppose you already thought of that, but I don’t catch why it couldn’t be feasible.)
  2. I am afraid function signatures will become really long. Of course, if that is the case, people should just use the regular docstring.
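
On the first question, one practical obstacle is that current Python syntax does not allow keyword arguments inside square brackets. The quick check below uses a made-up `doc=` keyword spelling purely to illustrate that point.

```python
# Subscription does not accept keyword arguments, so this fails to compile.
snippet = 'Annotated[str, doc="The name of a user"]'

try:
    compile(snippet, "<example>", "eval")
    accepted = True
except SyntaxError:
    accepted = False

print("keyword accepted in subscript:", accepted)
```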

If this PEP is accepted, this would be a recommended way to document parameters.
So this format would be used not only for methods that need parameter documentation at runtime, but for many ordinary methods as well. Am I right?

When I first read it, I was worried about the runtime cost of the documentation, but PEP 563 and PEP 649 will avoid it.
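
PEP 563 stores annotations as strings, so metadata objects are only built when a tool asks for resolved hints. The effect can be imitated on any recent Python version by writing the annotation as a string by hand, which is what this sketch does. The `Doc` class and the `created` list are illustrative assumptions.

```python
from typing import Annotated, get_type_hints

created = []  # records every Doc instance that actually gets built


class Doc:
    """Stand-in marker class that logs its own construction."""

    def __init__(self, documentation: str) -> None:
        self.documentation = documentation
        created.append(documentation)


# A string annotation is stored as-is; nothing inside it runs at definition time.
def greet(name: "Annotated[str, Doc('The name of a user')]") -> None: ...


count_after_definition = len(created)

# Resolving the hints evaluates the string and builds the Doc object.
hints = get_type_hints(greet, include_extras=True)
count_after_resolution = len(created)

print(count_after_definition, count_after_resolution)
```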

Apart from performance issues, I am concerned about the readability of the code.

(s: str, a: int) looks much cleaner than (s: Annotated[str, "doc of s"], a: Annotated[int, "doc of a"]). This difference would become bigger when the documentation is multiline.
I want to keep function signatures as simple and clean as possible.

So even if this PEP is accepted, I would prefer to write the argument descriptions in the function’s docstring, except when the documentation is needed at runtime.
I don’t want to use Annotated in every function. I want to use Annotated only if the function needs additional runtime metadata.

Overall, I’m +0 on doc() and DocInfo in typing_extensions, but -1 on discouraging argument documentation in function docstrings.


Hello, maintainer of mkdocstrings here, and more specifically Griffe, which provides parsers for the common Google-style and Numpydoc-style docstring standards.

I personally like this proposal and already implemented basic support for it in a Griffe extension. I even had fun imagining what an extended version of the proposal could provide.

There’s an issue with docstrings standards like Google-style and Numpydoc-style: parsing them is pretty fragile, because they are not specific enough. They actually don’t even have a specification at all: both have a style guide (Google-style, Numpydoc-style) but no specs, so it’s up to each tool to guess and decide how they should be parsed. In Griffe, the ambiguous aspects are configurable with options, like whether to allow blank lines in Numpydoc-style sections. I recently had to make the Google-style parser more strict (blank line required before section, no blank line allowed after section title) because it had false-positives. None of this is officially specified.

These two styles, Google and Numpydoc, also have divergent features and designs: the Google-style seems to use/recommend Markdown, while Numpydoc is based on rST and designed only to work within Sphinx. Numpydoc supports a “See also” section that should be parsed a certain way, while the Google-style does not. Numpydoc requires types to be added to Returns sections, forcing users to duplicate what’s already in the signature. Numpydoc supports multiple named items for Returns sections, while the Google-style does not. Many things are up to interpretation, which makes it hard to establish a common ground of features and data structures to store the information.

There’s also what I call the Sphinx-style (not sure it even has a name), using :param foo: Hello. to describe parameters for example. This style does not support most of the sections supported by Numpydoc and the Google-style: no yields, no receives, no warns, no kwargs. It also seems to be based on / tied to rST. Not that rST is a problem, but a style/standard that is tied to a particular markup cannot IMO be reliably used by different tooling. The only truly markup-agnostic style that I know of is the Google-style, and that’s one of the reasons I use it myself.

With type annotations in signatures, types are not required in docstrings anymore. It means that a pure docstring parser won’t provide all the necessary information to its consumers. It has to also run static/dynamic analysis to obtain types and default values from signatures (for example), or to put this responsibility on the consumer. It means tooling must constantly merge two sources of information: structured data obtained from static/dynamic analysis, and structured data obtained from parsing docstrings, and it’s sometimes hard to do right.
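
A minimal sketch of that merging step, using a deliberately naive regular expression for the `:param name: text` form. The `send` function is a simplified stand-in, `merged_parameters` is a hypothetical helper, and real parsers such as Griffe handle far more cases than this.

```python
import inspect
import re


def send(request: str, stream: bool = False) -> None:
    """Send a request.

    :param request: The prepared request to send.
    :param stream: Whether to stream the response body.
    """


def merged_parameters(func) -> list:
    """Combine signature data with descriptions parsed from the docstring."""
    # Source 1: free-form text, recovered with a deliberately naive pattern.
    described = dict(re.findall(r":param (\w+): (.+)", inspect.getdoc(func) or ""))
    # Source 2: structured data from the signature itself.
    merged = []
    for name, param in inspect.signature(func).parameters.items():
        merged.append(
            {
                "name": name,
                "type": param.annotation,
                "default": param.default,
                "description": described.get(name),
            }
        )
    return merged


for entry in merged_parameters(send):
    print(entry)
```

If a parameter is renamed in the signature but not in the docstring, its description silently becomes `None`, which is the kind of drift the two-source approach has to guard against.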

For these reasons, I like what PEP 727 proposes. It structures the information in the code itself (in annotations) so that we don’t have to design, write or parse custom syntaxes in docstrings. Docstrings can then use plain markup, chosen by maintainers (Markdown, AsciiDoc, etc.), in a consistent manner. If it is accepted, and later gets extended to support other kinds of information, it will also standardize what kind of information can be added to code, which I think is very important.

The PEP has one limitation I would like to mention: structuring information in the code won’t allow prose and sections to be intertwined in docstrings anymore (like parameters, then text, then exceptions, then text again, then returns section), though this was never really specified either, and it could be alleviated by tooling, for example with placeholders in docstrings that get replaced with rendered information.

I won’t comment on the verbosity aspects as I think the PEP addresses it, but will add that documentation generators can very well strip annotations of the surrounding Annotated when rendering docs. The data is there to collect, and can be hidden when exposed to users. Next version of mkdocstrings will do just that (I tried, it works) :)
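
For runtime tools, the standard library already offers this kind of stripping. The sketch below shows only the `typing.get_type_hints` behaviour; it is not meant to describe how mkdocstrings or Griffe do it.

```python
from typing import Annotated, get_type_hints


def hi(to: Annotated[str, "The name of a user"]) -> None: ...


# Default behaviour drops the Annotated wrapper, leaving the bare types.
print(get_type_hints(hi))
# include_extras=True keeps the metadata for tools that want it.
print(get_type_hints(hi, include_extras=True))
```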


Very nice project idea. I think it is definitely possible to automate the migration from docstrings to annotations. The “get data from docstrings” part is already done; now we need the “write data back to annotations” part. There are multiple libraries that can transform concrete syntax trees, so it shouldn’t be too hard?
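
To give a feel for the “write data back” half, here is a rough sketch that uses the standard `ast` module instead of a concrete-syntax-tree library. That swap matters: `ast.unparse` (Python 3.9+) discards comments and original formatting, so a real migration tool would likely want a CST library. The `migrate` helper, the `Doc` name in the output, and the naive `:param` pattern are all assumptions for illustration.

```python
import ast
import re

SOURCE = '''
def hi(to: str, times: int = 1) -> None:
    """Greet a user.

    :param to: The name of a user
    :param times: How many times to greet
    """
'''


def migrate(source: str) -> str:
    """Wrap annotated parameters in Annotated[..., Doc(...)] using :param text."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        docs = dict(re.findall(r":param (\w+): (.+)", ast.get_docstring(node) or ""))
        for arg in node.args.args:
            text = docs.get(arg.arg)
            if arg.annotation is None or text is None:
                continue  # nothing to wrap, or nothing to say
            doc_call = ast.Call(
                func=ast.Name(id="Doc", ctx=ast.Load()),
                args=[ast.Constant(value=text)],
                keywords=[],
            )
            arg.annotation = ast.Subscript(
                value=ast.Name(id="Annotated", ctx=ast.Load()),
                slice=ast.Tuple(elts=[arg.annotation, doc_call], ctx=ast.Load()),
                ctx=ast.Load(),
            )
    return ast.unparse(tree)


migrated = migrate(SOURCE)
print(migrated)
```

This sketch leaves the original `:param` lines in the docstring; removing them, and adding the needed imports to the migrated module, would be further steps.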

This is my biggest concern as well. Functions where the arguments have type annotations can already be rather long, and Annotated on its own is rather verbose, so I’m generally glad it’s rare. By making it the recommended way of documenting arguments, this makes use of Annotated much more common, and furthermore, an argument docstring will itself be pretty long, making the problem worse.

Combined with the long-standing pressure we get from people wanting to be able to write “one-liners” in Python, we’re reaching a point where a typical Python function will consist of a huge declaration, combined with a highly compact body. And I can’t think of anything less readable than this.

Here’s an example from a recent function I wrote:

def download_pip(
    version: Annotated[str, doc("The version of pip to download")],
    hash: Annotated[str, doc("The SHA256 hash of the wheel file that will be doenloaded")],
    target: Annotated[Path, doc("The path where the downloaded wheel should be stored. Any directories in the path must already exist, and it is the caller's responsibility to use a valid wheel filename")]
):
    url = wheel_url("pip", version, "", "py3", "none", "any")
    with urlopen(url) as f:
        content = f.read()
    if hashlib.sha256(content).hexdigest() != hash:
        raise ValueError("Downloaded file has invalid hash")
    target.write_bytes(content)

Try quickly finding the parameter types from that - I deliberately marked the code as “text” to demonstrate the issue if you don’t have some sort of syntax highlighting support. Also, the restrictions on what you can pass as the target argument are pushed way off the screen to the right (at least on my display) meaning that the docstring for the parameter isn’t readable without scrolling the page. And have fun noticing the spelling mistake in one of the docstrings!

The version that you get if you run black on this is even worse:

def download_pip(
    version: Annotated[str, doc("The version of pip to download")],
    hash: Annotated[
        str, doc("The SHA256 hash of the wheel file that will be doenloaded")
    ],
    target: Annotated[
        Path,
        doc(
            "The path where the downloaded wheel should be stored. Any directories in the path must already exist, and it is the caller's responsibility to use a valid wheel filename"
        ),
    ],
):
    url = wheel_url("pip", version, "", "py3", "none", "any")
    with urlopen(url) as f:
        content = f.read()
    if hashlib.sha256(content).hexdigest() != hash:
        raise ValueError("Downloaded file has invalid hash")
    target.write_bytes(content)

The actual code is almost entirely lost, it’s barely more than half the number of lines of the function header…

And all of this is just with 3 arguments, all with basic types. Here’s a function from pip, which I picked at random so can be considered more typical of “real world” code:

    def send(
        self,
        request: PreparedRequest,
        stream: bool = False,
        timeout: Optional[Union[float, Tuple[float, float]]] = None,
        verify: Union[bool, str] = True,
        cert: Optional[Union[str, Tuple[str, str]]] = None,
        proxies: Optional[Mapping[str, str]] = None,
    ) -> Response:

Add docstrings to that, if you dare :(

Of course, the counter-arguments are “this is optional, you don’t have to use it” and “it’s up to the user to ensure their code is readable”. But honestly, those responses are at best naïve, and at worst dismissive of a real issue. People do demand policies like “your arguments must have types” and “your code must have docstrings” because IDEs can give a much better user experience to the user of the code if that data is available. It’s important, therefore, to make sure that including such information is possible without making things worse for the developer of the code, so that we encourage good practices.

If this were being proposed as an interim solution that could be used until a more developer-friendly solution was devised, then I’d be less concerned. But I don’t think we’re even close here to a solution that I’d want to advocate as the “one obvious way” to document function parameters.

Dared!
def send(
    self,
    request: Annotated[
        PreparedRequest,
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ],
    stream: Annotated[
        bool,
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ] = False,
    timeout: Annotated[
        Optional[Union[float, Tuple[float, float]]],  
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ] = None,
    verify: Annotated[
        Union[bool, str],
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ] = True,
    cert: Annotated[
        Optional[Union[str, Tuple[str, str]]],
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ] = None,
    proxies: Annotated[
        Optional[Mapping[str, str]],
        doc(
            """Lorem ipsum dolor sit amet...
            
            ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
            Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
            """
        ),
    ] = None,
) -> Annotated[
    Response,
    doc(
        """Lorem ipsum dolor sit amet...
        
        ...consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
        Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
        Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
        """
    ),
]:
    ...

For functions with a lot more parameters, I think it does make the whole thing more readable. For example, in one of my projects, I provide callables for common CLI tools. The pytest callable currently has 82 parameters. That’s without extensions, just pytest. When reading the code, the parameter list is so long that I have to scroll anyway to look at the docstring. So I have to scroll back and forth between the signature and the docstring to check the type and the description for a given parameter. Being able to write the description right next to the parameter would improve the readability a lot. Now I understand this is a very rare case (I don’t hope to find more functions with 82 parameters in the wild), so this counter-argument does not have a lot of weight :)


Thanks for the input @Zomatree !

I don’t feel that Annotated should be used by other typing functions but rather be for 3rd parties to add additional metadata, leaning on Annotated for this type of feature feels like a hack.

As this is not altering the semantics of Annotated, I think it could be considered working in the same way as a third party library. The same way dataclasses are something that could perfectly be a third-party library (it was even inspired by one), but it makes sense to have a centralized option.

A side note: having a centralized option doesn’t mean enforcing it, the same way dataclasses are not enforced nor the only option, many people prefer to use attrs or pydantic.

If your concern is about using a space (Annotated) that should be somewhat reserved for third-party libraries that use Annotated, like FastAPI, Pydantic, Strawberry, Typer, SQLModel… we are the main ones asking for this, and we intend to support re-using this now (hopefully) “standardized” way to document parameters and types, instead of only relying on each of our custom solutions for this.

Furthermore I don’t see the appeal of this over a docstring, you are already adding this to your own code so it should be trivial to include this information inside the docstring instead.

I would encourage you to read the document. Most of the document is arguments for why this PEP would make sense.

In short, documenting parameters in docstrings has several disadvantages or gaps that this approach aims to address, including:

  • info duplication
  • info resynchronization, because of the duplication… which in turn becomes a type of manual code cache invalidation for developers maintaining those docstrings
  • editor support for editing this information
  • runtime access to the information
  • simplification for tooling implementers to extract the information (no custom syntax parser needed)
  • No new micro syntax for developers to learn (not everyone knows all these microsyntaxes, but everyone knows Python)

This doesn’t feel like a feature which should be included in the stdlib.

There are several reasons why it would belong in the standard library (again, I encourage you to read the PEP).

Some reasons:

  • Bring consensus and standardization, this would make it easier for developers to adopt it for new projects, and for implementers to decide to support it (the implementation would be minimal).
  • No need for external libraries for this, developers of new packages could use it without installing something external (if they decided to do so).

I feel that you have this point backward. Putting this feature in the stdlib should only be done if there is already consensus. It doesn’t bring consensus, it acknowledges it.

In this case, I don’t think there’s consensus in the overall Python developer community that this is the best way to document function parameters. Apart from anything else, it doesn’t offer a usable approach for developers who don’t want to add type annotations in the first place…
