PEP 727: Documentation Metadata in Typing

ofek · August 30, 2023, 5:42pm

I think this should be its own section in the PEP.

The effects of social pressure in this regard cannot be overstated, and they are something I’m very concerned about. Not just for myself, but for everyone else: once people feel this is now “the way”, over the course of a few years all code becomes more difficult to read, and the maintainers who choose not to adopt it will be strongly encouraged to conform.

It is nice to see Timothée here, since I use mkdocstrings in all of my personal and work projects! However, I will continue using Google-style docstrings because that looks the nicest to me, and I hope that style won’t be deprecated due to this PEP.

pawamoy · August 30, 2023, 5:51pm

Nice to see you here too Ofek

No, of course support for Google-style and Numpydoc-style docstrings won’t be removed, even if this PEP is accepted. I plan on using the PEP in one or two of my own projects to see how it goes, but I’ll definitely keep using Google-style docstrings in general.

mikeshardmind · August 30, 2023, 7:49pm

Eric Traut:

Confusion about what the doc string is documenting
In places, the text of the PEP seems to be confused about what the doc string is actually documenting. I think it should be made clear that a doc string annotation has meaning only when used in conjunction with the declaration of a symbol, and it documents the intended meaning and use of that symbol. There’s already a well-defined way to provide docstrings for classes, functions and modules. This PEP is proposing a standardized way to provide doc strings for other types of symbols: parameters, type aliases, class-scoped variables (class vars and instance vars), and local variables. Such documentation is useful because it can be presented to users when they hover over identifiers within their code. It can also be associated with that symbol for runtime introspection.

Sphinx actually has a convention for documenting other symbols as well (I could check whether the NumPy and Google styles have one too; I’m not sure from memory), so I’d like to be clear that this doesn’t need to exist in this specific form. It might be significantly easier to add support to multiple common IDEs for all common documentation conventions, with configuration settings, than to impose the friction of updating all existing code for this, including code whose tooling has no gap this would fill, due to the social pressure of a standard and of tooling uniting on that new standard.

#: doc
symbol: annotation = value

sirosen · August 30, 2023, 7:53pm

I think there’s a real use case here, but it seems that it only relates to typing in that annotations are a place where arbitrary metadata can be stored.

I’ve used click for years, documenting parameters, sometimes far away from their usage sites, with helptext. It has sometimes been useful to be able to pull that data dynamically (e.g. to test against helptext). A similar situation exists for dataclasses, attrs, and pydantic, as well as some other sophisticated cases like sqlalchemy.

Putting the documentation into an annotation may make backporting to older pythons easier, but it seems to me like it’s taking the language and language standards in a bad direction.

Click parameters are stored in a list attached to decorated functions. Could something similar be done here in a package to prove out some new space for documentation as generic metadata attached to classes?
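A minimal sketch of the click-style alternative described here, assuming a hypothetical `param_doc` decorator and a hypothetical `__param_docs__` attribute (neither is click's API nor any existing standard): each decorator application records the documentation for one parameter on the function object.

```python
import inspect
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def param_doc(name: str, text: str) -> Callable[[F], F]:
    """Hypothetical decorator: record documentation for one parameter."""

    def decorator(func: F) -> F:
        # Fail early if the documented name is not actually a parameter.
        if name not in inspect.signature(func).parameters:
            raise TypeError(f"{func.__name__} has no parameter {name!r}")
        docs = func.__dict__.setdefault("__param_docs__", {})
        docs[name] = text
        return func

    return decorator


@param_doc("x", "The value to scale.")
@param_doc("factor", "How much to scale x by.")
def scale(x: float, factor: float = 2.0) -> float:
    return x * factor


print(scale.__param_docs__)
```

Note the trade-off: the documentation is attached to the function rather than written at the parameter's declaration, and each parameter name has to be repeated as a string.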

BrenBarn · August 30, 2023, 8:30pm

+1000 to this. I’ve gotten to where I mostly just ignore typing-related discussions because I feel it’s spun so out of control there’s no real point in arguing against anything anymore. But the trend towards the kind of social pressure you mention is very worrisome, and every attempt to add another typing feature just seems to me like an increasingly huge tail wagging a tiny dog.

mdrissi · August 30, 2023, 9:05pm

I think the main thing that ties this to annotations, beyond Annotated already existing, is the wish to associate documentation with specific function arguments or fields. Adding a decorator that holds a list doesn’t connect the docs to the function arguments where they are declared. And how would a decorator apply to dataclass fields? Some of the motivating use cases here are not functions but attributes that need documentation too. These two classes,

class Foo:
    def __init__(self, x: int, y: str):
        ...

and

from dataclasses import dataclass

@dataclass  # Or pydantic model/sqlmodel/etc
class Foo:
    x: int
    y: str

Are two different common ways to define fields x and y. Both could use documentation associated with that exact field, not with any specific function. How would a decorator handle both cases? How would a decorator allow the documentation for x to be defined once and shared as an alias? I have a hundred or more classes like this, some written as dataclasses and some as classes with an explicit __init__, where I use Annotated to attach documentation metadata directly to the intended field or argument.
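A minimal sketch of that pattern, assuming a hypothetical `Doc` marker class as a stand-in for whatever object would eventually be standardized: the alias is defined once, reused both as a dataclass field annotation and as an `__init__` parameter annotation, and `typing.get_type_hints(..., include_extras=True)` recovers it in both cases.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a documentation marker."""
    text: str


# Defined once, reused wherever the field appears.
UserId = Annotated[int, Doc("Primary key of the user.")]


@dataclass
class Record:
    user_id: UserId


class Plain:
    def __init__(self, user_id: UserId) -> None:
        self.user_id = user_id


def docs_of(obj: object) -> dict[str, str]:
    """Collect Doc texts from the annotations of a class or function."""
    hints = get_type_hints(obj, include_extras=True)
    found: dict[str, str] = {}
    for name, hint in hints.items():
        # Non-Annotated hints (e.g. the None return type) have no __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Doc):
                found[name] = meta.text
    return found


print(docs_of(Record))
print(docs_of(Plain.__init__))
```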

At the moment, annotations are mostly used for typing. Annotated, although introduced by a typing PEP, was motivated as a way to define field/argument annotations for non-typing use cases, so this feels like a strong fit.

sirosen · August 30, 2023, 11:04pm

Even if the approach doesn’t port directly, I think click’s existence as a library which has solved this problem fairly well – and since python2 – suggests that there are solutions which are already available without doing it through type annotations.

What would the solution be if Annotated didn’t exist? How does it compare?

At the very least, a new dunder dedicated to this purpose could be added. Or Sphinx-style #: comments could be standardized, with some tooling to parse them out at runtime.
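A rough sketch of what such tooling could look like, using only the standard `ast` module. It assumes a simplified form of the Sphinx-style convention quoted earlier in the thread (one or more `#:` lines directly above a single-name assignment, with or without an annotation) and is not Sphinx's actual implementation; `hash_colon_docs` is a hypothetical name. At runtime, the source could come from something like `inspect.getsource(module)`.

```python
import ast


def hash_colon_docs(source: str) -> dict[str, str]:
    """Map assigned names to the '#:' comment lines directly above them."""
    lines = source.splitlines()
    docs: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        # Only single-target assignments to a plain name, annotated or not.
        if isinstance(node, ast.AnnAssign):
            target = node.target
        elif isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
        else:
            continue
        if not isinstance(target, ast.Name):
            continue
        # Walk upwards from the line before the assignment (lineno is 1-based).
        collected: list[str] = []
        index = node.lineno - 2
        while index >= 0 and lines[index].strip().startswith("#:"):
            collected.append(lines[index].strip()[2:].strip())
            index -= 1
        if collected:
            docs[target.id] = " ".join(reversed(collected))
    return docs


SOURCE = '''
class Config:
    #: Number of worker processes.
    workers: int = 4

    #: Seconds to wait before giving up.
    #: Use 0 to wait forever.
    timeout = 30.0

    debug = False
'''

print(hash_colon_docs(SOURCE))
```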

By starting the discussion from the assumption that the data should be baked into Annotated, it seems that some comparative analysis is skipped.

This prompted me to reread PEP 593, since I thought it carved out Annotated as a place explicitly for third parties to develop new ideas, not the stdlib. It turns out to be a lot less clear than that.

The phrasing doesn’t suppose that the stdlib will never use Annotated, but various details lead me to believe that the PEP authors intended Annotated to be purely or primarily for 3rd party extensions.

As far as I know this PEP would be the first time for the stdlib to make use of Annotated.

In 593, implementations are told to simply ignore Annotated contents which are not recognizable to them. That basically means that only one library can use Annotated at a time unless special care is taken to keep the usages fully compatible.

If the stdlib is going to start shipping things which are purpose-built to be put into Annotated, then I think that needs to be adjusted or at least not assumed.

How are libraries expected to deal with the stdlib functionality living alongside their own custom data? Should they iterate over the contents of the annotated metadata skipping any DocInfo objects?
Should a library be expected to filter the Annotated metadata to recognizable contents?

Another hint that this was not in people’s heads at the time of the PEP:

Namespacing annotations: Namespaces are not needed for annotations since the class used by the annotations acts as a namespace.

That seems to be very strongly predicated on the idea that Annotated only has one user at a time.

Maybe this is all a necessary evolution of Annotated to make it more usable and portable across the ecosystem of libraries, but I think there’s a bigger issue to be resolved in that case. Possibly an amendment to PEP 593 to stipulate a better way for libraries to process Annotated data which doesn’t match their known usages.
Or possibly language in this PEP stating that libraries should trim the metadata to __metadata__[1:] if isinstance(__metadata__[0], DocInfo).

Jelle · August 30, 2023, 11:25pm

It’s really the opposite. Consumers of annotated should iterate over the metadata and ignore any metadata that they don’t recognize. For example, with PEP 727, you could write

def my_function(
    x: Annotated[int, doc("this is x"), MyAwesomeMetadata(), SomeRandomValue()]
) -> None: pass

Now, a tool that is interested in the documentation for parameter x would iterate over the metadata from the Annotated with something like [data for data in anno.__metadata__ if isinstance(data, typing.DocInfo)], and simply ignore anything else. Similarly, a tool that is interested in something else might look only for instances of MyAwesomeMetadata.
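A runnable version of that consumer-side pattern. Since `typing.DocInfo` and `typing.doc` are only proposed, hypothetical stand-ins for both are defined here, together with placeholder `MyAwesomeMetadata` and `SomeRandomValue` classes; the point is that each tool filters the metadata tuple by type and ignores everything else.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class DocInfo:
    """Hypothetical stand-in for the proposed typing.DocInfo."""
    documentation: str


def doc(documentation: str) -> DocInfo:
    """Hypothetical stand-in for the proposed typing.doc()."""
    return DocInfo(documentation)


class MyAwesomeMetadata:
    """Placeholder metadata that some other library understands."""


class SomeRandomValue:
    """Placeholder metadata that no tool here understands."""


def my_function(
    x: Annotated[int, doc("this is x"), MyAwesomeMetadata(), SomeRandomValue()]
) -> None:
    pass


anno = get_type_hints(my_function, include_extras=True)["x"]

# A documentation tool keeps only DocInfo instances...
docs = [data for data in anno.__metadata__ if isinstance(data, DocInfo)]
# ...while another library keeps only the metadata it recognizes.
mine = [data for data in anno.__metadata__ if isinstance(data, MyAwesomeMetadata)]

print(docs)
print(len(mine))
```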

brettcannon · August 31, 2023, 12:28am

I think I might be the “teams behind VS Code” that Sebastián is referring to, since he emailed me. For the Python extension itself, at least, we have no opinion, as we don’t actually read anything out of Python code where this would be used. Maybe the Pylance team has an opinion, but that’s not my team. Otherwise, you would want to get feedback from Jedi, the other code-completion tool people use, which may provide inline docs.

EpicWink · August 31, 2023, 1:03am

Some alternate proposals which could inform a final design of this PEP:

I think this has been suggested years before, but not as a topic I could find: having a new Doc annotation:

def fn(
    x: Doc[int, "Documentation here"],
    y: Doc[Annotated[int, Custom()], "More docs"],
) -> Doc[int, "Returned thing"]:
    """Function docstring."""
    return x + y

Note this PEP would work on return types as well.
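One way such a `Doc[...]` form could be prototyped today, as a sketch: a hypothetical `Doc` class whose `__class_getitem__` simply returns `Annotated[tp, DocInfo(text)]`, so tools that already understand `Annotated` keep working. `Doc`, `DocInfo` and `Custom` are illustrative names, not an existing API, and static type checkers would not understand this runtime trick without further support.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_type_hints


@dataclass(frozen=True)
class DocInfo:
    """Hypothetical holder for a documentation string."""
    documentation: str


class Doc:
    """Hypothetical Doc[type, "text"] form, expanded into Annotated."""

    def __class_getitem__(cls, params: tuple[Any, str]) -> Any:
        tp, text = params
        return Annotated[tp, DocInfo(text)]


class Custom:
    """Placeholder for some other library's metadata."""


def fn(
    x: Doc[int, "Documentation here"],
    y: Doc[Annotated[int, Custom()], "More docs"],
) -> Doc[int, "Returned thing"]:
    """Function docstring."""
    return x + y


hints = get_type_hints(fn, include_extras=True)
# Nested Annotated is flattened, so y keeps Custom() and gains a DocInfo.
print(hints["y"].__metadata__)
print(hints["return"].__metadata__[0].documentation)
```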

Going full Sphinx (and I think PyCharm already supports this), with a string literal after the symbol:

def fn(
    x: int,
    """Documentation here."""

    y: Annotated[int, Custom()],
    """More docs."""

) -> int:
    """Function docstring.

    Returns:
        Returned thing
    """

    return x + y

Outside of alternatives, there are cases where the docstring just sucks for parameter (and return-value) documentation. People on this thread against a more structured approach to this are really (but likely unintentionally) saying “your problem isn’t worth making easier to solve, Python will never have an easy and reliable way to solve it”.

Perhaps there is a way to sell this functionality as a tool for specific scenarios, and certainly not the blessed way in the majority of cases.

Also, a nit: technically this is purely about annotations, not typing (especially as one of the two main problems is extraction at runtime).

I think I will create a package today implementing the initial proposal (unless someone else already has), which I suggested earlier. This can be used to get better telemetry.

Edit: done: docannometa

andrew222651 · August 31, 2023, 1:14am

Is Annotated supposed to be for general metadata about a name or is it just supposed to be for refinements of a type? In docstrings, a parameter’s description may include not just what kind of value it should be but also what it’s for, what the function is going to do with it or to it, etc.

AA-Turner · August 31, 2023, 1:40am

Preface: I’m somewhat commenting as a sometime maintainer of Sphinx & Docutils here, rather than as a PEP editor (or just as myself), as I usually do.

I’d also second this concern: with a properly documented function, we might get into a situation where a function declaration is over a screen long! Perhaps not the most common case, but having read through some exhaustively documented functions in e.g. NumPy, I think it is a legitimate one.

Sphinx would implement support for this, mainly because we’d have to. It would bring challenges though, including that people often put typing imports behind TYPE_CHECKING blocks & Sphinx uses a runtime importer, making extracting the documentation harder.
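A minimal illustration of the `TYPE_CHECKING` problem (the `Decimal` import and the documentation text are arbitrary examples): when a name used in an annotation is only imported for type checkers, evaluating the annotations at runtime fails, and any documentation stored inside the same annotation becomes unreachable through `typing.get_type_hints`. The raw annotation string is still there, but a tool would have to parse it itself.

```python
from typing import TYPE_CHECKING, Annotated, get_type_hints

if TYPE_CHECKING:
    # Only type checkers see this import; it never runs at runtime.
    from decimal import Decimal


def price(amount: "Annotated[Decimal, 'Price in euros.']") -> None:
    """The annotation is quoted because Decimal is not defined at runtime."""


try:
    get_type_hints(price, include_extras=True)
except NameError as exc:
    print(f"cannot evaluate annotations: {exc}")

# The documentation still exists, but only inside an unevaluated string.
print(price.__annotations__["amount"])
```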

Importantly, we’d also need to somehow decide where to put the documentation – most people want parameter descriptions after a prose overview of what a function does, but reliably getting the location to insert this new text is challenging.

Personally, I think that the arguments in the PEP against standardising e.g. Google or NumPy style docstrings are fairly weak, especially given many tools and IDEs currently work with structured text inside docstrings.

This is a more valid criticism, but codifying such a standard (and taking the good bits from numpydoc) could be a valuable contribution of this PEP.

This should have a strong counter argument in the PEP, which currently I don’t think exists.

I think this should be included as a rejected idea, with rationale. For example, why not lift a string literal following any variable or parameter, using the current rules for docstrings, into an object.__doc_attrs__?

That would allow for using recognisable syntax:

class Spam:
    ham: int
    """How much ham"""
    def breakfast(
        eggs: bool,
        """Eggs?"""
        spam: int,
        """Spam, spam, spam, spam, and spam"""
    ) -> Meal: ...

The PEP notes “And the cost of dealing with the additional verbosity would only be carried by those library maintainers that decide to opt-in into this feature.”. I commented in a review of the PEP that I don’t think that this is entirely true – if it became the blessed feature, library authors would feel compelled (or: get several PRs) to change their documentation to this new model. We should act as if this will be the default scenario for documentation when evaluating this PEP.

One final thing that I don’t believe has come up is that a docstring is currently structured prose. A docstring can define what a function is for, why it exists, how it relates to other parts of a package, etc. I think there is a risk that by moving only parameter documentation out into “structured metadata”, we relegate the docstring to a “legacy solution” and lose the ability to capture the nuance we currently can.

A

BrenBarn · August 31, 2023, 4:21am

My problem with this is that I just don’t think Python’s syntax is flexible enough to go down this road. (To be frank, I think the language would be better if we had stopped much further back on the typing-annotations road.)

The annotations have to be Python expressions. That means all nesting and relation-marking (i.e., what goes with what) has to be done with parentheses. I find that unpythonic; we’re supposed to have nice indentation-based structure, not parenthespaghetti! In my view, there is simply no way to make plain expression syntax clean enough to be worth using for anything except extremely short and simple annotations (like arg: int). I’m also not entirely sure it is worth the complexity to attempt to separate out different bits of metadata like this at all, because it seems to just invite complex signatures with intricately-specified metadata rather than just good, up-to-date documentation that a human can read to understand how to use the function.

If we wanted to annotate function arguments or attributes or local variables with extended metadata, we’d need a new syntax, similar to what @tmk mentioned. Something like:

class Blah:
    some_attribute: int
        some_arbitrary_metadata = 2
        """Here is the docstring for this attribute"""

    def method(
        """Here is the method docstring (yes, here!)"""
        returns int

        arg: str
            """Here is the docstring for this argument"""
        other_arg
            type = SomeLongType[SomeOtherLongType]
            arg_metadata_who_knows_what = "stuff"
            doc = """Here is the documentation for another argument"""
    ):
        ...

I’d want to have the option of putting the method metadata (return type and docstring) first, instead of at the bottom after all the arguments. I’d want to express the metadata relationship with nesting instead of parentheses. I’d want to be able to pull the type annotation onto a separate line (named with type = ) if it gets too long and cumbersome. I’d want the indented metadata to automatically separate arguments so I don’t have to remember where the comma goes. I’d want to be able to add arbitrary metadata instead of having a bunch of separate PEPs about adding this or that additional kind of data.

I would want all those things. . . if I wanted to do anything like this at all. I’m sympathetic in theory to the idea of having documentation available as structured metadata, but in practice I just feel like it would lead to stuff like the above. And even though I think the above would be better than shoehorning everything into expression syntax, I still think it’s less readable than just putting the documentation inside the function docstring and accepting that, yes, everyone who uses the function will just need to read the entire docstring, and everyone who makes a change to the function will need to re-read the entire docstring and make sure it’s accurate in light of whatever changed.

Basically, I just feel that going down the road of adding separate structured metadata at finer and finer levels of granularity, in the hope of facilitating processing of that metadata by programs (e.g., IDEs), will cause a net reduction in readability compared to just writing less fragmented documentation targeted directly at human readers.

sirosen · August 31, 2023, 4:30am

Thanks for clarifying. I misread the note about “iterating through annotations” in the PEP. It sounded to me like it was talking about walking the annotations on a class, but I see now it means walking the metadata tuple.
Correctly reading that makes the original spec much more cooperative in nature, so I retract many of those concerns.

(This does still feel odd in that __metadata__ is just a tuple, so users of it may assign meaning to the position and order of its members.)

I still think that there are possibilities for resolving this outside of Annotated worth exploring. Documenting class and instance variables is valuable for untyped code too. For example, there’s not much use for annotating fields in marshmallow schemas with types, but they would benefit from being annotated with documentation.

DanielNoord · August 31, 2023, 10:34am

That would allow for using recognisable syntax:

class Spam:
    ham: int
    """How much ham"""
    def breakfast(
        eggs: bool,
        """Eggs?"""
        spam: int,
        """Spam, spam, spam, spam, and spam"""
    ) -> Meal: ...

This has been proposed before in PEP 224 for classes. Sadly it was rejected, but it was also sort of codified in PEP 257 (see “attribute docstrings”). I can’t find the link anymore, but I know that I have read (I think on discuss.python) that people would be in favour of reconsidering PEP 224, as the rejection was based only on the syntax and not on the idea.

It seems to me that the issue of adding documentation metadata to attributes, parameters, and type aliases has come up before and has never really been solved. To me, referring to this problem as “Documentation Metadata” mentally complicates the issue. What most people seem to be looking for is to attach a docstring to some parameter or definition of some object. We have syntax for this that some code editors already support (see Pylance), but that was never codified. I would be much more in favour of standardizing that practice and reusing the syntax of docstrings than of adding a new concept.
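For class attributes, the PEP 257 “attribute docstring” form is already valid Python: the string literal is just an expression statement that the interpreter discards at runtime. A tool can therefore only recover it from the source, for example with the standard `ast` module. A minimal sketch, assuming the source text is available and handling only annotated attributes; `attribute_docstrings` is a hypothetical helper, not how Pylance or Sphinx actually implement this.

```python
import ast


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map 'Class.attribute' to the string literal directly following it."""
    docs: dict[str, str] = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        body = cls.body
        # Compare each statement with the one right after it.
        for stmt, following in zip(body, body[1:]):
            if not (isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)):
                continue
            if (
                isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)
            ):
                docs[f"{cls.name}.{stmt.target.id}"] = following.value.value
    return docs


SOURCE = '''
class Spam:
    ham: int
    """How much ham"""

    eggs: bool = True
'''

print(attribute_docstrings(SOURCE))
```

Unlike the Annotated approach, nothing here survives at runtime, which is the gap the PEP's direct runtime access is meant to fill.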

This would also provide a solution for the “Doc is not typing” issue described in the PEP. What is different about a piece of text describing a function compared to a piece of text describing a parameter of that function? Why is one of those part of the stdlib and why should the other one live in the typing module? I don’t think the arguments presented in the PEP are sufficient to explain that difference.

EpicWink · August 31, 2023, 11:23am

I’ve created an implementation of the initial version of PEP 727: docannometa. I hope we can gain insights on its usage (or lack thereof).

I would also be happy to revisit PEP 224 (Attribute Docstrings), with an extension for function parameters. Function return-values would still have to be documented inside the function’s docstring, but that’s more manageable than parameters.

hugovk · August 31, 2023, 12:07pm

Indeed, and I regularly click through in an IDE from my code to third-party code to check the parameters or read the code, where I have no choice on opt-in, but need to deal with the cost of additional verbosity.

bryevdv · August 31, 2023, 4:36pm

Please recognize that the internal structure of library code is entirely geared towards making it possible for the maintainers to maintain it, and any other consideration (e.g. whether you regard it as verbose or not) is a distant, distant second, at best. As a library developer who has had to create and maintain extensive meta-programming solutions and custom Sphinx extensions to automate many documentation tasks, all of which I would dump in the bin in a heartbeat if I could, I find the idea of a fully supported structured approach to attribute documentation very appealing. [1]

[1] All that said, I agree 100% with @AA-Turner about the importance of structured narrative documentation.

BrenBarn · September 1, 2023, 3:56am

You’re free to apply that philosophy for libraries you develop. But, as for me, part of my concern about the proposed feature here is that I fear it would enable exactly what you describe. In considering whether to add such a feature to Python as a language, we should keep in mind everyone who uses Python, not just library authors who (reasonably enough) want to make their lives easier and don’t care if it makes life harder for other people trying to read their code. I sympathize with your perspective, but I’m leery of adding features that would tend to widen the gap between what is convenient for library maintainers and what is comfortable for the (much larger) group of people who aren’t library maintainers but would still like to be able to read Python code they find in the wild.

tiangolo · September 1, 2023, 9:11am

Thanks all for the feedback, concerns, or support.

Pressure to adopt this

It’s true that there could be pressure to adopt this just because it is a PEP. Yep, I understand that, and I can see how it could happen.

And indeed, if editors and tooling ended up supporting this better than the current alternatives just because it is a PEP, resulting in a better developer experience, I can see how that could easily become something users would want and push for.

And for people editing the documentation, since editors already have good support for standard Python syntax, there’s a high chance they would end up preferring this, despite it moving the verbosity from the docstring in the function body to the signature.

Yet another standard

About it being “yet another competing standard”: it’s true, it is, just like pyproject.toml. But it has a few key differences from current approaches, including standard Python syntax and direct runtime access. I’ll update the PEP to make that a bit clearer with a list of the specific “features” of this proposal.

Readability

About comparing the readability and editability of one of the common docstring formats vs. this, I think that’s quite subjective. To me, this is actually more readable, with syntax highlighting separating the types from their docstrings. I would expect this to be more readable and editable for newcomers who are not yet experts in Sphinx, Numpydoc, or Google-style docstrings.

So I would personally consider this a bit more welcoming and open to newcomers, the same way editing pyproject.toml is so much more convenient than it was using setuptools (one of the big reasons why I never built something before pyproject.toml).

But again, whether this is a friendlier and better developer experience for newcomers is deeply subjective. And no one here is a newcomer, only seasoned experts with well-defined mental models and docstring expertise, so we don’t have a way to know how this would feel for others before trying. I just have an intuition that this would be useful for a wider audience than just me, the same intuition that has led me to build everything else I have built.

Fortunately we have typing_extensions just for that:

Enable experimentation with new type system PEPs before they are accepted

That way we can see how useful (if at all) this is and if the drawbacks are worth it.
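A sketch of how a library could experiment without a hard dependency, assuming a `typing_extensions` release that provides a `Doc` class exposing a `documentation` attribute (treat the exact name and minimum version as assumptions to verify; the object proposed at the time of this thread was `doc()`/`DocInfo`). If the import fails, a local stand-in with the same attribute is used, so the annotations keep working either way.

```python
from typing import Annotated, get_type_hints

try:
    # Assumption: this typing_extensions version provides Doc.
    from typing_extensions import Doc
except ImportError:
    class Doc:  # type: ignore[no-redef]
        """Local fallback exposing the same attribute name."""

        def __init__(self, documentation: str) -> None:
            self.documentation = documentation


def greet(name: Annotated[str, Doc("Who to greet.")]) -> str:
    return f"Hello, {name}!"


hint = get_type_hints(greet, include_extras=True)["name"]
texts = [m.documentation for m in hint.__metadata__ if isinstance(m, Doc)]
print(texts)
```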

Go to definition

I also commonly go to the definition of libraries I use, and I have contributed to or explored other libraries, e.g. SQLAlchemy, which has very long docstrings. I remember scrolling through pages of docstrings before arriving at the implementation. I once ended up editing the docstring of a different function, because the docstring was so far away from the signature that I scrolled past it into the next function.

So, I think readability and usability can be a bit subjective. Several comments seem to imply that if the documentation is in the docstring, it no longer consumes characters and space, is no longer verbose, or is more readable. Also, when I read a docstring, I can’t be fully certain that it’s up to date and documents the current parameters correctly. Sometimes it gets out of sync, missing a parameter or keeping one that was removed; that has also happened to me with mainstream libraries. If you could simply omit all docstrings, then yes, all the code would be quite short, but I personally don’t consider docstrings, and documentation in general, as something independent that can live far away from the implementation… If I were reading the implementation of someone’s library, I would actually like to see the docs for a parameter or return type right there, next to the implementation; I would find that more readable. But again, that’s subjective. In the end, it’s about tradeoffs and priorities, and about what matters most to each person.
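To make the out-of-sync risk concrete, here is a rough sketch of the kind of check a project might run today against Google-style docstrings: it compares the names listed under `Args:` with the real signature from `inspect.signature`. The parsing is deliberately naive (it assumes `name: description` or `name (type): description` entries indented four spaces under `Args:`), and `documented_args` and `docstring_drift` are hypothetical helpers. Documentation attached to the parameter itself cannot outlive the parameter it describes, so the "stale" case would not arise there.

```python
import inspect
import re
from typing import Callable

# An "Args:" entry such as "    width: ..." or "    size (int): ...".
# Exactly four spaces, so more deeply indented continuation lines are skipped.
ARG_ENTRY = re.compile(r"^ {4}(\w+)(?:\s*\([^)]*\))?:")


def documented_args(func: Callable[..., object]) -> set[str]:
    """Names listed in the 'Args:' section of a Google-style docstring."""
    names: set[str] = set()
    in_args = False
    for line in (inspect.getdoc(func) or "").splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if line and not line[0].isspace():
            # Any other unindented line, e.g. "Returns:", ends the section.
            in_args = False
        match = ARG_ENTRY.match(line) if in_args else None
        if match:
            names.add(match.group(1))
    return names


def docstring_drift(func: Callable[..., object]) -> dict[str, set[str]]:
    """Compare documented argument names with the real signature."""
    params = set(inspect.signature(func).parameters) - {"self", "cls"}
    documented = documented_args(func)
    return {"missing": params - documented, "stale": documented - params}


def resize(image, width, height):
    """Resize an image.

    Args:
        image: The image to resize.
        size (int): Left over from an older signature.
        width: Target width in pixels.

    Returns:
        The resized image.
    """
    return image


print(docstring_drift(resize))
```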

I can see that, for some, being able to quickly read or edit the signature of a function, without seeing its documentation in the same place, has a much higher priority than anything else, whether in your own library or in external libraries. Nothing wrong with that, just personal preference.