PEP 727: Documentation Metadata in Typing

saper · November 15, 2023, 10:48pm

I was trying to rebase FastAPI pull request 5168 (no link due to Discourse limitation) to the master branch.

Here is a screenshot (my apologies) of the edit conflict in the editor:

If you can't see the screenshot

Try the following commands to reproduce it:

git clone https://github.com/tiangolo/fastapi
cd fastapi
git fetch origin pull/5168/head
git checkout -b change FETCH_HEAD
git rebase 480620372a662aa9025c47410fbc90a255b2fc94

and then examine the confict in the fastapi/applications.py file

In the top part (the code from the pull request based on the older FastAPI master), I have condensed lines of code, one per argument to the function.

With annotated docs, this all got expanded heavily, to get to the next function argument I have to scroll at least half a screen down.

757 lines of code for __init__ parameters, plus extra 90 lines of code for some instance variables.

Maybe everyone is now working with dual high-DPI 4K displays now with a tiny font, but the amount of screen/mental estate needed to wade through this is significant to me.

@ntessore has a very good point that this proposal should be taken at the face value and not necessarily to be analyzed in the “FastAPI-way”, but that’s the only example I have.

I think that some of the issues people raise here are also inherent to the potential verbosity of typing.Annotate we might have already.

I also don’t like the verbosity of something : type = field(lost_of_things) in dataclasses - recently had to re-think my line length strategy to accommodate the use of dataclasses_json field metadata configuration, so maybe we try to put too much into something that logically and mentally used to be a single line of code at most?

ofek · November 15, 2023, 11:00pm

Wow, that’s even harder to parse in practice than I imagined.

dimaqq · November 17, 2023, 4:14am

I’ve been mostly lurking in this thread.
Now having read this, I have a clear opinion.
It’s something that someone else mentioned above:

yes to annotations used to document stuff
no to type annotations used to document stuff

It would be awesome to have syntax to combine different classes of annotations.

ajoino · November 17, 2023, 4:33am

Combining type and other kinds of annotations is the explicit purpose of Annotated right? You can, atm, just add a string after type like this Annotated[str, "documentation tbd"] and tools could pick that string up as documentation (but I’m not aware if any tools do that). Are you saying you want a way to do this without Annotated or am I misunderstanding?

dimaqq · November 17, 2023, 6:20am

There a couple of issues with current PEP version wrt. type and doc annotations:

name: Annotated[str, Doc("The user's name")] — requires type, what if someone only wants a doc? There’s no standalone Doc in the PEP.
UserName: TypeAlias = Annotated[str, Doc("The user's name")] — confuses what is being annotated. Arguably, some function arg name: UserName should get doc slapped on it; instead, doc is slapped on an alias.

What would be awesome is being able to express any of the below:

name: "User's preferred personal name"
name: str @ "User's preferred name" 
name: "User's preferred name" @ str
name: str

Specifically, have doc decoupled from type by default.

(I’m abusing matrix multiplication operator because as a stand-in for hypothetical syntax that would allow several annotations on a single object)

kknechtel · November 17, 2023, 10:36pm

It couldn’t use bare strings for this, because type-checkers will understand those as forward references and/or because it would interfere with PEP 563 and/or 649 (i.e., attempts to treat a string as describing a type to check).

If we had & to define type intersections (something that meets both type qualifications), then it would be natural to define e.g. Doc such that subscripting it with a string gives a result functionally equivalent to object (I think this is correct and that it shouldn’t be Any) and then use & to combine it with other type annotations. (Maybe we’d need to define that Any & x → Any?)

webknjaz · December 1, 2023, 4:09pm

I didn’t read all replies (maybe somebody’s pointed it out already?), but this specific thing already works in Sphinx today (not exposed in runtime, though). The idea is derived from an old rejected PEP but had probably been in use since Sphinx got created. Other cased, though, may not work, that’s true. That said, I personally prefer docstring-like interface much more, than sticking it into typing since it’d have questionable DX…

pradyunsg · December 1, 2023, 8:36pm

More generally, the discussion here seems to have settled down for the most part and gone around in circles a few times.

@tiangolo @Jelle Do either of you have any thoughts on when you’d want to submit this PEP for a decision to the SC or otherwise move past the draft state here?

tiangolo · December 1, 2023, 10:09pm

I’m still waiting for @erictraut’s feedback after my last response. There are probably more updates to the PEP to be done, but I’ll wait for that first.

And I’m in no particular hurry to submit it for a final decision, there’s a couple of my projects where I want to use it first, that will help as a real-world test experiment to gauge how useful or problematic it really is.

erictraut · December 2, 2023, 12:38am

Apologies, I didn’t realize you were waiting for a response from me. I’m not sure I have anything to add beyond what I said in my last post. I don’t think your response addressed any of the core concerns I raised, so my perspective hasn’t changed.

Do you have any thoughts about my alternative proposal? I think it addresses all of the stated problems you were attempting to address with your proposal, and it avoids all of the downsides that have been raised by people on this thread. Based on the large number of “heart” reactions to my post, I presume that many other community members agree with this perspective, so you’d probably get a lot of support if you pursued this direction.

srittau · December 2, 2023, 12:00pm

(I thought I had already posted in this thread, but it seems I was mistaken.)

I am sympathetic with the need to easily parse documentation in a structured way, both when generating docs, but also at runtime. That said, I don’t think this proposal is the right way forward and would harm source code readability and maintainability for marginal gains. I think most of the points below have already been made, but I wanted to summarize my thoughts on it regardless:

While originally PEP 3107 added support for arbitrary function annotations, since PEP 484 type annotations are the de-facto standard for them. And while PEP 593 (Annotated) opened an avenue for adding custom annotations, all examples in that PEP use it for supporting custom type annotations. Using annotations for multiple purposes confuses their purpose and makes them harder to read and understand.
So far, in Python, there is a standard for documenting functions in Python: docstrings. Adding documentation to annotations would mean that we now have to document a function in two different ways: a docstring for the general function information and type annotations for parameter documentation.
This would also mean that the documentation in the source code is now split into two and I’d have to look in two different places to grasp how a function works: The docstring and then I’d have to scroll up(!) to read the argument docs.
On the other hand, it means that the arguments are now split over multiple lines, sometimes even several screens. Type annotations are sometimes problematic already, because they can increase the size of a function signature, but adding natural language strings - sometimes quite extensive ones - would add to this problem significantly. This is especially bad for functions that only take a few arguments and are often understandable just by looking at their signature. Now the function definition will be much longer and harder to parse by eye.
All this leads to function definitions becoming much harder to read as some of the examples in this thread have shown.
This has definitely already been mentioned, but using Annotated feels like an unnecessary extra step. If this proposal would be accepted, at least use a plain Doc annotation, without the unnecessary wrapper, which adds more visual clutter.

I would rather like to see a standard for a parseable docstring standard emerge and possibly extend docstrings to class attributes and module-level variables.

ManiMozaffar · December 5, 2023, 2:57pm

I’m sorry this thread has gone quite large, I was looking at FastAPI source code after updates with the suggestion here, here’s my input:

I think Doc makes the code very dirty.
I see the value Sebastien is trying to achieve

so I think the best solution, is to have something defined with “doc” the same way we have with overload?
Imagine this case:

class T:
    @doc
    def something(val: str = Doc("....")): ...

    def something(val: str):
        # actual code
        ...

same way, maybe if within same file, then it should also work

@doc
def something(val: str = Doc("....")): ...

def something(val: str):
    # actual code
    ...

Might sound too magical, but it keeps the code out of Annotated, and docs can be accessed in an automated way. @tiangolo what are your thought about this design?

bryevdv · December 5, 2023, 6:00pm

As a large library maintainer, this means I have to duplicate every function declaration, and worse, spend time and effort to ensure they don’t drift out of sync. More work is the opposite of what I’d want, personally.

ManiMozaffar · December 5, 2023, 6:36pm

Not really tho, if you don’t respect overload, then IDE already showing that. same can apply here.

That is surely a drawback but doesn’t stub file means has the same? or basically any other doc string convention can be argued to be kinda duplicated.

Thinking out of box, it has to be either one of these scenario:

Using Doc with typings such as annotated types
pros: 1. no duplication, docs is where the code is.
cons: 1. makes code ugly and dirty. 2.Annotated itself is also confusing for beginners as argued above

Using the way i recommended
pros: 1. docs is where the code is
cons: 1. still a function declare duplication

Using with sub file
pros: 1. can be checked by mypy already
cons: 1. docs located in different place than code 2. confusing for beginners 3. duplication

I might be wrong, if there’s any cons/pros to consider, feel free to contribute, but generally I think it might be valuable to narrow down the pros/cons, then consider the way of going forward, not statically accepting only one solution.

jamestwebber · December 5, 2023, 6:59pm

There’s also the status quo: put documentation in a docstring.

MichaReiser · December 8, 2023, 9:05am

Hy everyone.

I’m working on Ruff’s formatter, and we started working on formatting code examples inside docstrings. One concern with this proposal is that formatters would need to implement (limited) symbol resolution to ensure Annotated resolves to the typing model rather than any other type named Annotated if it they want to format code examples in parameter docstrings (although I imagine that writing them will not be much fun). I’m not worried about the complexity but this will harm the performance of developers tool, something many users started to care about a lot.

It may also be worth thinking about how such docstrings should be formatted in a readable way, to ensure people read the documentation.

tiangolo · December 11, 2023, 12:27am

@erictraut Not sure if you got the chance to read this reply above: PEP 727: Documentation Metadata in Typing - #118 by tiangolo

In there, I comment about a few specific things, in particular, I ask for your guidance about how to proceed.

I ask for more info about the numpy use of type alias that you mentioned. As this would be a very relevant and important use case.

I clarify and argue about the actual motivation and value of the PEP, which is not what you describe. I wrote a lot about it, updated the PEP to include several bullet points summarizing it, and included in the post above.

Anyway, if you think that’s a useful thing to have in the PEP and would make it implementable, I’m willing to update it to cover that.

I remember you said at some point that to consider implementing something in Pyright it would have to have a PEP, and that you wouldn’t implement something without a standard.

Nevertheless, it seems Pyright already implements support for strings under type annotations, which is not formalized in any PEP.

I would like to see what would need to be updated in this PEP for you to consider adding (experimental) support for it.

I mentioned that I would think it’s then better to leave that door open, it’s better to underspecify than to overspecify.

I thought the use case of transferring information in type aliases was very valuable (as also highlighted by several in this thread), but if you consider that unimplementable, then I’m okay with not defining that here, nor prohibiting it, in case future PEPs define it further.

For me, the base use case of documenting function parameters with Annotated (without type aliases) is already useful. That, with the minimum requirements considerable for this PEP, is already implemented in FastAPI and already powers the docs reference: Reference - FastAPI.

It would be great to see what needs to be updated in the PEP for you to consider adding experimental support for that, in a similar way to how you have support for strings under variables.

That’s a perfectly valid alternative for someone else to pursue. Unfortunately, that doesn’t solve any of the use cases I’m interested in solving, so I wouldn’t work on that myself.

I added the features I’m interested in to the PEP, to the comment above, I’ll add them here again for completeness:

Sebastián Ramírez:

Editing would be already fully supported by default by any editor (current or future) supporting Python syntax, including syntax errors, syntax highlighting, etc.

Rendering would be relatively straightforward to implement by static tools (tools that don’t need runtime execution), as the information can be extracted from the AST they normally already create.

Deduplication of information: the name of a parameter would be defined in a single place, not duplicated inside of a docstring.

Elimination of the possibility of having inconsistencies when removing a parameter or class variable and forgetting to remove its documentation.

Minimization of the probability of adding a new parameter or class variable and forgetting to add its documentation.

Elimination of the possibility of having inconsistencies between the name of a parameter in the signature and the name in the docstring when it is renamed.

Reuse of documentation for symbols used in multiple places via type aliases.

Access to the documentation string for each symbol at runtime, including existing (older) Python versions.

No microsyntax to learn for newcomers, it’s just Python syntax.

Parameter documentation inheritance for functions captured by ParamSpec.

The items marked with , are not doable with current docstring conventions. That is the main motivation.

All those points are already achieved with the current state of the PEP (or whatever reduction of it results) and typing_extensions.Doc, except for in-editor rendering.

Your proposal requires someone to:

I’m not sure who would be willing to put the effort to try to bring consensus among the several current conventions, pick one as the “one true format”, and convince everyone else to convert. Or otherwise, formalize not one but three standards.

This sounds like an enormous amount of work, a library that would have to do lots of things, a lot of functionality, sounds like a combination of mypy, libcst, and pyright. It’s all work that hasn’t been done, and would take a long time, effort, and engineering. So, from PEP to usability would be a very long path. I wouldn’t be the one to do it as an individual, but maybe a big organization with resources like Microsoft could be the one to take that massive effort and actually make it work.

Same argument as above.

My expectation was that given my PEP uses the same pure Python syntax and that editors and language servers would only have to extract values from their already existing AST nodes, it would have been much easier to have at least some experimental support for this.

I wouldn’t think what you propose would make it any easier to get support from IDEs and language servers.

I think many like the general concept of what you propose, but with very different ideas about what this proposal would be, some would want sphinx, some numpydoc, some google, some would want everything, some would one single winner.

There’s of course the main issue of, who would take that gigantic engineering effort described above, and the even bigger effort to try to bring consensus about all that.

Other ways forward

Now, let’s step back for a second, maybe there are other ways we can approach this.

One of the main things I wanted was for the (at least) 250K+ monthly FastAPI users (growing linearly) to get in-editor docs support, in a way that can still be maintainable (e.g. allowing me to check with pytest that docs are consistent in CI, which I’m doing, etc, all the bullet points above).

I had understood that the only way to achieve that was with a PEP. I’m trying to figure out the best way to achieve that while still solving the original needs in the PEP (bulletpoints above).

On the other hand, as I see that Pyright now supports some non-standard formats (e.g. strings under variable names), that makes me thing that maybe it could then now be possible to have Pyright support for something outside of a PEP, removing the problems of having something look like “the blessed option” with all the drawbacks, while still being able to achieve and solve the same original needs. And this way we could have an experiment of this approach/idea only in the constrained community of FastAPI and friends, which would make the experiment much less invasive than what I understand many are feeling it currently is by adding it to typing (despite my efforts for marking it as optional).

I’m willing to simplify and reduce the PEP as much as necessary to make it work. I would rather not “close doors” forbidding future changes, but I’m willing to reduce what is specified and expected to the minimum.

But I would also be willing to publish a package annotated-doc with only this Doc class, if there was a way to get (at least experimental) support for it in Pyright/Pylance/VS Code.

In case anyone from JetBrains reads, of course, I would love PyCharm support as well, but as no one from that team has publicly commented (only in private), I’m focusing on VS Code here.

@erictraut please let me know what you consider would be the best way to proceed.

mikeshardmind · December 11, 2023, 1:36am

For the record, this is just patently incorrect when it comes to writing standards that other things should adhere to, and with regard to typing specifically, has become a large enough pain point that the typing council was formed and there’s an ongoing effort to clear up differences in behaviors all over between type checkers and clear up anything underspecified. Overlyspecified, when it comes to defining how something can be used is significantly easier to relax after the fact than it is to make language stricter after the fact. Outside of python, the IETF reached a similar conclusion with regard to robust definitions

mdrissi · December 11, 2023, 4:23am

My view having used the style this PEP proposes for years in practice, the goal is not to move documentation from docstring to Annotated for equivalent functionality, but that current docstring formats do not support templating for parameters. Concretely how do you document code like,

class Foo1:
  def __init__(self, dropout):
    ...

class Foo2:
  def __init__(self, dropout):
    ...

class Foo3:
  def __init__(self, dropout):
    ...

where there may be dozens/hundreds of classes that share same parameter and you want all of them to have consistent documentation. These classes can be spread out across files or even separate packages entirely (tensorflow ecosystem has many packages with common parameters). Where you may change implementation of dropout logic that lives in shared helper/class and want docstring generated to stay in sync. A fixed static docstring can not handle this. Even if google/numpy/etc docstrings were standardized that would not support this use case. Concrete libraries where classes/functions that share parameters dozens/hundreds of times would matplotlib/tensorflow/pytorch/etc. I mainly work with those kind of libraries and see certain parameters that have some underlying math meaning repeat heavily. Being able to annotate all of them as,

DropoutT = Annotated[float, "Probability of keeping unit from (0, 1) where 1 is fully keep (like no dropout), while 0 would be to fully erase."]

class Foo1:
  def __init__(self, dropout: DropoutT):
    ...

...

For any existing docstring format to be viable solution, not only would one need to be standardized, it would need core extensions that support templating with variable aliases which requires import/variable resolution and is not something a lot of docstring tools handle.

Edit: I’m fine with an argument of existing docstrings handle many use cases well and usage of Annotated/similar to support these new features is not worthwhile standardizing today. But I think an argument that existing docstrings are equivalent in power to this proposal feels like ignoring key points.

mikeshardmind · December 11, 2023, 5:18am

But several documentation tools like sphinx do. When your project grows to the point of that amount of complexity, and shared parameters, you write prose documentation. (Edit: specifically, you can have a preface to a module that describes how common parameters are used, and is linked to by your document generation tool of choice.)

There’s also the option of designing your APIs to take in a common set of kwargs, document a TypedDict once, and use unpack… or a config class that’s documented once and linked to, etc