PEP 727: Documentation Metadata in Typing

Jelle · October 18, 2023, 2:48am

That’s a strange claim. Sebastián’s argument is essentially that the existing way to do this is error-prone, so we should provide a new approach that is not prone to this category of errors. That’s a common argument for new features: for example, the Rationale section for PEP 498 (introducing f-strings) opens with an account of how % formatting is prone to a certain category of mistakes. Similarly, PEP 616 (str.removeprefix) is motivated by the fact that str.strip()-like methods are prone to certain categories of mistakes.

It’s absolutely Pythonic to design the language in such a way that mistakes are as unlikely as possible.

pf_moore · October 18, 2023, 12:17pm

The question is whether these mistakes are significant enough to warrant the proposed solution. IMO, they aren’t that common, their consequences aren’t that major, and the solution is hugely disruptive for the many people who either already use a different approach for documenting this information, or don’t want to link documentation and type annotations this closely.

It’s also important to consider the downsides when looking at any proposal to fix a perceived issue - is the cure worse than the disease? IMO any proposal that considers a 700+ line function signature a legitimate approach is imposing a huge readability cost on consumers - or at a minimum, has a vastly different view of the value of conciseness than I do (and I assume my opinion is fairly average on this matter, based on the function signatures I routinely see in open source projects I’ve worked with).

EpicWink · October 18, 2023, 9:17pm

My understanding is the disruption you’re talking about is the new pull-requests opened which move parameter docs from function docstring to parameter annotation.

First, I wouldn’t consider pull-requests (PRs) disruptive; at worst a nuisance. For PRs which add new functions, I would say the use of doc annotations is like a styling decision, and the PR’s author can be requested to switch to putting docs in the docstring. Perhaps a linting tool can learn a config to disallow doc annotations for a project.

Second, if there are to be many PRs which use doc annotations, perhaps that’s a sign that a large number of devs consider them to be better than in-docstring, their experience notwithstanding. I hope this won’t be the case, because I don’t think it’s worth the feature-less refactor.

I recall you mentioning you’re worried about IDEs dropping support for in-docstring parameter docs; is there a way to signal to IDEs and other devs that in-docstring is never going away? I know many like to follow PEP 8.

The initial proposal mainly mentions scrolling, which means more than one screen (30-40 lines?), and that’s including the docstring. There’s a few functions I use very regularly which satisfy this, mainly auto-generated API commands (eg new AWS EC2 instance or S3 object, new Docker container) or old functions with many updates (eg Scipy optimise methods).

In any case, I’m more interested in this proposal because of its easier access at runtime to parameter docs, without needing to guess a format and parse the docstring.
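
For readers who want to see what this runtime access could look like, here is a minimal sketch. `Doc` is a local stand-in class defined only for this example (it is not imported from any library), and `create_user` and `param_docs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the proposed marker (assumption, not a real API)."""

    documentation: str


def create_user(
    name: Annotated[str, Doc("The user's display name.")],
    age: Annotated[int, Doc("Age in years.")] = 0,
) -> None:
    """Create a user (hypothetical example function)."""


def param_docs(func):
    """Return {parameter name: doc text} by reading Annotated metadata."""
    hints = get_type_hints(func, include_extras=True)
    docs = {}
    for name, hint in hints.items():
        # Plain hints have no __metadata__; Annotated hints carry a tuple.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Doc):
                docs[name] = meta.documentation
    return docs


print(param_docs(create_user))
```

No docstring format has to be guessed here: the text is read directly from the annotation objects.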

flyinghyrax · October 18, 2023, 10:08pm

This is how I view the proposed benefits I’ve seen so far:

1. Documentation consistency (deduplication, name mismatch, undocumented parameters, etc.)
2. Documentation reuse (type aliases, ParamSpec)
3. Runtime introspection of parameter documentation

Is there anything I’ve missed that doesn’t fit in one of those categories?

For whatever it’s worth, my perspective on those proposed benefits:

1. This is something that can be handled right now with a linter or a documentation generation tool that inspects the AST, without even standardizing a format beyond particular tools. It could be further improved by formally specifying a docstring format in the manner of Javadoc, JSdoc, C#, etc.
2. Somewhat compelling, though I don’t have any insight into how useful it would be in practice.
3. This seems specialized enough that it could be left up to particular tools. How common of a problem is this? I understand the use case for FastAPI and Typer, since those reuse docstrings for user-facing documentation. But that is a specialized use case that I don’t feel generalizes to function definitions broadly. And e.g. Typer is already using Annotated for additional parameter configuration, and there’s no reason that Doc couldn’t be added to that without language-wide standardization.
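
The first point in this list, a consistency check done by a tool that inspects the AST, might look roughly like the sketch below. It assumes Sphinx-style `:param name:` fields in their untyped form only; `check_param_docs` and the `resize` example are hypothetical, and a real linter would also need to handle `self`, `*args`, and other docstring formats.

```python
import ast
import re

# Matches untyped Sphinx-style fields such as ":param width:".
PARAM_FIELD = re.compile(r"^\s*:param\s+(\w+)\s*:", re.MULTILINE)


def check_param_docs(source: str) -> list[str]:
    """Report parameters that are undocumented, or documented but absent."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        declared = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        documented = set(PARAM_FIELD.findall(ast.get_docstring(node) or ""))
        for name in sorted(declared - documented):
            problems.append(f"{node.name}: parameter '{name}' is not documented")
        for name in sorted(documented - declared):
            problems.append(
                f"{node.name}: docstring mentions unknown parameter '{name}'"
            )
    return problems


EXAMPLE = '''
def resize(width, height):
    """Resize the image.

    :param width: New width in pixels.
    :param hieght: New height in pixels.
    """
'''

for problem in check_param_docs(EXAMPLE):
    print(problem)
```

The misspelled `hieght` is caught in both directions: `height` is reported as undocumented and `hieght` as unknown.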

In regard to very long function signatures: yes, such functions exist, and in widely used libraries. But I feel that we should not “optimize” for such cases, because extremely long function signatures are bad APIs. Speaking to Laurie’s examples, signatures like this show up in autogenerated libraries like the AWS and Azure SDKs - I’d argue both bad APIs, generated by dumping every possible configuration option for an HTTP API endpoint into a function definition (mostly sans type annotations too, for boto3 anyway…). I’m less familiar with the SciPy case, but I assume this is for radical backward compatibility - it is not how one would design an API from scratch for usability, given the chance. Often there are groups of parameters that are mutually exclusive - which are best documented as groups and not individually; and deprecated aliases for other arguments - which you might want to omit or seclude in documentation, not reuse an existing description.

Lastly, I think Paul’s points about how this could affect social dynamics are very important. If a feature gets “blessed” with a PEP, there is significant community pressure toward that feature solely because it was blessed, whether or not it is optional. It will quickly appear in blogs, style guides, and linters, and it’s likely a cohort of developers will learn and use it just because it is The Latest Standard.

pf_moore · October 18, 2023, 10:16pm

The disruption I’m talking about is the social pressure to “follow the standard approach”.

So you prioritise runtime access over user ability to read the source code? Sorry, that’s not what you said, I know. But I’m happy to have easier runtime access to parameter documentation, I just don’t think it’s important enough to make my function declarations unreadable, which in my personal opinion this proposal does.

Ultimately, an awful lot of this debate is about personal preferences. And I don’t think a PEP that expresses one particular personal preference with no community consensus demonstrating that the majority of Python users agree with that preference, should be approved. It can be a tool-specific design choice, certainly, but making it a language standard adds a huge amount of pressure to conform to that particular subjective choice. Ask anyone who disagrees with a PEP 8 rule whether they think ignoring PEP 8 is easy…

BrenBarn · October 18, 2023, 10:56pm

I mean, you’re right of course, but so am I. Any proposed change can be viewed either as a way to avoid bad things or to provide good things. All I’m saying is that a framing of “we want to eliminate problem X” isn’t convincing to me, because problem X won’t be eliminated; all we’re doing is adding a new thing that we hope is better.

That’s also the case for PEP 498, and, rereading the PEP, I don’t see that it was couched in terms of “the goal is to eliminate these problems”. Rather it was “there are some problems with % formatting, and str.format has some advantages but is too verbose, so we’re adding a new way that tries to build on what went before”. If we wanted to “eliminate” the problems with the old formatting mechanisms, we’d have to propose removing those old mechanisms. And likewise if we wanted to “eliminate” problems of mismatches between docs and runtime behavior, we’d have to propose some kind of mandatory annotations (and even that wouldn’t do it, since no tool can read the doc strings and tell us if they will correctly communicate what they intend to communicate to a human).

And that’s not being proposed here. So my point is just that claiming this change will “eliminate the possibility of inconsistencies” in docstrings is overstating the benefit. It will provide a possible way to avoid some such inconsistencies — at a cost of readability and time to write the function.

No doubt. But we also want to avoid making non-mistakes difficult as well. As @pf_moore said, a lot of this comes down to preference. It’s a trade-off between certain up-front costs (e.g., everyone is expected to take extra time to craft more verbose function signatures and take extra time to read and understand them) and certain future benefits (e.g., writing those signatures will make a certain class of runtime errors less likely), and different people have different preferences on that. As I mentioned on another thread though, my own feeling is that this kind of tradeoff isn’t ultimately going to provide all the benefits that people hope for in Python, because the whole world of such annotations is disconnected from the actual runtime behavior. Either there will be too many holes through which errors can pass, or the chore of patching them all with increasingly verbose annotations will become too burdensome.

beauxq · October 20, 2023, 8:26pm

I first got interested in this PEP because...

I had a situation where I put a lot of information in a doc string, and then I made a new class and I thought “I want that same information in the doc string for this class.”

But, of course I don’t want to duplicate information - violating DRY - maintaining the same info in 2 places. So how can I have the same information in 2 doc strings without having to maintain it in 2 places?

It’s not clear to me whether this PEP even addresses that, but that’s why I got interested in it.
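
As a rough illustration of the kind of reuse the annotation approach allows, the sketch below writes one parameter description once and shares it through a type alias. It covers per-parameter text only, not sharing a whole class docstring as described above. `Doc` is a local stand-in class, and `Timeout`, `connect`, `download`, and `doc_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the proposed marker (assumption, not a real API)."""

    documentation: str


# The description is written once and travels with the alias.
Timeout = Annotated[float, Doc("Seconds to wait before giving up.")]


def connect(host: str, timeout: Timeout = 5.0) -> None:
    """Open a connection (hypothetical)."""


def download(url: str, timeout: Timeout = 30.0) -> None:
    """Fetch a resource (hypothetical)."""


def doc_for(func, param):
    """Return the Doc text attached to one parameter, or None."""
    hint = get_type_hints(func, include_extras=True)[param]
    for meta in getattr(hint, "__metadata__", ()):
        if isinstance(meta, Doc):
            return meta.documentation
    return None


print(doc_for(connect, "timeout"))
print(doc_for(download, "timeout"))
```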

When I first saw the proposal, I was like “err… um… maybe…”,
but then seeing this example https://github.com/tiangolo/fastapi/blob/df4c501136c76a2ef83e3c7e8330c15b5f84491b/fastapi/applications.py#L51-L646
I’m more like “no”.

How am I supposed to see the parameters to the function? I don’t want to have to scroll through 600 lines of documentation to just get an idea of what parameters the function takes. It’s a significant amount of work for my eyes to pick them up. And the default values are so far away from the parameters.

I imagine someone might respond to this pointing out that I can collapse the doc strings. But the amount of work to collapse them, or uncollapse them when I want to see them, is more than putting my mouse over the parameter name to see the information that a tool parsed from the function doc string for this parameter.

more on length

Annotated[,Doc("""""")]
That’s 23 characters repeated for every parameter that add practically no information.

Even if we use import Annotated as A and Doc as D
A[,D("")]
That still seems like a lot of extra junk to have to look through.

And that isn't a good solution because

some people will use import Annotated as A, and
some people will use import Annotated as An, and
some people will use import Annotated as Ann, and
some people will use import Annotated as Ad,

and it will be different in different code bases, and then there’s still no standard way to write the documentation.

I think it would be better to take one or multiple of the existing solutions of putting the parameter information in the function doc string, and having tools that know how to parse that and show me the parameter documentation in a pop up mouse-over when I move my mouse over the parameter name.

I think similar tools could make the same information available at run time.

fonini · October 20, 2023, 10:29pm

I’d like to emphasize this. In the linked FastAPI example, scanning the 600-line function signature it’s pretty easy to miss the argument defaults. Also, black makes the situation worse because it introduces so much indentation and so many lines of just ),.

Also – and maybe this is just nitpicking, but it troubles me – the docstring convention of separating the title line from the text body with an empty line makes it so that there are empty lines inside the docstring but no empty line between parameters. At least to my eyes, this breaks the nested hierarchy of information. ^[1]

[1] Nested hierarchy of information, what’s even that? Sorry, I don’t know how else to put it, but I hope you know what I mean. Please let me know if there is a proper word for this.

ods · October 21, 2023, 2:54pm

How is it supposed to be used when you don’t want to declare a type? arg: Annotated[doc("doc is here")]?
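
For context on this question: `typing.Annotated` requires a type plus at least one piece of metadata, so the single-argument form is rejected at runtime. One possible spelling, shown here purely as an assumption and not as something the thread settles, is to use `Any` as the type. `Doc` is a local stand-in class and `frob` is a hypothetical function.

```python
from dataclasses import dataclass
from typing import Annotated, Any


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the proposed marker (assumption, not a real API)."""

    documentation: str


# Annotated needs a type and at least one annotation, so this raises TypeError.
try:
    Annotated[Doc("doc is here")]
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)


# One possible workaround (an assumption): spell the missing type as Any.
def frob(arg: Annotated[Any, Doc("doc is here")]) -> None:
    """Hypothetical function with a documented but untyped parameter."""


print(frob.__annotations__["arg"].__metadata__[0].documentation)
```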

jp-larose · November 4, 2023, 5:36am

I seem to be late to the party. I read the PEP and the top few pages worth of comments, but there are way too many for me to care to read.

I stumbled upon this when looking at the code for FastAPI and seeing how they documented their parameters. I immediately thought this is a great way of putting the documentation closer to the thing it’s documenting and without having to repeat the name. Fantastic application of the DRY principle.

What I don’t like is the oft-mentioned verbosity of embedding the Doc in an Annotated. The other aspect I like even less though is that the default value of the parameter (or whatever is being documented) is lost at the end. It then becomes less obvious whether or not there is a default value provided.

Several commenters have suggested the much cleaner solution proposed in PEP 224, i.e. docstrings below the variable. While this was rejected, this was over 20 years ago, and the language, libraries, and interpreter have all changed since then. It might be worth revisiting? The reason this keeps being proposed is that it’s by far the most similar to how classes, functions, methods, and modules are documented. There’s tremendous value in this uniformity.
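
Although the interpreter discards a string that follows an assignment, a static tool can still recover it from the AST. The sketch below shows one way that could work; `attribute_docstrings` and the `Config` example are hypothetical, and only simple name targets are handled.

```python
import ast


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map variable names to the string literal placed right after them."""
    docs = {}
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue  # e.g. lambdas, whose body is a single expression
        for stmt, following in zip(body, body[1:]):
            if not isinstance(stmt, (ast.Assign, ast.AnnAssign)):
                continue
            is_string = (
                isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)
            )
            if not is_string:
                continue
            targets = stmt.targets if isinstance(stmt, ast.Assign) else [stmt.target]
            for target in targets:
                if isinstance(target, ast.Name):
                    docs[target.id] = following.value.value
    return docs


EXAMPLE = '''
class Config:
    retries: int = 3
    """How many times to retry."""
    verbose = False
'''

print(attribute_docstrings(EXAMPLE))
```

This works for statements in a module or class body, but not for function parameters, which is where a new notation would be needed.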

Here are a few other ideas. Admittedly none are really fleshed out:

If we allow a minor change to the language, one idea to mitigate the problems identified in PEP 224 is to add a d prefix to strings intended to be documentation.

def my_func(
    a: int
    d"The first parameter",
    b: str | None = None
    d"""
    An optional string to do something with `a`.
    """
) -> MyClass:
    d"""
    This function does something and returns an instance of MyClass.
    """
    ...

Use the decorator syntax. (I don’t fully understand the objection to this as written in the PEP.)

def my_func(
    @doc("The first parameter")
    a: int,
    @doc("""
    An optional string to do something with `a`.
    """)
    b: str | None = None
) -> MyClass:
    """This function does something and returns an instance of MyClass."""
    ...

tmk · November 4, 2023, 11:19am

I once proposed this on the mailing list, in the Python-ideas thread “Runtime-accessible attribute docstrings – take 2”. You might find some relevant discussion there. (Though re-reading it, my proposal was slightly different – I was proposing moving the docstring before the thing it’s documenting.)

jp-larose · November 6, 2023, 3:00am

Thanks for the link. I read through several of those ideas (again, too much to read in one sitting). The biggest takeaway I get from reading that thread and this one is that Python programmers are looking for a better way to document var-like declarations than what we have now. And it needs to be a way that tools can use to extract the information, so simple comments don’t cut it.

I like another of the ideas proposed on that mailing list thread, i.e. var : vartype = value : docstr. It’s not my favourite, but here’s what I like about it:

It clearly links the doc string to the variable it’s describing.
Far less verbose than Annotated[vartype, Doc(docstr)]
The value / default is still closely linked to the variable name (provided the type, if provided, is not too lengthy)
Allows for the doc string to be inline with the rest of the variable/parameter/field/etc
It’s a simple extension that can piggy back on slice notation.

That last point may also be a source of problems however, so more thought is required.

Someone on that mailing list thread had suggested that this notation could be syntactic sugar for var: Annotated[vartype, Doc(docstr)] = value as proposed in this PEP. One of the biggest virtues of this PEP is that it doesn’t require changes to the language. There is plenty of precedent for advances in type hints to influence future language changes. So, if we’re not willing yet to change the parser to support inline documentation of var-like declarations, this PEP is very much a good alternative.

Overall, my preference is still the cleaner idea of having the docstring be the first string after the var-like thing it’s describing. It’s more similar to the supported use of doc strings elsewhere in the language.

EpicWink · November 6, 2023, 3:37am

As mentioned in “Revisiting attribute docstrings” (post #10 by fonini), there’s a conflict with string default values and whitespace-separated string concatenation. You’ll have to separate the docstring in some way.

jp-larose · November 6, 2023, 4:50am

Oh, right. Thanks for pointing it out. It’s one of the unfortunate side effects of having implicit string concatenation (as opposed to requiring an operator like + to split strings across different lines). And I doubt that indenting the string would suffice to differentiate it as a doc string. So that leaves needing some sort of explicit notation.
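
The conflict is easy to reproduce with today’s syntax. In this sketch (the function `greet` is hypothetical), a string written after a string default is not treated as documentation; it is silently joined to the default.

```python
def greet(
    name: str = "world"
    "The name to greet.",  # meant as documentation, but joined to the default
):
    return f"Hello, {name}!"


# The default is now the concatenation of both literals.
print(greet())
```

Indenting the second literal or moving it to its own line makes no difference, since adjacent string literals are concatenated regardless of the whitespace between them.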

ntessore · November 6, 2023, 12:33pm

Reading the PEP again, I note there’s quite a difference in tone between the text of the PEP and the discussion here. The PEP could more neutrally propose a container typing.Doc to store parameter docstrings in typing.Annotated types. I think that in itself is a good and needed addition.

What the discussion has focussed on is mainly the direct use of the new Doc annotation in the “FastAPI-style”. Maybe it would help the PEP to put more emphasis on the fact that the Doc could in the future be filled by other means, if someone came up with a neat syntax. There might also be adoption by tools for parsing other parameter docstring formats into Doc annotations for runtime use. For example, the PEP might mention that e.g. numpydoc could provide a @numpydoc decorator, say, that parses existing numpydoc-formatted docstrings into the new format:

@numpydoc
def frob(widget):
    """Docstring in numpydoc format here"""

Other than that, I would have liked to see a mention in the PEP about obtaining the parameter docstrings from inspect.signature().
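
Both ideas in this post can be sketched together. Everything below is hypothetical: `Doc` is a local stand-in class, the `numpydoc` decorator is a toy parser written for this example (not something the numpydoc project provides), and `describe` is an illustrative helper that reads the result back through `inspect.signature()`.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Any


@dataclass(frozen=True)
class Doc:
    """Local stand-in for the proposed marker (assumption, not a real API)."""

    documentation: str


def numpydoc(func):
    """Toy decorator: copy numpydoc 'Parameters' text into Annotated metadata."""
    lines = inspect.cleandoc(func.__doc__ or "").splitlines()
    docs, current, in_params = {}, None, False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if nxt and set(nxt.strip()) == {"-"}:
            # A line underlined with dashes starts a new section.
            in_params = line.strip() == "Parameters"
            current = None
        elif not in_params or set(line.strip()) <= {"-"}:
            continue  # outside Parameters, or an underline or blank line
        elif not line.startswith(" "):
            current = line.split(":")[0].strip()  # "name : type" entry
            docs[current] = []
        elif current is not None:
            docs[current].append(line.strip())  # indented description line
    for name, text in docs.items():
        base = func.__annotations__.get(name, Any)
        func.__annotations__[name] = Annotated[base, Doc(" ".join(text))]
    return func


@numpydoc
def frob(widget: int, speed=1.0):
    """Frob a widget.

    Parameters
    ----------
    widget : int
        The widget to frob.
    speed : float
        How fast to frob.

    Returns
    -------
    None
    """


def describe(func):
    """Return (name, default, doc) rows read through inspect.signature()."""
    rows = []
    for name, param in inspect.signature(func).parameters.items():
        metadata = getattr(param.annotation, "__metadata__", ())
        doc = next((m.documentation for m in metadata if isinstance(m, Doc)), None)
        default = "<required>" if param.default is param.empty else param.default
        rows.append((name, default, doc))
    return rows


for row in describe(frob):
    print(row)
```

The docstring stays in its existing format in the source, while consumers at runtime see the same metadata they would get from hand-written annotations.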

pf_moore · November 6, 2023, 12:40pm

This is a reasonable idea, but I still think that tying parameter documentation to type information is a bad idea. For example, how would parameter documentation get stripped using -OO? Unlike all other docstrings, this proposal puts parameter docstrings in a location that can’t be easily stripped by the core interpreter.
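
The difference can be demonstrated with `compile(..., optimize=2)`, which discards docstrings in the same way as `-OO`. In this sketch the `Doc` class inside `SOURCE` is a stand-in defined for the example, and `frob` is a hypothetical function.

```python
SOURCE = '''
from typing import Annotated


class Doc:
    def __init__(self, documentation):
        self.documentation = documentation


def frob(widget: Annotated[int, Doc("The widget to frob.")]):
    """Frob a widget."""
'''

namespace = {}
# optimize=2 corresponds to -OO: asserts removed and docstrings discarded.
exec(compile(SOURCE, "<example>", "exec", optimize=2), namespace)
frob = namespace["frob"]

# The ordinary docstring is gone.
print(frob.__doc__)

# The parameter documentation is an ordinary runtime object and survives.
metadata = frob.__annotations__["widget"].__metadata__
print(metadata[0].documentation)
```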

ntessore · November 6, 2023, 12:47pm

Is that so different from setting func.__doc__ manually at runtime? Perhaps we might consider this here only a first non-syntactic approach to parameter docstrings, where everything is constructed programmatically. If parameter docstrings turn out to be a great idea, then maybe there will be a time to come up with “proper” syntax for the feature, at which point the parser itself can strip out the docstrings when -OO is given.^[1]

[1] If -OO survives that long.

pf_moore · November 6, 2023, 1:00pm

Yes, in the sense that the normal, recommended, way of writing a function, class or module docstring is statically, in a way that -OO can handle. If we’re now proposing to add a way to write docstrings for function parameters, or for variables, we should either stick to that principle, or be very clear that parameter/variable docstrings are handled and viewed very differently than other docstrings.

Also, I assume that tools like pydoc and sphinx will want to display parameter/variable docstrings. They will likely want to do so statically^[1] and in a way that doesn’t require them to implement a type parser…

Anyway, I’m just repeating things I’ve said before at this stage.

[1] To avoid having to execute arbitrary user-defined code.

jp-larose · November 6, 2023, 11:52pm

Seems like this is where the “Revisiting attribute docstrings” discussion comes in.

dimaqq · November 7, 2023, 4:05am

True for plain code, gets rather complex for massively decorated functions.

Arguably that’s also the case for type hints, but I gather the proposal expectation was to inspect the callable f_aaa rather than AST, although I’m not sure what’s actually harder.

I guess that’s a general problem: someone may want to document function body and someone else a resulting callable.
