PEP 727: Documentation Metadata in Typing

erictraut · December 11, 2023, 5:59am

I ask for your guidance about how to proceed.

As I previously mentioned, my recommendation is to abandon the use of Annotated for documentation and build on the existing docstring mechanisms that are supported by existing tools.

Here’s another idea that addresses most (but perhaps not all) of the issues you’re trying to solve for FastAPI. You could stick with one of the three standard function docstring formats (all of which are already supported by existing tools) and then write a simple CI script that validates that the docstrings are consistent with the parameters in your function signatures. This is similar to what I suggested but without the added work of writing a spec or publishing a library. You could do this entirely within your own CI/build system with relatively little work.
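
As a rough illustration of the kind of CI check described above, here is a minimal sketch. It assumes Google-style docstrings with an `Args:` section whose entries are indented by up to four spaces; the helper names `documented_params` and `check_function` and the sample `say_hi` function are hypothetical and not taken from any existing tool.

```python
import inspect
import re


def documented_params(docstring: str) -> list[str]:
    """Return the parameter names listed under a Google-style ``Args:`` section."""
    names: list[str] = []
    in_args = False
    for line in inspect.cleandoc(docstring).splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            if stripped and not line.startswith(" "):
                break  # an unindented line starts the next section
            # Entries look like "    name: text" or "    name (type): text".
            match = re.match(r"^\s{1,4}(\*{0,2}\w+)\s*(\(.*?\))?:", line)
            if match:
                names.append(match.group(1).lstrip("*"))
    return names


def check_function(func) -> list[str]:
    """Return mismatches between a function's signature and its docstring."""
    sig_params = [
        name for name in inspect.signature(func).parameters
        if name not in ("self", "cls")
    ]
    doc_params = documented_params(func.__doc__ or "")
    problems = []
    for name in sig_params:
        if name not in doc_params:
            problems.append(f"{func.__name__}: parameter '{name}' is not documented")
    for name in doc_params:
        if name not in sig_params:
            problems.append(
                f"{func.__name__}: docstring mentions unknown parameter '{name}'"
            )
    return problems


def say_hi(name: str, shout: bool = False) -> str:
    """Greet a user.

    Args:
        name: The user name.
        loud: Whether to use upper case.
    """
    return f"Hi, {name}"


for problem in check_function(say_hi):
    print(problem)
```

A CI job could run such a check over every public function and fail when the list of problems is non-empty. Handling the other docstring styles, nested sections, and unusual indentation is where the real effort would go.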

I’m not sure who would be willing to put the effort to try to bring consensus among the several current conventions…

I think you may be overestimating the work involved in documenting, standardizing, and parsing one of the existing formats, but I understand if you’re not interested in championing such an effort.

For what it’s worth, I suspect that you would receive quite a bit of community help if you were to pursue this approach.

I added the features I’m interested in to the PEP…

Yes, I saw those, but I didn’t think they were compelling.

Editing & Rendering: These are already supported by all popular IDEs and language servers for existing docstring formats. The proposed PEP doesn’t add anything here other than a redundant mechanism.
Deduplication and elimination of inconsistencies: As I suggested in my alternative proposal, there are other (arguably better) ways to address this through tooling, and this would benefit the thousands of library maintainers and millions of developers who have already invested in existing docstring formats.
Reuse of documentation using type aliases: As I’ve said, the latest draft of the PEP is very problematic and inconsistent in how it treats type aliases. If you were to fix what I think is a significant design flaw, the proposal would no longer address this problem.
Access to documentation at runtime: This is also addressed by my alternative proposal.
No microsyntax to learn for newcomers: The proposed PEP requires newcomers to learn yet another way to specify documentation — one that differs from existing conventions. The microsyntax rules are not onerous to learn, and AI-powered copilots already know how to generate them. As mentioned in my alternative proposal, tooling could further simplify this through automation and validation.
Parameter documentation for ParamSpec: This is already handled today with traditional function docstrings. The proposed PEP adds no new value here.

I presume that only a subset of these are problems that you personally face as the maintainer of FastAPI. For example, the “microsyntax” issue is presumably not a problem for you. It would be useful to understand which of these problems represent pain points for you personally. That might help us find a solution that meets your needs. I’m guessing that you’re primarily focused on “deduplication and elimination of inconsistencies”?

I see that Pyright now supports some non-standard formats (e.g. strings under variable names)

Pyright has long treated string literals that appear immediately after an attribute or type alias declaration as a docstring for that symbol. Other IDEs do this as well.

Attribute docstrings are discussed in PEP 257. It says String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called “attribute docstrings”. PEP 257 was written prior to the introduction of type aliases, but I think it’s reasonable for tools to treat type aliases consistently in this regard.

It’s important to understand that attribute and type alias docstrings rely only on constructs that are built in to the language. They have no dependency on third-party libraries or their (non-standardized) behaviors. Pyright has no knowledge of any third-party libraries or their behaviors beyond the type information provided by them. That’s a hard-and-fast rule that I’m unwilling to violate — for the same reason that it would be completely inappropriate for the TypeScript compiler to have intrinsic knowledge of certain third-party libraries.

Third-party libraries do not go through the same level of design scrutiny that stdlib does, they are not guaranteed to retain consistent semantics over time, they are not version-locked with Python releases, and they do not have the same backward compatibility guarantees as stdlib. If you want pyright to add support for a feature that requires knowledge of specific classes, these classes must be part of the stdlib, and their intended usage and semantics must be documented in the official Python documentation.

It’s possible that you could convince the pylance team to add support for an annotated-doc library at the language server level. They’ve done this in a (very small) number of cases where there is demand voiced by many pylance users over a sustained time period. However, adding support at the pylance layer is less robust than something implemented at the type checker layer. For example, it wouldn’t be possible for pylance to retain parameter comments for decorated methods that use ParamSpec because that requires the type checker to track the documentation across type transforms. I can’t speak for the pylance team, but given the process they use for prioritizing features, I think it’s unlikely that you could convince them to do this as an experimental feature. You could reach out to them and gauge their interest.

I appreciate that you are trying to address a real problem that you face in maintaining FastAPI. Unfortunately, I don’t think your proposal is aligned with what is best for the Python community.

Implementing your current proposal in FastAPI with the hopes that you can force tools vendors to support another non-standardized documentation mechanism is not the approach I would use. You might succeed in the long run, but you will create additional fragmentation and potentially generate ill will in the process.

I want to make it clear that the statements above are my own opinions. I’m not speaking for the recently-formed Typing Council, of which I’m a member. There may be other viewpoints among the other TC members. We haven’t had any discussions on this topic yet because it hasn’t been submitted to us for consideration.

tiangolo · December 11, 2023, 6:59am

This doesn’t really sound like a simple script to write.

Actually, current popular IDEs support rendering, not editing. Maybe PyCharm does, but VS Code doesn’t provide any help while editing docstrings.

All the other solutions you propose to these points mean building something new from scratch, a new library, for a problem that is solvable by using Python syntax (as in this PEP). It’s a valid point that it’s solvable, to write a parser, linter, refactoring tools, new editing features, etc. But all that is work that hasn’t been done, it can’t be compared to the work that was already done to parse, lint, refactor and edit standard Python syntax.

The microsyntax definitely is a problem for me. Editors don’t provide me good support for editing these formats, nothing else on top of allowing me to type characters inside of a multiline string.

I understand PyCharm has somewhat better support (if not perfect), but I wanted to be able to stay in VS Code, and I also wanted not to depend on an extended feature provided by some editors that is not really writing Python code.

I want syntax highlighting, underline red lines for syntax errors, missing names, extra names, refactoring and renaming a parameter should update the name in the docstring, removing a parameter should remove its docstring, altering the order of a parameter should alter the order in the docstring, checking that two similar functions have the same documentation for a parameter should be doable without having to write and/or use an entire custom docstring parser.

I understand, thank you. Maybe this is a valid approach that wouldn’t affect the rigorousness required for Pyright. Totally understandable it wouldn’t be as robust as Pyright, but it might be a simpler and more volatile way to try the experiment of using this idea.

This is a biased and very strong claim against my intentions and I’m sad to hear you say that.

I have mentioned several times that it can work as a test bed for this experiment. Since the beginning, the problem has been that something new like this would require a standard, but a standard would require consensus, consensus would require some real-world usability, and real-world users would need tooling support, so it’s a cyclic dependency. I took the hit in FastAPI of putting in the work to use it, despite not having the desired tooling support, to try this out.

Of course, I hoped tools would consider adding at least some experimental support for it given it would only affect a controlled group (FastAPI users) but at the same time it would provide user data for the experiment from real users. I’m taking the risk of making my project and users the A/B test group.

Your claims completely disregard any possible good intentions as motivation.

Thank you for the clarification.

About the PEP, I understand what you are asking me is to retire it. To abandon the idea of putting documentation in Annotated.

That’s definitely the shortest path of action.

Nevertheless, I would be willing to go through a few more editing iterations before discarding the idea completely.

That’s where I ask for your guidance.

I’ll remove the mentions of transferring type alias documentation.

From your point of view, what else in the current state makes the PEP immediately discardable? I need to know what else to remove before editing.

erictraut · December 11, 2023, 8:17am

From my perspective, the inconsistent treatment of type aliases is the biggest issue. Thanks for offering to make that change.

The other issue that I mentioned earlier is that the PEP doesn’t indicate whether the documentation should be rendered as markdown or plain text. I recall that you were reluctant to take a stand on this, but without specifying it, library maintainers and IDEs will inevitably make inconsistent assumptions.

tiangolo · December 11, 2023, 12:15pm

This is great feedback, I’ll work on it now.

Meanwhile, let’s talk about Markdown.

I personally would favor Markdown, but I know this topic is probably more controversial. I’m trying to think how to reconcile both ideas: having a defined expected behavior and not ruling out other preferences.

I imagine people favoring non-Markdown flavors would probably be already comfortable with their current in-docstring systems, so there’s a lower chance that non-Markdown users would use this, which makes me think most potential users of this would favor Markdown, like me.

So, how about adding another optional argument, format, to define the format of the string, similar to how pyproject.toml defines the format of the README, with Markdown as the default?

An argument named content_type with explicit MIME types (which is what pyproject.toml supports) sounds too long to type in code, which is why I propose format. I’m thinking of a shorter syntax that offers the main predefined formats under short names, with Markdown as the default.

E.g.:

from typing_extensions import Annotated, Doc

def say_hi(name: Annotated[str, Doc("The user name", format="rst")]): ...

# OR

def say_hi(name: Annotated[str, Doc("The user name", format="md")]): ...

# Default to Markdown

def say_hi(name: Annotated[str, Doc("The user name")]): ...

What do you think @erictraut? (or anyone else).
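
To make the idea concrete, here is a minimal sketch of how a tool might read such metadata at runtime. The `Doc` class below is a hypothetical stand-in defined locally, not the `typing_extensions` implementation; its `format` field is only the proposal floated above, and the helper `param_docs` is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for the proposed Doc, with the suggested format field."""

    documentation: str
    format: str = "md"  # assumed default, per the suggestion above


def say_hi(
    name: Annotated[str, Doc("The *user* name")],
    greeting: Annotated[str, Doc("Greeting word", format="rst")] = "Hi",
) -> str:
    return f"{greeting}, {name}"


def param_docs(func) -> dict[str, Doc]:
    """Collect the Doc metadata attached to each annotated parameter."""
    docs: dict[str, Doc] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(func, include_extras=True).items():
        if get_origin(hint) is Annotated:
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Doc):
                    docs[name] = meta
    return docs


for name, doc in param_docs(say_hi).items():
    print(name, doc.format, doc.documentation)
```

A renderer could then dispatch on `doc.format` to decide whether to treat the text as Markdown or reStructuredText.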

pf_moore · December 11, 2023, 12:22pm

Frankly, I think that if a documentation string for an argument needs to use Markdown or ReST, it’s already way too long for me to be happy seeing it in the argument list of a function (as opposed to the docstring, which is intended for long-form textual information). And equally, if you’re writing enough text to care about ReST vs Markdown, saving a few characters doesn’t seem worth the cost of not conforming with the existing (pyproject.toml) approach.

tiangolo · December 11, 2023, 12:53pm

I agree that in most cases it will be a very simple string, that would actually be the same content in Markdown, ReST, or anything else.

I can only think of its usefulness for adding bold or italics.

But still, here we are, it has to be defined in some way, I guess.

@erictraut here’s the PR updating the usage of this for type aliases. Would be great if you could give it a check to confirm it addresses your concerns around type aliases. PEP 727: Specify `Doc` in type aliases documents the type alias symbol, update rejected ideas by tiangolo · Pull Request #3581 · python/peps · GitHub

Regarding Markdown or anything else, I’ll update it afterwards in subsequent PRs as needed, depending on what we conclude here in the conversation, but I wanted to start with the type alias as the conclusion for that is quite clear already.

pf_moore · December 11, 2023, 12:59pm

So why do you even need editor support if the typical case will be a short string of text with (almost) no markup? I’m confused.

jamestwebber · December 11, 2023, 2:08pm

The intent of my comment was just to point out that the status quo is an acceptable outcome here. The previous post was written as if to suggest we must change things in some way and it was just a matter of details.

The proposals in this thread are definitely more powerful, but they come with downsides.

DanielNoord · December 12, 2023, 1:44pm

pylint has had support for this for many years, with support for all three main styles described here. You could use it to create a CI check that everything is still consistent or just copy the code and change it to your desired format. (Contributions are also welcome of course)

abatea · December 13, 2023, 1:12pm

Putting doc per-arg: cue the arguments about how to render them, how to sort them, etc.
Hand-write the docstring. Arguments don’t live in a vacuum (they relate to each other, are sometimes constrained by each other, etc). The function is the relationship between inputs and outputs, so it should be documented as a whole. Docstring is also the only reasonable way to document *args and **kwargs

I probably wouldn’t use this PEP and I’d resent the pressure to use it if it were accepted

johnthagen · December 13, 2023, 2:55pm

Since it hasn’t been mentioned, I thought I’d share the PyCharm features here as prior art of how Google Style docstrings can be parsed and provide a good user experience (it wouldn’t be hard for Pylance/VS Code to implement this I bet).

PyCharm supports setting your docstring format to “Google”:

Python Integrated Tools | PyCharm Documentation

When this is enabled, typing """ after a function signature makes PyCharm pre-populate the docstring’s Args: section.

If you omit or misspell a parameter, you get a warning:

Pressing Alt+Enter quick fix on the missing parameter will allow for auto-inserting it into the docstring:

The docstrings are rendered nicely for the user when using Quick-doc:

Argument names are autocompleted:

The section title blocks are autocompleted:

Google Style Docstring Community Support

Google style docstrings are already supported in a variety of other tools:

webknjaz · February 21, 2024, 12:18am

I’d like to add that one of the reasons I use the built-in Sphinx RST fields-based style is because those other styles cause performance hits during parsing. As an example, the darglint tool that validates them can get thousands of times slower as the style is switched (Performance issues in `google` and `numpy` style parsers · Issue #186 · terrencepreilly/darglint · GitHub)

OTOH, it seems like there’s a pylint plugin that might not suffer from this problem.

JarroVGIT · February 24, 2024, 6:12pm

After dedicating several hours to thoroughly reviewing this PEP, along with the accompanying discussion and related PEPs mentioned in the comments, I felt compelled to share my perspective. My experience includes a strong involvement in the FastAPI community, where I gained substantial familiarity with its codebase and design principles. However, due to a shift in personal priorities, my contributions diminished in the past year. Recently, as I revisited FastAPI to update myself on the latest developments, I was struck by the significant increase in code complexity, notably in files that expanded from approximately 900 lines to 4500 lines. While I deeply respect the efforts of @tiangolo, I found that the extensive documentation blended into the codebase hindered readability to the point that I just wasn’t able to read it anymore.

Upon examining this PEP, I gained a better understanding of its objectives. Nevertheless, I respectfully hope it does not proceed for the following reasons:

Practicality: The proposal will adversely affect code readability.
Conceptual clarity: Integrating documentation within the codebase may dilute the coherence of the code itself. Jumping between Python syntax and natural language creates a rather large mental burden, as I found when going through FastAPI again…
Social implications: The proposed changes could impose undue professional pressure to adopt practices that, despite being optional, may not seem realistic in practice.

That said, I acknowledge the PEP’s intention to enhance documentation accessibility. I was particularly intrigued by the discussion on type aliases, considering how they could potentially streamline documentation at the beginning of files, albeit at the expense of adjacency to its symbol (which was, after all, touted as a benefit in this PEP).

I align with @erictraut’s suggestion that formalizing docstrings could capture the PEP’s benefits without compromising code clarity. Despite some resistance to microsyntaxes within the PEP, I believe refining docstring practices could address current gaps for various stakeholders, including editors, developers, and documentation specialists, without necessitating documentation placement adjacent to symbol declarations, which could ultimately complicate the codebase.

Maybe offtopic

Although I don’t have any notable credentials that would lend credibility to such an effort, I would definitely be open to helping out. There are many more benefits in having a formalized docstring convention (although the discussions on implementation details will probably be… interesting). I have collapsed this because it is more for a different topic.

huonw · May 21, 2024, 2:02am

(Reviving this old thread! Thanks for tolerating the necro.)

Is there an example for how enum values might be handled?

I note that the PEP explicitly considers class variables as dataclass-style declarations, where there’s usually a type annotation to be wrapped in Annotated. However, enums also use class variables for their values, and these typically don’t have individual type annotations, so it seems like there’s nowhere to annotate. From reading the PEP and this thread, this appears to not have been explicitly considered yet.

Example: we have a code-base that uses the string-under-definition style with enums and seems to work well with IDEs and doc-generators:

import enum


class Fruit(enum.Enum):
    banana = enum.auto()
    "actually a berry"

    tomato = enum.auto()
    "really is"

My editor will then preview the Fruit.banana docs when relevant.
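
For context, these strings are bare expression statements that are not attached to the enum members at runtime, so tools that show them read the source instead. The following is a minimal sketch of how such a tool might do that with the standard `ast` module; the helper name `attribute_docstrings` is hypothetical, and real tools likely handle more cases (annotated assignments, module-level names, and so on).

```python
import ast

SOURCE = '''
import enum

class Fruit(enum.Enum):
    banana = enum.auto()
    "actually a berry"

    tomato = enum.auto()
    "really is"
'''


def attribute_docstrings(source: str, class_name: str) -> dict[str, str]:
    """Map each simple assignment in a class body to the string literal right after it."""
    docs: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            body = node.body
            # Pair every statement with the one that immediately follows it.
            for stmt, following in zip(body, body[1:]):
                is_simple_assign = (
                    isinstance(stmt, ast.Assign)
                    and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                )
                is_string = (
                    isinstance(following, ast.Expr)
                    and isinstance(following.value, ast.Constant)
                    and isinstance(following.value.value, str)
                )
                if is_simple_assign and is_string:
                    docs[stmt.targets[0].id] = following.value.value
    return docs


print(attribute_docstrings(SOURCE, "Fruit"))
```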

irgolic · May 27, 2024, 6:21pm

I think using Doc in function signatures does not look so nice, but I believe it has a good use case in defining schemas with python types.

In this example, a union of literals is defined, wherein each literal group has a different doc.

github.com/asynchronous-flows/asyncflows/blob/c1ea94ab34f48e5ed01f955f1575e5817deaafa6/asyncflows/models/config/model.py#L7-L58

ModelType = (
    # ollama models
    Annotated[
        Literal[
            "ollama/llama3",
            "ollama/llama3:8b",
            "ollama/llama3:70b",
            "ollama/gemma",
            "ollama/gemma:2b",
            "ollama/gemma:7b",
            "ollama/mixtral",
            "ollama/mixtral:8x7b",
            "ollama/mixtral:8x22b",
        ],
        Doc(
            "Run inference on [Ollama](https://ollama.com/); defaults `api_base` to `localhost:11434`"
        ),
    ]
    |
    # openai models

This file has been truncated.

lod · June 20, 2024, 4:14am

Jumping on this because I feel it’s a great example of why sphinx’s #: syntax is useful and should be expanded to function definitions. (And formally adopted).

def send(
    self,
    request: PreparedRequest,
    stream: bool = False,  #: Pass through content for large requests
    timeout: Optional[Union[float, Tuple[float, float]]] = None,  #: seconds
    verify: Union[bool, str] = True,  #: ssl certificates, path or disable
    cert: Optional[Union[str, Tuple[str, str]]] = None,  #: client side certificate
    proxies: Optional[Mapping[str, str]] = None,  #: Proxy server map
) -> Response:
    ...

The comments don’t interfere with the existing code, no negative impacts to readability or flow. They also already have editor and viewer highlighting support, as shown here.

I believe that most function argument documentation is relatively simple and can be mapped into one line. When that isn’t the case then users should fall back to using verbose docstrings - just as they do now.

pawamoy · June 20, 2024, 11:27am

That wouldn’t address the need for docstrings to be available at runtime though.
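
A small sketch of why that is the case: comments never reach the parsed syntax tree, so anything working from the compiled function cannot see them, and a tool would need to re-read the source text. The simplified `send` signature and the helper `doc_comments` below are hypothetical and only illustrate the point.

```python
import ast
import io
import tokenize

SOURCE = '''
def send(
    stream=False,  #: Pass through content for large requests
    timeout=None,  #: seconds
):
    pass
'''

# Round-tripping through the AST drops the comments entirely.
print("#:" in ast.unparse(ast.parse(SOURCE)))


def doc_comments(source: str) -> dict[int, str]:
    """Return the '#:' comments found in the source text, keyed by line number."""
    comments: dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith("#:"):
            comments[tok.start[0]] = tok.string[2:].strip()
    return comments


print(doc_comments(SOURCE))
```

Recovering the comments this way requires the original source to be available, which is not guaranteed at runtime.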

DanCardin · July 10, 2024, 11:55pm

I don’t necessarily love/prefer Doc as the recommended way of defining documentation in general. What I do think is worthwhile is that it provides a standard location for that information to live in the first place. And the type system is the only preexisting location that would work for both function attributes and class attributes.

One could imagine “attribute docstrings” being something that python itself (or else a library) formalizes and preprocesses into Doc Annotations automatically (seems like as good a place as any, if this were officially accepted). Ditto docstrings. And if/when that happens, it’s a lot easier for a tool that wants to introspect that information, than needing to manually traverse each option individually.

pradyunsg · July 15, 2024, 7:51am

What are the next steps here?

I see that this PEP has not been updated in 7 months now (according to peps/peps/pep-0727.rst at main · python/peps · GitHub) and there doesn’t seem to be any movement in the discussion here (arguments made earlier in the thread are being repeated and a few opinions/omitted details are being mentioned but not provoking additional design discussion for the PEP).

It seems to me that this is in a position where the SC can be asked to make a decision on accepting/rejecting this PEP.

@tiangolo @Jelle Is there something that needs to happen before the submission of this PEP to the SC?

Jelle · July 15, 2024, 2:15pm

I think the discussion in this thread makes it clear that the PEP will not be accepted, and therefore I’d recommend that @tiangolo mark this PEP as withdrawn.