PEP 727: Documentation Metadata in Typing

Thanks for the feedback @mikeshardmind (also pre-discussion).

The problem with supporting another Python syntax is that it would be quite invasive: it would “reserve” that syntax exclusively for this purpose, it could require implementation changes in Python or in the types themselves (e.g. supporting __matmul__ on types), and that would make it non-backwards-compatible.

The idea with this proposal is to be as minimally invasive as possible, and as optional/opt-in as possible.

Parsing docstring formats is non-trivial (more about that in a previous reply by @pawamoy). If, in your libraries and use cases, you can afford to be tightly coupled to Sphinx, have it as a dependency, and use it to extract the info, that’s perfectly fine, and I would definitely not expect people in that position to adopt this. The problem is that this doesn’t apply to everyone else. For example, in FastAPI I can’t depend on Sphinx to extract parameter documentation (the same goes for Pydantic, Typer, etc.).

I wouldn’t expect people who are already comfortable with their workflows to change or migrate. I consider it purely opt-in, like TypedDicts, for example: not everyone bothers to use them, but they’re there in the standard for those who want them.

Additionally, although yes, this is another way to spell the same thing, it would be the first one to be somewhat standardized; all the other docstring formats are microsyntax conventions tied to specific tools that also require quite some learning. This would hopefully be fairly easy to learn for those who want to adopt it, as it’s just the same Python syntax.

Edit: I just realized I can reply/quote multiple things in a single post, so I’ll reply to the other things here.

Yep, if you consider the docstring independent of the parameters, then the verbosity of the parameters would increase, at least for those using this.

This would be beneficial mainly for people writing and maintaining docstrings that have to scroll back and forth when dealing with and documenting a specific parameter.

At the same time, if this feels more cumbersome than maintaining the docstring independently, even with the duplicated info, I wouldn’t expect those teams and projects to use it.


In particular, I don’t expect existing codebases and projects that are comfortable with their setup to migrate to this.

But I would expect new tools and projects to find this a bit simpler to use than to learn and acquire the expertise and workflows needed to use previous tools based on different microsyntax formats.

Thanks @EpicWink!

I think the main problem with that is that it would imply that a raw str found in Annotated now has a specific pre-defined meaning. By using a specific object (class) instead, tools can keep using raw strings there however they want (if they have been doing that), as they currently can.
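
To make the contrast concrete, here’s a small sketch (the import location is an assumption; a doc() helper wasn’t available anywhere at the time of writing, and this proposal suggests typing_extensions as its experimental home):

from typing import Annotated

from typing_extensions import doc  # assumed home while this proposal is experimental

# Today: a bare string inside Annotated has no assigned meaning, and tools are
# free to use it for their own purposes.
name_plain: Annotated[str, "the name of the user"]

# With this proposal: the dedicated object makes the intent unambiguous, and
# bare strings keep whatever tool-specific meaning they already had.
name_documented: Annotated[str, doc("The name of the user.")]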

1 Like

Long before this, I had contacted the teams behind VS Code and PyCharm. They expressed their concerns about verbosity, but also said that they would consider implementing support for this only if it were a PEP.

I just emailed them; it would be great to hear their opinions.

I wouldn’t expect people that are comfortable with their current docstring formats to migrate. By the point they are fully using them, they probably already learned all the microsyntax details, adapted their workflows, etc. They might be more productive with their current system (each of them).

Nevertheless, when I’ve had to deal with and maintain/update docstrings for parameters, I’ve always struggled, in particular when there are a lot of parameters and documentation, as they can end up pages below. For teams struggling with this, it could be interesting to have this option.

Funnily enough, writing a migration tool for each microsyntax would require a parser for that syntax, whereas writing valid code for this PEP would be much more straightforward (as it’s standard Python). And for the same reason, it would be much more straightforward to migrate away from this proposal than away from each microsyntax.

I agree this would be quite useful, but I marked this use case as explicitly not required for conformant implementers, suspecting the implementation would be more difficult. But if editors and tooling (e.g. @pawamoy’s Mkdocstrings, Sphinx, VS Code, PyCharm) think this would be acceptable to require, I would change it.

Standards are rarely actually opt-in in the real world. The problem becomes: “well, this became the standard, and now tooling supports this method, so we have to use this method to use modern tooling.” This is fairly invasive even if the syntax were good and there were consensus on it being good, as it pushes the ecosystem towards churn on docs. I don’t believe this to be an improvement in the proposed form (or in the off-handed alternative I presented as an example when asking about alternatives being considered), as it’s been shown above to get extremely verbose while interspersing code and non-code elements in ways that may not be quick and easy to review.

2 Likes

This is a good point.

I had stronger opinions about having a function in previous versions when it was also expected to be used as a decorator, but that’s not the case anymore.

My main concern was that once this goes into stdlib it would be pretty much “set in stone”, and I considered a function would be more flexible, but again, most arguments are quite weak now.

My main reasons for a function were:

  • It’s easier to tweak the implementation afterwards, but this makes more sense for things like typing.deprecated() that need to do more. I can’t think of any future scenario where doc() would need to do anything other than return the class.
  • It’s easier to change the type annotations of a function than those of a class; I had to change from classes to functions in FastAPI a long time ago for this reason, to override the return type to Any for some things. Nevertheless, this would only matter here if Annotated acquired additional semantics about the possible return types of the things inside it. So, again, a weak argument.
  • I thought it would be easier to extend doc() with multiple parameters, but that applied before, when it took **kwargs; now it would be equally “problematic” to add more fields to doc() as to extend the DocInfo class.

@Jelle, do any of these cases seem potentially useful/valid to you in the future? If not, then I think we could just have a pure class and that’s it.


In that case, removing doc(), I would rename the class to Doc or to doc. Do you (or anyone else) have opinions about this?

Thanks for the PEP!

I’m curious how other (modern) programming languages deal with documentation metadata. Maybe it’s worth adding a comparison?

4 Likes

That’s the main point of having something beyond a raw string. Currently a raw string can validly mean anything; having a function/class dedicated to this would affirm “this string is actually intended for docs”.

Thanks @baggiponte for the warm comments!

Yeah, the problem with that is that it would require changing Annotated, its semantics, and its meaning. By not changing anything and just using what is already there, this would be less invasive.

Now, specifically about params in Annotated, I imagine you refer to something like Annotated[str, doc="the name of the user"], right? That would mean keyword arguments in generics, which isn’t supported (although there have been some discussions about that), but that’s a much larger topic than this.
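
The reason is syntactic: keyword arguments inside subscriptions were proposed in PEP 637 and rejected, so that spelling is simply not valid Python, which is why this proposal wraps the string in an object instead (a sketch, with the import location assumed as elsewhere in this thread):

from typing import Annotated

from typing_extensions import doc  # assumed home while this proposal is experimental

# name: Annotated[str, doc="the name of the user"]   # SyntaxError: keyword
#                                                     # arguments in subscripts
#                                                     # are not supported
name: Annotated[str, doc("the name of the user")]     # the proposed form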

Indeed, signatures would be longer for those adopting this. But ideally, library authors would only adopt this if they are okay with the extra verbosity.

I think the point of view would also depend on if it comes from someone maintaining the docstring or only the signature, as this (potentially) improves things mainly for people maintaining docstrings.

1 Like

I wouldn’t expect/intend for this to be strictly recommended. I would consider it opt-in, as with TypedDict: it’s already there, but not everyone is forced to use it (and not many do, but that’s fine).

Agreed, I think this would handle that concern.

I agree that the version without Annotated is much cleaner. Nevertheless, the version without it also doesn’t have any docstring; I think it would be fairer to compare against a version with a docstring.

For functions that don’t need a docstring, I wouldn’t expect them to have doc() either.

At the same time, I wouldn’t expect this to be considered the only way to do things or the exclusively recommended approach, the same way I wouldn’t consider unittest the recommended library for testing. But it would be just one possible way to do it, and for now, the only way that doesn’t require an external tool and a microsyntax that requires learning and adapting.

I wouldn’t expect this to be heavily adopted by existing libraries and codebases that are already okay with their current internal docstring formats and workflows. But I would expect this to be a bit easier to learn and adopt for new codebases.

And I would hope this would be an easier approach to maintain docs for parameters than with the current docstring microformats.

I would only expect teams and codebases that are struggling with those issues to migrate to this, not everyone.

Maybe it would be useful to document that this is not the only way to document parameters, and maybe to show the other options more prominently, not just by name but also with links?

Ah! I didn’t intend to discourage argument documentation in function docstrings, only to show an alternative that I would consider more convenient in some cases, with the drawback of verbosity. But maybe I could rephrase some of the content so it doesn’t sound like it strictly discourages documenting parameters in docstrings, if that’s how it comes across.

Which section of the text do you think directly discourages documenting parameters in docstrings? And what alternative text do you think would be better aligned with that intent?

IMO, while arg: type1 | type2 = defaultval is often a documentation improvement, arg: Annotated[type1 | type2, "docstring"] = defaultval is bloat. I would not use this. If I need more than the function name and protocol to document what it does, I write docstrings.

Somewhat off-topic discussion of how "optional" this will be:

black doesn’t have a PEP, but its mere existence and standardishness has led people to submit “helpful” PRs where they completely reformat my projects[1], or auto-format files they touch and turn 1-5 line PRs into messes where the semantic changes are buried.

If there’s a PEP saying “this is the way to document arguments”, I guarantee somebody will write a tool that parses numpydoc or similar docstrings and converts them to this format. And then someone will batch-run it on every project that they vaguely interact with and submit PRs that I’ll have to reject, saying “This project doesn’t adhere to PEP 727.”

As an outsider, I’ve seen maintainers of projects that don’t want to move their configuration into pyproject.toml, which is not a standard, get brigaded and accused of being rude, when the only alternative to rudeness is to respond with sufficient length to the most recent person to make a demand of you.

I’m at +/-0 on this practice, but -1 on this PEP.


  1. I’m generally okay with black, and use it in a number of projects

14 Likes

Now, specifically about params in Annotated, I imagine you refer to something like Annotated[str, doc="the name of the user"], right? That would mean keyword arguments in generics, which isn’t supported (although there have been some discussions about that), but that’s a much larger topic than this.

Of course, makes sense.

This change would imply I either add a doc inside Annotated or as a default parameter (e.g. with Depends(...) in FastAPI)?

I wouldn’t think this would be the exclusively recommended way to document arguments (I just commented more on that in reply to Inada).

Maybe I could rephrase some of the text to make it more explicit that this is not the only way, and that it’s fine for projects to adopt a different pre-existing docstring microformat. Is there some way of saying that which you would feel is acceptable?

Now, nevertheless, although I agree it’s definitely more verbose, I would consider it much easier to maintain documentation for parameters this way than in docstrings, far away from the function signature. I’ve had to deal with long signatures whose docs live in docstrings far, far away from the actual signature, and I have to scroll back and forth.

I think that if writing and maintaining a docstring is considered a different job from writing the code itself, or is in some way isolated from it, then this would negatively affect those in charge of writing only signatures and internal code, not docstrings. But for people writing and maintaining docstrings, I would hope this would be a net positive, despite the extra verbosity.

I wouldn’t think that people writing simple one-liners would bother much with writing docs for them, in docstrings or in Annotated, so I wouldn’t think this would affect them.

It’s true that if you don’t care much about the docstring, then having the docs intermixed with the signature would get in the way. Now, about having some sort of syntax highlighting, I haven’t seen many (or any?) use cases where there’s no syntax highlighting available, not recently (in the past 5 or 10 years).

Do you think it’s very common for people to have to deal with code without any syntax highlighting? In which use cases? Maybe I’m missing something or not having in mind some scenarios.

For this reason multiline strings are allowed, the same as with docstrings.
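
For illustration, a multiline documentation string would look something like this (a sketch only; the function and its description are invented here, and the import location is assumed as elsewhere in this thread):

from typing import Annotated

from typing_extensions import doc  # assumed home while this proposal is experimental

def schedule(  # hypothetical function, for illustration only
    when: Annotated[
        str,
        doc(
            """
            A time string such as "2023-01-23 13:15".
            Interpreted in the local timezone unless a zone is given.
            """
        ),
    ],
) -> None: ...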

I wouldn’t consider it harder to notice a typo in Annotated than in a docstring, but I guess that depends on taste.

How would you write the same function including a docstring that documents all the parameters? Wouldn’t the body of the function be dominated by a multiline string as well?

I would consider @pawamoy’s example fairly readable.

I agree I wish there were a better way to achieve all this, but having dealt with docstrings for parameters in multiple formats across different projects, always struggling with the same problems, I would be much happier dealing with the verbosity here than with what I had to deal with in those cases.

I would consider this much better than current approaches, at least from the point of view of someone writing and maintaining docstrings.

But then again, how would you imagine a better way that would be achievable in Python? What would be ideal from your point of view?

1 Like

You’re right. This wouldn’t (and shouldn’t) be accepted before it’s proven to be useful.

Having it in typing_extensions will help a lot to figure out whether this is actually useful or not, while we continue the conversation here. But I had to start here to see if the parties involved would be interested (and would benefit from this).

For now, I know that it would be adopted at least by FastAPI, Typer, SQLModel, Asyncer (from my side), Pydantic, Strawberry, and others (from other teams). And it would be supported by tools like Mkdocstrings, maybe others.

I would like to hear the opinions from Sphinx, VS Code, PyCharm, Jedi, Jupyter (I’ve contacted them all).

This is definitely true. And the same as with type annotations, it has to be marked as strictly optional and opt-in. Is there something in the text that you think could be changed (and how) to bring this point across better?


A good thing is that we can use my projects (FastAPI, etc.) as a testbed and see if this has any real value. Probably others can try it as well, and we can see how it is perceived.

But yeah, as this has never been done before, we don’t have a way to know how it feels to work with it, and whether it improves or harms efficiency, developer experience, workflows, etc.

This reveals the problem right here: you value making your own use easier over what other projects are already using. You keep calling this opt-in, but at the same time, you say that there are tools and IDEs ready to jump at implementing whatever is standardized. This isn’t opt-in, no matter how much you claim it is; people familiar with the situation with typing, who only adopted it due to IDE completions, can attest to as much.

The three most common standardized docstring styles are Sphinx, numpy, and Google. If the problem you are having is that these aren’t formalized enough, wouldn’t it create less friction for existing projects, and be less disruptive to people already using those styles, to reach out to each of those projects and more rigidly standardize the parameter section of their docstrings, to the point where formal parsing is easy?

And as was pointed out prior to this discussion thread being made, other people have found it possible to parse all three of these for parameter docs: https://github.com/Rapptz/discord.py/blob/f74eb14d722aa1bc90f9d0478199250d2eb4e81b/discord/app_commands/commands.py#L168-L186
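
To give a sense of scale, here is a minimal sketch of that kind of extraction, covering only the numpy style (this is not the implementation linked above, which also handles the Google and Sphinx styles and many more edge cases):

import inspect
import re


def numpy_param_docs(func) -> dict[str, str]:
    """Map parameter names to their numpy-style docstring descriptions."""
    text = inspect.getdoc(func) or ""
    section = re.search(r"Parameters\n-+\n(.*?)(?:\n\n[A-Z]\w+\n-+\n|\Z)", text, re.S)
    if not section:
        return {}
    docs: dict[str, str] = {}
    name, buffer = None, []
    for line in section.group(1).splitlines():
        if line and not line.startswith(" "):       # "name: type" header line
            if name is not None:
                docs[name] = " ".join(buffer).strip()
            name, buffer = line.split(":")[0].strip(), []
        else:                                        # indented description line
            buffer.append(line.strip())
    if name is not None:
        docs[name] = " ".join(buffer).strip()
    return docs


# numpy_param_docs(some_function) -> {"param_name": "Its description.", ...}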

7 Likes

I would actually disagree. I don’t consider unittest the recommended tool; I would think it’s pytest. I don’t think type annotations are enforced; they are opt-in, and there are codebases that don’t use them. Not everyone using dicts is creating TypedDicts.

But having a simple and centralized way to achieve that could be useful.

I think something that plays a big role in this is how important documentation is for a codebase. If reviewing docstrings is important, then having them next to their parameters would be useful.

Having the docstrings below, far away from their respective signatures, would also be difficult to review. I would argue it’s even more difficult, as you wouldn’t have a quick way to check whether the docstring for lastname is indeed referring to the right parameter, instead of last_name. But if there’s one single place where that information, the name of the parameter, is located, then the problem just doesn’t exist.

I think you saw @pawamoy’s example of documenting the long function. How would you document it using docstrings? I would consider it to be at least as complex to review as the version with Annotated.

I strongly disagree about this being hard to review. Parameter names are rarely, if ever, actually changed, and only need to be kept in sync when they are. If you’re worried about types ending up out of sync, for frequently reused and updated types, you can just use type aliases.

    async def schedule_event(
        self: Self,
        *,
        dispatch_name: str,
        dispatch_time: str,
        dispatch_zone: str,
        guild_id: int | None = None,
        user_id: int | None = None,
        dispatch_extra: object | None = None,
    ) -> str:
        """
        Schedule something to be emitted later.

        Parameters
        ----------
        dispatch_name: str
            The event name to dispatch under.
            You may drop all events dispatching to the same name
            (such as when removing a feature built on top of this)
        dispatch_time: str
            A time string matching the format "%Y-%m-%d %H:%M" (eg. "2023-01-23 13:15")
        dispatch_zone: str
            The name of the zone for the event.
            - Use `UTC` for absolute things scheduled by machines for machines
            - Use the name of the zone (eg. US/Eastern) for things scheduled by
              humans for machines to do for humans later
        guild_id: int | None
            Optionally, an associated guild_id.
            This can be used with dispatch_name as a means of querying events
            or to drop all scheduled events for a guild.
        user_id: int | None
            Optionally, an associated user_id.
            This can be used with dispatch_name as a means of querying events
            or to drop all scheduled events for a user.
        dispatch_extra: object | None
            Optionally, extra data to attach to dispatch.
            This may be any object serializable by msgspec.msgpack.encode
            where the result is round-trip decodable with
            msgspec.msgpack.decode(..., strict=True)

        Returns
        -------
        str
            A uuid for the task, used for unique cancelation.
        """

This is an example from a public hobby project involving correct asynchronous scheduling of future serialized tasks including proper timezone support. Tossing all of this information for users next to the types is not useful to a developer. The documentation of public APIs generally isn’t for the person maintaining the function, but the one using it.

6 Likes

I think that people who use typing have a very incomplete view of just how “optional” typing is. And I think this PEP will be similar. The social pressure on projects to adopt these features is very significant - particularly as IDEs like PyCharm and VS Code use the information they provide to make the experience of the consumer of the library so much better. But that improved consumer experience comes at a cost to the developer experience of creating the library, and speaking as a developer, being pressured to add typing (and in the future, parameter documentation) can be very demotivating, as the request is often worded to give the impression that “your project isn’t good enough” 🙁

This is exactly what I mean, and I’m not at all sure it’s off-topic. An awful lot of the responses from @tiangolo boil down to “don’t use it if you don’t have the problems it addresses”. And I wish it was that simple, but it really isn’t.

So experiment as a 3rd party library. Find out what works. Don’t standardise things until you know.

If VS Code and PyCharm won’t support this unless it’s a standard, then good. Let’s see if it works as a feature for other use cases, and if it doesn’t, then we have valuable information about how useful the proposal is. If it does, then it can be standardised on that basis, and PyCharm/VS Code can support it then. Or if it’s sufficiently popular, they can reconsider their “only if it’s a standard” position. You mentioned dataclasses, as a standard version of attrs. So why can’t this feature be the attrs of argument documentation? If it works, tools will special-case support for it and it can then be standardised. If it doesn’t, we lose nothing.

Not relevant. There’s no PEP defining unittest as the Python testing approach.

I was pushed to add type annotations for editables even though I had no use for them, or interest in them. My views have changed over time, but I had to put time and effort into saying “no” which I’d have preferred to avoid.

You’re putting a lot of emphasis on runtime accessible documentation. For most of my use cases, human-readable documentation is what is key. Whether that’s docstrings, hand-written documentation, or whatever, machine-readability is almost never a key requirement.

As a plain text description. Why is it so hard to accept that many people prefer human-written text, over machine-generated content stitched together from a collection of machine-readable parts?

15 Likes

Thanks. I don’t really foresee a case where using a class would be problematic in the future. If we want to add new parameters later (e.g. the version of the library in which the parameter was added, or something like that), we can simply add new constructor parameters and new attributes to the class.
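
Purely as a hypothetical illustration of that kind of extension (the since parameter is invented here, not part of the proposal), the class could later grow without changing existing use sites:

class Doc:
    """Documentation metadata for use inside Annotated (sketch only)."""

    def __init__(self, documentation: str, /, *, since: str | None = None) -> None:
        self.documentation = documentation
        self.since = since  # hypothetical later addition, e.g. "added in 2.1"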

I think I mildly prefer Doc over doc as the name of the class, because usually things that you use only in annotations have uppercase names, but I don’t feel strongly. As precedent, we can see that annotated-types (https://github.com/annotated-types/annotated-types) uses camel case for the names of objects that get put in Annotated.
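
For instance, combining an existing annotated-types constraint with the proposed class would read something like this (a sketch, assuming Doc is importable from typing_extensions while the proposal is experimental):

from typing import Annotated

from annotated_types import Gt        # existing annotated-types constraint
from typing_extensions import Doc     # assumed home while this proposal is experimental

Age = Annotated[int, Gt(0), Doc("The user's age in completed years.")]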

3 Likes

I like the general idea but am also not keen on the verbosity, and it feels a bit strange that this comes from typing.

Here’s an example diff converting from Google-style to PEP 727-style:

Especially for codebases using Black, I find it much less readable compared to a compact docstring. And then we still have the docstring for the intro and return values, now separated from the parameters.

In one larger example, we went from this:

def naturaltime(
    value: dt.datetime | dt.timedelta | float,
    future: bool = False,
    months: bool = True,
    minimum_unit: str = "seconds",
    when: dt.datetime | None = None,
) -> str:
    """Return a natural representation of a time in a resolution that makes sense.
    This is more or less compatible with Django's `naturaltime` filter.

    Args:
        value (datetime.datetime, datetime.timedelta, int or float): A `datetime`, a
            `timedelta`, or a number of seconds.
        future (bool): Ignored for `datetime`s and `timedelta`s, where the tense is
            always figured out based on the current time. For integers and floats, the
            return value will be past tense by default, unless future is `True`.
        months (bool): If `True`, then a number of months (based on 30.5 days) will be
            used for fuzziness between years.
        minimum_unit (str): The lowest unit that can be used.
        when (datetime.datetime): Point in time relative to which _value_ is
            interpreted.  Defaults to the current time in the local timezone.

    Returns:
        str: A natural representation of the input in a resolution that makes sense.
    """

To this:

def naturaltime(
    value: Annotated[
        dt.datetime | dt.timedelta | float,
        doc("A `datetime`, a `timedelta`, or a number of seconds."),
    ],
    future: Annotated[
        bool,
        doc(
            "Ignored for `datetime`s and `timedelta`s, where the tense is always "
            "figured out based on the current  time. For integers and floats, the "
            "return value will be past tense by default, unless future is `True`."
        ),
    ] = False,
    months: Annotated[
        bool,
        doc(
            "If `True`, then a number of months (based on 30.5 days) "
            "will be used for fuzziness between years."
        ),
    ] = True,
    minimum_unit: Annotated[str, doc("The lowest unit that can be used.")] = "seconds",
    when: Annotated[
        dt.datetime | None,
        doc(
            "Point in time relative to which _value_ is interpreted. "
            "Defaults to the current time in the local timezone."
        ),
    ] = None,
) -> str:
    """Return a natural representation of a time in a resolution that makes sense.
    This is more or less compatible with Django's `naturaltime` filter.

    Returns:
        str: A natural representation of the input in a resolution that makes sense.
    """

Here’s mkdocstrings parsing the original Google-style docstrings: Time - humanize

I feel it may be better to try and standardise (with adjustments/improvements where necessary) one of the existing docstring-based methods. (PEP 722 comes to mind, which is proposing to standardise a comment-based technique.)

8 Likes

Thanks for writing the proposal, Sebastián.

I share some concerns about the utility of this mechanism over existing docstring pseudo-standards. (I’m reminded of this xkcd cartoon.) Those concerns have already been well articulated by others in this thread, so I won’t repeat them. Instead, I’ll focus my response on the technical aspects of the draft proposal.

Confusion about what the doc string is documenting
In places, the text of the PEP seems to be confused about what the doc string is actually documenting. I think it should be made clear that a doc string annotation has meaning only when used in conjunction with the declaration of a symbol, and it documents the intended meaning and use of that symbol. There’s already a well-defined way to provide docstrings for classes, functions and modules. This PEP is proposing a standardized way to provide doc strings for other types of symbols: parameters, type aliases, class-scoped variables (class vars and instance vars), and local variables. Such documentation is useful because it can be presented to users when they hover over identifiers within their code. It can also be associated with that symbol for runtime introspection.

The current draft starts to become confusing in the “Additional Scenarios” section. Under “Type Alias”, for example, it says that to: Username “is equivalent to” to: Annotated[str, doc(…)]. That doesn’t make sense because the doc string for the type alias Username documents the symbol Username, which is a type alias. This documentation should logically describe that type alias and how it’s intended to be used. This documentation should not apply to the parameter to, which is a completely different symbol. The to parameter should have its own documentation that is distinct from the documentation for the Username type alias.
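
For example, the distinction I’m drawing would look like this (a sketch; Doc is assumed to be importable from typing_extensions while the proposal is experimental, and the example content is invented):

from typing import Annotated

from typing_extensions import Doc  # assumed home while this proposal is experimental

# Documents the type alias symbol Username itself.
Username = Annotated[str, Doc("A unique handle: lowercase ASCII characters.")]

def send_message(
    # Documents the parameter to, independently of the alias documentation.
    to: Annotated[Username, Doc("The recipient of the message.")],
    body: str,
) -> None: ...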

Likewise, the section titled “Annotating Type Parameters” (which is technically talking about “type arguments”, not “type parameters”) makes no sense because the doc string isn’t associated with any symbol here and therefore has no meaning, nor would it have any utility in an editor or for runtime introspection. If the intent is to provide documentation for the to parameter, then the doc string should be specified as to: Annotated[list[str], doc(…)].

The same confusion applies to the “Annotating Unions” and “Nested Annotated” sections.

I recommend that the spec clarify up front what the doc string is documenting, as I propose above. If this clarification is added, then many of these “Additional Scenarios” sections can be deleted because they have no valid meaning.

Calling the doc function
The draft PEP mentions that doc accepts a single argument called documentation. That implies that the argument can be specified either positionally or by keyword. Is there a motivation for not making this positional-only? The reason I ask is that doc string processing will require special-case handling within tooling, and if you allow developers to pass it either by position or keyword, then it increases the complexity of implementing this special-case handling. If there’s good motivation for supporting a keyword argument, then that’s fine. But if there’s no good motivation, then it would be preferable to make it positional-only.
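
For illustration, the positional-only variant would look like this (a sketch, not the PEP’s normative definition; the same would apply to Doc.__init__ if the plain-class spelling is chosen):

class DocInfo:
    """Return type of doc(), carrying the documentation string (sketch only)."""

    def __init__(self, documentation: str, /) -> None:
        self.documentation = documentation


def doc(documentation: str, /) -> DocInfo:
    # Positional-only: tooling only ever needs to handle doc("...") and never
    # doc(documentation="..."), simplifying the special-case handling.
    return DocInfo(documentation)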

Markdown
The draft spec intentionally avoids taking a stand on whether the documentation should be interpreted as markdown. I think that’s a problem. The main purpose of this PEP is to provide a way for documentation to be viewed by developers within their editors, but it’s not clear how an editor should display the resulting documentation. Should it interpret the string as markdown or not? I think the PEP should take a stand here and say that doc strings will be interpreted as markdown. If a tool vendor, for some reason, doesn’t want to parse the markdown, they can choose to display it as raw text. After all, markdown is designed to be readable as raw text. If the PEP remains noncommittal about whether the doc string should be interpreted as markdown, then developers may start using raw text strings that happen to be formatted poorly when interpreted as markdown. We run into that problem today for traditional docstrings.

ParamSpec captures
When a function’s signature is captured by a ParamSpec, I presume that any doc strings associated with the parameters should be retained. I don’t know if this needs to be spelled out in the PEP, but perhaps it should be. Or perhaps it should just be a recommended implementation.
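
For example, with a typical ParamSpec-based decorator that uses functools.wraps, the wrapped function’s Annotated metadata (including any Doc entries) remains introspectable on the wrapper at runtime (a sketch assuming Python 3.10+ and Doc from typing_extensions):

import functools
from typing import Annotated, Callable, ParamSpec, TypeVar, get_type_hints

from typing_extensions import Doc  # assumed home while this proposal is experimental

P = ParamSpec("P")
R = TypeVar("R")


def logged(func: Callable[P, R]) -> Callable[P, R]:
    @functools.wraps(func)  # copies __annotations__, so the Doc metadata survives
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        return func(*args, **kwargs)

    return wrapper


@logged
def greet(name: Annotated[str, Doc("The name to greet.")]) -> str:
    return f"Hello, {name}!"


hints = get_type_hints(greet, include_extras=True)
# hints["name"] is Annotated[str, Doc("The name to greet.")]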

Language comparison
As others have already suggested, I think it would be useful to include a short section that provides a survey of other modern programming languages and how they deal with the problem of documenting parameters (and potentially type aliases and class-scoped variables).


Eric Traut
Contributor to pyright & pylance

11 Likes