Should the stdlib provide an API to examine function/class field descriptions in docstrings?

pawamoy · September 14, 2023, 2:49pm

I very much agree with @brettcannon here.

Before even thinking about providing APIs to parse different docstring formats into structured data, these formats must at least be properly specified (they are not), and ideally support documenting the same things (they do not).

Many tools have implemented parsers for Google-style or Numpydoc-style docstrings: it’s not that difficult. The issue is that they all probably differ a bit (because these styles have no specs), and that the structured data you get out of them is not the same. This means documentation generators and other tools must not only parse each of these formats, but also render the resulting data differently for each one. This is super cumbersome. Adding new docstring formats without standardizing the data itself will only make these tools’ job more complicated.

I tried to address the second point in Griffe by declaring data classes that are used by both parsers (Google and Numpydoc), a kind of common denominator of structured data. The documentation renderer (mkdocstrings-python) can then declare templates for each of these classes without even knowing whether they come from Google or Numpydoc docstrings.

It works, but I had to drop some Numpydoc features (See also, Warnings, References, because they are just markup), add some features to the Google style (named returned/yielded/received values, Methods), and add some to both (Functions, Classes, Modules, generic admonitions). As long as the styles share common ground on data for only a subset of their features, this will be difficult to maintain or evolve.

So IMO the absolute first thing to do, before creating new docstring formats (even ones already based on a data-friendly declarative syntax like TOML), is to standardize the data itself.

Here is the data that Griffe currently handles:

regular text sections: plain markup, like Markdown, rST, Asciidoc, etc.
parameters sections: a list of parameters, each with a name, type, description (markup) and default value
other parameters sections: same thing, for keyword arguments, without default values
raises sections: list of exceptions raised by a function/method/property, each with a description
warns sections: list of warnings emitted by a function/method/property, each with a description
returns sections: list of returned values (think tuples), each with an optional name, a type, and a description
yields sections: list of yielded values (again, tuples) for iterators/generators, each with an optional name, a type, and a description
receives sections: list of received values (again, tuples) for generators, each with an optional name, a type, and a description

…as well as summary sections, like attributes, functions/methods, classes, and modules:

attributes have a name, a type, a description, and an optional value
functions/methods have a signature (can be just their name) and a description
classes have a signature (can be just their name) and a description
modules have a name and a description

…as well as admonition-like sections, such as examples, notes, warnings, deprecations, and any other generic kind that uses the syntax of the chosen style (tip, danger, quote, see also, preview, you name it), because users don’t like to mix style syntax with markup syntax in their docstrings.
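A minimal sketch of how such generic sections can be collected, assuming a Google-like syntax where an unindented `Title:` line is followed by an indented block: any title not reserved for structured data becomes an admonition whose kind is the normalized title and whose contents stay unparsed markup. `STRUCTURED`, `Admonition` and `extract_admonitions` are hypothetical, and the header pattern will also catch ordinary prose lines ending with a colon.

```python
import inspect
import re
import textwrap
from dataclasses import dataclass

# Titles assumed to map to structured sections rather than admonitions.
STRUCTURED = {"args", "arguments", "parameters", "other parameters", "raises",
              "warns", "returns", "yields", "receives", "attributes"}

# A Google-like section header: an unindented "Title:" line.
HEADER = re.compile(r"(?P<title>[A-Za-z][A-Za-z ]*):\s*")


@dataclass
class Admonition:
    kind: str      # normalized title, e.g. "see-also"
    title: str     # title as written by the author
    contents: str  # markup, left unparsed for the renderer


def extract_admonitions(docstring: str) -> tuple[str, list[Admonition]]:
    """Split a docstring into its remaining text and its generic admonitions."""
    lines = inspect.cleandoc(docstring).splitlines()
    kept: list[str] = []
    admonitions: list[Admonition] = []
    i = 0
    while i < len(lines):
        match = HEADER.fullmatch(lines[i])
        if match is None or match["title"].strip().lower() in STRUCTURED:
            kept.append(lines[i])
            i += 1
            continue
        title = match["title"].strip()
        body: list[str] = []
        i += 1
        # The section body is every following blank or indented line.
        while i < len(lines) and (not lines[i].strip() or lines[i][0].isspace()):
            body.append(lines[i])
            i += 1
        admonitions.append(Admonition(
            kind="-".join(title.lower().split()),
            title=title,
            contents=textwrap.dedent("\n".join(body)).strip(),
        ))
    return "\n".join(kept).strip(), admonitions


doc = """Compute things.

    Args:
        x: A number.

    Tip:
        Pass small numbers.

    See Also:
        `other_function`.
    """
text, admonitions = extract_admonitions(doc)
```

Here `text` keeps the summary and the `Args:` block for the structured parsers, while `admonitions` holds a `tip` and a `see-also` entry that a renderer can map to whatever admonition markup it supports.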

I was seduced by the idea behind PEP 727 because it moves the data out of the docstrings, so that docstrings can simply be written using the chosen markup (Markdown, rST, etc.), without mixing it with a particular docstring style. Here is an example of what it can accomplish: Examples - Griffe TypingDoc. No docstring style is used there, so no docstring parsing is required (the whole docstring is “parsed” as a single regular text section, and the collected data is inserted before or appended after it).