Explicit parameter list in function documentation

EpicWink · May 30, 2022, 10:35pm

I prefer the parameters of a function (with their documentation) to be listed explicitly in the function’s documentation. This is in contrast with man-style, which describes the positionals in prose.

Compare open (what I call man-style) with numpy.linalg.norm (what I say employs an explicit list).

I think it’s more than a preference however: I think the available parameters are easier to identify, and typing is obvious. It also reduces the mixing of parameters with side-effects and further function details (for better or worse).

This doesn’t make sense for all functions however, for example listing each parameter for pow would be unnecessarily verbose.

Ideally this could apply to the function’s docstring as well.

guido · May 31, 2022, 4:53am

I think we borrowed the implicit style from Emacs docstrings. I’m happy to officially declare that we’re past that. But updating 1000s of functions and methods in the library docs will take time…

steven.daprano · May 31, 2022, 10:02am

I don’t know what “man style” means to you. To me, I think of examples like man man which starts off with a synopsis:

SYNOPSIS
   man  [-C  file] [-d] [-D] [--warnings[=warnings]] ...

and then later goes on to list and describe each option

General options
   -C file, --config-file=file
          Use this user configuration file rather than the default of ~/.manpath.

I can’t reconcile your comments to what I see in the examples you give. You seem to be using “man style” to mean that parameters aren’t listed explicitly, but to me it looks like man pages do list parameters (options) explicitly.

When I look at the two examples you give, the open built-in and numpy’s linalg.norm, it isn’t clear to me which one you prefer. Both examples list their parameters explicitly:

 open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

 linalg.norm(x, ord=None, axis=None, keepdims=False)

and then both go on to list and describe each parameter explicitly, just as man pages do (well, at least some of them) and so I have no idea which of those two examples you prefer, or why. Sorry.

Looking at your third example, the builtin pow, just confuses me even more. You say:

“listing each parameter for pow would be unnecessarily verbose.”

but pow takes only three parameters! If listing a mere three parameters is too verbose, then how would you describe a list of open’s eight parameters?

erlendaasland · May 31, 2022, 11:02am

You can also use man to read the docs for stuff like system calls (man section 2, for example man 2 open) and library functions (man section 3, for example man 3 printf). I believe Laurie was thinking of those, but I may be mistaken.

storchaka · May 31, 2022, 11:17am

Like Steven, I am confused and do not understand what the OP means.

malemburg · May 31, 2022, 11:35am

I believe the OP is talking about explaining the parameters of a function inline (as we do now in the docs, e.g. the open function) vs. presenting the parameters as a definition list (as is done in the numpy docs, e.g. numpy norm function).

Both have their pros and cons.

IMO, the parameter definition list is better in cases where you have more than just a few parameters and they are less dependent on each other, whereas the inline version is more readable for functions with few parameters, or in cases where parameters are used in groups and context is therefore required to understand the individual parameters of such groups.

Of course, you can also do both: use a definition list and add extra paragraphs explaining the context of parameter groups.
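A minimal sketch of that combined approach, assuming a NumPy-style docstring; the function clamp and its parameters are hypothetical and chosen only for illustration. Each parameter gets its own entry, and a separate Notes paragraph explains how low and high work together.

```python
def clamp(value, low=None, high=None):
    """Limit *value* to a range.

    Parameters
    ----------
    value : float
        Number to limit.
    low : float or None, optional
        Lower bound; ``None`` means unbounded below.
    high : float or None, optional
        Upper bound; ``None`` means unbounded above.

    Returns
    -------
    float
        *value*, or the nearest bound if it lies outside the range.

    Notes
    -----
    *low* and *high* work as a pair: when both are given, *low* must not
    exceed *high*, otherwise ``ValueError`` is raised.
    """
    # Group-level rule that no single parameter entry can express on its own.
    if low is not None and high is not None and low > high:
        raise ValueError("low must not exceed high")
    if low is not None and value < low:
        return low
    if high is not None and value > high:
        return high
    return value
```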

EpicWink · May 31, 2022, 1:14pm

Yes this. Sorry for not being clear and making assumptions in my original post.

By man-style, I mean documentation (of both C functions and shell commands) where the non-flag parameters (ie positional) are described in prose just after the signature. This is also recommended by click.

reverse X [--repeat N]

Output the reverse of the string X

OPTIONS

--repeat N   repeat the output N times

To list explicitly means to have effectively a bullet-list of all (positional and keyword etc) parameters under the signature, with a list item for each parameter (including parameter typing and description).

reverse(X, repeat=1)

Reverse a string

Args:
   X (string): string to reverse
   repeat (int): repeat the reverse string multiple times

Returns:
   string: reversed string
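As a Python docstring, the explicit-list layout above might look like the following minimal sketch. It assumes Google-style sections, and reverse_inline is a hypothetical name added only to show the inline style for contrast.

```python
def reverse(x, repeat=1):
    """Reverse a string.

    Args:
        x (str): String to reverse.
        repeat (int): Number of times to repeat the reversed string.

    Returns:
        str: The reversed string, repeated ``repeat`` times.
    """
    return x[::-1] * repeat


def reverse_inline(x, repeat=1):
    """Return the reverse of string *x*, repeated *repeat* times."""
    # Same behaviour; the parameters are described in prose instead of a list.
    return x[::-1] * repeat
```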

I disagree; I think inline style is more relevant where it doesn’t make sense to describe the parameters individually. For example, compare the following with pow’s actual documentation:

pow(base, exp)

Exponentiate a number

Args:
   base: number to exponentiate
   exp: power to exponentiate to

Returns:
   resultant value

steve.dower · May 31, 2022, 3:38pm

Is this meant to be a negative example (that is, showing that “Return base to the power exp” (source) is more sensible than the six lines of text you showed above)? I hope so, but if not, you wouldn’t be the first to suggest requiring blind adherence to such a style.

Personally, I’m a big fan of being able to document functions in whatever way makes the most sense. When it makes the most sense to define each parameter in its own paragraph (in an indented list or otherwise), we can do that, but when the function is simple enough there’s no need to follow any particular rule.

So even with all the clarifications, I’m still not sure what the request is. If there are functions with documentation that you believe would be clearer with parameter names at the start of each paragraph describing them, rather than somewhere in line, feel free to submit suggestions on improving those docs.

We love documentation improvements in the form of suggested improvements to the documentation! We’re always a little skeptical of documentation improvements in the form of mandating a writing or formatting style.

steven.daprano · June 1, 2022, 1:05am

+1 to this.

Doc improvements that actually improve the documentation are great.
Forcing a documentation standard just for the sake of consistency is
not.

CAM-Gerlach · June 12, 2022, 8:44am

I don’t think anyone’s arguing for a rigid style required everywhere as opposed to applying a more structured format with reasonable discretion, and implemented incrementally by contributors motivated to do the work, not mandated by fiat.

However, presenting the key API information in the Python standard library reference in a more consistent, structured and easily-retrievable manner can have a huge cumulative impact on the productivity and quality of life for the ≈millions of developers around the world who consult it regularly (related: python/docs-community#50).

Personally, and at least anecdotally speaking to a variety of beginners and pros, having to constantly hunt through paragraphs of prose for parameter info and infer or outright guess at key details left implicit or missing (param types, return values, etc) is one of the biggest usability barriers with the Python Standard Library reference, particularly in contrast to the standard structured formats used by nearly all other API reference docs for third-party Python libraries.

Adopting a consistent, easily-scannable structure for the stdlib function/method/class reference (whether Numpydoc, Google, Sphinxdoc, etc. [1]; or something simpler/more flexible) would both make it easier for readers to quickly and painlessly find and parse the critical details (function inputs and outputs, i.e. param and return types and values), and aid authors in ensuring they clearly and explicitly state them in the first place.

Take, as an example, the API reference for subprocess.run or subprocess.Popen. The former is over a full page, and the latter six pages, of interleaved prose paragraphs, warnings, notes, “new/changed in”, “availability” and more, without much clear organization. Adopting a consistent high-level structure would allow readers to quickly navigate to a given parameter, determine which of these elements apply to which, and check the type and other details at a glance (many of which are currently not explicitly stated at all).

This also fits with the evidence-based guidance of @DanieleProcida’s Diataxis framework that we’re working on moving towards on what makes effective Reference documentation (emphasis original):

Diataxis guidance

Reference guides are technical descriptions of the machinery and how to operate it. Reference material is information-oriented.

The only purpose of a reference guide is to describe, as succinctly as possible, and in an orderly way.

Reference material should be austere and to the point. One hardly reads reference material; one consults it. There should be no doubt or ambiguity in reference; it should be wholly authoritative.

You’ll expect to find information about these sorts of things presented in much the same way for each one.

Reference material benefits from consistency. Be consistent, in structure, language, terminology, tone.

[1] To note, even under specific formats like Numpydoc, the existing prose paragraphs and other elements can be retained as they are now under each parameter, or moved above or below the params/return type sections, as desired.

guido · June 12, 2022, 3:09pm

In my experience the pandas and numpy docs are much more usable than the current Python docs.

ezio-melotti · June 12, 2022, 8:47pm

+1

Currently we use italic for arguments, and that doesn’t really stand out much. This problem is clearly visible in the two subprocess examples linked by @CAM-Gerlach above.

If we used something like bold or code, it would be easier to find them while skimming through the docs. However, since arguments don’t use a specific role, changing the way *...* is rendered will also affect italic text that is not an argument, so this solution probably can’t be adopted unless we rewrite the markup for all the arguments.

Using a bullet list is another simple solution that could make the args more identifiable. Also note that open/subprocess.run/Popen are a bit of an exception, since they have several args that need somewhat lengthy descriptions, and while those can be improved, I’m not suggesting that we start using bullet lists or similar solutions for other cases too.

EpicWink · June 13, 2022, 12:01am

My suggestion for this is to describe these complex parameters twice: once in the way they are now, and once in a bullet-list (or similar) of one or two sentences each, which quickly identifies the parameter and possibly refers to the prose for more information.

Yes

I think listing parameters usually helps, but there are certainly exceptions (eg most maths functions, where parameter names are meaningless).

I realise that documenting arguments which are to satisfy protocols (objects with specific attributes) is currently difficult with typing. Typing supports protocols well enough, but the documentation generation may struggle with them in bullet-lists.

CAM-Gerlach · June 13, 2022, 4:17am

And if we do that, we may as well make the more meaningful improvement of organizing them in a structure that’s easier for users consulting the reference documentation to quickly parse at a glance, with the key details present without having to trawl through paragraphs of text.

Yeah, subprocess.Popen is a fairly exceptional case (open being another, as you mention), and not the best choice on my part when I should have gone for a more typical example. Exceptional cases like these can and likely should be handled specially, with e.g. as you suggest, a bulleted list of param names, types and a short summary, each linked to a subheading (or other construct) that explains each in more detail. In fact, this is exactly how it is handled for e.g. the highly complex methods in the argparse module, such as add_argument.

Right; likewise, for non-standard (though prevalent in some areas of the stdlib) cases like these on the other end of the spectrum, that don’t have true named parameters, or otherwise depart from normal Python-level syntax/semantics, they can and should be kept as they are.

CAM-Gerlach · June 13, 2022, 4:21am

A more typical example, picked somewhat arbitrarily from the docs I had open, is, say, the logging.Formatter class constructor, which has a medium number of parameters (5). At present, the parameters are buried in the prose, and one isn’t even mentioned there at all; users have to read through the Changed in version 3.8 annotation at the end to even find it. Furthermore, it contains a lot of duplicate information split between the section introduction and class object itself.

Revised to use standard Sphinxdoc fields (since several other callables in the document use it, it’s built into regular Sphinx, is designed and intended for this purpose, and allows easily enhancing/changing the layout, styling, index etc. in one place without touching the content), I find it much more suited to its intended purpose as a quickly-accessed, succinct, complete and unambiguous reference to the constructor, communicating the same information (and more) in a much more useful way: [1]
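For readers unfamiliar with the format, here is a minimal sketch of Sphinxdoc info fields in a docstring. SimpleFormatter is a hypothetical, simplified stand-in loosely modelled on a formatter class; it is not the real logging.Formatter documentation, nor the revision previewed in this post. The param_fields helper is likewise an assumption, added only to show that the fields are easy to pick out mechanically.

```python
import inspect
import re


class SimpleFormatter:
    """Format a log record as text (simplified, hypothetical stand-in).

    :param fmt: Format string for the whole message.
    :type fmt: str | None
    :param datefmt: Format string for the date/time portion.
    :type datefmt: str | None
    :param style: Placeholder style, one of ``'%'``, ``'{'`` or ``'$'``.
    :type style: str
    """

    def __init__(self, fmt=None, datefmt=None, style="%"):
        self.fmt = fmt
        self.datefmt = datefmt
        self.style = style


def param_fields(obj):
    """Return {name: description} for each ``:param NAME:`` field of *obj*."""
    doc = inspect.getdoc(obj) or ""
    # Each field sits on its own line, so a line-anchored pattern is enough here.
    return dict(re.findall(r"^:param (\w+): (.+)$", doc, flags=re.MULTILINE))
```

Calling param_fields(SimpleFormatter) yields one entry per documented parameter, in the order they were written.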

The logging module docs are also a great example of the present lack of even basic consistency in how parameters are presented:

LogRecord constructor and setLogRecordFactory use Sphinxdoc
logging.basicConfig uses a table
The Formatter constructor and Logger.debug use prose

They’re a mix of constructors, functions and methods, adjacent in the same document, each having between a half dozen and a dozen parameters, with no clear motivation for the potpourri other than whatever the individual contributor happened to use at the time. Documenting and implementing a consistent, tailored structure for this information makes it much easier for readers to quickly recognize and reference without the cognitive burden of processing three different representations, and avoids doc contributors having to manually guess at how they should ideally represent it.

[1] The current styling isn’t ideal; it’s a little dense and could use a little more whitespace, but that’s easy to tweak in the stylesheet/theme. See, for example, how Lutra currently looks for another constructor in this module.

encukou · June 13, 2022, 1:11pm

Perhaps, but I wouldn’t mind a docs style guide that documented good defaults and best practices, like PEP-8 does for code style. (Or used to, before people started treating the auto-enforceable suggestions as law.)

If I were to use this I’d ask questions like:

When is it good to have an explicit parameters list?
(When) should I add a Returns section?
Should New/Changed in notes be with individual parameters, or the whole function?
Should class parameters be documented under __init__, or the class itself?
Should there be type hints? Where?
What to put in the docstring?

I’d say – let’s start adding “Numpy-style” parameter lists where they make most sense, but also start a style guide so we can be consistent where it’s best to be consistent.

ezio-melotti · June 13, 2022, 6:16pm

There’s already this:

https://devguide.python.org/documenting/#style-guide

monk-time · June 14, 2022, 3:55pm

Wow, IMO the page on the right of that image is a significant improvement over the original!

CAM-Gerlach · June 24, 2022, 4:49am

Sorry for the late reply—this fell through the cracks.

Yes, indeed—it could be part of our existing Style Guide, as @ezio-melotti mentioned (especially since it should get its own page soon, following the reorganization). The focus would be on giving doc writers an easy to follow structure and helpful guidelines for writing better API Reference documentation, rather than prescriptively mandating absolute, inflexible rules. Some additional experimentation, discussion and real-world proofs of concept will be useful to help inform this.

To that end, I propose opening a PR with the changes previewed above, so others can better render, view and review it for themselves.

My proposed responses to the questions follow, for the sake of having something to start with:

When is it good to have an explicit parameters list?

Ideally, it makes sense whenever a non-trivial stdlib function has parameters that are not positional-only, *args/**kwargs, self, etc., that follow the standard Python semantics (for example, not cases like the pow function that @EpicWink describes above), and that are not otherwise a special case for some reason.

At least for Sphinxdoc, there’s no fixed overhead (just :param NAME: and go), so it makes the guideline simple and consistent to apply, while leaving room for flexibility. Even with only one or a few parameters, a consistent, structured format still not only helps readers find key information faster than reading a paragraph, but also helps writers provide that information, clearly, concisely and explicitly in a manner well-suited to Reference docs.

That said, given it will be applied incrementally over time to existing functions, the initial focus will be on those with relatively many parameters where the net value is the highest. Once we’ve improved those, coupled with other Diataxis-inspired enhancements, we can then focus on functions with fewer parameters.

(When) should I add a Returns section?

Likewise, ideally consistently yes, aside from any special cases. At least for Sphinxdoc, since the structure is handled automatically and the types are auto-linked, there is little overhead to writing

:return: A list of matches.
:rtype: list[str]

instead of

It returns a :class:`list` of :class:`str` containing the matches.

whereas the result is more concise, more explicit and easier to navigate and reference. It could be omitted for functions that return None, so long as that is done consistently. However, all it requires is writing :return: None, which is more explicit for readers (particularly beginners, given how often they seem to make the mistake of thinking methods that mutate objects in place return the object) and makes it clear that the return type wasn’t simply omitted by accident, so it still seems worth doing if we’re revising the function/section anyway.
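A minimal sketch of both cases in docstring form, assuming Sphinxdoc fields; find_words and sort_in_place are hypothetical functions written only for illustration. The second one shows the explicit :return: None on a function that mutates its argument, which is the situation beginners tend to misread.

```python
import re


def find_words(text):
    """Find every word in *text*.

    :param text: Text to search.
    :type text: str
    :return: A list of matches.
    :rtype: list[str]
    """
    return re.findall(r"\w+", text)


def sort_in_place(items):
    """Sort *items* in place.

    :param items: List to sort; it is modified directly.
    :type items: list
    :return: None
    """
    # list.sort() mutates the list and returns None, and so does this wrapper.
    items.sort()
```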

But likewise, since the improvement will be incremental and alongside adding params and other enhancements, the initial focus will be on functions with many parameters or that are otherwise complex, leaving the simpler cases for later.

Should New/Changed in notes be with individual parameters, or the whole function?

If the note is specific to one parameter, ideally with the parameter, for maximum locality, visibility and ease of reference for those reading it. However, due to a Sphinx issue, adding it there causes inconsistent line breaks/spacing, so we should either fix that in Sphinx or work around it with CSS hacks in the theme before we do that; for now, I didn’t include it in my example.

Should class parameters be documented under __init__, or the class itself?

Directly under the class appears to be the standard convention for Sphinxdoc and Numpydoc, and fits with the fact that generally in the stdlib reference, no separate __init__ is shown; the class constructor is already documented directly under the class, so this makes things simpler for both readers and writers and avoids churn. But if an existing module does already document the __init__ and the class itself separately, then that can just be kept for now; while consistency (especially within a module) is nice, I don’t see it as worth changing existing content over, given that the benefit to readers is much less clear.
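In docstring terms, that convention might be sketched as follows; Greeter is a hypothetical class used only for illustration. The constructor parameters live in the class docstring, and __init__ carries no docstring of its own.

```python
class Greeter:
    """Produce greetings (hypothetical example class).

    :param name: Who to greet.
    :type name: str
    :param punctuation: Text appended to each greeting.
    :type punctuation: str
    """

    def __init__(self, name, punctuation="!"):
        # No separate docstring: the parameters are documented on the class.
        self.name = name
        self.punctuation = punctuation

    def greet(self):
        """Build the greeting.

        :return: The greeting text.
        :rtype: str
        """
        return f"Hello, {self.name}{self.punctuation}"
```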

Should there be type hints? Where?

I consider type hints (per se), at least in the signature, out of scope here since it will likely be controversial and not everyone is familiar with them. However, the parameter and return types should be included via the standard Sphinxdoc :type: and :rtype: fields, since this is often otherwise left implicit and up to the reader to guess at.

Preferably, the types should be expressed using the standard type annotation syntax if practicable, e.g. float, list[str] or int | None (since it is precise, unambiguous and Sphinx understands it and can automatically link the appropriate types and constructs), but if there isn’t a precise known type, it would be overly complex or impractical to express, or the author simply isn’t familiar with how to do so, it could simply be described in informal language in the field.
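A minimal sketch of that split, assuming Sphinxdoc fields; first_index is a hypothetical function. The types are written with annotation syntax inside the :type: and :rtype: fields, while the signature itself stays free of type hints.

```python
def first_index(items, target):
    """Find the first position of *target* in *items*.

    :param items: Strings to search.
    :type items: list[str]
    :param target: String to look for.
    :type target: str
    :return: Index of the first match, or ``None`` if there is none.
    :rtype: int | None
    """
    for index, item in enumerate(items):
        if item == target:
            return index
    return None
```

Because the types appear only in the docstring, first_index.__annotations__ stays empty and nothing about the signature changes at runtime.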

What to put in the docstring?

I don’t really have a strong or well-informed opinion on this, since CPython is the first project I’ve worked on that duplicates API references in both the docstrings and separate documentation, and I’m honestly not super-familiar with the current practice in this regard. My offhand impression from looking at a number of them a while ago was that the docstrings were somewhat neglected compared with the separate docs, but I’m not sure if that’s still true (if it ever was) and what the current policy and practice is on this, so I’d defer to others on that.

CAM-Gerlach · June 24, 2022, 4:50am

In other news, I drafted another example for sqlite3.connect(), in response to @erlendaasland pointing out some serious issues with the current structure on a recent PR: