Explicit parameter list in function documentation

I don’t think anyone’s arguing for a rigid style required everywhere as opposed to applying a more structured format with reasonable discretion, and implemented incrementally by contributors motivated to do the work, not mandated by fiat.

However, presenting the key API information in the Python standard library reference in a more consistent, structured and easily-retrievable manner can have a huge cumulative impact on the productivity and quality of life for the ≈millions of developers around the world who consult it regularly (related: python/docs-community#50).

Personally, and at least anecdotally from speaking to a variety of beginners and pros, having to constantly hunt through paragraphs of prose for parameter info, and to infer or outright guess at key details left implicit or missing (param types, return values, etc.), is one of the biggest usability barriers with the Python Standard Library reference, particularly in contrast to the standard structured formats used by nearly all API reference docs for third-party Python libraries.

Adopting a consistent, easily-scannable structure for the stdlib function/method/class reference (whether Numpydoc, Google, Sphinxdoc, etc. [1]; or something simpler/more flexible) would both make it easier for readers to quickly and painlessly find and parse the critical details (function inputs and outputs, i.e. param and return types and values), and help authors ensure they clearly and explicitly state them in the first place.
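
To make the comparison concrete, here is a minimal sketch of what a structured parameter list looks like in the Numpydoc style. The function clamp and its parameters are hypothetical, invented purely for illustration, and it is written as a docstring only so the example can be run; the proposal here concerns the reST reference docs.

```python
def clamp(value, lower, upper):
    """Restrict a number to a closed interval.

    Parameters
    ----------
    value : float
        The number to restrict.
    lower : float
        The smallest value that may be returned.
    upper : float
        The largest value that may be returned.

    Returns
    -------
    float
        ``value`` if it lies within ``[lower, upper]``, otherwise
        the nearest bound.
    """
    # Hypothetical example function; not part of the standard library.
    return max(lower, min(value, upper))


print(clamp(15.0, 0.0, 10.0))
```

Each input and the output has a fixed place, so a reader can jump straight to the detail they need without reading the whole description.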

Take, for example, the API reference for subprocess.run or subprocess.Popen. The former is over a full page, and the latter is six pages of interleaved prose paragraphs, warnings, notes, “new/changed in”, “availability” and more, without much clear organization. Adopting a consistent high-level structure would allow readers to quickly navigate to a given parameter, determine which of these elements apply to which, and check the type and other details at a glance (many of which are currently not explicitly stated at all).

This also fits with the evidence-based guidance on what makes effective Reference documentation in @DanieleProcida’s Diataxis framework, which we’re working on moving towards (emphasis original):

Diataxis guidance

Reference guides are technical descriptions of the machinery and how to operate it. Reference material is information-oriented.

The only purpose of a reference guide is to describe, as succinctly as possible, and in an orderly way.

Reference material should be austere and to the point. One hardly reads reference material; one consults it. There should be no doubt or ambiguity in reference; it should be wholly authoritative.

You’ll expect to find information about these sorts of things presented in much the same way for each one.

Reference material benefits from consistency. Be consistent, in structure, language, terminology, tone.


  1. To note, even under specific formats like Numpydoc, the existing prose paragraphs as well as other elements can be retained as now under each parameter, or moved above or below the params/return type sections, however desired ↩︎

7 Likes

In my experience the pandas and numpy docs are much more usable than the current Python docs.

2 Likes

+1

Currently we use italic for arguments, and that doesn’t really stand out much. This problem is clearly visible in the two subprocess examples linked by @CAM-Gerlach above.

If we used something like bold or code, it would be easier to find them while skimming through the docs. However, since arguments don’t use a specific role, changing the way *...* is rendered will also affect italic text that is not an argument, so this solution probably can’t be adopted unless we rewrite the markup for all the arguments.

Using a bullet list is another simple solution that could make the args more identifiable. Also note that open/subprocess.run/Popen are a bit of an exception, since they have several args that need somewhat lengthy descriptions, and while those can be improved, I’m not suggesting that we start using bullet lists or similar solutions for other cases too.

My suggestion for this is to describe these complex parameters twice: once in the way they are now, and once in a bullet list (or similar) in one or two sentences, which quickly identifies the parameter and possibly refers to the prose for more information.


Yes


I think listing parameters usually helps, but there are certainly exceptions (eg most maths functions, where parameter names are meaningless).


I realise that documenting arguments that must satisfy protocols (objects with specific attributes) is currently difficult with typing. Typing supports protocols well enough, but the documentation generation may struggle with them in bullet lists.

2 Likes

And if we do that, we may as well make the more meaningful improvement of organizing them in a structure that’s easier for users consulting the reference documentation to quickly parse at a glance, with the key details present without having to trawl through paragraphs of text.

Yeah, subprocess.Popen is a fairly exceptional case (open being another, as you mention), and not the best choice on my part; I should have gone for a more typical example. Exceptional cases like these can and likely should be handled specially, e.g., as you suggest, with a bulleted list of param names, types and a short summary, each linked to a subheading (or other construct) that explains it in more detail. In fact, this is exactly how it is handled for the highly complex methods in the argparse module, such as add_argument.

Right; likewise, for non-standard (though prevalent in some areas of the stdlib) cases like these on the other end of the spectrum, that don’t have true named parameters, or otherwise depart from normal Python-level syntax/semantics, they can and should be kept as they are.

A more typical example, picked somewhat arbitrarily from the docs I had open, is the logging.Formatter class constructor, which has a medium number of parameters (5). At present, the parameters are buried in the prose, and one isn’t even mentioned there at all; users have to read through the Changed in version 3.8 annotation at the end to even find it. Furthermore, it contains a lot of duplicate information split between the section introduction and the class object itself.
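
As a cross-check when revising an entry like this, the constructor’s actual parameters can be listed with inspect.signature. This is only a sketch: the exact parameters, and how many there are, depend on the Python version it is run on.

```python
import inspect
import logging

# Introspect the constructor signature of logging.Formatter.
signature = inspect.signature(logging.Formatter)

for name, param in signature.parameters.items():
    # Show each parameter with its default value, if it has one.
    if param.default is inspect.Parameter.empty:
        print(name)
    else:
        print(f"{name} = {param.default!r}")

parameter_names = list(signature.parameters)
```

Comparing this list against the prose makes it easy to spot a parameter that the description never mentions.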

Revised to use standard Sphinxdoc fields (since several other callables in the document use them, they’re built into regular Sphinx, are designed and intended for this purpose, and allow easily enhancing/changing the layout, styling, index etc. in one place without touching the content), I find it much better suited to its intended purpose: a quickly-accessed, succinct, complete and unambiguous reference to the constructor, communicating the same information (and more) in a much more useful way: [1]

The logging module docs are also a great example of the present lack of even basic consistency in how parameters are presented:

They’re a mix of constructors, functions and methods, adjacent in the same document, each having between a half dozen and a dozen parameters, with no clear motivation for the potpourri other than whatever the individual contributor happened to use at the time. Documenting and implementing a consistent, tailored structure for this information makes it much easier for readers to quickly recognize and reference without the cognitive burden of processing three different representations, and saves doc contributors from having to guess at how they should ideally represent it.


  1. The current styling isn’t ideal; it’s a little dense and could use a little more whitespace, but that’s easy to tweak in the stylesheet/theme; see, for example, how Lutra currently looks for another constructor in this module ↩︎

8 Likes

Perhaps, but I wouldn’t mind a docs style guide that documented good defaults and best practices, like PEP-8 does for code style. (Or used to, before people started treating the auto-enforceable suggestions as law.)

If I were to use this, I’d ask questions like:

  • When is it good to have an explicit parameters list?
  • (When) should I add a Returns section?
  • Should New/Changed in notes be with individual parameters, or the whole function?
  • Should class parameters be documented under __init__, or the class itself?
  • Should there be type hints? Where?
  • What to put in the docstring?

I’d say – let’s start adding “Numpy-style” parameter lists where they make most sense, but also start a style guide so we can be consistent where it’s best to be consistent.

4 Likes

There’s already this:

https://devguide.python.org/documenting/#style-guide

3 Likes

Wow, IMO the page on the right of that image is a significant improvement over the original!

2 Likes

Sorry for the late reply—this fell through the cracks.

Yes, indeed; it could be part of our existing Style Guide, as @ezio-melotti mentioned (especially since it should get its own page soon, following the reorganization). The focus would be on giving doc writers an easy-to-follow structure and helpful guidelines for writing better API Reference documentation, rather than prescriptively mandating absolute, inflexible rules. Some additional experimentation, discussion and real-world proofs of concept will be useful to help inform this.

To that end, I propose opening a PR with the changes previewed above, so others can better render, view and review it for themselves.

My proposed responses to the questions follow, for the sake of having something to start with:

When is it good to have an explicit parameters list?

Ideally, it makes sense whenever a non-trivial stdlib function has parameters that are not positional-only, *args/**kwargs, self, etc., that follow the standard Python semantics (unlike, for example, cases like the pow function that @EpicWink describes above) and that are not otherwise a special case for some reason.

At least for Sphinxdoc, there’s no fixed overhead (just :param NAME: and go), so it makes the guideline simple and consistent to apply, while leaving room for flexibility. Even with only one or a few parameters, a consistent, structured format still not only helps readers find key information faster than reading a paragraph, but also helps writers provide that information, clearly, concisely and explicitly in a manner well-suited to Reference docs.
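
As a rough sketch of how little markup this takes, here are the Sphinxdoc fields applied to a hypothetical function, find_matches, invented for illustration. It is shown as a docstring so it can be run; in the reST reference the same fields would sit under the function directive.

```python
import re


def find_matches(pattern, lines, ignore_case=False):
    """Return the lines that contain a regular expression match.

    :param pattern: The regular expression to search for.
    :type pattern: str
    :param lines: The lines of text to search.
    :type lines: list[str]
    :param ignore_case: If true, match case-insensitively.
    :type ignore_case: bool
    :return: The matching lines, in their original order.
    :rtype: list[str]
    """
    # Hypothetical example function; not part of the standard library.
    flags = re.IGNORECASE if ignore_case else 0
    return [line for line in lines if re.search(pattern, line, flags)]


# Because every parameter is introduced by the same ":param NAME:" marker,
# the names are trivial to pick out, by eye or with a one-line regex.
documented = re.findall(r":param (\w+):", find_matches.__doc__)
print(documented)
```

The same regularity that lets a regex find the names is what lets a reader skim for them.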

That said, given it will be applied incrementally over time to existing functions, the initial focus will be on those with relatively many parameters where the net value is the highest. Once we’ve improved those, coupled with other Diataxis-inspired enhancements, we can then focus on functions with fewer parameters.

(When) should I add a Returns section?

Likewise, ideally yes, consistently, aside from any special cases. At least for Sphinxdoc, since the structure is handled automatically and the types are auto-linked, there is little overhead to writing

:return: A list of matches.
:rtype: list[str]

instead of

It returns a :class:`list` of :class:`str` containing the matches.

while the result is more concise, more explicit and easier to navigate and reference. The field could be omitted for functions that return None, so long as that is done consistently. But all it requires is writing :return: None, which is more explicit for readers (particularly beginners, given how often they seem to make the mistake of thinking that methods which mutate objects in place return the object) and makes it clear that the return type wasn’t simply left out by accident, so it would still seem worth doing if we’re revising the function/section anyway.
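
The beginner mistake in question is easy to reproduce. In this short illustration, list.sort mutates the list in place and returns None, while the built-in sorted returns a new list; the variable names are arbitrary.

```python
numbers = [3, 1, 2]

# list.sort() sorts the list in place and returns None.
result = numbers.sort()
print(result)
print(numbers)

# sorted() leaves its argument alone and returns a new list.
original = [3, 1, 2]
new_list = sorted(original)
print(new_list)
```

An explicit :return: None on the in-place method states this difference up front instead of leaving the reader to discover it.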

But likewise, since the improvement will be incremental and alongside adding params and other enhancements, the initial focus will be on functions with many parameters or that are otherwise complex, leaving the simpler cases for later.

Should New/Changed in notes be with individual parameters, or the whole function?

If the note is specific to one parameter, ideally it goes with the parameter, for maximum locality, visibility and ease of reference for those reading it. However, due to a Sphinx issue, adding it there causes inconsistent line breaks/spacing, so we should fix that either in Sphinx or with CSS hacks in the theme before we do so; I didn’t include it in my example for now.

Should class parameters be documented under __init__, or the class itself?

Directly under the class appears to be the standard convention for Sphinxdoc and Numpydoc, and fits with the fact that the stdlib reference generally shows no separate __init__; the class constructor is already documented directly under the class, so this makes things simpler for both readers and writers and avoids churn. But if an existing module does already document __init__ and the class itself separately, then that can just be kept for now; while consistency (especially within a module) is nice, I don’t see it as worth changing existing content over, given that the benefit to readers is much less clear.

Should there be type hints? Where?

I consider type hints (per se), at least in the signature, out of scope here, since they will likely be controversial and not everyone is familiar with them. However, the parameter and return types should be included via the standard Sphinxdoc :type: and :rtype: fields, since this is often otherwise left implicit and up to the reader to guess at.

Preferably, the types should be expressed using the standard type annotation syntax if practicable, e.g. float, list[str] or int | None (since it is precise, unambiguous and Sphinx understands it and can automatically link the appropriate types and constructs), but if there isn’t a precise known type, it would be overly complex or impractical to express, or the author simply isn’t familiar with how to do so, it could simply be described in informal language in the field.
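
One rough way to tell the two cases apart is to check whether a type description parses as a Python expression. The helper below, looks_like_annotation, is hypothetical and only a heuristic sketch: it does not check that the names exist, and a one-word informal description such as "callable" would also pass.

```python
import ast


def looks_like_annotation(type_text):
    """Return True if *type_text* parses as a single Python expression."""
    # Hypothetical helper for illustration; a heuristic, not a validator.
    try:
        ast.parse(type_text, mode="eval")
    except SyntaxError:
        return False
    return True


for text in ["float", "list[str]", "int | None", "a file-like object"]:
    print(f"{text!r}: {looks_like_annotation(text)}")
```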

What to put in the docstring?

I don’t really have a strong or well-informed opinion on this, since CPython is the first project I’ve worked on that duplicates API references in both the docstrings and separate documentation, and I’m honestly not super-familiar with the current practice in this regard. My offhand impression from looking at a number of them a while ago was that the docstrings were somewhat neglected compared with the separate docs, but I’m not sure if that’s still true (if it ever was) or what the current policy and practice is, so I’d defer to others on that.

2 Likes

In other news, I drafted another example for sqlite3.connect(), in response to @erlendaasland pointing out some serious issues with the current structure on a recent PR:

8 Likes

Wow, that’s a great improvement. Real-world examples really help in visualising potential improvements.

7 Likes

I agree wholeheartedly! We have had type annotations for years already. Other ways of conveying types could be very confusing.

Could we consider showing the types without round brackets, so the displayed form is more aligned with the real syntax?

These I would still show inside the round brackets (or marked some other way) to communicate that it is not a precise type. This way, some complex types could be shown by an approximate type annotation without causing confusion.

2 Likes

I have reservations, in the case of more complex types. Type annotations can be very verbose and unreadable when describing something like “optional async generator taking an int and returning strings”.

Readability should always be the priority here, and type annotations, while readable for simple cases, can hinder that in complex ones.
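
For a sense of scale, here is one possible reading of that description spelled out with the typing module: an optional callable that takes an int and returns an asynchronous iterator of strings. The alias name StreamFactory is hypothetical, and other readings of the phrase would give a different annotation.

```python
from typing import AsyncIterator, Callable, Optional

# Hypothetical alias: "optional async generator taking an int and
# returning strings", read as an optional factory of async string streams.
StreamFactory = Optional[Callable[[int], AsyncIterator[str]]]

print(StreamFactory)
```

In a case like this, a short informal description in the type field may well be easier to read than the annotation.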

2 Likes

I don’t think anyone intends to create a super rigid format here; a pragmatic approach is almost always useful. Quoting Steve, earlier in this thread:

If type annotations in the docs can make things clear and easily understood, apply them. If they end up being too verbose, thus creating (more) confusion, don’t apply them. Readability counts.

We all strive towards the same goal: improving the docs.

7 Likes

Since I got quoted… yes, reusing the type annotation syntax where it makes things clearer is fine by me. We can argue about whether one is “too complex” on a case-by-case basis.

Worth keeping in mind that most of our standard library was built around the idea of functions “doing what I mean” rather than being designed for concrete types (as it might have been in another language). So we widely use anti-typing patterns throughout the stdlib, and won’t be redesigning them to become type-first APIs. I expect a significant number of cases where the “correct” annotation is too complex to be readable or helpful, so planning and explicitly allowing a fallback now (since some authors will appreciate the explicit allowance) will make things smoother.

This approach (parentheses for “not a machine-readable type annotation”) seems perfectly fine to me. Bonus points if we can also (easily) show it in a different font.

3 Likes

Indeed, as I’ve emphasized throughout. The main balance here is between making the guidance simple, unambiguous and consistent to apply, such that we don’t have to analyze and debate every individual case :smiley: and the end product is coherent and useful to readers, while allowing ample flexibility for the classes of cases where it doesn’t make sense and those that are otherwise special/exceptional in some way.

I also am not advocating these become hard requirements or anything for merging docs PRs, as opposed to a basis for guidance, suggestions and improvements.

I figure most have read this, but just to include the full quote to make clear that we’re all on the same page here (emphasis added, with a slight wording tweak to simplify):

4 Likes

Some updates: @erlendaasland merged python/cpython#94629 which added the explicit parameter list suggested above, with further refinements by Erlend, to the docs for sqlite3.connect, which is live now.

Also, I’ve opened an issue, python/cpython#94700, and a PR, python/cpython#94701, to do the same for the logging.Formatter class, following the preview above, with the intent to continue this work for the other sufficiently complex functions in the logging module reference in the near future.

Additionally, I plan to also bring this up at the Python Docs Community meeting, which we’d welcome anyone interested to join. Thanks!

7 Likes

Together with CAM, I’ve now applied parameter list improvements to the following sqlite3 functions and class methods:

  • sqlite3.connect
  • sqlite3.Connection.backup
  • sqlite3.Connection.blobopen
  • sqlite3.Connection.create_aggregate
  • sqlite3.Connection.create_function
  • sqlite3.Connection.create_window_function

I’m very pleased with the results[1]. As I see it, these improvements align with several[2] of Diátaxis’s reference guidelines:

  • be accurate: the parameter format makes it very easy to provide accurate information regarding parameter types, return values, and exceptions raised.
  • be consistent: the format is given, which also implies I need to think less about phrasing and wording, thus resulting in improved consistency in structure and language.
  • do nothing but describe: a result of the above; I find it a lot easier to express myself consistently and to the point, avoiding digressions and discussions.

While adding these improvements, I’ve now noticed that several sqlite3 class methods lack information about their return values and exceptions raised! I’m tempted to apply this format all across the sqlite3 module docs in order to a) force myself to document all return values and raised exceptions, b) be consistent :wink:


  1. I also hope others are ↩︎

  2. 3 == several if you count as an orc: “one… two… many…” :japanese_ogre: ↩︎

7 Likes

Indeed, filling in implicit gaps in the reference information provided is a big motivation for this change, as this structure really helps doc writers find and clarify them, as well as avoid them in the first place. Especially for someone who’s experienced with whatever they’re writing about, when writing free-form prose it’s easy to leave implicit critical details that are obvious to oneself but not so much to a beginner reading about a function for the first time, whereas a consistent structure ensures we think about and add these key points every time.

3 Likes