Adopting the Diátaxis framework for Python documentation

DanieleProcida · May 26, 2022, 9:36pm

Hi - I happened to stumble across this discussion, so here I am, a month late.

Of course a) I would be thrilled if Python were to adopt Diátaxis for its documentation, and b) I’d be just as delighted to help in some way with the actual work.

For example, I’d be very pleased to run a couple of training workshops, as I’ve done for various other projects. Just let me know.

At the risk of restarting a stale conversation that nobody wants to have warmed up, I’ll reply to some other remarks in this thread separately.

Regards,

Daniele

DanieleProcida · May 26, 2022, 9:49pm

It depends a bit on how complex it is, and how entangled the how-to and reference material are. Shuffling the content to separate them would be a start; at least then, when the next person comes to add a paragraph describing some actions you’d take to achieve something, they’d find a clearer place to put it. And probably, eventually it would be good to have two pages.

DanieleProcida · May 26, 2022, 10:01pm

It’s interesting to read this, because I did much the same thing with the Django logging documentation about a year ago. There was one long Logging page that mixed up reference, how-to and explanation.

Now, there are three:

a Logging overview (i.e. explanation)
Logging reference (as a new user, I can’t add any more links, sorry, but all the pages link to each other anyway)
How to configure and use logging

I did it bit by bit over a series of pull requests. There’s always more that could be done there, but the important thing is that it’s improved.

ezio-melotti · May 30, 2022, 4:07am

One thing I noticed is that sometimes the distinction between the different pages is not obvious, which makes things more difficult to find and navigate.

As an example, lately I was looking at the pytest docs (that seem to have adopted the Diátaxis framework), and in particular I was looking at fixtures and fixture parametrization. Fixtures have:

How to use fixtures — pytest documentation – one how-to;
About fixtures — pytest documentation – one explanation;
Fixtures reference — pytest documentation – one fixture reference;
API Reference — pytest documentation – one mention in the API reference;

Whereas fixture parametrizations have:

How to parametrize fixtures and test functions — pytest documentation – one how-to;
How to use fixtures — pytest documentation – one section in the fixtures how-to;
API Reference — pytest documentation – one entry in the ref;

Even now that I’m writing this post, I’m finding new pages that contain information that I hadn’t met yet. Initially I didn’t even realize that there were different pages, and I was trying to ctrl+f something I was sure I had seen before but I couldn’t find it because it was on a different page.

Even now I’m not exactly sure where to find some of the information. For example the “Fixture reference” contains sections that I would expect to find in the “About fixtures” explanation. To make things even worse, Google returned results from the old version of the docs (e.g. pytest fixtures: explicit, modular, scalable — pytest documentation) giving the impression that there were even more pages (the one I just linked became the “explanation” page, but had a different title).

These problems might be specific to the pytest docs and possibly the fixture sections in particular, however I think we should pay attention to:

only split pages when it makes sense and avoid creating (up to) four pages for each topic;
make the fact that there are other related pages (and what they cover) more evident and consistent throughout the docs, possibly by using a custom Sphinx directive.

Unless the page is already too long, having different sections (explanation, tutorial, how-to, reference) within the same page might be a better option, since it keeps the distinction while making the page easy to navigate. This approach will also solve the second point, since the .. contents:: directive can be used to clearly show the four sections at the top.

For cases where there are multiple pages, a different approach can be followed. For example, the logging docs do this nicely with the yellow box at the top of the page. However, the logging howto doesn’t have a similar box or a link back to the docs. (In addition, in some versions of the docs, the yellow box seems to be missing.)

erlendaasland · May 30, 2022, 7:18am

Instead of splitting each page in four, what about a lightweight version, where we create four (page) top-level headings for each of the Diátaxis sections, but keep the single page? I’m mainly thinking about the stdlib docs now. It seems counterproductive to me to quadruple the number of stdlib doc pages.

This is one of the best things about having all the docs for a stdlib module on one page; I use this “trick” all the time.

Perhaps other parts of the docs (non-stdlib) would be more suited to a complete Diátaxis split (multiple pages).

encukou · May 30, 2022, 2:49pm

Hi Daniele,
I think we could use some guidance on how to get this started – IMO the project still looks like a mountain of work without a clear way to get the first few PRs in.

Would you by any chance have time to join next week’s meeting?

guido · May 30, 2022, 3:41pm

Addressing the “mountain of work”, doesn’t this link (How to use Diátaxis - Diátaxis) that you yourself referenced in your OP address how to handle this? I found it very helpful to read.

EpicWink · May 30, 2022, 10:18pm

I don’t think the number of docs pages matters. As long as a well-defined hierarchy is used, navigation is straightforward. I can’t imagine the next/previous topic link is useful for navigating between higher-level topics (perhaps the source RST files could also be more structured).

In contrast, I think some of the pages are too long (multiprocessing, io, ctypes, importlib, and more), and I think they would be easier to navigate if split.

DanieleProcida · May 30, 2022, 10:20pm

I think that before diving into how to get it started it would be worthwhile to understand how much agreement there is about the end goal, i.e. the top of this particular mountain. It’s not much fun going up mountains unless everyone in the party actually wants to go there.

I would happily join a meeting but unfortunately Monday evenings are one time of the week when I am not generally available. Let me see what I can do about the 6th though.

encukou · May 31, 2022, 8:50am

That’s OK. Voice is a nice bonus, but text should work fine :)

This does seem to contradict the page Guido linked:

Although Diátaxis is intended to provide a big picture of documentation, don’t try to work on the big picture. It’s both unnecessary and unhelpful. Diátaxis is designed to guide small steps; keep taking small steps to arrive where you want to go.

The guide also seems helpful to me when I read it, but practically, I get stuck at the “Assess it” step, where I tend to come up with possible big reorganizations rather than small changes.
(This does smell like one of those situations where having a coach stand by and say “yes, that’s the way, keep going” works unreasonably well – possibly Daniele’s training workshops do that?)

DanieleProcida · May 31, 2022, 12:12pm

What I mean is: I would like to understand how much agreement there is about the end goal of applying Diátaxis to the Python documentation. Unless there’s general consent about that, it’s not going to be a good or successful experience.

AA-Turner · May 31, 2022, 1:31pm

I don’t recall any active pushback in any of the various threads, although I can’t find a record of this having been posted to python-dev. There have been cautions, though, about ensuring that there isn’t too much churn, and that each docs page is still complete enough to find things (Ezio’s point above). These seem to have more to do with how we approach the project and change management than with the philosophical point of Diátaxis as an approach.

A

vsajip · May 31, 2022, 4:16pm

The yellow box is at the bottom of the page, including on the 3.9 version. That’s because it’s assumed that people usually arrive at the HOWTO from a reference page.

barry · May 31, 2022, 4:47pm

I for one am +1. I’m familiar with the Diátaxis framework from work.

encukou · June 2, 2022, 9:42am

+1, IMO we do have general consensus.

guido · June 2, 2022, 5:36pm

It’s hard to argue against Diátaxis as a north star, and I am fine with it. (In fact I like the distinctions it makes quite a bit.) But it seems whenever it comes up we end up having to explain that Diátaxis itself advises against well-meaning but overly ambitious restructuring plans – such plans will cause a lot of unnecessary alarm amongst people who fear churn (myself among them).

So I’m not sure what we can do specifically now that we’re accepting Diátaxis. I feel we need something concrete to strive for, but it should discourage churn. I’m not sure what that plan would be.

ezio-melotti · June 2, 2022, 8:29pm

I think a good first step would be to survey what we already have:

What “patterns” are we using? Different documents use different “patterns”, such as:
- introduction/overview → reference → examples (e.g. the csv module and the socket module)
- basic examples → reference (e.g. the json module)
- separate reference and how-to (e.g. the logging module)
Which of these “patterns” are closer to the Diátaxis model? Which ones seem to work best? Can we recommend a specific pattern to follow consistently?
What documents can benefit from an update (or an overhaul)? What are the longest documents? Can we reorganize/split them following the Diátaxis model?
Are there already open issues on the tracker about reorganizing certain docs? Which documents/modules have the most issues?
Are there other metrics that might give us some useful insights (e.g. number of page hits or StackOverflow questions)? Which documents should we prioritize?
Do the translation teams have any feedback about which documents can be improved, since they are going through all of them already?

The answers to these questions will help us identify some documents that can be improved/reorganized/split, establish some guidelines, priorities, and update paths that we can follow to make our docs more consistent and easier to navigate.
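For the “What are the longest documents?” question, a rough first pass could be scripted. This is only a sketch: the function name is made up for illustration, the Doc/library path assumes it is run from the root of a CPython checkout, and raw line count is just a proxy for how long a page feels to a reader.

```python
from pathlib import Path


def longest_documents(doc_dir, top=10):
    """Return (line_count, path) pairs for the longest .rst files under doc_dir."""
    counts = []
    for path in Path(doc_dir).rglob("*.rst"):
        # errors="replace" keeps the survey going even if a file has odd bytes
        with path.open(encoding="utf-8", errors="replace") as f:
            counts.append((sum(1 for _ in f), path))
    # Longest files first
    counts.sort(key=lambda item: item[0], reverse=True)
    return counts[:top]


if __name__ == "__main__":
    # Assumed location of the stdlib docs inside a CPython checkout
    for line_count, path in longest_documents("Doc/library"):
        print(f"{line_count:6d}  {path}")
```

The output would only be a starting point for discussion; page hits or open issues per module would need other data sources.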

That said, I’ve also been thinking about how examples fit into the Diátaxis model, since I find them quite valuable and I think we should add more.

A good portion of our documentation falls into the “reference” quadrant, and About reference - Diátaxis seems to suggest that examples should be included there – at least when they focus on individual APIs. However we also have more complex examples that might fit better into a how-to section (or page).

Should we strive to add an “Examples” section to each module? Should examples about individual APIs be kept inline or grouped in this section, in order to make the reference more compact and easier to navigate?

I guess the answer might change depending on the individual documents, but I’m curious to hear if you have any feedback about this.

jeanas · June 2, 2022, 9:39pm

Well, here is the point of view of a documentation translator. Most stdlib modules are documented with just a reference-style document, which of course is necessary. I think HOWTOs / understanding become needed if the module is particularly large or complex, or contains language concepts (e.g. async I/O or abstract base classes). On the other hand, I would say many if not most would benefit from a tutorial-like intro. As an example, take the ast module docs, which I translated:

I think this is a perfect example of a page that should be restructured.

After two paragraphs of introduction with technical terms, it starts right away with a copy of the full Python grammar, which is hairy from the beginner point of view,
Then it gives a detailed description of every single AST node,
And only then, you find the information that you would have needed to start your journey with the module, namely the functions to parse an AST and manipulate it.

I would do it this way instead:

An “AST overview” section demonstrating usage of parse() and attribute access on AST nodes with examples at the REPL (perhaps also showing how pattern matching can be used elegantly on ASTs),
Then the descriptions of the individual nodes,
Finally, as an appendix, the Python grammar.

The first part roughly corresponds to a tutorial, the rest is reference. Again, for many modules, I don’t see a need for more, but from experience of walking through stdlib modules that I have no prior knowledge of to translate them, I do miss more tutorials.

Another example of a module that would benefit from a tutorial is the re module. There are already quite good docs:

the module reference:
re — Regular expression operations — Python 3.12.1 documentation
the regular expression HOWTO:

On the other hand, even on the HOWTO, you get to read some text that is a little heavy reading (at least for a primer on regexes) on metacharacters and stuff before learning how to execute a regular expression on a string. Here it might be beneficial to add a small tutorial with simple steps to get started with regexes.
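For instance, the first steps of such a tutorial might look roughly like this. The sample string and patterns are invented for illustration; the point is only that a reader runs a regex on a string before reading about metacharacters in depth.

```python
import re

text = "Released: Python 3.10 and Python 3.11"

# search() finds the first match anywhere in the string
m = re.search(r"Python (\d+)\.(\d+)", text)
first = m.group(0)         # the whole match: 'Python 3.10'
major, minor = m.groups()  # the two captured groups: ('3', '10')

# findall() returns the captured group for every match
versions = re.findall(r"Python (\d+\.\d+)", text)  # ['3.10', '3.11']

# sub() replaces every match with the given text
masked = re.sub(r"\d+\.\d+", "X.Y", text)

print(first, major, minor)
print(versions)
print(masked)  # Released: Python X.Y and Python X.Y
```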

An example that I consider good is the argparse docs. There are

a tutorial (actually in the HOWTO section):

the reference:

With the tutorial, you quickly get a basic grasp of what using argparse feels like, and it gradually introduces more advanced concepts. Perhaps some of the material from the reference could be split into a HOWTO (e.g., how to use subparsers).
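As a rough idea of what a “how to use subparsers” HOWTO could open with, here is a minimal sketch. The program name and the two subcommands are invented for illustration, and the argument list is passed explicitly so the snippet runs without a real command line.

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")  # hypothetical program name
# dest stores the chosen subcommand name; required=True makes it mandatory
subparsers = parser.add_subparsers(dest="command", required=True)

greet = subparsers.add_parser("greet", help="print a greeting")
greet.add_argument("name")

add = subparsers.add_parser("add", help="add two integers")
add.add_argument("x", type=int)
add.add_argument("y", type=int)

# Normally parse_args() reads sys.argv; an explicit list is used here
args = parser.parse_args(["add", "2", "3"])
if args.command == "add":
    result = args.x + args.y
    print(result)  # 5
```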

So, bottom line: in my opinion, the stdlib docs need more beginner-level material to help people get a basic grasp of stdlib modules before going into the fussy details.

petersuter · June 2, 2022, 9:55pm

A user perspective: I use the Python documentation very often, and prefer it over other sources.

csv module
- I use this module sometimes, but not that often.
  I always have to look up the examples, because the API is not that easy to remember.
- The examples are a bit inconveniently placed, some in the middle of the page, some at the end.
  → The “simplest reading example” and the “simplest writing example” should be duplicated at the very top.
- The exotic quotechar examples could be moved to the end.
- The fmtparams argument is never actually explained explicitly. Basically it seems one has to guess based on the examples. Adding an explanation and explicitly enumerated parameters would be great.
- The simple DictReader and DictWriter examples should also be duplicated at the very top, and in the examples section.
- The sidebar would be more helpful if the most important functions (reader, writer) and classes (DictReader, DictWriter) were listed explicitly.
json module
- I use this module very often.
  I rarely have to look up anything, because the API is easy to remember.
  The examples are conveniently placed at the top.
- Maybe the simple loads examples could be moved up a bit more even, to the very top.
logging module
- I use this module regularly, but only the basics.
- There are zero examples. That seems inconvenient.
- The “simple example” from the basic tutorial page should be duplicated to the very top on the main page.
socket module
- I remember using this module only infrequently.
- The documentation jumps directly into very technical low-level details without any examples or explanations. That seems quite daunting.
- It would benefit from a lot of inline examples.
re module
- I use this module very often.
  I have to look up certain details all the time.
  The documentation is quite nice, with inline examples.
- The main thing missing is an overview index with links to the different syntax elements, and to the most common functions (match, search, findall, compile, split, sub).
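To illustrate the kind of “simplest writing example” and “simplest reading example” I mean for the csv module, something along these lines would do. This is a sketch with made-up data, using an in-memory buffer so it is self-contained; with a real file one would use open(..., newline="") instead.

```python
import csv
import io

rows = [["item", "quantity"], ["spam", "3"], ["eggs", "12"]]

# Simplest writing example: one writer, one call
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)

# Simplest reading example: each row comes back as a list of strings
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back)

# DictReader uses the first row as field names
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]["item"], records[0]["quantity"])  # spam 3
```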

guido · June 2, 2022, 10:42pm

Excellent feedback!