Diátaxis and Python documentation

DanieleProcida · December 24, 2023, 6:27pm

I’d like to put forward a vision for the future of Python’s documentation, as published at https://docs.python.org, in the hope of building general consensus for two things:

an idea of what good means in documentation and its application to Python’s documentation in particular
ways of working that will bring about significant improvements more swiftly

I see a very strong opportunity to bring what I can do, to what I think Python’s documentation needs, and I would like to do more of that work in the next year.

In the past 18 months several key people in Python documentation have worked with me on Python documentation, and I believe it has given them confidence that what I am proposing is a good way forward.

However, in order to pursue this really effectively, I need to secure wider consensus for the direction and approach, so I need to outline that more clearly - which is what I want to do here.

Vision

I propose to apply the Diátaxis documentation approach to Python’s existing documentation.
Diátaxis is a methodology that identifies four essential and qualitatively distinct needs of documentation users, and from them derives four distinct kinds of documentation: tutorials, how-to guides, reference and explanation.

Diátaxis proposes that these four constitute a complete map of needs, and consequently also of documentation, and that through the application of the principles it describes, a documentation architecture will emerge that reflects these needs and their relationship.

The Diátaxis website describes all this in more detail and is probably the best place to go for a fuller understanding of this, but I am happy to answer questions here.

Approach

Diátaxis prescribes a bottom-up approach to documentation, including documentation transformation of the kind I am proposing. What that means is that its principles should be applied at low levels (pages, paragraphs and sentences), and that eventually this will bring about the architectural patterns it predicts.

The key principles of this approach are:

success in documentation means following rules, not plans
documentation work progresses best in small iterations
every act of documentation should represent a tangible improvement, however small
there is always a simple and small next action available to us in documentation work
the correct overall structure will inevitably emerge from the application of simple rules

I am well aware that this can be hard for people to swallow, and for those who have not experienced it in action it is a lot to ask for their consent to an approach that deliberately avoids top-down planning or clear pictures of what the end result will look like. So once again I am happy to answer any questions, but in the end it might be necessary for people to have the chance to see it for themselves.

The basics of this approach are described at Diátaxis as a guide to work.

What it means in practice is a different kind of workflow for documentation, compared with what’s expected for code - for example, small, rapidly-processed pull requests that don’t represent the final result, but a step in its direction.

Some work and experience so far

Last year I ran some workshops for the PSF on Diátaxis, which were well attended.

notes from Part 1, 16 August 2022
notes from Part 2, 18 August 2022

In 2023 I turned that into more concrete progress; for example at EuroPython I took part in the documentation sprint and following that landed a number of documentation commits and helped others get their start in contributing to Python documentation.

Some of the documentation commits I made in the last year related to Turtle graphics show both progress towards a clearer arrangement of contents, and the iterative approach of small changes.

https://github.com/python/cpython/pull/107449 is a stalled pull request addressing an article in the Python HOWTOs section, that shows the need for securing better consensus for the direction I’d like the Python documentation to take.

I look forward to further conversations about this, and to being able to help bring about a transformation of Python’s documentation to a new standard of excellence.

jeanas · December 24, 2023, 7:24pm

Have you seen Adopting the Diátaxis framework for Python documentation?

mwichmann · December 24, 2023, 10:37pm

Daniele is absolutely familiar with that, as it was the precursor that led to the workshops listed above. I gather he’s trying to flush out whether there really is consensus in practice, not just in theory. The PR in question makes some interesting reading in that regard.

jeanas · December 24, 2023, 11:40pm

OK. Sorry, from the outsider perspective, the relationship with the previous “proposal to adopt Diátaxis” was not really clear to me.

nedbat · December 27, 2023, 1:11am

For people who are interested in moving the Python docs in the direction of the Diataxis approach: what is in your way? What can we do systemically to make it easier for you to update the docs?

BrenBarn · December 27, 2023, 6:54am

I’m not anyone whose opinion is super important here but I do have some thoughts based on your post and looking at that PR.

I guess first thing is that I do think there is considerable room for improvement in Python docs and I appreciate your overall mission to set a “new standard of excellence”.

That said, I don’t think just having consensus on Diataxis in principle would really resolve these issues. Overall I agree with @AlexWaygood’s comments on your commits there. It is fine to proceed with small incremental improvements, but many of your changes don’t really look like improvements to me.

Ironically, whether these changes are viewed as improvements seems to in fact depend on taking as given some of the higher-level precepts of Diataxis (e.g., the stuff about “this isn’t what a Howto is”). So in a sense when I look at these changes I see that they are actually being driven by a plan, namely a plan involving definitions of concepts like “howto” and then ideas about how to make docs conform to that plan.

Although I agree that incremental improvements can be made to the docs, I think this particular corner of the docs may be an awkward place to start. Some of the Howtos there are somewhat howto-ish, but others (like the functional programming one) are really more like “explanations” in the Diataxis sense. With regard to that I basically agree with Alex’s comment towards the end of the issue thread, that the first incremental improvement might be to just reclassify that particular page. Maybe make a separate heading of the docs (“Discursive essays on Python topics”?) and move some of the Howtos there? As it is, I think transforming something like the functional programming Howto into a Diataxis-compliant Howto would leave almost nothing of the original document intact. So from an incrementalist perspective it makes sense to head for the nearest port (namely, “explanation”) and then touch up the document from there, rather than focusing on the word “Howto” in the title and trying to shoehorn the content into that Diataxis concept.

More broadly, I will say that I have mixed feelings about Diataxis. I think the four-way division of documentation topics is a useful breakdown. But in my (admittedly limited) experience, when I see people trying to apply Diataxis, I often get the sense that the goal of trying to apply Diataxis sometimes becomes a bit too important relative to the goal of just improving the documentation. I don’t see that there is any need to rigidly follow the Diataxis format or even to foreground its principles. To me it is more useful as food for thought that, when digested, can help better decisions be made.

Or, to say that a bit more strongly, I’m not sure I’d ever see it as valid to say “the documentation shouldn’t do X because it’s not in alignment with Diataxis principles”; it only makes sense to me to say “well, there are different options here and it’s unclear what to choose, but if we put on our Diataxis hat that suggests maybe we should try X”. The most important thing is always the actual effectiveness of the documentation at helping users do what they want to do and know what they want to know.

DanieleProcida · December 27, 2023, 10:51am

Brendan Barnwell:

I basically agree with Alex’s comment towards the end of the issue thread, that the first incremental improvement might be to just reclassify that particular page [as discussed in https://github.com/python/cpython/pull/107449]. Maybe make a separate heading of the docs (“Discursive essays on Python topics”?) and move some of the Howtos there? As it is, I think transforming something like the functional programming Howto into a Diataxis-compliant Howto would leave almost nothing of the original document intact. So from an incrementalist perspective it makes sense to head for the nearest port (namely, “explanation”) and then touch up the document from there, rather than focusing on the word “Howto” in the title and trying to shoehorn the content into that Diataxis concept.

What you’re proposing is pretty much what I envisage. This document is not a how-to guide, and it makes no sense to make it become one. Ultimately it belongs with other explanation-type material in a separate section.

But, adding whole new categories to the documentation is a much higher-level architectural step. That will come later, in its own time.

The reason for not wanting to start by moving things around right now in the way you describe is that then we’d have for example an Explanation section, with one article in it, an article that has some problems.

I recommend incremental improvement, i.e. working in-place, at the level of words and sentences, and performing structural alteration only when it will make a clear improvement.

I definitely agree with some of that. The problem is that sometimes people think Diátaxis is a “format” or a “structure” - which you seem to be suggesting above. It’s not. But you’re right, people try to apply it top-down as a pattern, creating or applying the structure before they have paid attention to the content at a much lower level.

The principles are not about structure, but about user needs, and apply mostly to words, sentences and pages. I do think they should be applied rigorously.

BrenBarn · January 3, 2024, 5:02am

Given what you’ve said here I’m not clear on what exactly you’re seeking consensus for or how it relates to your PR. I’m going to respond to bits of your post as well as your PR and kind of challenge some of your statements. I hope this doesn’t come off as nitpicky or adversarial, but I want to try to get a better understanding of your notions of “rules” and “improvements”.

What you’re proposing is pretty much what I envisage. This document is not a how-to guide, and it makes no sense to make it become one.

Okay. . . but if it’s not a how-to guide and won’t become one, then why are you (in that PR) even mentioning how-to guides in your explanations of your changes? If it’s not and will never be a how-to guide, then I don’t see how “what a how-to guide is” is even relevant for this document, except possibly to change the name so it doesn’t say “how-to guide” and/or move it out of a section of the docs that says “how-to guide” at the top.

I don’t agree. Imagine two alternatives and their consequences:

1. Immediately create such an “Explanations” section and move this page to it, along with maybe some other pages from the Howto section that aren’t really Howtos. Consequence: We have a more logical relationship between the section names and their contents, and no change is required to the actual wording of the pages. (That can still be done later, to make them fit better into the section they have been moved into.)
2. Gradually make a bunch of edits to the document while leaving it in its current section (howtos). Consequence: The document remains in the section on howtos, even though we know it isn’t and will never be a howto. The incremental changes may make it less howto-like (since we know it can’t become a howto), thus making it diverge even further from the section’s ostensible purpose.

I don’t see how #2 is better.

I think performing a structural alteration now would be an improvement.

But, more generally, you talk about “improvement” at the level of “words and sentences”, yet, as I said before, for many of the changes in your PR, I don’t see what is supposed to be better, on the level of words and sentences, about the new version vs. the old.

For instance, in your opening comment on the PR, you said:

This patch removes comments about what the author thinks the reader is familiar with, and the first-person voice that sometimes appeared.

Why is removing comments about what the author thinks the reader is familiar with an improvement? Why is removing first-person voice an improvement?

In general, summaries of what something is going to discuss (and summaries of what has been discussed) add nothing of value to writing.

I don’t agree. Again, this doesn’t mean I’m right and you’re wrong! But my point is (as I mentioned in my previous post) that just saying “well, these are incremental improvements” (maybe at the level of words and sentences) doesn’t remove the need to show how they improve things. In my experience as a writer (of documentation and other things), a brief overview/summary at the beginning of a long article can be helpful. We can debate whether a particular such summary is good or bad or in between, but that’s neither here nor there with regard to Diataxis, howtos, or any other such conceptual concerns.

If they are just one person’s opinion, they don’t belong in the documentation. The documentation’s contract with the user is exactly that: that it is the authoritative pronouncement from on-high about programming in Python.^[1]

I’m more sympathetic to this view, but I don’t think it is something that needs to be applied rigidly. Using the diataxis terminology, I think this kind of “ex cathedra” voice is most important in reference documentation, and far less important in howtos or tutorials. This is partly because there should be only one reference (in the sense that the underlying behavior should have only one specification), but there may be many different ways to help someone learn something (tutorial), or show them how to achieve a goal (howto). So I don’t see a problem with having stuff in the documentation that has a bit of individual flair or opinion. The main point is that that opinionated standpoint should be clear to the reader — which is, again, why I think simply retitling and/or reorganizing these pages would be a better incremental improvement.

Again, though, it seems the assumption you’re operating on here is something like “opinions in documentation are bad” or even “first-person voice is bad” — but I don’t see that that has anything to do with Diataxis or howto-ness or anything like that.

Okay, so what are these principles? In your original post, you talked about Diataxis, and you talked about “rules”. What are the rules?

Again, the reason I ask this is because I’m having trouble understanding the relationship between your thread here and your PR (which you say represents the direction you’d like to see the documentation go). Your post here (and everything I see on the Diataxis website) is about conceptual categories like howtos vs. explanations and how those correspond to different user needs, which is all well and good. But in your PR, you seem to be drawing on a different set of rules or principles (like the ones I mentioned above about avoiding first person), more fine-grained and specific, more akin to what I would think of as a “style guide”. Which may be well and good (although as I said I disagree about some of the details), but as far as I can see it has nothing to do with Diataxis.

So, basically, I don’t really understand what Diataxis has to do with these various specific points from your PR. As I said in my earlier post, if I were the one reviewing the PR, I would just say (for these contentious bits of the changes, not all) “these are not improvements”, and the tenets or even existence of Diataxis would have no bearing on that. Can you explain why you think Diataxis, incremental improvements, or such concerns are contributing to why that PR is stalled (as opposed to just the individual changes not being acceptable on their own merits)? Can you give a fuller account of what these “style guide” rules are and how they relate to Diataxis? And, finally, which aspects of this do you think need consensus, and/or which aspects do you think already have consensus? (Again, sorry if this seems like an interrogation, but I’m just trying to be clear about what I’m unclear on.)

[1] This is also from the PR.

adorilson · December 7, 2024, 10:01pm

Do we have a decision about the structure of it?

I mean… I know that the documentation has the (library) reference and how-to sections, but some modules have all Diataxis sections on a single page, like the sqlite3 module, whose page is explicitly segmented into tutorial, reference, and guides, all co-located.

nedbat · December 8, 2024, 2:10am

There is not a decision about the one right way to adjust the structure. The latest thinking is that we will continue to keep the four needs and forms in mind as we rework documentation. In some places it makes sense to intermix them, in others they can be separated into larger sections.

Does that work for you?

Wombat · December 8, 2024, 5:19am

Can this be done in a separate repo, leaving the existing docs intact? There are an enormous number of external links into the official documentation. It would be a shame to break all those links. This strategy would also be kind to current active users who have invested in knowing their way around the current docs. Uninvested users can go to your new, rearranged docs and adapt as you make edits over time.

nedbat · December 8, 2024, 12:12pm

I don’t think making a second separate set of documentation is a good idea. First, it’s a huge effort. Second, it leaves users with the question of which is the real documentation. Third, it fragments and confuses the contributors: where should they put their effort in improving the docs?

Broken links are a real problem, but we have ways to keep them working. Either we leave multiple targets in the docs (see restore an anchor to for/else for example), or we write explicit redirects for pages that have moved.
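To make the second option concrete, here is a minimal sketch of an explicit redirect table. It is only an illustration: the paths and the `resolve` helper are hypothetical, and it does not describe how docs.python.org is actually configured.

```python
# Hypothetical redirect table: old documentation paths -> new locations.
# None of these paths are real moves; they only illustrate the idea.
REDIRECTS = {
    "/howto/old-guide.html": "/explanation/old-guide.html",
    "/explanation/old-guide.html": "/explanation/guide.html",
}


def resolve(path, redirects=REDIRECTS, max_hops=10):
    """Follow redirects until reaching a path that has not moved."""
    seen = set()
    while path in redirects:
        # Guard against accidental cycles or very long chains in the table.
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {path!r}")
        seen.add(path)
        path = redirects[path]
    return path
```

With a table like this, a page can move more than once and old external links still land on its current location, while paths that never moved pass through unchanged.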

Wombat · December 8, 2024, 3:19pm

Forking is not hard. Nor is it difficult to create a link to legacy docs.

The proposal itself is the large effort, and it will have a large effect. What if the volunteer abandons their effort partway through? Or what if, as edits occur, we decide that it isn’t an improvement?

A volunteer who teaches workshops on a documentation methodology wants to apply that to our documentation. At this point, we don’t even have a miniature sample of what this would look like or a basis to form a consensus that it is desirable. Without that, we’re giving blanket authorization for someone to make an enormous number of hard-to-reverse edits based on some loose notion of “current docs bad, diataxis good”.

As far as I can tell, no one has really identified what problem is being solved or has made promises about what will be better.

Second, it leaves users with the question of which is the real documentation.

Mostly the proposal is to rearrange the current docs. Just like the Library Reference and the Global Module Index, the two can be on the same footing because the new and the old would have the same underlying content (a single source for both).

Third, it fragments and confuses the contributors: where should they put their effort in improving the docs?

That could be a real problem except that most of the detailed content won’t change. The description of the builtin pow function or the defaultdict __missing__ method will be the same. So, they could have a common source.

I think most of the edits currently being applied to docs are aimed at that common source used by both the new and old arrangement. If a contributor rewrites the str.split docs, that should be the same under the existing doc arrangement and under the diataxis arrangement.

It doesn’t sound like the OP intends to rewrite every paragraph in the existing docs which took decades of successive refinement to get them to where they are now.

nedbat · December 8, 2024, 4:06pm

Perhaps I don’t understand what you are proposing. Can you say more about how this would work? I thought you meant that we would fork the current docs, make changes, and then eventually decide we liked those changes, and make the new fork the official docs, replacing the existing docs. If I have that wrong, please clarify.

If I have that right, then the broken links and disoriented users would still happen, they would just happen at a later date? Wouldn’t gradual changes in the current official docs be better? We could adjust the plan as we hear from users.

There have been a number of discussions in the Docs Working Group about this approach, and a number of changes already made based on these principles. See the current sqlite3 docs section, which has been updated with these ideas in mind.

Wombat · December 8, 2024, 5:13pm

It sounds like the decision has already been made. Apologies, I mistook this for an open discussion on the merits.

Thanks for the link. Looking at the new and old versions, I personally can’t see a substantive improvement other than the nice-looking links at the top. Otherwise it just takes longer to get to the meat. Most of the underlying reference content is exactly the same as it was before (except that the tutorial was rewritten). It doesn’t look like any of the known problems were solved (best practices on concurrent access, what all the flags and config options do, …)

Idea: Instead of churning the existing docs, could effort be expended applying the diataxis format to new docs that we’ve been needing?

The typing module docs are a really difficult cold read. It is hard to get going with those as the sole guide.
Concurrent futures also lacks an introduction, tutorial, how-to guides, concept discussion, module overview, …
There is nothing on how to put a WASM build into productive use.
The new threading build also lacks an instruction manual. Everything in the threading documentation assumes that a reader has already mastered thread programming and concepts in some other language.
Asyncio also needs better on-ramping.

For most of these topics, the current best resource is a RealPython article. Their articles cover the holes in our documentation and they tend to be of high quality, helping people get up to speed quickly. They are kind of the gold standard.

mwichmann · December 8, 2024, 7:05pm

Zeke: it was - if you look back you’ll see someone woke up a year-old thread with a fresh comment. A fair bit has happened in the intervening 12 months!

Wombat · December 8, 2024, 7:15pm

Rats! This was at the top of my feed and I thought it was new. Thanks for pointing out the huge time gap. Looks like I was late to the party and that a much less invasive view has evolved.

It does still look like there are unmet needs for on-ramping documentation for asyncio, threads, concurrent futures, wasm builds and whatnot. The docs still have some major holes that need to be filled.

adorilson · December 8, 2024, 7:40pm

Sincerely, it doesn’t work fine.

Who decides, and by which criteria, whether to intermix or separate things?

The result is that the documentation (especially the Library Reference section) is messy and inconsistent. Each module has its own format.

Let’s look at the sqlite3 module docs.

If someone says to me, “Look at the sqlite3 module how-to (or tutorial)”, I look it up in the Python How-To section, but it “doesn’t” exist.

On the other hand, the explicit reference in the sqlite3 doc is excellent. It highlights “all that we have about this module.” Let’s do more of this (regardless of whether it is on a single page or multiple pages). Other modules, like argparse, have a box pointing to a tutorial.

sqlite3.connect has a parameter highlight. It is beneficial, but just a few modules do that.

Some modules have a lot of practical examples, others have none.

I believe we need some templates for the docs, at least as something to strive for, in addition to Diataxis itself.

nedbat · December 8, 2024, 8:02pm

You are right, the docs don’t have one approach applied uniformly throughout the docs. There are a few reasons for that. The docs have grown over the course of 30 years, usually by edits made by core developers as they built the language and standard library. The sqlite3 pages are the most recently updated with a specific approach in mind. It’s a lot of work to apply that to the rest of the docs, so we need more contributors.

The most difficult reason might be that an approach that works in one place won’t work well in another place. So it’s hard to create rules for how to structure docs.

Finally, it’s hard to get agreement. You mentioned the “parameter highlight”, but I don’t think the pink sidebar helps: indented bullets are enough. The sidebar just takes up horizontal space that is in short supply on mobile devices.

But in any case, I’ll come back to a point I’ve made a few times already: any change of the scope we’re talking about here is a lot of work that takes many hands. Opinions are fine, but doing the work is what gets it done.

nedbat · December 8, 2024, 8:03pm

This is a great list of places that could use improvement. I hope we can turn it into action.