Broken references in Sphinx docs

Currently, the Python documentation contains thousands of missing references: words that should link to a definition but don’t. You can list them by building with Sphinx’s -n (nitpick) option, e.g.:

$ make -C Doc clean
$ make -C Doc html SPHINXERRORHANDLING=-n
…
…/cpython/Doc/glossary.rst:144: WARNING: py:meth reference target not found: __await__
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdin.buffer
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdout.buffer
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __enter__
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __exit__
…

(I currently get 5021 of these, excluding /whatsnew/ & NEWS)
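
To get a count like that broken down by role, the build output can be parsed with a few lines of Python. This is only a rough sketch: it assumes every nitpick warning has exactly the shape shown in the excerpt above, and `parse_nitpicks` and `sample_log` are illustrative names, not part of any existing tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   <path>:<line>: WARNING: <role> reference target not found: <target>
NITPICK_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): WARNING: "
    r"(?P<role>\S+) reference target not found: (?P<target>.+)$"
)


def parse_nitpicks(log_text):
    """Yield (path, line, role, target) for each nitpick warning in the log."""
    for raw in log_text.splitlines():
        match = NITPICK_RE.match(raw.strip())
        if match:
            yield (
                match["path"],
                int(match["line"]),
                match["role"],
                match["target"],
            )


# The sample lines quoted above; the elided "…" lines are skipped by the parser.
sample_log = """\
…
…/cpython/Doc/glossary.rst:144: WARNING: py:meth reference target not found: __await__
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdin.buffer
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdout.buffer
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __enter__
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __exit__
…
"""

warnings = list(parse_nitpicks(sample_log))
by_role = Counter(role for _path, _line, role, _target in warnings)
print(len(warnings), dict(by_role))
```

In practice the log text would come from the captured output of the make command rather than an inline string.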

We don’t run with -n, so all these “nitpicks” go unnoticed.
Many (most?) are Sphinx technicalities (as above) that “just” don’t generate a hyperlink or give incorrect info in search (e.g. #96996).

But when a link should be generated and isn’t, docs quality can suffer. For example, the asynchronous context manager docs would be much more useful if they actually linked to the __aenter__/__aexit__ docs.
When adding new docs, it’s very easy for the author and reviewer to miss that a link is missing.
Also, when something in CPython is moved, renamed or removed, dangling references to it can remain in the docs.

Should we fix this? And if so, how? Preferably without bothering contributors who don’t care much for Sphinx quirks? Should we have something like the Argument Clinic derby?

How can we limit new nitpick failures? I have a hacky proof of concept that generates GitHub Actions warnings on changed files only. Do we want something like that in the CI?
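
For readers wondering what "warnings on changed files only" could look like, here is a minimal sketch of the idea, not the actual proof of concept. It assumes the Sphinx log paths are already relative to the repository root and that the list of changed files is obtained elsewhere (for example from git); the function name is made up. The `::warning file=…,line=…::…` form is the GitHub Actions workflow command for annotations.

```python
import re

# Assumed shape of a Sphinx warning line: <path>:<line>: WARNING: <message>
WARNING_RE = re.compile(r"^(?P<path>.+?):(?P<line>\d+): WARNING: (?P<message>.+)$")


def annotations_for_changed_files(log_lines, changed_files):
    """Return GitHub Actions warning commands for warnings in changed files."""
    changed = set(changed_files)
    commands = []
    for raw in log_lines:
        match = WARNING_RE.match(raw.strip())
        # Warnings in files the PR did not touch are silently skipped.
        if match and match["path"] in changed:
            commands.append(
                f"::warning file={match['path']},line={match['line']}"
                f"::{match['message']}"
            )
    return commands


# Hypothetical input: two warnings, only one of them in a changed file.
log_lines = [
    "Doc/glossary.rst:144: WARNING: py:meth reference target not found: __await__",
    "Doc/library/os.rst:10: WARNING: py:func reference target not found: spam",
]
for command in annotations_for_changed_files(log_lines, ["Doc/glossary.rst"]):
    print(command)
```

Printing these lines to standard output inside a workflow step is what makes GitHub attach the annotations to the diff.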

We should be able to compile a list of known failures and run some automatic matching, though I recall in some cases introducing a reference failure was intentional (cross reference syntax to an undocumented C attribute is one I remember).

A

In general, prefixing a reference’s content with ! tells Sphinx not to try to resolve it, and to only display it with the appropriate formatting/semantics. This is useful where the reference is not intended to resolve, such as references to objects that are deliberately undocumented, objects that have been removed (e.g. as noted in the What’s New), or examples (e.g. :class:`!AnExampleClass`). Compared with some sort of manual whitelist, this requires no additional infra, is quick to add to existing usages, is more efficient because no time is wasted trying to resolve those references and then printing a warning, and ensures those warnings aren’t emitted in the first place.
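
To illustrate how quick this is to apply to existing usages, here is a small sketch of a helper that adds the ! prefix to chosen targets in reST text. The helper name and regex are assumptions for illustration only. It handles just the plain :role:`target` form (optionally with a domain, as in :c:func:), and deliberately leaves alone targets that already start with ! or ~ and the `title <target>` form.

```python
import re

# Matches :role:`target` and :domain:role:`target` with a plain target.
# Targets starting with "!" or "~", or using "title <target>", are not matched.
ROLE_RE = re.compile(
    r"(?P<role>:(?:[a-z]+:)?[a-z]+:)`(?P<target>[^`<>!~][^`<>]*)`"
)


def suppress_references(rst_text, targets):
    """Prefix the given reference targets with "!" so Sphinx skips resolving them."""
    wanted = set(targets)

    def add_bang(match):
        if match["target"] in wanted:
            return f"{match['role']}`!{match['target']}`"
        return match.group(0)  # leave every other reference untouched

    return ROLE_RE.sub(add_bang, rst_text)


text = "See :class:`AnExampleClass` and :func:`len`."
print(suppress_references(text, {"AnExampleClass"}))
```

Only the listed targets change, so references that are supposed to resolve keep producing links (and warnings, if they break).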

However, there seem to be some cases where, contrary to what Sphinx’s documentation implies, ! doesn’t work. Specifically, when I tried to add it to a :c:func: in python/cpython#96016, it was treated as part of the content and Sphinx still attempted to resolve the link. Maybe it is something specific to the c domain and/or Sphinx’s legacy C domain syntax, which we are still using until python/cpython#93738 is implemented, as discussed here? I can do some more comprehensive testing, but I figured I’d ask the expert first to see if this is a known limitation.

This definitely seems like a glitch/bug in Sphinx’s C domain to me – I would expect the same don’t-try-to-resolve-the-crossref behavior here.

FYI, as you might have seen, @AA-Turner investigated it, and it indeed turned out to be (as I suspected it might) an edge case with the legacy C syntax config option being enabled. It was patched in conf.py and fixed by Adam in Sphinx 5.x, and we also avoided it altogether once Adam updated all the code using the legacy C syntax to the modern equivalent and disabled that now-unneeded option.

Please see python/cpython pull request #102513, “gh-101100: Test docs in nit-picky mode” by hugovk, which builds on Petr’s proof of concept.

It’s just over a year since issue gh-101100 was opened, when we had 8,212 Sphinx reference warnings in the docs.

Since then, we’ve made a concerted effort to fix them: 118 PRs plus 196 backport PRs (linked to gh-101100) have fixed over half of them (52%), and we’re down to 3,951! :tada:

Looking just in the Doc/ directory, 56% are fixed. :broom: :books:

We also now check for Sphinx warnings on the CI (thanks to @encukou for the proof-of-concept in this thread). Files that still have warnings are listed in a file called .nitignore (shout out to @CAM-Gerlach for the inspired name!), and “cleaned” files, i.e. those not listed there, aren’t allowed to introduce new warnings.

When we created .nitignore, it listed 299 files with warnings. We’ve fixed 64% and are down to 108. :chart_with_downwards_trend:
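
The core of such a check can be sketched in a few lines. This is not the actual CI script. It assumes .nitignore holds one path per line with # comments, and that the set of files with warnings has already been extracted from the build log; the function name is hypothetical.

```python
def files_breaking_the_rule(files_with_warnings, nitignore_lines):
    """Return files that have warnings but are not listed in .nitignore."""
    ignored = {
        line.strip()
        for line in nitignore_lines
        # Assumed format: blank lines and "#" comments carry no path.
        if line.strip() and not line.lstrip().startswith("#")
    }
    return sorted(set(files_with_warnings) - ignored)


# Hypothetical data: glossary.rst is still listed, os.rst has been "cleaned".
nitignore_lines = [
    "# Files that still have nitpick warnings",
    "Doc/c-api/init.rst",
    "",
    "Doc/glossary.rst",
]
files_with_warnings = ["Doc/glossary.rst", "Doc/library/os.rst", "Doc/library/os.rst"]

offenders = files_breaking_the_rule(files_with_warnings, nitignore_lines)
print(offenders)  # a non-empty list would fail the CI job
```

Shrinking .nitignore as files are fixed is what ratchets the warning count down over time.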

Raw numbers

| Tag | Date | All warnings | % fixed | Doc/ warnings | % fixed | .nitignore entries | % fixed |
|---|---|---|---|---|---|---|---|
| main/gh-101100 | 2023-01-17 | 8,212 | 0.00% | 7,129 | 0.00% | n/a | n/a |
| 3.12.0a5 | 2023-02-07 | 8,187 | 0.30% | 7,445 | -4.43% | n/a | n/a |
| 3.12.0a6 | 2023-03-07 | 8,096 | 1.41% | 7,450 | -4.50% | n/a | n/a |
| 3.12.0a7 | 2023-04-04 | 8,428 | -2.63% | 7,492 | -5.09% | 299 | 0.00% |
| 3.12.0b1 | 2023-05-22 | 7,754 | 5.58% | 6,651 | 6.71% | 294 | 1.67% |
| 3.12.0b2 | 2023-06-06 | 7,749 | 5.64% | 6,638 | 6.89% | 293 | 2.01% |
| 3.12.0b3 | 2023-06-19 | 7,722 | 5.97% | 6,607 | 7.32% | 293 | 2.01% |
| 3.12.0b4 | 2023-07-11 | 7,638 | 6.99% | 6,520 | 8.54% | 293 | 2.01% |
| (3.12.0rc1) main | 2023-08-06 | 6,331 | 22.91% | 5,212 | 26.89% | 239 | 20.07% |
| (3.12.0rc2) main | 2023-09-06 | 5,615 | 31.62% | 4,587 | 35.66% | 177 | 40.80% |
| (3.12.0rc3) main | 2023-09-19 | 5,597 | 31.84% | 4,573 | 35.85% | 176 | 41.14% |
| (3.12.0) main | 2023-10-02 | 5,499 | 33.04% | 4,475 | 37.23% | 166 | 44.48% |
| 3.13.0a1 | 2023-10-13 | 5,483 | 33.23% | 4,454 | 37.52% | 165 | 44.82% |
| 3.13.0a2 | 2023-11-22 | 5,325 | 35.16% | 4,288 | 39.85% | 155 | 48.16% |
| (3.12.1) main | 2023-12-07 | 5,094 | 37.97% | 4,058 | 43.08% | 143 | 52.17% |
| 3.13.0a3 | 2024-01-17 | 4,015 | 51.11% | 3,237 | 54.59% | 117 | 60.87% |
| main | 2024-01-26 | 3,939 | 52.03% | 3,162 | 55.65% | 108 | 63.88% |
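
For anyone checking the arithmetic, the “% fixed” columns appear to be the reduction relative to the first row that has a value for that column (2023-01-17 for the warning counts, 2023-04-04 for .nitignore). A tiny sketch under that assumption, with a made-up helper name:

```python
def percent_fixed(baseline, current):
    """Percentage reduction from baseline; negative when the count went up."""
    return round(100 * (baseline - current) / baseline, 2)


# Values copied from the first and last rows of the raw numbers.
print(percent_fixed(8212, 3939))  # all warnings
print(percent_fixed(7129, 3162))  # Doc/ only
print(percent_fixed(299, 108))    # .nitignore entries
print(percent_fixed(7129, 7445))  # 3.12.0a5 Doc/ count, which went up
```

The negative percentages in the early rows are simply dates on which a count was higher than its baseline.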

Thanks to everyone for helping out, and keep up the good work!

Fantastic work by all.