Broken references in Sphinx docs

Currently, Python documentation contains thousands of missing references – words that should link to a definition but don’t. You can list them by building with Sphinx’s -n (nitpick) option, e.g.:

$ make -C Doc clean
$ make -C Doc html SPHINXERRORHANDLING=-n
…
…/cpython/Doc/glossary.rst:144: WARNING: py:meth reference target not found: __await__
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdin.buffer
…/cpython/Doc/glossary.rst:153: WARNING: py:data reference target not found: sys.stdout.buffer
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __enter__
…/cpython/Doc/glossary.rst:232: WARNING: py:meth reference target not found: __exit__
…

(I currently get 5021 of these, excluding /whatsnew/ & NEWS)
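A tally like the one above can be reproduced from a saved build log. The following is a minimal sketch, assuming the warnings have exactly the shape shown in the excerpt; `WARNING_RE`, `parse_nitpicks` and `count_by_role` are hypothetical names for illustration, not part of Sphinx or the CPython tooling, and the exclusion rule is a simple substring match.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpt above:
#   <path>:<line>: WARNING: <role> reference target not found: <target>
WARNING_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): WARNING: "
    r"(?P<role>\S+) reference target not found: (?P<target>.+)$"
)


def parse_nitpicks(log_text):
    """Yield (path, line, role, target) for each missing-reference warning."""
    for raw in log_text.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            yield (
                match["path"],
                int(match["line"]),
                match["role"],
                match["target"].strip(),
            )


def count_by_role(log_text, exclude=("/whatsnew/", "NEWS")):
    """Count warnings per role, skipping paths containing an excluded part."""
    counts = Counter()
    for path, _line, role, _target in parse_nitpicks(log_text):
        if any(part in path for part in exclude):
            continue
        counts[role] += 1
    return counts
```

Feeding it the captured output of the `make` command and summing the values gives the overall count; the per-role breakdown hints at which kinds of references are most often broken.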

We don’t run with -n, so all these “nitpicks” go unnoticed.
Many (most?) are Sphinx technicalities (as above) that “just” don’t generate a hyperlink or give incorrect info in search (e.g. #96996).

But when a link should be generated and isn’t, docs quality can suffer. For example, the asynchronous context manager docs would be much more useful if they actually linked to the __aenter__/__aexit__ docs.
When adding new docs, it’s very easy for the author and reviewer to miss that a link is missing.
Also, when something in CPython is moved, renamed or removed, dangling references to it can remain in the docs.

Should we fix this? And if so, how? Preferably without bothering contributors that don’t care much for Sphinx quirks? Should we have something like the Argument Clinic derby?

How can we limit new nitpick failures? I have a hacky proof of concept that generates GitHub Actions warnings on changed files only. Do we want something like that in the CI?
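To make the idea concrete, here is a rough sketch of what "warnings on changed files only" could look like. It is not the actual proof of concept: `annotate_changed` is a hypothetical helper, the input is assumed to be already-parsed `(path, line, role, target)` tuples, and the changed-file list is assumed to come from something like `git diff --name-only`. The only external convention relied on is the GitHub Actions `::warning file=…,line=…::message` workflow command.

```python
def annotate_changed(nitpicks, changed_files):
    """Return GitHub Actions warning commands for nitpicks in changed files.

    nitpicks: iterable of (path, line, role, target) tuples.
    changed_files: repo-relative paths, e.g. "Doc/glossary.rst".
    """
    changed = sorted(f.replace("\\", "/") for f in changed_files)
    commands = []
    for path, line, role, target in nitpicks:
        normalized = path.replace("\\", "/")
        # Build logs often carry absolute paths, so match on the path suffix.
        hit = next(
            (f for f in changed
             if normalized == f or normalized.endswith("/" + f)),
            None,
        )
        if hit is None:
            continue  # file untouched by this change: stay quiet
        commands.append(
            f"::warning file={hit},line={line}::"
            f"{role} reference target not found: {target}"
        )
    return commands
```

Printing each returned string to standard output inside a workflow step would surface the warning as an annotation on the changed file, while the thousands of existing failures elsewhere stay silent.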

We should be able to compile a list of known failures and run some automatic matching, though I recall in some cases introducing a reference failure was intentional (cross reference syntax to an undocumented C attribute is one I remember).
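The matching step could be as simple as the sketch below, assuming failures are recorded as `(file, role, target)` tuples without line numbers (which shift whenever a file is edited). `new_failures` is a hypothetical name, and the baseline format is an assumption rather than anything that exists in the repository.

```python
from collections import Counter


def new_failures(current, known):
    """Return failures in `current` that the `known` baseline does not cover.

    Both arguments are iterables of (file, role, target) tuples. Duplicates
    matter: if the baseline lists a failure once and it now occurs twice,
    the second occurrence is reported as new.
    """
    remaining = Counter(known)
    fresh = []
    for item in current:
        if remaining[item] > 0:
            remaining[item] -= 1  # covered by the baseline
        else:
            fresh.append(item)
    return fresh
```

Intentional failures would simply stay in the baseline, and a CI job could fail only when the returned list is non-empty.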

In general, prefixing a reference’s content with ! causes Sphinx not to try to resolve it, and instead only display it with the appropriate formatting/semantics. This is useful for cases where the reference is not intended to resolve, such as those to objects that are deliberately undocumented, those that are removed (e.g. as noted in the What’s New), or examples (e.g. :class:`!AnExampleClass`). Compared with some sort of manual whitelist, this requires no additional infra, is quick to add to existing usages, is more efficient because Sphinx doesn’t waste time trying to resolve those references and then printing a warning, and ensures those warnings aren’t emitted in the first place.
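For bulk updates, the prefix could be added mechanically. This is only a sketch: `ROLE_RE` and `suppress` are hypothetical names, and it deliberately handles just the simplest form, where the role content is exactly the target (no `~` prefix and no explicit `title <target>` form).

```python
import re

# Matches roles such as :class:`X`, :py:class:`X` or :c:func:`X`.
# Content containing "<" or ">" (explicit titles) is left alone.
ROLE_RE = re.compile(r"(:(?:[a-z]+:)?[a-z]+:)`([^`<>]+)`")


def suppress(text, targets):
    """Prefix '!' to role content that exactly matches a name in `targets`."""
    def repl(match):
        role, content = match.group(1), match.group(2)
        if content in targets:
            return f"{role}`!{content}`"
        return match.group(0)  # unchanged, including already-prefixed content

    return ROLE_RE.sub(repl, text)
```

For example, `suppress("See :class:`AnExampleClass`.", {"AnExampleClass"})` rewrites the role to ``:class:`!AnExampleClass` `` and leaves every other reference untouched.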

However, there seem to be some cases where, contrary to what Sphinx’s documentation seems to imply, ! doesn’t work as documented; specifically, when I tried to add it to a :c:func: in python/cpython#96016, it was treated as part of the content and Sphinx still attempted to resolve the link. Maybe it is something specific to the c domain and/or Sphinx’s legacy C domain syntax that we are still using until python/cpython#93738 is implemented, as discussed here? I can do some more comprehensive testing, but I figured I’d ask the expert first to see if this is a known limitation.

This definitely seems like a glitch/bug in Sphinx’s C domain to me – I would expect the same don’t-try-to-resolve-the-crossref behavior here.

FYI, as you might have seen, @AA-Turner investigated it, and it indeed turned out to be (as I suspected it might) an edge case with the legacy C syntax config option being enabled. We worked around it with a patch in conf.py, Adam fixed it in Sphinx 5.x, and we also avoided it altogether once Adam updated all the code using the legacy C syntax to the modern equivalent and disabled that now-unneeded option.

Please see python/cpython#102513 (gh-101100: Test docs in nit-picky mode, by hugovk), which builds on Petr’s proof of concept.