Add the sphinx-codeautolink extension to doc build process?

This thread is to gather opinions on whether it would be a good idea to add the sphinx-codeautolink extension (https://sphinx-codeautolink.readthedocs.io) to the documentation build process. The extension, in its authors' own words,

makes code examples clickable by inserting links from individual code elements to the corresponding reference documentation

In my opinion, it is quite helpful for documentation readers to be able to immediately jump from e.g. tutorial code using logging.getLogger to its reference description. I have tested this locally, and after a few fixes which need a new release of the extension, it works quite well for the tutorials and how-tos.

Obviously, the styling of these extra links can be as unobtrusive as we want, since the extension adds its own CSS class to those links (Examples — sphinx-codeautolink 0.16.2 documentation).
Trio, for example, does not add any extra styling, so you have to know that some of the code is linked: Tutorial — Trio 0.29.0 documentation

Code examples that are self-contained, i.e. have all their imports, do not need any change.
Examples that are made up of parts that build on each other can also be supported with some extra markup: Examples — sphinx-codeautolink 0.16.2 documentation
Missing imports can be annotated “invisibly”: Examples — sphinx-codeautolink 0.16.2 documentation

Cons: adding the extension is an extra dependency and increases the documentation build time somewhat (not measured so far).

So, any feedback? I’m ready to prepare a PR if the response is positive.

I gave it a quick test to see how it looks and got lots of errors:

building [html]: targets for 511 source files that are out of date
updating environment: [new config] 511 added, 0 changed, 0 removed
reading sources... [100%] whatsnew/3.14 .. whatsnew/index
Traceback (most recent call last):
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/__init__.py", line 46, in wrapper
    return func(*args, **kwargs)
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/__init__.py", line 182, in create_references
    self.filter_and_resolve(transforms, skipped, doc)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/__init__.py", line 206, in filter_and_resolve
    key = resolve_location(name, self.inventory)
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/resolve.py", line 40, in resolve_location
    cursor = locate_type(cursor, tuple(comps), inventory)
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/resolve.py", line 94, in locate_type
    return locate_type(previous, components[i:], inventory)
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/resolve.py", line 94, in locate_type
    return locate_type(previous, components[i:], inventory)
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/resolve.py", line 94, in locate_type
    return locate_type(previous, components[i:], inventory)
  [Previous line repeated 983 more times]
  File "Doc/venv/lib/python3.13/site-packages/sphinx_codeautolink/extension/resolve.py", line 68, in locate_type
    cursor = Cursor(
        cursor.location + "." + component,
        getattr(cursor.value, component, None),
        cursor.instance,
    )
RecursionError: maximum recursion depth exceeded
Doc/library/ast.rst:1216: WARNING: unterminated triple-quoted string literal (detected at line 24) (<unknown>, line 1) in document 'library/ast'
Parsed source in `pycon` block:
block source: >>> print(ast.dump(ast.parse("""\
              ... for a in b:
              ...     if a > 5:
              ...         break
              ...     else:
              ...         continue
              ...
              ... """), indent=4))
              Module(
                  body=[
                      For(
                          target=Name(id='a', ctx=Store()),
                          iter=Name(id='b', ctx=Load()),
                          body=[
                              If(
                                  test=Compare(
                                      left=Name(id='a', ctx=Load()),
                                      ops=[
                                          Gt()],
                                      comparators=[
                                          Constant(value=5)]),
                                  body=[
                                      Break()],
                                  orelse=[
                                      Continue()])])]) [codeautolink.parse_block]
Doc/library/configparser.rst:347: WARNING: unterminated triple-quoted string literal (detected at line 9) (<unknown>, line 1) in document 'library/configparser'
Parsed source in `pycon` block:
block source: >>> config = """
              ... option = value
              ...
              ... [  Section 2  ]
              ... another = val
              ... """
              >>> unnamed = configparser.ConfigParser(allow_unnamed_section=True)
              >>> unnamed.read_string(config)
              >>> unnamed.get(configparser.UNNAMED_SECTION, 'option')
              'value' [codeautolink.parse_block]
Doc/library/configparser.rst:778: WARNING: unterminated triple-quoted string literal (detected at line 19) (<unknown>, line 1) in document 'library/configparser'
Parsed source in `pycon` block:
block source: >>> config = """
              ... [Section1]
              ... Key = Value
              ...
              ... [Section2]
              ... AnotherKey = Value
              ... """
              >>> typical = configparser.ConfigParser()
              >>> typical.read_string(config)
              >>> list(typical['Section1'].keys())
              ['key']
              >>> list(typical['Section2'].keys())
              ['anotherkey']
              >>> custom = configparser.RawConfigParser()
              >>> custom.optionxform = lambda option: option
              >>> custom.read_string(config)
              >>> list(custom['Section1'].keys())
              ['Key']
              >>> list(custom['Section2'].keys())
              ['AnotherKey'] [codeautolink.parse_block]
Doc/library/configparser.rst:815: WARNING: unterminated triple-quoted string literal (detected at line 16) (<unknown>, line 2) in document 'library/configparser'
Parsed source in `pycon` block:
block source: >>> import re
              >>> config = """
              ... [Section 1]
              ... option = value
              ...
              ... [  Section 2  ]
              ... another = val
              ... """
              >>> typical = configparser.ConfigParser()
              >>> typical.read_string(config)
              >>> typical.sections()
              ['Section 1', '  Section 2  ']
              >>> custom = configparser.ConfigParser()
              >>> custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
              >>> custom.read_string(config)
              >>> custom.sections()
              ['Section 1', 'Section 2'] [codeautolink.parse_block]
Doc/library/re.rst:1730: WARNING: unterminated triple-quoted string literal (detected at line 6) (<unknown>, line 1) in document 'library/re'
Parsed source in `pycon` block:
block source: >>> text = """Ross McFluff: 834.345.1254 155 Elm Street
              ...
              ... Ronald Heathmore: 892.345.3428 436 Finley Avenue
              ... Frank Burger: 925.541.7625 662 South Dogwood Way
              ...
              ...
              ... Heather Albrecht: 548.326.4584 919 Park Place""" [codeautolink.parse_block]
Doc/using/windows.rst:421: WARNING: invalid syntax (<unknown>, line 1) in document 'using/windows'
Parsed source in `python` block:
block source: >>> import os
              >>> test_file = 'C:\\Users\\example\\AppData\\Local\\test.txt'
              >>> os.path.realpath(test_file)
              'C:\\Users\\example\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\\LocalCache\\Local\\test.txt' [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:160: WARNING: '{' was never closed (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                          38: 4, 39: 4, 45: 5, 46: 5, 47: 5, 48: 5, 49: 5, 54: 6,
              some_other_code = foo() [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:169: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: File "example.py", line 3
                  some_other_code = foo()
                                  ^
              SyntaxError: invalid syntax [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:178: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: File "example.py", line 1
                  expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                             ^
              SyntaxError: '{' was never closed [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:199: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> foo(x, z for z in range(10), t, w)
                File "<stdin>", line 1
                  foo(x, z for z in range(10), t, w)
                         ^
              SyntaxError: Generator expression must be parenthesized [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:209: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> foo(x, z for z in range(10), t, w)
                File "<stdin>", line 1
                  foo(x, z for z in range(10), t, w)
                         ^^^^^^^^^^^^^^^^^^^^
              SyntaxError: Generator expression must be parenthesized [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:224: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> if rocket.position > event_horizon
                File "<stdin>", line 1
                  if rocket.position > event_horizon
                                                    ^
              SyntaxError: expected ':' [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:236: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> {x,y for x,y in zip('abcd', '1234')}
                File "<stdin>", line 1
                  {x,y for x,y in zip('abcd', '1234')}
                   ^
              SyntaxError: did you forget parentheses around the comprehension target? [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:248: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> items = {
              ... x: 1,
              ... y: 2
              ... z: 3,
                File "<stdin>", line 3
                  y: 2
                     ^
              SyntaxError: invalid syntax. Perhaps you forgot a comma? [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:263: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> try:
              ...     build_dyson_sphere()
              ... except NotEnoughScienceError, NotEnoughResourcesError:
                File "<stdin>", line 3
                  except NotEnoughScienceError, NotEnoughResourcesError:
                         ^
              SyntaxError: multiple exception types must be parenthesized [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:277: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> values = {
              ... x: 1,
              ... y: 2,
              ... z:
              ... }
                File "<stdin>", line 4
                  z:
                   ^
              SyntaxError: expression expected after dictionary key and ':'

              >>> values = {x:1, y:2, z w:3}
                File "<stdin>", line 1
                  values = {x:1, y:2, z w:3}
                                      ^
              SyntaxError: ':' expected after dictionary key [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:299: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> try:
              ...     x = 2
              ... something = 3
                File "<stdin>", line 3
                  something  = 3
                  ^^^^^^^^^
              SyntaxError: expected 'except' or 'finally' block [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:313: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> if rocket.position = event_horizon:
                File "<stdin>", line 1
                  if rocket.position = event_horizon:
                                     ^
              SyntaxError: cannot assign to attribute here. Maybe you meant '==' instead of '='? [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:325: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> f"Black holes {*all_black_holes} and revelations"
                File "<stdin>", line 1
                  (*all_black_holes)
                   ^
              SyntaxError: f-string: cannot use starred expression here [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:341: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> def foo():
              ...    if lel:
              ...    x = 2
                File "<stdin>", line 3
                  x = 2
                  ^
              IndentationError: expected an indented block after 'if' statement in line 2 [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:359: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> collections.namedtoplo
              Traceback (most recent call last):
                File "<stdin>", line 1, in <module>
              AttributeError: module 'collections' has no attribute 'namedtoplo'. Did you mean: namedtuple? [codeautolink.parse_block]
Doc/whatsnew/3.10.rst:380: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.10'
Parsed source in `python` block:
block source: >>> schwarzschild_black_hole = None
              >>> schwarschild_black_hole
              Traceback (most recent call last):
                File "<stdin>", line 1, in <module>
              NameError: name 'schwarschild_black_hole' is not defined. Did you mean: schwarzschild_black_hole? [codeautolink.parse_block]
Doc/whatsnew/3.11.rst:118: WARNING: invalid syntax. Perhaps you forgot a comma? (<unknown>, line 1) in document 'whatsnew/3.11'
Parsed source in `python` block:
block source: Traceback (most recent call last):
                File "distance.py", line 11, in <module>
                  print(manhattan_distance(p1, p2))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "distance.py", line 6, in manhattan_distance
                  return abs(point_1.x - point_2.x) + abs(point_1.y - point_2.y)
                                         ^^^^^^^^^
              AttributeError: 'NoneType' object has no attribute 'x' [codeautolink.parse_block]
Doc/whatsnew/3.11.rst:133: WARNING: invalid syntax. Perhaps you forgot a comma? (<unknown>, line 1) in document 'whatsnew/3.11'
Parsed source in `python` block:
block source: Traceback (most recent call last):
                File "query.py", line 37, in <module>
                  magic_arithmetic('foo')
                File "query.py", line 18, in magic_arithmetic
                  return add_counts(x) / 25
                         ^^^^^^^^^^^^^
                File "query.py", line 24, in add_counts
                  return 25 + query_user(user1) + query_user(user2)
                              ^^^^^^^^^^^^^^^^^
                File "query.py", line 32, in query_user
                  return 1 + query_count(db, response['a']['b']['c']['user'], retry=True)
                                             ~~~~~~~~~~~~~~~~~~^^^^^
              TypeError: 'NoneType' object is not subscriptable [codeautolink.parse_block]
Doc/whatsnew/3.11.rst:151: WARNING: invalid syntax. Perhaps you forgot a comma? (<unknown>, line 1) in document 'whatsnew/3.11'
Parsed source in `python` block:
block source: Traceback (most recent call last):
                File "calculation.py", line 54, in <module>
                  result = (x / y / z) * (a / b / c)
                            ~~~~~~^~~
              ZeroDivisionError: division by zero [codeautolink.parse_block]
Doc/whatsnew/3.12.rst:310: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.12'
Parsed source in `python` block:
block source: >>> my_string = f"{x z y}" + f"{1 + 1}"
                File "<stdin>", line 1
                  (x z y)
                   ^^^
              SyntaxError: f-string: invalid syntax. Perhaps you forgot a comma? [codeautolink.parse_block]
Doc/whatsnew/3.12.rst:322: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.12'
Parsed source in `python` block:
block source: >>> my_string = f"{x z y}" + f"{1 + 1}"
                File "<stdin>", line 1
                  my_string = f"{x z y}" + f"{1 + 1}"
                                 ^^^
              SyntaxError: invalid syntax. Perhaps you forgot a comma? [codeautolink.parse_block]
Doc/whatsnew/3.12.rst:459: WARNING: Did you mean to use 'from ... import ...' instead? (<unknown>, line 1) in document 'whatsnew/3.12'
Parsed source in `pycon` block:
block source: >>> import a.y.z from b.y.z
              Traceback (most recent call last):
                File "<stdin>", line 1
                  import a.y.z from b.y.z
                  ^^^^^^^^^^^^^^^^^^^^^^^
              SyntaxError: Did you mean to use 'from ... import ...' instead? [codeautolink.parse_block]
Doc/whatsnew/3.13.rst:502: WARNING: unterminated triple-quoted string literal (detected at line 9) (<unknown>, line 2) in document 'whatsnew/3.13'
Parsed source in `pycon` block:
block source: >>> def spam():
              ...     """
              ...         This is a docstring with
              ...           leading whitespace.
              ...
              ...         It even has multiple paragraphs!
              ...     """
              ...
              >>> spam.__doc__
              '\nThis is a docstring with\n  leading whitespace.\n\nIt even has multiple paragraphs!\n' [codeautolink.parse_block]
Doc/library/symtable.rst:206: WARNING: unterminated triple-quoted string literal (detected at line 18) (<unknown>, line 2) in document 'library/symtable'
Parsed source in `pycon` block:
block source: >>> import symtable
              >>> st = symtable.symtable('''
              ... def outer(): pass
              ...
              ... class A:
              ...    def f():
              ...        def w(): pass
              ...
              ...    def g(self): pass
              ...
              ...    @classmethod
              ...    async def h(cls): pass
              ...
              ...    global outer
              ...    def outer(self): pass
              ... ''', 'test', 'exec')
              >>> class_A = st.get_children()[2]
              >>> class_A.get_methods()
              ('f', 'g', 'h') [codeautolink.parse_block]
Doc/whatsnew/3.14.rst:168: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.14'
Parsed source in `python` block:
block source: >>> x, y, z = 1, 2, 3, 4
              Traceback (most recent call last):
                File "<stdin>", line 1, in <module>
                  x, y, z = 1, 2, 3, 4
                  ^^^^^^^
              ValueError: too many values to unpack (expected 3, got 4) [codeautolink.parse_block]
Doc/whatsnew/3.14.rst:182: WARNING: invalid syntax (<unknown>, line 1) in document 'whatsnew/3.14'
Parsed source in `python` block:
block source: >>> "The interesting object "The important object" is very important"
              Traceback (most recent call last):
              SyntaxError: invalid syntax. Is this intended to be part of the string? [codeautolink.parse_block]

Extension error (sphinx_codeautolink.extension):
Handler <bound method SphinxCodeAutoLink.create_references of <sphinx_codeautolink.extension.SphinxCodeAutoLink object at 0x106ed4ec0>> for event 'env-updated' threw an exception (exception: maximum recursion depth exceeded)
make: *** [build] Error 2

I’ve not checked those closely but we’d need a way to configure it not to parse new and unreleased Python syntax that the extension doesn’t support, and it looks like we have lots of code blocks that intentionally demonstrate buggy code that we’d have to somehow tell it to ignore.

Is that possible?

From the docs, it looks like Reference — sphinx-codeautolink 0.16.2 documentation would be the option to use, but I have not tried it yet.
If that’s not the right option, the extension author would likely add a new one.
And of course, there is always suppress_warnings that can be set for warnings from this extension specifically.
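
A minimal sketch of the suppress_warnings route, assuming the warning type is the one printed in square brackets at the end of each message in the build log above. Any entries the project's conf.py already has in this list are not shown here.

```python
# conf.py (sketch)

# Sphinx matches entries of the form "type.subtype" against the tag it
# prints after each warning, e.g. "[codeautolink.parse_block]".
suppress_warnings = [
    "codeautolink.parse_block",  # blocks the extension could not parse
]
```

This only hides the messages; blocks that fail to parse would still be left without links.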

The RecursionError was fixed a few hours ago, prompted by `RecursionError` when run on documentation of Python itself · Issue #165 · felix-hilden/sphinx-codeautolink · GitHub
Support highlight language fallbacks in blocks · Issue #166 · felix-hilden/sphinx-codeautolink · GitHub is another relevant problem that already has a fix.

Most of the errors you show above come from example code using the >>> syntax, which requires the pycon Pygments lexer but is not explicitly marked up as such, so the default highlight_language = python3 from conf.py would normally take effect.
Sphinx appears to have code to detect the >>> and automatically switch the lexer being used, but the extension does not (yet - I have reported this).
In case the extension author does not want to implement auto-detection, we could set the document-wide .. highlight:: pycon for documents that mostly use this syntax (like howto/regex) and explicitly mark up the “outliers”.
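
To illustrate why the missing pycon markup matters, here is a rough sketch of prompt detection and prompt stripping. It is not the actual Sphinx or sphinx-codeautolink code; the helper names guess_lexer, pycon_to_source and parses are made up for this example.

```python
import ast


def guess_lexer(source: str, configured: str = "python3") -> str:
    """Pick 'pycon' for Python blocks that open with an interactive prompt."""
    python_aliases = {"py", "python", "py3", "python3", "default"}
    if configured in python_aliases and source.startswith(">>>"):
        return "pycon"
    return configured


def pycon_to_source(block: str) -> str:
    """Keep the code after '>>> ' / '... ' prompts and drop interpreter output."""
    kept = []
    for line in block.splitlines():
        if line.startswith((">>> ", "... ")):
            kept.append(line[4:])
        elif line.rstrip() in (">>>", "..."):
            kept.append("")  # bare prompt: an empty source line
    return "\n".join(kept)


def parses(source: str) -> bool:
    """True if ast.parse accepts the source on the running interpreter."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Shortened from the configparser example in the build log above.
block = '''>>> config = """
... option = value
...
... [  Section 2  ]
... another = val
... """
>>> unnamed = configparser.ConfigParser(allow_unnamed_section=True)
>>> unnamed.read_string(config)
'''

print(guess_lexer(block))              # the block opens with a prompt
print(parses(block))                   # raw text, prompts included
print(parses(pycon_to_source(block)))  # prompts stripped first
```

Treated as plain Python, the block is rejected at the first prompt; once it is recognised as a console session and the prompts are removed, the triple-quoted string parses normally.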

I have also seen some places where the correct Pygments lexer specification is missing, e.g. for shell code - I would fix those in a PR.

I’ll take a look at the best way to handle unreleased Python syntax next.

Sphinx appears to have code to detect the >>> and automatically switch the lexer being used,

Yes, we do this: sphinx-doc/sphinx@03df811/sphinx/highlighting.py#L148-L153.

We had a discussion at Sphinx to decide whether to have this kind of extension for C/C++ and we were also wondering how to make it work for Python. So it could be possible to include it upstream as well if the extension doesn’t end up too large.

I’ll take a look at the best way to handle unreleased Python syntax next.

I think there’s some skip directive that can be used. Most of the errors actually are either triple-quoted strings that are not detected as being terminated, or code that has syntax errors on purpose (which in this case we should be able to mark as “unsafe”).

Now, building the docs could also be a bit hard depending on what we parse. For instance, if Python 3.12 docs using the type keyword are built using Sphinx and Python 3.10, I think it would fail because the type A = ... would be a syntax error in 3.10 (but not in 3.12) and we won’t be able to extract a correct AST (the extension uses the ast module to parse the code-block).
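
A small sketch of that version dependence, using only the ast module. The helper name try_parse is hypothetical, and how the extension itself reacts to a failed parse is not shown here.

```python
import ast
import sys


def try_parse(source: str):
    """Return the AST, or None if the running interpreter rejects the source."""
    try:
        return ast.parse(source)
    except SyntaxError:
        return None


new_syntax = "type A = int"  # the type statement, valid from Python 3.12 on

if try_parse(new_syntax) is None:
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version} cannot parse {new_syntax!r}, so no AST is available")
else:
    print(f"{new_syntax!r} parsed fine on this interpreter")
```

The same source gives a different result depending only on which Python runs Sphinx, which is why a local build on an older interpreter could behave differently from the main branch build.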


By the way,

Doc/library/symtable.rst:206: WARNING: unterminated triple-quoted string literal (detected at line 18) (<unknown>, line 2) in document 'library/symtable'

actually made me realize that I used a 3-space indent for def f() but a 4-space indent for the inner def w(). (I wrote this portion of the code.) So thanks for spotting an unrelated issue :')

The .. autolink-skip:: directive of the extension is indeed meant for skipping such “unsafe” code. I have found a bug in it though and reported it.

I suppose for cases like the type keyword, it would be okay to just let parsing fail for these rare occurrences (with warning suppression) and thus not add links?

I suppose for cases like the type keyword, it would be okay to just let parsing fail for these rare occurrences (with warning suppression) and thus not add links?

Probably, although this kind of issue would only appear if the base Python building the docs is < 3.12. It shouldn’t happen on the main branch but for local development it could be annoying (for instance, I use a system-wide installation of Python 3.12 to run Sphinx).

Perhaps we should wait a while for the extension to be more robust. For the benefit of redistributors, we typically make such quality-of-life extensions optional (see e.g. opengraph), so if we adopted it I would be hesitant to have any changes beyond conf.py.
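
One way an optional extension could be wired up in conf.py alone is sketched below. The extensions list is a placeholder rather than the project's real list, is_installed is a made-up helper, and the module name sphinx_codeautolink is taken from the traceback paths earlier in the thread.

```python
import importlib.util

# Placeholder for the extensions the project already lists in conf.py.
extensions = ["sphinx.ext.doctest"]


def is_installed(module_name: str) -> bool:
    """True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Enable the link extension only where it is available, so builds
# without it (for example by redistributors) keep working.
if is_installed("sphinx_codeautolink"):
    extensions.append("sphinx_codeautolink")
```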

A

I guess the problem is that if we just add directives like .. autolink-skip:: to the docs and the extension is not present, doc building would fail with “Unknown directive type”, right?
Are there downsides if we define a dummy directive with the same name that does nothing, using e.g. add_directive(override=False)?
(EDIT: seems to work fine: Comparing python:main...cmarqu:feat/add-sphinx-codeautolink · python/cpython · GitHub)

Yes, but I would expect to be able to use the extension without any such ‘skip’ directives — it should be robust enough to handle failures to parse (if so configured).

A

Oh, it already is robust. The one RecursionError has been fixed, with a release pending.
It will print warnings for parse failures but these can be suppressed.
What is not configurable yet is to not report these warnings in the first place. I’ll open a ticket, assuming such a config option is preferred over using suppress_warnings.


While working on a PR, I found that because the extension employs a parser, it uncovers a number of smaller bugs in code blocks in the documentation. It also requires us to make some highlighting that only “accidentally” works in Pygments more explicit and correct.