Automated testing of documentation grammar and links

Hi,

This is more a question than anything else as I am new to contributing to CPython.

Are there any tools that are currently used to run checks on the Python documentation? For example, web crawlers to check broken links or grammar checkers.

I don’t have much experience with this, but there seem to be some good tools in the Python ecosystem for both of these things.
For example:

There seem to be quite a few issues related to these things, so I wonder whether it would be useful to implement such checks for common documentation issues. Or perhaps relying on individuals to flag these issues works well enough?

I’d potentially be interested in creating such a tool, but I wouldn’t want to waste my time if one already exists or if those with more experience don’t think it would be useful.


sphinx-lint is being used to check for reST markup errors. You can run make linkcheck to check for broken hyperlinks; this uses Sphinx’s built-in builder for this task.
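
To make that concrete, here is a rough sketch of running both checks from one script. It assumes `sphinx-lint` and `sphinx-build` are installed and on your PATH; the helper names and the directory paths are made up for illustration, and in the CPython repo you would normally just use the Makefile targets instead.

```python
import subprocess
from pathlib import Path


def build_commands(docs_dir: Path, out_dir: Path) -> list[list[str]]:
    """Return the two check commands as argument lists (hypothetical helper)."""
    return [
        # reST markup lint over the documentation sources
        ["sphinx-lint", str(docs_dir)],
        # Sphinx's built-in linkcheck builder: sphinx-build -b linkcheck SRC OUT
        ["sphinx-build", "-b", "linkcheck", str(docs_dir), str(out_dir)],
    ]


def run_checks(docs_dir: Path, out_dir: Path) -> bool:
    """Run every command and return True only if all of them exit with 0."""
    ok = True
    for cmd in build_commands(docs_dir, out_dir):
        result = subprocess.run(cmd)
        ok = ok and result.returncode == 0
    return ok


# Example call; "Doc" and "build/linkcheck" are assumed paths:
# run_checks(Path("Doc"), Path("build/linkcheck"))
```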


Yes, there’s a link checker built into the Sphinx documentation tool which is run every so often. For example: Sphinx linkcheck and broken/redirect occurrences in Python Docs · Issue #103484 · python/cpython · GitHub

There’s some discussion in that same issue about whether to run it on CI.

The argument against is that it takes 50 minutes, but we can cut that down to 5 by skipping links to our own issue tracker. That’s better, but still a bit slow, and it’s prone to new failures when things change external to a PR, like a website going down or temporary network failures.

So we may be better off running it from time to time.
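
For anyone curious what skipping the issue tracker links could look like, here is a minimal `conf.py` sketch. The `linkcheck_*` names are standard Sphinx configuration values, but the URL patterns and the numbers are my assumptions, not what the CPython docs actually use, and `is_ignored` is only a hypothetical helper for sanity-checking the patterns locally.

```python
# conf.py (excerpt): settings read by Sphinx's linkcheck builder.
import re

# URIs matching any of these regular expressions are skipped.
# The tracker URL patterns below are assumptions for illustration.
linkcheck_ignore = [
    r"https://github\.com/python/cpython/issues/\d+",
    r"https://bugs\.python\.org/issue\d+",
]

# Illustrative values: per-request timeout in seconds, number of attempts
# before a link is reported broken, and number of worker threads.
linkcheck_timeout = 10
linkcheck_retries = 2
linkcheck_workers = 10


def is_ignored(uri: str) -> bool:
    """Hypothetical helper (not part of Sphinx): test a URI against the patterns."""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)
```

Retries help with temporary network failures, but they do not help when an external site really goes down, so this mostly addresses the run time.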

As to grammar checkers, I think they tend to flag too many false positives to be used in any automated fashion.


Thanks for mentioning sphinx-lint. This could be a good place to help out: there are some open issues, and fixing those would help improve the quality of at least the CPython docs, the devguide and the PEPs.


We maintain pytest-check-links in Jupyter for this reason; the responses get cached in GitHub Actions CI. See jupyterlab/pytest-check-links on GitHub (pytest plugin that checks URLs).
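
The general idea of caching link-check responses can be sketched with the standard library alone. This is not how pytest-check-links is implemented; the class and function names are hypothetical, the cache lives only in memory, and a CI setup would need to persist it between runs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status, or 0 if there is no response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered with an error status
    except (urllib.error.URLError, TimeoutError):
        return 0  # DNS failure, refused connection, timeout, ...


class CachedLinkChecker:
    """Remember each URL's status for max_age seconds to avoid repeat requests."""

    def __init__(
        self,
        fetch_status: Callable[[str], int] = http_status,
        max_age: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_status
        self._max_age = max_age
        self._clock = clock
        self._cache: Dict[str, Tuple[float, int]] = {}

    def status(self, url: str) -> int:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]  # still fresh, no network request
        code = self._fetch(url)
        self._cache[url] = (now, code)
        return code

    def is_ok(self, url: str) -> bool:
        # Treat 2xx and 3xx responses as a working link.
        return 200 <= self.status(url) < 400
```

Passing the fetch function and the clock in as parameters keeps the cache logic testable without touching the network.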