Identifying undocumented bits

I’ve been messing around trying to compare modules with their corresponding .rst files to identify undocumented stuff. I don’t know if this sort of thing would be useful, but before I go much further (the code is almost certainly a mess), I thought I would see if the sort of output it’s generating would be useful to those working on the documentation.

The basic idea is to get a list of modules from the module index and, for each module, compare its __all__ attribute value (or dir(mod) if there is no __all__) with the various references in the module’s documentation. For each module it then displays the module name and the list of attributes it didn’t find in the module’s documentation. For example:

array ['ArrayType']
ast ['TypeIgnore', 'boolop', 'cmpop', 'excepthandler', 'expr', 'expr_context', 'main', 'mod', 'operator', 'pattern', 'stmt', 'type_ignore', 'type_param', 'unaryop']
asyncio ['AbstractServer', 'BaseEventLoop', 'StreamReaderProtocol', '_get_running_loop', '_set_running_loop', 'create_subprocess_exec', 'create_subprocess_shell', 'gather', 'iscoroutinefunction', 'open_connection', 'open_unix_connection', 'shield', 'sleep', 'start_server', 'start_unix_server', 'to_thread', 'wait', 'wait_for']
codecs ['BOM32_BE', 'BOM32_LE', 'BOM64_BE', 'BOM64_LE']
configparser ['ConverterMapping', 'DEFAULTSECT', 'Interpolation', 'SectionProxy']
copyreg ['add_extension', 'clear_extension_cache', 'remove_extension']
ctypes ['ARRAY', 'SIZEOF_TIME_T', 'SetPointerType', 'c_buffer', 'c_voidp']
curses ['ALL_MOUSE_EVENTS', 'REPORT_MOUSE_POSITION', 'intrflush', 'window']
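
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: the function names `candidate_names` and `undocumented` are hypothetical, the set of documented names is assumed to have been collected elsewhere as fully qualified strings such as "array.array", and skipping underscore-prefixed names in the dir() fallback is an assumption about what a sensible filter would be.

```python
import importlib


def candidate_names(module):
    """Return __all__ if the module defines it, else a filtered dir(module)."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Assumption: without __all__, skip private and dunder names.
        names = [n for n in dir(module) if not n.startswith("_")]
    return sorted(set(names))


def undocumented(module, documented):
    """List names of `module` whose qualified form is not in `documented`.

    `documented` is a set of strings like "array.array" (hypothetical input;
    how it is gathered from the docs is outside this sketch).
    """
    prefix = module.__name__
    return [n for n in candidate_names(module)
            if f"{prefix}.{n}" not in documented]


# Usage with a real stdlib module and a hand-written "documented" set.
mod = importlib.import_module("array")
print(mod.__name__, undocumented(mod, {"array.array", "array.typecodes"}))
```

Names listed in __all__ are kept even when they start with an underscore, which is consistent with entries such as '_get_running_loop' appearing in the asyncio line above.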

No doubt, much of the stuff will be trivial. For example, it’s not clear that describing the various BOM* values in the codecs module will be all that valuable. Other stuff, like many of the exceptions/errors in the email.errors module:

email.errors ['CharsetError', 'HeaderMissingRequiredValue', 'InvalidHeaderDefect', 'InvalidMultipartContentTransferEncodingDefect', 'NonASCIILocalPartDefect', 'NonPrintableDefect', 'ObsoleteHeaderDefect', 'UndecodableBytesDefect']

might be more interesting discoveries.

If nothing else, the generated list might be useful for people new to documenting Python (easy stuff to correct) or for those looking to rework the existing documentation for a given module.

If something like this is already available, I’ll crawl back in my hole. If not, and those in the know think it might be of value, I can toss it up on my GitHub repo.


Thanks, I think this is good to have.

A related project that I’d find useful would be to find things that are documented, but that are not in typeshed (GitHub: python/typeshed, the collection of library stubs for Python, with static types). I hope we’ve gotten all of those by now, but there may still be things missing.

As Jelle mentions, the docs <-> code problem is similar to the type stubs <-> code problem. When testing whether typeshed type stubs match code, I found I needed to use slightly more sophisticated logic than dir(module) when __all__ is not present. See mypy/mypy/ at master · python/mypy · GitHub; hopefully it’s of some use!
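
To illustrate what "more sophisticated than dir(module)" might look like, here is one possible heuristic. It is a sketch under stated assumptions and not a copy of the mypy logic referenced above: the function name `guess_public_names` is hypothetical, and the rules (skip underscore names, skip imported modules, skip objects whose `__module__` points elsewhere) are guesses that will misclassify some legitimate re-exports.

```python
import types


def guess_public_names(module):
    """Heuristic guess at a module's public names when __all__ is absent."""
    if hasattr(module, "__all__"):
        return sorted(module.__all__)
    names = []
    for name, value in vars(module).items():
        if name.startswith("_"):
            continue  # private or dunder name
        if isinstance(value, types.ModuleType):
            continue  # most likely an `import x` at module level
        owner = getattr(value, "__module__", None)
        if owner is not None and owner != module.__name__:
            continue  # defined elsewhere and merely imported here
        names.append(name)  # constants have no __module__ and are kept
    return sorted(names)
```

For a module without __all__ that does `import os` and `from json import dumps`, this drops `os` and `dumps` while keeping functions and constants defined in the module itself.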

Also see e.g. sphobjinv (GitHub: bskinn/sphobjinv, a toolkit for manipulation and inspection of Sphinx objects.inv files), which uses the Sphinx-generated objects.inv file. You might find that useful, @smontanaro, if you’re currently just parsing RST yourself.


Ooh, thanks @AA-Turner. I wouldn’t call what my current tool does “parsing.” That .inv tool would definitely be a step up.

I redid my tool slightly, using sphobjinv to get the inventory. I had named the original comparedoc, which I never pushed to my GitHub account. The new one is named pyundoc and has been pushed to GitHub.

I make no claims that this is pretty code. I was just “scratching an itch.” I’m sure more could be done. This might be a reasonable starting point though.