Parsing type stubs

ethanc8 · May 2, 2024, 7:15pm

Hi, I’m trying to make Python API docs for OpenCV, which is a CPython extension that ships type stubs in the project. I tried copying over the type stubs to a different directory, renaming *.pyi to *.py, and using sed to add from __future__ import annotations and rename the module (which is named cv2) to opencv_stubs so I could import both the stubs and the CPython extension into the same doc generation file. However, the classes have forward references to the parent classes, which causes import errors.

I was wondering if it would be better to try to make the type stubs importable, to use ast from the standard library to parse the stubs, or to use the implementation of the type checkers like mypy, pyright, pytype, etc to parse these type stubs.

The main problem I’m trying to solve is to get the type information from the type stubs, because this information cannot be found in the docstrings, in the CPython extension, in the Doxygen-generated C++ API docs, or in the C++ header files.

[I also posted this in #python_typing:gitter.im]
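For the second option mentioned above (parsing with ast from the standard library), a minimal sketch of why forward references stop being a problem: the stub text is only parsed, never executed, so a class can name a parent that is declared later. The stub text and the class_bases helper below are hypothetical and for illustration only; this needs Python 3.9 or later for ast.unparse.

```python
import ast

# Hypothetical stub text. Child is declared before Parent, which would raise
# a NameError if this file were renamed to .py and imported.
STUB_SOURCE = '''
class Child(Parent):
    def area(self) -> float: ...

class Parent:
    def name(self) -> str: ...
'''


def class_bases(stub_source: str) -> dict[str, list[str]]:
    """Map each top-level class name to the source text of its base classes."""
    tree = ast.parse(stub_source)  # parses only; nothing is executed
    return {
        node.name: [ast.unparse(base) for base in node.bases]
        for node in tree.body
        if isinstance(node, ast.ClassDef)
    }


bases = class_bases(STUB_SOURCE)
print(bases)
```

For a real stub, the source text would come from reading the .pyi file directly, so no renaming or sed rewriting should be needed.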

Jelle · May 2, 2024, 7:28pm

You may be able to use typeshed_client (JelleZijlstra/typeshed_client on GitHub), a library that can parse stubs and understands their semantic structure. However, you’d have to do significant work yourself to map the information to useful documentation.

ethanc8 · May 2, 2024, 8:53pm

Thanks – I’ll look into that.

ethanc8 · May 2, 2024, 9:38pm

Do you have a function that will return things like the types of the arguments, or does it just return the ast node and I’ll have to traverse the tree to find the types I need?

Jelle · May 2, 2024, 10:07pm

You have to traverse the AST yourself. I have code in pyanalyze/typeshed.py (in the quora/pyanalyze repository on GitHub) that translates the AST into types, but it’s tied to pyanalyze’s own representation of types.
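As a rough illustration of what that traversal can look like, here is a sketch that uses only the standard-library ast module (not typeshed_client or pyanalyze) to collect the annotation text for each parameter and the return type of one function. The stub text, the function name scale, and the helpers find_function and annotations_of are all hypothetical. Note that stubs often contain several @overload definitions of the same name; this sketch simply returns the first match.

```python
import ast
from typing import Optional

# Hypothetical stub text, not a real OpenCV signature.
STUB_SOURCE = '''
def scale(src: MatLike, dsize: Size, fx: float = ..., *, interpolation: int = ...) -> MatLike: ...
'''


def find_function(tree: ast.Module, name: str) -> ast.FunctionDef:
    """Return the first function definition with the given name."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return node
    raise LookupError(name)


def annotations_of(func: ast.FunctionDef) -> dict[str, Optional[str]]:
    """Map parameter names (plus 'return') to annotation source text, or None."""
    args = func.args
    params = [*args.posonlyargs, *args.args, *args.kwonlyargs]
    if args.vararg is not None:   # *args
        params.append(args.vararg)
    if args.kwarg is not None:    # **kwargs
        params.append(args.kwarg)

    def text(node: Optional[ast.expr]) -> Optional[str]:
        return ast.unparse(node) if node is not None else None

    result = {p.arg: text(p.annotation) for p in params}
    result["return"] = text(func.returns)
    return result


types = annotations_of(find_function(ast.parse(STUB_SOURCE), "scale"))
print(types)
```

Returning plain strings keeps the result independent of any type checker's internal representation, which is usually enough for generating documentation.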

ethanc8 · May 2, 2024, 10:20pm

I think I’ve almost gotten it, but I’m encountering an issue when trying to make strings from the annotation types:

def parseAstOf(name: str, data: FunctionData):
    ast = astOfFunction(name)
    for arg in ast.args.args:
        paramName = arg.arg
        if paramName not in data.params:
            data.params[paramName] = ParamData()
        param = data.params[paramName]
        param.name = paramName
        param.type = ast.unparse(ast.fix_missing_locations(arg.annotation))

(ParamData is a class I wrote, I can share my whole code if that’d be useful)

When I try to use ast.unparse, I get this error:

Traceback (most recent call last):
  File "/home/ethan/Projects/IMSA/FRC/VisionDocs/opencv-doc-parser/opencv-doc-parser/docstring-parsing.py", line 167, in <module>
    print(documentFunction("cv2.aruco.calibrateCameraCharucoExtended"))
  File "/home/ethan/Projects/IMSA/FRC/VisionDocs/opencv-doc-parser/opencv-doc-parser/docstring-parsing.py", line 121, in documentFunction
    parseAstOf(name, data)
  File "/home/ethan/Projects/IMSA/FRC/VisionDocs/opencv-doc-parser/opencv-doc-parser/docstring-parsing.py", line 164, in parseAstOf
    param.type = ast.unparse(ast.fix_missing_locations(arg.annotation))
AttributeError: 'FunctionDef' object has no attribute 'unparse'

Jelle · May 2, 2024, 10:23pm

You are shadowing the name of the ast module with a local variable when you do ast = astOfFunction(name). Use a different variable name.
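A sketch of the function with the local variable renamed, so that ast still refers to the module. ParamData, FunctionData, and astOfFunction are not shown in full in this thread, so the versions below are hypothetical stand-ins that only make the example runnable. The sketch also skips parameters without an annotation and drops the ast.fix_missing_locations call, since ast.unparse does not need location information.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParamData:  # hypothetical stand-in for the poster's class
    name: str = ""
    type: Optional[str] = None


@dataclass
class FunctionData:  # hypothetical stand-in for the poster's class
    params: dict[str, ParamData] = field(default_factory=dict)


# Hypothetical stub text, not a real OpenCV signature.
STUB_SOURCE = "def blur(src: MatLike, ksize: Size, dst=...) -> MatLike: ..."


def astOfFunction(name: str) -> ast.FunctionDef:
    """Hypothetical stand-in: look up a function in STUB_SOURCE by its short name."""
    short_name = name.rsplit(".", 1)[-1]
    for node in ast.parse(STUB_SOURCE).body:
        if isinstance(node, ast.FunctionDef) and node.name == short_name:
            return node
    raise LookupError(name)


def parseAstOf(name: str, data: FunctionData) -> None:
    func_node = astOfFunction(name)  # renamed: no longer shadows the ast module
    for arg in func_node.args.args:
        param = data.params.setdefault(arg.arg, ParamData())
        param.name = arg.arg
        if arg.annotation is not None:  # unannotated parameters keep type=None
            param.type = ast.unparse(arg.annotation)


data = FunctionData()
parseAstOf("example.blur", data)
print({name: p.type for name, p in data.params.items()})
```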

ethanc8 · May 2, 2024, 10:30pm

Oh, thanks! I do find it a bit annoying that Python likes having classes and modules with lowercase names, since it tends to conflict with the identifiers I want to use. Well, changing that got it working.