I was reading an article this morning that linked to this tool for TypeScript:
This tool will check the type of every identifier; the type coverage rate = (the count of identifiers whose type is not any) / (the total count of identifiers). The higher, the better.
And I wondered: is there a similar tool for Python projects?
I think we can tell e.g. mypy to be strict about untyped identifiers and raise errors for them. We could then count the errors.
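For concreteness, a rough sketch of that idea (the flags are real mypy options; src/ is a placeholder for your package, and counting "error:" lines is just one crude way to turn the output into a number):

```shell
# Make mypy treat untyped/partially typed definitions as errors,
# then count the error lines as a crude "how untyped are we" metric.
# src/ is a placeholder for your package path.
mypy --disallow-untyped-defs --disallow-incomplete-defs src/ | grep -c 'error:'
```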
But what if we wanted to run it more leniently but still collect a soft metric about the state of our code?
Pyright can output a “type completeness” score. This measures the percentage of public symbols (functions, methods, globals, etc.) that are fully typed. It’ll also warn about any definitions whose types would have to be inferred, and which might therefore vary across different type checkers or versions.
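If it helps, this is roughly how to get that score from the CLI (a sketch; mypackage is a placeholder and has to be an installed, importable package rather than a file path):

```shell
# Print Pyright's type completeness report for a package.
pyright --verifytypes mypackage
# JSON output, in case you want to extract the completeness score in CI.
pyright --verifytypes mypackage --outputjson
```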
I’ve been integrating MyPy coverage into several projects recently (stuff under aio-libs, like frozenlist/multidict/yarl, a few projects under ansible and some more).
The reports are uploaded to Codecov and are combined in the UI/metrics, but can also be viewed separately.
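In case it’s useful, one way to produce a Codecov-compatible MyPy report looks roughly like this (a sketch; --cobertura-xml-report is a real mypy option but it needs lxml installed, and src/ and mypy-cov are placeholders):

```shell
# mypy's Cobertura report requires lxml.
pip install lxml
# Writes a cobertura.xml file into the mypy-cov/ directory.
mypy --cobertura-xml-report mypy-cov src/
# That XML file can then be uploaded to Codecov with whichever uploader
# you already use for the pytest coverage report.
```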
I add Codecov flags like MyPy and pytest to the different reports so that they can be filtered when inspecting them in the web UI. I also set it up to report several separate GitHub Checks API statuses, in particular one for typing and one for the regular pytest run. This allows me to require different coverage levels for normal runtime vs. type checking. The combined coverage is reported as well.
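For illustration, the codecov.yml side of such a setup can look roughly like this (a sketch; the flag names and targets are made up, but the keys are standard Codecov configuration):

```yaml
# codecov.yml — separate statuses per flag plus a combined one.
coverage:
  status:
    project:
      pytest:              # GitHub Checks status for runtime (pytest) coverage
        flags: [pytest]
        target: 100%
      MyPy:                # separate status for type-checking coverage
        flags: [MyPy]
        target: 80%
      combined:            # status computed over all uploaded reports
        target: 90%
```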
When both coverage.py (pytest) and MyPy reports are uploaded to Codecov, the combined report can be confusing in places. I had situations where I knew for certain the tests hit a line, yet it showed up there as red/uncovered. It took some time to debug and to update my mental model of what was actually happening.

What happens is that coverage.py may exclude some lines from coverage (either because of its internal logic or due to user-requested exclusions). Those lines are neither covered nor uncovered, they are “not measured”, so in the web UI they aren’t marked green/red/yellow at all. MyPy, however, does treat them as measured and may decide they’re uncovered, so the combined report ends up showing them as uncovered overall. This is easy to grasp once you understand what’s going on, especially if you have flags set up properly and compare what shows up when different ones are selected.
What makes things worse is that MyPy has no equivalent of Coverage.py’s # pragma: no cover, so it’s hard to rely on the metrics it reports. With Coverage.py, it’s easy to mark some lines with this pragma and add global exclusions, which lets you require 100% coverage. With MyPy that’s currently not possible. Hopefully, it’ll be improved at some point.
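For contrast, here’s the Coverage.py side, which is standard functionality rather than anything specific to this setup (a minimal sketch):

```ini
# .coveragerc — lines matching these patterns are excluded ("not measured"),
# which is what makes a hard fail_under = 100 requirement practical.
[report]
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING:
fail_under = 100
```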
This is all to say that the setup is not ideal, but it is usable. It’d be nice to have more people running it, since more users would mean more feedback and fixes.