Testing linters and formatters by program generation and property checking

Hi there!

Does anybody know of research on testing Python code-checking tools through program generation and property checking? Maybe there is work that tests mypy/black/flake8/pylint/ruff/etc. this way?

Maybe I am naive :sweat_smile:, but it seems promising to generate a bunch of Python programs (using an LLM or formal methods) that violate some property, run the tool, and check that it has found or fixed every violation. For instance, we can ask a generator to produce programs with long lines, run black with line-length=N, and check that no line is longer than N.
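Here is a minimal sketch of the black example. It assumes black is installed (`pip install black`) and uses its Python API (`black.format_str` with a `black.Mode`). The generator is a hypothetical toy standing in for an LLM or grammar-based one. It only emits single-line calls with many short keyword arguments, because by default black does not split long string literals or comments, so a generator producing those would flag lines that black leaves long on purpose. The sketch also checks a second property: the output must parse to the same AST as the input. Without that check, a "formatter" that deletes code would pass the line-length test trivially. All function names are illustrative.

```python
import ast
import random
from typing import Callable


def generate_long_call_program(rng: random.Random, n_args: int) -> str:
    # Hypothetical toy generator (stand-in for an LLM or grammar-based one):
    # one call with many short keyword arguments, all on a single long line.
    # It avoids long strings/comments, which black does not split by default.
    args = ", ".join(f"argument_{i}={rng.randint(0, 10**6)}" for i in range(n_args))
    return f"result = some_function({args})\n"


def violations(source: str, line_length: int) -> list[int]:
    # Property 1: 1-based numbers of lines longer than line_length.
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > line_length
    ]


def same_ast(before: str, after: str) -> bool:
    # Property 2: formatting must not change the program.
    # ast.dump omits line/column attributes by default, so pure reformatting compares equal.
    try:
        return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
    except SyntaxError:
        return False


def check_formatter(
    formatter: Callable[[str], str], line_length: int, seed: int = 0, cases: int = 50
) -> list[str]:
    # Returns the generated programs (counterexamples) on which a property fails.
    rng = random.Random(seed)
    counterexamples = []
    for _ in range(cases):
        source = generate_long_call_program(rng, rng.randint(1, 30))
        formatted = formatter(source)
        if violations(formatted, line_length) or not same_ast(source, formatted):
            counterexamples.append(source)
    return counterexamples


def black_formatter(line_length: int) -> Callable[[str], str]:
    # Imported lazily so the rest of the sketch runs without black installed.
    import black

    mode = black.Mode(line_length=line_length)
    return lambda source: black.format_str(source, mode=mode)
```

With black installed, `check_formatter(black_formatter(40), 40)` returns the generated programs on which either property failed. An empty list only means no counterexample turned up in those 50 cases, not that black is correct.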