Systematically finding bugs in Python C extensions (575+ confirmed so far)

What I was thinking of would be a deterministic, coverage-guided fuzzer.

It would require a couple of abilities:

  • The ability to selectively fail a call to specified functions.

  • Tracking code execution coverage. If failing a call doesn’t result in new code coverage, we don’t trigger that failure anymore, recursively. However, I think a lot of the failures are going to be quickly followed by a correct error exit or a crash of some form.

  • Tracing whether calls occurred within a specific directory/tree (e.g. pillow/src). Don’t fail calls that come from other source trees (like core Python).

  • ASAN coverage.
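
To make the first and third abilities concrete, here is a rough Python model of the decision logic only. A real tool would have to hook the C calls themselves; the `FaultInjector` class, its method names, and the `(function, file, line)` call-site tuple are all hypothetical names made up for illustration.

```python
from pathlib import PurePosixPath


class FaultInjector:
    """Hypothetical model: decides whether one intercepted call should fail."""

    def __init__(self, target_tree, fail_site=None):
        self.target_tree = PurePosixPath(target_tree)
        self.fail_site = fail_site  # (function, file, line), or None for a baseline run
        self.fired = False          # at most one injected failure per run
        self.seen = []              # eligible call sites, in first-execution order

    def in_target_tree(self, filename):
        # Only callers whose source file lives under the target tree are eligible.
        return self.target_tree in PurePosixPath(filename).parents

    def should_fail(self, function, filename, line):
        if not self.in_target_tree(filename):
            return False  # e.g. a call made from core Python: never fail it
        site = (function, filename, line)
        if site not in self.seen:
            self.seen.append(site)
        if site == self.fail_site and not self.fired:
            self.fired = True  # fail once, then let the run go to completion
            return True
        return False
```

A baseline run would use `fail_site=None` and just collect `seen`; each later run would pick one entry from that list as its `fail_site`.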

What I’d see it doing is running the test harness once for a baseline of code coverage, and then systematically failing one new call per run, in code execution order, looking for code coverage changes. Once you’ve failed one (new) call, run to completion. There would potentially be issues with calls in a tight loop, but if it was failing per code location rather than per call, it would avoid needing one test run per loop iteration, assuming code coverage didn’t change.
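
As a sketch of that loop, under my own assumptions: the harness is modelled as a function that takes a failure plan (a tuple of call sites to fail) and returns the call sites it hit plus the coverage it reached. `explore` and `toy_harness` are hypothetical names, and the toy harness stands in for a real test run against an extension.

```python
from collections import deque


def explore(run_harness):
    """Baseline run, then one new failed call site per run, pruned by coverage."""
    sites, coverage = run_harness(())           # baseline: nothing fails
    seen_coverage = set(coverage)
    known_sites = set(sites)
    # One plan per call site, in the order the baseline first executed them.
    pending = deque((s,) for s in dict.fromkeys(sites))
    productive = []
    while pending:
        plan = pending.popleft()
        sites, coverage = run_harness(plan)
        new_coverage = set(coverage) - seen_coverage
        if not new_coverage:
            continue                            # nothing new: never extend this plan
        seen_coverage |= new_coverage
        productive.append(plan)
        # Call sites first reached on this error path are failed next, recursively.
        for site in dict.fromkeys(sites):
            if site not in known_sites:
                known_sites.add(site)
                pending.append(plan + (site,))
    return productive, seen_coverage


def toy_harness(plan):
    """Hypothetical stand-in for a real test run; not an actual extension."""
    sites, coverage = [], {"start"}

    def call(site):
        sites.append(site)
        return site not in plan                 # False means the call was failed

    if not call("alloc_buffer"):
        coverage.add("alloc_error_path")
        if not call("set_error"):
            coverage.add("set_error_failed")
        return sites, coverage
    coverage.add("decode")
    if not call("alloc_row"):
        coverage.add("row_error_path")
        return sites, coverage
    coverage.add("done")
    return sites, coverage
```

On the toy harness, `explore(toy_harness)` does four runs in total (the baseline plus three plans) and reaches all three error branches. Plans are keyed by code location, not by individual call, so a call site inside a tight loop costs one run rather than one run per iteration.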

That should exhaustively cover error cases. Probably slowly, but it would definitely help in getting much higher branch test coverage.