Systematically finding bugs in Python C extensions (575+ confirmed so far)

devdanzin · April 8, 2026, 10:25am

Well, here’s an estimate from the Claude that runs the plugin:

About 20-35%: the memory safety bugs. The majority of what we find are Python/C API logic errors (wrong exception handling, wrong refcount protocol, wrong GIL discipline) that Rust doesn’t prevent. PyO3 helps more than Rust itself, by automating the refcount boilerplate that’s the source of most bugs.

Rust would NOT prevent (~60-70% of our findings):

Reference counting errors (leaked refs, borrowed-ref-across-call) — these are Python/C API semantics, not memory safety. PyO3 helps but doesn’t eliminate them.

PyErr_Clear swallowing MemoryError/KeyboardInterrupt — exception handling logic errors

Error-as-truthy (PyObject_IsTrue returning -1 treated as true) — logic errors

Missing NULL checks after failable API calls — the C API contract, not memory safety

GIL discipline (blocking with GIL held, missing GIL release) — concurrency design

Module state issues (single-phase init, global state) — Python/C API architecture

Version compatibility (deprecated APIs, dead version guards) — API evolution

new-without-init crashes — Python object lifecycle design

Re-init safety (calling init twice leaks resources) — API design

Exception clobbering — logic errors

Free-threading races (missing critical sections) — concurrency design, Rust doesn’t auto solve this
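To make the error-as-truthy item above concrete, here is a minimal sketch in plain C. It does not use the real CPython headers, so it compiles on its own: toy_is_true is a hypothetical stand-in that only mimics the 1 / 0 / -1 return convention mentioned above for PyObject_IsTrue, and the other names are made up for illustration too.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tri-state C API call such as PyObject_IsTrue:
   1 means true, 0 means false, -1 means the truth test itself failed
   (in a real extension an exception would be pending at that point). */
typedef enum { VAL_FALSE, VAL_TRUE, VAL_BROKEN } toy_value;

static int toy_is_true(toy_value v)
{
    switch (v) {
    case VAL_TRUE:  return 1;
    case VAL_FALSE: return 0;
    default:        return -1;
    }
}

/* Buggy: -1 is nonzero, so the failure is silently treated as "true". */
static int is_enabled_buggy(toy_value v)
{
    if (toy_is_true(v)) {
        return 1;
    }
    return 0;
}

/* Checked: handle the error result before interpreting the value.
   Returns 0 and fills *out on success, -1 on failure (leaving *out alone). */
static int is_enabled_checked(toy_value v, int *out)
{
    int r = toy_is_true(v);
    if (r < 0) {
        return -1;   /* real code would propagate the pending exception */
    }
    *out = r;
    return 0;
}
```

Both versions are valid C, and a Rust compiler would accept a line-for-line translation of the buggy one just as happily, which is roughly why this class lands in the "would not prevent" bucket.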

Rust WOULD prevent (~15-25%):

Use-after-free / double-free (the SetItem double-free pattern — 62 sites!)

Buffer overflows (rare in our findings, but prevented when present)

Py_DECREF(NULL) — null pointer dereference

std::bad_alloc propagating across the C boundary (Rust panics are at least catchable)

Some heap-type dealloc issues (Rust’s ownership model would enforce cleanup order)
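The summary above doesn’t spell out what the SetItem double-free pattern looks like, so the following is a sketch under an assumption: that it refers to releasing an item again after a reference-stealing setter (PyList_SetItem and PyTuple_SetItem behave this way) has already failed and dropped it. This is a toy refcount model in plain C, not the real API. All names are hypothetical, and the "free" is only recorded in a flag so the second release can be observed without undefined behavior.

```c
#include <stddef.h>

/* Toy refcounted object. Deallocation is recorded, not performed. */
typedef struct {
    int refcnt;
    int freed;        /* set once the count reaches zero */
    int bad_decrefs;  /* releases attempted after the object was freed */
} toy_obj;

static void toy_decref(toy_obj *o)
{
    if (o->freed) {
        o->bad_decrefs++;   /* in real code: use-after-free / double-free */
        return;
    }
    if (--o->refcnt == 0) {
        o->freed = 1;
    }
}

/* Models a reference-stealing setter: it takes ownership of item,
   and on failure it releases that reference itself. */
static int toy_set_item(toy_obj **slots, size_t len, size_t i, toy_obj *item)
{
    if (i >= len) {
        toy_decref(item);   /* stolen reference dropped on the error path */
        return -1;
    }
    if (slots[i] != NULL) {
        toy_decref(slots[i]);
    }
    slots[i] = item;
    return 0;
}

/* Buggy caller: "cleans up" an item it no longer owns. */
static int store_buggy(toy_obj **slots, size_t len, size_t i, toy_obj *item)
{
    if (toy_set_item(slots, len, i, item) < 0) {
        toy_decref(item);   /* BUG: second release of the same reference */
        return -1;
    }
    return 0;
}

/* Fixed caller: ownership was handed over, so only the status is reported. */
static int store_fixed(toy_obj **slots, size_t len, size_t i, toy_obj *item)
{
    return toy_set_item(slots, len, i, item);
}
```

In Rust the equivalent of the buggy caller would not compile, since the item has been moved into the setter. That is the sense in which this class is one the language itself would catch.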

Partially prevented (~10-15%):

Heap type missing Py_DECREF(Py_TYPE(self)) — PyO3 handles this automatically, but it’s a PyO3 feature, not a Rust language feature

Resource leaks on error paths — Rust’s RAII helps but you can still leak via mem::forget
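For the error-path leak item, here is a small plain-C sketch of the usual shape of the bug and of the single-exit cleanup idiom that C code tends to use in place of RAII. The allocation counter and all function names are hypothetical instrumentation added for illustration, and the fail_second flag just simulates a failing second step.

```c
#include <stdlib.h>

/* Instrumentation: number of tracked allocations not yet released. */
static int live_allocs = 0;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        live_allocs++;
    }
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Buggy: the early return on the second failure leaks the first buffer. */
static int process_leaky(int fail_second)
{
    char *a = tracked_alloc(16);
    if (a == NULL) {
        return -1;
    }
    char *b = fail_second ? NULL : tracked_alloc(16);
    if (b == NULL) {
        return -1;          /* BUG: a is never released */
    }
    tracked_free(b);
    tracked_free(a);
    return 0;
}

/* Single-exit version: every path goes through the same cleanup. */
static int process_clean(int fail_second)
{
    int rc = -1;
    char *a = NULL;
    char *b = NULL;

    a = tracked_alloc(16);
    if (a == NULL) {
        goto done;
    }
    b = fail_second ? NULL : tracked_alloc(16);
    if (b == NULL) {
        goto done;
    }
    rc = 0;

done:
    tracked_free(b);        /* both calls are safe on NULL */
    tracked_free(a);
    return rc;
}
```

RAII makes the second shape the default rather than something each function has to remember, which fits the "partially prevented" label: the cleanup is automatic, but it can still be bypassed deliberately.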

Given LLMs’ troubles with numbers and estimates, I wouldn’t trust the percentages too much (I didn’t actually “run the numbers”, just passed your question along). But the bug classes per category seem correct to me.