Systematically finding bugs in Python C extensions (575+ confirmed so far)

Howdy Daniel!

First of all, thank you for creating the thread, and I believe that the tool is truly excellent.

I know that the thread is about C extensions, but maybe my experiences with your devdanzin/cpython-review-toolkit (a Claude Code plugin for exploring, analyzing, and reviewing CPython's C source code) in the CPython realm are applicable. The conversation now focuses on both how to improve the tool and how to handle the reports. I’d like to post my two eurocents on the latter.

I’m not going to post the exact PRs (not hard to find, only three) created in the python/cpython repository, but even after manually reviewing the reports (some false positives, some just plain duplicates of what you’ve found), my success rate wasn’t great.

I intentionally picked one optimization, one fix that I believe is a nitpick, and one perfectly reproducible bug. One was rejected, one will most likely make it, and one led to an interesting conversation.

The sample is far from statistically significant, but my conclusion about why the success rate was low has less to do with the code and more to do with the overall context.

Unless the issue is critical (even if perfectly reproducible), many fixes are just distracting. Maintainers have their own projects, plans, schedules etc., and some pathological refleak is not really that important. I believe that such PRs used to make it in the past, because they were seen as an investment (education) in a potential maintainer, a future colleague. Now, it’s “Contributor” badge hunting.

I don’t like to philosophize too much, but it seems to me to be similar to law. For various reasons, societies agree that it’s OK not to always abide by the law. Perfect policing hits diminishing returns. Most likely, different projects will settle on different rules, explicit or implicit, about what level of bugs they’re OK with.

Importantly, the metaphor also holds because legal systems prefer policing crimes with victims. In this context, unless someone was hit with a bug or might be realistically hit, it’s a theoretical distraction.

More actionably, I think it’d be interesting to start thinking not only in terms of code, but also to research the project’s past decisions. I think that `git blame`, github.com etc. are perfect resources to start with:

  • Have maintainers rejected similar reports in the past?
  • What is the policy stated by maintainers?
  • Are there similar fixes in the code lately?

The actual rules of the community are the filter.
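
As a rough sketch of the third question (are there similar fixes in the code lately?), something like the following could be run against a local clone. This is only an illustration under my own assumptions: the function names, the example keyword, the path and the two-year window are all hypothetical and not part of the toolkit. It only relies on standard `git log` options (`--oneline`, `-i`, `--grep`, `--since`).

```python
import subprocess


def build_git_log_command(keyword, path, since="2 years ago"):
    """Build a `git log` command listing recent commits whose message mentions `keyword`.

    `keyword`, `path` and the default `since` window are illustrative assumptions.
    """
    return [
        "git", "log",
        "--oneline",           # one commit per line: "<short hash> <subject>"
        "-i",                  # case-insensitive matching for --grep
        f"--grep={keyword}",   # filter on the commit message
        f"--since={since}",    # only look at recent history
        "--", path,            # restrict to one part of the tree
    ]


def parse_oneline_log(output):
    """Split `git log --oneline` output into (short_hash, subject) pairs."""
    commits = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        short_hash, _, subject = line.partition(" ")
        commits.append((short_hash, subject))
    return commits


def recent_similar_fixes(repo_dir, keyword, path):
    """Run the query inside a local clone; needs `git` on PATH."""
    result = subprocess.run(
        build_git_log_command(keyword, path),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_oneline_log(result.stdout)
```

For example, `recent_similar_fixes("/path/to/cpython", "refleak", "Modules/")` would list recent commits under `Modules/` that mention refleaks. An empty result does not prove that such fixes are unwelcome, but a handful of recently merged ones is a decent hint that a similar report might be accepted. The first two questions (past rejections, stated policy) need the issue tracker and the contributor docs, which this sketch does not touch.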

(I don’t know how it solves the problem of victimless bugs, though, as well as the trade-off between open source as a community and extracting status from the community.)

Sorry for the ramble! This is a novel subject and there are no established ways of thinking about it yet.