Systematically finding bugs in Python C extensions (575+ confirmed so far)

Oh, I’d be very interested in getting more context about your experience with the findings and the maintainers. Any feedback about the reports is welcome too, since I’m often able to improve the prompts to address issues maintainers raise. I’ll search for the PRs later. Thanks for the pointer.

I agree that not all findings are worth fixing, but each maintainer has their own rules for what is valuable and what isn’t. For some, pathological refleaks warrant a fix, for others a rare segfault is tolerable. The best I can do is offer a listing of what the tools find and let them decide what to fix.

Oh, I was lured by “easy” issues myself and truly hope some of these findings might make someone interested enough in CPython and extensions that they’ll become a maintainer. It’s always been a sieve situation, where most contributors don’t stick around. I think we should still invest in things that might help to find new maintainers, like easy issues and helping new contributors, even if badge hunters take most of them. I take it you don’t agree?

I’ve had little feedback from CPython devs about whether these tiny PRs targeting nits, leaks, etc. are valuable at all. If they are, I don’t see a problem with people making drive-by contributions, but I’m open to discussing whether that is desirable.

Yes, it’s something each project will figure out. So far, my contributions are like lists of petty crimes I witnessed and the maintainers decide which to pursue and which to ignore. It may well evolve into a situation where I know what is interesting for a project beforehand and only report significant issues.

I try to get a feel for what a project is willing to accept. I could perform this kind of research more carefully for each project I analyze, but I figure it’s easier and more reliable to simply ask maintainers to look at concrete findings and give feedback if they want. For many extensions I already know what is acceptable and what will be rejected.

For CPython, I still don’t know what the acceptance thresholds are (sometimes it seems every little issue will be fixed, but then we try to fix them and some are rejected as you say).

Funny you mention “Are there similar fixes in the code lately?”: one of the most successful agents does exactly that, but in the opposite direction. It looks for recent fixes and tries to find similar bugs that weren’t fixed. That leads to many interesting findings.
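To make the idea concrete, here is a toy sketch of that kind of variant search. It is not the agent’s actual implementation, just an illustration under simple assumptions: take the lines a fix removed from a unified diff and look for the same lines, modulo whitespace, elsewhere in the code. The function names and the sample diff and source below are all made up for the example.

```python
def normalize(line):
    """Collapse runs of whitespace so formatting differences don't hide a match."""
    return " ".join(line.split())


def removed_lines(diff_text):
    """Return the normalized lines a unified diff deleted (the presumably buggy code)."""
    removed = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False          # new file section: skip its ---/+++ headers
        elif line.startswith("@@"):
            in_hunk = True           # hunk body starts after this marker
        elif in_hunk and line.startswith("-"):
            text = normalize(line[1:])
            if text:
                removed.append(text)
    return removed


def find_siblings(diff_text, sources):
    """Find lines in `sources` (filename -> text) that match code a fix removed."""
    patterns = set(removed_lines(diff_text))
    hits = []
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if normalize(line) in patterns:
                hits.append((name, lineno, normalize(line)))
    return hits


# Hypothetical fix: an unchecked PyList_Append call gained an error check.
FIX_DIFF = """--- a/mod.c
+++ b/mod.c
@@ -10,3 +10,5 @@
-    PyList_Append(list, item);
+    if (PyList_Append(list, item) < 0) {
+        goto error;
+    }
"""

# Hypothetical sibling file that still has the unchecked call.
SOURCES = {
    "other.c": "int f(void) {\n    PyList_Append(list,   item);\n    return 0;\n}\n",
}

hits = find_siblings(FIX_DIFF, SOURCES)
print(hits)  # [('other.c', 2, 'PyList_Append(list, item);')]
```

Exact line matching like this is far too crude for real use, since it misses renamed variables and anything a fix only added. It only shows the direction of the search: from a known fix to unfixed lookalikes.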

Indeed, but it’s often the case that the rules are implicit or even undecided. Until there are explicit guidelines to go by, I think erring on the side of reporting too much is the lesser evil, even more so when this elicits feedback about what is desirable and what isn’t. What would you suggest in the current situation?

Thanks for engaging and for your valuable perspective. I’ll try to take community standards into account more often and weigh whether issues are significant or just distractions.

Sorry for the delay. I wanted to answer sooner, but there were too many things in flight.