Using Claude Code to look for issues within CPython?

Generated with devdanzin/cpython-review-toolkit (a Claude Code plugin for exploring, analyzing, and reviewing CPython's C source code) at commit python/cpython@f4bda4d, "gh-150685: update bundled pip to 26.1.2 (gh-150686)":

The findings might also serve as a set of starter tasks for first-time contributors who want to delve into the C code.

  CPython C Code Exploration Report

  Project: CPython 3.16.0a0

  Scope: Entire project (1,105 .c/.h files; 5,226 include directives)

  Agents Run: include-graph-mapper, refcount-auditor, error-path-analyzer,
  null-safety-scanner, gil-discipline-checker, c-complexity-analyzer,
  pep7-style-checker, api-deprecation-tracker, macro-hygiene-reviewer,
  memory-pattern-analyzer, git-history-analyzer

  ---
  Executive Summary

  CPython 3.16.0a0's core object and interpreter code is highly reliable — the refcount
  audit found zero confirmed bugs across 634 candidates, reflecting mature,
  well-reviewed patterns. The most significant risks cluster in two areas: the new
  Modules/_remote_debugging/ subsystem (introduced Nov 2025, showing a 28:20
  fix-to-feature ratio in 6 months) and new language feature code for PEP 750 t-strings
  (Objects/interpolationobject.c). The critical-section gap in
  BinaryWriter/BinaryReader is a genuine free-threaded crash risk in a module declared
  Py_MOD_GIL_NOT_USED. Five correctness bugs were confirmed across the codebase, all
  introduced since 2024, none yet fixed. The compiler pipeline (codegen.c/flowgraph.c)
  continues to be a high-churn area with recurring refcount cleanup oversights.

  ---
  Key Metrics

  ┌─────────────────┬──────────────┬──────────────────────────────────────────────┐
  │    Dimension    │    Status    │                   Summary                    │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Refcount Safety │ CLEAN        │ 0 confirmed bugs; 634 candidates all false   │
  │                 │              │ positives                                    │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Error Handling  │ 2 issues     │ 1 POLICY (inverted return convention), 1     │
  │                 │              │ CONSIDER (stale borrowed ref)                │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ NULL Safety     │ 1 FIX        │ Exception clobber in _conversion_converter   │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ GIL /           │ 1 FIX        │ BinaryWriter/BinaryReader missing            │
  │ Free-Threading  │              │ @critical_section                            │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Complexity      │ 1 FIX, 7     │ codegen_pattern_or PyErr_Clear; 7 hotspot    │
  │                 │ CONSIDER     │ refactoring targets                          │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ PEP 7 Style     │ ~48 FIX      │ Brace placement, pointer spacing, keyword    │
  │                 │              │ spacing in _remote_debugging/                │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │                 │ 4 CONSIDER,  │ ob_type mutation, uppercase PyMem, 3         │
  │ API Deprecation │ 2 POLICY     │ single-phase-init modules, ~150              │
  │                 │              │ PyArg_ParseTuple sites                       │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Macro Hygiene   │ 1 FIX, 4     │ pyexpat.h missing guard; namespace           │
  │                 │ CONSIDER     │ pollution; latent precedence trap            │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Memory Patterns │ 1 FIX, 1     │ b_exceptstack OOM leak in                    │
  │                 │ CONSIDER     │ label_exception_targets; integer overflow    │
  ├─────────────────┼──────────────┼──────────────────────────────────────────────┤
  │ Include Graph   │ CLEAN        │ Three-tier API boundary enforced at compile  │
  │                 │              │ time; 294/295 headers guarded                │
  └─────────────────┴──────────────┴──────────────────────────────────────────────┘

  ---
  Findings by Priority

  Must Fix (FIX) — Correctness Bugs

  F1. Objects/interpolationobject.c:22-28 — _conversion_converter clobbers prior
  exception
  - Agents: null-safety-scanner, git-history-analyzer
  - Bug: PyUnicode_AsUTF8AndSize(arg, &len) can return NULL (e.g., for lone
  surrogates). When it does, len == -1 prevents a NULL deref, but the code falls into
  PyErr_SetString(PyExc_ValueError, ...) which overwrites the genuine
  UnicodeEncodeError with a misleading "must be one of 's', 'a' or 'r'".
  - Fix: Add if (conv_str == NULL) { return 0; } before the len != 1 check.
  - Context: Introduced 2025-04-30, zero fixes in 13 months, zero tests for surrogate
  inputs.
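
The ordering F1 asks for can be sketched in plain C without the CPython headers. This is a minimal model under stated assumptions, not the code in Objects/interpolationobject.c: `pending_error`, `as_utf8_and_size`, and `convert_conversion` are hypothetical stand-ins, and a NULL input models a string that cannot be encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pending-exception state; not CPython's API. */
static const char *pending_error = NULL;

/* Stand-in for PyUnicode_AsUTF8AndSize: on failure it records an error,
   sets *len to -1, and returns NULL. A NULL input models an unencodable string. */
static const char *
as_utf8_and_size(const char *s, ptrdiff_t *len)
{
    if (s == NULL) {
        pending_error = "UnicodeEncodeError";
        *len = -1;
        return NULL;
    }
    *len = (ptrdiff_t)strlen(s);
    return s;
}

/* Converter convention: return 1 on success, 0 on failure with an error set. */
static int
convert_conversion(const char *arg, char *out)
{
    ptrdiff_t len;
    const char *conv = as_utf8_and_size(arg, &len);
    if (conv == NULL) {
        return 0;                       /* keep the original error untouched */
    }
    if (len != 1 || strchr("sar", conv[0]) == NULL) {
        pending_error = "ValueError";   /* only for genuinely invalid values */
        return 0;
    }
    *out = conv[0];
    return 1;
}
```

With the early return in place, the encoding failure is still the pending error when the converter reports failure. Without it, the same call would end up reporting the `ValueError` instead.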

  F2. Modules/_remote_debugging/module.c — BinaryWriter/BinaryReader missing
  @critical_section in Py_MOD_GIL_NOT_USED module
  - Agents: gil-discipline-checker, git-history-analyzer
  - Bug: The module declares Py_MOD_GIL_NOT_USED. All RemoteUnwinder and GCMonitor
  methods carry @critical_section; all BinaryWriter/BinaryReader methods do not. In
  free-threaded builds: concurrent close()+write_sample() produces a TOCTOU
  use-after-free; concurrent write_sample() calls corrupt the string table and write
  buffer.
  - Fix: Add @critical_section to clinic input for write_sample, finalize, close,
  __exit__, get_stats (BinaryWriter) and replay, get_info, get_stats, close
  (BinaryReader).
  - Context: The hardening commit a5be25d3b (2026-05-25, 6 days before HEAD) explicitly
  targeted safety yet missed this gap.

  F3. Python/codegen.c:6379 — codegen_pattern_or PyErr_Clear conflates error/not-found
  - Agents: c-complexity-analyzer, git-history-analyzer
  - Bug: PySequence_Index returns -1 both for "not found" (sets ValueError) and for
  genuine errors (sets any exception). The subsequent PyErr_Clear() silently swallows
  real errors (e.g., TypeError from a comparison).
  - Fix: Use PyErr_ExceptionMatches(PyExc_ValueError) before clearing, or replace
  PySequence_Index with an explicit search loop.
  - Context: Ported from compile.c in the Sept 2024 split, never independently
  reviewed; codegen.c has had 11 fix-type commits in 2026 alone.
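
The distinction F3 draws between "not found" and a genuine error can be shown with a small self-contained model. Everything here is a hypothetical stand-in: `current_error` plays the role of the exception indicator, `index_of` mimics a search that returns -1 in both cases, and a negative element models an item whose comparison fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an exception indicator; not CPython's real API. */
typedef enum { ERR_NONE, ERR_VALUE, ERR_TYPE } err_kind;
static err_kind current_error = ERR_NONE;

/* Returns the index of needle, or -1 with current_error set.
   A negative element models an item whose comparison raises. */
static ptrdiff_t
index_of(const int *items, size_t n, int needle)
{
    for (size_t i = 0; i < n; i++) {
        if (items[i] < 0) {
            current_error = ERR_TYPE;   /* genuine error during comparison */
            return -1;
        }
        if (items[i] == needle) {
            return (ptrdiff_t)i;
        }
    }
    current_error = ERR_VALUE;          /* plain "not found" */
    return -1;
}

/* Shape described in the finding: clears whatever error is pending. */
static int
contains_clear_all(const int *items, size_t n, int needle)
{
    if (index_of(items, n, needle) < 0) {
        current_error = ERR_NONE;
        return 0;
    }
    return 1;
}

/* Suggested shape: 1 = found, 0 = not found, -1 = error left pending. */
static int
contains_checked(const int *items, size_t n, int needle)
{
    if (index_of(items, n, needle) < 0) {
        if (current_error != ERR_VALUE) {
            return -1;                  /* propagate the real error */
        }
        current_error = ERR_NONE;       /* "not found" is expected; clear it */
        return 0;
    }
    return 1;
}
```

In the real C API the equivalent check would be `PyErr_ExceptionMatches(PyExc_ValueError)` before `PyErr_Clear()`, as the finding suggests.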

  F4. Python/flowgraph.c:929-950 — label_exception_targets leaks b_exceptstack on OOM
  - Agents: memory-pattern-analyzer, git-history-analyzer
  - Bug: When copy_except_stack() fails mid-loop, goto error frees only todo_stack and
  except_stack. Any b_exceptstack pointers already assigned to previously-queued
  basicblocks are leaked. The same recurring pattern as d3c54f378 (fixed Feb 2025).
  - Fix: Walk the entryblock chain at the error: label and PyMem_Free(b->b_exceptstack)
  for each block.
  - Context: Ancient code predating 2024 history; flowgraph.c has had 13 fix-type
  commits since 2024-01-01.
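
The cleanup F4 proposes (walk the block chain at the error label and free each per-block allocation) might look like the sketch below. It is a simplified model, not flowgraph.c: the `block` type, `alloc_stack`, `free_stack`, and the failure-injection counter `fail_after` are all hypothetical and exist only to make the error path testable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified block type; CPython's basicblock has many more fields. */
typedef struct block {
    struct block *next;
    int *exceptstack;            /* per-block heap allocation, owned by the block */
} block;

static int live_allocs = 0;      /* bookkeeping for the demo only */
static int fail_after = -1;      /* succeed this many times, then fail; -1 = never fail */

static int *
alloc_stack(void)
{
    if (fail_after == 0) {
        return NULL;             /* simulated out-of-memory */
    }
    if (fail_after > 0) {
        fail_after--;
    }
    int *p = malloc(sizeof(int));
    if (p != NULL) {
        live_allocs++;
    }
    return p;
}

static void
free_stack(int *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Give every block a stack. On failure, walk the whole chain and release
   whatever was already assigned. Returns 0 on success, -1 on failure.
   Assumes every exceptstack pointer starts out NULL. */
static int
label_blocks(block *entry)
{
    for (block *b = entry; b != NULL; b = b->next) {
        b->exceptstack = alloc_stack();
        if (b->exceptstack == NULL) {
            goto error;
        }
    }
    return 0;
error:
    for (block *b = entry; b != NULL; b = b->next) {
        free_stack(b->exceptstack);
        b->exceptstack = NULL;
    }
    return -1;
}
```

Resetting each pointer to NULL after freeing keeps the error path safe to run even if a later cleanup pass visits the same blocks.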

  F5. Include/pyexpat.h — missing include guard (struct redefinition risk)
  - Agents: macro-hygiene-reviewer, include-graph-mapper, git-history-analyzer
  - Bug: The sole unguarded public-API header. Including it twice in the same TU causes
  a hard compilation error (struct PyExpat_CAPI redefined). The file has had 3
  security-driven commits in 2025 (CVE-related Expat API additions) without adding a
  guard.
  - Fix: Wrap with #ifndef Py_PYEXPAT_H / #define Py_PYEXPAT_H / #endif.
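
To illustrate what the guard in F5 buys, the sketch below pastes the same guarded "header" body twice into one translation unit, which is what a double `#include` amounts to. The guard name comes from the finding; `struct demo_capi` is a hypothetical stand-in, not the real `PyExpat_CAPI` layout.

```c
#include <assert.h>

/* First "inclusion": the guard macro is undefined, so the body is compiled. */
#ifndef Py_PYEXPAT_H
#define Py_PYEXPAT_H
struct demo_capi {
    int size;
};
#endif /* !Py_PYEXPAT_H */

/* Second "inclusion": the guard macro is now defined, so the body is skipped.
   Without the #ifndef this would be a struct redefinition error. */
#ifndef Py_PYEXPAT_H
#define Py_PYEXPAT_H
struct demo_capi {
    int size;
};
#endif /* !Py_PYEXPAT_H */

/* Small helper so the struct is actually used. */
static int
demo_capi_size(const struct demo_capi *capi)
{
    return capi->size;
}
```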

  F6. Modules/_remote_debugging/ — ~48 PEP 7 violations (FIX-grade style)
  - Agent: pep7-style-checker
  - Issues: ~34 function opening-brace placement errors (brace on same line as closing
  paren instead of own line); ~11 PyObject* pointer-asterisk spacing; 3 if(
  keyword-space violations.
  - Files: asyncio.c, frames.c, threads.c, interpreters.c, object_reading.c, module.c.
  - Fix: Mechanical — brace position and spacing corrections. All in the same
  recently-added module.

  ---
  Should Consider (CONSIDER)

  #: C1
  Location: Modules/_remote_debugging/asyncio.c:639-642
  Issue: Borrowed ref used after DECREF (stale pointer risk)
  Note: result_item decreffed then dereferenced; safe today due to list ownership but
    fragile
  ────────────────────────────────────────
  #: C2
  Location: Python/ceval.c:587
  Issue: PyTuple_New(0) not NULL-checked
  Note: Effectively infallible (empty-tuple singleton), but violates coding standard
  ────────────────────────────────────────
  #: C3
  Location: Python/flowgraph.c:650
  Issue: Signed integer overflow in (max_label + 1) size calculation
  Note: Theoretical UB; cast max_label to size_t before addition
  ────────────────────────────────────────
  #: C4
  Location: Python/marshal.c:1172 — r_object
  Issue: 519-line, 146-cyclomatic; extract r_code_object() arm
  Note: Would reduce score 7.2→5.5
  ────────────────────────────────────────
  #: C5
  Location: Python/symtable.c:1889 — symtable_visit_stmt
  Issue: FunctionDef/AsyncFunctionDef duplication; extract symtable_visit_function()
  Note: Score 6.6→4.5
  ────────────────────────────────────────
  #: C6
  Location: Objects/longobject.c:4966 — long_pow
  Issue: Three algorithms in one function; extract binary/kary helpers
  Note: Score 5.7→4.0
  ────────────────────────────────────────
  #: C7
  Location: Modules/_remote_debugging/binary_io_reader.c:980 — binary_reader_replay
  Issue: Depth-7 nesting; extract STACK_REPEAT/STACK_FRAME arms
  Note: Highest nesting in non-generated code
  ────────────────────────────────────────
  #: C8
  Location: Python/specialize.c:1930 — binary_op_fail_kind
  Issue: NB_SUBSCR arm (50 lines) categorically different from other arms
  Note: Extract binary_op_subscr_fail_kind()
  ────────────────────────────────────────
  #: C9
  Location: Python/slots.c:114 — _PySlotIterator_Next
  Issue: Phase-mixing in new slots infrastructure
  Note: Decompose into try_pop/convert/resolve helpers
  ────────────────────────────────────────
  #: C10
  Location: Objects/setobject.c:1554
  Issue: set->ob_type = &PyFrozenSet_Type should use Py_SET_TYPE()
  Note: Unsafe under free-threading (missing atomic write)
  ────────────────────────────────────────
  #: C11
  Location: Modules/_remote_debugging/binary_io.h:183
  Issue: UNLIKELY/LIKELY bare names in header (namespace pollution)
  Note: Use REMOTE_DEBUG_UNLIKELY already defined in _remote_debugging.h
  ────────────────────────────────────────
  #: C12
  Location: Modules/_remote_debugging/_remote_debugging.h:152
  Issue: MAX bare name, double-eval; reinvents Py_MAX
  Note: Replace with Py_MAX
  ────────────────────────────────────────
  #: C13
  Location: Objects/unicodeobject.c:909
  Issue: BLOOM macro missing parens on mask argument (latent precedence trap)
  Note: Add (mask)
  ────────────────────────────────────────
  #: C14
  Location: Python/ast_preprocess.c:408-424
  Issue: CALL, CALL_OPT, CALL_SEQ macros missing do-while
  Note: Dangling-else trap if called in bare if
  ────────────────────────────────────────
  #: C15
  Location: ~30 sites in 13 files
  Issue: PyMem_NEW/PyMem_FREE/PyMem_MALLOC uppercase deprecated macros
  Note: Mechanical replacement with lowercase equivalents
  ────────────────────────────────────────
  #: C16
  Location: Modules/_ctypes/_ctypes.c:2474
  Issue: PyUnicode_InternInPlace — needs mortal/immortal split review
  Note: Decide mortal vs _PyUnicode_InternImmortal
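
The three macro pitfalls named in C12, C13, and C14 are easy to reproduce in isolation. The macros below are hypothetical demos (`DEMO_*`), not the definitions in the CPython tree; they only mirror the shape of the problems described above.

```c
#include <assert.h>

/* C12: a function-like MAX evaluates the winning argument twice. */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

static int calls = 0;

static int
next_value(void)
{
    return ++calls;              /* side effect makes double evaluation visible */
}

/* C13: without parentheses, the argument binds to the surrounding operators. */
#define DEMO_TEST_BAD(mask, bit)  (mask & (1u << (bit)))
#define DEMO_TEST_GOOD(mask, bit) ((mask) & (1u << (bit)))

/* C14: wrapping a statement macro in do { } while (0) lets it be used
   as a single statement, including in a bare if/else. */
static int failures = 0;
#define DEMO_CHECK(ok) do { if (!(ok)) { failures++; } } while (0)

static int
classify(int x)
{
    int result = 0;
    if (x > 0)
        DEMO_CHECK(x < 10);      /* the else below still pairs with the outer if */
    else
        result = -1;
    return result;
}
```

With `DEMO_TEST_BAD(4 | 0, 0)` the expansion is `4 | (0 & 1u)`, so bit 0 looks set even though it is not; the parenthesized form tests the whole mask. If `DEMO_CHECK` were a bare `if` statement, the `else` in `classify` would attach to the macro's `if` rather than to `if (x > 0)`.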

  ---
  Tensions

  - Complexity vs. necessity in _Py_dg_strtod (Python/dtoa.c:1383):
  c-complexity-analyzer flags score 7.5 (highest in non-auto-generated code). This is
  David Gay's strtod — the complexity is intrinsic to high-precision floating-point
  parsing. Do not refactor.
  - templateiter_next GIL safety: gil-discipline-checker flagged per-iterator state
  mutation without a lock as potentially racy. However, template iterators are always
  freshly minted per for loop — sharing one across threads is misuse. Acceptable as-is.
  - PyUnicode_READY / Py_UNICODE: api-deprecation-tracker found zero production callers
  — these have been fully cleaned from the internals. This is a strength, not a
  finding.

  ---
  Policy Decisions (POLICY)

  #: P1
  Issue: Modules/_remote_debugging/code_objects.c:41 — cache_tlbc_array returns 1/0
    instead of 0/-1
  Decision Needed: Normalize to CPython convention (-1 fail, 0 success) or document as
    an intentional API contract
  ────────────────────────────────────────
  #: P2
  Issue: 3 modules still on single-phase init: _tkinter.c, readline.c, _tracemalloc.c
  Decision Needed: Prioritize migration to Py_mod_exec slots; readline.c and
    _tracemalloc.c are Easy-Moderate, _tkinter.c is Complex
  ────────────────────────────────────────
  #: P3
  Issue: ~150 PyArg_ParseTuple/PyArg_ParseTupleAndKeywords sites in ~30 files
  Decision Needed: Continue Argument Clinic migration; highest-value targets:
    _lzmamodule.c, _ctypes/callproc.c, _io/textio.c
  ────────────────────────────────────────
  #: P4
  Issue: Modules/_remote_debugging/ systematic review process
  Decision Needed: The high fix-to-feature ratio and the hardening commit that missed
    both the critical_section gap and the asyncio UAF suggest the module needs a
    structured thread-safety checklist before the next release

  ---
  Strengths

  1. World-class refcount hygiene: 634 scanner candidates, 0 confirmed bugs. Universal
  use of centralized goto fail/done + Py_XDECREF-everything pattern.
  2. Compile-time API tier enforcement: Three-tier Py_BUILD_CORE gating makes silent
  tier violations impossible — the strongest possible boundary guarantee.
  3. Near-universal include guards: 294/295 headers correctly guarded; the sole
  exception (pyexpat.h) is addressable in one commit.
  4. New language-feature objects are clean: interpolationobject.c and
  templateobject.c (PEP 750), along with lazyimportobject.c and sentinelobject.c, all
  pass refcount, GIL, and NULL-safety audits, with the single exception of the
  conversion validator at interpolationobject.c:22 (F1).
  5. _remote_debugging authors know the patterns: RemoteUnwinder/GCMonitor methods
  correctly apply @critical_section; BinaryWriter/BinaryReader are an oversight rather
  than unawareness.
  6. Active maintenance: The last 50 commits include multiple null-check,
  critical-section, and error-path bug fixes — the community is actively finding and
  fixing the same class of bugs identified here.

  ---
  Recommended Action Plan

  Immediate (before next release)

  1. F1 — Fix _conversion_converter NULL check in Objects/interpolationobject.c:22.
  One-line fix, traceable to a specific Python-reachable bug.
  2. F2 — Add @critical_section to all BinaryWriter/BinaryReader clinic methods. The
  module is Py_MOD_GIL_NOT_USED; this is a free-threaded crash risk.
  3. F3 — Fix codegen_pattern_or:6379 — use PyErr_ExceptionMatches(PyExc_ValueError)
  before the PyErr_Clear().
  4. F4 — Fix label_exception_targets OOM leak — walk entryblock chain at error: label.
  5. F5 — Add include guard to Include/pyexpat.h.

  Short-term (next few sprints)

  6. C10 — Replace set->ob_type = ... with Py_SET_TYPE() in setobject.c:1554 (one line;
  important for free-threading).
  7. C1 — Fix asyncio.c:639-642 borrowed ref — acquire Py_NewRef before the Py_DECREF.
  8. F6 — PEP 7 fixes in _remote_debugging/: batch brace-placement and pointer-spacing
  corrections.
  9. P1 — Normalize cache_tlbc_array return convention.
  10. C11–C14 — Macro hygiene fixes in _remote_debugging/ headers and ast_preprocess.c.
  11. C15 — Mechanical uppercase-PyMem replacement across 13 files.

  Ongoing

  12. P2 — Track readline.c/_tracemalloc.c single-phase-init migration as dedicated
  issues.
  13. P3 — Continue Argument Clinic migration; target _lzmamodule.c and
  _ctypes/callproc.c first.
  14. P4 — Establish a thread-safety checklist for _remote_debugging/ contributions,
  given its 28:20 fix-to-feature ratio.
  15. C4–C9 — Complexity refactoring of r_object, symtable_visit_stmt, long_pow,
  binary_reader_replay — plan as separate cleanup PRs when touching those files.

In my experience with AI tools, it is not helpful to just pass on the reports of “analysis” agents. Before a finding is worth someone's time, it must at least go through an initial adversarial review, come with a minimal reproducible example, and then pass another round of adversarial review confirming that what was produced is actually a bug.

At that point, it is most helpful to review the output yourself and try to understand it. Then you can raise the resulting bug report as a regular issue.

Moving this to the general Help category as it’s not an idea for the Python project, but a dump of LLM output.

You may find it interesting to follow what the Linux kernel people are doing with AI to automate patch review.

For example, see this report: https://lwn.net/Articles/1073583/ (should be free to read from 2026/06/02).

Using Claude Code to look for issues within CPython?

A lot of people have done (and do do) that. Those findings also have a habit of ending up in GHSAs or at security@python.org. Before reporting issues that an LLM identifies as vulnerabilities, please confirm that they actually meet the criteria in our security policy: https://devguide.python.org/security/policy/