## TL;DR

Add a baseline tracking system for sanitizer warnings (ASan, TSan, MSan, UBSan) that blocks PRs from introducing new issues while allowing incremental fixes. Similar to how `./python -m test -R` works for refleaks, but for sanitizer output.

## Motivation

CPython already supports building with sanitizers (`--with-address-sanitizer`, etc.), and they're incredibly valuable for catching memory errors, undefined behavior, and race conditions. However, there's currently no systematic way to:

- Prevent regressions: A PR might introduce a new leak or UB without anyone noticing
- Track progress: We can't easily see if we're improving or regressing over time
- Incentivize fixes: There's no clear "win" when someone fixes an existing sanitizer warning

This means sanitizer findings can silently accumulate, and we lose the benefit of having these tools in the first place.

### Real-World Context

Similar systems exist in other large projects:

- Chromium: Uses LSan suppression files + enforcement in CI
- Android: Has HWASan baseline tracking
- Linux kernel: Uses various sanitizer suppressions

CPython already has the `-R` flag for refleak testing, which works great! This proposal extends that philosophy to sanitizer warnings.

## Proposed Solution

### Core Idea

- Capture baseline: Run sanitizers on the main branch, record all existing warnings/leaks
- Store fingerprints: Save normalized stack traces and issue signatures (JSON/YAML format); see the fingerprinting sketch after this list
- CI enforcement: On every PR, compare sanitizer output against the baseline
  - Pass: Same or fewer issues
  - Fail: New issues detected
  - Bonus: Automatic detection when someone fixes issues
- Easy updates: When leaks are fixed, the baseline updates in the same PR

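To make "normalized stack traces" concrete, here is a minimal sketch of one way a sanitizer report could be turned into a stable fingerprint. The function names and the frame-depth cutoff are assumptions for illustration, not a finished design:

```python
import hashlib
import re

def normalize_frame(frame: str) -> str:
    """Replace run-specific hex addresses/offsets with a placeholder and collapse whitespace."""
    frame = re.sub(r"0x[0-9a-fA-F]+", "<addr>", frame)
    return " ".join(frame.split())

def fingerprint(report: str, depth: int = 5) -> str:
    """Hash the top few normalized frames of one sanitizer report.

    Using only the top frames (depth is a tunable guess) keeps the fingerprint
    stable when unrelated callers further down the stack change.
    """
    frames = [
        normalize_frame(line)
        for line in report.splitlines()
        if line.lstrip().startswith("#")  # ASan/LSan frames look like "#0 0x... in func file:line"
    ]
    digest = hashlib.sha256("\n".join(frames[:depth]).encode()).hexdigest()
    return f"sha256:{digest}"
```

Hashing normalized frames rather than raw report text is what would let a fingerprint survive rebuilds, ASLR, and unrelated line-number churn.
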
### Example Workflow

```
# PR introduces code changes
$ git push origin my-feature

# CI runs sanitizer builds:
#   ❌ ASan detected 1 new leak in Modules/parser.c:456
#      Stack trace: [normalized fingerprint]
#   ✅ Fixed 2 existing leaks in Objects/dictobject.c
#
#   Net: -1 leak, but 1 new issue blocks merge

# Developer fixes the new leak
$ git commit -m "Fix memory leak in parser"
$ git push

# CI runs again and now passes:
#   ✅ All baselines pass
#   🎉 Net improvement: -2 leaks total
#   Please run: ./Tools/scripts/update_sanitizer_baseline.py
```

### Configuration Matrix

Start with high-priority combinations, expand over time:

Phase 1 (Essential):

- Clang + default build + ASan
- Clang + default build + UBSan
- GCC + default build + ASan
- Clang + `--disable-gil` + TSan

Phase 2 (Expand):

- Add `--enable-experimental-jit` configurations
- Add `--enable-optimizations`, `--with-lto`
- Add MSan (though it's tricky with dependencies)

This avoids the explosion to 48+ configs while covering the most important cases.

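For concreteness, the Phase 1 matrix could be expressed as plain data that both the tooling and CI consume. The configure flags below are the existing CPython ones; the field names and config names are just assumptions:

```python
# Hypothetical encoding of the Phase 1 matrix; structure and names are illustrative only.
PHASE_1_CONFIGS = [
    {
        "name": "clang-asan-default",
        "cc": "clang",
        "configure": ["--with-address-sanitizer"],
        "env": {"ASAN_OPTIONS": "detect_leaks=1"},
    },
    {
        "name": "clang-ubsan-default",
        "cc": "clang",
        "configure": ["--with-undefined-behavior-sanitizer"],
        "env": {},
    },
    {
        "name": "gcc-asan-default",
        "cc": "gcc",
        "configure": ["--with-address-sanitizer"],
        "env": {"ASAN_OPTIONS": "detect_leaks=1"},
    },
    {
        "name": "clang-tsan-freethreading",
        "cc": "clang",
        "configure": ["--with-thread-sanitizer", "--disable-gil"],
        "env": {},
    },
]
```

Keeping the matrix as data should make it cheap to add the Phase 2 entries later without touching the CI logic itself.
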
## Implementation Sketch

### 1. Baseline File Format

```json
{
  "config": "clang-asan-default",
  "cpython_version": "3.14.0a1",
  "date": "2025-01-15",
  "issues": [
    {
      "type": "leak",
      "fingerprint": "sha256:abc123...",
      "location": "Objects/dictobject.c:123",
      "stack_trace_hash": "sha256:def456...",
      "first_seen": "2024-11-20"
    }
  ]
}
```

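A minimal reader for this format might look like the following; the `Issue` dataclass mirrors the JSON fields above, and the function name is only a placeholder:

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Issue:
    type: str
    fingerprint: str
    location: str
    stack_trace_hash: str
    first_seen: str

def load_baseline(path: Path) -> dict[str, Issue]:
    """Load a baseline file and index its issues by fingerprint."""
    data = json.loads(path.read_text())
    return {issue.fingerprint: issue for issue in (Issue(**raw) for raw in data["issues"])}
```

Indexing issues by fingerprint turns the baseline comparison later on into a plain set operation.
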
### 2. Tooling

```
# Generate/update baseline (maintainers only)
$ ./python Tools/scripts/sanitizer_baseline.py generate

# Check against baseline (CI + local)
$ ./python Tools/scripts/sanitizer_baseline.py check

# Show diff
$ ./python Tools/scripts/sanitizer_baseline.py diff
```

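The heart of the `check` subcommand could be a plain set comparison between the current run and the stored baseline, reusing the `Issue`/`load_baseline()` shapes sketched above. A rough sketch, not a finished CLI:

```python
def check(current, baseline) -> int:
    """Compare {fingerprint: issue} mappings and return a CI-friendly exit code.

    Both mappings use the load_baseline()/Issue shapes from the sketch above.
    """
    new = current.keys() - baseline.keys()     # issues absent from the baseline: block the PR
    fixed = baseline.keys() - current.keys()   # baseline issues that disappeared: report the win

    for fp in sorted(new):
        issue = current[fp]
        print(f"NEW   {issue.type} at {issue.location} ({fp})")
    for fp in sorted(fixed):
        issue = baseline[fp]
        print(f"FIXED {issue.type} at {issue.location} ({fp})")

    if fixed and not new:
        print("Baseline shrank; consider regenerating it in this PR.")
    return 1 if new else 0
```

In CI the return value maps directly onto the job's exit status, so "new issue detected" is exactly "status check fails".
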
### 3. CI Integration

Add to GitHub Actions:

- Build with each sanitizer config
- Run test suite (see the capture sketch after this list)
- Compare output to baseline
- Report as status check
- Comment on PR with summary

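As one hedged example of the run-and-capture step, here is how a CI job could run the test suite under ASan and collect the reports. The function name and log directory are placeholders; `log_path` is the standard sanitizer option for redirecting reports to files:

```python
import os
import subprocess
from pathlib import Path

def run_tests_under_asan(log_dir: str = "sanitizer-logs") -> list[str]:
    """Run the test suite with ASan reports redirected to per-process log files."""
    logs = Path(log_dir)
    logs.mkdir(exist_ok=True)
    env = dict(os.environ, ASAN_OPTIONS=f"detect_leaks=1:log_path={logs}/asan")
    # Don't fail the job on the test run itself; the baseline comparison decides pass/fail.
    subprocess.run(["./python", "-m", "test", "-j0"], env=env, check=False)
    # log_path makes the sanitizer runtime write one "asan.<pid>" file per reporting process.
    return [p.read_text(errors="replace") for p in sorted(logs.glob("asan.*"))]
```

The collected reports would then go through the fingerprinting and `check()` sketches above, and the resulting exit code becomes the status check.
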
## Questions for the Community

I'd love feedback on:

- Baseline storage: Keep in repo (`.github/sanitizer-baselines/`) or external? Files might be large.
- Update workflow: Should baseline updates be:
  - Same PR as the fix (convenient but clutters history)?
  - Separate automated PR (cleaner but more overhead)?
  - Manual by core devs only?
- CI resources: How much extra CI time is acceptable? Each sanitizer build adds ~30-60 min.
  - Should we run on every PR or only on merge to main?
  - Could we use GitHub's larger runners for this?
- False positives: How to handle flaky warnings?
  - Should we have a separate suppression file?
  - Grace period for new warnings?
- External dependencies: OpenSSL, libffi, etc. may have their own issues. Should we:
  - Use suppression files for external code?
  - Include them in baseline?
  - Try to isolate CPython-specific issues only?
- Existing pain points: For those who already use sanitizers with CPython:
  - What issues do you encounter?
  - Would this system help your workflow?
  - What am I missing?

## Next Steps

If there's interest, I'm happy to:

- Create a proof-of-concept for 2-3 configurations
- Open a GitHub issue for detailed technical discussion
- Prototype the tooling
- Measure actual CI impact

I think this could significantly improve CPython's code quality and make sanitizers actually useful in practice rather than just "nice to have" tools that nobody checks regularly.

Thoughts? Is this solving a real problem you've experienced? Any major concerns?

Note: I'm happy to contribute the implementation if there's consensus this is worth doing.