Baseline System for Sanitizer Warnings to Prevent Regressions

TL;DR

Add a baseline tracking system for sanitizer warnings (ASan, TSan, MSan, UBSan) that blocks PRs from introducing new issues while allowing incremental fixes, similar to how ./python -m test -R works for refleaks.

Motivation

CPython already supports building with sanitizers (--with-address-sanitizer, etc.), and they’re incredibly valuable for catching memory errors, undefined behavior, and race conditions. However, there’s currently no systematic way to:

  • Prevent regressions: A PR might introduce a new leak or UB without anyone noticing

  • Track progress: We can’t easily see if we’re improving or regressing over time

  • Incentivize fixes: There’s no clear “win” when someone fixes an existing sanitizer warning

This means sanitizer findings can silently accumulate, and we lose the benefit of having these tools in the first place.

Real-World Context

Similar systems exist in other large projects:

  • Chromium: Uses LSan suppression files + enforcement in CI

  • Android: Has hwasan baseline tracking

  • Linux kernel: Uses various sanitizer suppressions

CPython already has the -R flag for refleak testing, which works great! This proposal extends that philosophy to sanitizer warnings.

Proposed Solution

Core Idea

  1. Capture baseline: Run sanitizers on main branch, record all existing warnings/leaks

  2. Store fingerprints: Save normalized stack traces and issue signatures (JSON/YAML format)

  3. CI enforcement: On every PR, compare sanitizer output against the baseline (a minimal comparison sketch follows this list)

    • ✅ Pass: Same or fewer issues

    • ❌ Fail: New issues detected

    • 🎉 Bonus: Automatic detection when someone fixes existing issues

  4. Easy updates: When leaks are fixed, baseline updates in the same PR
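
For concreteness, here is a minimal sketch of the comparison in step 3, assuming every sanitizer issue has already been reduced to a stable fingerprint string. The function name compare_to_baseline and the printed output format are hypothetical, not existing CPython tooling:

def compare_to_baseline(baseline_fingerprints: set[str],
                        current_fingerprints: set[str]) -> int:
    """Return an exit code: 0 if no new issues were found, 1 otherwise."""
    new_issues = current_fingerprints - baseline_fingerprints
    fixed_issues = baseline_fingerprints - current_fingerprints

    for fp in sorted(fixed_issues):
        print(f"fixed: {fp}")
    for fp in sorted(new_issues):
        print(f"NEW:   {fp}")

    # Any new issue blocks the merge, even if the net count went down.
    return 1 if new_issues else 0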

Example Workflow

# PR introduces code changes
$ git push origin my-feature

# CI runs sanitizer builds:
# ❌ ASan detected 1 new leak in Modules/parser.c:456
#    Stack trace: [normalized fingerprint]
# ✅ Fixed 2 existing leaks in Objects/dictobject.c
# 
# Net: -1 leak, but 1 new issue blocks merge

# Developer fixes the new leak
$ git commit -m "Fix memory leak in parser"
$ git push

# CI runs again and now passes:
# ✅ All baselines pass
# 🎉 Net improvement: -3 leaks total
# Please run: ./python Tools/scripts/sanitizer_baseline.py generate

Configuration Matrix

Start with high-priority combinations, expand over time:

Phase 1 (Essential):

  • Clang + default build + ASan

  • Clang + default build + UBSan

  • GCC + default build + ASan

  • Clang + --disable-gil + TSan

Phase 2 (Expand):

  • Add --enable-experimental-jit configurations

  • Add --enable-optimizations, --with-lto

  • Add MSan (though it’s tricky with dependencies)

This avoids the explosion to 48+ configs while covering the most important cases.
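
As a sketch of how the tooling could stay on an explicit allow-list rather than a full compiler × build × sanitizer cross product (the PHASE_1_CONFIGS structure and config_name helper below are only assumptions):

PHASE_1_CONFIGS = [
    {"cc": "clang", "opts": [], "sanitizer": "asan"},
    {"cc": "clang", "opts": [], "sanitizer": "ubsan"},
    {"cc": "gcc", "opts": [], "sanitizer": "asan"},
    {"cc": "clang", "opts": ["--disable-gil"], "sanitizer": "tsan"},
]

def config_name(cfg: dict) -> str:
    """Build a name like "clang-asan-default" for the baseline's "config" key."""
    build = "-".join(opt.lstrip("-") for opt in cfg["opts"]) or "default"
    return f"{cfg['cc']}-{cfg['sanitizer']}-{build}"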

Implementation Sketch

1. Baseline File Format

{
  "config": "clang-asan-default",
  "cpython_version": "3.14.0a1",
  "date": "2025-01-15",
  "issues": [
    {
      "type": "leak",
      "fingerprint": "sha256:abc123...",
      "location": "Objects/dictobject.c:123",
      "stack_trace_hash": "sha256:def456...",
      "first_seen": "2024-11-20"
    }
  ]
}
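
A fingerprint like the ones above could be derived roughly like this: strip the parts of a sanitizer stack trace that vary between runs (raw addresses), keep the top frames, and hash the rest. This is purely illustrative; the regular expression and the number of frames kept are assumptions, and real ASan/TSan output would need more careful parsing:

import hashlib
import re

FRAME_RE = re.compile(r"\s*#\d+ 0x[0-9a-f]+ in (.+)")

def fingerprint(report: str, max_frames: int = 8) -> str:
    """Hash the top stack frames of one sanitizer report, ignoring addresses."""
    frames = []
    for line in report.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # Keep the function name and source location, drop the raw address.
            frames.append(m.group(1).strip())
        if len(frames) >= max_frames:
            break
    digest = hashlib.sha256("\n".join(frames).encode()).hexdigest()
    return f"sha256:{digest}"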

2. Tooling

# Generate/update baseline (maintainers only)
$ ./python Tools/scripts/sanitizer_baseline.py generate

# Check against baseline (CI + local)
$ ./python Tools/scripts/sanitizer_baseline.py check

# Show diff
$ ./python Tools/scripts/sanitizer_baseline.py diff
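
A thin argparse wrapper is probably all the script itself needs. This skeleton is entirely a sketch (nothing here exists in the repository); it just wires the three subcommands to placeholder handlers that would reuse the fingerprinting and comparison helpers sketched above:

import argparse
import sys

def main() -> int:
    parser = argparse.ArgumentParser(prog="sanitizer_baseline.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("generate", help="record the current issues as the new baseline")
    sub.add_parser("check", help="exit nonzero if issues not in the baseline appear")
    sub.add_parser("diff", help="print new and fixed issues without failing")
    args = parser.parse_args()

    # Hypothetical handlers; "check" would return compare_to_baseline(...)
    # so its exit code can gate CI directly.
    handlers = {"generate": lambda: 0, "check": lambda: 0, "diff": lambda: 0}
    return handlers[args.command]()

if __name__ == "__main__":
    sys.exit(main())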

3. CI Integration

Add to GitHub Actions:

  • Build with each sanitizer config

  • Run test suite

  • Compare output to baseline

  • Report as status check

  • Comment on PR with a summary (a sketch of this step follows below)
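
For the summary step, one low-overhead option is the job summary file that GitHub Actions exposes through the GITHUB_STEP_SUMMARY environment variable (a standard Actions feature). The function name and the counts passed in are placeholders:

import os

def write_step_summary(new_issues: int, fixed_issues: int) -> None:
    """Append a short Markdown summary to the GitHub Actions job summary."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return  # running locally, nothing to write
    with open(path, "a", encoding="utf-8") as f:
        f.write("### Sanitizer baseline check\n")
        f.write(f"* New issues: {new_issues}\n")
        f.write(f"* Fixed issues: {fixed_issues}\n")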

Questions for the Community

I’d love feedback on:

  1. Baseline storage: Keep in repo (.github/sanitizer-baselines/) or external? Files might be large.

  2. Update workflow: Should baseline updates be:

    • Same PR as the fix (convenient but clutters history)?

    • Separate automated PR (cleaner but more overhead)?

    • Manual by core devs only?

  3. CI resources: How much extra CI time is acceptable? Each sanitizer build adds ~30-60 min.

    • Should we run on every PR or only on merge to main?

    • Could we use GitHub’s larger runners for this?

  4. False positives: How to handle flaky warnings?

    • Should we have a separate suppression file?

    • Grace period for new warnings?

  5. External dependencies: OpenSSL, libffi, etc. may have their own issues. Should we:

    • Use suppression files for external code?

    • Include them in baseline?

    • Try to isolate CPython-specific issues only?

  6. Existing pain points: For those who already use sanitizers with CPython:

    • What issues do you encounter?

    • Would this system help your workflow?

    • What am I missing?

Next Steps

If there’s interest, I’m happy to:

  1. Create a proof-of-concept for 2-3 configurations

  2. Open a GitHub issue for detailed technical discussion

  3. Prototype the tooling

  4. Measure actual CI impact

I think this could significantly improve CPython’s code quality and make sanitizers actually useful in practice rather than just “nice to have” tools that nobody checks regularly.

Thoughts? Is this solving a real problem you’ve experienced? Any major concerns?


Note: I’m happy to contribute the implementation if there’s consensus this is worth doing.

Apart from the fact that this seems to be written by AI (please correct me if I’m wrong, but it is quite suspicious, e.g. all the emojis and the questions at the end), how do you justify the extra half hour to an hour of CI time per commit (tbh I wonder where you’ve got these numbers from)?

Our CPython CI already runs sanitizers, and a PR cannot be merged if any of them fail. I don’t know exactly which ones run or what constitutes a failure, but any proposal should start from those.

Right, there are already ASAN, UBSAN and TSAN GitHub Actions, as well as ASAN and UBSAN buildbots.

As far as I can see, the AMD64 Arch Linux Asan 3.x and AMD64 Arch Linux Asan Debug 3.x buildbots both use gcc as the compiler. With clang, however, there is a list of problems; see Memory leak in test__interpchannels: _PyXIData_New not freed in channel_send · Issue #140306 · python/cpython · GitHub for example.

There are 16 failing tests on main with an ASan debug clang build on my machine (Debian 12 x86_64). I used this configure command:

CC=clang CXX=clang++ ./configure --with-address-sanitizer --with-pydebug