TL;DR: Would CPython like a _valgrind extension module to control Callgrind instrumentation?
Motivation
I work at Meta on the Cinder team. Before that, I worked on Skybison. We do a lot of profiling and sometimes use Callgrind as a way to measure performance.
Sometimes a benchmark has a warmup phase before it reaches a steady state, or has multiple iterations that we would like to measure separately (such as individual web server requests), so we would like to instrument only specific parts of the benchmark. In Callgrind's default mode this is not possible: it starts profiling at process start and stops at process exit. It's possible to pass --instr-atstart=no, but then you need some way to turn instrumentation on from inside the program at the right point.
Module surface area
Skybison and Cinder both have a _valgrind module that allows fine-grained control over Callgrind instrumentation. It contains four helper functions:
callgrind_start_instrumentation, to start profiling
callgrind_stop_instrumentation, to stop profiling
callgrind_dump_stats, to dump all the profile data gathered between a start/stop
callgrind_zero_stats, to zero out the counters
We can add calls to these functions inside our benchmarks (like in our small Django benchmark) to get better visibility. With this, we get a separate profile for each request: cg.0, cg.1, etc.
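As a rough illustration of that pattern, here is a minimal Python sketch of a benchmark harness that skips the warmup and dumps one profile per request. It assumes `_valgrind` exposes the four functions listed above, each callable with no arguments; the real signatures in Cinder, Skybison, or the PyPI package may differ. The names `callgrind_region`, `handle_request`, and `run_benchmark` are hypothetical, and the no-op fallback is an added assumption so the script also runs outside Valgrind.

```python
import contextlib
import types

# Assumption: `_valgrind` is an extension module exposing the four
# zero-argument functions described above. If it is not importable,
# substitute no-ops so the benchmark still runs without Valgrind.
try:
    import _valgrind
except ImportError:
    def _noop():
        return None

    _valgrind = types.SimpleNamespace(
        callgrind_start_instrumentation=_noop,
        callgrind_stop_instrumentation=_noop,
        callgrind_dump_stats=_noop,
        callgrind_zero_stats=_noop,
    )


@contextlib.contextmanager
def callgrind_region():
    """Instrument only the body of the `with` block, then dump its profile."""
    _valgrind.callgrind_zero_stats()  # discard counts left from earlier code
    _valgrind.callgrind_start_instrumentation()
    try:
        yield
    finally:
        # Stop and dump even if the instrumented code raised.
        _valgrind.callgrind_stop_instrumentation()
        _valgrind.callgrind_dump_stats()  # one dump per region


def handle_request(i):
    # Hypothetical stand-in for a single web server request.
    return sum(range(10_000 + i))


def run_benchmark(warmup=5, requests=3):
    for i in range(warmup):  # warmup iterations stay uninstrumented
        handle_request(i)
    results = []
    for i in range(requests):  # each request gets its own dump
        with callgrind_region():
            results.append(handle_request(i))
    return results


if __name__ == "__main__":
    print(run_benchmark())
```

Under these assumptions, running the script as `valgrind --tool=callgrind --instr-atstart=no python bench.py` would leave the warmup uninstrumented and produce one dump per request. The zeroing call is defensive: Callgrind's dump request already resets the counters after writing them.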
I am wondering whether CPython would like this module to be upstreamed.
Implementation details
We need to either vendor Valgrind headers or make this an optional module that uses system Valgrind headers (if available). Cinder and Skybison have chosen to vendor Valgrind headers.
Does it need to be used in situations where it couldn't be installed separately?
Alternatively, are there competing implementations that don’t work well together (or are unnecessary duplication) where we ought to pick a winner?
Finally, does it rely on access to CPython’s internals in a way that can’t be covered by public APIs?
I think these are the main reasons we'd adopt a module into the stdlib these days, and I suspect this doesn't meet the bar. Though if it's being used by multiple alternative implementations that directly use the stdlib without modification, then, perhaps…
Maybe it makes more sense for it to be included with Callgrind? It should only take a PYTHONPATH setting to make it importable, which also doubles as an "am I running under instrumentation" test, and there shouldn't really be anything wrong with Callgrind updating PYTHONPATH itself to make it transparent.
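The "am I running under instrumentation" idea could look roughly like the sketch below. It assumes the proposal above: Callgrind would add a directory containing `_valgrind` to PYTHONPATH only for programs it launches, which is not an existing Valgrind feature. The function name `running_under_callgrind` is hypothetical.

```python
import importlib.util


def running_under_callgrind() -> bool:
    """Heuristic: treat an importable `_valgrind` module as a sign that
    Callgrind launched this process (assumes Callgrind sets PYTHONPATH)."""
    # find_spec returns None when no top-level module with this name exists.
    return importlib.util.find_spec("_valgrind") is not None


if __name__ == "__main__":
    print("instrumented" if running_under_callgrind() else "not instrumented")
```

The check only holds if `_valgrind` is not otherwise installed; a permanently installed copy would make it report True even outside Valgrind.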
Probably not. I published a PyPI package in the meantime.
No, to all of the above. I could imagine some cases where access to CPython internals might be helpful (for example, to separate managed and native code), but as described above, no.
Huh. Hadn’t thought about that. I think it would probably need to meet some high usage bar for a Python-specific thing to be included in Valgrind, but that’s just speculation on my part.
To me, it makes perfect sense to maintain such a module as a third-party project. If tomorrow it requires a feature hidden in Python internals, we can add a private function, or even a public function, if needed.