Add a _valgrind module to control Callgrind instrumentation

Hi all,

TL;DR: Would CPython like a _valgrind extension module to control Callgrind instrumentation?


I work at Meta on the Cinder team. Before that, I worked on Skybison. We do a lot of profiling and sometimes use Callgrind as a way to measure performance.

$ valgrind --tool=callgrind ./python
$ kcachegrind

Sometimes a benchmark has some warmup before it reaches a steady state, or has multiple iterations that we would like to separate out (like single web server requests), and we would like to instrument only specific parts of the benchmark. With Callgrind’s default mode, this is not possible: it starts profiling at process start and ends at process end. It’s possible to pass --instr-atstart=no, but then you need some way to turn instrumentation on from inside the process later.

Module surface area

Skybison and Cinder both have a _valgrind module that allows fine-grained control over Callgrind instrumentation. It contains four helper functions:

  • callgrind_start_instrumentation, to start profiling
  • callgrind_stop_instrumentation, to stop profiling
  • callgrind_dump_stats, to dump all the profile data gathered between a
  • callgrind_zero_stats, to zero out the counters

We can add calls to these functions inside our benchmarks (like in our small Django benchmark) to get better visibility. With this, we get a separate profile (cg.0, cg.1, etc.) for each request.
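
To make the pattern above concrete, here is a minimal sketch of a benchmark that instruments only its steady-state iterations. It assumes the module is importable as `_valgrind` and that the four functions take no required arguments; neither is confirmed here. The no-op fallback and the names `profile_iteration`, `handle_request`, and `run_benchmark` are hypothetical and exist only for this example.

```python
from contextlib import contextmanager

try:
    import _valgrind  # assumed module name, as described for Cinder/Skybison
except ImportError:
    _valgrind = None  # module not available: run the benchmark unprofiled


def _call(name):
    """Call _valgrind.<name>() if the module is available, else do nothing."""
    if _valgrind is not None:
        getattr(_valgrind, name)()  # assumption: no required arguments


@contextmanager
def profile_iteration():
    """Instrument only the body of the with-block and dump once per use."""
    _call("callgrind_zero_stats")             # start from clean counters
    _call("callgrind_start_instrumentation")
    try:
        yield
    finally:
        _call("callgrind_stop_instrumentation")
        _call("callgrind_dump_stats")         # one profile file per iteration


def handle_request(n):
    # Hypothetical stand-in for one benchmark iteration (e.g. one request).
    return sum(i * i for i in range(n))


def run_benchmark(iterations=3, warmup=2):
    for _ in range(warmup):
        handle_request(1000)                  # warmup runs are not instrumented
    results = []
    for _ in range(iterations):
        with profile_iteration():
            results.append(handle_request(1000))
    return results
```

A script like this would be run under something like `valgrind --tool=callgrind --instr-atstart=no ./python bench.py`, so that nothing is recorded until the first start call. Outside Valgrind, or without the module installed, it simply runs the benchmark with no profiling.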

I am wondering if CPython would like this module to be upstreamed.

Implementation details

We need to either vendor Valgrind headers or make this an optional module that uses system Valgrind headers (if available). Cinder and Skybison have chosen to vendor Valgrind headers.

Here are the Valgrind headers and _valgrind extension module in Cinder.



Here’s what the use in the Django benchmark looks like.

Does it need to be used in situations where it couldn’t be installed specifically?

Alternatively, are there competing implementations that don’t work well together (or are unnecessary duplication) where we ought to pick a winner?

Finally, does it rely on access to CPython’s internals in a way that can’t be covered by public APIs?

I think these are the main reasons we’d adopt a module into the stdlib these days, and I suspect this doesn’t meet the bar. Though if it’s being used by multiple alternative implementations who directly use the stdlib without modification, then, perhaps…

Maybe it makes more sense to be included with Callgrind? Should only take a PYTHONPATH setting to make it importable, which also doubles as an “am I running under instrumentation” test, and there shouldn’t really be anything wrong with Callgrind updating PYTHONPATH itself to make it transparent.
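
A rough sketch of the "importable means instrumented" idea, assuming the hypothetical arrangement where Callgrind itself puts the module on the import path. The module name `_valgrind` and the helper name `callgrind_control_available` are assumptions for illustration only.

```python
import importlib.util


def callgrind_control_available(module_name="_valgrind"):
    """Return True if the control module can be found on the import path.

    Under the arrangement suggested above, the module would only be
    importable when Callgrind has added it to PYTHONPATH, so this check
    would double as an "am I running under instrumentation" test.
    """
    return importlib.util.find_spec(module_name) is not None
```

Note that this only holds if nothing else makes the module importable; a copy installed from PyPI would be found whether or not the process is running under Callgrind.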


Probably not. I published a PyPI package in the meantime.

No, to all of the above. I could imagine some cases where access to CPython internals might be helpful (like if we want to separate managed and native code, for example), but as described above, no.

Huh. Hadn’t thought about that. I think it would probably need to meet some high usage bar for a Python-specific thing to be included in Valgrind, but that’s just speculation on my part.

Thanks for the comments.



For me, it makes perfect sense to maintain such a module as a third-party project. If tomorrow it requires a feature hidden in Python internals, we can add a private function, or even a public one, if needed.


Thanks for the feedback, @steve.dower and @vstinner!