Sampling mode for tracemalloc

Tracemalloc tracks every allocation. This is useful for debugging, but it imposes a high cost in both CPU (to collect the stack trace for every allocation) and memory (to store tracking metadata for every live object). In many cases this overhead is unnecessary, and a statistical sample would be sufficient to explain both high memory consumption and memory leaks.

I propose adding a Poisson sampling mode to tracemalloc. In the common case, an allocation would not be sampled, so the CPU cost of tracemalloc would be just an increment and a comparison, and the additional memory cost would be zero. When an allocation is sampled, the cost would be the same as it is today. The tracemalloc metadata would need an additional “weight” field to track the attributed weight of a sampled allocation. In pseudocode:

static void*
tracemalloc_alloc(int need_gil, int use_calloc,
                  void *ctx, size_t nelem, size_t elsize)
{
	...
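	// Count the bytes requested since the last sampled allocation.
	bytes_since_last_sample += nelem * elsize;
	if (bytes_since_last_sample > threshold) {
		// Sampled: record the traceback and the weight for this
		// allocation, at the same cost tracemalloc pays today.
		bytes_since_last_sample = 0;
		// Draw the distance in bytes to the next sample.
		threshold = new_poisson_threshold();
	}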
	...
}

This is not a new idea: Go uses this approach to enable high-performance memory profiling in production (see guide/README.md in the DataDog/go-profiler-notes repository on GitHub). Is this something the Python community would be interested in?

We have implemented (but not yet merged) this feature in PyPy’s profiler. I think it would make lots of sense to have it in tracemalloc too.

There’s also prior work in OCaml.

Awesome. I will work on adding this to tracemalloc then.

One question: I’ve been thinking about what API we would want for, e.g., setting the sampling rate and reporting nominal vs. upscaled object sizes and counts. Ideally, the CPython and PyPy APIs would be similar if they can be.

Thanks!

I created an issue to track work on this: “Sampling mode for tracemalloc” (Issue #150494 in python/cpython on GitHub).