Tracemalloc tracks every allocation. That is useful for debugging, but it imposes a high cost both in CPU (to collect the stack trace for every allocation) and in memory (to store tracking metadata for every live object). In many cases this overhead is unnecessary, and a statistical sample would be enough to explain both high memory consumption and memory leaks.
I propose adding a Poisson sampling mode to tracemalloc. In the common case an allocation would not be sampled, so the CPU cost of tracemalloc would be just an addition and a comparison, and the additional memory cost would be zero. When an allocation is sampled, the cost would be the same as it is today. The tracemalloc metadata would need an additional “weight” field to record the weight attributed to each sampled allocation. In pseudocode:
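static void*
tracemalloc_alloc(int need_gil, int use_calloc,
                  void *ctx, size_t nelem, size_t elsize)
{
    ...
    // Fast path: one addition and one comparison per allocation.
    bytes_since_last_sample += nelem * elsize;
    if (bytes_since_last_sample > threshold) {
        // do sampling logic: capture the traceback and record the
        // allocation together with its attributed weight
        bytes_since_last_sample = 0;
        // Draw the distance to the next sample from an exponential
        // distribution, so that sample points form a Poisson process.
        threshold = new_poisson_threshold();
    }
    ...
}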
This is not a new idea: Go uses this approach to enable high-performance memory profiling in production (see go-profiler-notes/guide/README.md in the DataDog/go-profiler-notes repository on GitHub). Is this something the Python community would be interested in?