Clarification about tracemalloc and memory consumption in Python

MartinV · October 18, 2022, 8:11am

I don’t fully understand how Python uses memory, or what the tracemalloc module reports. Take this example:

import tracemalloc
import gc
L=[1,2,3]
tracemalloc.start()
gc.collect()

L += [4]
a = L[:]
b = [_ for _ in L]
del L

snapshot = tracemalloc.take_snapshot()   # 1st way to get memory consumption
top_stats = snapshot.statistics('traceback')
for stat in top_stats:
    print(stat)

#print(tracemalloc.get_traced_memory())  # 2nd way to get memory consumption

tracemalloc.stop()

With the first method I get:
D:\file.py:13: size=504 B, count=3, average=168 B → for ‘b’
D:\file.py:20: size=416 B, count=1, average=416 B → for ‘snapshot’ (useless)
D:\file.py:12: size=88 B, count=2, average=44 B → for ‘a’

and with the second one: (592, 784) # (current, peak)

So the current value is the sum of the memory for ‘a’ and ‘b’. But peak > current, so why doesn’t take_snapshot() show the additional memory that accounts for the peak? When the code calls functions, take_snapshot() reports how much memory was used inside them. Why not here?

Secondly, about the values themselves: why does take_snapshot() give two different sizes for ‘a’ and ‘b’ (88 and 504)? sys.getsizeof(a) and sys.getsizeof(b) both return 88, so what does size=504 for ‘b’ mean?
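
Part of the gap between the two tools can be checked directly: sys.getsizeof() counts only the object it is given, not the objects that object refers to, whereas tracemalloc attributes each block that is still allocated to the line that allocated it. The sketch below illustrates only the first half of that statement; the `deep_size` helper is hypothetical, written for this example, and handles only flat lists.

```python
import sys

def deep_size(lst):
    """Hypothetical helper: size of a flat list plus the size of its items."""
    return sys.getsizeof(lst) + sum(sys.getsizeof(item) for item in lst)

small = ["x"]
big = ["x" * 10_000]

# The list object itself costs the same in both cases...
print(sys.getsizeof(small), sys.getsizeof(big))
# ...but the memory reachable from it does not.
print(deep_size(small), deep_size(big))
```

So two lists can report the same sys.getsizeof() value while the allocations behind them differ.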

Finally, why doesn’t L += [4] take any memory? More simply, c=12, c=12.0, c=‘azerty’ or c=(1,2) … (every instantiation of an immutable object, I would say) is invisible to both tracemalloc functions. However, it isn’t invisible to sys.getsizeof()!

So I don’t understand how I can find the memory consumption (current, peak and line-by-line).

My aim is to compare different scripts to see which one uses the least memory. So I’m asking about the memory used by specific variables (to compare the memory held by the variables at the end of my script), but also about the memory used during script execution (the peak, or the line-by-line consumption).
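
For this kind of comparison, tracemalloc.get_traced_memory() returns a (current, peak) pair, while a snapshot only lists blocks that are still allocated when it is taken, so temporaries freed earlier raise the peak without appearing in the snapshot. A minimal sketch, assuming each variant to compare can be wrapped in a function; `traced_peak`, `with_temp_list` and `with_generator` are hypothetical names:

```python
import tracemalloc

def traced_peak(func):
    """Run func() under tracemalloc and return (result, current, peak) in bytes."""
    tracemalloc.start()
    try:
        result = func()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the traces, so each run starts fresh
    return result, current, peak

def with_temp_list():
    return sum([i * 2 for i in range(100_000)])  # builds a full temporary list

def with_generator():
    return sum(i * 2 for i in range(100_000))    # no temporary list

for variant in (with_temp_list, with_generator):
    result, current, peak = traced_peak(variant)
    print(f"{variant.__name__}: current={current} B, peak={peak} B")
```

Both variants return the same value and leave little memory allocated afterwards, so only the peak separates them.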

Thanks for your answers!

barry-scott · October 18, 2022, 6:33pm

If you want to know the memory cost of a script so that you can minimise the memory used by the process, I would not use tracemalloc.

I would ask the OS how much memory the process uses. This is exactly what I do to track the memory use of long-running server processes written in Python. I’m tracking process sizes in the GiB range.
We kill and replace processes that get too big.

On Linux, looking in /proc/$PID/status is a starting point. If you run lots of processes then PSS (proportional set size) considerations become important. But for a single process, status is a reasonable way to find the process size.
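
A minimal sketch of reading that file from Python, assuming a Linux system where /proc/<pid>/status contains a VmRSS line (resident set size, reported in kB); the helper names `parse_status_kib` and `rss_kib` are hypothetical:

```python
import os

def parse_status_kib(status_text, field="VmRSS"):
    """Return the numeric value of one field from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])  # line looks like "VmRSS:\t  12345 kB"
    return None

def rss_kib(pid="self"):
    """Read the resident set size of a process; assumes /proc is available."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status_kib(status_file.read())

if os.path.exists("/proc/self/status"):
    print("Resident set size:", rss_kib(), "kB")
```

Calling `rss_kib()` at a few points in a script gives a rough picture of how the whole process grows, which is a different quantity from what tracemalloc reports.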

MartinV · October 19, 2022, 8:32am

Thanks for answering so quickly!
I’m on Windows, not Linux, and I’m not familiar with the kind of solution you propose (I’m not a specialist in programming or computer science). Could you clarify your answer?
Otherwise, why should I not use tracemalloc for this? I should point out that I don’t need a very precise result: I have some enormous data, and it is the only thing that really interests me. As long as the results are consistent enough for comparison, and roughly correct, that’s enough for me. By the way, if tracemalloc isn’t useful for this, what is it for?
Thanks for the further details!

barry-scott · October 19, 2022, 6:10pm

It’s been a long time since I worked on Windows at this level.

If your goal is to find out the impact of the Python process running the code on a Windows system, then tracemalloc will not tell you about all the memory that is committed to the process. It’s only telling you about one tiny part of the whole.

I suppose you could look in Task Manager and see what it reports as the memory size of the Python process.

I’m not sure what the use case is for tracemalloc.
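
For reference, the standard library documentation describes tracemalloc as a debug tool for tracing memory blocks allocated by Python, for example to compare two snapshots and see which source lines account for the growth. A minimal sketch of that pattern; `top_growth` and `build_payload` are hypothetical names:

```python
import tracemalloc

def top_growth(func, limit=3):
    """Return func()'s result and the lines whose allocations changed the most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func()  # keep the result alive so its blocks stay allocated
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # StatisticDiff entries, largest absolute size change first
    return result, after.compare_to(before, "lineno")[:limit]

def build_payload():
    return [bytes(1_000) for _ in range(100)]

payload, diffs = top_growth(build_payload)
for diff in diffs:
    print(diff)
```

This only covers memory allocated through Python's allocators, which fits the point above that it is one part of the process total.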

vbrozik · October 19, 2022, 6:23pm

Process Explorer could give more detail than Task Manager: