How to get the size of data in a class instance?

Python 3.12 on Windows 10.

I’m not a pro at Python; I’m still new, with about 5 months of experience.

I have a custom class that is full of options for the program, and many other things. I use it to pass data to different functions as a single parameter. This idea works very well. My instance variable is called options.

While in the debugger I have tried sys.getsizeof(options) but that returns 48, which is far too small. I can use sys.getsizeof(options.df) to show the size of a dataframe within my options variable and that seems like it gives an accurate number.

I did some searching on the web and didn’t find anything useful.

  1. But how do I get the size of all the data in the instance with one command?
  2. It doesn’t need to be super accurate but “close enough” is fine. I would only use this during debugging to see how big the instance is.
  3. This will not have methods, only attributes. The class is used to pass much data between functions in an easy manner.
  4. I was curious because my class instance is now 72MB and I thought there might be a limit on how much I can pass between functions.

Thank you.

Related:

I think only you can give a precise definition of what is “the data in”. If your object has a reference to a dataframe, it looks like you want to count the size of that dataframe. If there are further references in there, do you want to aggregate those sizes too? Maybe for some you do and for some others you don’t. Hard to tell without knowing your specific class, and what is the size that you want to account for.

Thanks for the link. As I mentioned above, sys.getsizeof() does not work properly because, from your link,

getsizeof calls the object’s __sizeof__ method

I didn’t put a __sizeof__ method in my custom class.

  1. How would I do that?
  2. How would I loop through each property of the object to get its size?

Ok I just did a web search and didn’t find anything helpful. I think there were a total of 10 pages returned.

In some of the answers to that SO post there are examples. You might need to adapt them to exclude/include the ones that you don’t need or need.
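A minimal sketch along the lines of those recipes, using only the standard library. The names total_size and Options are hypothetical, and the result is only an approximation: it descends into dicts, lists, tuples, sets and instance attributes, counts each object once, and otherwise trusts whatever sys.getsizeof() reports. You would likely adapt what it descends into for your own class.

```python
import sys


def total_size(obj, seen=None):
    """Approximate deep size of obj in bytes, counting each object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted (shared or self-referencing object)
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        # Instance attributes live in the instance dictionary.
        size += total_size(vars(obj), seen)
    return size


class Options:
    """Hypothetical stand-in for the options class in the question."""

    def __init__(self):
        self.title = "demo"
        self.rows = list(range(10_000))


options = Options()
print("shallow:", sys.getsizeof(options))
print("deep   :", total_size(options))
```

During debugging, total_size(options) can be typed straight into the debugger console once the function is defined.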

Pympler seems current and should meet my needs. I hope this helps others.

2009 module pympler

It works with Python 3.6 to 3.12. See "Pympler · PyPI" and "How do I determine the size of an object in Python?" on Stack Overflow.
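For anyone landing here later, a short sketch of how this can look with Pympler's asizeof module. Pympler is a third-party package (pip install pympler), so the import is guarded, and the Options class is a hypothetical stand-in for the real one.

```python
import sys

try:
    from pympler import asizeof  # third-party: pip install pympler
except ImportError:
    asizeof = None  # Pympler not installed; only the shallow size is shown


class Options:
    """Hypothetical stand-in for the options class in the question."""

    def __init__(self):
        self.title = "demo"
        self.rows = list(range(10_000))


options = Options()

# Shallow size: the instance itself, not what it refers to.
shallow = sys.getsizeof(options)

# Deep size: asizeof follows references and adds up the referents.
deep = asizeof.asizeof(options) if asizeof is not None else None

print("shallow:", shallow, "deep:", deep)
```

As with any such tool, the number is an estimate, which matches the "close enough" requirement in the question.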

Chuck,

Python is huge already and keeps growing so there can be very long discussions especially on a not-yet well-defined need.

Adding a dunder method is fairly straightforward. Your object may happen to hold only data now, but it can have all kinds of methods defined, including one called __sizeof__() (sizeof with two leading and two trailing underscores). How you implement it is up to you, but a straightforward approach would add up the sizes of the constituents (some may need recursive searches) and return the sum, plus perhaps some padding representing the space the object framework itself uses. If the contents are fixed and won’t change, this can be computed once when the object is created and the result stored, or computed only the first time it is needed and cached for later. Other approaches may loop over the instance’s internal dictionary (__dict__) to find which sub-objects exist and check their sizes.
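As a rough illustration of that idea, here is a sketch of a class with its own __sizeof__ that loops over the instance dictionary. The Options class and its attributes are made up for the example, and it only goes one level deep: a list attribute is counted without the objects it holds.

```python
import sys


class Options:
    """Hypothetical options class with a custom __sizeof__."""

    def __init__(self):
        self.title = "demo"
        self.rows = list(range(10_000))

    def __sizeof__(self):
        # Size of the bare instance, as the default implementation reports it.
        base = object.__sizeof__(self)
        # Add the shallow size of every attribute found in the instance dict.
        return base + sum(sys.getsizeof(value) for value in vars(self).values())


options = Options()

# sys.getsizeof() calls options.__sizeof__() and may add garbage-collector
# overhead on top of the value it returns.
print(sys.getsizeof(options))
```

If the attributes never change after creation, the sum could instead be computed once in __init__ and stored, as described above.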

I note that in some cases, especially in other languages, objects or parts of them are shared. For example, if one column of a dataframe-like object is changed to make a new dataframe-like object, and both still exist as valid objects, the remaining columns may be kept in common, in a way that carefully deals with what happens later if one of them makes further changes. This can happen invisibly in languages like R, and you may have objects with similar behavior. So, in building or modifying your custom class, there may be places where you want to copy or deep copy, so that you can be sure any storage you are measuring is unique and that freeing the object can release all of its memory.
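In Python terms, the difference between a shallow copy and a deep copy can be sketched like this. A plain dict of lists stands in for a dataframe-like object here; that is an assumption for illustration, and real dataframe libraries have their own copy rules.

```python
import copy

# A dict of lists standing in for a table with two columns.
original = {"col_a": [1, 2, 3], "col_b": [4, 5, 6]}

shallow = copy.copy(original)   # new outer dict, same column lists
deep = copy.deepcopy(original)  # new outer dict and new column lists

# The shallow copy still shares its columns with the original ...
print(shallow["col_a"] is original["col_a"])

# ... so a change made through one is visible through the other.
shallow["col_a"].append(99)
print(original["col_a"])

# The deep copy owns its storage and is unaffected.
print(deep["col_a"])
```

Only the deep copy has storage that is entirely its own, which is the case where a measured size corresponds to memory that is released when the object goes away.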

And note that some things are often intrinsically shared: there may be a single memory location containing a constant like the digit 1 or “Hello World!”, which occupies no more space when placed in one or more of your class instances than the “pointer” to it. And if all your objects share some data, Python offers many ways to store that data more conveniently, sometimes on the class itself as a static part.
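A small sketch of both points: two instances holding a reference to the same list, and data kept on the class itself. The class and attribute names are hypothetical.

```python
import sys


class Options:
    # Class attribute: stored once on the class, visible from every instance.
    defaults = {"verbose": False, "retries": 3}

    def __init__(self, table):
        self.table = table  # stores a reference, not a copy


big_table = list(range(100_000))
first = Options(big_table)
second = Options(big_table)

print(first.table is second.table)        # both attributes refer to one list
print(first.defaults is second.defaults)  # one dict, owned by the class

# Adding up per-instance sizes counts the shared list twice.
naive_total = sys.getsizeof(first.table) + sys.getsizeof(second.table)
actual = sys.getsizeof(big_table)
print(naive_total, actual)
```

The same mechanism is why passing options to a function does not copy its contents: the function receives a reference to the same object.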

My point is not to depend on any sizes you get being exactly what you think. What you get may be an upper bound on what a collection of your class instances holds: taken as a whole, the entire collection can occupy less space than the sum of the per-instance figures, because shared data is counted more than once. You may benefit from choosing data types that use less space, if your data will fit. Sometimes multiple items can be combined for storage and extracted as needed, such as packing lots of Boolean flags into a single character or integer. Speaking of which, Python integers can get very large and use more space as needed.
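To make the flag idea concrete, here is a sketch that packs a list of booleans into one integer and unpacks it again. The helper names pack_flags and unpack_flags are made up for the example, and the printed sizes are whatever your interpreter reports.

```python
import sys


def pack_flags(flags):
    """Store each boolean as one bit of a single integer."""
    packed = 0
    for position, flag in enumerate(flags):
        if flag:
            packed |= 1 << position
    return packed


def unpack_flags(packed, count):
    """Recover the list of booleans from the packed integer."""
    return [bool((packed >> position) & 1) for position in range(count)]


flags = [True, False, True, True, False, False, True, False]
packed = pack_flags(flags)
restored = unpack_flags(packed, len(flags))

# Shallow size of the list versus the single packed integer.
print(sys.getsizeof(flags), sys.getsizeof(packed))

# Python integers grow as their value grows.
print(sys.getsizeof(1), sys.getsizeof(2 ** 1000))
```

The cost is extra work and less readable code each time a flag is read or written.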

And, if space is a constraint, there are many tradeoffs to consider. For example, how much space does a dictionary use for just 5 items, versus different processing that searches another kind of data structure, like a tuple of tuples, or just a series of if statements that search and return the values directly? Compactness can come at the cost of speed.
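One way to explore that question is simply to measure. The sketch below builds the same 5 items as a dict and as a tuple of tuples and prints their sizes; the lookup helper is hypothetical, and which structure comes out smaller depends on the interpreter version, so no winner is assumed here.

```python
import sys

as_dict = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
as_tuples = tuple(as_dict.items())  # (("a", 1), ("b", 2), ...)


def lookup(pairs, key):
    """Linear search through (key, value) pairs; slower than a dict lookup."""
    for candidate, value in pairs:
        if candidate == key:
            return value
    raise KeyError(key)


# Shallow size of the dict itself.
dict_size = sys.getsizeof(as_dict)

# Outer tuple plus the five inner pair tuples (keys and values are shared
# with the dict, so they are left out of both figures).
tuples_size = sys.getsizeof(as_tuples) + sum(sys.getsizeof(p) for p in as_tuples)

print("dict  :", dict_size)
print("tuples:", tuples_size)
print(lookup(as_tuples, "c"), as_dict["c"])
```

Measuring like this before reorganizing data avoids trading speed for a saving that may not exist.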

I did not see any explanation of why you need the info, and that is fine. But if you know the details of when and why you might need to determine such sizes, you may be able to tune the work: to get it done fast in programmer time, to make it efficient in time or space as it runs, or even to realize that some other approach is what you really need, such as compressing text or storing some of the data in a common external area that allows sharing.

Avi
