Allocation of memory for a large matrix

Hi
I am looking to create a large matrix of integers which will be added to / subtracted from often.
To achieve an acceptable performance I need to ensure my system has sufficient RAM to hold the complete matrix.
I have read in quite a few places that because integers are ‘immutable’, Python allocates a second memory location for the changed number.
Do I need to size my system accordingly, i.e. allow double the memory for the number of integers?
Does the same apply to ‘float’?
Thanks.

To avoid confusion - ‘added to / subtracted from’ refers to the arithmetic operations on individual integers, not to changing the size of the matrix.

How large is your matrix actually going to be? Thousands by thousands, tens of thousands by tens of thousands, or maybe hundreds of thousands per dimension? How much RAM does your system have? And yes, the same applies to floats.
I suggest you use a scientific computing library that has optimized operations for cases like this. SciPy is probably a good choice.
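For instance, here is a minimal sketch with NumPy (which SciPy builds on); the shape is just an example:

```python
import numpy as np

# A NumPy array stores the raw 64-bit values in one contiguous buffer,
# not a separate Python int object per element, so there is no
# per-element object overhead.
m = np.zeros((1000, 1000), dtype=np.int64)

# In-place arithmetic writes back into the same buffer; no second copy
# of the matrix is allocated.
m += 1
m[10, 20] -= 5
print(m.nbytes)   # 8 bytes per element
```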

Hi,

If you are worried about RAM (though machines tend to have plenty these days, at least a few gigabytes), consider working with generator expressions. You won’t need to hold the ENTIRE matrix in memory, just a slice at a time.
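Something along these lines, where cell_value is a hypothetical placeholder for however a single entry is produced:

```python
N = 1_000

def cell_value(i, j):
    # Hypothetical placeholder for whatever actually generates an entry.
    return i * j

# Nested generator expressions: nothing is materialised until a row is
# consumed, so only one row's worth of values exists at a time.
rows = ((cell_value(i, j) for j in range(N)) for i in range(N))

total = 0
for row in rows:
    total += sum(row)
print(total)
```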

The details here are going to matter a lot. As was already asked: the size of the matrix is important. Also, how sparse is your matrix (that is, how many entries are not zero)? There are nice packages for working with sparse arrays (sparse is the name of one I like, scipy also has support). You can fit a massive sparse array in memory.

If the fraction of non-zero entries is above 10% or so, you really do just need that much memory, or you need to work out of core with some kind of disk-based solution. That could be as simple as an SQLite database.
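A minimal 2-D sketch with scipy.sparse (the N-dimensional sparse package works along similar lines); the sizes here are only illustrative:

```python
import numpy as np
from scipy import sparse

# Only the non-zero entries are stored, so a mostly-empty matrix of
# this nominal size costs very little compared to a dense array.
m = sparse.lil_matrix((100_000, 100_000), dtype=np.int64)
m[5, 7] += 3
m[42, 9_000] -= 1

# Convert to CSR once the structure settles, for fast arithmetic.
m_csr = m.tocsr()
print(m_csr.nnz, "stored entries")   # 2
```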

To answer the question you asked: you should test, but I’d plan on roughly 8 times the number of integers you need to store, in bytes of RAM, if you use a dense array of 64-bit ints. If you know your counts stay low enough, you can use a smaller data type.
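A quick back-of-the-envelope check (the element count below is only an example):

```python
import numpy as np

# Dense 64-bit ints need 8 bytes per element, so RAM required is
# roughly 8 * number of elements.
n_elements = 10**9
bytes_needed = n_elements * np.dtype(np.int64).itemsize
print(bytes_needed / 1024**3, "GiB")   # ~7.45 GiB

# A smaller dtype shrinks this proportionally, if the values fit.
print(n_elements * np.dtype(np.int16).itemsize / 1024**3, "GiB")   # ~1.86 GiB
```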

Thanks Sergey.
Initially a 1000 by 1000 by 1000 matrix of 64-bit integers, for a model I am developing.
I currently have 64 GB installed and am working out what I need to buy to get acceptable results.
I have been using SciPy on some AI work, and while SciPy handles this OK, it does churn the SSD a lot.
I was hoping to avoid that by buying a second RAM stick.

How large are the numbers? And how do you intend to add/subtract so that you’d need double the whole thing? If you just replace the elements one by one, you’d only temporarily double the memory of one element.
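For example, with a plain Python list the updates look like this:

```python
# Each += rebinds the list slot to a new int object; the old object is
# freed as soon as nothing refers to it, so at any moment only one
# extra integer exists and memory never doubles for the whole matrix.
row = [10**20 + k for k in range(5)]   # large values: real heap objects
for j in range(len(row)):
    row[j] += 1
print(row)
```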

The model I am creating calculates each element from the contents of its neighbours, so every element is recalculated on every iteration over the matrix. If the original numbers cannot be changed, then I need a second set of memory locations for each element, or so I thought.
I assume immutability means the original number cannot be changed at all, so every += creates a new location.
The numbers can go up to 2^63.

Ian
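One common way to handle that is a double-buffer pattern: keep two preallocated arrays, write each iteration into the spare one, then swap. A rough NumPy sketch, where the update rule (a wrap-around sum of the six axis neighbours) is only a stand-in for the real model and the shape is scaled down:

```python
import numpy as np

shape = (100, 100, 100)        # small stand-in for 1000 x 1000 x 1000
cur = np.random.randint(0, 10, size=shape, dtype=np.int64)
nxt = np.empty_like(cur)

def step(src, dst):
    # Stand-in update rule: each cell becomes the sum of its six axis
    # neighbours (with wrap-around). np.roll allocates temporaries, but
    # the two big buffers themselves are reused on every iteration.
    dst[...] = 0
    for axis in range(src.ndim):
        dst += np.roll(src, 1, axis=axis)
        dst += np.roll(src, -1, axis=axis)

for _ in range(10):
    step(cur, nxt)
    cur, nxt = nxt, cur        # swap buffers; nothing is reallocated
```

The persistent cost is two copies of the matrix (about 16 GB for the full 1000^3 int64 case) plus short-lived temporaries from np.roll, which should fit comfortably in 64 GB.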

I think for this size you should be fine with 64 GB of RAM. It’s not that large. I suggest you try it and see how it goes. Also, if you have another disk, put the heavy I/O there rather than on the SSD: set up the cache or swap file somewhere else and that should do it.
As for your physical RAM question, it might not be that simple to add more. If you are on a laptop, you might already be at its limit, or you might have to buy a full new 128 GB kit because your slots are already full.
On a desktop it’s similar. If you have 2x32 GB sticks and two free slots, you can easily add more. Or if your motherboard supports 64 GB sticks and you have space, you can add those. It basically depends on your hardware and its limitations; it might or might not be easy, or even possible, to add more.

You definitely want to use a scientific computation library; if not scipy, then something else. Is this a machine learning project, or just a math project? Perhaps you want to use a different language that is more memory efficient for this. Maybe even do some vector optimizations.

OK, thanks for your help.