Allocation of memory for a large matrix

Hi
I am looking to create a large matrix of integers which will be added to / subtracted from often.
To achieve an acceptable performance I need to ensure my system has sufficient RAM to hold the complete matrix.
I have read in quite a few places that because integers are ‘immutable’, Python allocates a second memory location for the changed number.
Do I need to size my system accordingly, i.e. allow double the memory for the number of integers?
Does the same apply to ‘float’?
Thanks.

To avoid confusion - ‘added to / subtracted from’ refers to the arithmetic operations on individual integers, not to changing the size of the matrix.

How large is your matrix actually going to be? Thousands by thousands, tens of thousands by tens of thousands, or maybe hundreds of thousands per dimension? How much RAM does your system have? And yes, the same applies to floats.
I suggest you use a scientific computing library that has optimized operations for cases like this. SciPy is probably a good choice.
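For instance, here is a minimal sketch with NumPy (which SciPy builds on); the shape is just an example:

```python
import numpy as np

# A NumPy array stores the raw 64-bit values in one contiguous buffer,
# not a separate Python int object per element, so there is no
# per-element object overhead.
m = np.zeros((1000, 1000), dtype=np.int64)

# In-place arithmetic writes back into the same buffer; no second copy
# of the matrix is allocated.
m += 1
m[10, 20] -= 5
print(m.nbytes)   # 8 bytes per element
```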

Hi,

If you are worried about RAM (though machines tend to have plenty these days, at least a few gigabytes), consider working with generator expressions. You won’t need to hold the ENTIRE matrix in memory, just a slice at a time.
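Something along these lines, where cell_value is a hypothetical placeholder for however a single entry is produced:

```python
N = 1_000

def cell_value(i, j):
    # Hypothetical placeholder for whatever actually generates an entry.
    return i * j

# Nested generator expressions: nothing is materialised until a row is
# consumed, so only one row's worth of values exists at a time.
rows = ((cell_value(i, j) for j in range(N)) for i in range(N))

total = 0
for row in rows:
    total += sum(row)
print(total)
```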

The details here are going to matter a lot. As was already asked: the size of the matrix is important. Also, how sparse is your matrix (that is, how many entries are not zero)? There are nice packages for working with sparse arrays (sparse is the name of one I like, scipy also has support). You can fit a massive sparse array in memory.

If the fraction of non-zero entries is above 10% or so, you really do just need that much memory, or you need to work out of core with some kind of disk-based solution. That could be as simple as an SQLite database.
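A minimal 2-D sketch with scipy.sparse (the N-dimensional sparse package works along similar lines); the sizes here are only illustrative:

```python
import numpy as np
from scipy import sparse

# Only the non-zero entries are stored, so a mostly-empty matrix of
# this nominal size costs very little compared to a dense array.
m = sparse.lil_matrix((100_000, 100_000), dtype=np.int64)
m[5, 7] += 3
m[42, 9_000] -= 1

# Convert to CSR once the structure settles, for fast arithmetic.
m_csr = m.tocsr()
print(m_csr.nnz, "stored entries")   # 2
```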

To answer the question you asked: you should test, but I’d plan on roughly 8 times the number of integers you need to store, in bytes of RAM, if you use a dense array of 64-bit ints. If you know your counts stay low enough, you can use a smaller data type.
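A quick back-of-the-envelope check (the element count below is only an example):

```python
import numpy as np

# Dense 64-bit ints need 8 bytes per element, so RAM required is
# roughly 8 * number of elements.
n_elements = 10**9
bytes_needed = n_elements * np.dtype(np.int64).itemsize
print(bytes_needed / 1024**3, "GiB")   # ~7.45 GiB

# A smaller dtype shrinks this proportionally, if the values fit.
print(n_elements * np.dtype(np.int16).itemsize / 1024**3, "GiB")   # ~1.86 GiB
```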

Thanks Sergey.
Initially a 1000 by 1000 by 1000 matrix of 64-bit integers, for a model I am developing.
I currently have 64 GB installed and am working out what I need to buy to get acceptable results.
I have been using SciPy on some AI work, and while SciPy handles this OK, it does churn the SSD a lot.
I was hoping to avoid that by buying a second RAM stick.

How large are the numbers? And how do you intend to add/subtract so that you’d need double the whole thing? If you just replace the elements one by one, you’d only temporarily double the memory of one element.
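For example, with a plain Python list the updates look like this:

```python
# Each += rebinds the list slot to a new int object; the old object is
# freed as soon as nothing refers to it, so at any moment only one
# extra integer exists and memory never doubles for the whole matrix.
row = [10**20 + k for k in range(5)]   # large values: real heap objects
for j in range(len(row)):
    row[j] += 1
print(row)
```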

The model I am creating calculates each element from the contents of its neighbours, so every element is recalculated on every iteration over the matrix. If the original numbers cannot be changed, then I need a second set of memory locations for each element, or so I thought.
I assume immutability means the original number cannot be changed at all, so every += creates a new location.
The numbers can go up to 2^63.

Ian
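One common way to handle that is a double-buffer pattern: keep two preallocated arrays, write each iteration into the spare one, then swap. A rough NumPy sketch, where the update rule (a wrap-around sum of the six axis neighbours) is only a stand-in for the real model and the shape is scaled down:

```python
import numpy as np

shape = (100, 100, 100)        # small stand-in for 1000 x 1000 x 1000
cur = np.random.randint(0, 10, size=shape, dtype=np.int64)
nxt = np.empty_like(cur)

def step(src, dst):
    # Stand-in update rule: each cell becomes the sum of its six axis
    # neighbours (with wrap-around). np.roll allocates temporaries, but
    # the two big buffers themselves are reused on every iteration.
    dst[...] = 0
    for axis in range(src.ndim):
        dst += np.roll(src, 1, axis=axis)
        dst += np.roll(src, -1, axis=axis)

for _ in range(10):
    step(cur, nxt)
    cur, nxt = nxt, cur        # swap buffers; nothing is reallocated
```

The persistent cost is two copies of the matrix (about 16 GB for the full 1000^3 int64 case) plus short-lived temporaries from np.roll, which should fit comfortably in 64 GB.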

I think for this size you should be fine with 64 GB of RAM. It’s not that large. I suggest you try it and see how it goes. Also, if you have another disk, put the heavy I/O there rather than on the SSD: set up the cache or swap file somewhere else and that should do it.
As for your physical RAM question, it might not be that simple to add more. If you are on a laptop, you might already be at its limit, or you might have to buy a full new 128 GB kit because your slots are already full.
On a desktop it’s similar. If you have 2x32 GB sticks and two free slots, you can easily add more. Or if your motherboard supports 64 GB sticks and you have space, you can add those. It basically depends on your hardware and its limitations; it might or might not be easy, or even possible, to add more.

You definitely want to use a scientific computation library; if not scipy, then something else. Is this a machine learning project, or just a math project? Perhaps you want to use a different language that is more memory efficient for this. Maybe even do some vector optimizations.

OK, thanks for your help.