MemoryError despite having enough RAM

Hello,

I am trying to run a command using 64-bit Python, and I am running into the following memory error: MemoryError: Unable to allocate 21.2 GiB for an array with shape (105284, 53946) and data type uint32

My computer has 64 GB of RAM, and only about half of it is committed, as displayed in the Task Manager performance tab. I also tried it on a computer with 32 GB of RAM and got the same 21.2 GiB error. I have also already tried setting the priority of python to high, as well as the priority of the IDE. I get the error with both the Spyder IDE and the default IDLE that comes with Python. I am running Python 3.12.
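For context, the 21.2 GiB figure is simply the raw size of a dense uint32 array of that shape, which is why the number is identical on both machines regardless of free RAM or process priority. A quick sanity check:

```python
# The reported allocation matches the dense array's raw size exactly.
shape = (105284, 53946)   # from the error message
itemsize = 4              # uint32 is 4 bytes per element
size_gib = shape[0] * shape[1] * itemsize / 2**30
print(f"{size_gib:.1f} GiB")  # -> 21.2 GiB
```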

Thanks for considering, fingers crossed someone can help me analyze some awesome data!

Can you send the code?

Sure, the code up to the point of error is:

from spatialdata_io import xenium
xenium_path = r"C:\Users\grosen\Xenium\redo\output1"
zarr_path = r"C:\Users\grosen\Xenium\redo\output1\xenium.zarr"
sdata = xenium(xenium_path)

Do you know if these files or modules are big, or do a lot of work?

The files are big, but not so big that the RAM shouldn’t be able to handle it.

It’s possible that the error is occurring after the memory is already in use, i.e. maybe it’s making copies of this giant array and by the third or fourth time it runs out.

It might be hard to monitor this because it could happen quickly.

Is this someone else’s tool or your own code? I would be looking into using a sparse array for this type of data, although zarr is great for storage (I’m actually very familiar with spatial scrna data!)
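To make the sparse-array suggestion concrete, here is a minimal numpy-only sketch (a toy COO-style layout, not spatialdata's actual representation) of why a mostly-background mask is far cheaper to hold as coordinates plus values than as a dense grid:

```python
import numpy as np

# Toy stand-in for a label mask: mostly zeros (background), few labeled pixels.
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000), dtype=np.uint32)
rows_in = rng.integers(0, 1000, size=5000)
cols_in = rng.integers(0, 1000, size=5000)
dense[rows_in, cols_in] = 1

# COO-style sparse layout: keep only the nonzero coordinates and values.
rows, cols = np.nonzero(dense)
vals = dense[rows, cols]

print(dense.nbytes)                              # 4,000,000 bytes dense
print(rows.nbytes + cols.nbytes + vals.nbytes)   # roughly 100 KB sparse
```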

Hello fellow scrna scientist! This is somebody else’s code, and I’m very new to this. It had worked just fine with the example datasets online, but they were smaller. I will look into sparse arrays. I’m assuming that would involve modifying the xenium.py code?

Yes, this would involve modifying the internals. But it’s hard to say if that’s really the solution without knowing how it works. Is this an open source package?

Yes! It is available here: GitHub - scverse/spatialdata-io
I’d love to know your thought process if you take a look. I’m coming at this with more background in wet lab and really want to learn so I can become more of an expert at both.

I believe the relevant i/o is in this file, which seems to be constructing a sparse array, as it should. Can you post the traceback for the exception, which shows the precise line that raised the error?

Sure, here is the traceback. It matches up with this code, which uses the i/o you’ve linked (spatialdata-io/src/spatialdata_io/readers/xenium.py at main · scverse/spatialdata-io · GitHub).

MemoryError                               Traceback (most recent call last)
Cell In[81], line 1
----> 1 sdata = xenium(xenium_path)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\spatialdata_io\_utils.py:47, in deprecation_alias.<locals>.deprecation_decorator.<locals>.wrapper(*args, **kwargs)
     45 class_name = f.__qualname__
     46 rename_kwargs(f.__name__, kwargs, aliases, class_name)
---> 47 return f(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\spatialdata_io\readers\xenium.py:219, in xenium(path, cells_boundaries, nucleus_boundaries, cells_as_circles, cells_labels, nucleus_labels, transcripts, morphology_mip, morphology_focus, aligned_images, cells_table, n_jobs, imread_kwargs, image_models_kwargs, labels_models_kwargs)
    212 # From the public release notes here:
    213 # https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/release-notes/release-notes-for-xoa
    214 # we see that for distinguishing between the nuclei of polinucleated cells, the `label_id` column is used.
    215 # This column is currently not found in the preview data, while I think it is needed in order to unambiguously match
    216 # nuclei to cells. Therefore for the moment we only link the table to the cell labels, and not to the nucleus
    217 # labels.
    218 if nucleus_labels:
--> 219     labels["nucleus_labels"], _ = _get_labels_and_indices_mapping(
    220         path,
    221         XeniumKeys.CELLS_ZARR,
    222         specs,
    223         mask_index=0,
    224         labels_name="nucleus_labels",
    225         labels_models_kwargs=labels_models_kwargs,
    226     )
    227 if cells_labels:
    228     labels["cell_labels"], cell_labels_indices_mapping = _get_labels_and_indices_mapping(
    229         path,
    230         XeniumKeys.CELLS_ZARR,
   (...)
    234         labels_models_kwargs=labels_models_kwargs,
    235     )

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\spatialdata_io\readers\xenium.py:420, in _get_labels_and_indices_mapping(path, file, specs, mask_index, labels_name, labels_models_kwargs)
    416     zip_ref.extractall(tmpdir)
    418 with zarr.open(str(tmpdir), mode="r") as z:
    419     # get the labels
--> 420     masks = z["masks"][f"{mask_index}"][...]
    421     labels = Labels2DModel.parse(
    422         masks, dims=("y", "x"), transformations={"global": Identity()}, **labels_models_kwargs
    423     )
    425     # build the matching table

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\zarr\core.py:798, in Array.__getitem__(self, selection)
    796     result = self.get_orthogonal_selection(pure_selection, fields=fields)
    797 else:
--> 798     result = self.get_basic_selection(pure_selection, fields=fields)
    799 return result

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\zarr\core.py:924, in Array.get_basic_selection(self, selection, out, fields)
    922     return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
    923 else:
--> 924     return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\zarr\core.py:966, in Array._get_basic_selection_nd(self, selection, out, fields)
    960 def _get_basic_selection_nd(self, selection, out=None, fields=None):
    961     # implementation of basic selection for array with at least one dimension
    962 
    963     # setup indexer
    964     indexer = BasicIndexer(selection, self)
--> 966     return self._get_selection(indexer=indexer, out=out, fields=fields)

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\zarr\core.py:1330, in Array._get_selection(self, indexer, out, fields)
   1328 # setup output array
   1329 if out is None:
-> 1330     out = np.empty_like(
   1331         self._meta_array, shape=out_shape, dtype=out_dtype, order=self._order
   1332     )
   1333 else:
   1334     check_array_shape("out", out, out_shape)

MemoryError: Unable to allocate 21.2 GiB for an array with shape (105309, 54075) and data type uint32
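For what it's worth, the failing line (masks = z["masks"][f"{mask_index}"][...]) materializes the entire on-disk array into RAM at once, which is where np.empty_like fails. A generic alternative for arrays that don't fit in memory is to process them in blocks. Here is a sketch of that pattern using a numpy memmap as a stand-in for the zarr store (this is not spatialdata's API, just an illustration):

```python
import os
import tempfile
import numpy as np

# Tiny on-disk stand-in for the ~21 GiB masks array.
path = os.path.join(tempfile.mkdtemp(), "masks.dat")
arr = np.memmap(path, dtype=np.uint32, mode="w+", shape=(1000, 500))
arr[::7, ::3] = 1      # pretend these are labeled pixels
arr.flush()

# Instead of arr[...] (one giant in-RAM copy), walk the array in row blocks
# and keep only the reduced result you actually need.
ro = np.memmap(path, dtype=np.uint32, mode="r", shape=(1000, 500))
total_nonzero = 0
for start in range(0, ro.shape[0], 100):
    block = np.asarray(ro[start:start + 100])  # only this block is in RAM
    total_nonzero += int(np.count_nonzero(block))
print(total_nonzero)  # 23881, same answer a full dense read would give
```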

Hi James,

I want to share that one of the 10x specialists was able to resolve this for me. I got very lucky in meeting him, as the standard for the company seems to be to not support this type of analysis. He recommended the following command:

sdata = xenium(xenium_path, cells_labels=False)

And my RAM was able to handle it. You can load in the cell labels after the fact.

Thanks for your help parsing all this!


Ah great that you got an answer. Sorry I lost track of this thread.