I wrote code which reads some columns from several CSV files and creates a DataFrame for each. I then concatenate these DataFrames so that I have 163200000 rows and 51 columns, a "tall array" as it is called.
My problem is that when I try to save the DataFrame in HDF format, I get a message that it cannot allocate memory for saving a (51, 163200000) array. But the array is actually (163200000, 51), and why would it need to allocate more memory when the DataFrame is already in memory?
Can you please help?
This is my code:
```python
import os
import glob
import pandas as pd
from pandas import HDFStore, DataFrame
import numpy as np
import h5py

files_path = r"F:\Snapshots_U\Snapshots_U~\Snapshots_U"
read_files = glob.glob(os.path.join(files_path, r"tmp_plate_45Machu.*"))
cols =
print("Now reading the files")
dataframes = [pd.read_csv(f, header=None, delimiter=" ", usecols=cols, skiprows=9)
              for f in read_files]
print("Now writing the matrix X")
concatenated_df = pd.concat(dataframes, axis=1, ignore_index=True)
hdf = HDFStore('Big_Matrix.h5')
hdf.put('d1', concatenated_df, format='table')
hdf.close()
```
When I do concatenated_df.shape I do get the correct shape, (163200000, 51), but as soon as it reaches the HDF lines it fails with the message "cannot allocate 64GB of RAM for an array with shape (51, 163200000)".
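For context on why the error reports the transposed shape: as far as I can tell, pandas stores a single-dtype DataFrame internally as a column-major block of shape (n_columns, n_rows), so the writer sees a (51, 163200000) array even though .shape reports (163200000, 51). A minimal sketch illustrating this on a small frame (note: `_mgr` is a private pandas API and may change between versions; the shapes here are my assumption from current pandas behavior):

```python
import numpy as np
import pandas as pd

# small stand-in for the big concatenated frame
df = pd.DataFrame(np.zeros((1000, 51)))

# the user-facing shape is (rows, columns)
print(df.shape)                         # (1000, 51)

# internally, the single float block is stored transposed,
# i.e. (columns, rows) -- the shape the error message reports
print(df._mgr.blocks[0].values.shape)   # (51, 1000)
```

So the allocation message is about the internal (columns, rows) layout, not a wrongly transposed copy of the data.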
Thank you in advance