I am currently working on very large netCDF climate data. I am handling 71 summers, which results in 6532 maps of dimensions 465-by-705.
I need to carry out a scanning operation of each map, which means, for almost each grid cell, I want to compute a matching scanning window of 35-by-35 and check if this window exceeds a given criterion.
I tried to handle it using a 4-dimensionnal numpy array, named cpt_table_bool, which gathers all the scanning windows.
Therefore, this process is quite long (the script runs for approximately 15 hours), and I would like to know if someone has an idea of how to improve it.
The speed-determining step is indicated with the comment “this takes time, could probably be improved”. It takes approximately 6 or 7 seconds for each iteration of the loop.
for year in range(an) : print('year',year,'/70') red=f.variables['temp'][year*92:(year+1)*92,:,:] # start = 01/06 ; end = 31/08 ; SHAPE = (time,lat,lon) siz=np.shape(red) #make the scan operation of each zone in order to determine the heatwaves dates, stored into the netCDF file for t in range(92) : print('day ', t, '/91, year ', year, '/70') cpt_table_bool=ma.array(np.zeros((siz,siz,scan_lat,scan_lon)),mask=False) #siz=lat, siz=lon for i in range(int((len(lat)-scan_lat)/2)) : for j in range(int((len(lon)-scan_lon)/2)) : cpt_table_bool[i*2,j*2,:,:]=ma.array(red[t,i*2:i*2+scan_lat,j*2:j*2+scan_lon]>0) #this takes time, could probably be improved cpt_table=cpt_table_bool*weight_table_4d cpt_table = np.sum(cpt_table,-1) cpt_table = np.sum(cpt_table,-1) detect_heatwave_table_bool=np.array(cpt_table > np.round((weight_table_2d-sea_cpt_table)*pourcent)) if detect_heatwave_table_bool.any() : #Check if at least one of the scanning windows has detected a sub-heatwave ntimes=np.shape(temp) #take the time dimension length in order to know the next index to use date_idx[ntimes]=year*92+t date_format[ntimes] = nc.stringtochar(np.array([calendar[year,t]], 'S10')) temp[ntimes,:,:]=ma.array(np.zeros((siz,siz)),mask=red[t,:,:].mask) #siz=lat, siz=lon stack_where=np.argwhere(detect_heatwave_table_bool) for i,j in stack_where: temp[ntimes,i:i+scan_lon,j:j+scan_lat] = red[t,i:i+scan_lon,j:j+scan_lat] #save the temperature anomalies responsible for the sub-heatwave print("Number of recorded days :",ntimes+1) time[:]=range(ntimes+1)
It is my first time posting on a forum, so I hope my message is complete enough for users to help me.
Thank you in advance for your time and consideration,