Hello,
I am currently working on very large netCDF climate data. I am handling 71 summers, which results in 6532 maps of dimensions 465-by-705.
I need to carry out a scanning operation of each map, which means, for almost each grid cell, I want to compute a matching scanning window of 35-by-35 and check if this window exceeds a given criterion.
I tried to handle it using a 4-dimensionnal numpy array, named cpt_table_bool, which gathers all the scanning windows.
Therefore, this process is quite long (the script runs for approximately 15 hours), and I would like to know if someone has an idea of how to improve it.
The speed-determining step is indicated with the comment “this takes time, could probably be improved”. It takes approximately 6 or 7 seconds for each iteration of the loop.
for year in range(an) :
print('year',year,'/70')
red=f.variables['temp'][year*92:(year+1)*92,:,:] # start = 01/06 ; end = 31/08 ; SHAPE = (time,lat,lon)
siz=np.shape(red)
#make the scan operation of each zone in order to determine the heatwaves dates, stored into the netCDF file
for t in range(92) :
print('day ', t, '/91, year ', year, '/70')
cpt_table_bool=ma.array(np.zeros((siz[1],siz[2],scan_lat,scan_lon)),mask=False) #siz[1]=lat, siz[2]=lon
for i in range(int((len(lat)-scan_lat)/2)) :
for j in range(int((len(lon)-scan_lon)/2)) :
cpt_table_bool[i*2,j*2,:,:]=ma.array(red[t,i*2:i*2+scan_lat,j*2:j*2+scan_lon]>0) #this takes time, could probably be improved
cpt_table=cpt_table_bool*weight_table_4d
cpt_table = np.sum(cpt_table,-1)
cpt_table = np.sum(cpt_table,-1)
detect_heatwave_table_bool=np.array(cpt_table > np.round((weight_table_2d-sea_cpt_table)*pourcent))
if detect_heatwave_table_bool.any() : #Check if at least one of the scanning windows has detected a sub-heatwave
ntimes=np.shape(temp)[0] #take the time dimension length in order to know the next index to use
date_idx[ntimes]=year*92+t
date_format[ntimes] = nc.stringtochar(np.array([calendar[year,t]], 'S10'))
temp[ntimes,:,:]=ma.array(np.zeros((siz[1],siz[2])),mask=red[t,:,:].mask) #siz[1]=lat, siz[2]=lon
stack_where=np.argwhere(detect_heatwave_table_bool)
for i,j in stack_where:
temp[ntimes,i:i+scan_lon,j:j+scan_lat] = red[t,i:i+scan_lon,j:j+scan_lat] #save the temperature anomalies responsible for the sub-heatwave
print("Number of recorded days :",ntimes+1)
time[:]=range(ntimes+1)
It is my first time posting on a forum, so I hope my message is complete enough for users to help me.
Thank you in advance for your time and consideration,
Théo