Improve two for loops while scanning a map

artheoups · June 21, 2021, 9:56am

Hello,
I am currently working on very large netCDF climate data. I am handling 71 summers, which results in 6532 maps of dimensions 465-by-705.

I need to carry out a scanning operation of each map, which means, for almost each grid cell, I want to compute a matching scanning window of 35-by-35 and check if this window exceeds a given criterion.
I tried to handle it using a 4-dimensionnal numpy array, named cpt_table_bool, which gathers all the scanning windows.

Therefore, this process is quite long (the script runs for approximately 15 hours), and I would like to know if someone has an idea of how to improve it.

The speed-determining step is indicated with the comment “this takes time, could probably be improved”. It takes approximately 6 or 7 seconds for each iteration of the loop.

    for year in range(an) :
        print('year',year,'/70')
        red=f.variables['temp'][year*92:(year+1)*92,:,:] # start = 01/06 ; end = 31/08 ; SHAPE = (time,lat,lon)
        siz=np.shape(red)
        #make the scan operation of each zone in order to determine the heatwaves dates, stored into the netCDF file
        for t in range(92) :

            print('day ', t, '/91, year ', year, '/70')
            cpt_table_bool=ma.array(np.zeros((siz[1],siz[2],scan_lat,scan_lon)),mask=False) #siz[1]=lat, siz[2]=lon

            for i in range(int((len(lat)-scan_lat)/2)) :
                for j in range(int((len(lon)-scan_lon)/2)) :
                    cpt_table_bool[i*2,j*2,:,:]=ma.array(red[t,i*2:i*2+scan_lat,j*2:j*2+scan_lon]>0) #this takes time, could probably be improved
            
            cpt_table=cpt_table_bool*weight_table_4d

            cpt_table = np.sum(cpt_table,-1)
            cpt_table = np.sum(cpt_table,-1)

            detect_heatwave_table_bool=np.array(cpt_table > np.round((weight_table_2d-sea_cpt_table)*pourcent))
            if detect_heatwave_table_bool.any() : #Check if at least one of the scanning windows has detected a sub-heatwave
                ntimes=np.shape(temp)[0] #take the time dimension length in order to know the next index to use
                date_idx[ntimes]=year*92+t
                date_format[ntimes] = nc.stringtochar(np.array([calendar[year,t]], 'S10'))
                temp[ntimes,:,:]=ma.array(np.zeros((siz[1],siz[2])),mask=red[t,:,:].mask) #siz[1]=lat, siz[2]=lon
                stack_where=np.argwhere(detect_heatwave_table_bool)
                for i,j in stack_where:
                    temp[ntimes,i:i+scan_lon,j:j+scan_lat] = red[t,i:i+scan_lon,j:j+scan_lat] #save the temperature anomalies responsible for the sub-heatwave
                print("Number of recorded days :",ntimes+1)
    time[:]=range(ntimes+1)

It is my first time posting on a forum, so I hope my message is complete enough for users to help me.

Thank you in advance for your time and consideration,

Théo