I’m reading in a lot of Excel files and merging them based on part of their filename. The merging itself works well, but I noticed that with a lot of files it takes ages, and the slowdown seems to come from the concatenation:
import glob
import os
from collections import defaultdict

import pandas as pd

filenames = glob.glob(os.path.join(path, "*.xls"))

# group the files by the part of the filename before '_202'
dd = defaultdict(list)
for fn in filenames:
    dd[fn.split('_202')[0]].append(fn)

# read every file in a group, then concatenate the group once
dict_df = {}
for k, v in dd.items():
    dict_df[k] = pd.concat(
        [pd.read_csv(fn,
                     parse_dates=['time'],
                     dayfirst=True,
                     skiprows=4,
                     sep="\t",
                     decimal=',').assign(Path=fn)
         for fn in v],
        ignore_index=True,
    )
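To figure out whether the time is actually going into pd.concat or into reading the files, it might help to time the two steps separately. A minimal sketch of what I mean, reusing the dd grouping from above (read_one is just a hypothetical helper wrapping the same read_csv call):

import time
import pandas as pd

def read_one(fn):
    # same read_csv call as above, wrapped so it can be timed on its own
    return pd.read_csv(fn, parse_dates=['time'], dayfirst=True,
                       skiprows=4, sep="\t", decimal=',').assign(Path=fn)

for k, v in dd.items():
    t0 = time.perf_counter()
    frames = [read_one(fn) for fn in v]            # reading the files
    t1 = time.perf_counter()
    merged = pd.concat(frames, ignore_index=True)  # one concat per group
    t2 = time.perf_counter()
    print(k, f"read: {t1 - t0:.2f}s, concat: {t2 - t1:.2f}s")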
Rewriting this to use appending instead didn’t work, or at least I don’t understand how to do it:
dict_df2 = []
for k, v in dd.items():
    dict_df2.append(pd.read_csv(fn,
                                parse_dates=['time'],
                                dayfirst=True,
                                skiprows=4,
                                sep="\t",
                                decimal=',').assign(Path=fn) for fn in v)
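(As far as I can tell, this appends a generator expression rather than DataFrames, and fn inside the loop is left over from the earlier loop.) A version of the append idea that at least runs might look like the following sketch, though it still ends with one pd.concat per group, so I’m not sure it would be any faster:

dict_df2 = {}
for k, v in dd.items():
    frames = []
    for fn in v:
        # append the DataFrame itself, not a generator
        frames.append(pd.read_csv(fn, parse_dates=['time'], dayfirst=True,
                                  skiprows=4, sep="\t",
                                  decimal=',').assign(Path=fn))
    dict_df2[k] = pd.concat(frames, ignore_index=True)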
Could you please help me out here?