I have multiple JSON files in multiple folders and subfolders: C:\Users\kma\Desktop\Bots****\content.json
The stars (*) above represent multiple folders. Each subfolder has one JSON file named content.json, and I want to extract some data from it. I need help traversing these files recursively and appending all the data to one CSV file. Thanks.
I would check out glob. It allows you to create recursive search patterns. For instance, you could search with a pattern: c:/Users/kma/Desktop/Bots/**/content.json.
Python also has a json module which is great for reading data from JSON.
Lastly, Python has a CSV module allowing you to write to a CSV file.
There are also plenty of 3rd party libraries that can enhance your experience with these tasks as well, but the aforementioned modules should be sufficient to get the job done.
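As a minimal sketch of the recursive pattern (using a throwaway temporary folder tree so the example is self-contained; the folder names are hypothetical):

```python
# Sketch of a recursive glob search. Builds a sample tree first so it runs
# anywhere; in practice you would point the pattern at your own root folder.
import glob
import json
import os
import tempfile

root = tempfile.mkdtemp()
nested = os.path.join(root, "bots", "bot_a")   # hypothetical layout
os.makedirs(nested)
with open(os.path.join(nested, "content.json"), "w", encoding="utf-8") as fh:
    json.dump({"packages": []}, fh)

# ** matches any depth of subfolders when recursive=True.
files = glob.glob(os.path.join(root, "**", "content.json"), recursive=True)
print(len(files))  # the nested content.json is found
```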
@facelessuser I have used all three of those modules, I'm able to do the recursion, and it's working fine.
For each row I want to append one more field: the base path of the folder that the file belongs to. Any suggestions on how to add it? I know how to extract it from the file path, but I'm not able to figure out how to append it to the output row.
import glob
import csv
import json
import os

files = glob.glob('C:\\Users\\kmutyala\\Desktop\\Bots\\**\\content.json', recursive=True)
count = 1
# print(fnames)
for f in files[:3:]:
    with open(f) as json_file:
        data = json.load(json_file)
        employee_data = data['packages']
        data_file = open('data_file.csv', 'a')
        csv_writer = csv.writer(data_file)
        print(count)
        count += 1
        # print(employee_data)
        # print(f)
        for emp in employee_data:
            print(emp)
            csv_writer.writerow(emp.values())
        data_file.close()
I assume part of the issue is that you are creating a new csv document on every new file? I would consider moving the creation of csv_writer to occur outside of the loop so you just have one CSV document created for all files.
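Something like this restructuring, sketched here with a throwaway sample file so it runs as-is (the "packages" layout is taken from the code above):

```python
# Sketch: create the csv writer once, outside the per-file loop,
# then append rows from every JSON file into the same stream.
import csv
import io
import json
import os
import tempfile

def merge_json_to_csv(files, out_stream):
    """Write one row per package, across all files, to a single writer."""
    csv_writer = csv.writer(out_stream)
    for f in files:
        with open(f, encoding="utf-8") as json_file:
            data = json.load(json_file)
        for emp in data["packages"]:
            csv_writer.writerow(emp.values())

# Self-contained demo with one sample file.
d = tempfile.mkdtemp()
p = os.path.join(d, "content.json")
with open(p, "w", encoding="utf-8") as fh:
    json.dump({"packages": [{"name": "pkg_a", "version": "1.0"}]}, fh)

buf = io.StringIO()
merge_json_to_csv([p], buf)
print(buf.getvalue().strip())  # pkg_a,1.0
```

When writing to a real file rather than a StringIO buffer, open it with `newline=''` so the csv module controls line endings itself.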
This is resolved, but I need help with this:
For each row I want to append one more field: the base path of the folder that the file belongs to. I know how to extract it from the file path, but I can't figure out how to append it to the output row.
I assume emp is a dict? emp.values() returns a view, so convert it to a list before concatenating. Something like this could work:
csv_writer.writerow(list(emp.values()) + [filename])
I don’t fully understand the output you are looking for, but you just need to ensure you are passing a list that contains all the information you want.
EDIT: I originally had it prepending, not appending.
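Putting the pieces together as a small self-contained sketch (the path and record are hypothetical; os.path.dirname extracts the base folder from the file path):

```python
# Sketch: append the file's parent-folder path as an extra field on each row.
import csv
import io
import os

f = os.path.join("Bots", "bot_a", "content.json")   # hypothetical file path
emp = {"name": "pkg1", "version": "2.0"}            # hypothetical record

base_path = os.path.dirname(f)                      # folder containing the file
buf = io.StringIO()
csv.writer(buf).writerow(list(emp.values()) + [base_path])
print(buf.getvalue().strip())
```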
@facelessuser Actually I have huge volumes of data, the CSV is not taking more than 5146 rows, and I'm getting this error:
Traceback (most recent call last):
File "C:/Users/kmutyala/Desktop/pkg3.py", line 13, in <module>
data = json.load(json_file)
File "C:\Users\kmutyala\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\kmutyala\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 85173: character maps to <undefined>
Are there any limitations on the number of rows for the CSV writer?
There is no row limit in the csv module; the traceback shows the failure is in json.load, while reading one of the JSON files. The byte 0x9d cannot be decoded by the default encoding being used, which is platform dependent and appears to be cp1252 on your system.[1] The solution is to find out which encoding was used to create the JSON files and specify that encoding in the open() call:
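For example, if the files turn out to be UTF-8 encoded (an assumption here; verify against your actual files). Note that a curly quote encoded as UTF-8 contains the byte 0x9d, which matches the byte in your traceback:

```python
# Sketch: pass the file's actual encoding to open() instead of relying on
# the platform default. utf-8 is an assumption; check your real files.
import json
import os
import tempfile

# Self-contained demo: write JSON containing a character (a curly quote)
# whose UTF-8 bytes include 0x9d, which cp1252 cannot decode.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(path, "w", encoding="utf-8") as fh:
    json.dump({"note": "curly quote \u201d"}, fh, ensure_ascii=False)

with open(path, encoding="utf-8") as json_file:   # explicit encoding
    data = json.load(json_file)
print(data["note"])
```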