How do I use Python to read multiple JSON files and export specific values to a CSV?

I am not a programmer and have no experience with Pyton, so I would really appreciate any help to solve the few remaining issues I’ll explain bellow:

What I am trying to do is collect a few but same values from many .json files and get them into a spreadsheet to work with later.
So far I can get the values I need from all JSON files in a folder, but I would also need the JSON filename and the resulting CSV is not written, I get an error.
my python file:

import json
import glob
from datetime import datetime
import csv

src = "data/"

date = datetime.now()
data = []

files = glob.glob('data/*', recursive=True)

for single_file in files:
  with open(single_file, 'r') as f:

    try:
      json_file = json.load(f)
      data.append([
        json_file['score'],
        json_file['otherValue']
      ])
    except KeyError:
      print(f'Skipping {single_file}')

data.sort()

data.insert(0, ['fileName','score','otherValue'])

print(data)
#printing works

# Export to CSV.
csv_filename = f'{str(date)}.csv'
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("Updated CSV")

Thank you in advance for any help!
I think I am not far from this result just need some guidance by somebody who knows what they are doing. :slight_smile:

Possibly, it’s the way that you’ve named things. For example, I have a CSV write function that includes this:

with open(csv_file, 'a', encoding="utf8") as f:
        f.write(date)
        output = writer(f)
        output.writerow(data)

… which is appending data to the file, which already has the header names in place.

n.b: notice that the first line writes the ‘date’ and the third line writes the ‘data’; it’s easy to misread that.

edit to add:

Also I’ve used from csv import DictReader, writer, which means that I don’t need to have output = csv.writer(f), so there’s also that.

I’m convinced that you can’t use writer = as that will mean that you can’t use the writer() function from the csv import.

{Corrected for clarity}

1 Like

You are iterating the file names using a loop so you need to add the file name to the every item of your data list inside the loop (inside data.append()).

1 Like

Thanks @vbrozik , Yes, I figured out that that is where the filename should go, but how do I get the name of the file?

The glob.glob() function returns list of file names. Your for loop iterates this list so in every iteration single_file contains the file name which is currently being processed.

2 Likes

Thanks for your help @rob42

I tried those changes, so have now:

csv_file = f'{str(date)}.csv'
with open(csv_file, 'a', encoding="utf8") as f:
        f.write(date)
        output = writer(f)
        output.writerow(data)

But it still gives an error for that line:

Traceback (most recent call last):
File “C:\JsonLevels\Levels_json_To_csv_Exporter.py”, line 57, in
with open(csv_file, ‘a’, encoding=“utf8”) as f:
OSError: [Errno 22] Invalid argument: ‘2022-09-08 12:55:11.836601.csv’

Do you have any idea what could cause that?

Thank you. Now I have the file name also. Great progress!

I am glad that you were able to make the progress!

It looks like you are running the program on Windows :smiley: This system has serious limitations regarding allowed characters in file names. I think here the problem is the colon : character.

2 Likes

No worries.

It looks to me that your date variable needs to be reformatted before you can use it as a file name, which is not impossible.

What would you like, as the file name?

edit: It looks like @vbrozik has this in hand, so I’ll side line.

2 Likes

Well yes of course. Simple Windows peasant here. :slight_smile:

How can I name the file anything simple enough for Windows to accept?
(It does not need to be unique like the date.)

Can be anything. It does not need to be unique like the date would be.

What my output is now printed by this command:

print(data)

into the console looks like:

[[‘fileName’, ‘scoreMax’], [‘data\Level2000-5360.json’, 835860], [‘data\Level2000-5361.json’, 1296420], [‘data\Level2000-5362.json’, 837220], [‘data\Level2000-5363.json’, 781170], [‘data\Level2000-5364.json’, 947670]]

I can easily, with a bit of text replacing in a text editor get that to be like
fileName, scoreMax, 5360, 835860, 5361, 1296420, 5362, 837220, 5363, 781170, 5364, 947670

This is the data I would need, so good so far.
The only thing missing is a line break after every file’s data, so when importing it into spreadsheet it becomes a table.

How could I add a line break after every file’s data added?

Oh, this works.
It makes a file and puts the data into it:

csv_file = f'CSVFileName.csv'
with open(csv_file, 'a', encoding="utf8") as f:
        # f.write(date)
        output = writer(f)
        output.writerow(data)

Thank you guys for all the help!

So the only thing missing is a line break after every file’s data.
How do I add that?

Off of the top of my head: output.writerow(data+'\n')?

The new line needs to be in this iteration, right!?
So there is a new line after adding the data from each file.

for single_file in files:
  with open(single_file, 'r') as f:
    try:
      json_file = json.load(f)
      data.append([
        single_file,
        json_file['scoreMax']
        ])
    except KeyError:
      print(f'Skipping {single_file}')

This is where the road forks, so to speak – I’m doing CSV to CSV, whereas you’re doing JSON to CSV.

I’m sure someone will know.

Glad I could help, in some small way.

Have a good day.

2 Likes

Thanks for your help @rob42.
Have a nice day!

2 Likes

If you want to use date+time anyway - I for example prefer to follow ISO 8601 when possible. You can do it by omitting the separators. Using your stored date:

date.strftime('%Y%m%dT%H%M%S')

What is writer? Is not it csv.writer? Import just csv not from csv import writer. Your program will be easier to understand,

I have never used csv.writer but I suppose the writerow method should be used to write a single row while you feed it with all the data. I think you need to iterate the items (rows) of your data using for and call writerow for every row individually.

Look into the documentation to know how it should be used.

Thank you all for helping!
I managed to do the thing with your help and sharing here for anybody who a similar solution:

import json
import glob
from datetime import datetime
import csv
from csv import DictReader, writer

date = datetime.now()
data = []

files = glob.glob('dataFolder/*', recursive=True)

# Loop through files
for single_file in files:
  with open(single_file, 'r') as f:

    try:
      json_file = json.load(f)
      data.append([
        single_file,
        json_file['scores'],
        json_file['otherData']
        ])
    except KeyError:
      print(f'Skipping {single_file}')

# Sort the data
# data.sort()

# Add headers
data.insert(0, ['fileName','scores','otherData'])

print(data)
#printing works

csv_file = f'CSVFileName.csv'
with open(csv_file, 'a', encoding="utf8") as f:
        # f.write(date)
        output = writer(f)
        output.writerow(data)