CSV to JSON - Not quite

sailingbikeruk · June 27, 2022, 2:47pm

I am new to this and new to JSON. I thought JSON was always key value pairs and the values were always strings, now I have to be able to export a CSV file to “JSON” using specific data types for each field

These are the data types for each key:value pair

[{“FIELD_1”: String,
“FIELD_2”: Int,
“FIELD_1”: Int,
“FIELD_1”: Int,
“ACTION”: String,
“FIELD_1”: String or null,
“FIELD_1”: String or null,
“FIELD_1”:String,
“FIELD_1”: List[String],
“FIELD_1”: String,
“FIELD_1”: Int or null,
“FIELD_1”: Int or null,
“FIELD_1”: List[String],
“FIELD_1”: String or null,
“FIELD_1”: String,
“FIELD_1”: Int,
“FIELD_1”: List[String],
“FIELD_1”: String or null,
“FIELD_1”: String,
“FIELD_1”: String or null,
“FIELD_1”: String or null}]

I tried to do this in PowerShell by casting the variables types that hold each field to the appropriate type but the ConvertTo-Json cmdlet just converted them all to strings and puts an empty string (“”) in empty fields instead of a null character.

The only other scripting language I have access to is Python, but I am a complete novice, I have followed a few folk on youtube doing small scripts for network management or testing but I don’t know where or how to get started with this, to be fair I don’t even know if this is this possible in Python? Can anyone direct me to resources that might help get me started?

cameron · June 28, 2022, 12:55am

By Ian via Discussions on Python.org at 27Jun2022 14:57:

I am new to this and new to JSON. I thought JSON was always key value
pairs and the values were always strings, now I have to be able to
export a CSV file to “JSON” using specific data types for each field

JSON is more than that. It is short for JavaScript Object Notation and
described in detail here: JSON
but basicly it supports the core JavaScript types:

null, like Python None
numbers (floating point values)
strings
arrays, like Python lists
mappings, like Python dicts

Textually it is a lot like Python’s comprehension syntax, so it should
be easy to read.

These are the data types for each key:value pair

[{“FIELD_1”: String,
“FIELD_2”: Int,
“FIELD_1”: Int,
“FIELD_1”: Int,
“ACTION”: String,
“FIELD_1”: String or null,
“FIELD_1”: String or null,

So this suggests that they expect a list ([]) of dicts ({}), and
that the various fields might have the describes types, variously
strings, ints, null, lists of strings, etc.

I tried to do this in PowerShell by casting the variables types that
hold each field to the appropriate type but the ConvertTo-Json cmdlet
just converted them all to strings and puts an empty string (“”) in
empty fields instead of a null character.

The only other scripting language I have access to is Python, but I am a complete novice, I have followed a few folk on youtube doing small scripts for network management or testing but I don’t know where or how to get started with this, to be fair I don’t even know if this is this possible in Python? Can anyone direct me to resources that might help get me started?

It is certainly possible. We do stuff like this all the time.

See the csv module for reading (or writing) CSV data from a CSV file:

and the json module for writing (or reading) JSON data to a JSON file:

Both types are just “text” files with information in a particular
format.

I would expect to:

open a CSV file for read
make a csv.reader and read the rows of data from the file, storing
them in a list of the rows
maybe sanity check the values, or change some
open a JSON file for write
write the list of rows to the JSON file using json.dump

Try that and come back with questions (and the code you’ve written).

Remember to present code and error tracebacks inline in your messages
between triple backticks:

```
code pasted here
```

instead of screenshots.

Cheers,
Cameron Simpson cs@cskk.id.au

sailingbikeruk · June 29, 2022, 8:33am

Thanks for the starter, I’ll be back once i’ve written something that has some functionality

sailingbikeruk · July 24, 2022, 1:42pm

well, it took some time, but I finally have something that works. It reads a CSV file from Excel and outputs that as JSON in what appears to be the correct format, here is my code, please remember this is the first python script I have written beyond print(“Hello World”).

It is not yet finished, there is more error checking to do, I would like to be able to manage the fie paths as parameters and I still have to consider some keys that are mandatory but dont have a clear default. As a first “stab” I am quite pleased, I’d appreciate any comments and if you have the time to explain how to make it more “pythonic” it’ll help me moving forward, although I really only post it here so you can see my efforts and how much your post helped.

# import the json and csv modules
import json, csv

#delare a list to hold the dictionaries
jsonArray = []

# open the csv file and read the contents into rules_list
with open("C:\\Users\\saili\\Downloads\\XDR_Host Firewall_export.csv", "r") as csv_file:
	rules_list = csv.DictReader(csv_file)

# convert the single long list to a list of dictionaries by appending each row independetly to the 
# jsonArray[] 	
	for row in rules_list:
		jsonArray.append(row)

# iterate through each row of jsonArray check the element typing and values
# convert emptry strings to null values
# convert single strings to lists of strings (where necessary)
# output error messages if mandatory elements are missing
	for row in jsonArray:
		#remove GROUP_NAME
		row.pop("GROUP_NAME")

		# "SCHEMA_VERSION": Int 
		if "SCHEMA_VERSION" not in row:
			row.update({"SCHEMA_VERSION":1})
		else:
			row["SCHEMA_VERSION"] = int(row["SCHEMA_VERSION"])

# ----- 
#these need to be considered careully, they must exist but don't have a clear default
# -----
		# "OS_TYPE": List[String]
		row["OS_TYPE"] = row["OS_TYPE"].split(",")

		# "PRIORITY": Int 
		row["PRIORITY"] = int(row["PRIORITY"])

		# "ACTION": String 
		row["ACTION"] = str(row["ACTION"])

		# "DIRECTION": String 
		row["DIRECTION"] = str(row["DIRECTION"])
# -----
		
		# "WINDOWS_APPLICATION" :str
		if "WINDOWS_APPLICATION" not in row:
			row.update({"WINDOWS_APPLICATION":None})
		elif row["WINDOWS_APPLICATION"] == "":
			row.update({"WINDOWS_APPLICATION":None})
		else:
			row["WINDOWS_APPLICATION"] = str(row["WINDOWS_APPLICATION"])

		# "MACOS_APPLICATION": String or null 
		if "MACOS_APPLICATION" not in row:
			row.update({"MACOS_APPLICATION":None})
		elif row["MACOS_APPLICATION"] == "":
			row.update({"MACOS_APPLICATION":None})
		else:
			row["MACOS_APPLICATION"] = str(row["MACOS_APPLICATION"])

		# "MODE": String
		if "MODE" not in row:
			row.update({"MODE":"ENABLED"})
		elif row["MODE"] == "":
			row.update({"MODE":"ENABLED"})
		else:
			row["MODE"] = str(row["MODE"])
		
		# "VERSION": Int 
		if "VERSION" not in row:
			row.update({"VERSION":1})
		elif row["VERSION"] == "":
			row.update({"VERSION":1})
		else:
			row["VERSION"] = int(row["VERSION"])
		
		# "DESCRIPTION":String 		
		if "DESCRIPTION" not in row:
			row.update({"DESCRIPTION":"This is a default description nothing was imported"})
		elif row["DESCRIPTION"] == "":
			row.update({"DESCRIPTION":"This is a default description nothing was imported"})
		else:
			row["DESCRIPTION"] = str(row["DESCRIPTION"])
		
		# "ICMP_CODE": Int or null 
		if "ICMP_CODE" not in row:
			row.update({"ICMP_CODE":None})
		elif row["ICMP_CODE"] == "":
			row.update({"ICMP_CODE":None})
		else:
			row["ICMP_CODE"] = int(row["ICMP_CODE"])
		
		# "ICMP_TYPE": Int or null 
		if "ICMP_TYPE" not in row:
			row.update({"ICMP_TYPE":None})
		elif row["ICMP_TYPE"] == "":
			row.update({"ICMP_TYPE":None})
		else:
			row["ICMP_TYPE"] = int(row["ICMP_TYPE"])
		
		# "LOCAL_IP": List[String]
		if "LOCAL_IP" not in row:
			row.update({"LOCAL_IP":None})
		elif "LOCAL_IP" == "":
			row.update({"LOCAL_IP":None})
		else:
			row["LOCAL_IP"] = row["LOCAL_IP"].split(",")
		
		# "LOCAL_PORT": String or null 
		if "LOCAL_PORT" not in row:
			row.update({"LOCAL_PORT":None})
		elif row["LOCAL_PORT"] == "":
			row.update({"LOCAL_PORT":None})
		else:
			row["LOCAL_PORT"] = str(row["LOCAL_PORT"])
		
		# "NAME": String 		
		if "NAME" not in row:
			row.update({"NAME":"Name was not present during Import"})
		elif row["NAME"] == "":
			row.update({"NAME":"Name was not present during Import"})
		else:
			row["NAME"] = str(row["NAME"])
		
		# "PROTOCOL": Int 
		if "PROTOCOL" not in row:
			row.update({"PROTOCOL":None})
		elif row["PROTOCOL"] == "":
			row.update({"PROTOCOL":None})
		else:
			row["PROTOCOL"] = int(row["PROTOCOL"])
		
		# "REMOTE_IP": List[String] 
		if "REMOTE_IP" not in row:
			row.update({"REMOTE_IP":None})
		elif row["REMOTE_IP"] == "":
			row.update({"REMOTE_IP":None})
		else:
			row["REMOTE_IP"] = row["REMOTE_IP"].split(",")
		
		# "REMOTE_PORT": String or null 
		if "REMOTE_PORT" not in row:
			row.update({"REMOTE_PORT":None})
		elif row["REMOTE_PORT"] == "":
			row.update({"REMOTE_PORT":None})
		else:
			row["REMOTE_PORT"] = str(row["REMOTE_PORT"])
		
		# "REPORT_MODE": String
		if "REPORT_MODE" not in row:
			row.update({"REPORT_MODE":ENABLED})
		elif row["REPORT_MODE"] == "":
			row.update({"REPORT_MODE":ENABLED})
		row["REPORT_MODE"] = str(row["REPORT_MODE"])
		
		# "WINDOWS_SERVICE": String or null 
		if "WINDOWS_SERVICE" not in row:
			row.update({"WINDOWS_SERVICE":None})
		elif row["WINDOWS_SERVICE"] == "":
			row.update({"WINDOWS_SERVICE":None})
		else:
			row["WINDOWS_SERVICE"] = str(row["WINDOWS_SERVICE"])
		
		# "MACOS_BUNDLE_ID": String or null
		if "MACOS_BUNDLE_ID" not in row:
			row.update({"MACOS_BUNDLE_ID":None})
		elif row["MACOS_BUNDLE_ID"] == "":
			row.update({"MACOS_BUNDLE_ID":None})
		else:
			row["MACOS_BUNDLE_ID"] = str(row["MACOS_BUNDLE_ID"])

with open("E:\\Dropbox\\github\\XDRRulesJSONConverter\\ToImport.json","w") as export_file:
		jsonString = json.dumps(jsonArray, indent = 4)
		export_file.write(jsonString)

smontanaro · July 24, 2022, 3:15pm

It looks like you’re well on your way to Python stardom.

Two minor nits with the initial reading…

There is no need to declare objects in Python. I understand why you’re defining an empty list, however…

You can just slurp the entire CSV file in one fell swoop. Instead of your row-by-row iteration:

just grab it all in one shot:

jsonArray = list(rules_list)

I know one liners are sometimes frowned on, but you can even do this:

with open(...) as csv_file:
    jsonArray = list(csv.DictReader(csv_file))

Not necessary in your case, but it’s worth knowing it can be done when you encounter it later.

Later on you set a bunch of key-value pairs in the rows with statements like this:

It’s more typical in Python to just set the key-value pair directly, like so:

row["MODE"] = "ENABLED"

If nothing else, you’ve saved all those dictionary creations required by the row.update method.

You can simplify some other if/elif/else statements. Everything coming out of a CSV file will initially be a string, so calling str(row["DESCRIPTION"]) (for example) is superfluous. Also, if the action for the if and elif clauses is the same, you might consider using the short-circuit feature of the or operator, like so:

if "DESCRIPTION" not in row or row["DESCRIPTION"] == "":
    row.update({"DESCRIPTION":"This is a default description nothing was imported"})

[/quote]

cameron · July 24, 2022, 10:56pm

By Skip Montanaro via Discussions on Python.org at 24Jul2022 15:25:

You can simplify some other if/elif/else statements. Everything coming
out of a CSV file will initially be a string,

Not the case. CSV reads a few dialects, and by default is happy to
recognise an unquoted 5.1 as a float value for example.

str(row["DESCRIPTION"]) (for example) is superfluous.

That, OTHOH, is almost certainly true, because a description will
generally be a string. But most numeric data columns have unquoted
numbers, and CSV returns them as numbers, not str.

So a CSV row like this:

word,"two words",3.1,"3.1"

should get a list of str, str, float, str.

Don’t get me started on my solar data, which quotes the numeric data and
seems to quietly rely on Excels “I know what you really mean!”
presumptions. My importer for that goes the other way:
float(column_value).

Cheers,
Cameron Simpson cs@cskk.id.au

smontanaro · July 24, 2022, 11:42pm

I’d like to see a concrete example of the csv module (not something fancy like Pandas) doing that. I’ve never encountered a situation where it did any kind of type conversion. That said, it’s been a long while since I worked on the guys of the csv module. Things would appear to have changed (progressed? not so sure) while I wasn’t looking.

cameron · July 25, 2022, 3:07am

By Skip Montanaro via Discussions on Python.org at 24Jul2022 23:52:

Cameron Simpson:

You can simplify some other if/elif/else statements. Everything coming
out of a CSV file will initially be a string,

Not the case. CSV reads a few dialects, and by default is happy to
recognise an unquoted 5.1 as a float value for example.

str(row["DESCRIPTION"]) (for example) is

I’d like to see a concrete example of the csv module (not something
fancy like Pandas) doing that. I’ve never encountered a situation where
it did any kind of type conversion. That said, it’s been a long while
since I worked on the guys of the csv module. Things would appear to
have changed (progressed? not so sure) while I wasn’t looking.

Hmm. I stand corrected. I could have sworn I’d see this with plain
csv. Yet an experiment proves you correct and I incorrect.

I have been faffing about with pandas recently, I wonder if it’s broken
my thinking.

Sorry,
Cameron Simpson cs@cskk.id.au

smontanaro · July 25, 2022, 11:22am

Pandas has a tendency to do that.