Parse CSV file and output to new file


I'm new to Python and understand that it is good at parsing files.

I currently have CSV audit files that are produced from Office 365 (json set data files). I need to read each file, check whether each line contains certain text (the text will be report names, from a list of about 20 items), and write every matching line in full to a new CSV file.

The files are different sizes, from 2000 lines to some over 1M lines.

Here is a snippet of sample data:

Each file can have a different set of columns based on the day it was run.

Basically what I’m trying to do is parse the file to find only the lines that I care about and write them to a new file since the original file has too much data in it.

I'm not sure what the best approach is, which functions I should be using, and so on. Being new to Python, I just need some high-level ideas, and then I can do more research on how to get it coded.


Have you checked out Python’s csv module?

I’m a bit puzzled by the mention of JSON here, but what you’ve posted is the heading and a line from a typical CSV file. BTW, it’s easier for the reader to see if you use the </> button to insert text exactly as it is on your computer. It looks like this:


I put the word “text” after the first “```” to suppress the mistaken highlighting.

Skip’s tip to look into the csv module is good. The number of rows shouldn’t pose a problem, since the csv module reads a file one row at a time. If you are doing heavyweight manipulation, pandas may be useful, but there’s more to learn to get started and it isn’t in the standard library that comes with Python.
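To make the csv suggestion concrete, here is a minimal sketch of the "keep only the lines I care about" idea. The report names, the function name `filter_rows`, and the `utf-8-sig` encoding are my assumptions, not something taken from your files, so adjust them to match your data.

```python
import csv

# Hypothetical report names; the real list would hold the ~20 names of interest.
REPORT_NAMES = ["Sales Report", "Audit Summary"]


def filter_rows(src_path, dst_path, names=REPORT_NAMES):
    """Copy the header plus every row that mentions one of `names`.

    Returns the number of data rows written.
    """
    kept = 0
    # "utf-8-sig" is an assumption: it reads plain UTF-8 and also strips a
    # byte-order mark if the export happens to start with one.
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:  # empty input file
            return 0
        writer.writerow(header)

        # Rows are read one at a time, so memory use stays flat
        # whether the file has 2000 lines or over a million.
        for row in reader:
            if any(name in field for field in row for name in names):
                writer.writerow(row)
                kept += 1
    return kept
```

You would call it once per file, for example `filter_rows("audit.csv", "audit_filtered.csv")` (both file names are made up). The match is a plain substring test against every field, so it also finds names embedded in a longer value such as a JSON string.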

If it is really the case, as you say, that “Each file can have different set of columns based on the day it was run”, then that’s quite tricky to deal with from your program’s point of view.
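One way to cope with varying columns, if the column you want to test has the same name whenever it is present, is `csv.DictReader`, which looks fields up by header name rather than by position. This is only a sketch under that assumption: the function name `filter_by_column` and the column name in the usage note are hypothetical.

```python
import csv


def filter_by_column(src_path, dst_path, column, names):
    """Keep rows whose `column` value is exactly one of `names`.

    Returns the number of rows written, or None if this file has no
    such column (the caller decides how to handle that case).
    """
    wanted = set(names)
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            return None

        kept = 0
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" drops stray fields from rows that are
            # longer than the header instead of raising ValueError.
            writer = csv.DictWriter(dst, fieldnames=fieldnames,
                                    extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                if row[column] in wanted:
                    writer.writerow(row)
                    kept += 1
    return kept
```

A call might look like `filter_by_column("audit.csv", "out.csv", "ReportName", ["Sales Report"])`, where `ReportName` stands in for whatever the real header is. Each output file keeps the header of its own input, so files with different column sets stay internally consistent.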