# Delete First Row - NameError

Hi, any help is appreciated. I am trying to delete the first row in CSV files before running the full script. I added the # Delete First Row line (before the column definitions), but I am receiving a NameError: name 'df' is not defined error. I can't manage to resolve it. Any ideas? Thanks.

# import libraries

import json
import pandas as pd
import os
import sys

# specify data directories

locfile = "c:/files/test/"
raw_files = os.listdir(locfile)

# Delete First Row
df.drop(index=df.index[0], axis=0, inplace=True)

# Define columns - these are from JSON attributes

cols = ["Id","Activity","CreationTime","OrganizationId","RecordType","WorkspaceId","WorkSpaceName","Workload","DataflowType","DatasetId","DatasetName","IsSuccess","ObjectId","ItemName","ReportId","ReportName","ReportType","UserKey","UserId","UserAgent","ClientIP","AcitveUser"]

# sub routine to accept json data and return table format array

def convDelim(j):
    try:
        js = json.loads(j, strict=False)
        df.columns = df.columns.str.strip()
        return [(js.get("Id"), js.get("Activity"), js.get("CreationTime"), js.get("OrganizationId"), str((js.get("RecordType") if js.get("RecordType") else "")),
                 (js.get("WorkspaceId") if js.get("WorkspaceId") else ""), (js.get("WorkSpaceName") if js.get("WorkSpaceName") else ""), (js.get("Workload") if js.get("Workload") else ""),
                 (js.get("DataflowType") if js.get("DataflowType") else ""), (js.get("DatasetId") if js.get("DatasetId") else ""),
                 (js.get("DatasetName") if js.get("DatasetName") else ""),
                 str(js.get("IsSuccess")),
                 (js.get("ObjectId") if js.get("ObjectId") else ""),
                 (js.get("ItemName") if js.get("ItemName") else ""),
                 (js.get("ReportId") if js.get("ReportId") else ""),
                 (js.get("ReportName") if js.get("ReportName") else ""),
                 (js.get("ReportType") if js.get("ReportType") else ""),
                 js.get("UserKey"),
                 js.get("UserId"),
                 js.get("UserAgent"),
                 js.get("ClientIP"),
                 js.get("AcitveUser"))]
    except:
        return [("" ,  "", "",  "", "",  "", "", "",  "", "", "" ,"","","","","","","","","","")]

final_df = pd.DataFrame(columns=cols)

for file in (raw_files):
    if file.endswith("csv"):
        print("Processing: " + file)
        df = pd.read_csv(locfile + file)
        #print(df.info())
        df["JsonData"] = df["AuditData"].map(lambda x: convDelim(x))
        js_arr = df["JsonData"]
        a = []
        for x in (js_arr):
            a.append(x[0])
        tmp_df = pd.DataFrame(data=a, columns=cols)
        final_df = pd.concat([final_df, tmp_df])

final_df = final_df.drop_duplicates()

final_df.to_csv(locfile + "data_2022.csv", index=False, header=True)

Please wrap code in triple backticks to preserve the formatting:

```python
if True:
    print('Hello world!')
```

As to your problem, you’re trying to delete the first row of df without first saying what df is.
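
To make that concrete, here is a minimal sketch of the order things need to happen in. It assumes pandas is installed, and the sample CSV text is made up so the snippet can run on its own; in your script the read would be the pd.read_csv(locfile + file) call inside your loop.

```python
from io import StringIO

import pandas as pd

# Stand-in for one CSV file on disk (hypothetical sample data).
csv_text = "AuditData\nfirst row\nsecond row\n"

# df has to be created before any method is called on it...
df = pd.read_csv(StringIO(csv_text))

# ...and only then can its first row be dropped.
df = df.drop(index=df.index[0])

print(df)
```

In other words, the drop belongs inside the for loop, right after the line that reads each file, not near the top of the script.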

I also notice another thing: your code has a bare except, i.e. except without specifying any error. That’s a bad idea because it’ll swallow all errors, even ones such as NameError caused by misspelling a name. You should catch only those errors that you’re going to handle.
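
As a rough illustration of catching only what you intend to handle, here is a sketch using just the json module. The function name parse_record is made up for the example, and the choice of exceptions is an assumption about what can go wrong in your data: json.JSONDecodeError for malformed text, and TypeError for a value that is not a string at all (which I believe is what you get when pandas reads an empty cell as NaN).

```python
import json


def parse_record(raw):
    """Return the parsed JSON object, or None if raw is not usable JSON text."""
    try:
        return json.loads(raw, strict=False)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: raw is a string but not valid JSON.
        # TypeError: raw is not a str/bytes value (for example a float NaN).
        return None


print(parse_record('{"Id": 1}'))
print(parse_record("not json"))
print(parse_record(float("nan")))
```

With this shape, a misspelled name inside the try block would still raise NameError and show up in the traceback instead of being silently turned into an empty row.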

Thanks for the reply. I am brand new to Python. What I would like to do is write an if statement:

If the first row contains the string "test", skip that row and use the second row as the column names. This is what I have, but I am stuck:

def file():
    df = pd.read_csv(locfile + file)
    for file in (raw_files):
        if file.endswith("csv"):
            print("Processing: " + file)
            df = pd.read_csv(locfile + file)
            #print(df.info())

def somenewfunction (row):
    if row['a'].contains('test')==True:
        return skipLine(f, 1)

    return

# Define columns - these are from JSON attributes

cols = ["ID","Activity","CreationTime","OrganizationId","RecordType"]

To preserve the formatting of the code you post, select it and then click on </>.

# Read the CSV file.
with open(path, encoding='utf-8') as file:
    lines = list(file)

# Delete the first row if it contains "test".
if 'test' in lines[0]:
    del lines[0]

# read_csv expects a path or a file-like object, so first write the lines into a StringIO buffer...
from io import StringIO
sio = StringIO()
sio.writelines(lines)

# ...then rewind to the start of the buffer...
sio.seek(0)

# ...and read it in.
df = pd.read_csv(sio)
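
If you want to apply those steps to every file in your loop, they could be wrapped in a small helper along these lines. This is only a sketch: the name buffer_without_marker_row and the marker parameter are made up for the example, and it assumes the files are UTF-8 text.

```python
from io import StringIO


def buffer_without_marker_row(path, marker="test"):
    """Return the file's text as a StringIO, minus the first line if it contains marker."""
    with open(path, encoding="utf-8") as file:
        lines = list(file)

    # Drop the first line only when it contains the marker text.
    if lines and marker in lines[0]:
        del lines[0]

    # A StringIO created from initial text starts at position 0,
    # so no explicit rewind is needed before reading from it.
    return StringIO("".join(lines))
```

Inside your loop, the read would then become something like df = pd.read_csv(buffer_without_marker_row(locfile + file)), so the second line of the file is used for the column names whenever the first line contains "test".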

Thank you for your help.