I’m trying to read an XLSX spreadsheet with pandas and am having a fit with empty cells that contain nan. Here’s my code which worked in Python 3.11.5. I’d like all nan converted to an empty string ‘’.
I’m trying to get 3 fields: Firstname, Lastname, and Degree then contcat the together. But this gets an error when one field is nan. So I’m assuming I have to change all nan to an empty string first.
try:
df = pandas.read_excel(options.inputfile, na_values='', usecols='A:W') # This no longer works.
df.fillna('', inplace=True) # Replace nan with empty string.
except PermissionError:
print(f"{procname} ERROR: File '{options.inputfile}' might be opened. Please close it then rerun program")
sys.exit()
except Exception as e:
print(f"{procname} ERROR-openss opening spreadsheet {options.inputfile}. {e}")
sys.exit()
# A bit later
df['Firstname'].str.strip()
df['Lastname'].str.strip()
df['Degree'].str.strip()
This line df.fillna('', inplace=True) is something new I tried but gives me the error: “FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value ‘’ has dtype incompatible with float64, please explicitly cast to a compatible dtype first.”
I’m not sure how to fix this. Would someone mind helping me out? Surely others have dealt with nan in spreadsheets.
That’s not an error, it’s a warning (FutureWarning). In a future version of pandas, it will result in an error, but not in the version you’re using. Did you get the result you were expecting?
The reason for the warning is that your data frame (or at least some of its columns) are being stored as numeric. Trying to replace nan (a numeric value), with an empty string (a string value) might require changing the type of the affected columns. This warning is telling you (a) that this auto conversion happened and (b) the pandas will stop auto-converting in the future.
There are a few ways to resolve the warning. In order to suggest the most helpful one, we need to know what are you trying to do any why do you want to replace nan with the empty string?
(More specifically: do you want everything to be stored as a string, or do you want a mix of numbers and strings in your data frame?)
Edit: re-reading your post makes clear you want everything as a string. I will make a suggestion shortly
I’m reading a spreadsheet row by row. For each row I get the data from these columns: Firstname, Lastname, Degree. Then I concat them into a key like this:
# The "row" variable has one row from the dataframe.
fname = row['Firstname']
lname = row['Lastname']
degree = row['Degree']
curkey = fname + " " + lname + " " + degree
But there are some blank rows in the spreadsheet. When I find a blank row, fname contains nan, which I think is a type float. When I concant that to make curkey I get an error.
Thanks. I’m pretty sure the blank rows must stay in there as I’m modifying the spreadsheet and returning it to the sender.
I have solved this I believe.
# First change column to string type which will convert nan to 'nan'.
# Change types to str.
df['Firstname'] = df['Firstname'].astype(str)
df['Lastname'] = df['Lastname'].astype(str)
df['Degree'] = df['Degree'].astype(str)
# Later check for 'nan' in 2 columns. If they are both 'nan' (string) then it's a blank row, skip the blank row.
if (row['Firstname'] == 'nan') and (row['Lastname'] == 'nan'):
outarr.append('') # Append blank row
continue # Skip blank row, go to top of loop
That should also work, so long as you never encounter someone named “nan nan” :-). If you run into issues like that, you could combine fillna with astype: df["Firstname"] = df["Firstname"].fillna("").astype(str).
An alternative approach would be to leave the nan values as is and use pd.isna() to replace your current check for "nan", but I find using astype to enforce your assumptions about the type of each column to more robust than needing to include a nan-check before every string operation.
For those who need to do fillna() right after the dataframe is read here is what did not work, and what works.
try:
df = pandas.read_excel(filename, usecols='A:H')
#df.fillna('', inplace=True) # Does not work
df = df.fillna('') # This works.
except Exception as e:
print(f"{procname} ERROR-read_excel: {e}")
sys.exit(1)