How to strip() all cells in a Pandas dataframe?

c-rob · July 29, 2024, 5:48pm

Python 3.11 on Windows 10.

I have a spreadsheet and an original Pandas data frame called df that I need to do filtering on. I grab some of the columns in a variable, strip() those variables, and use the variables to filter the data frame to produce variable df2. However what I’m finding is that the data in the data frame df still has an extra space at the end so my filtered data set df2 is always empty.

So when I tried this function on the dataframe:

def trim_whitespace_df_old(dataframe):
    '''Remove leading/trailing whitespace from whole dataframe.'''
    procname = libfilename + ":" + str(inspect.stack()[0][3]) + ":"
    # iterating over the columns
    for i in dataframe.columns:
        # checking datatype of each columns
        if dataframe[i].dtype == 'object':
        # if dataframe[i].dtype == 'object' and isinstance(dataframe[i],str):
            # applying strip function on column
            dataframe[i] = dataframe[i].map(str.strip)

I get errors because not every cell is a string. The error is reporting some cells are a float and you cannot strip() a float value.

Also this does not do anything at all: df['ColumnName'].str.strip() I still have problems with an extra space on the end of some cells so the df filters won’t work.

How do I strip()/trim all cells in a data frame without using the above method?

I could do it in excel before I run the Python program on the spreadsheet but the only method I have found yet is using a formula, which requires me to add a new column with the formula, next to every old column. We have about 25 columns, and up to 12,000 rows in one spreadsheet.

doomsdaychicken · July 29, 2024, 6:02pm

If you have to conditionally strip only str values, and your rows and columns have mixed types, then you can use a lambda that conditionally applies strip:

import pandas as pd
df = pd.DataFrame({"col1": [" adasd ", "babab ", "lalala"]})
df_stripped = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df_stripped)

But, you should also consider - why do you have mixed types in a single column?

c-rob · July 29, 2024, 6:16pm

This is normal for a spreadsheet from a customer. It can have all kinds of garbage and inconsistencies in it.

hansgeunsmeyer · July 29, 2024, 9:01pm

When you use df.ColumnName.str.strip() the column needs to be a column of strings (otherwise NaNs are generated for non-strings or you will get runtime errors) and you need to re-assign the returned value to df.ColumnName. When used on mixed-type columns, you first need to cast the column to type string:

df.ColumnName = df.ColumnName.astype(str).str.strip()

c-rob · July 30, 2024, 9:21am

doomsdaychicken:

import pandas as pd
df = pd.DataFrame({"col1": [" adasd ", "babab ", "lalala"]})
df_stripped = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df_stripped)

Thank you to all of you. I will try this. I have not used lambda functions much at this point. I did read some Pandas docs but must have missed some of these caveats.

It appears .applymap() is deprecated in Python 3.11 so we would use: df_stripped = df.map(lambda x: x.strip() if isinstance(x, str) else x)