Python 3.11 on Windows 10.
I have a spreadsheet and an original Pandas data frame called df that I need to do filtering on. I grab some of the columns in a variable, strip() those variables, and use the variables to filter the data frame to produce variable df2. However what I’m finding is that the data in the data frame df still has an extra space at the end so my filtered data set df2 is always empty.
So when I tried this function on the dataframe:
def trim_whitespace_df_old(dataframe):
'''Remove leading/trailing whitespace from whole dataframe.'''
procname = libfilename + ":" + str(inspect.stack()[0][3]) + ":"
# iterating over the columns
for i in dataframe.columns:
# checking datatype of each columns
if dataframe[i].dtype == 'object':
# if dataframe[i].dtype == 'object' and isinstance(dataframe[i],str):
# applying strip function on column
dataframe[i] = dataframe[i].map(str.strip)
I get errors because not every cell is a string. The error is reporting some cells are a float and you cannot strip() a float value.
Also this does not do anything at all: df['ColumnName'].str.strip() I still have problems with an extra space on the end of some cells so the df filters won’t work.
How do I strip()/trim all cells in a data frame without using the above method?
I could do it in excel before I run the Python program on the spreadsheet but the only method I have found yet is using a formula, which requires me to add a new column with the formula, next to every old column. We have about 25 columns, and up to 12,000 rows in one spreadsheet.