Clearing ValueError While Trying To Create A New Column

sapy · May 22, 2024, 2:20pm

Hi,

I have a function that I’m using to create a new column called ‘Email Type’. The new column is supposed to hold one of two values, “Triggered” or “BNB”, depending on whether a substring is found within the string in an existing column called ‘Content’. The function should identify the substring in the existing ‘Content’ column and return the appropriate label for the new ‘Email Type’ column.

This is my function:

def emailtype(type):
if df[‘content’].str.contains(‘Welcome’):
return ‘Triggered’
elif df[‘Content’].str.contains(‘Signup’):
return ‘Triggered’
elif df[‘Content’].str.contain(‘Batch’):
return ‘BNB’
else:
return ‘BNB’

and this is how I’m trying to create the column:

df[‘Email Type’] = df.apply(emailtype, axis=1)

But when I try to run this command, I get a ValueError:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I resolve this error and create the new column? Any help is appreciated.

Sajeel

kknechtel · May 22, 2024, 9:09pm

First, please check this advice:

And please also read the pinned thread in order to understand how to post code properly for this forum. You will need it for future questions. Even a simple code example like the one you posted here was hard to understand without proper formatting.

Now we can talk about what’s wrong in the actual code.

When you use df.apply, the function that you call will be passed either a row or a column (according to axis) each time it’s called. With axis=1, you get rows, which is what you want - good. In that function, you need to use that row or column to compute the result. If you write that function to use the global df again, then you missed the point completely.
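To make that concrete, here is a small sketch of what the function receives when axis=1. The DataFrame contents and the describe helper are made up for illustration; only the df.apply call mirrors the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"Content": ["Welcome aboard", "Batch 12"], "Opens": [3, 7]})

def describe(row):
    # With axis=1, `row` is a Series holding one row, labelled by column name.
    return f"{type(row).__name__} with labels {list(row.index)}"

result = df.apply(describe, axis=1)
print(result.tolist())
```

Each call sees one row only, so everything the function needs should come from that argument rather than from the global df.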

You get an error because df['content'] means an entire column of the original DataFrame, and so df['content'].str.contains('Welcome') means an entire column of boolean results. We can’t use that with if:
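A minimal sketch of the failure, using a small made-up Series in place of the real 'Content' column:

```python
import pandas as pd

# Hypothetical stand-in for df['Content'].
content = pd.Series(["Welcome aboard", "Batch 12"])

# .str.contains returns one boolean per element, not a single True/False.
mask = content.str.contains("Welcome")
print(mask.tolist())

try:
    if mask:  # pandas refuses to reduce several booleans to one
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```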

It’s the same as in NumPy:
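For comparison, a short sketch with an illustrative two-element boolean array:

```python
import numpy as np

arr = np.array([True, False])

try:
    if arr:  # an array with more than one element has no single truth value
        pass
except ValueError as exc:
    numpy_message = str(exc)
    print(numpy_message)

# The reductions named in the error message each give a single boolean.
print(arr.any(), arr.all())
```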

So instead, when we write the emailtype function, we must use the value that was passed in. That value is a row of the DataFrame, so let’s call it row instead of type.

When we index by column into that row, we get a single cell value - which is what we want. But this means we don’t use special tools like .str or .contains any more, because those are for Series. Instead, we directly convert the single cell to string, and use ordinary string methods on it.
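As a rough illustration, assuming a made-up two-row DataFrame with one missing value, indexing a row by column name gives back an ordinary Python object:

```python
import pandas as pd

# Hypothetical sample data; the second cell is deliberately missing.
df = pd.DataFrame({"Content": ["Welcome aboard", None]})

cell = df.iloc[0]["Content"]      # a single cell, not a Series
print(type(cell).__name__)
print("Welcome" in str(cell))     # ordinary substring test

missing = df.iloc[1]["Content"]
# str() turns a missing value into text, so `in` does not raise TypeError.
print("Welcome" in str(missing))
```

The str() call is what keeps the check safe when a cell is empty or holds a non-string value.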

And, of course, we must be careful about the capitalization for our column name.

So, now we can write something like:

def emailtype(row):
    if 'Welcome' in str(row['Content']):
        return 'Triggered'
    ... # etc.
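One possible way to fill in the rest, as a sketch rather than the only answer. The sample data is invented, and the branches are collapsed because the original function returns 'Triggered' for 'Welcome' or 'Signup' and 'BNB' in every other case:

```python
import pandas as pd

# Hypothetical sample data; the real 'Content' values are not shown in the thread.
df = pd.DataFrame(
    {"Content": ["Welcome to the club", "Signup confirmed", "Batch 7", "Other"]}
)

def emailtype(row):
    # Work on the single cell from this row, converted to a plain string.
    text = str(row["Content"])
    if "Welcome" in text or "Signup" in text:
        return "Triggered"
    # 'Batch' and everything else map to the same label in the original code.
    return "BNB"

df["Email Type"] = df.apply(emailtype, axis=1)
print(df)
```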