I’ve got an assignment to use a lambda function in agg to aggregate an imported DataFrame, returning the number of rows with a value greater than 1.5. So far I have the following, but I don’t think it’s exactly what I’m supposed to pull, and on top of that it doesn’t use the built-in functions.
print(df.iloc[:,:-1].agg([lambda x : [i for i in x if i> 1.5]]))
Throughout the DataFrame or per column?
Your attempt is pretty close if you want it per column. Your lambda should be:
lambda x: len([i for i in x if i > 1.5])
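A minimal runnable sketch of the corrected per-column version. The DataFrame here is a hypothetical stand-in for the imported one (column names `a`, `b`, `label` and all values are made up), with a non-numeric last column so that `iloc[:, :-1]` drops it as in the question. It also shows the same count written with the built-in `sum()` over a boolean mask, since the assignment mentions built-in functions.

```python
import pandas as pd

# Hypothetical stand-in for the imported DataFrame: two numeric columns
# plus a non-numeric last column, which iloc[:, :-1] drops as in the question.
df = pd.DataFrame({
    "a": [0.5, 1.7, 2.0, 3.1],
    "b": [1.6, 1.2, 0.9, 1.8],
    "label": ["w", "x", "y", "z"],
})

numeric = df.iloc[:, :-1]

# Corrected lambda: collect the values > 1.5, then count them with len().
counts = numeric.agg(lambda x: len([i for i in x if i > 1.5]))

# Same count using the built-in sum() over the boolean mask x > 1.5.
counts_sum = numeric.agg(lambda x: sum(x > 1.5))

print(counts)      # a: 3, b: 2
print(counts_sum)  # a: 3, b: 2
```

Passing the lambda directly returns a Series indexed by column name; wrapping it in a list, as in the original attempt, gives a one-row DataFrame instead.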
However, I don’t understand why the assignment says to use agg. It’s typically used for applying several independent operations to a DataFrame and combining the results. If all you want is a single result (“number of rows containing a value greater than 1.5”), apply is a better choice.
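To illustrate the difference, here is a small sketch on hypothetical data (column names and values are made up): `apply` gives the same per-column count as a single lambda in `agg`, while `agg` earns its keep when you pass it a list of several operations at once.

```python
import pandas as pd

# Hypothetical numeric data (column names are illustrative only).
df = pd.DataFrame({
    "a": [0.5, 1.7, 2.0, 3.1],
    "b": [1.6, 1.2, 0.9, 1.8],
})

# A single aggregation: apply() gives the same per-column counts as agg().
via_apply = df.apply(lambda x: sum(x > 1.5))

# Where agg() is useful: several independent operations at once.
# Each list entry becomes one row of the result, one column per input column.
summary = df.agg(["min", "max", lambda x: sum(x > 1.5)])

print(via_apply)
print(summary)
```

In `summary`, the last row holds the counts (3 for `a`, 2 for `b`), stored as floats because each column also contains the float min and max.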
Thank you very much. It’s a training exercise, mostly for learning to work with agg. That’s why it’s asking for it specifically.
The question is written as follows:
Aggregate the dataframe to show the number of rows greater than 1.5 in each column using a lambda expression in agg(), and built-in functions.
If you read it as “show, for each column, the number of rows where …”, then @abessman’s suggestion works very nicely.
If you instead want to count the rows in which all columns are > 1.5, it can be simplified. For example, with this DataFrame:
0 2 3
1 2 3
2 2 3
3 1 2
4 1 2
5 0 1
6 0 1
>>> sum(df.agg(lambda row: all(row > 1.5), axis=1))
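A runnable version of that session, as a sketch. The values are taken from the printout above, but the printout does not show the column labels, so `a` and `b` are hypothetical. A vectorized equivalent without a lambda is included for comparison.

```python
import pandas as pd

# Values from the printout; the column labels "a" and "b" are hypothetical
# because the original output does not show them.
df = pd.DataFrame({"a": [2, 2, 2, 1, 1, 0, 0], "b": [3, 3, 3, 2, 2, 1, 1]})

# Row-wise agg: for each row, is every value > 1.5? Then count the True rows.
all_count = sum(df.agg(lambda row: all(row > 1.5), axis=1))

# Vectorized equivalent without a lambda.
vec_count = (df > 1.5).all(axis=1).sum()

print(all_count, vec_count)  # 3 3
```

Only rows 0, 1 and 2 have both values above 1.5, so both approaches count 3.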