DataFrame select condition only selecting single column value which matches instead of entire row

patricio2626 · December 28, 2024, 10:08pm

Now, this is interesting. I’ve checked the type of df and df_scaled, and they’re both pandas.core.frame.DataFrame. So why does df_scaled behave differently, returning only the matching value from the 'first' column with every other cell NaN, instead of the expected behavior shown by df, where the entire matching rows are returned?

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("C:\\Users\\pb\\Downloads\\test.csv")
ss = StandardScaler()
df_scaled = pd.DataFrame(ss.fit_transform(df[['first', 'second', 'third']]), columns=[['first', 'second', 'third']])
print(df_scaled)
print(df_scaled[ df_scaled['first'] > 0])
print(df)
print(df[df['first'] > 0])

Output:

      first    second     third
0 -1.227998 -1.488388 -0.953198
1       NaN       NaN -0.822287
2 -0.955109 -1.151395 -0.789559
3 -0.818665       NaN -0.691376
4  0.818665  1.039064  1.403206
5  1.773775  1.123312  1.828668
6  0.136444  0.196580 -0.004091
7  0.272888  0.280828  0.028637
      first second third
0       NaN    NaN   NaN
1       NaN    NaN   NaN
2       NaN    NaN   NaN
3       NaN    NaN   NaN
4  0.818665    NaN   NaN
5  1.773775    NaN   NaN
6  0.136444    NaN   NaN
7  0.272888    NaN   NaN
   first  second  third  Gender   Color        Type
0    1.0     2.0      4    Male     Red         Meh
1    NaN     NaN      8    Male   Green  Super cool
2    3.0     6.0      9  Female  Purple        Cool
3    4.0     NaN     12    Male     Red         Meh
4   16.0    32.0     76  Female  Orange        Cool
5   23.0    33.0     89  Female     NaN  Super cool
6   11.0    22.0     33  Female     Tan        Cool
7   12.0    23.0     34  Female   Black  Super cool
   first  second  third  Gender   Color        Type
0    1.0     2.0      4    Male     Red         Meh
2    3.0     6.0      9  Female  Purple        Cool
3    4.0     NaN     12    Male     Red         Meh
4   16.0    32.0     76  Female  Orange        Cool
5   23.0    33.0     89  Female     NaN  Super cool
6   11.0    22.0     33  Female     Tan        Cool
7   12.0    23.0     34  Female   Black  Super cool

TIGirardi · December 30, 2024, 4:26pm

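You are using columns=[['first', 'second', 'third']]. A nested list is treated as a list of arrays, so pandas builds a (one-level) MultiIndex for the columns. With MultiIndex columns, df_scaled['first'] returns a one-column DataFrame rather than a Series, and indexing a DataFrame with a boolean DataFrame masks individual cells (the same as DataFrame.where) instead of filtering rows. That is why only the matching 'first' values survive and everything else becomes NaN. Pass a flat list instead: columns=['first', 'second', 'third']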
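A minimal sketch of the difference, assuming pandas and NumPy are installed. To avoid depending on scikit-learn and the CSV file, a small hand-written array (hypothetical values, not the scaled data from the post) stands in for the output of ss.fit_transform(...). The names arr, nested, flat, rows and cells are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ss.fit_transform(...): any 2-D array shows the effect.
arr = np.array([
    [-1.0, -1.5, -0.9],
    [np.nan, np.nan, -0.8],
    [0.8, 1.0, 1.4],
    [1.7, 1.1, 1.8],
])
cols = ['first', 'second', 'third']

# Nested list: pandas builds a one-level MultiIndex for the columns.
nested = pd.DataFrame(arr, columns=[cols])
# Flat list: an ordinary Index, as in the original df.
flat = pd.DataFrame(arr, columns=cols)

print(type(nested.columns).__name__)   # MultiIndex
print(type(flat.columns).__name__)     # Index

# With MultiIndex columns, selecting 'first' gives a one-column DataFrame...
print(type(nested['first']).__name__)  # DataFrame
# ...whereas the flat frame gives a Series.
print(type(flat['first']).__name__)    # Series

# A boolean Series filters whole rows.
rows = flat[flat['first'] > 0]
# A boolean DataFrame masks cell by cell; columns missing from the mask
# are treated as False, so they come back entirely NaN.
cells = nested[nested['first'] > 0]
print(rows)
print(cells)

# The cell-wise masking is the same as an explicit where() call.
assert cells.equals(nested.where(nested['first'] > 0))
```

Checking df_scaled.columns (or type(df_scaled['first'])) rather than type(df_scaled) is what reveals the problem, since both frames are plain DataFrames at the top level.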