Now, this is interesting. I've checked the types of df and df_scaled, and both are pandas.core.frame.DataFrame. So why does filtering df_scaled behave differently from filtering df? With df I get the expected result (every matching row is returned in full). With df_scaled every row comes back, and only the 'first' column of the matching rows keeps its value; everything else is NaN.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
df = pd.read_csv("C:\\Users\\pb\\Downloads\\test.csv")
ss = StandardScaler()
df_scaled = pd.DataFrame(ss.fit_transform(df[['first', 'second', 'third']]), columns=[['first', 'second', 'third']])
print(df_scaled)
print(df_scaled[ df_scaled['first'] > 0])
print(df)
print(df[df['first'] > 0])
Output:
      first    second     third
0 -1.227998 -1.488388 -0.953198
1       NaN       NaN -0.822287
2 -0.955109 -1.151395 -0.789559
3 -0.818665       NaN -0.691376
4  0.818665  1.039064  1.403206
5  1.773775  1.123312  1.828668
6  0.136444  0.196580 -0.004091
7  0.272888  0.280828  0.028637
      first  second  third
0       NaN     NaN    NaN
1       NaN     NaN    NaN
2       NaN     NaN    NaN
3       NaN     NaN    NaN
4  0.818665     NaN    NaN
5  1.773775     NaN    NaN
6  0.136444     NaN    NaN
7  0.272888     NaN    NaN
   first  second  third  Gender   Color        Type
0    1.0     2.0      4    Male     Red         Meh
1    NaN     NaN      8    Male   Green  Super cool
2    3.0     6.0      9  Female  Purple        Cool
3    4.0     NaN     12    Male     Red         Meh
4   16.0    32.0     76  Female  Orange        Cool
5   23.0    33.0     89  Female     NaN  Super cool
6   11.0    22.0     33  Female     Tan        Cool
7   12.0    23.0     34  Female   Black  Super cool
   first  second  third  Gender   Color        Type
0    1.0     2.0      4    Male     Red         Meh
2    3.0     6.0      9  Female  Purple        Cool
3    4.0     NaN     12    Male     Red         Meh
4   16.0    32.0     76  Female  Orange        Cool
5   23.0    33.0     89  Female     NaN  Super cool
6   11.0    22.0     33  Female     Tan        Cool
7   12.0    23.0     34  Female   Black  Super cool
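Since both objects report the same DataFrame type, one thing that may be worth comparing is the column index itself. The only structural difference in the code above is that df_scaled is built with a nested list, columns=[['first', 'second', 'third']], and pandas turns a list of lists into a MultiIndex. Below is a minimal sketch for checking this without the CSV file or scikit-learn. The `values` array is a hypothetical stand-in for the scaler output, and `nested` and `flat` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scaled values; any 2-D float array works here.
values = np.array([[-1.2, -1.5, -0.9],
                   [ 0.8,  1.0,  1.4],
                   [ 1.7,  1.1,  1.8]])

names = ['first', 'second', 'third']

# Nested list, as in the df_scaled line above: pandas builds a MultiIndex.
nested = pd.DataFrame(values, columns=[names])

# Flat list: pandas builds an ordinary single-level Index.
flat = pd.DataFrame(values, columns=names)

# Compare the column index types of the two frames.
print(type(nested.columns))
print(type(flat.columns))

# Compare what a single-label selection returns in each case.
print(type(nested['first']))
print(type(flat['first']))

# With flat columns the comparison yields a boolean Series, so this filters rows.
print(flat[flat['first'] > 0])
```

If the selection on the nested version comes back as a DataFrame rather than a Series, then the comparison is a boolean DataFrame, and indexing with a boolean DataFrame masks individual cells instead of selecting rows. That would be consistent with the NaN-filled output above. Passing the flat list to columns= is the variant that gives row filtering.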