Inspecting in the REPL gives us a hint of what’s happening:
>>> df = pd.DataFrame({'a': [7,8,9], 'b':[4,5,6]})
>>> df['b']
0    4
1    5
2    6
Name: b, dtype: int64
>>> 0 in df['b']
True
>>> 1 in df['b']
True
>>> 2 in df['b']
True
>>> 3 in df['b']
False
>>> 4 in df['b']
False
>>> 5 in df['b']
False
The condition x in df['b'] tests whether x is one of the elements of the index of df['b'] (which you can access directly via df['b'].index). When you write list(df['b']), you are transforming the pandas series df['b'] into a Python list, which has no concept of “index”, so x in some_list really tests whether x is one of the elements of some_list.
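To make the difference concrete, here is a minimal sketch that runs the same membership test against the index, the underlying values, and a plain list. The Series `s` is a stand-in for df['b'], and the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([4, 5, 6], name="b")  # default index labels are 0, 1, 2

in_index = 4 in s                   # tests the index labels
in_values = 4 in s.values           # tests the underlying NumPy array
in_list = 4 in s.tolist()           # tests a plain Python list
via_isin = bool(s.isin([4]).any())  # element-wise test on the values

print(in_index, in_values, in_list, via_isin)
```

Only the first test looks at the index; the other three look at the stored values.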
In order to get intersections in a straightforward way, you can use sets:
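A minimal sketch of the set approach, assuming the same df as in the REPL session above; the `candidates` list is a hypothetical set of values to look for.

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
candidates = [11, 1, 6]  # hypothetical values to look up in column 'b'

# Build a set from the column's values (not its index) and intersect it
common = set(df['b']).intersection(candidates)
print(common)
```

Because the result is a set, duplicates are dropped and the order is not guaranteed.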
Hey John, @fonini has put it clearly: it's all about data types. A DataFrame column is stored as a pandas Series, while a plain list can hold elements of different data types. I was playing around with this and found that a string "6" and an int 6 are handled differently depending on the method: the list approach compares them as plain Python objects, so "6" does not equal 6, whereas NumPy first converts a mixed list to a single dtype.
The set method limits the result to the unique common elements.
Anyway, I also found that NumPy already has a function for this, which would be nicer for large datasets:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [7, 8, 9, 8], 'b': [4, 5, 6, 9]})
y = [11, 9, '6', 6]
# Convert the 'b' column to a numpy array
b_values = df['b'].values
intersection = np.intersect1d(b_values, y) # y is converted to a single dtype first (strings here, because of '6')
print(intersection)
print([x for x in y if x in df['b'].to_list()]) # plain Python comparison: the string '6' does not match the int 6
print(set(df['b']).intersection([11, 1, 6]))
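If the mixed types in y are a concern, it may help to check what NumPy does with the list before intersecting. The sketch below is one way to do that, assuming the same values as above; `b_list` and `ints_only` are names introduced here for illustration, and filtering to ints is just one possible cleanup.

```python
import numpy as np

y = [11, 9, '6', 6]
b_list = [4, 5, 6, 9]  # same values as column 'b' above

# A list mixing ints and strings becomes a Unicode string array
y_array = np.asarray(y)
print(y_array.dtype)

# Plain Python membership keeps the original types: '6' != 6
matches = [x for x in y if x in b_list]
print(matches)

# Keep only the ints before handing the list to NumPy
ints_only = [x for x in y if isinstance(x, int)]
common_ints = np.intersect1d(b_list, ints_only)
print(common_ints)
```

Filtering first keeps the comparison between integers only, so the result stays an integer array.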