# Find the intersection of a list and a DataFrame column

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
y = [11, 1, 6]
```

I want to find the intersection of `df['b']` and `y`, but this

```python
[x for x in y if x in df['b']]
```

gives me

```
[1]
```

but I expect

```
[6]
```

To get `[6]`, I have to convert the column to a list first:

```python
[x for x in y if x in list(df['b'])]
```

Why doesn’t my first approach give me `[6]`?

Inspecting in the REPL gives us a hint of what’s happening:

```python
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
>>> df['b']
0    4
1    5
2    6
Name: b, dtype: int64
>>> 0 in df['b']
True
>>> 1 in df['b']
True
>>> 2 in df['b']
True
>>> 3 in df['b']
False
>>> 4 in df['b']
False
>>> 5 in df['b']
False
```

The condition `x in df['b']` tests whether `x` is one of the elements of the index of `df['b']` (which you can access directly via `df['b'].index`). When you write `list(df['b'])`, you are transforming the pandas series `df['b']` into a Python `list`, which has no concept of “index”, so `x in some_list` really tests whether `x` is one of the elements of `some_list`.
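A minimal sketch contrasting the membership tests, using the `df` and `y` from the question (`s` is just a local alias introduced here for `df['b']`). It shows that `x in s` behaves like `x in s.index`, while testing against the underlying NumPy array looks at the values:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
y = [11, 1, 6]
s = df['b']  # alias for the column under test

# `in` on a Series checks the index labels (0, 1, 2), not the values
by_index = [x for x in y if x in s]
same_as_index = [x for x in y if x in s.index]

# To check the values instead, test against the underlying array
by_values = [x for x in y if x in s.to_numpy()]

print(by_index)       # [1]
print(same_as_index)  # [1]
print(by_values)      # [6]
```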

In order to get intersections in a straightforward way, you can use sets:

```python
>>> set(df['b']).intersection([11, 1, 6])
{6}
```
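Since a set is unordered, here is a sketch for when you also want to keep the order of `y`: build the set of column values once (so each lookup is fast) and filter `y` with it. From the DataFrame side, `Series.isin` gives a boolean mask over the rows instead. Both use the question's `df` and `y`; `b_values` and the result names are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
y = [11, 1, 6]

# Build the set of column values once; each membership test is then O(1) on average
b_values = set(df['b'])

# Keep the elements of y, in y's order, that appear among the column's values
common_in_y_order = [x for x in y if x in b_values]

# Or filter the rows whose 'b' value appears in y
common_rows = df.loc[df['b'].isin(y), 'b'].tolist()

print(common_in_y_order)  # [6]
print(common_rows)        # [6]
```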

Hey John, @fonini has put it very clearly: it all comes down to types. `df['b']` is a pandas Series, and `in` on a Series checks its index, while `in` on a Python list checks its elements. While playing around I also noticed that mixing types matters: the string `"6"` is not the same as the integer `6` (`"6" == 6` is `False` in Python), so the list method will not match them.
One limitation of the `set` method is that it only returns unique values, so duplicates (and the order of `y`) are lost.
Anyway, I also found that NumPy already has a function for this, `np.intersect1d`, which may be nicer for large datasets:

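```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [7, 8, 9, 8], 'b': [4, 5, 6, 9]})
y = [11, 9, '6', 6]  # note: mixes ints with the string '6'

# Convert the 'b' column to a NumPy array
b_values = df['b'].to_numpy()

# np.intersect1d returns the sorted unique values present in both inputs.
# Because y mixes ints and a str, NumPy turns y into a string array first,
# so check the dtype of the result when the inputs have mixed types.
intersection = np.intersect1d(b_values, y)
print(intersection)

# List method: compares with ==, and '6' == 6 is False, so this prints [9, 6]
print([x for x in y if x in df['b'].to_list()])

# Set method: unique values only, order not preserved; prints {6}
print(set(df['b']).intersection([11, 1, 6]))
```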
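If the string in `y` is accidental, one option is to normalize `y` to integers before intersecting. This assumes every entry is an int or an int-like string such as `'6'`. The sketch also shows the unique-values point: `np.intersect1d` returns sorted unique values, while a comprehension keeps the order and repeats of `y`. The names `y_ints`, `b_set`, `kept` and `common` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9, 8], 'b': [4, 5, 6, 9]})
y = [11, 9, '6', 6]

# Assumption: every entry of y is an int or an int-like string
y_ints = [int(v) for v in y]  # [11, 9, 6, 6]

# Comprehension: keeps y's order and its repeats
b_set = set(df['b'].tolist())
kept = [v for v in y_ints if v in b_set]

# np.intersect1d: both inputs are int64 now; the result is sorted and unique
common = np.intersect1d(df['b'].to_numpy(), np.array(y_ints, dtype=np.int64))

print(kept)             # [9, 6, 6]
print(common.tolist())  # [6, 9]
```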