Find the intersection of one list and one dataframe

john316 · October 19, 2023, 1:38pm

df = pd.DataFrame({'a': [7,8,9], 'b':[4,5,6]})
y = [11,1,6]

I want to find the intersection of df['b'] and y, but this

[x for x in y if x in df['b']]

gives me

[1]

but I expect

[6]

In order to get [6], I need

[x for x in y if x in list(df['b'])]

Why doesn’t my first way give me a [6]?

fonini · October 19, 2023, 2:35pm

Inspecting in the REPL gives us a hint of what’s happening:

>>> df = pd.DataFrame({'a': [7,8,9], 'b':[4,5,6]})
>>> df['b']
0    4
1    5
2    6
Name: b, dtype: int64
>>> 0 in df['b']
True
>>> 1 in df['b']
True
>>> 2 in df['b']
True
>>> 3 in df['b']
False
>>> 4 in df['b']
False
>>> 5 in df['b']
False

The condition x in df['b'] tests whether x is one of the elements of the index of df['b'] (which you can access directly via df['b'].index). When you write list(df['b']), you are transforming the pandas series df['b'] into a Python list, which has no concept of “index”, so x in some_list really tests whether x is one of the elements of some_list.
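To make the difference concrete, here is a small sketch using the same df and y as in the question. The names col, by_index and by_values are only illustrative; the point is that testing against col.values (the underlying NumPy array) checks the data rather than the index labels.

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
y = [11, 1, 6]
col = df['b']

# `in` on a Series looks at the index labels (0, 1, 2), not the data.
by_index = [x for x in y if x in col]
# Testing against the underlying values gives the intended result.
by_values = [x for x in y if x in col.values]

print(by_index)   # [1]
print(by_values)  # [6]
```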

In order to get intersections in a straightforward way, you can use sets:

>>> set(df['b']).intersection([11, 1, 6])
{6}
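If you would rather stay inside pandas (for example to keep the original order and any duplicates, which a set discards), Series.isin is another option. A minimal sketch, assuming the same df and y as in the question; mask and matches are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9], 'b': [4, 5, 6]})
y = [11, 1, 6]

# isin compares the values of the column (not its index) against y
mask = df['b'].isin(y)

# Select the matching values and convert them to a plain list
matches = df.loc[mask, 'b'].tolist()
print(matches)  # [6]
```

The same mask can also be used as df[mask] to pull out the full matching rows.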

kyle · October 20, 2023, 3:10am

Hey John, @fonini has explained it well: it's all about data types. The column df['b'] is a pandas Series, whereas a plain list can hold mixed data types. I was playing around with this and found that the list method compares a string "6" against the int 6 with plain Python equality, so the two do not match.
The set method limits the result to unique values.
Anyway, I also found that NumPy already has a function for this, np.intersect1d, which would be nicer for large datasets:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [7, 8, 9, 8], 'b': [4, 5, 6, 9]})
y = [11, 9, '6', 6]

# Convert the 'b' column to a numpy array
b_values = df['b'].values

intersection = np.intersect1d(b_values, y)   # be explicit about data types: y mixes str and int

print(intersection)

print([x for x in y if x in df['b'].to_list()])  # compares with ==, so '6' (str) does not match 6 (int)

print(set(df['b']).intersection([11, 1, 6]))
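Since y mixes strings and integers, it may help to see what each side does with the types. The sketch below is only one way to handle it: the y_ints cleanup step is a hypothetical choice (it simply drops the non-integer entries), not something the posts above prescribe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 9, 8], 'b': [4, 5, 6, 9]})
y = [11, 9, '6', 6]

# In plain Python a string never equals an int, so '6' is not in a list of ints.
print('6' == 6)                    # False
print('6' in df['b'].to_list())    # False

# NumPy converts a mixed str/int list into an array of strings.
print(np.array(y).dtype.kind)      # U

# Hypothetical cleanup: keep only the real integers before intersecting.
y_ints = [x for x in y if isinstance(x, int)]
intersection = np.intersect1d(df['b'].to_numpy(), y_ints)
print(intersection.tolist())       # [6, 9]
```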
