Hope someone will be able to help me here, it’s been days Im trying to merge two dataframe with value that are not exactly the same.
To be able to do that Im using recordlinkage:
df1 : 8411 rows, 15 columns
df2 : 79 rows, 8columns
I want to add information from df1 to df2 using the column INV_NO but the information is not written properly in both columns :
for example in df2 you have T49528 and in df1 you have ST49528
an other example df2 you have 2587 and df1 you have C5635-002587.
my code so far :
import pandas as pd
df1 = pd.read_excel(r’C:…\File.xlsx’)
df2 = pd.read_excel(r’C:…\File2.xlsx’)
#to make sure all lines are at string format
df2[‘INV_NO’] = df2[‘INV_NO’].astype(str)
df1[‘INV_NO’] = df1[‘INV_NO’].astype(str)
indexer = recordlinkage.Index()
candidates = indexer.index(df2,df1)
compare = recordlinkage.Compare()
features = compare.compute(candidates,df2,df1)
When I look at the result (features df) I can see the line 2587 is not there that means it did not find a match on df1 but I know there is one.
I really don’t understand why. If someone can help me please.
Thanks in advance
(Is there a way to add files to the post ?)