Check for values in a table from a different table & keep only the resulting values

Hi All,

I have 2 tables

Table 1:

BUSINESS PRODUCT_1 BUSINESS PRODUCT_1_PK BUSINESS PRODUCT_2 BUSINESS PRODUCT_2_PK BUSINESS PRODUCT_3 BUSINESS PRODUCT_3_PK BUSINESS PRODUCT_4 BUSINESS PRODUCT_4_PK BUSINESS PRODUCT_5 BUSINESS PRODUCT_5_PK BUSINESS PRODUCT_6 BUSINESS PRODUCT_6_PK BUSINESS PRODUCT_7 BUSINESS PRODUCT_7_PK BUSINESS PRODUCT_8 BUSINESS PRODUCT_8_PK
1 PK 1 PK 1 PK 1 PK 1 PK 1 PK 1 PK 1 PK
2 PK 3 PK PK 3 PK 4 PK CRDT_TRD PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
2 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
TOTAL_PRODUCT PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK MACRO_TRD PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK
3 PK 3 PK PK 3 PK 4 PK 4 PK 5 PK 6 PK

table 2 :

RULE Business Product
R5D
R11
R8
R4
R14A
R2 TOTAL_PRODUCT
R7
R14B
R2 MACRO_TRD
R5B
R5A
R6
R1
R3
R12
R10
R9
R5C
R2 CRDT_TRD
R13
R5E

the code has to check for the values from table 2 in table, ie, business product in all the columns of table 1 & return the output as follows:

RULE BUSINESS PRODUCT_1 BUSINESS PRODUCT_1_PK BUSINESS PRODUCT_2 BUSINESS PRODUCT_2_PK BUSINESS PRODUCT_3 BUSINESS PRODUCT_3_PK BUSINESS PRODUCT_4 BUSINESS PRODUCT_4_PK BUSINESS PRODUCT_5 BUSINESS PRODUCT_5_PK BUSINESS PRODUCT_6 BUSINESS PRODUCT_6_PK BUSINESS PRODUCT_7 BUSINESS PRODUCT_7_PK BUSINESS PRODUCT_8 BUSINESS PRODUCT_8_PK
R1
R2 CRDT_TRD PK
R3
R5B
R5C
R6
R8
R9
R11
R14A
R14B
R2 TOTAL_PRODUCT PK
R2 MACRO_TRD PK
R4
R5A
R5D
R5E
R7
R10
R12
R13

Just an FYI : the values u see after a cell is matched, say for eg: TOTAL_PRODUCT was found in BUSINESS_PRODUCT_1, then the next column value, ie , BUSINESS_PRODUCT_1_PK has to be retained.

Hope its clear

thanks in advance

Post your code and let us know what you are need help with.

1 Like

thanks for the response.

i’m not good with Python & have been using ChatGpt to get the code.

This section i’m working on is a part Dataiku (ETL tool) workflow where i have to use a python script to use this 2 tables & get the desired output table as shown in the question i have posted.
<
import dataiku
import pandas as pd

table1 = dataiku.Dataset(“Table1”).get_dataframe()
table2 = dataiku.Dataset(“Table2”).get_dataframe()

rule_to_product = dict(zip(table2[‘Rule’], table2[‘Product’]))

product_columns = [col for col in table1.columns if col.startswith(“Product”)]

updated_table = table2[[‘Rule’]].copy()

for product_column in product_columns:
updated_table[product_column] = table1[product_column].apply(lambda x: rule_to_product.get(x, ‘’))

output_dataset = dataiku.Dataset(“Output_Dataset”) # Replace with the name of your output dataset
output_dataset.write_with_schema(updated_table)
/>
this is the code, but i’m only getting Rule column as output.

Hope its clear

Please format you code using the </> button.
You can edit your post to fix this so that it is possible to read your code.

Beware that chatgpt does not “know” how to code. It can suggest possible code, but it is only a suggestion and often will not work.

i have made a small change, not sure if its readable now

Sorry I was not clear. You do not insert the </> as text, its a button you can click on.
What you need is the triple back quotes around your code like this:

```
type or paste code here
```
import dataiku
import pandas as pd

table1 = dataiku.Dataset(“Table1”).get_dataframe()
table2 = dataiku.Dataset(“Table2”).get_dataframe()

rule_to_product = dict(zip(table2[‘Rule’], table2[‘Product’]))

product_columns = [col for col in table1.columns if col.startswith(“Product”)]

updated_table = table2[[‘Rule’]].copy()

for product_column in product_columns:
updated_table[product_column] = table1[product_column].apply(lambda x: rule_to_product.get(x, ‘’))

output_dataset = dataiku.Dataset(“Output_Dataset”) # Replace with the name of your output dataset
output_dataset.write_with_schema(updated_table)

any update on this ? @barry-scott