Rose
(Rose)
October 19, 2022, 9:58am
1
Hello , I am trying to convert a household level dataset to an item level dataset and struggling to know where to start. Any recommendations on what functions I could use would be greatly appreciated. The iteration of the person column and the item column matches
Here is an example of the data:
household_id
person_1
person_2
person_3
item_1
item_2
item_3
123
1
chair
134
2
car
156
3
1
2
toy
vase
bowl
144
1
1
1
tv
radio
fabric
and here is what I am trying to create…
Household
Person
Item
123
1
Chair
134
2
car
156
1
vase
156
2
bowl
156
3
toy
144
1
tv
144
1
radio
144
1
fabric
there might be an easier way to do this, but maybe,
for i in [1, 2, 3]:
household_df[f'person_item_{i}'] = household_df[f'person_{i}'].astype(str) + '_' + household_df[f'item_{i}']
pd.melt(household_df, id_vars=['household_id']).drop(columns='variable').dropna(subset='value')
household_df[['person', 'item']] = household_df.value.str.split('_', expand=True)
Rose
(Rose)
October 24, 2022, 9:59am
3
Thank you! I have been trying to get this to work but having some issues with the last line. I get the error ‘DataFrame’ object has no attribute ‘value’.
we will have to keep dropping columns,
import pandas as pd
household_df = pd.read_clipboard()
for i in [1, 2, 3]:
household_df[f'person_item_{i}'] = household_df[f'person_{i}'].astype(str) + '_' + household_df[f'item_{i}']
household_df = household_df.drop(columns=['person_1', 'person_2', 'person_3', 'item_1', 'item_2', 'item_3'])
result_df = pd.melt(household_df, id_vars=['household_id']).drop(columns='variable').dropna(subset='value')
result_df[['person', 'item']] = result_df.value.str.split('_', expand=True)
result_df = result_df.drop(columns='value')