How do you create household level flags based off person level data?

I am trying to create a function which creates a new column that identifies the oldest member of a household. The data looks something like this:

Household_id Person Date_of_birth
1234 1 12/09/1994
1234 2 04/01/1967
1234 3 21/10/1953
4321 1 11/04/1981
4321 2 18/06/1988

and I want to create this:

Household_id Person Date_of_birth Oldest_member
1234 1 12/09/1994 0
1234 2 04/01/1967 0
1234 3 21/10/1953 1
4321 1 11/04/1981 1
4321 2 18/06/1988 0

Where the oldest member is assigned 1 and everyone else in the household is assigned 0.
Any suggestions on how to tackle this, or functions I could look up, would be greatly appreciated. I’m still new to Python and I’m quite stuck on this.

I’d scan the rows and make a dict where the key is the household id and the value is a list of dates of birth for that household. From that, make a second dict where the key is the household id and the value is the minimum (earliest) date of birth. Then scan the rows again to build the new table, using the second dict to check whether each person has the earliest date of birth in their household.
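
The two-pass idea above can be sketched in plain Python as follows. This is only a minimal sketch: the names `rows`, `parse_dob` and `flag_oldest` are made up for illustration, and it assumes the dates are day/month/year strings as in the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (Household_id, Person, Date_of_birth)
rows = [
    (1234, 1, "12/09/1994"),
    (1234, 2, "04/01/1967"),
    (1234, 3, "21/10/1953"),
    (4321, 1, "11/04/1981"),
    (4321, 2, "18/06/1988"),
]


def parse_dob(text):
    # Assumption: dates are written day/month/year
    return datetime.strptime(text, "%d/%m/%Y").date()


def flag_oldest(rows):
    # First pass: collect every date of birth per household
    births = defaultdict(list)
    for household_id, _person, dob in rows:
        births[household_id].append(parse_dob(dob))

    # Second dict: earliest date of birth per household
    earliest = {hid: min(dobs) for hid, dobs in births.items()}

    # Second pass: append 1 if this person has the earliest date, else 0
    return [
        (hid, person, dob, int(parse_dob(dob) == earliest[hid]))
        for hid, person, dob in rows
    ]


flagged = flag_oldest(rows)
for row in flagged:
    print(row)
```

Note that if two people in a household share the earliest date of birth, both get flagged with 1.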

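Should be easy with pandas, using group-by and numpy.where. Group by household and take the index of the earliest date of birth (this assumes Date_of_birth has already been converted to datetimes, e.g. with pd.to_datetime and dayfirst=True):

mask = df.groupby("Household_id")["Date_of_birth"].idxmin().values

df["Oldest_member"] = np.where(df.index.isin(mask), 1, 0)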
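
For anyone who wants to run this end to end, here is a self-contained sketch built on the sample data from the question. It uses a slightly different technique, groupby with transform("min") and a comparison, rather than idxmin and numpy.where, and it assumes the dates are day/month/year strings that need parsing first.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Household_id": [1234, 1234, 1234, 4321, 4321],
    "Person": [1, 2, 3, 1, 2],
    "Date_of_birth": [
        "12/09/1994", "04/01/1967", "21/10/1953",
        "11/04/1981", "18/06/1988",
    ],
})

# Parse the strings as day/month/year so comparisons are chronological
dob = pd.to_datetime(df["Date_of_birth"], format="%d/%m/%Y")

# Earliest date of birth in each household, aligned to every row
earliest = dob.groupby(df["Household_id"]).transform("min")

# 1 where the person's date of birth equals the household minimum, else 0
df["Oldest_member"] = (dob == earliest).astype(int)

print(df)
```

One difference to be aware of: with this comparison, people who tie for the earliest date of birth in a household are all flagged, whereas idxmin picks a single row per household.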

Thank you!
