Hi,
This is an easy math question, but I want to know how to write the code clearly and efficiently (for when the data size is large). We have passengers waiting for boats (in this example, on dates 1, 2, 3, 4, 5), but boats run only on some of these days (in this example, dates 1, 3, 5). Passengers who arrive at the dock on a day with a boat depart that day; passengers who arrive on any other day wait until the next day with a boat. Given the passenger arrival dates, we want to find the passenger departure dates. Assume that there is a boat on the last day. The inputs and outputs are pandas DataFrames, and I wonder if there is a good way to improve my code. In particular, is there a way to do this without a for loop (which I think may be more efficient)?
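To make the rule concrete: each arrival date should map to the first boat date on or after it. Below is a minimal sketch of just that mapping, assuming the boat dates are sorted and there is a boat on or after every arrival date (the array names `arrival_dates` and `boat_dates` are made up for illustration). It uses `numpy.searchsorted` rather than pandas.

```python
import numpy as np

arrival_dates = np.array([1, 2, 3, 4, 5])  # days on which passengers arrive
boat_dates = np.array([1, 3, 5])           # sorted days on which a boat leaves

# For each arrival date, find the position of the first boat date >= it.
# side='left' makes an arrival on a boat day map to that same day.
positions = np.searchsorted(boat_dates, arrival_dates, side='left')
departure_dates = boat_dates[positions]

print(departure_dates.tolist())  # [1, 3, 3, 5, 5]
```

If there were no boat on or after the last arrival date, `positions` would contain an out-of-range index, so the "boat on the last day" assumption matters here.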
import pandas as pd

# Inputs: passenger waiting data and boat schedule
df_passenger_waiting = pd.DataFrame({'date': [5, 2, 3, 4, 1], 'name': [['Bill', 'John'], ['Chris', 'Bob'], ['Alice', 'Rob', 'Ed'], ['Albert'], ['Joe']]})
df_passenger_waiting = df_passenger_waiting.sort_values('date').reset_index(drop=True)
boat_date = [1, 5, 3]
boat_date.sort()

# Let the passengers who arrive on no-boat days wait until there is a boat
departure_dates = []
departure_names = []
passenger_waiting_overnight = []
for date, names in zip(df_passenger_waiting['date'], df_passenger_waiting['name']):
    # Compare the date itself (not the row position) with the boat schedule
    if date in boat_date:
        departure_dates.append(date)
        departure_names.append(names + passenger_waiting_overnight)
        passenger_waiting_overnight = []
    else:
        passenger_waiting_overnight += names

# Only boat days are stored, so there is nothing to drop afterwards.
# (dropna() would not remove rows holding empty lists, because [] is not NaN.)
df_passenger_departing = pd.DataFrame({'date': departure_dates, 'name': departure_names})
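One possible way to avoid the explicit for loop, as a sketch rather than a definitive answer: `pd.merge_asof` with `direction='forward'` can attach the next boat date to each arrival date, and `explode` plus `groupby(...).agg(list)` can then collect the passengers per boat. The names `df_waiting`, `df_boats`, `matched`, `df_departing` and the column `departure` are made up for this example; it assumes integer dates and a boat on or after every arrival date.

```python
import pandas as pd

df_waiting = pd.DataFrame({
    'date': [5, 2, 3, 4, 1],
    'name': [['Bill', 'John'], ['Chris', 'Bob'], ['Alice', 'Rob', 'Ed'], ['Albert'], ['Joe']],
}).sort_values('date')
df_boats = pd.DataFrame({'departure': sorted([1, 5, 3])})

# merge_asof needs both key columns sorted; direction='forward' picks,
# for each arrival date, the first boat date on or after it.
matched = pd.merge_asof(
    df_waiting, df_boats,
    left_on='date', right_on='departure',
    direction='forward',
)

# One row per passenger, then gather everyone leaving on the same boat.
df_departing = (
    matched.explode('name')
    .groupby('departure')['name']
    .agg(list)
    .reset_index()
    .rename(columns={'departure': 'date'})
)

print(df_departing)
```

With the example data this gives one row per boat date (1, 3, 5). Within each boat the passengers are listed in arrival order, which differs from the loop above (that puts the same-day arrivals first). Whether this is actually faster on large data would need to be measured.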