Why is my script removing Date column when i groupby().SUM() but not when i groupby().MEAN()?

My script:

def flowData():
flowData = pd.read_csv(‘flow_data_imperial_units.csv’)

Converting date column from string to datetime:

flowData['date']= pd.to_datetime(flowData['date'])

Adding columns for year:

flowData['year']=np.NaN

Convert to correct units excluding the date column:

for colm in flowData.columns[1:]: 
    flowData[colm] = flowData[colm]*1.983  # conversion from cfs to AcFt
for i in range(len(flowData)):
    flowData.loc[i, 'year']=flowData.loc[i, 'date'].year
GroupedFlow=flowData.groupby(['year']).sum(numeric_only=False)
return GroupedFlow

If i change “sum” to “mean” in the next-to-last line, the “date” column appears, but not of it’s “sum”. Why is that, and how do i fix it? Is there something i can add to ignore the date column when summing?

Btw, this script was written FOR me, so please reply as though i’m five. :slight_smile:

It’s possible that it is because possible to compute the mean of a column of dates, but not the sum of a column of dates. You’d need to confirm that in the pandas documentation.

Thanks, Kevin. I wondered that too, but couldn’t think of why it wouldn’t just sum the date code if it can average it. Is it because it’s in date format?

Because the result would not be usable as a date, it would be illogical. If you want a sum of the dates represented as numbers (and not as dates), change the type of the column after converting the string to a date.

1 Like

Because the average of now and an hour from now makes sense (it’d be half an hour from now, right?) but the sum does not (it would depend on the arbitrary point from which you start counting dates/times).

2 Likes

Thanks, guys. Makes sense. Any suggestions for adding back-in a year column for this summed table?