Pandas Dataframes - exporting using the groupby statement into separate sheets in asc/dsc order

Hi-

I created a dataframe, and it’s all in one Excel sheet. So I used groupby to export each value into a separate worksheet, as such:

with pd.ExcelWriter(full_path) as writer:
for key, g in df.groupby(‘Field Name’):
g.to_excel(writer, sheet_name=str(key.strftime(‘%Y-%m-%d’)), index=False, header=True)

The key is a date field representing the last day of the week, and it works correctly…took me a bit to figure this out, but it’s working - it’s exporting each set of records for a given date to a separate sheet. But I want to take it one step further - I want to change the order of the sheets so the latest date gets put as the first sheet, etc. I have tried sort_values, I have tried grouping and sorting the data into a dataframe and then regrouping upon export, to no avail.

Hopefully someone can help - it doesn’t look like it should be too difficult, but I have yet to find an answer on Google.

Thanks!

David

Hi.

I think the problem is caused by the groups not being in your desired order, prior to writing to the Excel workbook.

So the first step is to ensure your groups are in the desired order. Sorting your groups in descending order of your datetime field should work if you pre-sort the rows by Field Name with ascending=False, before the groupby, AND you specify sort=False in the groupby.

Here’s an example below, and I’m using a datetime column named Field as the groupby key, with values created out of order.

In [1]: from datetime import timedelta; import pandas as pd

In [2]: df = pd.DataFrame().assign(
     ...:     Field=[now - timedelta(days=2), now, now - timedelta(days=1)],
     ...:     SomeOtherField=[2, 0, 1]
     ...: )

In [3]: df
Out[3]: 
                       Field  SomeOtherField
0 2024-02-06 23:02:46.253608               2
1 2024-02-08 23:02:46.253608               0
2 2024-02-07 23:02:46.253608               1

Note: The Field column is created from datetime objects so it will end up as datetime64[ns].

Now I pre-sort the rows by Field with ascending=False, chained with the groupby on Field with sort=False:

In [4]: for key, _ in df.sort_values('Field', ascending=False).groupby('Field', sort=False):
     ...:     print(key)
     ...: 
2024-02-08 23:02:46.253608
2024-02-07 23:02:46.253608
2024-02-06 23:02:46.253608

The documentation on Pandas groupby key sorting suggests that the default option of sort=True uses ascending ordering, so sort=False would reverse that.

Without the pre-sorting step the groupby with sort=False does not work.

In [5]: df = pd.DataFrame().assign(
     ...:     Field=[now - timedelta(days=2), now, now - timedelta(days=1)],
     ...:     SomeOtherField=[2, 0, 1]
     ...: )

In [6]: for key, _ in df.groupby('Field', sort=False):
     ...:     print(key)
     ...: 
2024-02-06 23:02:46.253608
2024-02-08 23:02:46.253608
2024-02-07 23:02:46.253608

I’ve tested the steps to write the Excel workbook, and it seems to work.

In [7]: with pd.ExcelWriter('test.xlsx') as writer:
     ...:     for key, g in df.sort_values('Field', ascending=False).groupby('Field', sort=False):
     ...:         g.to_excel(writer, sheet_name=str(key.strftime("%Y-%m-%d")), index=False, header=True)
     ...: 

I don’t have Excel installed, but if I open this test.xlsx file with a free online Excel editor I can see the sheets ordered in descending order of date.

Thanks! I’ve tried some of this, but I will look at your example in more detail tomorrow…hopefully I made a mistake in there somewhere. The other option would be to reorder the sheets, but I would think there would be a way to do it on export.

Thx again for your help

David

Np. I used Pandas version 2.2.0, and it does work for me, at least.

So first I had to add the sort_values to the groupby statement…my first attempt failed as I put it on the end of the statement…then I reviewed your code, and it was before the group by statement. Still wasn’t working (although no error was generated). So now I was at the point where I double check everything - I thought I had included the sort=False, but I had modified that line so many times I probably tried removing it and left it out…once I added that back in, it worked! Now the rest is essentially formatting…Many Thanks!

1 Like

Great.

PS This is more a question about Pandas, but if groupby with sort=False does not work on its own, without presorting the rows in descending order using sort_values(ascending=False), what is the point of it?

To answer my own question, re-reading the documentation it looks like groupby sort=False on its own is not designed to perform a descending sort. But can achieve this in combination with a presorting of rows with sort_values(ascending=False).

I tried sorting the dataframe before doing the groupby statement....that failed - I tried Ascending and Desc, no difference. I think it got reset when it did the groupby. So I had to sort it in the groupby statement, and apparently sort=False has nothing to do with the order of the observations in each group. It's kind of a weird command, because you can't do anything with it - you can't change the sort order, so that leaves only one sort direction, and maybe not the one you want!
sortbool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Yes, but your problem was about the order of groups of rows, not of rows within each group, as each key represents a group.

The example I reproduced above proves that a presort of rows on Field with ascending=False is required to get the desired descending key order. Without the presort, and only groupby('Field', sort=False) you do not get the desired order.

Apparently that command doesn’t do exactly what we understand the documentation to say, as I tried True and False without adding the sort portion to the groupby statement, and it didn’t change anything. But at least now I’m aware that the sort has to be set to false, and it says if you make it True it will slow you down…

Many Thanks!

Have a great weekend!

Actually, maybe it does do something…maybe the sort order that command puts the groups in is the order it is already in, and so it’s already sorted that way by chance…hmm…I’ll keep that in mind next time I use the groupby statement.

Yes of course it does, otherwise I would not have posted my example. :slight_smile: Hopefully I’m not belabouring this point, but the row presorting on the key column, with ascending=False, puts the rows in descending order of the key column (datetime), and that makes groupby on the key column, with sort=False, work, because the unique keys representing each group are put in the same order as the rows.

I would suggest posting your excerpts/examples so that it’s clear what you’ve actually tried. Otherwise it’s hard to know.

1 Like

It’s on my work computer…so unfortunately I can’t share any of the output, but maybe I can dummy up a few rows so you can see what it looks like. I’ll have to do that tomorrow…enjoy your evening.

Not required now, as it seems to work for you.

It was a general suggestion for posts in this forum.