Pandas, order groups in groupby object

caonima · September 19, 2023, 12:28pm

Is there no pandas specific forum?

Anyway, given a dataframe D with columns A and B, how do you group D by A, then order the groups by the sum of column B?
i.e.

groupby = D.groupby('A')
groupby = groupby[groupby.B.sum().reset_index().sort_values().index]  # this line doesn't work

?

Thanks for your answer

olekskrav · September 19, 2023, 5:52pm

I am not sure you can index a groupby object in the same way as a dataframe.
How do you want to use it further? Which dataframe do you want to get at the end?

caonima · September 19, 2023, 8:25pm

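def groupby_sorted(data, columns, group_key_f, ascending=True):
    # Return (key, sub-DataFrame) pairs, ordered by group_key_f applied to each group's rows
    groups = data.groupby(columns).groups  # maps each group key to its index labels
    return map(lambda item: (item[0], data.loc[item[1], :]),
               sorted(groups.items(), key=lambda item: group_key_f(data.loc[item[1], :]), reverse=not ascending))

This is what I want.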

olekskrav · September 19, 2023, 9:50pm

But how do you want to use it further? It does not seem that groupby-objects are designed for preserving the order. Probably, you can remember the proper order of group keys from here, and later use it when you are working with the resulting dataframe.
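
A minimal sketch of that idea, assuming a small made-up sample frame (the data and the names `ordered_keys` and `ordered_groups` are only for illustration): compute the key order from the sums once, then look each group up by key.

```python
import pandas as pd

# Sample data, assumed for illustration only
df = pd.DataFrame({
    "A": ["Group1", "Group2", "Group1", "Group2", "Group3"],
    "B": [10, 20, 30, 40, 50],
})

grouped = df.groupby("A")

# Group keys ordered by the sum of B, largest first
ordered_keys = grouped["B"].sum().sort_values(ascending=False).index

# Look up each group's original rows in that order
ordered_groups = [(key, grouped.get_group(key)) for key in ordered_keys]

for key, rows in ordered_groups:
    print(key, rows["B"].tolist())
```

The groupby object itself is left untouched here; only the order in which the keys are visited changes.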

hansgeunsmeyer · September 19, 2023, 10:30pm

FYI There is a special Discourse for pandas:

(Not sure if this is really alive anymore, though - definitely doesn’t seem to be getting much traffic.)

hansgeunsmeyer · September 19, 2023, 10:38pm

This is the (correct, and pretty clean) code that ChatGPT came up with (I only added one more print statement).

import pandas as pd

# Sample DataFrame
data = {'A': ['Group1', 'Group2', 'Group1', 'Group2', 'Group3'],
        'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

print(df)

# Group by column A and calculate the sum of column B for each group
grouped = df.groupby('A')['B'].sum()

# Sort the groups by the sum in descending order
sorted_groups = grouped.sort_values(ascending=False)

# Display the sorted groups
print(sorted_groups)

If you want the df back with columns A and B at the end, you could also add:

sorted_groups = sorted_groups.reset_index()

I should note however that ChatGPT warns beginners:

Using AI to generate code without understanding it can hinder your programming growth and result in poor-quality solutions.

hansgeunsmeyer · September 19, 2023, 11:06pm

Coding style is always a bit subjective, but if I may comment on that, I’d say that generally speaking it’s better to not use special function names (such as “groupby”) also as variable names: this can quickly make the code unreadable (and invite bugs).

Also, when using pandas I think it’s almost never necessary to use map functions, and lambdas are only needed when calling df.apply(...) on some DataFrame (and even then it may be better to replace them by actual function calls, unless it’s a very simple column conversion).
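
As a small illustration of that last point, here is a sketch on a made-up two-row frame with a hypothetical helper called `label_row`. Both calls should produce the same result, but the named function can carry a docstring and be tested on its own.

```python
import pandas as pd

# Made-up data, for illustration only
df = pd.DataFrame({"A": ["Group1", "Group2"], "B": [10, 20]})


def label_row(row):
    """Build a short text label from one row (hypothetical helper)."""
    return f"{row['A']}: {row['B']}"


# Inline lambda passed to apply
with_lambda = df.apply(lambda row: f"{row['A']}: {row['B']}", axis=1)

# The same conversion as a named function
with_function = df.apply(label_row, axis=1)

print(with_function)
```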

caonima · September 20, 2023, 10:31am

Sorting the groupby result only yields a series of sums indexed by the grouped-by column, though. What I want is to iterate over the groups, including the rows contained in each group, with the groups ordered by their sums.

hansgeunsmeyer · September 20, 2023, 3:28pm

This doesn’t make sense to me - I don’t quite know what you mean by “iterate the groups, including the rows contained…”. An aggregation is a kind of destructive operation: the original rows are merged together, so you cannot iterate over those anymore. A groupby is essentially a reindexing based on some column values, so to make a new data frame and keep the other columns, you have to apply some aggregation to those other columns too, using the implied mapping of indices to sets of indices. If you can apply the same aggregation to each of the other columns, it’s easy:

df.groupby('A')[['B', 'C']].sum() \
  .sort_values('B', ascending=False) \
  .reset_index()

If you have multiple aggregation functions, you could apply them for instance like this:

df.groupby('A').agg({'B': 'sum', 'C': list}) \
  .sort_values('B', ascending=False) \
  .reset_index()

If you want to keep the original B values (using list as aggregation for instance) but yet sort on the sums of the rows in column B, you could do that for instance like this:

def vecsum(series):
    # Sort key: replace each list in the column by its sum
    return [sum(x) for x in series]

df.groupby('A')['B'].agg(list) \
  .reset_index() \
  .sort_values('B', axis=0, key=vecsum, ascending=False)

Applied to

        A   B   C
0  Group1  10  10
1  Group2  20  20
2  Group1  30  30
3  Group2  40  40
4  Group3  50  50

this generates

        A         B
1  Group2  [20, 40]
2  Group3      [50]
0  Group1  [10, 30]

(If from that last dataframe you want to go back to a dataframe that contains numbers instead of lists, there is also the explode function. Call df.explode('B') on the last df.)
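
A short sketch of that explode step, assuming a hand-built stand-in for the list-valued result above (the name `listed` is made up):

```python
import pandas as pd

# Assumed stand-in for the list-valued result shown above
listed = pd.DataFrame(
    {"A": ["Group2", "Group3", "Group1"], "B": [[20, 40], [50], [10, 30]]},
    index=[1, 2, 0],
)

# One row per list element; the index label is repeated for each element
exploded = listed.explode("B")

# explode leaves B as object dtype, so cast back to integers if needed
exploded["B"] = exploded["B"].astype(int)

print(exploded)
```

The rows keep the order of the sorted groups, so the exploded frame lists the original B values group by group, largest sum first.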
