Question about rank

Hi, I have a question about how rank works.

According to the documentation, “By default, equal values are assigned a rank that is the average of the ranks of those values.” This sounds like a chicken-and-egg problem: how can two equal values be assigned a rank that is the average of their ranks before they have been ranked at all? Consider Example 1. Could you please tell me how the values 6.5 and 4.5 are calculated?

# Example 1

In [222]: num = pd.Series([7, -5, 7, 4, 2, 0, 4])

In [223]: num.rank()
Out[223]:
0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

Your data looks like this: [7, -5, 7, 4, 2, 0, 4]

Sorted, that becomes [-5, 0, 2, 4, 4, 7, 7]. So:

  • the value -5 is ranked first;
  • the value 0 is ranked second;
  • the value 2 is ranked third;
  • the value 4 occupies the fourth and fifth positions, so both copies get the average, (4 + 5) / 2 = 4.5;
  • the value 7 occupies the sixth and seventh positions, so both copies get the average, (6 + 7) / 2 = 6.5.

So the table of ranks is:

values: [7,  -5,   7,   4,   2,   0,   4]
ranks:  [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
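
If it helps to see the two steps spelled out, here is a minimal sketch in plain Python. It is only an illustration of the idea (sort first, then average the positions that equal values occupy), not how pandas implements it internally, and the function name average_ranks is made up for this example.

```python
def average_ranks(values):
    # Step 1: sort the input positions by value. Ties keep their input
    # order, so every element gets a distinct provisional position.
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Step 2: collect the 1-based sorted positions held by each distinct value.
    positions = {}
    for position, i in enumerate(order, start=1):
        positions.setdefault(values[i], []).append(position)

    # Step 3: each element's rank is the mean of the positions its value occupies.
    return [sum(positions[v]) / len(positions[v]) for v in values]


data = [7, -5, 7, 4, 2, 0, 4]
print(average_ranks(data))  # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
```

So there is no chicken-and-egg problem: the provisional positions come from sorting, and the averaging happens afterwards. If you want ties handled differently, rank() also accepts a method argument ("min", "max", "first" or "dense") in place of the default "average".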

Understood. Thank you.