After getting an unexpected value from scipy.stats.iqr I discovered the error was coming from np.percentile. Here’s an example of my issue:
There are 10 values in x, so the 25th percentile should be the mean of the 2nd and 3rd values. Both of those are 1, so the result should be 1.
But np.percentile(x, 25) returns 1.5.
I get that Python starts counting at 0, but when using percentile it shouldn’t just ignore the first value in the list.
Presumably such a basic and widely used NumPy function doesn't still have a bug in 2023, but I really feel like it's returning the wrong value. Am I crazy? Can someone explain this to me?
What version of numpy is this? I get
It turns out "percentile" doesn't have a strict definition. The docs for numpy.percentile lay out the many different methods you can use.
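```python
import numpy as np
options = ["inverted_cdf", "averaged_inverted_cdf", "closest_observation",
           "interpolated_inverted_cdf", "hazen", "weibull", "linear",
           "median_unbiased", "normal_unbiased", "lower", "higher", "midpoint", "nearest"]
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
for opt in options:
    print(np.percentile(x, 25, method=opt))  # the method= keyword needs NumPy >= 1.22
```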
outputs the following
1.25 # this one is the default :D
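To see where the different answers come from, here is a minimal pure-Python sketch of five of the methods: `lower`, `higher`, `nearest`, `midpoint` and the default `linear`. It assumes (from the NumPy docs) that all five start from the same fractional 0-based position `(n - 1) * q / 100` and differ only in how they handle the fractional part. `percentile_variants` is a hypothetical helper for illustration, not NumPy's implementation.

```python
import math

def percentile_variants(values, q):
    """Hypothetical helper: a sketch of five NumPy-style percentile methods.

    Assumes every method starts from the fractional 0-based position
    (n - 1) * q / 100 in the sorted data; this is not NumPy's own code.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100   # 10 values, q=25 -> 2.25
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return {
        "lower": xs[lo],                       # element just below the position
        "higher": xs[hi],                      # element just above the position
        "nearest": xs[round(pos)],             # round() breaks ties to even, like np.around
        "midpoint": (xs[lo] + xs[hi]) / 2,     # plain average of the two neighbours
        "linear": xs[lo] + frac * (xs[hi] - xs[lo]),  # interpolate by the fraction
    }

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
for name, value in percentile_variants(x, 25).items():
    print(name, value)
```

All five land on the same position 2.25, but `lower` and `nearest` give 1, `higher` gives 2, `midpoint` gives 1.5 and `linear` gives 1.25. That is exactly why "the" 25th percentile depends on the method you pick.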
From the documentation:
> Given a vector `V` of length `n`, the q-th percentile of `V` is the value `q/100` of the way from the minimum to the maximum in a sorted copy of `V`… This function is the same as the median if `q=50`, the same as the minimum if `q=0` and the same as the maximum if `q=100`.
It goes into detail after that, but the application is clear. Since there are 10 elements in your array, it takes 9 steps through the array to get from the minimum value to the maximum value. 25 percent of 9 is 2.25, so it conceptually starts at index 0 and takes 2.25 steps from there, landing a quarter of the way between elements 2 and 3 (in zero-based indexing). Those elements are 1 and 2, so the result is 1 + 0.25 * (2 - 1) = 1.25.
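The step-counting argument above can be written out directly. This is a sketch of the default `linear` method only, using a hypothetical helper `linear_percentile` (not part of NumPy), and it also checks the three properties the quoted docs promise: minimum at q=0, median at q=50, maximum at q=100.

```python
import statistics

def linear_percentile(values, q):
    """Hypothetical helper: walk q/100 of the way from min to max over sorted values."""
    xs = sorted(values)
    steps = (len(xs) - 1) * q / 100   # 10 values -> 9 steps; 25% of 9 = 2.25
    start = int(steps)                # whole steps taken (floor, since steps >= 0)
    frac = steps - start              # leftover fraction of a step
    if start == len(xs) - 1:          # q == 100: already sitting on the maximum
        return xs[-1]
    # Move `frac` of the way from element `start` to the next element
    return xs[start] + frac * (xs[start + 1] - xs[start])

x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
print(linear_percentile(x, 25))  # 1 + 0.25 * (2 - 1) -> 1.25
# Properties stated in the docs
print(linear_percentile(x, 0) == min(x))                  # True
print(linear_percentile(x, 50) == statistics.median(x))   # True
print(linear_percentile(x, 100) == max(x))                # True
```

Note that index 0 is not ignored: the walk starts on it. The value 1.25 simply reflects that position 2.25 sits a quarter of the way toward the first 2.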