I’m currently working on Python code for my thesis, and for part of it I need to use the NumPy quantile function to calculate the 85th percentile of a moving window across a temperature (°F) dataset, season by season. The moving window is 30 years long and advances one year per step. When I take the average of this moving window, I get a fluctuating line, but when I take the 85th percentile using the quantile or percentile functions I get a stepwise trend across the dataset, which doesn’t make sense because it should be varying slightly every year. The dataset pictured is for spring, if I remember correctly. Does anyone know what’s going on? I’d ideally like to make an 85th percentile graph that fluctuates in a similar way to the average graph.
Note that PrecipPercentile85 is an empty dictionary, and fall, winter, spring and summer are all lists that contain temperature float values in Fahrenheit. The left graph uses the average and the right graph uses the quantile function. Apologies for the lack of titles. Please also ignore the red dot on the first graph; I don’t know how that got there.
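For readers without the original code, here is a minimal sketch of the calculation described above: a 30-year window that advances one year per step, with the mean and the 85th percentile taken over all daily values in the window. The data are synthetic, and the names spring_by_year, rolling_mean and rolling_p85 are hypothetical stand-ins, not the poster's variables.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 71 years, each with ~90 synthetic spring temperatures (°F).
years = list(range(1950, 2021))
spring_by_year = {y: rng.normal(55.0, 10.0, size=90) for y in years}

WINDOW = 30  # window length in years; it advances one year per step

rolling_mean = {}
rolling_p85 = {}
for start in years[: len(years) - WINDOW + 1]:
    # Pool every daily value from the 30 years that start at `start`.
    pooled = np.concatenate(
        [spring_by_year[y] for y in range(start, start + WINDOW)]
    )
    rolling_mean[start] = pooled.mean()
    rolling_p85[start] = np.quantile(pooled, 0.85)
```

Plotting rolling_mean and rolling_p85 against the window start year gives the two curves being compared in this thread.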
Are you able to post a minimal example with working code? Without the full context it is hard to provide help.
Note that for a quantile on a moving window I would expect a stepwise result. Even if the values are changing a bit each year, the quantile will stay the same if the values moving in and out of the window are on the same side of the quantile.
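A small sketch of that point, using a made-up window of 20 values rather than the poster's data: replacing a value below the 85th percentile with another value that is also below it changes the mean but leaves the quantile untouched.

```python
import numpy as np

# Hypothetical window of 20 values: 1.0, 2.0, ..., 20.0
window = np.arange(1.0, 21.0)
q_before = np.quantile(window, 0.85)
mean_before = window.mean()

# One value leaves the window (1.0) and another enters (5.5).
# Both sit below the 85th percentile, so the upper part of the
# sorted window is exactly the same as before.
changed = window.copy()
changed[0] = 5.5
q_after = np.quantile(changed, 0.85)
mean_after = changed.mean()

print(q_before, q_after)        # the two quantiles are equal
print(mean_before, mean_after)  # the mean has moved
```

The quantile only moves when a swap changes the values sitting around the 85% position of the sorted window, whereas the mean reacts to every swap.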
Thanks for your response. I’m not sure a code sample is entirely necessary to answer my question. I’m really just confused about how the quantile function works versus average. I assume it’s taking a list, ordering the elements, and then finding the value, or in-between value in the case of an even number of elements, that defines where 15 percent of the values fall above that specific value (in the case of the 85th percentile). What you’re saying makes sense to me, but the 30-year window advances yearly for each season, meaning that there will be around 90 elements taken out of each season list and 90 new elements put into each season list every time the window advances, and I think it’s unlikely that a 180-element change in each season list wouldn’t affect the 85th percentile calculation, because average daily temperature fluctuates a decent amount. Does that make sense? If you need a code sample, let me know and I’ll try to make one.
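The procedure guessed at above can be written out by hand. This sketch assumes NumPy's default "linear" method, which sorts the values, finds the fractional position q * (n - 1), and interpolates between the two neighbouring sorted values. The helper quantile_linear and the synthetic temps array are hypothetical, for illustration only.

```python
import numpy as np

def quantile_linear(values, q):
    """Hand-rolled quantile using linear interpolation between sorted values."""
    s = np.sort(np.asarray(values, dtype=float))
    pos = q * (len(s) - 1)        # fractional index into the sorted array
    lo = int(np.floor(pos))       # sorted index just below the position
    hi = min(lo + 1, len(s) - 1)  # sorted index just above the position
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

# Synthetic stand-in for one window: 30 years x ~90 days of temperatures (°F).
rng = np.random.default_rng(0)
temps = rng.normal(60.0, 10.0, size=2700)

manual = quantile_linear(temps, 0.85)
builtin = np.quantile(temps, 0.85)
share_above = np.mean(temps > manual)  # fraction of values above the result

print(manual, builtin, share_above)
```

About 15 percent of the values lie above the result, and the hand-rolled version agrees with np.quantile to within floating-point tolerance.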
I suggest you inspect the raw values for each time point to convince yourself that the calculation is correct.
Maybe this is standard for the field but to me it feels strange to show the average (I assume the mean?) alongside a quantile. What does the median look like?
This is probably due to the temperature values being integers (at least the larger ones that influence the calculation of the 85th percentile). If that’s the case, the default interpolation method (linear) will average two integers, resulting in either an integer or an integer ± 0.5. This is effectively a step function. The grand average, on the other hand, is computed over the entire dataset, so it appears smoother even if all values are integer.
Fair point. Honestly, when I posted this it didn’t occur to me that the 85th percentile was calculated in the same way as the median. Good suggestion, I’ll take a look. Thanks for the help.
Sorry, I confused ‘midpoint’ with ‘linear’. Linear doesn’t necessarily produce a step function. I think @eendebakpt’s explanation is more likely (but the values are most likely integers).
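To make the difference between the two methods concrete, here is a short sketch on made-up integer temperatures. It assumes NumPy 1.22 or later, where the keyword is called method (older releases used interpolation instead).

```python
import numpy as np

# Hypothetical integer temperatures (°F), already sorted for readability.
temps = np.array([60, 61, 63, 64, 66, 67, 70, 71, 73, 75])

# The 85th percentile sits at sorted position 0.85 * (10 - 1) = 7.65,
# i.e. between the values 71 (index 7) and 73 (index 8).
linear = np.quantile(temps, 0.85)                        # 71 + 0.65 * (73 - 71)
midpoint = np.quantile(temps, 0.85, method="midpoint")   # (71 + 73) / 2

print(linear)    # 72.3
print(midpoint)  # 72.0
```

With integer inputs, "midpoint" can only return an integer or an integer plus 0.5, while "linear" can return other fractions that depend on the quantile position.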