Statistics.stdev() does not agree with my own standard deviation calculation

Hello all,

Some might argue this is more of a maths question, but I feel like the answer is something Python-related, possibly something to do with Python floats - though I’m not sure.

I’ve written a small script that generates a set of ten numbers between one and ten inclusive and then calculates the standard deviation using an algorithm that I just looked up on Wikipedia. Then, finally, it calculates the standard deviation of the set using statistics.stdev() and compares the two.

Both calculated values are sort of in the same ballpark, but also miles off depending on how you look at it, and I can’t work out why. If someone could weigh in on this I’d appreciate it greatly. Here comes the script…

from math import sqrt
from random import randrange
from statistics import mean, stdev

values = []
values_mean = None
deviations = []
deviations_squared = []
variance = None
standard_deviation = None

# generate ten random values
for _ in range(10):
    values.append(randrange(1, 11))

# calculate the mean of the random values
values_mean = mean(values)

# calculate the deviation from the mean for each value
for value in values:
    deviations.append(values_mean - value)

# build a list of squared deviations
for deviation in deviations:
    deviations_squared.append(deviation**2)

# calculate the mean of the squared deviations (the "variance")
variance = mean(deviations_squared)

# calculate the standard deviation
standard_deviation = sqrt(variance)

print('the manually calculated standard deviation is >', standard_deviation)
print('the standard deviation according to stdev() is >', stdev(values))

Not at my PC to run it, but your script certainly looks correct.

How much difference are we talking about? 1.0, 0.1, 0.001? Depending on what you’re seeing I’d imagine it could be accrued floating point error, but only if the difference is small.

It’s due to “Bessel’s correction”.

If you’re calculating the standard deviation of a complete population, then you divide the sum of the squared deviations by the size of the population:

variance = sum(deviations_squared) / len(deviations_squared)

But if you’re calculating the standard deviation of a sample from a larger population, then you divide the sum of the squared deviations by one less than the size of the sample:

variance = sum(deviations_squared) / (len(deviations_squared) - 1)

This is because, with only a sample, you might be “missing” some of the true variance.
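
A concrete example may help; here’s a minimal sketch with made-up numbers showing the two divisors side by side:

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
m = sum(values) / len(values)      # the mean, 5.0
deviations_squared = [(m - v)**2 for v in values]

print(sum(deviations_squared) / len(values))        # population variance: 4.0
print(sum(deviations_squared) / (len(values) - 1))  # sample variance: 4.571428...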

3 Likes

Thanks for the speedy response, I added a loop to run the script x1000 and collected the difference between the two calculations for each iteration, then calculated the mean. It came to 2.434051111955767, so not insignificant :thinking:

It could be, but I think there is something else going on. I modified the script so that instead of calculating the mean of the squared deviations via statistics.mean(), it calculates the sum of the squared deviations and then divides it by len(deviations) - 1.

When the algorithm is run in a loop x1000 and the difference between the two calculated stddevs is collected for each iteration, the mean average of all the differences is still coming out as 2.5368491195361726, or thereabouts, which is actually a greater difference than before, so I can only assume that statistics.stdev() assumes its argument is a complete group and not a subset from a larger group?

Your result matches statistics.pstdev(values) so look at the docs for the difference between pstdev() and stdev().
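
For example, a quick sketch:

from statistics import pstdev, stdev

values = [1, 2, 3, 4, 5]
print(pstdev(values))  # 1.4142135623730951 (divides by n)
print(stdev(values))   # 1.5811388300841898 (divides by n - 1)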

3 Likes

To synthesize:

Your script computes the variance as mean(deviations_squared). This means (pun intended) that the sum of the squared deviations is calculated, and then divided by 10 (because there are 10 values).

pstdev does the same calculation that you are doing: the sum of the squared deviations is divided by the number of values (10), in order to compute the population variance; then the square root is taken to get the population standard deviation.

However, stdev computes a sample standard deviation, which divides by 9 (one less than the number of values) instead. It assumes that the input values are drawn from a larger population, rather than representing a complete population. In these circumstances, Bessel’s correction applies, as Matthew describes in detail.

Thus, the answers differ, not by an offset, but by a constant factor of sqrt(10/9) (for an input of 10 values).
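
You can check that factor directly with a quick sketch (any ten values will do):

from math import sqrt
from statistics import pstdev, stdev

values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(stdev(values) / pstdev(values))  # 1.0540925533894598...
print(sqrt(10 / 9))                    # the same, up to float rounding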

1 Like

Thanks @steven.rumbalski and @kknechtel. I’ve substituted .pstdev() for .stdev() and the algorithm in the script now calculates the same value as the function call. I literally know nothing about it, but it would seem more appropriate for .stdev() to calculate the population standard deviation, and then to have a function named something like .sstdev() that calculates a sample standard deviation. Perhaps it’s because calculating a sample stdev is a more common operation?

One other interesting thing: as I’ve mentioned above, I encapsulated the stdev calculation in a loop that runs 1000 times; at the end of each iteration it calculates the difference between the stdev calculated by the algorithm and that calculated by .pstdev(), and stores the difference for that iteration in a list. Remember, the values for which the stdev is calculated are randomly generated for each iteration of the loop, but if the algorithm I wrote is correct, the calculated difference should be 0 each time.

In practice however, the difference is not always 0. When the algorithm is run x1000, on a small handful of the loop’s iterations, the algo and .pstdev() disagree and the difference is some very long scientific number.

Not sure what’s going on there; perhaps some technical, low-level thingy? I’ve included the final version of the script below.

from math import sqrt
from random import randrange
from statistics import mean, pstdev

diffs = []

for _ in range(1000):
    values = []
    values_mean = None
    deviations = []
    deviations_squared = []
    variance = None
    standard_deviation = None

    # generate ten random values
    for _ in range(10):
        values.append(randrange(1, 11))

    # calculate the mean of the random values
    values_mean = mean(values)

    # calculate the deviation from the mean for each value
    for value in values:
        deviations.append(values_mean - value)

    # build a list of squared deviations
    for deviation in deviations:
        deviations_squared.append(deviation**2)

    # calculate the mean of the squared deviations
    variance = mean(deviations_squared)

    # calculate the standard deviation
    standard_deviation_algo = sqrt(variance)

    # calculate the standard deviation using .pstdev
    standard_deviation_func = pstdev(values)

    diffs.append(abs(standard_deviation_algo - standard_deviation_func))

print(diffs)

I tried it, and the “very long scientific number” is always of order 1e-16 — this is a very small number, of order the precision of a 64-bit floating point number. For a computation like this, the details of the order of operations actually matter, and the difference in ordering between your calculation and the statistics module affects the result.
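
A tiny illustration of that sensitivity:

print(0.1 + 0.2 + 0.3)  # 0.6000000000000001
print(0.3 + 0.2 + 0.1)  # 0.6

Same mathematical sum, different floats; differences of this size are exactly what you are seeing in diffs.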

2 Likes

By which you mean a number that is very close to zero, displayed with a lot of decimal digits in a rather obnoxious way.

This is the standard issue of floating-point imprecision. Please read:

https://0.30000000000000004.com/
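
The site’s name is itself the classic example:

print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False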

2 Likes

Welcome to the wonderful world of float calculations!

It’s been covered, but this is why you might want to use math.isclose() to determine if you got the same result, rather than ==.
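
For example, a sketch applying it to the comparison in your loop:

from math import isclose, sqrt
from statistics import pstdev

values = [3, 1, 4, 1, 5, 9, 2, 6]  # example data
m = sum(values) / len(values)
manual = sqrt(sum((m - v)**2 for v in values) / len(values))

print(manual == pstdev(values))         # may be False due to rounding
print(isclose(manual, pstdev(values)))  # True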

The algorithm you wrote is “approximately correct”, but each floating point step is subject to round-off error. In contrast, the code in the statistics module computes exact values at each step, including a correctly rounded square root of an exact ratio at the end.

Here is an edit to your code that will reproduce the standard library results as you expected:

from math import sqrt
from random import randrange
from statistics import mean, pstdev
from fractions import Fraction                         # <-- Edited

diffs = []

for _ in range(1000):
    values = []
    values_mean = None
    deviations = []
    deviations_squared = []
    variance = None
    standard_deviation = None

    # generate ten random values
    for _ in range(10):
        values.append(Fraction(randrange(1, 11)))      # <-- Edited

    # calculate the mean of the random values
    values_mean = mean(values)

    # calculate the deviation from the mean for each value
    for value in values:
        deviations.append(values_mean - value)

    # build a list of squared deviations
    for deviation in deviations:
        deviations_squared.append(deviation**2)

    # calculate the mean of the squared deviations
    variance = mean(deviations_squared)

    # calculate the standard deviation
    h = sqrt(variance)                                 # <-- Edited
    x = variance - Fraction(h) ** 2                    # <-- Edited
    standard_deviation_algo = float(h + x / (2.0 * h)) # <-- Edited

    # calculate the standard deviation using .pstdev
    standard_deviation_func = pstdev(values)

    diffs.append(abs(standard_deviation_algo - standard_deviation_func))

print(diffs)

FWIW, the statistics module docs have a link to the pure Python source code, so it is easy to investigate how everything works.
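
If you want to poke around from the REPL, something like this works too, since the module is pure Python:

import inspect
import statistics

print(inspect.getsource(statistics.pstdev))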

3 Likes