Adding Median Absolute Deviation (MAD) to statistics

Median Absolute Deviation, or MAD, is an alternative to the “classic” standard deviation. Its formula is dead simple: it’s the median of the absolute deviations from the median:

median(|Xi - Xbar|)

where Xbar is the median of the data.

It’s used for univariate data (data that depends on a single variable), and it’s more resilient to outliers than stdev.
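A quick sketch of the outlier point, using a small made-up sample (the numbers are purely illustrative, and this `mad` is just a throwaway helper for the demo):

```python
from statistics import median, stdev

def mad(data):
    # Median of the absolute deviations from the median.
    data = list(data)
    center = median(data)
    return median(abs(x - center) for x in data)

clean = [10, 11, 12, 13, 14]
dirty = [10, 11, 12, 13, 1000]  # same sample with one wild outlier

# stdev blows up because of the single outlier; MAD does not move.
print("stdev:", stdev(clean), "->", stdev(dirty))
print("MAD:  ", mad(clean), "->", mad(dirty))
```

With this sample, MAD stays the same for both lists, while stdev grows by more than two orders of magnitude.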

It’s obviously not a replacement for stdev, but it’s useful in certain cases. It’s used in many scientific fields, such as biology, and also in benchmarking.

I think it would be a good addition to statistics:

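from statistics import median

def mad(data, xbar=None):
    # Materialize once so a one-shot iterator survives both passes.
    data = list(data)
    if xbar is None:
        xbar = median(data)

    # Median of the absolute deviations from the center.
    return median([abs(x - xbar) for x in data])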

It is a more specialised function. Given its usual use cases, wouldn’t it be more helpful to have it implemented in the more specialist data-wrangling libraries?
If you search, you can find code for Polars and Pandas, and SciPy already defines it.
Is it used enough for the stdlib?


Not so much, but more than some other statistics functions. For example, Sourcegraph finds about 3.3k results for mad() and median_abs_deviation(), but only 182 for kde_random().

MAD is also implemented in Matlab, R, Wolfram Mathematica, GNU Octave and Julia. The one notable exception is Dask.

Notably, outside the scientific world, pyperf suggests using MAD instead of stdev if you want to exclude outliers, although AFAIK it’s only a suggestion and pyperf doesn’t implement it.

Another statistical function that is resilient to outliers is the interquartile range (IQR).
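For comparison, IQR can already be put together from statistics.quantiles() (Python 3.8+). A minimal sketch, where `iqr` is a hypothetical helper name and not an existing stdlib function:

```python
from statistics import quantiles

def iqr(data, method="exclusive"):
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(data, n=4, method=method)
    return q3 - q1

sample = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 1000]  # made-up outlier

print(iqr(sample))
print(iqr(with_outlier))
```

In this example the outlier sits beyond the third quartile, so both calls give the same result.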

MAD is an ambiguous term: it sometimes means “median absolute deviation” and sometimes “mean absolute deviation”.

But either way, statistics currently provides only very basic functionality, and robust statistics like MAD (one or the other), IQR, L-moments, trimmed moments, etc. would be a good fit.


Yeah, you’re right. And there’s also mode, for the sake of simplicity… :sweat_smile:

Matlab, Octave and Polars use a single function with a flag to choose between mean and median. That’s an option, but I’m not really a fan of it.

R uses mad() for median and madstat() for mean. A bit confusing.

Wolfram uses MedianDeviation and MeanDeviation. Not bad.

In Julia, mad() uses the median and meanad uses the mean. More confusing than R.

In SciPy, there’s median_abs_deviation(). Too long. It seems there’s no Mean Absolute Deviation function.

In Pandas, mad() uses the mean, and now that I take a closer look, it’s deprecated.

IMHO the best options are to name it median_dev, or just mad, maybe with an optional parameter to use the mean instead of the median.

I just prefer median_dev.
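To make the two options concrete, here is a rough sketch of both shapes side by side. `abs_dev` and its `center` parameter are hypothetical names invented for illustration; nothing like them exists in statistics today, and the mean variant assumes deviations are taken around the mean.

```python
from statistics import mean, median

def abs_dev(data, *, center=median):
    # Option A (hypothetical): one function, `center` picks mean or median.
    data = list(data)
    c = center(data)
    return center(abs(x - c) for x in data)

def median_dev(data):
    # Option B (hypothetical): a dedicated, unambiguous name.
    return abs_dev(data, center=median)

sample = [1, 2, 3, 4, 100]
print(median_dev(sample))            # median absolute deviation
print(abs_dev(sample, center=mean))  # mean absolute deviation
```

The dedicated name avoids the median/mean ambiguity at the call site, at the cost of a second function if the mean variant is ever wanted.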
