Median Absolute Deviation, or MAD, is an alternative to the “classic” standard deviation. Its formula is dead simple: it’s the median of the absolute deviations from the median:
median(|Xi - Xbar|)
where Xbar is the median of the data.
It’s used when your data depends on only a single variable, and it’s more resilient to outliers than stdev.
It’s obviously not a replacement for stdev, but it’s useful in certain cases. It’s used in many scientific fields, like biology, and also in benchmarks.
I think it would be a good addition to statistics:
from statistics import median

def mad(data, xbar=None):
    if xbar is None:
        xbar = median(data)
    return median([abs(x - xbar) for x in data])
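To make the outlier point concrete, here is a small sketch that runs the function above on two made-up samples that differ in a single value. The sample data and the names `clean` and `dirty` are only illustrative assumptions, not taken from any real benchmark.

```python
from statistics import median, stdev

def mad(data, xbar=None):
    """Median absolute deviation around xbar (defaults to the median)."""
    if xbar is None:
        xbar = median(data)
    return median([abs(x - xbar) for x in data])

# Two made-up samples: identical except for the last value.
clean = [2, 3, 3, 4, 5, 6, 7]
dirty = [2, 3, 3, 4, 5, 6, 700]  # one wild outlier

# MAD is the same for both samples.
print(mad(clean), mad(dirty))  # → 1 1

# stdev, by contrast, is dominated by the single outlier.
print(stdev(clean), stdev(dirty))
```

In this toy case the outlier leaves MAD untouched, while stdev grows by roughly two orders of magnitude.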
It is a more specialised function. Given its usual use cases, wouldn’t it be more helpful to have it implemented in the more specialist data mangling libraries?
If you search, you can find code for Polars and Pandas, and SciPy already has it defined.
Is it used enough for the stdlib?
Not so much, but more than other statistics functions. For example, Sourcegraph counts 3.3k results for mad() and median_abs_deviation(), while kde_random(), for example, has only 182.
MAD is also implemented in MATLAB, R, Wolfram Mathematica, GNU Octave and Julia. The only notable exception is Dask.
Notably, outside the scientific world, pyperf suggests using MAD instead of stdev if you want to exclude outliers. AFAIK, though, it’s only a suggestion, and pyperf doesn’t implement it itself.
Another statistic that is largely unaffected by outliers is the interquartile range (IQR).
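For comparison, IQR can already be sketched on top of what statistics provides today. The `iqr` helper below is hypothetical (it is not part of the stdlib); it relies only on the existing `statistics.quantiles()` function, and the sample data is made up.

```python
from statistics import quantiles

def iqr(data, method="exclusive"):
    """Interquartile range, Q3 - Q1 (hypothetical helper, not in the stdlib)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4, method=method)
    return q3 - q1

# Made-up samples: identical except for the last value.
clean = [2, 3, 3, 4, 5, 6, 7]
dirty = [2, 3, 3, 4, 5, 6, 700]  # one wild outlier

# The outlier sits outside the middle half of the data,
# so both samples give the same IQR.
print(iqr(clean), iqr(dirty))
```

Note that the result depends on the quantile method chosen, so other libraries may report a slightly different IQR for the same data.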
MAD is an ambiguous term; sometimes it means “median absolute deviation” and sometimes “mean absolute deviation”.
But either way, statistics currently provides only very basic functionality, and robust statistics like MAD (one or the other), IQR, L-moments, trimmed moments, etc. would be a good fit.
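To show why the ambiguity matters, here is a rough sketch of both readings of “MAD” side by side. The function names are hypothetical, and the mean absolute deviation is taken around the mean here, which is only one of several conventions in use.

```python
from statistics import fmean, median

def median_abs_dev(data):
    """Median of absolute deviations from the median (hypothetical name)."""
    center = median(data)
    return median([abs(x - center) for x in data])

def mean_abs_dev(data):
    """Mean of absolute deviations from the mean (hypothetical name;
    other conventions centre on the median instead)."""
    center = fmean(data)
    return fmean([abs(x - center) for x in data])

# Made-up sample with one wild outlier.
data = [2, 3, 3, 4, 5, 6, 700]

print(median_abs_dev(data))  # → 1
print(mean_abs_dev(data))    # far larger: the outlier drags the mean-based value up
```

On this sample the two readings give very different numbers, so whichever one is added would need an unambiguous name.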