Question about an article by Nassim Nicholas Taleb about the problems with standard deviation.

Question

The article by Nassim Nicholas Taleb argues that standard deviation is flawed and poorly understood, and that mean absolute deviation should be used instead.

Because this article is short, and I don't fully understand the difference, could the wonderful and nice users here please explain the thesis of this article in more detail and more clearly?

You can't apply anything like the central limit theorem with mean absolute deviations. — Michael Hardy, Jan 16 '14 at 03:11

score 4 · Answer 1 · answered Apr 25 '20 at 21:26

Although, as pointed out by user452, MAD is less sensitive to outliers than the standard deviation, I think that N. Taleb has a different perspective. In fact, quite the opposite.

First, within the domain of robust statistics, outliers have always been considered negative artifacts. Robust statistics, like the median, avoid such distortions. Instead, N. Taleb considers outliers as providing information about the tails of the probability distributions (e.g. the crash of 1929, Black Monday of 1987, the Dot-com bubble, etc.). So, N. Taleb is not advocating the MAD in order to reduce the effect of outliers, but precisely in order to keep them as part of the analysis.

Second, the existence of the standard deviation requires the tail of the distribution to decrease at least as fast as $O(x^{-(2+\epsilon)})$, and the existence of the variance of the corresponding estimator, and hence its convergence, requires the tail to decrease at least as fast as $O(x^{-(4+\epsilon)})$ (i.e. the kurtosis of the distribution should exist). In fact, many known natural and man-made processes generate distributions with low tail exponents, also known as fat-tailed or heavy-tailed distributions. Therefore, the use of the standard deviation is technically dubious in many cases (and the same applies to correlation, PCA, etc.).
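As a rough illustration of the tail-exponent point, here is a minimal sketch that compares the sample standard deviation with the mean absolute deviation on simulated fat-tailed data. The Pareto distribution, the tail index `alpha = 1.5` (finite mean, infinite variance), the seed, and the helper names `mean_abs_dev` and `sd_and_mad` are all assumptions chosen for the example; none of them come from the article.

```python
import random
import statistics


def mean_abs_dev(xs):
    """Mean absolute deviation of xs around its own sample mean."""
    m = statistics.fmean(xs)
    return statistics.fmean([abs(x - m) for x in xs])


def sd_and_mad(alpha, n, seed):
    """Draw n Pareto(alpha) values and return (population SD, MAD).

    alpha is the tail index: P(X > x) = x**(-alpha) for x >= 1.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    xs = [rng.paretovariate(alpha) for _ in range(n)]
    return statistics.pstdev(xs), mean_abs_dev(xs)


if __name__ == "__main__":
    # alpha = 1.5 is an illustrative choice: finite mean, infinite variance.
    for n in (1_000, 10_000, 100_000):
        sd, mad = sd_and_mad(alpha=1.5, n=n, seed=0)
        print(f"n={n:>7}  SD={sd:10.3f}  MAD={mad:8.3f}")
```

With a tail this heavy the sample SD has no finite value to converge to, so it tends to keep jumping as `n` grows, while the MAD has a finite target and usually moves less. The exact numbers depend on the seed.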

Third, from an epistemological perspective, when the underlying process generating the data is not well known, it is incorrect to assume that the process is not fat-tailed (i.e. absence of evidence is not the same as evidence of absence).

Finally, in a decision-making scenario involving unbounded risks (e.g. financial ruin, wars, etc.), when the underlying process generating the data is not well known, we should take actions that avoid these unbounded risks. It does not make sense merely to reduce the probability of occurrence, because repeated exposure to even a small probability will almost certainly end in catastrophe (i.e. the probability of catastrophe after $N$ independent attempts, $p_N = 1-(1-p)^N$, approaches 1 exponentially fast in the number of attempts).
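To make the repeated-exposure argument concrete: assuming the exposures are independent and each carries the same catastrophe probability $p$, the chance of at least one catastrophe in $N$ attempts is $1-(1-p)^N$. The sketch below just evaluates that expression; the function name `ruin_probability` and the value `p = 0.01` are illustrative assumptions.

```python
def ruin_probability(p, n):
    """Chance of at least one catastrophe in n independent exposures,
    each with per-exposure catastrophe probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # p = 0.01 is an illustrative value, not taken from the article.
    for n in (1, 10, 100, 1000):
        print(n, round(ruin_probability(0.01, n), 4))
```

The survival probability $(1-p)^N$ shrinks geometrically, so the ruin probability climbs toward 1 however small $p$ is, as long as it is positive.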

Given the above, MAD is a better estimator than standard deviation. However, for decisions involving unbounded risk with poorly understood processes, N. Taleb argues that even such estimates are inappropriate. Instead, he advocates the precautionary principle.

Although in draft form, a more mathematically oriented exposition of his ideas can be found in the Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications.

Not sure why this isn't the top answer. — NeoCortex64, Jun 09 '22 at 15:25

score 4 · Accepted Answer · answered Jan 16 '14 at 00:58

Taleb's answer is kind of a rant, but I can give a pro and a con for mean absolute deviation (MAD). MAD is an example of a robust statistic, and among the many properties these have is resistance to outliers; i.e. large values in the sample will not influence MAD as strongly as they influence SD. But a problem with MAD on a theoretical basis is that the statistical distribution of MAD is often very hard to know in closed form, whereas there are often nice closed formulas for the distribution of SD. Taleb is arguing that the availability of lots of computing power makes closed-form theoretical analysis less necessary for applied use, and so robust statistics ought to be preferred. But being able to manipulate nice theoretical equations which approximate your problem well is also useful (Taleb would likely deny SD has this property). So it's all debatable. I use them both!
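A small sketch of the outlier point, using made-up numbers: it measures how much SD and MAD each grow when a single extreme value is added to a tiny sample. The data and the helper name `mean_abs_dev` are assumptions for illustration only.

```python
import statistics


def mean_abs_dev(xs):
    """Mean absolute deviation of xs around its own sample mean."""
    m = statistics.fmean(xs)
    return statistics.fmean([abs(x - m) for x in xs])


clean = [1, 2, 3, 4, 5]
with_outlier = clean + [100]  # one made-up extreme observation

# Growth factor of each dispersion measure once the outlier is included.
sd_growth = statistics.pstdev(with_outlier) / statistics.pstdev(clean)
mad_growth = mean_abs_dev(with_outlier) / mean_abs_dev(clean)

print(f"SD grows by a factor of {sd_growth:.1f}")
print(f"MAD grows by a factor of {mad_growth:.1f}")
```

Both measures are inflated by the outlier, but SD more so, because it squares the deviations. The difference is one of degree: the mean absolute deviation is less sensitive than SD, not immune.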

Thanks, that helps clear it up, and you gave me more to read about. — yiyi, Jan 16 '14 at 01:07
Here is a good example of this: standard deviation is square-additive (for uncorrelated $a$ and $b$, $\sigma_a^2 + \sigma_b^2 = \sigma_{a+b}^2$), which might be very useful if you need to combine aggregate information. — Andrew Dudzik, Jan 16 '14 at 01:07
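The square-additivity in the comment above can be checked exactly on a toy case. The sketch below assumes two independent fair coin flips taking values -1 and +1 (a made-up example) and enumerates all equally likely outcomes, so no simulation is involved; the helper name `moments` is hypothetical.

```python
from itertools import product


def moments(values):
    """Return (variance, mean absolute deviation) of equally likely outcomes."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    mad = sum(abs(v - m) for v in values) / n
    return var, mad


coin = [-1, 1]
# All four equally likely outcomes of a + b for independent flips a and b.
sums = [a + b for a, b in product(coin, coin)]

var_a, mad_a = moments(coin)
var_sum, mad_sum = moments(sums)

print("var(a) + var(b) =", var_a + var_a, " var(a+b) =", var_sum)
print("MAD(a) =", mad_a, " MAD(a+b) =", mad_sum)
```

Here the variances add (1 + 1 = 2), while the MAD of the sum equals the MAD of a single flip, so the analogous rule with MAD in place of $\sigma$ does not hold.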
