Over a million developers have joined DZone.

The Mean of the Mean is the Mean

DZone's Guide to

The Mean of the Mean is the Mean

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

There’s a theorem in statistics that says

E( \bar{X} ) = \mu

You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing since “mean” refers to three different things in the same sentence. In reverse order, these are:

  1. The mean of the distribution, defined by an integral.
  2. The sample mean, calculated by averaging samples from the distribution.
  3. The mean of the sample mean as a random variable.

The hypothesis of this theorem is that the underlying distribution has a mean. Lets see where things break down if the distribution does not have a mean.

It’s tempting to say that the Cauchy distribution has mean 0. Or some might want to say that the mean is infinite. But if we take any value to be the mean of a Cauchy distribution — 0, ∞, 42, etc. — then the theorem above would be false. The mean of n samples from a Cauchy has the same distribution as the original Cauchy! The variability does not decrease with n, as it would with samples from a normal, for example. The sample mean doesn’t converge to any value as n increases. It just keeps wandering around with the same distribution, no matter how large the sample. That’s because the mean of the Cauchy distribution simply doesn’t exist.


Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.


Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}