Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Why Isn't Everything Normally Distributed?

DZone's Guide to

Why Isn't Everything Normally Distributed?

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem.

If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs.

The central limit theorem says that the sum of many independent, additive effects is approximately normally distributed [2]. Genes are more digital than analog, and do not produce independent, additive effects. For example, the effects of dominant and recessive genes act more like max and min than addition. Genes do not appear independently—if you have some genes, you’re more likely to have certain other genes—nor do they act independently—some genes determine how other genes are expressed.

Height is influenced by environmental effects as well as genetic effects, such as nutrition, and these environmental effects may be more additive or independent than genetic effects.

Incidentally, if effects are independent but multiplicative rather than additive, the result may be approximately log-normal rather than normal.


Fine print:

[1] Men’s heights follow a normal distribution, and so do women’s. Adults not sorted by sex follow a mixture distribution as described here and so the distribution is flatter on top than a normal. It gets even more complicated when you considered that there are slightly more women than men in the world. And as with many phenomena, the normal distribution is a better description near the middle than at the extremes.

[2] There are many variations on the central limit theorem. The classical CLT requires that the random variables in the sum be identically distributed as well, though that isn’t so important here.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:

Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}