Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Computing Skewness and Kurtosis in One Pass

DZone's Guide to

Computing Skewness and Kurtosis in One Pass

· Big Data Zone
Free Resource

Effortlessly power IoT, predictive analytics, and machine learning applications with an elastic, resilient data infrastructure. Learn how with Mesosphere DC/OS.

If you compute the standard deviation of a data set by directly implementing the definition, you’ll need to pass through the data twice: once to find the mean, then a second time to accumulate the squared differences from the mean. But there is an equivalent algorithm that requires only one pass and that is more accurate than the most direct method. You can find the code for implementing it here.

You can also compute the higher sample moments in one pass. I’ve extended the previous code to compute skewness and kurtosis in one pass as well.

The new code also lets you split your data, say to process it in parallel on different threads, and then combine the statistics, in the spirit of map-reduce.

Lastly, I’ve posted analogous code for simple linear regression.

Learn to design and build better data-rich applications with this free eBook from O’Reilly. Brought to you by Mesosphere DC/OS.

Topics:

Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}