Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

More Sides or More Dice?

· Big Data Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

My previous post looked at rolling 5 six-sided dice as an approximation of a normal distribution. If you wanted a better approximation, you could roll dice with more sides, or you could roll more dice. Which helps more?

Whether you double the number of sides per dice or double the number of dice, you have the same total number of spots possible. But which approach helps more? Here’s a plot.

We start with 5 six-sided dice and either double the number of sides per die (the blue dots) or double the number of dice (the green triangles). When the number of sides n gets big, it’s easier to think of a spinner with n equally likely stopping points than an n-sided die.

At first, increasing the number of sides per die reduces the maximum error more than the same increase in the number of dice. But after doubling six times, i.e. increasing by a factor of 64, both approaches have the same error. But further increasing the number of sides per die makes little difference, while continuing to increase the number of dice decreases the error.

The long-term advantage goes to increasing the number of dice. By the central limit theorem, the error will approach zero as the number of dice increases. But with a fixed number of dice, increasing the number of sides only makes each die a better approximation to a uniform distribution. In the limit, your sum approximates a normal distribution no better or worse than the sum of five uniform distributions.

But in the near term, increasing the number of sides helps more than adding more dice. The central limit theorem may guide you to the right answer eventually, but it might mislead you at first.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}