Need to generate test data in your SQL database? The team over at Periscope has published a couple of recent blog posts reminding us that a uniform random distribution isn't always the most useful choice.
As their first post on the matter, Beyond Random() — Normal Distributions in SQL, points out, uniform distributions rarely resemble real data. A more realistic choice is often the normal distribution. To generate it, the folks at Periscope recommend the Marsaglia polar method, which "converts a pair of uniformly distributed random numbers into a pair of normally distributed random numbers." In the post, they show how to use generate_series in SQL to produce random numbers and feed them into the Marsaglia formulas.
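Periscope's SQL isn't reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch of the Marsaglia polar method. The function names `marsaglia_polar` and `normal_samples` and the `mean`/`stddev` scaling step are our own assumptions; they are not taken from Periscope's post.

```python
from __future__ import annotations

import math
import random


def marsaglia_polar(rng: random.Random) -> tuple[float, float]:
    """Turn uniform random numbers into a pair of independent standard normal values."""
    while True:
        # Pick a uniformly random point in the square [-1, 1] x [-1, 1].
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        # Keep only points strictly inside the unit circle, excluding the origin.
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor


def normal_samples(n: int, mean: float = 0.0, stddev: float = 1.0,
                   seed: int | None = None) -> list[float]:
    """Generate n normally distributed test values (hypothetical helper)."""
    rng = random.Random(seed)
    values: list[float] = []
    while len(values) < n:
        z0, z1 = marsaglia_polar(rng)
        # Shift and scale each standard normal value to the requested mean and spread.
        values.append(mean + stddev * z0)
        values.append(mean + stddev * z1)
    # Each iteration yields two values, so trim the extra one when n is odd.
    return values[:n]


# Example: 10 simulated daily order totals centered on 100 with a spread of 15.
print([round(x, 1) for x in normal_samples(10, mean=100, stddev=15, seed=42)])
```

Each accepted point yields two values, which is why the sampler draws pairs and trims to `n` at the end. The rejection step discards roughly 21% of candidate points (1 − π/4), which is the price the polar method pays for avoiding trigonometric functions.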
The result is the familiar Gaussian bell curve.
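Periscope's follow-up post tackles a different kind of question. Let's say you typically sell 5 widgets per day. How likely is it that you'll sell exactly 5 widgets tomorrow? What about between 4 and 6? We can't answer that by guessing at random. The normal distribution isn't a good fit either, because daily sales are a discrete count of events that can never be negative.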
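Fortunately, this is exactly what the Poisson distribution is for. It gives the probability of seeing exactly k events in an interval when events occur at a known average rate R:

$$P(k) = \frac{R^k e^{-R}}{k!}$$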
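The Poisson formula uses three quantities:
R: the known average rate, in this case 5 widgets per day.
e: Euler's number, approximately 2.71828 (a fixed constant, not something you choose).
k: the number of events whose probability you want, such as selling exactly 5 widgets tomorrow.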
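To make the widget example concrete, here is a minimal Python sketch. It evaluates the formula for R = 5 to answer the two questions above, then draws Poisson-distributed test values. The sampler uses Knuth's multiplication method, a standard technique swapped in here for illustration; it is not necessarily how Periscope generates Poisson data in SQL. The names `poisson_pmf` and `poisson_sample` are hypothetical.

```python
import math
import random


def poisson_pmf(k: int, rate: float) -> float:
    """P(exactly k events) when events occur at an average `rate` per interval."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Draw one Poisson value with Knuth's method (fine for small rates like 5)."""
    limit = math.exp(-rate)
    k = 0
    product = rng.random()
    # Multiply uniform draws until the running product falls to e^(-rate) or below;
    # the number of extra multiplications needed is the Poisson-distributed count.
    while product > limit:
        k += 1
        product *= rng.random()
    return k


R = 5  # known average rate: 5 widgets per day
p_exactly_5 = poisson_pmf(5, R)
p_4_to_6 = sum(poisson_pmf(k, R) for k in range(4, 7))
print(f"P(exactly 5 tomorrow) = {p_exactly_5:.4f}")  # → 0.1755
print(f"P(4 to 6 tomorrow)    = {p_4_to_6:.4f}")     # → 0.4972

# 30 days of simulated daily widget sales, e.g. for seeding a test table.
rng = random.Random(42)
daily_sales = [poisson_sample(R, rng) for _ in range(30)]
print(daily_sales)
```

So even with a steady average of 5, you'd sell exactly 5 widgets only about 18% of the time, and land between 4 and 6 just under half the time. Knuth's method needs about R + 1 uniform draws per sample, so for large rates a different sampler would be a better choice.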
For R = 5, this yields a discrete, right-skewed distribution. Its most likely outcomes are k = 4 and k = 5, each with a probability of about 17.5%.
[Figure: Poisson distribution. Credit: Periscope.io]

Periscope's blog entries both give specific details on using these distributions to generate test data in SQL. They're worth a look; you can check out the full blog at https://periscope.io/blog.