A familiar topic for me for the January 2010 issue: testing a pseudo-random number generator’s (PRNG) output for randomness. I say familiar because I’ve talked about it before, most recently in my book. Well, OK that was 10 years ago, but still, the techniques don’t change. And it’s extremely fascinating, to boot.
I began the article with Knuth’s classic joke on random numbers: is 2 a random number? (Volume 2 of The Art of Computer Programming, Seminumerical Algorithms, page 2.) I’ll note here that xkcd riffed off this joke with a hilarious C version as shown below.
I then invited my readers to think about the problem: could a random digit sequence include three 4s in a row? (Yes.) Could it include a sequence, like 4, 5, 6? (Yes.) And so on. In essence, I made the point that humans are pretty bad at evaluating whether a sequence of numbers is random or not.
Possibly one way to evaluate whether a sequence is random or not is to try and compress it: random sequences do not compress well since they don’t have repeatable (redundant) information that we can take advantage of for compression. I didn’t use this example in the article because it would have meant talking about information theory and about encoding the random numbers in something other than ASCII, or using 8-bit random bytes or something. Anyway...
Basically the only way to test a sequence for randomness is to use statistical tests on the numbers in the sequence. I then derived χ2 (chi-square) testing, but in a way that made sense to my semi-technical layman audience. Once I had that statistical tool, I was away: the uniformity test, the gap test, the poker test, the coupon-collector’s test, all mentioned in Knuth.
I threw in a couple of nice images showing that the output from a PRNG should also not exhibit regularity when grouped in pairs and plotted on a graph. This kind of test could be extended to multiple dimensions equally as well (although I must admit that the Minimal Standard Generator I briefly discussed would fail other Knuth tests badly).
I also mentioned in the article George Marsalgia’s DIEHARD tests which encompass and extend Knuth’s. Wikipedia has a little on the tests involved, but it’s best to go to the source for full information.
One story I didn’t talk about is about the Delphi PRNG (seeded through the Randomize() procedure and accessed using the Random() function) and how it meant that an online poker game could compromised, despite the fact that its output passes Knuth’s tests. In one sense the output is random, and in another it’s completely predictable. It’s a salutary story and you read up on it here.
This article first appeared in issue 290, January 2010.
You can download the PDF here.