Lesson two: Let's get gambling!
What we're actually going to be doing today is applying something called the “Monte Carlo method” to estimate the value of Π, which is a Greek letter called “pi.” (You pronounce this, in English, with the same pronunciation as “pie,” so when someone says “It's as easy as pie!” what they mean is that “it's as easy as the ratio of the any circle's circumference to its diameter,” or perhaps that “it's as easy as the ratio of a circle's area to the square of its radius.”Doesn't sound so easy when you put it this way, does it?)
We used a lot of neat (and new) terms there. First, let's go back to the beginning, and define things.
A circle is a figure, a drawing composed of all of the points that are the same distance from another point. In other words, if you threw a dart at a wall, and then drew a lot of dots that were exactly two inches from where that dart hit the wall, you'd have something that looked like a circle. (Theoretically, it's impossible to have a perfect circle because there's no way you can draw every possible point on the circle.)
A circle's circumference is a measurement of the length of a line composed of the circle's points. In other words, if you measured the amount of space on a circle's curve, you'd have the circle's circumference.
A circle's diameter is a measurement of the distance between any two points on opposite ends of a circle – in other words, how far the circle is across. A circle's radius is half that distance. If the radius is r, the diameter (d) is always 2r. (If this isn't the case, then you don't have a circle! It's hard to say what you'd actually have, but “a circle” isn't it.)
We've already covered squares, but the “square” of a number is the result of that number times itself - “two squared” is four, three squared is nine, four squared is sixteen, etc. (A “square root” is the other way: the square root of sixteen is four, the square root of nine is three, the square root of four is two, the square root of two is obviously – and roughly - 1.41421356.)
Now that we can pretend with quite a bit of accuracy that we know what a circle is (“It's round, right?”) we can move on to more fun things.
We care about pi because it ... it... teachers use it a lot. Actually, it is a fairly odd number because it shows up in physics quite a bit. Scratch reality enough to get to the mathematics underneath, and you'll find pi1 lurking not far beneath the surface. Looking at Einstein? Pi shows up. Heisenberg? Pi is there, making sure we're uncertain. Coulomb? Kepler? Pi shows up in their most important works.
All irrelevant for what we're trying to do today, though.
All we want to do today is determine pi, within a certain range. We can't actually work out pi all the way, because one of the odd things about pi is that it's irrational, which means it goes on forever and ever. Pi has been worked out to more than a trillion digits. It's safe to assume that it goes on for a few more trillion after that. After a while, you get so much precision that it doesn't matter – for example, according to Wikipedia, a value of pi truncated to thirty-nine decimal places is sufficient to compute the circumference of any circle that fits in the observable universe to a precision comparable to the size of a hydrogen atom. That's... remarkably accurate, to say the least.
We can, however, fake pi to enough accuracy to make ourselves happy, and in the process, we learn a heck of a lot about simulations.
The “Monte Carlo method” is one of the ways that we can sort of estimate what the value of pi is. One way of explaining this is that if you drew a square on the ground, and then a circle as large as possible in that square, and lastly, scattered dots in it evenly, then the number of dots in the circle would, multiplied by four, divided by the number of dots in the square would get closer and closer to pi as the number of dots increased.
Wow. That was almost – almost – as clear as mud.
First off, it's really important that we remember our Pythagorean theorem from the first lesson. We use that to determine the distance of point A to point B: the distance between A and B, expressed as C, can be expressed as such:
We even wrote some Python to help us determine the length of the hypotenuse – the c in the equation. We'll be re-using that code.
Now, first off, what we need to do is work out a process for the determination of pi.
We need our triangle.py from last lesson, which looked like this:
def hypotenuse(a, b):
We'll also need to import another module, called “random.” This python module is used to generate pseudo-random numbers, as you might have guessed from the name.
Yikes! What does “pseudo-random” mean? Well, let's spend just a bit of time in what randomness is, so that we understand what we're about.
Randomness is a quality of unpredictability. That means that if we say “give us a random number between one and ten,” we expect that we shouldn't be able to guess what the result is (more than, say ten percent of the time – we're going to be right at some point!)
That said, randomness has some odd properties that don't seem obvious on first glance. For example, consider this sequence of numbers from one to ten: 9, 9, 9, 9, 9, 9. It doesn't look random – looks like a sequence of nines, over and over again. It's fair to say that, assuming this pattern holds, it's not random – but it could be! After all, if we define “random” as being unable to give us the same result as a prior request, then we're limiting what randomness is!
There are ways to determine actual randomness, but they involve quite a bit of work, and at the end of them, from the observer's point of view, you still can't be sure they're random. (Whoa, I think we just paraphrased Heisenberg, in a roundabout, vague, sort of random way.)
Computers are generally terrible at true randomness. What most “random number generators” for programming languages do is use what's called a “seed,” a way of generating an initial random number – based on, say, the current time or something like that – and then alter that seed as new random numbers are generated. If you know the seed in use, you can actually predict what random numbers will be generated.
That sounds like an invitation to code. We've talked an awful lot about math: let's get codin' some so we can show off our randomness – or lack of it.
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> for x in range(5):
... print random.random()
>>> for x in range(5):
... print random.random()
As usual, a few observations.
We still need to “import random” because this is a Python module, and we have to tell Python we're planning on using it. Two methods in the “random” module interest us: the random.seed() method and the random.random() method.
The first – random.seed() - allows us to set the seed to a specific number. We don't need to call this at all, normally, but one of the things about randomness is unpredictability – and we're showing here that random isn't quite so random!
By setting the seed, we can guarantee predictability, which means we can run tests over and over again without worrying that our test results change because our number sources change.
The second call – random.random() - gives us a number between 0.0 and 1.0. The Python documentation says exactly that, which is a little disappointing – most random number generators give us a number in the range such that 0.0 is part of the range, and 1.0 is not. The normal way to see this is “0.0 ? random < 1.0”, where the line under the “<” means “less than or equal to” and the “<” without the underline means “less than.” Based on the Python documentation, I'm not sure if random.random() can actually generate “1.0” as a result, and I have a feeling I could test for quite some time before getting that number at random, so to speak.
The next odd thing is the “for x in range(5):” bit. This is a control structure, which says something like this: “assign x to every number between 0 and 5” - and since we took the time to specify ranges for random number generators, let's use our newfound knowledge: x will be in the following range: 0 ? random < 5. (To be specific: 0,1,2,3,4 will be the values of x.)
Now, just assigning x those numbers is all good and stuff, but the colon in the expression means the value of x can be used in a “block,” in this case the “print random.random()” in the code example.
With all that aside, running the blocks twice should show us the exact same results – if we were looking at a truly random number generator, we'd not be able to get that.2
So how does this affect our pi? Well... the problem here is that an actual estimation of pi relies on randomness. A good bit of it. We can get close, but we need to keep in mind that this is an estimation of pi, and nowhere near an actual value. We might get a number that's very very very far from the actual value of pi – or very close. It's going to be a bit random, so to speak.
Let's just say that Monte Carlo methods are great for some things, but this is a lesson on simulations and estimations – and calculating pi with Monte Carlo is fun, but not reliable.
The ... interesting thing is, however, that four pages in, we're ready to estimate pi! What's more, with all this handy-dandy knowledge under our belts (or sashes, or robes, or ... what-have-you) doing the actual estimation is really easy!
Remember, we described the Monte Carlo method for pi: “if you drew a square on the ground, and then a circle as large as possible in that square, and lastly, scattered dots in it evenly, then the number of dots in the circle would, multiplied by four, divided by the number of dots in the square would get closer and closer to pi as the number of dots increased.”
We also said that was almost as clear as mud. Let's write some code to make it clearer – the code to run the entire Monte Carlo method is actually shorter than the explanation!
Remember, we need to have our triangle.py from lesson one:
def hypotenuse(a, b):
Our montecarlo.py script, first try:
Let's watch this puppy in action:
$ python montecarlopi.py
This isn't quite 3.14159... but it's sort of close! (For varying definitions of “sort of” and “close.” Even 22/7 – a poor approximation of pi, indeed – is 3.1428.)
That said, if we take that “1000” in the code, which means “take 1000 samples” and increase it, we get more and more accurate. Changing to 100000 instead of 1000 gives us a value of 3.14024 – still not as good as we'd like, but we are already more accurate than 22/7!
Basically, the higher and higher we go in our sample count, the better our result of pi will be. This is the weakness of the Monte Carlo method, but it's also something that teaches us about sample frequency – in math, as in music, more samples give us better data. It's not perfect – using 10000000 samples gives us a result of 3.1413076 – but you can see that we're getting closer, and closer, and closer.