Algorithm of the Week: Modeling Complex Cause and Effect with Bayesian Networks
What problem am I trying to solve?
Let’s not be too tool oriented. Let’s start from the problems I hope to solve and I’ll offer up a couple solutions. Of course one of them will be what I discuss in detail here.
 Risk modeling
 Decision making
 Rigorous intuition
Let’s use an example that’s fairly well known with a very certain right answer: The Monty Hall problem. In the Monty Hall problem we are presented with three doors (red, green, and blue). Monty Hall says there’s a prize behind one of them (a new laptop of your choosing!) and the other two have goats. I know what you’re thinking: Monty Hall’s a fool because a pet goat would be the best! Let’s pretend we don’t want a goat at all (use your imagination). Let’s pretend we REALLY want that new laptop. Ok? OK.
So Monty Hall asks you to pick the door you think your laptop is behind. Let’s say you choose the green door #BecauseMoney. Well, now Monty Hall is going to work at mind f***ing you. He opens one of the doors you didn’t pick and shows you that it’s got a goat. What I wouldn’t give for a goat right now. Anyway. So now you’ve got the original door you picked and Monty Hall has opened the one of the two other doors and shown it’s a goat. Now shit gets real. He offers you a choice.
You can either open your door OR the other door that remains closed. Based on what you know, what’s the best decision?
Turns out that we can model this with a Bayesian Network. Before I explain how, let me explain what I mean by a Bayesian Network first.
You can skip directly to the executable code by going to my github repo for this here: Bayesian Networks Git Repo
What is a Bayesian Network?
A Bayesian Network is a graphical model of events, their probabilities, and how they affect one another. By graphical model I don’t mean you use pictures and crayons. A graphical model is one that consists of a series of entities (typically referred to as nodes) and lines (aka edges) that connect them. It turns out graphical models are AMAZING for mechanically reasoning about relationships between nodes. An example most people connect with is that of Facebook seeing you and your friends as nodes in a graph and your friendships as edges between you all.
In this case we have different events and they are connected by the fact that some events happening give us information about the likelihood of other events happening. Let’s relate this back to the Monty Hall problem.
How would I use this?
So now that we have a very general idea about what a Bayesian Network is, let’s see some high level code that sets up the network and evaluates it.
First we need to create our Bayesian Nodes. Think of these as the possible events that might happen.
door_picked_node = BayesianNode('door_picked', door_picked_table) prize_behind_node = BayesianNode('prize_behind', prize_behind_door_table) shown_empty_node = BayesianNode('shown_empty', shown_empty_table) win_prize_node = BayesianNode('win_prize', win_prize_table) door_after_choice_node = BayesianNode('door_after_choice', door_after_choice_table) switch_node = BayesianNode('switch_or_stay', switch_table) informed_agent_node = BayesianNode('informed_agent', informed_agent_table)
For now, don’t be concerned with where the corresponding tables are coming from. I just want you to see how the concepts are put into action.
Now that we have all of the possible events created, we need to articulate how they’re related. We do that with code that looks like this:
door_picked_node.affects(shown_empty_node) prize_behind_node.affects(shown_empty_node) shown_empty_node.affects(door_after_choice_node) door_picked_node.affects(door_after_choice_node) door_after_choice_node.affects(win_prize_node) prize_behind_node.affects(win_prize_node) switch_node.affects(door_after_choice_node)
What’s happening here? Hopefully it reads pretty clearly but in case it doesn’t, let me explain in different words. Here we’re basically saying “When this event happens, it tells us something about how likely it is for this other event to happen or not.” Here’s a more readable version of the above:
door_picked_node.affects(shown_empty_node) # The contestant picking a door tells us something # about which doors Monty can show as empty since Monty # won't show us what's behind our own door ever. prize_behind_node.affects(shown_empty_node) # Knowing which door has a prize behind it tells us something # about which door Monty might choose to show as empty since # he would never show us the door with the prize behind it. shown_empty_node.affects(door_after_choice_node) # Showing the empty door affects our choices in doors since # there's now one less door to choose to switch to. door_picked_node.affects(door_after_choice_node) # The door we originally picked affects the door we might pick # once we're given the choice to switch doors. door_after_choice_node.affects(win_prize_node) # The door we decide to choose after being shown the empty door # affects whether or not we win the prize. The reason for this # is... well. If we chose the winning door, we'll win and if we # didn't, we won't. prize_behind_node.affects(win_prize_node) # Which door the prize is behind affects whether or not we # win the prize. switch_node.affects(door_after_choice_node) # Whether we switch doors or keep our original door affects the # door we have.
Some of these events and their relationships probably seem uselessly obvious. If you’re thinking that, you’re totally correct. This example is very contrived and somewhat over complicated to show how a network of several different beliefs works. By over complicating a simpler problem it’s easier for us to know if we’re getting the right answers at each step of the process.
Now that we’ve connected all of these events/nodes together we can actually begin to see what happens as the events unfold. We start by evaluating the probabilities of all of the events prior to doing anything:
Results:

Door picked
 blue: 0.3333
 green: 0.3334
 red: 0.3333

Prize behind door:
 blue: 0.3333
 green: 0.3334
 red: 0.3333

Door shown empty:
 blue: 0.33333333333333337
 green: 0.33333333333333337
 red: 0.33333333333333337

Win prize:
 False: 0.6666666666666667
 True: 0.3333333333333333

Updated door choice:
 blue: 0.3333333333333333
 green: 0.3333333333333333
 red: 0.3333333333333333

Switch or stay:
 switch: 0.5
 stay: 0.5
Run through those and make sure they match up with your reasoning.
Remember above how we walked through the set up of the game and we had a choice to make? Let’s get there by updating our Bayesian Network:
door_picked_node.given('red') shown_empty_node.given('green')
This is the result of running these two updates through our network:

Door picked
 blue: 0.0
 green: 0.0
 red: 1.0

Prize behind door
 blue: 0.6666666666666666
 green: 0.0
 red’: 0.3333333333333333

Door shown empty
 blue: 0.0
 green: 1.0
 red: 0.0

Win prize
 False: 0.5
 True: 0.5

Updated door choice
 blue: 0.5
 green: 0.0
 red: 0.5

Switch or stay
 switch: 0.5
 stay: 0.5
Now something interesting has happened. Look through the results. Some of the events are more than obvious. For example, Door picked is 100% red. Well duh. Yeah. We made that choice. That was a given, no biggie. Our choice in doors now is 50/50 since one of the doors has been opened (the green door which is verified by the door shown empty node).
One other thing… The probability of the prize being behind the blue door is .66666 which is double the probability of the prize being behind our own red door. Also! Note how our chance to win the prize is just 50/50. Why would our probability of winning just be 50/50 if we now know that the prize is probably behind the other door? Because we never modeled anything about our likelihood to make optimal choices. I would expect however that if we reran these numbers given that we switched doors, we’d see that “Win prize” probability jump up to match the probability that the prize is behind the blue door.
I reran it. Given that we switch our choice, our probability of choosing the blue door is 1 and our probability of winning a prize is now .66666!
And that’s all there is to calculating probabilities with the model. For whatever event that has happened, grab the appropriate node, update its beliefs by giving it a given and the rest of the nodes update in response.
The real magic in all of this isn’t in how each node communicates. This is just a directed acyclic graph. Information can be passed from one node to every node downstream with a method call. No.
Instead the real magic is in the joint probability tables that are hidden away within each node.
Joint Probability Tables, Bayes, and You
A quick refresher on what a joint probability table is. Here’s a simple table with two variables
Door Picked  Winning Door  Probability 

Red  True  0.11111 
Red  False  0.22222 
Blue  True  0.11111 
Blue  False  0.22222 
Green  True  0.11111 
Green  False  0.22222 
This is a table that shows all possible combinations of events in the game. All possibilities of the doors you could pick along with whether or not you might win. Take the first row as an example. It is read as, “Of all possibilities including the door picked event and a door being picked the winner there is an 11.11% chance the red door was chosen AND the red door is the winning door.” The next row tells us that it’s twice as likely that the red door loses though.
Conditional probabilities work by saying something like: “Given that the door picked is red find the probability of having a winning door.” When we do that, we take the table above and limit it to just the rows with a red door picked like this:
Door Picked  Winning Door  Probability 

Red  True  0.11111 
Red  False  0.22222 
Next we “normalize” the table so that all the rows add up to 100% again.
Door Picked  Winning Door  Probability 

Red  True  0.33333 
Red  False  0.66666 
Now we can say “Given a red door is picked, there is a 33.33% chance it is a winning door.” You might wonder how I did the normalization. All I had to do was take my new table and divide each value by the sum total of all of the values. For the first row that would be .11111 / .33333 which is .33333.
You could also say “What is the probability of choosing a winning door in general?” In this case we would just forget about the “Door Picked” column and then sum up all of the probabilities for each unique value of “Winning Door”. This would be that table:
Step 1:
Winning Door  Probability 

True  0.11111 
False  0.22222 
True  0.11111 
False  0.22222 
True  0.11111 
False  0.22222 
Step 2:
Winning Door  Probability 

True  0.33333 
False  0.66666 
Now we can say The probability of choosing the winning door is 33.33%. That’s all there is to conditional probabilities. It’s all about choosing a subset of possibilities and normalizing your probabilities across them. It’s easy to do these… as long as the table is small. When you get to a table with three different variables with 3 possible values each however… now you’re talking 27 rows to fill out. And it gets much worse MUCH faster as each variable added grows the row count exponentially.
To deal with this, I made a class in Python to do the calculations for me. I call it “JointProbabilityTable”. CREATIVITY IN MOTION!
Processing Large Joint Probability Tables
Before we process a large joint probability table let me introduce you to how the code is set up.
door_picked_table = JointProbabilityTable( columns=['door_picked'], data = [ ['red', 0.3333], ['blue', 0.3333], ['green', 0.3334], ])
This code represents the joint probability table for which door is picked. The table has one variable called “door_picked” and its rows of data are listed below it. You might noticed we have one column named in “columns” but two present in each row. The right most column is for our probabilities. Since it will always exist it isn’t necessary to specify it explicitly.
This table isn’t very interesting since it’s really just a list of probabilities with no interesting way of rolling them up. Let’s move on to a REAL table.
shown_empty_table = JointProbabilityTable( columns=['door_picked', 'prize_behind', 'shown_empty'], data = [ ['red', 'red', 'red', 0.0], ['red', 'red', 'green', 0.5], ['red', 'red', 'blue', 0.5], ['red', 'green', 'red', 0.0], ['red', 'green', 'green', 0.0], ['red', 'green', 'blue', 1.0], ['red', 'blue', 'red', 0.0], ['red', 'blue', 'green', 1.0], ['red', 'blue', 'blue', 0.0], ['green', 'red', 'red', 0.0], ['green', 'red', 'green', 0.0], ['green', 'red', 'blue', 1.0], ['green', 'green', 'red', 0.5], ['green', 'green', 'green', 0.0], ['green', 'green', 'blue', 0.5], ['green', 'blue', 'red', 1.0], ['green', 'blue', 'green', 0.0], ['green', 'blue', 'blue', 0.0], ['blue', 'red', 'red', 0.0], ['blue', 'red', 'green', 1.0], ['blue', 'red', 'blue', 0.0], ['blue', 'green', 'red', 1.0], ['blue', 'green', 'green', 0.0], ['blue', 'green', 'blue', 0.0], ['blue', 'blue', 'red', 0.5], ['blue', 'blue', 'green', 0.5], ['blue', 'blue', 'blue', 0.0], ])
Now then. 32 rows of awesomeness. This joint probability table describes our universe of possibilities in terms of three events that effect one another: Which door was picked, which door the prize is behind, and which door was shown empty. You might notice that this table doesn’t add up to 100%. The JointProbabilityTable class normalizes my tables for me on creation so they are easier for me to reason about. How so? It allowed me to break this larger table up into groups of three mentally.
Let’s look at the first three rows of data. They should be interpreted as “Given the red door is picked and the prize is behind the red door, what’s the probability for each door that it will be shown as empty?”
So for the first row we want to know the probability the red door will be shown as empty. Well. It has the prize behind it so there’s no chance Monty is gonna open that door up. Notice the probability to the right of that first row is 0. So there’s two other doors and I have no other information so I can say the other two doors have a 50/50 chance of being opened by Monty.
Let’s do one more joint probability table. This one will be unrelated to our Monty Hall problem but it will show some of the power of this technique. We’ll use the classic medical test example.
Let’s say you have blood work done and your results come back positive for the disease Bozorhea. 1,219 people were included in the study for the disease. 61 people had the disease and tested positive, 14 had the disease and tested negative, 14 did not have the disease and tested positive, with 1130 people not having the disease and testing negative. What is the probability that you actually have the disease given that you tested positive?
First we start with an unnormalized table that just lists the facts:
Has Disease  Tested Results  Number of people 

True  True  61 
True  False  14 
False  True  14 
False  False  1130 
Next we normalize the table to get probabilities
Has Disease  Tested Results  Number of people 

True  True  .0500 
True  False  .0115 
False  True  .0115 
False  False  .9270 
Let’s interpret this table. How many people have the disease regardless of their test results? Add up all of the rows with “Has Disease” equal to true. So 6.15% do. Leaving 93.85% not having the disease.
Now let’s answer the question. Given that we tested positive, what’s our probability of having the disease? That’s this table:
Has Disease  Tested Results  Number of people 

True  True  .0500 
False  True  .0115 
Next we need to renormalize the probabilities. When we do that we get this table:
Has Disease  Tested Results  Number of people 

True  True  .8130 
False  True  .1870 
The probability that you have the disease is 81.3%. It’s likely, but far from certain.
Updating Beliefs in Joint Probability Tables
Back to our Monty Hall example, we saw that updating our Bayesian Network with information about events having happened it affected the probability of many other events in the network. How does this work?
When we mark an event as a given and effectively say it is now certain to have happened the probabilities of the associated events change as well. We saw this above when we picked the red door and then the green door was shown to us as empty. Again all of this follows what we’ve already covered above. Except now these new beliefs are passed between events and applied where ever it makes sense.
How do I do this generally?
def update_belief(self, event_name, new_beliefs): current_beliefs = self._get_current_beliefs_for_event(event_name) belief_shifts = self._get_belief_shifts(current_beliefs, new_beliefs) event_column_index = self._columns.index(event_name) new_table = self._clone_data() for row in new_table: row[1] = row[1] * belief_shifts[row[event_column_index]] return JointProbabilityTable(self._columns, new_table)
 We only update beliefs for one event in a node at a time.
 The new_beliefs parameter that is passed in is a list of every value for the given event and the new probabilities.
 Get the current beliefs ONLY for the event we’re updating. This is equivalent to summing up all of the probabilities and grouping them by the different event values for this specific event we’re updating.
 Now we know what the probabilities were and we know what they should be updated to we can calculate the percentage by which to adjust each probability in the node’s joint probability table.
 We go through every row in the underlying joint probability table and adjust it by the amount specified by which event value the row in the table refers to.
 I tried to use immutability when possible so I create a brand new joint probability table and return that with the updated data.
Digging a little deeper this is specifically how I’m calculating the belief shifts:
def _get_belief_shifts(self, current_beliefs, new_beliefs): belief_shifts = {} for event_value in new_beliefs: updated_probability = new_beliefs[event_value] current_probability = current_beliefs[event_value] if current_probability != 0: probability_shift = updated_probability / current_probability else: probability_shift = 0 belief_shifts[event_value] = probability_shift return belief_shifts
Tying It All Together
Finally we can come back to talking about the network itself and how I’m propagating the information throughout the network. I’m not currently doing this is what I consider a perfectly correct way… just good enough to communicate the idea. On certain networks you might find my propagation technique does not work.
The magic happens in the “given” method:
class BayesianNode: def __init__(self, name, joint_probability_table): self._name = name self._original_joint_probability_table = joint_probability_table self._joint_probability_table = joint_probability_table self._affects_nodes = [] self._affected_by = [] self._known = False def affected_by(self, other_node): self._affected_by.append(other_node) def affects(self, node): self._affects_nodes.append(node) node.affected_by(self) def _forward_propagate(self, joint_probability_table): self._joint_probability_table.update_applicable_beliefs(self._name, joint_probability_table) for affected_node in self._affects_nodes: affected_node._forward_propagate(self._joint_probability_table) def _backward_propagate(self, joint_probability_table): self._joint_probability_table.update_applicable_beliefs(self._name, joint_probability_table) for affected_node in self._affected_by: affected_node._backward_propagate(self._joint_probability_table) def given(self, value): if not self._known: self._joint_probability_table = self._joint_probability_table.given(self._name, value) self._known = True jpt = self._joint_probability_table.clone() for affected_node in self._affects_nodes: affected_node._forward_propagate(jpt) for affected_node in self._affected_by: affected_node._backward_propagate(jpt) for affected_node in self._affects_nodes: affected_node._backward_propagate(jpt) for affected_node in self._affected_by: affected_node._forward_propagate(jpt) def probability(self): return self._joint_probability_table.probability(self._name) def __str__(self): return str(self._joint_probability_table)
The whole purpose of this class is to model the nodes in our bayesian network and how they relate to one another. Since this object keeps track of which nodes affect which it’s the appropriate place to control the propagation of updated beliefs. Because of this, when we mark a node as having a certain event value being given we can update the node and send the new joint probability table to each other node to update their beliefs with.
Notice that I need to set “self._known” to true. This is so a node doesn’t propagate into itself ad infinitum.
In future posts, I’ll use this code to solve a more complex conditional probability problem so we can see that this works even on something a bit less contrived. In the meantime, if you’ve got any comments or questions post them below!
Again, here’s the code if you’d like to look at it a bit more in depth: Bayesian Networks Git Repo
Thanks for reading.
(Note: This article and the opinions expressed are solely my own and do not represent those of my employer.)
{{ parent.title  parent.header.title}}
{{ parent.tldr }}
{{ parent.linkDescription }}
{{ parent.urlSource.name }}