# From Simpson's Paradox to Pies

Today, I wanted to publish a post on economics and decision theory. And probability too… Those who follow my blog should know that I am a big fan of Simpson’s paradox. I also love to mention it in my econometrics classes. It raises important questions, which I relate to multicollinearity and to the interpretation of regression models with multiple (negatively correlated) explanatory variables. This paradox has amazing pedagogical virtues. I have mentioned it several times on this blog (I should probably mention that I discovered this paradox via Marco Scarsini, who taught me a lot of things, in decision theory and in probability). For those who do not know this paradox, here is an example that Marco gave in one of his talks, a few years ago. Consider the following statistics for healthy people admitted to two hospitals:

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 600 | 590 | 10 | 98% |
| hospital B | 900 | 870 | 30 | 97% |

while, for sick people admitted to the same hospitals,

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 400 | 210 | 190 | 53% |
| hospital B | 100 | 30 | 70 | 30% |

So, whatever your health situation, you should choose hospital A. Now, if we aggregate,

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 1000 | 800 | 200 | 80% |
| hospital B | 1000 | 900 | 100 | 90% |

i.e. without any doubt, people should choose hospital B.
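The reversal in the three tables above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the helper names `rate` and `pooled_rate` are illustrative choices, and the counts are simply copied from the tables. Exact fractions are used so that the comparisons do not depend on rounding.

```python
from fractions import Fraction

# (survivors, total) per hospital and patient group, copied from the tables above.
data = {
    "A": {"healthy": (590, 600), "sick": (210, 400)},
    "B": {"healthy": (870, 900), "sick": (30, 100)},
}


def rate(survivors, total):
    """Survival rate within one group, as an exact fraction."""
    return Fraction(survivors, total)


def pooled_rate(groups):
    """Survival rate once all groups of one hospital are aggregated."""
    survivors = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return Fraction(survivors, total)


# Within each group, hospital A does better...
for group in ("healthy", "sick"):
    a = rate(*data["A"][group])
    b = rate(*data["B"][group])
    print(f"{group:8s} A: {float(a):.1%}  B: {float(b):.1%}  A better: {a > b}")

# ...but after aggregation, hospital B does better.
a_all, b_all = pooled_rate(data["A"]), pooled_rate(data["B"])
print(f"{'all':8s} A: {float(a_all):.1%}  B: {float(b_all):.1%}  A better: {a_all > b_all}")
```

The mechanism is visible in the totals: hospital A treats 400 sick patients out of 1000, hospital B only 100 out of 1000, so the pooled rates weight the two groups very differently.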

Actually, Simpson’s paradox is called Simpson’s paradox because Colin Blyth named it that way in 1972, in his paper entitled On Simpson’s paradox and the sure-thing principle (an economic article in a statistical journal), which can be downloaded from http://www.stat.cmu.edu/~fienberg/…. He found this paradox in a paper published in 1951 by Edward Simpson, even if other papers had actually mentioned it earlier. The most popular application is probably admission to Berkeley’s graduate programs, and sex bias; see Bickel, Hammel & O’Connell (1975), which can be downloaded from http://www.unc.edu/~nielsen/…. I also mentioned a geometric interpretation of this paradox a few years ago on my blog, which is so simple to understand that the paradox is no longer a paradox actually, since on the example above, we had

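$$\frac{590}{600} > \frac{870}{900}$$

and

$$\frac{210}{400} > \frac{30}{100}$$

while

$$\frac{590+210}{600+400} < \frac{870+30}{900+100}$$

With symbolic notations, one can have at the same time

$$\frac{a_1}{b_1} > \frac{c_1}{d_1}$$

and

$$\frac{a_2}{b_2} > \frac{c_2}{d_2}$$

with also

$$\frac{a_1+a_2}{b_1+b_2} < \frac{c_1+c_2}{d_1+d_2}$$

as shown on the graph below

[Figure: geometric interpretation of Simpson’s paradox]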

There should be a connection between Simpson’s paradox and the ecological fallacy (an issue I recently discovered and find extremely interesting, related again to the difficulty of interpreting regressions). But that’s another story. My point today is that Colin Blyth mentioned another nice paradox, related this time to stochastic orderings. The idea is the following. Consider the following three spinners (imagine an arrow in each circle):

- spinner A: no matter where the arrow stops, the gain is 3,
- spinner B: a 56% chance of gaining 2, a 22% chance of gaining 4, and a 22% chance of gaining 6,
- spinner C: a 51% chance of gaining 1, and a 49% chance of gaining 5.

Instead of spinners, it is also possible to consider three different lotteries.

You play against a friend: you pick a spinner, while the friend picks another. Each player flicks his arrow, and the highest number wins (no matter the difference). Let us compute the odds. First case, A against B, from A’s perspective:

| | B-2 | B-4 | B-6 |
|---|---|---|---|
| A-3 | 56% +1 win | 22% -1 lose | 22% -3 lose |

In that case, A has a 56% chance of beating B. Second case, A against C, from A’s perspective:

| | C-1 | C-5 |
|---|---|---|
| A-3 | 51% +1 win | 49% -2 lose |

In that case, A has a 51% chance of beating C.

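Third case, B against C, from B’s perspective:

| | C-1 | C-5 |
|---|---|---|
| B-2 | 28.56% +1 win | 27.44% -3 lose |
| B-4 | 11.22% +3 win | 10.78% -1 lose |
| B-6 | 11.22% +5 win | 10.78% +1 win |

In that case, B has a 28.56% + 11.22% + 11.22% + 10.78% = 61.78% chance of beating C.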

So, to summarize,

- A is the best choice, since it beats each of the other two with more than a 50% chance,
- C is the worst choice, since it is beaten by each of the other two with more than a 50% chance.
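The pairwise odds above can be reproduced by enumerating every pair of outcomes. The following Python sketch assumes that the two spins are independent; the names `SPINNERS` and `p_beats` are hypothetical helpers introduced for illustration, and the probabilities are copied from the spinner descriptions.

```python
from fractions import Fraction
from itertools import product

# Each spinner as {gain: probability}, copied from the spinner descriptions.
SPINNERS = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def p_beats(x, y):
    """Probability that spinner x shows a strictly higher number than spinner y,
    assuming the two spins are independent."""
    return sum(
        px * py
        for (gx, px), (gy, py) in product(SPINNERS[x].items(), SPINNERS[y].items())
        if gx > gy
    )


for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(f"P({x} beats {y}) = {float(p_beats(x, y)):.2%}")
# P(A beats B) = 56.00%
# P(A beats C) = 51.00%
# P(B beats C) = 61.78%
```

No two spinners share a value, so ties never occur and each pairwise probability and its complement sum to one.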

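Now suppose that three players each pick a different spinner, and that the highest number still wins. Let us compute the odds, one more time. First case, A against B and C, from A’s perspective: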

| | B-2 C-1 | B-2 C-5 | B-4 C-1 | B-4 C-5 | B-6 C-1 | B-6 C-5 |
|---|---|---|---|---|---|---|
| A-3 | 28.56% +1 win | 27.44% -2 lose | 11.22% -1 lose | 10.78% -2 lose | 11.22% -3 lose | 10.78% -3 lose |

In that case, A has a 28.56% chance of beating both B and C.

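Second case, B against A and C, from B’s perspective:

| | A-3 C-1 | A-3 C-5 |
|---|---|---|
| B-2 | 28.56% -1 lose | 27.44% -3 lose |
| B-4 | 11.22% +1 win | 10.78% -1 lose |
| B-6 | 11.22% +3 win | 10.78% +1 win |

In that case, B has an 11.22% + 11.22% + 10.78% = 33.22% chance of beating both A and C.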

Third case, C against A and B, from C’s perspective:

| | A-3 B-2 | A-3 B-4 | A-3 B-6 |
|---|---|---|---|
| C-1 | 28.56% -2 lose | 11.22% -3 lose | 11.22% -5 lose |
| C-5 | 27.44% +2 win | 10.78% +1 win | 10.78% -1 lose |

In that case, C has a 27.44% + 10.78% = 38.22% chance of beating both A and B. So, if we try to summarize, this time

- C is the best choice, since it has (strictly) more than a 1/3 chance of winning, which is the highest probability,
- A is the worst choice, since it has (strictly) less than a 1/3 chance of winning, which is the lowest probability.

Odd, isn’t it? Now, is there an interpretation of that paradox? Yes. Martin Gardner, in his paper on induction and probability, mentioned the case of drug testing. The value we had with the spinner is the health level, rated from 1 to 6. Thus, taking drug A, you always get an average health level of 3. With drug C, on the other hand, you get either very sick (level 1) or very well (level 5). Consider now a doctor who wants to maximize the patient’s chance of being well. If only pills A and C are available, then the doctor should choose A. This is what we’ve seen in the first part. Assume now that a company delivers a third pill, called drug B. Then the doctor should find C more interesting… Odd, isn’t it?
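The doctor’s problem can be sketched as a small function that takes the set of available options and returns, for each one, the probability of giving the strictly highest level. This is only an illustration under the assumption that all outcomes are independent; `SPINNERS` and `p_best` are hypothetical names, and the probabilities are those of the three spinners.

```python
from fractions import Fraction
from itertools import product

# Each option as {level: probability}, copied from the spinner descriptions.
SPINNERS = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def p_best(menu):
    """For each option in `menu`, the probability that it gives the strictly
    highest level when all options in `menu` are drawn independently."""
    names = list(menu)
    wins = {name: Fraction(0) for name in names}
    for outcome in product(*(SPINNERS[n].items() for n in names)):
        levels = [level for level, _ in outcome]
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        top = max(levels)
        if levels.count(top) == 1:  # ties are ignored (none occur here)
            wins[names[levels.index(top)]] += prob
    return wins


for menu in (["A", "C"], ["A", "B", "C"]):
    wins = p_best(menu)
    best = max(wins, key=wins.get)
    shown = {name: f"{float(p):.2%}" for name, p in wins.items()}
    print(menu, shown, "-> pick", best)
# ['A', 'C'] {'A': '51.00%', 'C': '49.00%'} -> pick A
# ['A', 'B', 'C'] {'A': '28.56%', 'B': '33.22%', 'C': '38.22%'} -> pick C
```

With only A and C on the menu, A has the best chance; once B is added, A only wins when both B and C come out low, and C takes the lead.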

Colin Blyth gave a more amusing application. Assume that you like to go to the restaurant, and that you like to get a dessert there. Dessert A, the apple pie, is the average one, with a standard level that you rank 3 (on a scale from 1 to 6). Dessert C, the cheesecake, can be either awful (ranked 1) or delicious (ranked 5). You’d better go for the apple pie if you want to maximize the probability of not being disappointed (i.e. maximizing your “best chance” according to Colin Blyth, but I guess it can be interpreted as regret minimization too). Now assume that dessert B, the blueberry pie, is available (with ranks given by the spinner). Then you should go for the cheesecake. I let you imagine the discussion that you can then have with your favorite waitress:

- Hi Mr Freakonometrics, do you want a piece of apple pie? (yes, actually she also comes frequently on my blog, and knows me from my pseudonym…)

- Probably. But actually, I was wondering if you have your blueberry pie today?

- Yes, in fact we do…

- Great, in that case, I’ll go for the cheesecake.

She’ll probably think that I am a freak… so I hope she’ll come and read my post, to understand that, actually, it does make a lot of sense to go for what was supposed to be my worst choice.

Published at DZone with permission of the author, a DZone MVB.
