From Simpson's Paradox to Pies
Today, I wanted to publish a post on economics and decision theory. And probability too… Those who follow my blog should know that I am a big fan of Simpson’s paradox. I also love to mention it in my econometrics classes. It raises important questions, which I relate to multicollinearity and to the interpretation of regression models with multiple (negatively correlated) explanatory variables. This paradox has amazing pedagogical virtues. I have mentioned it several times on this blog (I should probably mention that I discovered this paradox via Marco Scarsini, who taught me a lot of things, in decision theory and in probability). For those who do not know this paradox, here is an example that Marco gave in one of his talks, a few years ago. Consider the following statistics, when healthy people entered some hospital
hospital  total  survivors  deaths  survival rate 
hospital A  600  590  10  98% 
hospital B  900  870  30  97% 
while, when sick people entered the same hospitals
hospital  total  survivors  deaths  survival rate 
hospital A  400  210  190  53% 
hospital B  100  30  70  30% 
Somehow, whatever your health situation, you should choose hospital A. Now, if we aggregate
hospital  total  survivors  deaths  survival rate 
hospital A  1000  800  200  80% 
hospital B  1000  900  100  90% 
i.e. without any doubt, people should choose hospital B.
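The reversal can be checked directly from the counts in the three tables. The following is only a minimal Python sketch: the dictionary layout and the helper names `survival_rate` and `aggregate_rate` are illustrative choices of mine, not something from the original talk.

```python
# Counts copied from the tables above: (total, survivors) per hospital and group.
counts = {
    "A": {"healthy": (600, 590), "sick": (400, 210)},
    "B": {"healthy": (900, 870), "sick": (100, 30)},
}


def survival_rate(total, survivors):
    """Survival rate within a single group of patients."""
    return survivors / total


def aggregate_rate(groups):
    """Survival rate once healthy and sick patients are pooled together."""
    total = sum(t for t, _ in groups.values())
    survivors = sum(s for _, s in groups.values())
    return survivors / total


for hospital, groups in counts.items():
    per_group = {name: round(survival_rate(*ts), 3) for name, ts in groups.items()}
    print(hospital, per_group, "aggregate:", round(aggregate_rate(groups), 3))
```

Hospital A comes out ahead within each group, while hospital B comes out ahead once the groups are pooled, simply because the two hospitals receive very different mixes of healthy and sick patients.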
Actually, Simpson’s paradox is called Simpson’s paradox because Colin Blyth named it that way in 1972, in his paper entitled On Simpson’s Paradox and the Sure-Thing Principle (an economic article in a statistical journal), that can be downloaded from http://www.stat.cmu.edu/~fienberg/…. He found this paradox in a paper published in 1951 by Edward Simpson, even if other papers actually did mention it earlier. The most popular application is probably admission to Berkeley’s graduate studies programs, and sex bias, see Bickel, Hammel & O’Connell (1975), that can be downloaded from http://www.unc.edu/~nielsen/…. I also mentioned a geometric interpretation of this paradox a few years ago on my blog, which is so simple to understand that the paradox is no longer a paradox actually, since on the example above, we had
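$\frac{590}{600} > \frac{870}{900}$
and
$\frac{210}{400} > \frac{30}{100}$
while
$\frac{590+210}{600+400} < \frac{870+30}{900+100}$
With symbolic notations, one can have at the same time
$\frac{a}{b} > \frac{c}{d}$
and
$\frac{e}{f} > \frac{g}{h}$
with also
$\frac{a+e}{b+f} < \frac{c+g}{d+h}$
as shown on the graph below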
There should be a connection between Simpson’s paradox and the ecological fallacy (which is an issue I recently discovered and that I found extremely interesting, related again to difficulties of interpreting regressions). But that’s another story. My point today is that Colin Blyth did mention another nice paradox, that is related, this time, to stochastic orderings. The idea is the following. Consider the three spinners drawn below (imagine some arrows in those circles)
 spinner A: no matter where the arrow stops, the gain is 3,
 spinner B: a 56% chance to gain 2, a 22% chance to gain 4, and a 22% chance to gain 6,
 spinner C: a 51% chance to gain 1, and a 49% chance to gain 5.
Instead of spinners, it is also possible to consider three different lotteries.
You play against a friend: you pick a spinner, while the friend picks another. Everyone flicks his arrow, and the highest number wins (no matter the difference). Let us compute the odds. First case, A against B, from A’s perspective
matchup  probability  margin  outcome
A3 vs B2  56%  +1  win
A3 vs B4  22%  -1  lose
A3 vs B6  22%  -3  lose
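In that case, A has a 56% chance of beating B. Second case, A against C, from A’s perspective,
matchup  probability  margin  outcome
A3 vs C1  51%  +2  win
A3 vs C5  49%  -2  lose
In that case, A has a 51% chance of beating C. Third case, B against C, from B’s perspective,
matchup  probability  margin  outcome
B2 vs C1  28.56%  +1  win
B2 vs C5  27.44%  -3  lose
B4 vs C1  11.22%  +3  win
B4 vs C5  10.78%  -1  lose
B6 vs C1  11.22%  +5  win
B6 vs C5  10.78%  +1  win
In that case, B has a 61.78% chance of beating C. So, if we try to summarize,
 A is the best choice, since it always beats the other spinner with more than a 50% chance,
 C is the worst choice, since it is always beaten by the other spinner with more than a 50% chance.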
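The pairwise odds above can be recomputed by enumerating every pair of outcomes. This is a minimal Python sketch, assuming the two spins are independent; the `SPINNERS` dictionary and the `p_beats` helper are names introduced here for illustration only. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction
from itertools import product

# Spinner distributions from the text: value -> probability.
SPINNERS = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def p_beats(x, y):
    """Probability that spinner x shows a strictly higher value than spinner y,
    assuming the two spins are independent."""
    return sum(
        px * py
        for (vx, px), (vy, py) in product(SPINNERS[x].items(), SPINNERS[y].items())
        if vx > vy
    )


for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(f"P({x} beats {y}) = {float(p_beats(x, y)):.4f}")
```

No two spinners share a value, so ties never occur and `p_beats(x, y) + p_beats(y, x)` equals 1 for every pair.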
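Now, assume that there are three players, each with a different spinner, all playing at the same time. Let us compute the odds, one more time. First case, A against B and C, from A’s perspective
matchup  probability  margin  outcome
A3 vs B2, C1  28.56%  +1  win
A3 vs B2, C5  27.44%  -2  lose
A3 vs B4, C1  11.22%  -1  lose
A3 vs B4, C5  10.78%  -2  lose
A3 vs B6, C1  11.22%  -3  lose
A3 vs B6, C5  10.78%  -3  lose
In that case, A has a 28.56% chance of beating B and C. Second case, B against A and C, from B’s perspective
matchup  probability  margin  outcome
B2 vs A3, C1  28.56%  -1  lose
B2 vs A3, C5  27.44%  -3  lose
B4 vs A3, C1  11.22%  +1  win
B4 vs A3, C5  10.78%  -1  lose
B6 vs A3, C1  11.22%  +3  win
B6 vs A3, C5  10.78%  +1  win
In that case, B has a 33.22% chance of beating A and C. Third case, C against A and B, from C’s perspective
matchup  probability  margin  outcome
C1 vs A3, B2  28.56%  -2  lose
C1 vs A3, B4  11.22%  -3  lose
C1 vs A3, B6  11.22%  -5  lose
C5 vs A3, B2  27.44%  +2  win
C5 vs A3, B4  10.78%  +1  win
C5 vs A3, B6  10.78%  -1  lose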
In that case, C has a 38.22% chance of beating A and B. So, if we try to summarize, this time
 C is the best choice, since it has (strictly) more than a 1/3 chance of winning, which is the highest probability,
 A is the worst choice, since it has (strictly) less than a 1/3 chance of winning, which is the lowest probability.
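The same enumeration extends to any set of spinners played at once, which makes it easy to see how adding B changes the ranking of A and C. Again, this is only a sketch under the assumption of independent spins; `win_probabilities` is a hypothetical helper name, not something taken from Blyth's paper.

```python
from fractions import Fraction
from itertools import product

# Spinner distributions from the text: value -> probability.
SPINNERS = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def win_probabilities(names):
    """For each spinner in `names`, the probability that it shows the strictly
    highest value when all of them are spun independently."""
    wins = {name: Fraction(0) for name in names}
    outcomes = [SPINNERS[name].items() for name in names]
    for combo in product(*outcomes):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        values = [v for v, _ in combo]
        best = max(values)
        # A tie would be credited to nobody; none occurs with these spinners.
        if values.count(best) == 1:
            wins[names[values.index(best)]] += prob
    return wins


for available in (["A", "C"], ["A", "B", "C"]):
    result = win_probabilities(available)
    print(available, {name: float(p) for name, p in result.items()})
```

With only A and C available, A has the higher chance of winning; once B is added, C's chance (38.22%) becomes the highest of the three and A's the lowest.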
Odd, isn’t it? Now, is there an interpretation of that paradox? Yes: Martin Gardner, in his paper on induction and probability, mentioned the case of drug testing. The value we had with the spinner is the health level, rated from 1 to 6. Thus, taking drug A, you always get an average health level of 3. With drug C, on the other hand, you get either very sick (level 1) or very well (level 5). Consider now a doctor who wants to maximize the patient’s chance of being well. If only pills A and C are available, then the doctor should choose A. This is what we’ve seen in the first part. Assume now that a company delivers a third pill, called drug B. Then the doctor should find C more interesting… Odd, isn’t it?
Colin Blyth gave a more amusing application. Assume that you like to go to the restaurant, and that you like to get a dessert there. Dessert A, the apple pie, is the average one, with a standard level, that you rank 3 (on a scale from 1 to 6). Dessert C, the cheesecake, can either be awful (ranked 1) or delicious (ranked 5). You’d better go for the apple pie if you want to maximize the probability of not being disappointed (i.e. maximizing your “best chance” according to Colin Blyth, but I guess it can be interpreted as regret minimization too). Now assume that dessert B, the blueberry pie, is available (with ranks given by the spinner). Then you should go for the cheesecake. I will let you imagine the discussion that you can have, then, with your favorite waitress
 Hi Mr Freakonometrics, do you want a piece of apple pie? (yes, actually she also comes frequently on my blog, and knows me from my pseudonym…)
 Probably. But actually, I was wondering if you have your blueberry pie today?
 Yes, in fact we do…
 Great, in that case, I’ll go for the cheesecake.
She’ll probably think that I am a freak… so I hope she’ll come and read my post, to understand that, actually, it does make a lot of sense to go for what was supposed to be my worst case.
Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.