# From Simpson's Paradox to Pies


Today, I wanted to publish a post on economics and decision theory. And probability too… Those who follow my blog know that I am a big fan of Simpson’s paradox. I also love to mention it in my econometrics classes. It raises important questions, which I relate to multicollinearity and to the interpretation of regression models with multiple (negatively correlated) explanatory variables. This paradox has amazing pedagogical virtues. I have mentioned it several times on this blog (I should probably mention that I discovered this paradox via Marco Scarsini, who taught me a lot of things in decision theory and in probability). For those who do not know this paradox, here is an example that Marco gave in one of his talks, a few years ago. Consider the following statistics, when healthy people entered some hospital

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 600 | 590 | 10 | 98% |
| hospital B | 900 | 870 | 30 | 97% |

while, when sick people entered the same hospitals

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 400 | 210 | 190 | 53% |
| hospital B | 100 | 30 | 70 | 30% |

So, whatever your health situation, you should choose hospital A. Now, if we aggregate

| hospital | total | survivors | deaths | survival rate |
|---|---|---|---|---|
| hospital A | 1000 | 800 | 200 | 80% |
| hospital B | 1000 | 900 | 100 | 90% |

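i.e. without any doubt, people should choose hospital B. In terms of survival rates, we have

$$\frac{590}{600} > \frac{870}{900}$$

and

$$\frac{210}{400} > \frac{30}{100}$$

while

$$\frac{590+210}{600+400} < \frac{870+30}{900+100}$$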

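With symbolic notation, writing for instance $s$ for the number of survivors and $n$ for the number of admissions, indexed by hospital ($A$ or $B$) and by patient group ($1$ for healthy, $2$ for sick), one can have at the same time

$$\frac{s_{A,1}}{n_{A,1}} > \frac{s_{B,1}}{n_{B,1}}$$

and

$$\frac{s_{A,2}}{n_{A,2}} > \frac{s_{B,2}}{n_{B,2}}$$

with also

$$\frac{s_{A,1}+s_{A,2}}{n_{A,1}+n_{A,2}} < \frac{s_{B,1}+s_{B,2}}{n_{B,1}+n_{B,2}}$$

[Figure: graph illustrating these three inequalities]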
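The reversal in the hospital example can be checked numerically. The following is a minimal Python sketch, assuming only the counts from the three tables; the names `counts`, `stratum_rates` and `pooled_rates` are illustrative choices, not something from the original talk.

```python
# Counts copied from the tables: (total admitted, survivors) per hospital and group.
counts = {
    "A": {"healthy": (600, 590), "sick": (400, 210)},
    "B": {"healthy": (900, 870), "sick": (100, 30)},
}


def stratum_rates(counts):
    """Survival rate of each hospital within each patient group."""
    return {
        hospital: {
            group: survivors / total
            for group, (total, survivors) in groups.items()
        }
        for hospital, groups in counts.items()
    }


def pooled_rates(counts):
    """Survival rate of each hospital once both patient groups are pooled."""
    pooled = {}
    for hospital, groups in counts.items():
        total = sum(t for t, _ in groups.values())
        survivors = sum(s for _, s in groups.values())
        pooled[hospital] = survivors / total
    return pooled


by_group = stratum_rates(counts)
overall = pooled_rates(counts)

for group in ("healthy", "sick"):
    print(group, {h: round(by_group[h][group], 3) for h in counts})
print("pooled", {h: round(overall[h], 3) for h in counts})
```

Hospital A comes out ahead within each group, while hospital B comes out ahead once the groups are pooled, because hospital A receives a much larger share of sick patients (400 out of 1000, against 100 out of 1000 for hospital B).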

There should be a connection between Simpson’s paradox and the ecological fallacy (an issue I recently discovered and found extremely interesting, related again to the difficulties of interpreting regressions). But that’s another story. My point today is that Colin Blyth mentioned another nice paradox, related this time to stochastic orderings. The idea is the following. Consider the three spinners drawn below (imagine some arrows in those circles):

• spinner A: no matter where the arrow stops, the gain is 3,
• spinner B: 56% chance of gaining 2, 22% chance of gaining 4, and 22% chance of gaining 6,
• spinner C: 51% chance of gaining 1, 49% chance of gaining 5.

Instead of spinners, it is also possible to consider three different lotteries.

You play against a friend: you pick a spinner, while the friend picks another. Each player flicks their arrow, and the highest number wins (no matter the difference). Let us compute the odds. First case, A against B, from A’s perspective

| | B = 2 | B = 4 | B = 6 |
|---|---|---|---|
| A = 3 | 56%, +1 (win) | 22%, -1 (lose) | 22%, -3 (lose) |

In that case, A has a 56% chance of beating B. Second case, A against C, from A’s perspective,

| | C = 1 | C = 5 |
|---|---|---|
| A = 3 | 51%, +2 (win) | 49%, -2 (lose) |

In that case, A has a 51% chance of beating C. Third (and final) case, B against C, from B’s perspective. Assuming independence between the spinners, joint probabilities can easily be computed,

| | C = 1 | C = 5 |
|---|---|---|
| B = 2 | 28.56%, +1 (win) | 27.44%, -3 (lose) |
| B = 4 | 11.22%, +3 (win) | 10.78%, -1 (lose) |
| B = 6 | 11.22%, +5 (win) | 10.78%, +1 (win) |

In that case, B has a 61.78% chance of beating C. So, if we try to summarize,

• A is the best choice, since it beats each of the other two with more than a 50% chance,
• C is the worst choice, since it is beaten by each of the other two with more than a 50% chance.
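The three pairwise odds can be recomputed with a short script. This is a minimal sketch, assuming independent spinners as in the text; the `spinners` dictionary and the `prob_beats` helper are illustrative names, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

# Each spinner maps a gain to its probability, as listed in the text.
spinners = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def prob_beats(first, second):
    """P(first shows a strictly higher number than second), spinners independent."""
    return sum(
        p * q
        for (x, p), (y, q) in product(first.items(), second.items())
        if x > y
    )


for left, right in [("A", "B"), ("A", "C"), ("B", "C")]:
    p = prob_beats(spinners[left], spinners[right])
    print(f"P({left} beats {right}) = {float(p):.4f}")
```

Since no two spinners share a value, ties cannot occur, so each probability of losing is simply one minus the probability of winning.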
Now, assume that you play not against one friend, but against two, and everyone picks a different spinner. Let us compute the odds, one more time. First case, A against B and C, from A’s perspective

| | B = 2, C = 1 | B = 2, C = 5 | B = 4, C = 1 | B = 4, C = 5 | B = 6, C = 1 | B = 6, C = 5 |
|---|---|---|---|---|---|---|
| A = 3 | 28.56%, +1 (win) | 27.44%, -2 (lose) | 11.22%, -1 (lose) | 10.78%, -2 (lose) | 11.22%, -3 (lose) | 10.78%, -3 (lose) |

In that case, A has a 28.56% chance of beating B and C. Second case, B against A and C, from B’s perspective,

| | A = 3, C = 1 | A = 3, C = 5 |
|---|---|---|
| B = 2 | 28.56%, -1 (lose) | 27.44%, -3 (lose) |
| B = 4 | 11.22%, +1 (win) | 10.78%, -1 (lose) |
| B = 6 | 11.22%, +3 (win) | 10.78%, +1 (win) |

In that case, B has a 33.22% chance of beating A and C. Third (and final) case, C against A and B, from C’s perspective,

| | A = 3, B = 2 | A = 3, B = 4 | A = 3, B = 6 |
|---|---|---|---|
| C = 1 | 28.56%, -2 (lose) | 11.22%, -3 (lose) | 11.22%, -5 (lose) |
| C = 5 | 27.44%, +2 (win) | 10.78%, +1 (win) | 10.78%, -1 (lose) |

In that case, C has a 38.22% chance of beating A and B. So, if we try to summarize, this time

• C is the best choice, since it has (strictly) more than a 1/3 chance of winning, which is the highest probability,
• A is the worst choice, since it has (strictly) less than a 1/3 chance of winning, which is the lowest probability.
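The three-player odds can be obtained by enumerating every joint outcome. Here is a minimal Python sketch, again assuming independent spinners; `win_probabilities` is an illustrative helper that works for any subset of spinners, so the same function reproduces both the two-player and the three-player rankings.

```python
from fractions import Fraction
from itertools import product

# Each spinner maps a gain to its probability, as listed in the text.
spinners = {
    "A": {3: Fraction(1)},
    "B": {2: Fraction(56, 100), 4: Fraction(22, 100), 6: Fraction(22, 100)},
    "C": {1: Fraction(51, 100), 5: Fraction(49, 100)},
}


def win_probabilities(spinners):
    """P(each spinner shows the strictly highest number), all spinners independent."""
    names = list(spinners)
    wins = {name: Fraction(0) for name in names}
    # Enumerate every joint outcome: one (value, probability) pair per spinner.
    for outcome in product(*(spinners[name].items() for name in names)):
        values = [value for value, _ in outcome]
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        best = max(values)
        # Credit the outcome to the winner; ties (none occur here) are ignored.
        if values.count(best) == 1:
            wins[names[values.index(best)]] += prob
    return wins


three_way = win_probabilities(spinners)
two_way = win_probabilities({name: spinners[name] for name in ("A", "C")})

print("A, B and C:", {name: float(p) for name, p in three_way.items()})
print("A and C only:", {name: float(p) for name, p in two_way.items()})
```

With all three spinners in play, A wins only when B shows 2 and C shows 1 at the same time, which is why its probability falls from the pairwise 51% against C to 0.56 × 0.51 = 28.56%.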

Odd, isn’t it? Now, is there an interpretation of that paradox? Yes: Martin Gardner, in his paper on induction and probability, mentioned the case of drug testing. The value we had with the spinner is now the health level, rated from 1 to 6. Thus, taking drug A, you always get an average health level of 3. With drug C, on the other hand, you get either very sick (level 1) or very well (level 5). Consider now a doctor who wants to maximize the patient’s chance of being well. If only pills A and C are available, then the doctor should choose A. This is what we’ve seen in the first part. Assume now that a company delivers a third pill, called drug B. Then the doctor should prefer C… Odd, isn’t it?

Colin Blyth gave a more amusing application. Assume that you like to go to the restaurant, and that you like to get a dessert there. Dessert A, the apple pie, is the average one, with a standard level that you rank 3 (on a scale from 1 to 6). Dessert C, the cheese cake, can be either awful (ranked 1) or delicious (ranked 5). You’d better go for the apple pie if you want to maximize the probability of not being disappointed (i.e. maximizing your “best chance” according to Colin Blyth, but I guess it can be interpreted as regret minimization too). Now assume that dessert B, the blueberry pie, is available (with ranks given by the spinner). Then you should go for the cheese cake. I let you imagine the discussion that you can have, then, with your favorite waitress:

- Hi Mr Freakonometrics, do you want a piece of apple pie? (yes, actually she also comes frequently to my blog, and knows me by my pseudonym…)

- Probably. But actually, I was wondering whether you have your blueberry pie today?

- Yes, in fact we do…

- Great, in that case, I’ll go for the cheese cake.

She’ll probably think that I am a freak… so I hope she’ll come and read my post, to understand that, actually, it does make a lot of sense to go for what was supposed to be my worst case.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.