Over a million developers have joined DZone.

More on "Staggering Odds" - Visualizing Probabilities

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

Following my previous post, a few more things. As mentioned by Frédéric, it is – indeed – possible to compute the probability of all pairs. More precisely, all pairs are not as likely to occur: some teams can play against (almost) eveyone, while others cannot. From the previous table, it is possible to compute probability that the last team plays against team 1. Or team 2 (numbers are from the  xls file mentioned previously). To make it simple

> table(M[,2*n])/length(M[,2*n])*100

       1        2        3        5        7       10       11 
11.82500 12.61212 12.61212 13.25279 19.31173 18.70767 11.67856

Here, the last team (as I did rank them) has 11.8% chances to play against team 1, and 19.3% to play against team 7. If we compute all the probabilities, we obtain

> S
       1     2     3     5     7    10    11    13
4   0.00 14.16 14.16  0.00 22.22 21.25 13.05 15.13
6  12.52 13.19 13.19 14.11 20.13  0.00 12.35 14.47
8  18.78  0.00 19.54 21.50  0.00  0.00 18.39 21.76
9  18.78 19.54  0.00 21.50  0.00  0.00 18.39 21.76
12 14.68 15.54 15.54 16.56  0.00 23.19 14.47  0.00
14 11.64 12.37 12.37 13.05 18.96 18.25  0.00 13.34
15 11.77 12.55 12.55  0.00 19.36 18.59 11.64 13.50
16 11.82 12.61 12.61 13.25 19.31 18.70 11.67  0.00

that can be visualized below

White areas cannot be reached, while red ones are more likely. Here, we compute probability that home team (given on the x-axis) plays against some visitor team (on the y-axis). The fact that those probabilities are not uniform seems odd. But I guess it comes from those constraints…

Another weird point: it is possible to reach a deadlock. At least with the technique I have been using. So far, I did not count them. But we can, simply the following code

> U=c(4,6,8,9,12,14,15,16)
> a1=U[1]
> b1=U[2]
> c1=U[3]
> d1=U[4]
> e1=U[5]
> f1=U[6]
> g1=U[7]
> h1=U[8]
> a2=b2=c2=d2=e2=f2=g2=h2=NA
> posa2=(1:n)%notin%c(LISTEIMPOSSIBLE[,a1])
> if(length(posa2)==0){na=na+1}
> for(a2 in posa2){
+ posb2=(1:n)%notin%c(LISTEIMPOSSIBLE[,b1],a2)
+ if(length(posb2)==0){na=na+1}
+ for(b2 in posb2){
+ posc2=(1:n)%notin%c(LISTEIMPOSSIBLE[,c1],a2,b2)
+ if(length(posc2)==0){na=na+1}
+ for(c2 in posc2){
+ posd2=(1:n)%notin%c(LISTEIMPOSSIBLE[,d1],
+ a2,b2,c2)
+ if(length(posd2)==0){na=na+1}
+ for(d2 in posd2){
+ pose2=(1:n)%notin%c(LISTEIMPOSSIBLE[,e1],
+ a2,b2,c2,d2)
+ if(length(pose2)==0){na=na+1}
+ for(e2 in pose2){
+ posf2=(1:n)%notin%c(LISTEIMPOSSIBLE[,f1],
+ a2,b2,c2,d2,e2)
+ if(length(posf2)==0){na=na+1}
+ for(f2 in posf2){
+ posg2=(1:n)%notin%c(LISTEIMPOSSIBLE[,g1],
+ a2,b2,c2,d2,e2,f2)
+ if(length(posg2)==0){na=na+1}
+ for(g2 in posg2){
+ posh2=(1:n)%notin%c(LISTEIMPOSSIBLE[,h1],
+ a2,b2,c2,d2,e2,f2,g2)
+ if(length(posh2)==0){na=na+1}
+ for(h2 in posh2){
+ s=s+1
+ V=c(a1,a2,b1,b2,c1,c2,d1,d2,e1,e2,f1,f2,g1,g2,h1,h2)
+ }}}}}}}}

On the initial ordering of home team, the number of deadlocks was

> na
[1] 657

The probability of obtaining a deadlock is then

> 657/(657+5463)
[1] 0.1073529

(657 scenarios ended in a dead end, while 5463 ended well). The worst case was obtained when we considered

 [1]    6    4   16   14   12   15    8    9

In that case, the probability of obtaining a deadlock was

> 4047/(4047+5463)
[1] 0.4255521

Here, it clearly depends on the ordering. So if we draw – randomly – the order of the home teams, i.e.

> Urandom=sample(U,size=8)

the distribution of the probablity of having a deadlock is

All those computations were based on my understanding of the drawings. But Kristof (aka @ciebiera), on his blog krzysztofciebiera.blogspot.ca/… obtained different results. For instance, based on my previous computations, the probability to obtain identical pairs was 0.018349% (1 chance out of 5463), but Kristof obtained – based on the UEFA procedure (as he called it) – a probability of 0.0181337%. Which is not _ strictly – the same, but both computations yield relatively close results…

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}