
The 'Probability to Win' Is Hard to Estimate

A data science expert looks into this difficult statistical problem, using the R language to run simulations and visualize the results.

By Arthur Charpentier · Nov. 09, 18 · Tutorial

Real-time computation (or estimation) of the "probability to win" is difficult. We've seen that in soccer games and in elections... but actually, as a professor, I see it frequently when I grade my students.

Consider a classical multiple-choice exam with 50 questions. After each question, imagine that you try to compute the probability that the student will pass. Students pass when they have 25 correct answers or more. Just for the simulations, I will assume that students simply flip a coin at each question... I have n students and 50 questions.

set.seed(1)
n = 10   # number of students
# 50 x n matrix of answers: 1 if the question is answered correctly, 0 otherwise
# (here, each answer is simply a fair coin flip)
M = matrix(sample(0:1, size = n*50, replace = TRUE), 50, n)

Let $X_{i,j}$ denote the score of student $i$ on question $j$, and let $S_{i,j}$ denote the cumulative score, i.e. $S_{i,j}=X_{i,1}+\cdots+X_{i,j}$. At step $j$, I can get some sort of prediction of the final score, using $T_{i,j}=50\times S_{i,j}/j$. Here is the code:

SM = apply(M, 2, cumsum)   # cumulative scores, one column per student
NB = SM*50/(1:50)          # extrapolated final score T_{i,j} = 50*S_{i,j}/j

We can actually plot it:

plot(NB[,1], type = "s", ylim = c(0,50))   # predicted final score, first student
abline(h = 25, col = "blue")               # pass threshold
for(i in 2:n) lines(NB[,i], type = "s", col = "light blue")
lines(NB[,3], type = "s", col = "red")     # highlight one student


But that's simply the prediction of the final score, at each step. That's not the computation of the probability to pass! Let's try to see how we can do it... If, after $j$ questions, the student already has 25 correct answers, i.e. $S_{i,j}\geq 25$, the probability should be 1, since he cannot fail anymore. Another simple case is the following: if, after $j$ questions, even answering all the remaining questions correctly is not sufficient, i.e. $S_{i,j}+(50-j)<25$, the probability should be 0. Otherwise, computing the probability to pass is quite straightforward: it is the probability to obtain at least $25-S_{i,j}$ correct answers, out of the $50-j$ remaining questions, when the probability of success is estimated by $S_{i,j}/j$. We recognize the survival probability of a binomial distribution, $\mathbb{P}[B\geq 25-S_{i,j}]$ with $B\sim\mathcal{B}(50-j,\,S_{i,j}/j)$. The code is then simply:

PB = matrix(NA, 50, n)   # probability to pass, updated after each question
for(i in 1:50){          # question index
  for(j in 1:n){         # student index
    if(SM[i,j] >= 25) PB[i,j] = 1           # already passed for sure
    if(SM[i,j] + (50-i) < 25) PB[i,j] = 0   # cannot reach 25 anymore
    if((SM[i,j] < 25) & (SM[i,j] + (50-i) >= 25))
      PB[i,j] = 1 - pbinom(25 - SM[i,j] - 1, size = 50-i, prob = SM[i,j]/i)
  }}

So if we plot it, we get:

plot(PB[,1], type = "s", ylim = c(0,1))   # probability to pass, first student
abline(h = .5, col = "blue")              # reference line at probability 1/2
for(i in 2:n) lines(PB[,i], type = "s", col = "light blue")
lines(PB[,3], type = "s", col = "red")    # same student highlighted as before

which is much more volatile than the previous curves we obtained! So yes, computing the "probability to win" is a complicated exercise! Don't blame those who find it hard to do!

Of course, things are slightly different if my students don't flip a coin... this is what we obtain if half of the students are good (2/3 probability to get a question correct) and half are not good (1/3 chance):

[Figure: simulated trajectories when half of the students have a 2/3 chance of answering each question correctly and half a 1/3 chance]
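
For reference, here is a minimal sketch of how that heterogeneous simulation can be set up (it is not in the original post; the 2/3 and 1/3 success probabilities come from the paragraph above, and SM, PB, and the plots are then rebuilt exactly as before):

# half "good" students (2/3 chance per question) and half "weak" ones (1/3 chance)
set.seed(1)
n = 10
p = rep(c(2/3, 1/3), each = n/2)
M = matrix(NA, 50, n)
for(j in 1:n) M[,j] = rbinom(50, size = 1, prob = p[j])
SM = apply(M, 2, cumsum)
# then recompute PB with the same double loop as above, and plot it as before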

If we look at the probability to pass, we usually do not have to wait until the end (the 50 questions) to know who passed and who failed.
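
One quick way to check that with the simulation above (a small sketch, not in the original post) is to record, for each student, the first question at which the outcome is already certain, i.e. the score is already at least 25 or can no longer reach 25:

# first question at which each student's final result is known for sure
sure = (SM >= 25) | (SM + (50 - 1:50) < 25)
apply(sure, 2, function(s) which(s)[1])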

PS: I guess a less volatile solution can be obtained with a Bayesian approach... if I find some spare time this week, I will try to code it...
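
For what it is worth, here is one possible sketch of such a Bayesian version (my own illustration, not the author's code): put a Beta(1,1) prior on each student's per-question success probability, so that after $j$ questions with $S_{i,j}$ correct answers the posterior is $\text{Beta}(1+S_{i,j},\,1+j-S_{i,j})$, and replace the plug-in binomial tail by the corresponding beta-binomial tail.

a = 1; b = 1                       # Beta(1,1) prior (an assumption for this sketch)
PBB = matrix(NA, 50, n)            # Bayesian probability to pass
for(i in 1:50){
  for(j in 1:n){
    if(SM[i,j] >= 25){ PBB[i,j] = 1; next }
    if(SM[i,j] + (50-i) < 25){ PBB[i,j] = 0; next }
    k = (25 - SM[i,j]):(50 - i)    # future numbers of correct answers that lead to a pass
    # beta-binomial tail: average the binomial tail over the posterior Beta(a+S, b+i-S)
    PBB[i,j] = sum(choose(50-i, k) *
                   beta(a + SM[i,j] + k, b + i - SM[i,j] + 50 - i - k) /
                   beta(a + SM[i,j], b + i - SM[i,j]))
  }}

Plotting PBB the same way as PB should give smoother paths, since the posterior predictive never collapses to 0 or 1 before the outcome is actually certain.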


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
