Academics Data Science the S*@t Out Of the FIFA World Cup
Even if you're not a huge soccer fan (excuse me, football fan), you can still get sucked into the World Cup. Read on to learn how big data is being used in the tournament.
As I write, the 2018 FIFA World Cup is underway in Russia. Everyone has a favorite to win, whether because they’re a fervent supporter of their national side or because they entered a sweepstake and were allocated a country. Others trust the prognostications of animals, such as Paul the Octopus, before having a flutter on the result.
However, here at DZone, we like to keep things techy, so what does data science say about the probable outcome of the competition?
Well, researchers at the Technical University of Dortmund in Germany have simulated the tournament 100,000 times. The team combined machine learning and conventional statistics, using a random-forest approach, to identify the most likely winner.
The random-forest technique is commonly used for large datasets. It builds on decision trees, which predict an outcome by following a series of branching rules learned from a set of training data. A single decision tree tends to overfit (that is, it fits quirks of its training data that do not carry over to new data), but a random forest reduces overfitting by training many trees on randomly selected subsets of the data and features and averaging their predictions.
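To make the averaging idea concrete, here is a minimal sketch in plain Python. It is not the Dortmund team's model: the training rows, the two features (ranking points and Champions League players), and the goal averages are all made up for illustration, and each "tree" is reduced to a single split (a stump) to keep the example short.

```python
import random

def fit_stump(rows, targets, feature):
    """Fit a one-split tree (a "stump") on one feature, minimizing squared error."""
    mean = sum(targets) / len(targets)
    # Fallback when the sample has no usable split: always predict the mean.
    best = (float("inf"), float("inf"), mean, mean)  # (error, threshold, left, right)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return feature, threshold, left_mean, right_mean

def fit_forest(rows, targets, n_trees=200, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # pick a random feature
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Average the predictions of all stumps in the forest."""
    preds = [left if row[f] <= th else right for f, th, left, right in forest]
    return sum(preds) / len(preds)

# Made-up training data: (ranking points, Champions League players) -> goals per game.
rows = [(1500, 12), (1400, 9), (1300, 7), (1100, 3), (1000, 1), (900, 0)]
goals = [2.4, 2.0, 1.7, 1.1, 0.8, 0.6]

forest = fit_forest(rows, goals)
strong, weak = (1480, 11), (950, 0)  # two hypothetical teams
print("strong team:", round(predict(forest, strong), 2))
print("weak team:", round(predict(forest, weak), 2))
```

Any single stump is a crude predictor that depends heavily on which rows it happened to see; averaging a couple of hundred of them smooths those quirks out, which is the property the paragraph above describes.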
The researchers modeled the outcome of each game to determine the potential outcome of the tournament. The factors they considered included each country’s GDP and population, FIFA’s ranking, number of Champions League players, average age, and more.
The Winner Is...
So, who is the predicted winner? To save you reading the whole paper, I'll share the result...Spain...with a probability of 17.8 percent.
But the way the tournament is organized into groups and games is also a factor in the result. Assuming Germany gets through the initial group stage, it will encounter a strong opponent (Brazil, Switzerland, Serbia, or Costa Rica) in the next stage, which could knock it out of the tournament. Spain has an easier match if it reaches that stage (against Uruguay, Russia, Saudi Arabia, or Egypt). However, if Germany gets through that stage and into the quarterfinals, it overtakes Spain and becomes the favored team.
So it’s either Spain or Germany, right?
Or Maybe Not...
Well, others have made different predictions using a statistical approach that has been successful in previous championships. The model takes bookmakers’ odds, converts them into winning probabilities, and simulates the tournament by repeatedly playing through every conceivable match pairing. According to academics at the University of Innsbruck who used this approach, Brazil is predicted to win with a probability of 16.6%, followed by Germany (15.8%) and Spain (12.5%).
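The first step of that approach, turning odds into probabilities, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the decimal odds below are invented, and the bookmaker's margin is removed by plain proportional scaling, which may well differ from the adjustment the Innsbruck model applies.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds into winning probabilities that sum to 1."""
    # The inverse of decimal odds is the raw implied probability.
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    # Raw probabilities add up to more than 1 because of the bookmaker's
    # margin (the "overround"); dividing by the total removes it.
    overround = sum(raw.values())
    return {team: p / overround for team, p in raw.items()}

# Invented odds for illustration only, not real bookmaker prices.
odds = {"Brazil": 5.0, "Germany": 5.5, "Spain": 7.0, "Rest of field": 1.9}
probs = implied_probabilities(odds)

for team, p in probs.items():
    print(f"{team}: {p:.1%}")
```

The shorter the odds, the higher the implied probability, so the ordering of the favorites is preserved while the totals are rescaled to a proper probability distribution.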
An Australian academic also predicts Brazil to be the winning team with a probability of 15.4%, using a Monte Carlo simulation of the possible outcomes of the tournament’s 63 matches to assess the probability of how far each team will progress.
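For readers who have not met Monte Carlo simulation before, the idea is to play the tournament many times with random results and count how often each team wins. The sketch below is a toy version, not the model used by any of the academics mentioned: it covers only an eight-team knockout bracket, the team names and strength ratings are hypothetical, and the chance of winning a match is assumed to be a team's rating divided by the sum of both ratings.

```python
import random
from collections import Counter

# Hypothetical strength ratings for eight made-up teams.
ratings = {"Team A": 10, "Team B": 8, "Team C": 7, "Team D": 6,
           "Team E": 5, "Team F": 4, "Team G": 3, "Team H": 2}

def win_probability(a, b):
    """Assumed chance that team a beats team b, based only on the ratings."""
    return ratings[a] / (ratings[a] + ratings[b])

def play_knockout(teams, rng):
    """Play one knockout tournament; adjacent teams in the list meet each round."""
    while len(teams) > 1:
        teams = [a if rng.random() < win_probability(a, b) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]

def simulate(bracket, runs=20000, seed=42):
    """Estimate each team's title probability from repeated random tournaments."""
    rng = random.Random(seed)
    wins = Counter(play_knockout(list(bracket), rng) for _ in range(runs))
    return {team: wins[team] / runs for team in bracket}

bracket = ["Team A", "Team H", "Team D", "Team E",
           "Team B", "Team G", "Team C", "Team F"]
title_odds = simulate(bracket)

for team, p in sorted(title_odds.items(), key=lambda item: -item[1]):
    print(f"{team}: {p:.1%}")
```

Because the bracket order decides who meets whom, rearranging the `bracket` list changes the estimates, which is the same draw effect described earlier for Germany and Spain.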
So it’s definitely Brazil, Germany or Spain. Maybe.
How Are The Teams Using Big Data?
Prediction aside, the teams involved are also using data science and AI to improve their chances of a win.
Forbes reports that the German team is working with SAP, which has built software and analytics tools to offer new insights. These are already used by a number of top-flight teams, including the English Premier League champions (and my personal favorite team) Manchester City. The philosophy behind the software is that tactics are essential and should be derived by “observing and analyzing the various data sources of a game,” according to the team's head of scouting and match analysis, Christofer Clemens.
One area where the SAP software helps shape a team’s tactics is penalty shoot-outs. The "Penalty Insights" tool is full of information on the opposing team’s penalty records, including videos of previous matches and statistics about run-up techniques and the area of the goal each player is likely to shoot at.
The Economist is also thinking ahead to the knockout stage of the World Cup, when penalty shoot-outs can be used to determine the outcome of a game (and send a team crashing out of the competition) if the score is level after full time and extra time. Amazingly, before shoot-outs were introduced in 1982, if a match was undecided after 120 minutes of play, the winner was determined by the toss of a coin.
Since the 1982 World Cup, there have been 26 penalty shoot-outs to decide matches in the knockout stages of the World Cup. The Economist reports that, of the 18 teams that played in the 9 finals since their introduction, 7 reached the final thanks to a shoot-out win. Furthermore, two of the finals themselves have been decided by penalty shoot-outs. So they are indeed crucial. Fortunately, the article draws on analysis by Ignacio Palacios-Huerta of the London School of Economics to determine the best strategy for winning a penalty shoot-out. Let’s just hope the international teams playing in this year’s tournament have time to subscribe! Maybe there's hope for the Panama team yet!