Using Wikipedia to fight flu
The idea of using the way we search or the information we share on social media to track and monitor the spread of flu is not a new one. For instance, a Kansas State University project from a few years ago used social media content to both track and prevent the spread of disease.
What about Wikipedia though? We’ve seen previously that researchers have tried to use the frequency and volume of updates on the site to predict the success of movies, but can the same method help manage flu season?
A new study led by researchers at Los Alamos National Laboratory suggests it may well be possible. They used data from Wikipedia to forecast flu activity around two weeks before official figures from the Centers for Disease Control and Prevention (CDC) become available.
The researchers believe that the model they’ve created will allow officials to predict flu with a level of accuracy similar to that currently available in weather forecasting.
The approach could also offer a significant improvement on current methods of flu monitoring, which usually fail to account for people who don’t seek treatment, or for those who seek treatment without actually having flu.
The official data is also usually slow to propagate through the system, so it offers little scope for prediction or for a rapid response.
Whilst we’ve seen Google offer flu tracking services in the past, their data tends to be locked away. Wikipedia therefore offers a possible platform for bringing flu tracking into the public domain.
The researchers hope that the number of readers of flu-related articles can act as a reliable indicator of the spread of the disease.
The team used several years’ worth of flu article data to train their algorithm to spot connections between the official data collected by the CDC and the viewing figures on Wikipedia.
This algorithm was then used to predict the spread of flu during last year’s flu season. Its forecasts turned out to be fairly accurate.
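To make the idea of linking viewing figures to official data concrete, here is a minimal sketch in Python. It is not the researchers' model, whose details the article does not give. It assumes a simple straight-line relationship between this week's page views and the official flu figure a fixed number of weeks later, and every number and function name in it (`fit_line`, `fit_lagged`, `forecast`, the two-week lag) is made up for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_lagged(views, ili, lag):
    """Fit ili[t + lag] ~ slope * views[t] + intercept.

    Pairs each week's page views with the official figure `lag` weeks later.
    """
    return fit_line(views[:len(views) - lag], ili[lag:])


def forecast(views_now, slope, intercept):
    """Estimate the official figure `lag` weeks ahead from current views."""
    return slope * views_now + intercept


# Hypothetical weekly data, invented for this example (not real figures).
# views: weekly page views of flu-related articles.
# ili:   official percentage of doctor visits for influenza-like illness.
views = [1200, 1500, 2100, 3000, 4200, 5000, 4600, 3500, 2400, 1600]
ili = [1.0, 1.2, 1.7, 2.0, 2.6, 3.5, 4.7, 5.5, 5.1, 4.0]

LAG_WEEKS = 2  # assumed lead time, echoing the "around two weeks" above

slope, intercept = fit_lagged(views, ili, LAG_WEEKS)
estimate = forecast(views[-1], slope, intercept)
print(f"Estimated official figure in {LAG_WEEKS} weeks: {estimate:.2f}%")
```

A real system would need far more than this: several seasons of data, some way of choosing which articles to count, and a check of the forecasts against seasons the model has not seen.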
“Wikipedia article access logs are shown to be highly correlated with historical influenza-like illness records and allow for accurate prediction of influenza-like illness data several weeks before it becomes available,” the researchers say.
That isn’t to say that the algorithm performed flawlessly. For instance, it emerged that the predictions significantly underestimated flu levels towards the end of the season. The researchers suggest this is because people only tend to look to Wikipedia when they first get flu, so if they catch it again later in the year, they won’t need to access the information again.
“Since our model does not account for reinfection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has past,” they admit.
So there are certainly some kinks to be ironed out, but it is nonetheless a clear advance on the forecasting abilities officials have at the moment. The researchers plan to continue monitoring the performance of their algorithm and improving it continuously, even in real time.
With flu season fast approaching, these kinds of innovations are very welcome indeed.