
Can Wikipedia predict movie success?



Using online data to predict trends is increasingly popular. Whilst we’ve seen a number of these endeavours take an altruistic angle, such as attempts to use social data to co-ordinate disaster response, an even larger number of projects have had a commercial intent.

For instance, one project tried using Google search data to predict stock prices, and another used Twitter mentions to predict box office success.

The latest attempt has also set its sights on the box office, although this time researchers from Oxford University are attempting to predict success based upon a movie's Wikipedia entry.

They believed that the activity level of editors, combined with the number of views a page receives, can predict how successful a movie will be at the box office.

To test the theory, they analysed activity levels on 312 Wikipedia pages for movies prior to their release at the cinema. The analysis included the number of views the page received, the number of editors who had contributed to the article, the number of individual edits made, and the collaborative rigour of the editing train of the article.

They found clear links between the activity on a movie's Wikipedia page and the revenue it earned at the box office.
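To make the idea concrete, here is a minimal sketch of how four page-activity measures could be turned into a revenue estimate. The article does not say which model the researchers used, so the ordinary least squares fit below is an assumption, as are the function names `fit_linear_model` and `predict` and every number in the toy data set. None of the figures come from the study.

```python
def fit_linear_model(rows, targets):
    """Fit y = w0 + w1*x1 + ... by ordinary least squares.

    Solves the normal equations (X^T X) w = X^T y with Gauss-Jordan
    elimination. This is an assumed stand-in model, not the study's method.
    """
    # Prepend 1.0 to every row so the first weight acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply fitted weights (intercept first) to one row of predictors."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


if __name__ == "__main__":
    # Hypothetical toy data, invented purely for illustration:
    # (page views in thousands, editors, edits, collaborative rigour)
    activity = [
        (10, 2, 5, 0.1),
        (20, 2, 5, 0.1),
        (10, 4, 5, 0.1),
        (10, 2, 9, 0.1),
        (10, 2, 5, 0.5),
        (30, 6, 12, 0.4),
    ]
    # Made-up opening revenue in millions of dollars for the same six movies.
    revenue = [12.0, 25.0, 15.0, 13.0, 14.0, 41.0]

    weights = fit_linear_model(activity, revenue)
    print("Estimated revenue for a new page:", predict(weights, (15, 3, 7, 0.3)))
```

With only six invented rows the fitted weights mean nothing in themselves. The sketch only shows the shape of the approach: collect the page statistics before release, fit them against known revenues, then apply the weights to a movie that has not opened yet.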

The analysis presented here can make predictions with reasonable accuracy as early as one month before release. It is evident that the prediction is more precise for more successful movies. Some examples of the movies whose box office receipts were predicted accurately are Iron Man 2, Alice in Wonderland, Toy Story 3, Inception, Clash of the Titans, and Shutter Island.

The researchers also believed that monitoring Wikipedia represented a better approach than scouring Twitter.

The predicting power of the Wikipedia-based model, despite its simplicity compared to Twitter, can be explained by the fact that many of the Wikipedia editors are committed followers of the movie industry who gather information and edit related articles significantly earlier than the release date, whereas the “mass” production of tweets only occurs very close to the release time, mostly evoked by marketing campaigns.

What’s more, the researchers are confident that the approach can easily be extended to other fields, including finance and public policy.




