
Can Wikipedia predict movie success?

Using online data to predict trends is increasingly popular. Whilst we’ve seen a number of these endeavours take an altruistic angle, such as attempts to use social data to co-ordinate disaster response, an even larger number of projects have had a commercial intent.

For instance, one project tried to use Google search data to predict stock prices, and another used Twitter mentions to predict box office success.

The latest attempt has also set its sights on the box office, although this time researchers from Oxford University are attempting to predict success based on a movie’s Wikipedia entry.

They believed that the activity level of editors, combined with the number of views a page receives, can predict how successful a movie will be at the box office.

To test the theory, they analysed activity on the Wikipedia pages of 312 movies prior to their cinema release. The analysis covered the number of views each page received, the number of editors who had contributed to the article, the number of individual edits made, and the collaborative rigour of the article’s editing train.
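
As a rough illustration of how the four activity measures above could feed a prediction, here is a minimal Python sketch that fits an ordinary least-squares linear model mapping them to box office revenue. The article does not say which model the researchers used, so the linear form, the hypothetical function names fit_linear and predict, and the feature ordering (views, editors, edits, rigour) are all assumptions made for illustration.

```python
from typing import List, Sequence


def fit_linear(features: Sequence[Sequence[float]],
               revenue: Sequence[float]) -> List[float]:
    """Hypothetical helper: ordinary least squares via the normal equations
    (X^T X) w = X^T y. Each feature row is assumed to be
    [views, editors, edits, rigour]; returns [intercept, w1, ..., wk]."""
    # Prepend a constant 1.0 column so the first weight is the intercept.
    X = [[1.0, *map(float, row)] for row in features]
    k = len(X[0])
    # Build the normal-equation system A w = b.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(X, revenue)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution to recover the weights.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Hypothetical helper: apply fitted weights to one movie's activity row."""
    return weights[0] + sum(wi * xi for wi, xi in zip(weights[1:], row))
```

For example, fit_linear(rows, revenues) with one row of [views, editors, edits, rigour] per movie returns an intercept followed by one weight per feature, and predict(weights, row) applies them to a new movie. Solving the normal equations directly keeps the sketch dependency-free; for real data with widely differing scales, a library solver such as numpy.linalg.lstsq would be more numerically robust.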

They found clear links between activity on a movie’s Wikipedia page and the revenue it earned at the box office.

The analysis presented here can make predictions with reasonable accuracy as early as one month before release. It is evident that the prediction is more precise for more successful movies. Some examples of the movies whose box office receipts were predicted accurately are Iron Man 2, Alice in Wonderland, Toy Story 3, Inception, Clash of the Titans, and Shutter Island.

The researchers also believed that monitoring Wikipedia represented a better approach than scouring Twitter.

The predictive power of the Wikipedia-based model, despite its simplicity compared to the Twitter-based one, can be explained by the fact that many Wikipedia editors are committed followers of the movie industry who gather information and edit related articles significantly earlier than the release date, whereas the “mass” production of tweets only occurs very close to the release time, mostly prompted by marketing campaigns.

What’s more, the researchers are confident that the approach can easily be extended to other fields, including finance and public policy.

Original post
