Last week I shared research that can predict the movements of people within a disaster area by analysing their mobile phone usage in the months before the disaster. Such uses of big data are becoming commonplace. Google Flu Trends has largely been a success, albeit with a well-publicised mishap during the latest American flu season. Indeed, it has been successful enough that Google has applied the same approach to dengue.
With Twitter being so potent a source of updates on what we're all up to, one would think it would be just as useful as our mobile phone behaviour or our search engine queries. First, though, we need to understand how often people reveal their location in their messages. There are two ways for Twitter to identify where a tweet came from. The first is Places, where users tag the tweet with the location they're in. The second, Exact Location, uses GPS data from the mobile device used to send the tweet.
Due to security concerns, however, this latter option is disabled by default, and users have to opt in manually. Recent research has looked at how many tweets allow observers to identify where they were sent from, and how many of those tweets used each of the two methods.
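To make the two methods concrete, here is a minimal Python sketch that sorts tweets by how they are geolocated. It assumes tweets arrive as parsed JSON in the format of Twitter's v1.1 API, where a Places tag appears in the `place` field and an Exact Location fix in the `coordinates` field (a GeoJSON point, longitude first). The function names and sample tweets are made up for illustration; this is not the researchers' code.

```python
from collections import Counter


def geo_source(tweet: dict) -> str:
    """Classify one parsed tweet by which location method(s) it carries."""
    has_place = tweet.get("place") is not None        # user-chosen Places tag
    has_exact = tweet.get("coordinates") is not None  # GPS fix from the device
    if has_place and has_exact:
        return "both"
    if has_place:
        return "place"
    if has_exact:
        return "exact"
    return "none"


def tally(tweets) -> Counter:
    """Count how many tweets fall into each location category."""
    return Counter(geo_source(t) for t in tweets)


# Hypothetical sample tweets, trimmed down to the two fields that matter here.
point = {"type": "Point", "coordinates": [-1.55, 53.80]}  # [longitude, latitude]
sample = [
    {"place": {"full_name": "Leeds, England"}, "coordinates": None},
    {"place": None, "coordinates": point},
    {"place": {"full_name": "Leeds, England"}, "coordinates": point},
    {"place": None, "coordinates": None},
]
print(tally(sample))
```

Counting "both" as its own category makes the overlap between the two methods explicit rather than silently double-counting it.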
The researchers analysed the Twitter decahose over a period of 39 days towards the end of 2012. They found that 2.02% of all tweets included geographic metadata, with 1.8% having a Place indicator, 1.6% having Exact Location, and 1.4% having both (these sum to more than the total because tweets can have both). Closer inspection found another 1.1% of tweeters who had manually entered their location. So 3.04%, or roughly 46.5 million tweets, contained some kind of identifying information, with over 600,000 unique places on Earth captured each day.
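The overlap is simple inclusion-exclusion: the share with either form of metadata is Place plus Exact Location minus Both. A quick sketch of the sums, using only the rounded percentages quoted above, also divides 46.5 million by 3.04% to get a rough size for the sample the researchers worked from. This is back-of-the-envelope arithmetic, not the researchers' method.

```python
# Rounded percentages reported for the 39-day decahose sample (late 2012).
place_pct = 1.8   # tweets with a Place tag
exact_pct = 1.6   # tweets with an Exact Location (GPS) fix
both_pct = 1.4    # tweets with both

# Inclusion-exclusion: tweets with both are counted in each group, so subtract them once.
any_metadata_pct = place_pct + exact_pct - both_pct
print(f"Place or Exact Location: {any_metadata_pct:.2f}% (reported: 2.02%)")

# Rough size of the sample implied by "46.5 million tweets = 3.04%".
identified_tweets = 46.5e6
identified_share = 3.04 / 100
days = 39
implied_total = identified_tweets / identified_share
print(f"Implied sample size: {implied_total / 1e9:.2f} billion tweets, "
      f"about {implied_total / days / 1e6:.0f} million per day")
```

The small gap between 2.00% and the reported 2.02% is most likely down to the component figures being rounded to one decimal place.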
You can see below how these were distributed around the world.
It is, of course, worth pointing out that georeferenced tweets were made by just 8.2% of all Twitter users, with just 1.1% producing 66% of all location-based tweets, so it is not a large proportion by any means. Does that curtail any hopes of using this data for meaningful work? Unfortunately, the researchers didn't explore that question, choosing instead to focus purely on the number of georeferenced tweets and the geographic spread of retweets (location was found to have no bearing on our likelihood of sharing something). Hopefully the research will open the door to more digging in this fascinating area.
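The 1.1%/66% figure is a concentration measure: rank users by how many georeferenced tweets they posted and ask what share the top slice accounts for. Here is a minimal sketch, assuming you already have a per-user count of location-based tweets; `top_share` and the toy counts are hypothetical, not the researchers' code or data.

```python
def top_share(counts, top_fraction):
    """Share of all tweets produced by the most active `top_fraction` of users.

    counts: iterable of per-user tweet counts (one entry per user).
    top_fraction: e.g. 0.011 for the top 1.1% of users.
    """
    ranked = sorted(counts, reverse=True)
    if not ranked or sum(ranked) == 0:
        raise ValueError("need at least one user with at least one tweet")
    # Number of users in the top slice, rounded, but always at least one.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Toy example: one prolific user and 99 users who each posted one geotagged tweet.
counts = [500] + [1] * 99
print(f"Top 1% of users posted {top_share(counts, 0.01):.0%} of geotagged tweets")
```

Even in this tiny made-up population, a single heavy user dominates the total, which is why a small, skewed group of tweeters can shape what location data appears to show.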