Over the last few weeks I have been exploring the fascinating field of data visualisation, which offers great ways to play around with how information is perceived.
More formally, data visualisation denotes “The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition”.
Nowadays a huge flood of information hits us every day. Enormous amounts of data collected from various sources are freely available on the internet. One of these data gargoyles is Twitter, which produces around 400 million (400 000 000!) tweets per day!
Tweets basically offer two “layers” of information: the obvious, direct information in the text of the Tweet itself, and a second layer that is not directly perceived, the Tweet’s metadata. Here Twitter provides a large amount of additional information such as user data, retweet count, hashtags, etc. This metadata can be leveraged to experience data from Twitter in a lot of exciting new ways!
So as a little weekend project I have decided to build a small piece of software that generates real-time heat maps of certain keywords from Twitter data.
This is a first static peek at what it’s gonna look like (apparently the friendly floatees use Twitter, too):
See a screencast here: screencast.com
To get this working I have used lots of shiny things:
- Twitter Streaming API
- MongoDB’s capped collections
- jQuery Eventsource
- Google Maps API
The app is written in Python and consists mainly of three components:
A small service based on tweepy that implements a StreamListener, which inserts incoming data into a MongoDB capped collection. Here you can also set filter terms. This example uses mostly terms related to “Big Data”.
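The core of such a listener can be sketched without the streaming plumbing. This is a minimal sketch, not the project's actual code: `tweet_to_doc` and `open_capped_collection` are hypothetical names, and the connection URI, database name, collection name and size limit are assumptions. The only Twitter-specific detail it relies on is that geotagged Tweets carry a GeoJSON `coordinates` point in longitude, latitude order.

```python
import json


def tweet_to_doc(raw_json):
    """Reduce a raw Tweet to the fields a heat map needs, or None if it has no location."""
    tweet = json.loads(raw_json)
    point = tweet.get("coordinates")  # GeoJSON point, or null for Tweets without a geotag
    if not point:
        return None
    lng, lat = point["coordinates"]  # GeoJSON order is [longitude, latitude]
    return {"text": tweet.get("text", ""), "lat": lat, "lng": lng}


def open_capped_collection(uri="mongodb://localhost:27017", db_name="tweets",
                           name="stream", size_bytes=16 * 1024 * 1024):
    """Return a capped collection, creating it on first use (all names are assumptions)."""
    from pymongo import MongoClient  # imported lazily so tweet_to_doc works without pymongo

    db = MongoClient(uri)[db_name]
    if name not in db.list_collection_names():
        # Capped collections have a fixed size; the oldest documents are overwritten.
        db.create_collection(name, capped=True, size=size_bytes)
    return db[name]
```

A listener's `on_data` handler would then do roughly `doc = tweet_to_doc(raw_data)` and call `collection.insert_one(doc)` whenever `doc` is not `None`.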
A Flask-based web app which gets new data from MongoDB and makes use of the publish-subscribe pattern. Because they are “tailable”, MongoDB’s capped collections come in handy: there is no need to remember which messages a client has already received, since the cursor itself yields new documents on arrival. Capped collections also have a fixed size, which is appropriate for this use case, but your mileage may vary. Incoming Tweets are published to a redis channel, and a listener on that channel serves them to connecting clients with a “Content-Type” header of “text/event-stream”.
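To make the two ideas in this component more concrete, here is a rough sketch of tailing a capped collection and framing each document as a server-sent event. It is an illustration under assumptions rather than the app's real code: the function names and the `tweet` event name are made up, and the redis hop described above is left out, so documents go straight from the cursor to the stream.

```python
import json
import time


def format_sse(data, event=None):
    """Serialize one message as a text/event-stream frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload gets its own "data:" prefix.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def event_stream(docs):
    """Turn an iterable of JSON-serializable documents into SSE frames."""
    for doc in docs:
        yield format_sse(json.dumps(doc), event="tweet")


def tail(collection, poll_seconds=1.0):
    """Yield documents from a capped collection as they arrive (needs pymongo)."""
    from pymongo import CursorType  # imported lazily; only needed when actually tailing

    # Drop _id so the documents stay JSON-serializable.
    cursor = collection.find({}, {"_id": False}, cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            yield doc
        time.sleep(poll_seconds)  # nothing new yet; wait before polling again
```

In a Flask view, a generator like this could be returned as `Response(event_stream(tail(collection)), mimetype="text/event-stream")`, which sets the header the browser's EventSource expects.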
With a relatively small amount of code it is possible to turn text data into astonishing visualizations. With only a little more effort, different hashtags could be illustrated by different colors. Or one could count tweets on a topic in certain regions and then compare activity relative to the number of citizens. The underlying data set allows for lots of possible interpretations.
You can find the code on GitHub, feel free to fork and play around!
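The per-region comparison is mostly a matter of counting and normalizing. A small sketch of that arithmetic, assuming each Tweet has already been assigned a region label (the helper name, the region labels and the population figures below are all made up for illustration):

```python
from collections import Counter


def tweets_per_100k(tweet_regions, population):
    """Tweets per 100,000 inhabitants for every region in `population`."""
    counts = Counter(tweet_regions)  # regions with no Tweets count as 0
    return {
        region: counts[region] * 100_000 / inhabitants
        for region, inhabitants in population.items()
    }


# Illustrative numbers only, not real data.
regions_seen = ["A", "A", "B"]
population = {"A": 200_000, "B": 50_000, "C": 10_000}
rates = tweets_per_100k(regions_seen, population)
```

Here region B ends up with twice the per-capita activity of region A even though it produced fewer Tweets, which is exactly the kind of difference raw counts would hide.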