
Getting Started with Neo4J Using Your Twitter Data



When learning a new technology it’s best to have a toy problem in mind so that you’re not just reimplementing another glorified “Hello World” project. Also, if you need lots of data, it’s best to pull in a fun data set that you already have some familiarity with. This allows you to lean upon already established intuition of the data set so that you can more quickly make use of the technology. (And as an aside, this is just why we so regularly use the StackExchange SciFi data set when presenting our new ideas about Solr.)

When approaching a graph database technology like Neo4J, if you’re as avid a Twitter user as I am then POOF, you already have the best possible data set for becoming familiar with the technology: your own social network. This blog post will help you download and set up Neo4J, set up a Twitter app (needed to access the Twitter API), and pull down your social network as well as any other social network you might be interested in. At that point we’ll interrogate the network using Neo4J and the Cypher query language. Let’s go!

Installing and Setting Up Neo4J

Since we’re not setting Neo4J up for production use, this part’s real easy. Just go to the Neo4J download page, click on that giant blue download button, and 36.1M later you’ll have your very own copy of Neo4J. Unzip it to some reasonable place on your machine, cd into that directory, and simply issue the command bin/neo4j start. (Once you’re finished, a bin/neo4j stop will shut Neo4J down.) Now if you point your browser at http://localhost:7474 and see stuff (rather than lack of stuff), then you’re ready to start shoveling data into Neo4J.

Prepping Twitter

You’ll need to create a Twitter app before you can start pulling down your connections because you need the app’s credentials in order to access Twitter’s API. But don’t sweat it, this literally takes less than a minute. Just go to the Twitter developer apps page, sign in, and there will be yet another big blue button, this time labeled “Create a new application” — click it! After filling out a really short form, checking the “I blindly agree to whatever is included in this legal contract” checkbox, entering a CAPTCHA string, and clicking the “Create your own Twitter application” button, you will indeed have your very own Twitter app. You’ll be taken to a screen that contains the details for your new app, but most importantly the OAuth credentials. Initially, you won’t have the access tokens, but you can click the “Create access tokens” button at the bottom and next time you refresh the page (wait a few seconds) you’ll see that the access keys are available. Keep track of the credentials here because you’ll need to refer to them soon.

Scraping Your Social Circles from Twitter

Check out my Python TwitterScraper script. Though it’s not yet the most beautiful code, it doesn’t really matter, because there’s not much here! Let’s take a moment to walk through it. The first section is where you set up Twitter and Neo4J. Naturally you’ll need to pip install the Tweepy and Py2Neo libraries, but they don’t have any weird dependencies, so this shouldn’t be a problem. Also notice, this is where all the access keys for your Twitter app should be used. Go ahead and copy and paste your credentials there. Now you should be ready to go.

The remaining code includes two functions. The first, create_or_get_node, creates or gets a node (in this case a Twitter user) from Neo4J by id_str, and if it’s creating the node for the first time, it also inserts all of the relevant user metadata into Neo4J. The create_or_get_node function also optionally takes a list of labels that will later be used to group certain users together. The second function, insert_user_with_friends, takes a Twitter user (via their screen name), pulls all relevant metadata for that user from the Twitter API, and inserts it into Neo4J. This function will then do the same thing for all the individuals that this Twitter user follows. And finally, insert_user_with_friends will establish a FOLLOWS relationship linking the source Twitter user to those that she follows. Here again, insert_user_with_friends takes an optional list of labels that can be used to group the seed nodes (those that are followed do not get labeled).
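To make the shape of those two functions concrete, here is a minimal sketch that swaps Neo4J for a plain in-memory dictionary and the Twitter API for made-up data. It is not the code from TwitterScraper; the TinyGraph class, the fetch_user and fetch_friends callables, and the sample users are all hypothetical names invented for illustration, and applying labels to an already existing node is my assumption.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for Neo4J (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}                  # id_str -> {"props": {...}, "labels": set()}
        self.follows = defaultdict(set)  # follower id_str -> ids of users they follow

    def create_or_get_node(self, user, labels=()):
        """Return the node for this user, storing its metadata only on first sight."""
        node = self.nodes.get(user["id_str"])
        if node is None:
            node = {"props": dict(user), "labels": set()}
            self.nodes[user["id_str"]] = node
        # Assumption: labels are also applied to a node that already exists.
        node["labels"].update(labels)
        return node

    def insert_user_with_friends(self, screen_name, fetch_user, fetch_friends, labels=()):
        """Insert a seed user, everyone they follow, and the FOLLOWS edges between them."""
        user = fetch_user(screen_name)
        self.create_or_get_node(user, labels)      # only the seed user gets labels
        for friend in fetch_friends(screen_name):
            self.create_or_get_node(friend)        # followed users stay unlabeled
            self.follows[user["id_str"]].add(friend["id_str"])


# Made-up data standing in for the Twitter API.
USERS = {
    "alice": {"id_str": "1", "screen_name": "alice"},
    "bob": {"id_str": "2", "screen_name": "bob"},
    "carol": {"id_str": "3", "screen_name": "carol"},
}
FRIENDS = {"alice": ["bob", "carol"], "bob": ["alice"]}


def fetch_user(screen_name):
    return USERS[screen_name]


def fetch_friends(screen_name):
    return [USERS[name] for name in FRIENDS[screen_name]]


graph = TinyGraph()
graph.insert_user_with_friends("alice", fetch_user, fetch_friends, labels=["SeedNode"])
graph.insert_user_with_friends("bob", fetch_user, fetch_friends, labels=["SeedNode"])
```

Notice that bob is first created as an unlabeled friend of alice and only picks up the SeedNode label once he is inserted as a seed himself, while carol never gets a label at all.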

The last bit of the script is the fun part. This is where you programmatically lay out the social networks and individuals that you want to stalk… er, uh… observe. For your convenience, I’ve added all of the OpenSource Connections team, as well as several notable individuals from the Neo4J community. I’ve also included grouping labels that I thought were pretty reasonable descriptors for these individuals and groups. As that last comment in the code states, make sure to add several people that you follow as well. Remember, the goal here is to create a data set that you are eminently familiar with. Once you’re happy with the data set, then run it: python TwitterScraper.py. It will pull down Twitter users 200 at a time and insert them into Neo4J as fast as possible. Soon the program will hit Twitter’s rate limit cutoff, at which point the script will wait until the rate limit has been lifted and then continue pulling down the rest of the data. Altogether, you can plan on getting around 200 updates per minute.
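The page-at-a-time, wait-when-throttled behavior can be sketched without touching Twitter at all. This is only an illustration of the pattern, not the script’s actual code: the RateLimited exception, the fetch_page callable, the cursor convention (None marks the last page), and the 900 second wait are all assumptions I made up for the example.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the API refused a call; retry_after is in seconds."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


def fetch_all_pages(fetch_page, sleep=time.sleep):
    """Collect every item from fetch_page(cursor) -> (items, next_cursor).

    A next_cursor of None marks the last page. When the call is rate limited,
    wait for the requested time and retry the same cursor.
    """
    items, cursor = [], 0
    while cursor is not None:
        try:
            page, next_cursor = fetch_page(cursor)
        except RateLimited as limit:
            sleep(limit.retry_after)
            continue
        items.extend(page)
        cursor = next_cursor
    return items


def make_fake_api(total=450, page_size=200, fail_on_call=2):
    """Build a fake paged API that is rate limited exactly once."""
    calls = {"n": 0}

    def fetch_page(cursor):
        calls["n"] += 1
        if calls["n"] == fail_on_call:
            raise RateLimited(retry_after=900)
        end = min(cursor + page_size, total)
        next_cursor = end if end < total else None
        return list(range(cursor, end)), next_cursor

    return fetch_page


waits = []  # record the requested waits instead of actually sleeping
users = fetch_all_pages(make_fake_api(), sleep=waits.append)
```

Passing sleep in as a parameter is just a convenience so the loop can be tried out instantly; in a real run you would leave the default time.sleep in place.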

Start Infiltrating the Social Network!

Now for the fun part; let’s start putting some queries together and pulling back interesting data. In all of the examples below, we will be using the default Neo4J browser, which you’ll still find at http://localhost:7474/. Here we’re using the Cypher query language. This blog post won’t go into too much detail about Cypher syntax itself, but feel free to look at the very rich Neo4J documentation. Also, I’ll be using my own Twitter screen name “JnBrymn” as an example, so feel free to replace my screen name with your own and try the queries for yourself.

First off, let’s make sure the data we’ve ingested seems reasonable. The most obvious thing to do is to make sure we’re actually in the data set:

MATCH (n {screen_name:"JnBrymn"})
RETURN n

Up pops an orange node representing me. And if I click on the node, I see a list of all my metadata.


I wonder just how many users we have indexed now?

MATCH (n) 
RETURN count(*)

7098 users, not bad. How many am I following?

MATCH (n {screen_name:"JnBrymn"})-[:FOLLOWS]->(o)
RETURN count(*)

371 – yep, that looks right. And check out how easy Cypher is — you’re basically drawing ASCII art of the node connections. So it’s easy to ask the next obvious question: How many are following me? Here I just switch the direction of the relationship arrow:

MATCH (n {screen_name:"JnBrymn"})<-[:FOLLOWS]-(o) 
RETURN count(*)

Hmm… only 10 followers. Am I really that unpopular? (Checking Twitter now.) No, it says I’ve got 460 followers. Oh, that’s right: if you’ll remember, we’re only collecting outbound FOLLOWS relationships from our seed users (labeled as SeedNode). The reason is that some people, Justin Bieber for example, are followed by millions of Twitter users! And we certainly don’t want to keep track of that for now.

But all this makes me think, of the seed users that I follow, who does not follow me back?

MATCH (n {screen_name:"JnBrymn"})-[:FOLLOWS]->(o:SeedNode)
WHERE NOT (o)-[:FOLLOWS]->(n)
RETURN o.screen_name
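If it helps to see the follow-back question as plain set logic rather than a graph pattern, here is a small sketch. The follows dictionary, the seeds set, and the screen names in it are made up for the example; nothing here comes from the real data set.

```python
def not_following_back(follows, me, seeds):
    """Seed users that `me` follows but who do not follow `me` in return."""
    return sorted(
        user
        for user in follows.get(me, set())
        if user in seeds and me not in follows.get(user, set())
    )


# Made-up data: each key follows everyone in its set.
follows = {
    "me": {"ann", "ben", "cat"},
    "ann": {"me"},
    "ben": {"ann"},
}
seeds = {"me", "ann", "ben"}

result = not_following_back(follows, "me", seeds)
```

Here ann is dropped because she follows back, and cat is dropped because she isn’t a seed user, so we never collected who she follows and can’t say anything about her either way.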

This returns a single name: mesirii. This is Michael Hunger, one of the Neo4J hot shots. If he’s not following me back, then I’m definitely not doing a good job of infiltrating the Neo4J community yet. No matter… I bet he’s a @justinbieber follower anyway… let’s check:

MATCH (n:SeedNode)-[:FOLLOWS]->(o {screen_name:"justinbieber"})
RETURN n.screen_name

Sadly… no one on our list follows Justin Bieber… I was sure I would have some good blackmail fodder there! (But hey, maybe you’ll discover some Beliebers in your own data set :P )

Hmm… well if I’m going to break into the Neo4J community, I need to find my likely vectors. Let’s create a list of all people who follow me and order them by the number of Neo4J people that they follow. Maybe I can get introductions through these friends:

MATCH (n:Neo)-[:FOLLOWS]->(m:SeedNode {screen_name:"JnBrymn"}),
      (n)-[:FOLLOWS]->(o:Neo)
RETURN count(*), n.screen_name
ORDER BY count(*) desc

This returns:

count(*) |  n.screen_name
13       |  wefreema
11       |  technige

Sweet, so my friends wefreema and technige look like my gatekeepers to the Neo4J community. The only thing left to determine is what people I need to connect to.

MATCH (n:Neo)-[:FOLLOWS]->(o)
RETURN count(*), o.screen_name
ORDER BY count(*) desc

This query enumerates the most popular people among the Neo4J community based upon who my Neo seed nodes are following. And the results of this query look like this:

count(*) |  o.screen_name
13       |  mesirii
12       |  emileifrem
12       |  jimwebber
12       |  digitalstain
11       |  apcj
11       |  cleishm
11       |  pandamonial
11       |  iansrobinson
11       |  p3rnilla
11       |  neo4j

As expected, plenty of these people are SeedNodes that I selected because I already knew them to be leaders in the community: mesirii, emileifrem, jimwebber, p3rnilla, neo4j. But who are these guys: digitalstain, apcj, cleishm, pandamonial, iansrobinson? After quickly looking them up on Twitter, I think we’ve discovered some new, key players in the Neo4J space.
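The group-and-count that this kind of query performs is easy to picture outside the database too. Below is a rough Python equivalent using collections.Counter; the neo_follows dictionary and the names in it are invented for the example, and the function name most_followed is my own.

```python
from collections import Counter


def most_followed(follows, group):
    """Rank accounts by how many members of `group` follow them, most followed first."""
    counts = Counter()
    for member in group:
        counts.update(follows.get(member, ()))
    return counts.most_common()


# Made-up data: each key follows everyone in its list.
neo_follows = {
    "n1": ["x", "y"],
    "n2": ["x"],
    "n3": ["x", "y", "z"],
    "outsider": ["z", "q"],
}
neo_group = ["n1", "n2", "n3"]

ranking = most_followed(neo_follows, neo_group)
```

Only the accounts followed by members of the group are counted, which is why the outsider’s follows never show up in the ranking, much like restricting the left side of the Cypher pattern to the Neo label.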


This is only an intro to Neo4J. There are plenty of things that we could have talked about here: I could have gone into much more detail about the Cypher query syntax, I could have added indexes to speed up query times, and I could have put together some even crazier Cypher queries that make use of the broader Cypher syntax. But this is a good start. I think that you’ll agree: by looking at your own Twitter social graph, you’ll immediately think of questions that you want to ask and you’ll get a better understanding of what possibilities are out there.

Want to learn more about Cypher? Well I might just be co-authoring a book on that very subject! Stay tuned.

Update – Crowdsourcing a Collection of Key Community Figures

Apparently some people are already using this post to search through their own communities of interest. Let’s help each other out. If you’re tracking a community, then comment below with the Twitter screen names of the key figures from the community. I’ll edit the comments later to coalesce clean lists.



Published at DZone with permission of John Berryman, DZone MVB. See the original article here.
