How to Use Gephi to Visualize Related Entries in Wikipedia
The content of this article was originally written by Tony Hirst on his blog, OUseful.Info.
Sometime last week, @mediaczar tipped me off to a neat recipe on the wonderfully named Drunks&Lampposts blog, Graphing the history of philosophy, that uses Gephi to map an influence network in the world of philosophy. The data comes from extracting the “influencedBy” relationship between philosophers referred to in Wikipedia, using DBpedia, the machine-readable, structured-data view of Wikipedia.
The recipe hints at how to extract data from DBpedia, tidy it up and then import it into Gephi… but there is a quicker way: the Gephi Semantic Web Import plugin. (If it's not already installed, you can install this plugin via the Tools -> Plugins menu, then look in the Available Plugins tab.)
To get DBpedia data into Gephi, we need to do three things:
- tell the importer where to find the data by giving it a URL (the “driver” configuration setting);
- tell the importer what data we want to get back, by specifying what is essentially a database query (the “request” configuration setting);
- tell Gephi how to create the network we want to visualise from the data returned by DBpedia (this is also specified as part of the “request” configuration).
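The plugin wires the first two settings together for you. To make the split between “driver” and “request” a little more concrete, here is a minimal sketch that packs a query into a URL for the endpoint, following the GET binding of the SPARQL 1.1 Protocol. It doesn't send anything, and I'm not claiming it mirrors how the plugin talks to the endpoint internally. The placeholder query is purely illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint URL set by the DBPediaMovies driver configuration.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_request_url(endpoint: str, query: str) -> str:
    """Return a SPARQL 1.1 Protocol GET URL carrying `query` as the
    URL-encoded `query` parameter: driver (endpoint) + request (query)."""
    return endpoint + "?" + urlencode({"query": query})


query = "select * where { ?s ?p ?o } limit 10"  # placeholder request
url = build_request_url(DBPEDIA_ENDPOINT, query)
print(url)
# Decoding the parameter gives back the original query text.
print(parse_qs(urlparse(url).query)["query"][0])
```

The third step, shaping the returned data into a network, is the CONSTRUCT part of the request.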
Fortunately, we don't have to work out how to do this from scratch. From the Semantic Web Import configuration panel, configure the importer by setting the configuration to DBPediaMovies.
Hitting “Set Configuration” sets up the driver (a remote SOAP endpoint with endpoint URL http://dbpedia.org/sparql):
and provides a dummy sample query request:
We need to do some work creating our own query now, but not too much. We can use this DBPediaMovies example and the query given on the Drunks&Lampposts blog as a starting point:
select * where {
  ?p a <http://dbpedia.org/ontology/Philosopher> .
  ?p <http://dbpedia.org/ontology/influenced> ?influenced .
}
This query essentially says: ‘give me all the pairs of people (?p, ?influenced) where each person ?p is a philosopher, and each person ?influenced is influenced by ?p’.
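To see what “all the pairs” means in practice, here is a minimal sketch that evaluates the two triple patterns by hand over a handful of made-up triples. The `dbr:`/`dbo:` short names and the data itself are illustrative assumptions, not real DBpedia results.

```python
# Toy stand-in for DBpedia data: hypothetical triples, not real query results.
RDF_TYPE = "rdf:type"  # SPARQL's `a` is shorthand for rdf:type
PHILOSOPHER = "dbo:Philosopher"
INFLUENCED = "dbo:influenced"

TRIPLES = {
    ("dbr:Plato", RDF_TYPE, PHILOSOPHER),
    ("dbr:Aristotle", RDF_TYPE, PHILOSOPHER),
    ("dbr:Plato", INFLUENCED, "dbr:Aristotle"),
    ("dbr:Aristotle", INFLUENCED, "dbr:Aquinas"),
    ("dbr:Homer", INFLUENCED, "dbr:Plato"),  # Homer is not typed as a philosopher here
}


def match_where(triples):
    """Return the (?p, ?influenced) pairs that satisfy both patterns:
    ?p a dbo:Philosopher .  ?p dbo:influenced ?influenced ."""
    philosophers = {s for (s, p, o) in triples if p == RDF_TYPE and o == PHILOSOPHER}
    return sorted((s, o) for (s, p, o) in triples if p == INFLUENCED and s in philosophers)


for row in match_where(TRIPLES):
    print(row)
```

Homer's edge is dropped because only ?p is required to be a philosopher. Aquinas is kept even though the toy data never types him, because nothing constrains ?influenced.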
We can replace the WHERE part of the query in the Semantic Web Importer with the WHERE part of this query, but what graph do we want to put together in the CONSTRUCT part of the request?
The graph we are going to visualise will have nodes that are philosophers or the people they influenced. The edges connecting the nodes will represent that one influenced the other, using a directed line (with an arrow) to show that A influenced B, for example.
The following construction should achieve this:
construct {
  ?p <http://dbpedia.org/ontology/influenced> ?influenced .
}
where {
  ?p a <http://dbpedia.org/ontology/Philosopher> .
  ?p <http://dbpedia.org/ontology/influenced> ?influenced .
}
limit 10000
(The LIMIT clause caps the number of rows of data we're going to get back. It's often good practice to set it quite low when you're trying out a new query!)
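As a rough model of what CONSTRUCT does with those rows, here is a sketch that stamps the template out once per solution, honouring LIMIT, and collects the result as a set of triples. Each triple then reads as a directed edge from subject to object. The bindings are hypothetical toy data, and this is the textbook semantics rather than the plugin's code.

```python
# Toy WHERE-clause solutions (hypothetical bindings, not real DBpedia output).
SOLUTIONS = [
    {"p": "dbr:Plato", "influenced": "dbr:Aristotle"},
    {"p": "dbr:Aristotle", "influenced": "dbr:Aquinas"},
    {"p": "dbr:Plato", "influenced": "dbr:Aristotle"},  # repeated binding
]

# CONSTRUCT template from the request: ?p dbo:influenced ?influenced .
TEMPLATE = [("?p", "dbo:influenced", "?influenced")]


def construct(template, solutions, limit):
    """Instantiate the template once per solution, for at most `limit`
    solutions, and return the resulting graph as a set of triples."""
    graph = set()
    for row in solutions[:limit]:
        for pattern in template:
            # Replace each ?variable with its binding; keep constants as-is.
            triple = tuple(row[t[1:]] if t.startswith("?") else t for t in pattern)
            graph.add(triple)
    return graph


# Read each triple as a directed edge: subject -> object.
edges = sorted((s, o) for (s, _, o) in construct(TEMPLATE, SOLUTIONS, limit=10000))
print(edges)
```

Because the result is a set, the repeated binding collapses into a single edge.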
Hit Run and a graph should be imported:
If you click on the Graph panel (in the main Overview view of the Gephi tool), you should see the graph:
If we run the PageRank or Eigenvector Centrality statistic, size the nodes according to that value, and lay out the graph using a force-directed layout algorithm such as Fruchterman-Reingold, we get something like this:
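As an aside, Gephi computes PageRank for you from the Statistics panel. If you're curious what the number means, here is a sketch of the standard power-iteration PageRank plus a linear mapping from score to node size, much like ranking node size by an attribute. This is not Gephi's code. The damping factor, iteration count, size range and toy edges are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed edge list.
    Nodes with no out-links share their score evenly with every node."""
    nodes = sorted({n for edge in edges for n in edge})
    count = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def node_sizes(scores, min_size=10.0, max_size=50.0):
    """Map scores linearly onto [min_size, max_size] (hypothetical range)."""
    low, high = min(scores.values()), max(scores.values())
    spread = (high - low) or 1.0  # avoid dividing by zero if all scores match
    return {node: min_size + (score - low) / spread * (max_size - min_size)
            for node, score in scores.items()}


# Toy edges in the "X influencedBy Y" direction (illustrative only).
EDGES = [("Aristotle", "Plato"), ("Plato", "Socrates"), ("Aristotle", "Socrates")]
scores = pagerank(EDGES)
sizes = node_sizes(scores)
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node:10s} pagerank={scores[node]:.3f} size={sizes[node]:.1f}")
```

Edge direction matters here. PageRank flows along the arrows, so it ranks whoever the arrows point at. With “X influencedBy Y” edges, the influencers float to the top.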
The nodes are labelled in a rather clumsy way (http://dbpedia.org/resource/Martin_Heidegger, for example), but we can tidy this up. Going to one of the DBpedia pages, such as http://dbpedia.org/page/Martin_Heidegger, we find what else DBpedia knows about this person:
In particular, we see that we can get hold of the name of the philosopher using the foaf:name property/relation. If you look back at the original DBPediaMovies example, we can start to pick it apart. It looks as if there is a set of Gephi properties we can use to create our network, including a “label” property. Maybe this will help us label our nodes more clearly, using the actual name of a philosopher, for example? You may also notice the declaration of a gephi “prefix”, which appears in various constructions (such as gephi:label). Hmm... maybe gephi:label is to prefix gephi:<http://gephi.org/> as foaf:name is to something? If we do a web search for the phrase foaf:name prefix, we turn up several results that contain the phrase prefix foaf:<http://xmlns.com/foaf/0.1/>, so maybe we need one of those to get the foaf:name out of DBpedia...?
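The hunch is right: a prefix declaration just abbreviates a namespace. A prefixed name such as foaf:name expands to the declared namespace followed by the local part. In SPARQL a prefixed name is written bare, while anything inside angle brackets is taken as a complete IRI and is not expanded. Here is a minimal sketch of that expansion, using the two prefixes from this article. The `expand` helper is hypothetical and only illustrates the rule.

```python
# Prefix declarations as they appear in the requests in this article.
PREFIXES = {
    "gephi": "http://gephi.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full IRI by
    concatenating the declared namespace and the local part."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no prefix declared for {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("gephi:label"))  # http://gephi.org/label
print(expand("foaf:name"))    # http://xmlns.com/foaf/0.1/name
```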
But how do we get it out? We've already seen that we can get hold of a person who was influenced by a philosopher by asking for results where this relation holds: ?p <http://dbpedia.org/ontology/influenced> ?influenced. So it follows that we can get the name of a philosopher (?pname) by asking for the foaf:name in the WHERE part of the query:
?p foaf:name ?pname .
and then using this name as a label in the construction:
?p gephi:label ?pname.
We can also do a similar exercise for the person who is influenced.
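Here is one way to picture what those extra gephi:label triples buy us: the CONSTRUCT output now carries two kinds of triple. Label triples name a node, and the influence triples become edges. The sketch below splits them that way. It is a hypothetical model of the import step, not the plugin's actual logic, and the triples are toy data.

```python
GEPHI_LABEL = "http://gephi.org/label"  # gephi:label, expanded


def to_gephi_graph(triples):
    """Hypothetical model of the import step: gephi:label triples become
    node labels, every other triple becomes a directed edge subject -> object."""
    labels, edges = {}, []
    for subject, predicate, obj in triples:
        if predicate == GEPHI_LABEL:
            labels[subject] = obj
        else:
            edges.append((subject, obj))
    return labels, edges


def display_name(node, labels):
    """Fall back to the raw IRI when no label was constructed for a node."""
    return labels.get(node, node)


DBR = "http://dbpedia.org/resource/"
# Toy CONSTRUCT output (illustrative, not real DBpedia data).
TRIPLES = [
    (DBR + "Martin_Heidegger", GEPHI_LABEL, "Martin Heidegger"),
    (DBR + "Edmund_Husserl", GEPHI_LABEL, "Edmund Husserl"),
    (DBR + "Edmund_Husserl", "http://dbpedia.org/ontology/influenced", DBR + "Martin_Heidegger"),
]

labels, edges = to_gephi_graph(TRIPLES)
for src, dst in edges:
    print(display_name(src, labels), "->", display_name(dst, labels))
```

The fallback in `display_name` also hints at a gotcha. Because the WHERE clause requires a foaf:name for a node to match at all, anyone without one drops out of the graph rather than appearing with an IRI label.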
Looking through the DBpedia record, I notice that as well as an influenced relation, there is an influencedBy relation (I think this is the one that was actually used on the Drunks&Lampposts blog?). So let's use that in this final version of the query:
prefix gephi: <http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
construct {
  ?philosopher gephi:label ?philosophername .
  ?influence gephi:label ?influencename .
  ?philosopher <http://dbpedia.org/ontology/influencedBy> ?influence .
}
where {
  ?philosopher a <http://dbpedia.org/ontology/Philosopher> .
  ?philosopher <http://dbpedia.org/ontology/influencedBy> ?influence .
  ?philosopher foaf:name ?philosophername .
  ?influence foaf:name ?influencename .
}
If you've already loaded a graph with an earlier query, running this one may draw the new graph on top of the previous one, so it's best to clear the workspace first. At the bottom right of the screen is a list of workspaces: click on the RDF Request Graph label to pop up the list, and close the RDF Request Graph workspace by clicking on its x.
Now run the query into a newly launched, pristine workspace, and play with the graph to your heart's content… :-) [I'll maybe post more on this later. In the meantime, if you're new to Gephi, here are some Gephi tutorials.]
Here's what I get sizing nodes and labels by PageRank, and laying out the graph using a combination of the Force Atlas 2, Expansion and Label Adjust (to stop labels overlapping) layout tools:
Using the Ego Network filter, we can then focus on the immediate influence network (influencers and influenced) of an individual philosopher:
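As an aside on what that filter keeps, here is a sketch of a depth-1 ego network. It takes the chosen node and everyone it points to or is pointed at by, then keeps every edge among that set. Whether Gephi's filter retains exactly the same edges is an assumption on my part, and the edges are toy data.

```python
def ego_network(edges, centre):
    """Depth-1 ego network: the centre node, every node it links to or is
    linked from, and all edges among that node set (the induced subgraph)."""
    neighbours = {dst for src, dst in edges if src == centre}
    neighbours |= {src for src, dst in edges if dst == centre}
    keep = neighbours | {centre}
    return keep, [(src, dst) for src, dst in edges if src in keep and dst in keep]


# Toy "influencedBy" edges (illustrative only).
EDGES = [
    ("Heidegger", "Husserl"),
    ("Husserl", "Brentano"),
    ("Sartre", "Heidegger"),
    ("Sartre", "Husserl"),
    ("Derrida", "Sartre"),
]
nodes, sub_edges = ego_network(EDGES, "Heidegger")
print(sorted(nodes))
print(sub_edges)
```

Note that the Sartre -> Husserl edge survives even though it doesn't touch Heidegger, because both ends are in his immediate neighbourhood.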
What this recipe hopefully shows is how you can load data directly from DBpedia into Gephi. The two tricks you need to learn to do this for other data sets are:
1) figuring out how to get data out of DBpedia (the WHERE part of the request);
2) figuring out how to get that data into shape for Gephi (the CONSTRUCT part of the request).
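Since both tricks come down to filling in a class and a relation, one way to reuse the pattern is a small template function that produces a request in the same shape as the philosopher query, ready to paste into the importer. The function name, the ?src/?dst variable names and the default LIMIT are my own illustrative choices. The sketch also assumes that both ends of the relation have a foaf:name, which may not hold for every class in DBpedia.

```python
def influence_query(class_iri, relation_iri, limit=1000):
    """Build a CONSTRUCT request with the same shape as the philosopher one:
    label both ends with foaf:name, keep the relation as a directed edge."""
    return f"""prefix gephi: <http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
construct {{
  ?src gephi:label ?srcname .
  ?dst gephi:label ?dstname .
  ?src <{relation_iri}> ?dst .
}}
where {{
  ?src a <{class_iri}> .
  ?src <{relation_iri}> ?dst .
  ?src foaf:name ?srcname .
  ?dst foaf:name ?dstname .
}}
limit {limit}"""


print(influence_query(
    "http://dbpedia.org/ontology/Philosopher",
    "http://dbpedia.org/ontology/influencedBy",
    limit=10000,
))
```

Keeping a LIMIT in the template follows the advice above about starting small when trying out a new query.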
If you come up with any other interesting graphs, please post request fragments in the comments below :-)
[See also: Graphing every* idea in history]
PS: via @sciencebase (Mapping research on Wikipedia with Wikimaps), there's this related tool: Wikimaps, an online (and desktop?) tool for visualising various Wikipedia-powered graphs, such as, erm, Justin Bieber's network…
Any other related tools out there for constructing and visualising Wikipedia-powered network maps? Please add a link via the comments if you know of any…
PPS: for a generalisation of this approach, and a recipe for finding other DBpedia networks to map, see Mapping how programming languages influenced each other according to Wikipedia.
Published at DZone with permission of Eric Genesky. See the original article here.