Over a million developers have joined DZone.

Get Real Data from the Semantic Web

· Big Data Zone

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Semantic Web this, Semantic Web that, what actual use is the Semantic Web in the real world? I mean how can you actually use it?

If you haven't heard the term "Semantic Web" over the last couple of years then you must have been in... well somewhere without this interweb they're all talking about.

Basically, by using metadata (see RDF), disparate bits of data floating around the web can be joined up. In otherwords they stop being disparate. Better than that, theoretically you can query the connections between the data and get lots of lovely information back. This last bit is done via SPARQL, and yes, the QL does stand for Query Language.

I say theoretically because in reality it's a bit of a pain. I may be an intelligent agentcapable of finding linked bits of data through the web, but how exactly would you do that in python.

It is possible to use rdflib to find information, but it's very long winded. It's much easier to use SPARQLWrapper andin fact in the simple example below, I've used a SPARQLWrapperWrapper to make asking for lots of similarly sourced data, in this case DBPedia, even easier.

from SPARQLWrapper import SPARQLWrapper, JSON
 
class SparqlEndpoint(object):
 
    def __init__(self, endpoint, prefixes={}):
        self.sparql = SPARQLWrapper(endpoint)
        self.prefixes = {
            "dbpedia-owl": "http://dbpedia.org/ontology/",
            "owl": "http://www.w3.org/2002/07/owl#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "dc": "http://purl.org/dc/elements/1.1/",
            "dbpedia2": "http://dbpedia.org/property/",
            "dbpedia": "http://dbpedia.org/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
            "foaf": "http://xmlns.com/foaf/0.1/",
        }
        self.prefixes.update(prefixes)
        self.sparql.setReturnFormat(JSON)
 
    def query(self, q):
        lines = ["PREFIX %s: <%s>" % (k, r) for k, r in self.prefixes.iteritems()]
        lines.extend(q.split("\n"))
        query = "\n".join(lines)
        print query
        self.sparql.setQuery(query)
        results = self.sparql.query().convert()
        return results["results"]["bindings"]
 
 
class DBpediaEndpoint(SparqlEndpoint):
    
    def __init__(self, prefixes = {}):
        endpoint = "http://dbpedia.org/sparql"
        super(DBpediaEndpoint, self).__init__(endpoint, prefixes)

To use this try importing the DBpediaEndpoint and feeding it some SPARQL:

#!/usr/bin/env python
 
import sys
from sparql import DBpediaEndpoint
    
def main ():
    s = DBpediaEndpoint()
    resource_uri = "http://dbpedia.org/resource/Foobar"
    
    results = s.query("""
        SELECT ?o
        WHERE { <%s> dbpedia-owl:abstract ?o .
        FILTER(langMatches(lang(?o), "EN")) }
    """ % resource_uri)
    abstract = results[0]["o"]["value"]
    print abstract
 
    
if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt, e: # Ctrl-C
        raise e

Your homework is - How do you identify the resource_uri in the first place?

That's for another evening.

Learn how you can modernize your data warehouse with Apache Hadoop. View an on-demand webinar now. Brought to you in partnership with Hortonworks.

Topics:

Published at DZone with permission of Col Wilson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}