RDF Storage: Apache Jena TDB

By Jirapongse Phuriphanvichai · Apr. 08, 20 · Tutorial

Overview

RDF storage (also called an RDF store or triplestore) is a database used to store and query RDF data. RDF stands for Resource Description Framework, a standard model for describing and interchanging data on the web.

RDF storage contains collections of RDF statements, which are three-part statements known as triples. Each triple has a resource (subject), property (predicate), and property value (object).

 



[Figure: Basic triple architecture]



For example, the following statement represents “Alphabet Inc. has a CEO named Pichai Sundararajan”.

[Figure: Triple Statement]


Moreover, RDF statements can be linked together to create a graph. For instance:

[Figure: Graph example]

Unlike a relational database, RDF storage is a type of graph database that stores RDF triple statements. 
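
To make the idea of linked triples concrete, the following Python sketch models each triple as a small tuple and a graph as a set of them. The predicate names and the two extra statements about Google LLC are illustrative assumptions and are not taken from the figures above; a real RDF store would use IRIs rather than plain strings.

```python
from collections import deque
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative statements; the predicate names are hypothetical.
GRAPH = {
    Triple("Alphabet Inc.", "hasCEO", "Pichai Sundararajan"),
    Triple("Alphabet Inc.", "hasSubsidiary", "Google LLC"),
    Triple("Google LLC", "isHeadquarteredIn", "Mountain View"),
}


def reachable(graph, start):
    """Return every node reachable from `start` by following triples
    from subject to object (a breadth-first walk over the graph)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for triple in graph:
            if triple.subject == node and triple.obj not in seen:
                seen.add(triple.obj)
                queue.append(triple.obj)
    return seen


print(sorted(reachable(GRAPH, "Alphabet Inc.")))
# prints ['Google LLC', 'Mountain View', 'Pichai Sundararajan']
```

Because the object of one triple ("Google LLC") is the subject of another, the walk reaches "Mountain View" in two hops, which is the sense in which triples form a graph.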

Many RDF stores are available, such as MarkLogic, Amazon Neptune, Virtuoso, and Apache Jena TDB. This article introduces only Apache Jena TDB, the storage component of Apache Jena, a free and open-source Java framework for managing RDF data.

Apache Jena TDB

TDB is a component of Apache Jena for RDF storage and query. Apache Jena is a free and open-source Java framework for building Semantic Web and Linked Data applications; it provides several APIs and components to process RDF data. The following picture shows the framework architecture of Apache Jena:



[Figure: Apache Jena architecture]

The core APIs of Apache Jena are the RDF API, used to process RDF data, and the SPARQL API, used to query RDF data.

SPARQL is an RDF query language used to retrieve and manipulate data stored in RDF format, including data held in Apache Jena TDB. It plays a role similar to that of SQL in a relational database. A basic SPARQL query typically consists of two parts:

  • The SELECT clause identifies the variables (prefixed with ‘?’) to appear in the query results
  • The WHERE clause provides the triple pattern to match against the RDF data

SELECT ?subject ?predicate ?object
WHERE {
  ?subject ?predicate ?object
}


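For example, to find the CEO of Alphabet Inc. from the example above, the query could look like this (the ex: prefix is an illustrative namespace standing in for a real vocabulary; note that variable names are case-sensitive):

# ex: is a placeholder namespace used only for illustration
PREFIX ex: <http://example.org/>
SELECT ?NAME
WHERE {
  ex:AlphabetInc ex:hasCEO ?NAME
}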


The value of the variable (?NAME) will be “Pichai Sundararajan”.
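
The matching step behind a WHERE clause can be sketched in a few lines of Python. This is a toy illustration of a single triple pattern, not how Apache Jena evaluates queries; the plain-string terms and the second statement in the data are assumptions made for the example.

```python
def match_pattern(pattern, triples):
    """Match one (subject, predicate, object) pattern against triples.

    Terms starting with '?' are variables; other terms must match exactly.
    Returns one dict of variable bindings per matching triple.
    """
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


TRIPLES = [
    ("Alphabet Inc.", "hasCEO", "Pichai Sundararajan"),
    ("Alphabet Inc.", "hasSubsidiary", "Google LLC"),  # illustrative extra statement
]

print(match_pattern(("Alphabet Inc.", "hasCEO", "?NAME"), TRIPLES))
# prints [{'?NAME': 'Pichai Sundararajan'}]
```

A pattern made only of variables, such as ("?subject", "?predicate", "?object"), matches every triple, which mirrors the generic SELECT query shown earlier.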

A SPARQL query can be sent to Apache Jena TDB through the TDB command line (bat\tdb2_tdbquery.bat) or through the Apache Jena Fuseki component. Apache Jena Fuseki is a SPARQL server that can run as a Java web application (WAR file) and provides a web interface for sending SPARQL queries to Apache Jena TDB.

 



[Figure: Apache Jena Fuseki]
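
Besides the web interface, a SPARQL server such as Fuseki can be queried over HTTP using the SPARQL Protocol. The sketch below uses only the Python standard library and rests on assumptions: that Fuseki runs locally on its default port 3030, and that the dataset is named "permid" (a hypothetical name) with a query service at /sparql. Adjust the endpoint to match the actual server configuration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Fuseki runs locally on its default port 3030 and the
# dataset is named "permid" (a hypothetical name chosen for this sketch).
ENDPOINT = "http://localhost:3030/permid/sparql"


def build_request(endpoint, query):
    """Build a SPARQL Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_select(endpoint, query):
    """Send the query and return the decoded JSON result document."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)
```

Calling run_select(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5") needs a running server; build_request on its own can be inspected without any network access.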

Use Case Scenario: Open PermID Entity Bulk Download

In this section, I demonstrate how to load RDF data into Apache Jena TDB in a real use case scenario, using Open PermID from Refinitiv as the example. Open PermID is freely available at https://permid.org/.

PermID is short for “Permanent Identifier”, a machine-readable number assigned to entities, securities, organizations (companies, government agencies, universities, etc.), quotes, individuals, and more.

It is specifically designed for use by machines to reference related information programmatically. Open PermID also provides bulk files (one per entity type), containing the complete lists of the entities including organization, instrument, quote, asset class, currency, instrument code, and person entities. These files are updated weekly. The following picture represents relationships among entities. 

 



[Figure: Open PermID]

The following steps demonstrate how to load the organization, industry, and quote files into Apache Jena TDB. Apache Jena and Apache Jena Fuseki must be installed on the machine; for installation instructions, please refer to the Apache Jena website.

First, download the organization, industry, and quote files from the permid.org website. Each file is available in two formats (ttl and ntriples). In this step, the ntriples files are used.

[Figure: Entity Download Files]

After downloading, decompress those files and rename the files’ extensions from ntriples to nt.
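
The decompress-and-rename step can be scripted. The sketch below assumes the downloads are gzip archives whose names end in .ntriples.gz; the article does not state the compression format, so treat that suffix as an assumption and adapt it to the files actually downloaded.

```python
import gzip
import shutil
from pathlib import Path

# Assumed suffix of the downloaded archives (not confirmed by the article).
ARCHIVE_SUFFIX = ".ntriples.gz"


def decompress_to_nt(archive: Path) -> Path:
    """Decompress one archive next to itself and give it the .nt extension."""
    if not archive.name.endswith(ARCHIVE_SUFFIX):
        raise ValueError(f"unexpected file name: {archive.name}")
    target = archive.with_name(archive.name[: -len(ARCHIVE_SUFFIX)] + ".nt")
    # Stream the data so that large bulk files are never held fully in memory.
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

To convert every archive in the current directory, one could loop over Path(".").glob("*.ntriples.gz") and call decompress_to_nt on each result.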

Next, run the following command in the Apache Jena directory to load those files into Apache Jena TDB. The database is located at c:\workspace\database.

C:\workspace\apache-jena-3.14.0>bat\tdb2_tdbloader.bat --loc c:\workspace\database OpenPermID-bulk-organization-xxx.nt OpenPermID-bulk-industry-xxx.nt OpenPermID-bulk-quote-xxx.nt


 



[Figure: tdb2_tdbloader.bat output]

It may take more than four minutes to populate the database.

At this point, the database is ready and SPARQL can be used to query the database. For example, the following SPARQL lists active US companies in the Healthcare Providers and Services industry. 

prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix tr-org: <http://permid.org/ontology/organization/>
prefix tr-fin: <http://permid.org/ontology/financial/>

SELECT ?orgPermID ?orgName ?exchangeCode ?ticker ?mic ?ric
WHERE {
  ?industry skos:prefLabel "Healthcare Providers & Services" .
  ?orgPermID tr-org:hasPrimaryIndustryGroup ?industry .
  ?orgPermID tr-org:isIncorporatedIn <http://sws.geonames.org/6252001/> .
  ?orgPermID tr-org:hasActivityStatus tr-org:statusActive .
  ?orgPermID vcard:organization-name ?orgName .
  ?orgPermID tr-fin:hasOrganizationPrimaryQuote ?orgQuote .
  OPTIONAL {?orgQuote tr-fin:hasExchangeCode ?exchangeCode .}
  OPTIONAL {?orgQuote tr-fin:hasExchangeTicker ?ticker .}
  OPTIONAL {?orgQuote tr-fin:hasMic ?mic .}
  OPTIONAL {?orgQuote tr-fin:hasRic ?ric .}
}


The query can be run through the tdb2_tdbquery.bat command or Apache Jena Fuseki.

[Figure: tdb2_tdbquery.bat output]

[Figure: Apache Jena Fuseki]

The results contain organizations’ PermIDs, organization names, exchange codes, tickers, Market Identifier Codes (MICs), and Reuters Instrument Codes (RICs).
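
Because the exchange code, ticker, MIC, and RIC are matched with OPTIONAL, some result rows may leave those variables unbound. The sketch below shows one way to flatten a result document in the SPARQL 1.1 Query Results JSON Format, where an unbound variable is simply absent from its binding. The sample document is hand-written with made-up values and is not real Open PermID data.

```python
def rows_from_results(result_doc):
    """Flatten a SPARQL JSON result document into a list of dicts.

    Variables left unbound by OPTIONAL are absent from a binding, so they
    are filled in as None here.
    """
    variables = result_doc["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None) for var in variables}
        for binding in result_doc["results"]["bindings"]
    ]


# Hand-written sample with made-up values, shaped like a SPARQL JSON result.
SAMPLE = {
    "head": {"vars": ["orgName", "ticker"]},
    "results": {
        "bindings": [
            {
                "orgName": {"type": "literal", "value": "Example Health Inc"},
                "ticker": {"type": "literal", "value": "EXH"},
            },
            # No ticker was matched for this organization's quote.
            {"orgName": {"type": "literal", "value": "Example Care Corp"}},
        ]
    },
}

for row in rows_from_results(SAMPLE):
    print(row)
```

Filling missing variables with None keeps every row the same shape, which makes the rows easier to write to a table or CSV file afterwards.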

Summary

Apache Jena TDB is a free RDF database used to store and query RDF data. RDF is a standard model for describing and interchanging data on the web. The underlying structure of RDF is a collection of triples, each consisting of a subject, a predicate, and an object. 

Triples can be linked together to create graphs, so RDF storage is a type of graph database. An Apache Jena TDB database can be populated with the tdb2_tdbloader.bat command and is queried with SPARQL. Queries can be run through the tdb2_tdbquery.bat command or Apache Jena Fuseki.
