From Relational to Really Relational: The RDB2RDF Working Group
While many databases store information in tables, the web is not organized in a tabular way, and neither is much of the data, in its many formats, that the web uses. Even so, many databases still use tables, because many developers feel that tables handle large volumes of data better than any other structure.
Others feel that the tables in relational databases (RDBs) should be converted to RDF, a format that can describe an even wider array of metadata across the World Wide Web. The ability to convert to RDF could prove highly beneficial as technology advances toward artificial intelligence (AI) and beyond.
What is an RDB?
RDB stands for relational database. An RDB is a collection of data sets organized into tables made up of columns and rows (records). An RDB establishes well-defined relationships between its tables, typically through shared key columns, so that data from several tables can be combined to search, organize, and report on it.
The RDB is derived from the relational model of data developed by Edgar F. Codd, which is grounded in the mathematics of sets and relations. RDBs use Structured Query Language (SQL), a standard language that provides a programming interface for defining, querying, and updating data.
An RDB organizes data into tables, each known as a relation, made up of columns (attributes). Each row, or record, holds one data instance, with a value for each column. Functional dependencies describe how the values in some columns determine the values in others.
An RDB supports the core operations "select", "project", and "join": select retrieves the rows that meet a condition, project picks out particular columns (attributes), and join combines relations. Those who prefer RDBs do so because of their advantages, including easy extensibility and scalability, performance, and data security.
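A functional dependency can be checked mechanically. The Python sketch below assumes a relation is held in memory as a list of dicts, one dict per row; the helper `holds_fd` and the sample `employees` rows are hypothetical illustrations, not part of any RDBMS API. A dependency X -> Y holds when any two rows that agree on the X columns also agree on the Y columns.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

def holds_fd(relation: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds: rows that agree on lhs also agree on rhs."""
    seen: Dict[tuple, tuple] = {}
    for row in relation:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault returns the value stored for an earlier row with the same key;
        # a mismatch means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical relation: each dict is one record, each key one column.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10, "dept_name": "Research"},
    {"emp_id": 2, "name": "Grace", "dept_id": 20, "dept_name": "Systems"},
    {"emp_id": 3, "name": "Edgar", "dept_id": 10, "dept_name": "Research"},
]

print(holds_fd(employees, ["emp_id"], ["name"]))        # True
print(holds_fd(employees, ["dept_id"], ["dept_name"]))  # True
print(holds_fd(employees, ["dept_name"], ["emp_id"]))   # False: Research has two employees
```

The second check is the classic sign of a table that could be normalized: `dept_name` depends on `dept_id`, not on the employee.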
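The three operations can be sketched in plain Python over the same kind of in-memory relation (a list of dicts). The function names and sample data are hypothetical; a real RDBMS expresses these operations in SQL as the `WHERE` clause (select), the column list after `SELECT` (project), and `JOIN`.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

def select(relation: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation: Iterable[Row], columns: List[str]) -> List[Row]:
    """Projection: keep only the named columns, dropping duplicate rows
    (relations are sets in the relational model)."""
    result: List[Row] = []
    seen = set()
    for row in relation:
        reduced = {c: row[c] for c in columns}
        key = tuple(reduced[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(reduced)
    return result

def natural_join(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """Natural join: combine every pair of rows that agree on all shared columns."""
    right_rows = list(right)
    result: List[Row] = []
    for lrow in left:
        for rrow in right_rows:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[c] == rrow[c] for c in shared):
                result.append({**lrow, **rrow})
    return result

# Hypothetical sample relations.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Grace", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Systems"},
]

joined = natural_join(employees, departments)
research = select(joined, lambda row: row["dept_name"] == "Research")
print(project(research, ["name"]))  # [{'name': 'Ada'}]
```

The nested-loop join is the simplest correct version; an RDBMS would normally use indexes or hash joins instead.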
What is RDF?
RDF is primarily used to provide information, or metadata, about data available on the Internet. RDF provides a method for specifying, structuring, and transferring metadata, and its XML syntax lets software applications exchange and use that information. URIs identify the resources being described; when the URI is a URL, it also gives the location of that data.
RDF stands for Resource Description Framework. It is a standard for describing web resources and for data interchange, developed and standardized by the World Wide Web Consortium (W3C), with Extensible Markup Language (XML) and Uniform Resource Identifiers (URIs) among its underlying standards.
Typically, RDF provides basic information and attributes about an Internet-based object, such as the name of the author, Web page keywords, the object's creation or editing date, or the sitemap.
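To show what such metadata looks like as RDF, here is a minimal sketch that writes a few statements about a web page as triples (subject, predicate, object) and serializes them in N-Triples, one of RDF's plain-text syntaxes. The page IRI and all values are hypothetical; the predicates come from the Dublin Core Terms vocabulary (`http://purl.org/dc/terms/`). Literals are left untyped to keep the sketch short.

```python
from typing import List, Tuple, Union

class URI(str):
    """Marks a string as an IRI rather than a plain literal."""

Term = Union[URI, str]
Triple = Tuple[URI, URI, Term]

DCTERMS = "http://purl.org/dc/terms/"

def term_to_nt(term: Term) -> str:
    """Serialize one term in N-Triples syntax: IRIs in <...>, literals in quotes."""
    if isinstance(term, URI):
        return f"<{term}>"
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples: List[Triple]) -> str:
    """One line per triple, each terminated by ' .'."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples
    )

page = URI("http://example.org/blog/rdb2rdf")  # hypothetical page IRI
metadata: List[Triple] = [
    (page, URI(DCTERMS + "creator"), "Jane Doe"),  # hypothetical author
    (page, URI(DCTERMS + "subject"), "RDF"),
    (page, URI(DCTERMS + "subject"), "relational databases"),
    (page, URI(DCTERMS + "modified"), "2020-01-01"),  # hypothetical editing date
]
print(to_ntriples(metadata))
```

Each metadata item (author, keyword, editing date) becomes its own small statement, which is what lets independent sources add facts about the same page.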
While there are many conventional tools for dealing with data, and more specifically with the relationships between data, RDF is arguably the simplest, most expressive, and most powerful standard to date. Because context or intent can be inferred from RDF statements, the overall informational value of the data is much greater.
RDF presents small chunks of information in a form from which meaning can be inferred. Vocabularies such as RDF Schema can add rules about how the data should be interpreted.
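One concrete form of such a rule comes from RDF Schema: if a resource has type A and A is a subclass of B, then the resource also has type B. The sketch below applies that rule, plus the transitivity of `rdfs:subClassOf`, in a naive forward-chaining loop until no new triples appear. The `http://example.org/` resources are hypothetical, and a real reasoner covers many more entailment rules.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)        => (x type B)"""
    inferred = set(graph)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new: Set[Triple] = set()
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        for x, p, a in inferred:
            if p == RDF_TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= inferred:  # nothing new was derived
            return inferred
        inferred |= new

EX = "http://example.org/"  # hypothetical namespace
graph = {
    (EX + "Ada", RDF_TYPE, EX + "Engineer"),
    (EX + "Engineer", SUBCLASS_OF, EX + "Employee"),
    (EX + "Employee", SUBCLASS_OF, EX + "Person"),
}
closure = infer_types(graph)
print((EX + "Ada", RDF_TYPE, EX + "Person") in closure)  # True
```

Nothing in the input says Ada is a Person; that fact is inferred from the schema triples.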
Resource Description Framework, RDF, is the standard for encoding metadata and other structured information on the Semantic Web.
With a growing set of semantic standards, database-centered HTML5 APIs, and a W3C standard now calling for implementations, this is an exciting time for data on the web. It's time to embrace RDF and start pulling relational data into the Semantic Web!
The Purpose of RDBMS
The software used to store, manage, query, and retrieve data stored in a relational database is called a Relational Database Management System, or RDBMS. The RDBMS provides an interface between the database and its users and applications. It also provides administrative functions for managing data storage, performance, and access.
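As a small illustration of what an RDBMS does, the sketch below uses SQLite through Python's built-in `sqlite3` module (chosen only because it ships with Python) to define two related tables, answer a join query, and enforce the relationship between the tables. Table names and rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for any RDBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled

conn.executescript("""
CREATE TABLE dept (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES dept(dept_id)
);
INSERT INTO dept VALUES (10, 'Research'), (20, 'Systems');
INSERT INTO emp VALUES (1, 'Ada', 10), (2, 'Grace', 20);
""")

# Query: the RDBMS joins the two tables through the shared key column.
rows = conn.execute(
    "SELECT emp.name, dept.dept_name FROM emp "
    "JOIN dept ON emp.dept_id = dept.dept_id ORDER BY emp.emp_id"
).fetchall()
print(rows)  # [('Ada', 'Research'), ('Grace', 'Systems')]

# Access control on data integrity: a row pointing at a missing department is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (3, 'Edgar', 99)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

SQLite needs the `PRAGMA foreign_keys = ON` line to enforce the constraint; many other RDBMSs enforce foreign keys by default.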
Semanticizing all data, that is, giving it meaning, can be done in two stages.
- First, construct a web of meanings, not documents, as Sir Tim Berners-Lee has always wanted and as RDF (Resource Description Framework) seeks to do.
- Second, fit all tabular data into that web, whether or not the data is naturally tabular.
This second step is less exciting than the first because plenty of tabular data is not ideally tabular. In these cases, the second step is rather backward-looking. However, it is no less necessary than the first for two reasons:
- Converting every RDBMS wholesale to RDF is nowhere near worth the effort
- Much data ought not to be converted to RDF at all
All of this data still needs to talk to the web, which means it needs to be translated into a webby structure, ideally RDF. The easiest way to translate without converting is, of course, plain mapping. But mapping two rather different structures onto one another is no trivial task.
That's why there's a whole W3C Working Group devoted to devising both a mapping language and an actual mapping of relational data to RDF.
Sir Tim offers this insight into the RDF-RDBMS relation, cutting through questions that might otherwise be couched in domain-inappropriate terms (like 'is the RDF model an entity-relationship model'):
Relational database systems manage RDF data, but in a specialized way. In a table, there are many records with the same set of properties. An individual cell (which corresponds to an RDF property) is not often thought of on its own. SQL queries can join tables and extract data from tables, and the result is generally a table. So, the practical use for which RDB software is used is typically optimized for doing operations with a small number of tables, some of which may have a large number of elements.
Because relational databases are species of the genus described by RDF, the basic mapping model is as follows:
- a record is an RDF node;
- the field (column) name is RDF propertyType;
- and the record field (table cell) is a value.
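The three-part model above can be sketched in a few lines of Python. The IRI shapes (`<table/pk=value>` for a record, `<table#column>` for a property) are loosely modelled on the W3C Direct Mapping, but this is a simplified illustration under stated assumptions: a hypothetical base IRI, a single-column primary key, untyped literals, no percent-encoding, and no handling of foreign keys.

```python
import sqlite3
from typing import Iterator

BASE = "http://example.org/db/"  # hypothetical base IRI for the mapped database
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def literal(value: object) -> str:
    """Render a cell value as a plain N-Triples literal (no datatype)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'

def direct_map(conn: sqlite3.Connection, table: str, pk: str) -> Iterator[str]:
    """Yield N-Triples lines for one table:
    record -> subject node, column name -> property, table cell -> value."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')  # table name assumed trusted
    columns = [desc[0] for desc in cursor.description]
    for record in cursor:
        row = dict(zip(columns, record))
        subject = f"<{BASE}{table}/{pk}={row[pk]}>"  # one node per record
        yield f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."
        for column, value in row.items():
            if value is None:  # SQL NULL: no triple is generated
                continue
            yield f"{subject} <{BASE}{table}#{column}> {literal(value)} ."

# Hypothetical sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO emp VALUES (1, 'Ada', 'Research'), (2, 'Grace', NULL)")
for line in direct_map(conn, "emp", "emp_id"):
    print(line)
```

Because `direct_map` is a generator, triples are produced on demand from the live table rather than copied out in advance, which is the "mapping without conversion" idea in miniature.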
So far, so straightforward. Of course, the implementations usually wander pretty far from the original concept. That's why mapping actual RDBMS to RDF takes a bit of dirty work.
Enter RDB2RDF. The RDB2RDF WG is doing the dirty work.
Back in 2009, when the group was still an Incubator Group, it published a detailed survey of then-current approaches to mapping relational databases to RDF.
This survey served as the starting point for the typically extensive discussion and debate, which culminated in two Candidate Recommendations: A Direct Mapping of Relational Data to RDF, and R2RML: RDB to RDF Mapping Language.
Many techniques and tools have been proposed to enable the publication of relational data on the web in RDF. RDB-to-RDF methods are one of the keys to populating the web of data by unlocking the huge amount of data stored in relational databases.
Since producing RDF data with sufficiently rich semantics is often important to make the data usable, interoperable, and linkable, various strategies have been developed to enrich data semantics.
Mapping relational data to RDF has proven valuable when dealing with SQL databases, offering a straightforward and practical route from relational tables to RDF.
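A common enrichment strategy is to map columns onto an existing, widely used vocabulary instead of inventing table-derived property names. W3C's R2RML language does this declaratively; the Python sketch below only imitates the idea and is not an R2RML processor. The mapping structure, the `apply_mapping` helper, and the row data are hypothetical; the class and property IRIs come from the FOAF vocabulary.

```python
from typing import Dict, Iterator, List

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping, loosely in the spirit of an R2RML triples map:
# a subject IRI template, a class for every row, and column -> predicate pairs.
mapping = {
    "subject_template": "http://example.org/staff/{emp_id}",
    "class": FOAF + "Person",
    "predicates": {"name": FOAF + "name"},  # columns not listed are not published
}

def apply_mapping(rows: List[Dict[str, object]], m: dict) -> Iterator[str]:
    """Turn relational rows into N-Triples lines using a shared vocabulary."""
    for row in rows:
        subject = m["subject_template"].format(**row)  # no IRI percent-encoding here
        yield f"<{subject}> <{RDF_TYPE}> <{m['class']}> ."
        for column, predicate in m["predicates"].items():
            value = row.get(column)
            if value is not None:
                text = str(value).replace("\\", "\\\\").replace('"', '\\"')
                yield f'<{subject}> <{predicate}> "{text}" .'

rows = [{"emp_id": 1, "name": "Ada", "salary": 100}]  # hypothetical row data
for line in apply_mapping(rows, mapping):
    print(line)
```

Compared with a table-derived property such as `<emp#name>`, `foaf:name` lets the output link up with other data that already uses FOAF, and columns left out of the mapping (here `salary`) are never published.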
RDB2RDF and the Future
Moving beyond RDB-to-RDF methods, it will become necessary to find a compromise between the expressiveness of RDB-to-RDF mapping languages and the need to update relational data through Semantic Web protocols. Creating, updating, and deleting RDF data should only be made possible in a secure, reliable, trustworthy, and scalable way.
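One way to picture a secure write path is to translate a triple-level change back into SQL only when it matches an explicit allowlist. The sketch below is a hypothetical design, not an existing protocol or standard: it assumes subject and property IRIs shaped like `<base/table/key=value>` and `<base/table#column>`, refuses anything not allowlisted, and binds values as SQL parameters so they cannot inject SQL.

```python
import re
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base IRI
# Hypothetical allowlist: (table, key column) -> columns clients may update.
UPDATABLE = {("emp", "emp_id"): {"name"}}

SUBJECT_RE = re.compile(re.escape(BASE) + r"(\w+)/(\w+)=(\d+)$")
PREDICATE_RE = re.compile(re.escape(BASE) + r"(\w+)#(\w+)$")

def update_from_triple(conn: sqlite3.Connection, subject: str,
                       predicate: str, value: str) -> int:
    """Translate 'set <subject> <predicate> to value' into a parameterized UPDATE.
    Returns the number of rows changed; raises ValueError for anything not allowed."""
    s = SUBJECT_RE.match(subject)
    p = PREDICATE_RE.match(predicate)
    if not s or not p:
        raise ValueError("IRI does not follow the mapping's pattern")
    table, key_col, key_val = s.groups()
    p_table, column = p.groups()
    if p_table != table or column not in UPDATABLE.get((table, key_col), set()):
        raise ValueError(f"update of {table}.{column} is not permitted")
    # Identifiers were checked against the allowlist; only values are bound as parameters.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            f'UPDATE "{table}" SET "{column}" = ? WHERE "{key_col}" = ?',
            (value, int(key_val)),
        )
    return cur.rowcount

# Hypothetical sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO emp VALUES (1, 'Ada', 100)")
conn.commit()
changed = update_from_triple(conn, BASE + "emp/emp_id=1", BASE + "emp#name", "Ada Lovelace")
print(changed)  # 1
```

Table and column names cannot be passed as SQL parameters, which is why they are checked against the allowlist before being placed in the statement text.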
Opinions expressed by DZone contributors are their own.