The web isn't tabular, and plenty of data isn't tabular, either -- but plenty of databases are, and anyway tables handle plenty of data better than any other structure can.
Semanticizing all data, then, can be done in two stages. First, construct a web of meanings, not documents -- as Sir Tim Berners-Lee has always wanted, and as RDF seeks to do.
And, second, fit all (legitimately or illegitimately) tabular data into the web.
This second step is less exciting than the first: plenty of tabular data is not ideally tabular, so in those cases the second step is rather backward-looking.
But it is no less necessary than the first: not only because converting everything in an RDBMS to RDF is not even close to worth it, but also because much data ought not to be converted to RDF.
But that data still needs to talk to the web -- which means it needs to be translated into a webby structure, ideally RDF.
The easiest way to translate without conversion is, of course, just plain mapping. But mapping two rather different structures to one another is no trivial task.
That's why there's a whole W3C Working Group devoted to devising both a language for mapping relational data to RDF and an actual mapping.
Sir Tim offers this unsurprisingly lucid insight into the RDF-RDBMS relation, cutting through questions that might otherwise be couched in domain-inappropriate terms (like 'is the RDF model an entity-relationship model'):
Relational database systems manage RDF data, but in a specialized way. In a table, there are many records with the same set of properties. An individual cell (which corresponds to an RDF property) is not often thought of on its own. SQL queries can join tables and extract data from tables, and the result is generally a table. So, the practical use for which RDB software is used typically optimized for doing operations with a small number of tables some of which may have a large number of elements.
Because relational databases are species of the genus described by RDF, the basic mapping model is as follows:
- a record is an RDF node;
- the field (column) name is RDF propertyType;
- and the record field (table cell) is a value.
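To make the three rules concrete, here is a minimal sketch in Python that turns rows into triples. It is not the Working Group's mapping: the `http://example.org/` base, the URI shapes, and the names `rows_to_triples` and `literal` are all assumptions made up for illustration, and every value is emitted as a plain string literal.

```python
from typing import Iterable, Iterator, Mapping, Tuple

Triple = Tuple[str, str, str]


def literal(value: object) -> str:
    """Render a cell as a plain string literal (simplified escaping)."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def rows_to_triples(
    table: str,
    key: str,
    rows: Iterable[Mapping[str, object]],
    base: str = "http://example.org/",  # hypothetical base URI
) -> Iterator[Triple]:
    """Yield (subject, predicate, object) for every non-NULL cell."""
    for row in rows:
        # Rule 1: the record is a node, named here by its key column.
        subject = f"<{base}{table}/{key}={row[key]}>"
        for column, value in row.items():
            if value is None:
                continue  # a NULL cell asserts nothing, so emit no triple
            # Rule 2: the column name is the property.
            predicate = f"<{base}{table}#{column}>"
            # Rule 3: the cell is the value.
            yield subject, predicate, literal(value)


if __name__ == "__main__":
    people = [
        {"id": 7, "name": "Alice", "city": None},
        {"id": 8, "name": "Bob", "city": "Oslo"},
    ]
    for s, p, o in rows_to_triples("people", "id", people):
        print(f"{s} {p} {o} .")
```

Even this toy version has to make choices the three rules leave open: how to name the record's node, what to do with NULLs, and whether a foreign key should become a link to another node rather than a literal (here it would not).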
So far, so straightforward. But of course the implementations usually wander pretty far from the original concept -- so mapping actual RDBMS to RDF takes a bit of dirty work.
Well, the RDB2RDF WG is doing the dirty work.
Back in 2005, when the Group was still an Incubator, it published a detailed survey of then-current approaches to mapping relational databases to RDF. (The resulting article is still a useful read, with an excellent bibliography that includes domain-specific projects, which always interest me a bit more than the content-free versions.)
This survey served as the starting-point for typically extensive discussion and debate, which culminated in two Candidate Recommendations published last month.
(If you read only one document, read the first, which includes tons of simple examples.)
This is an exciting time for data on the web, with all the semantic standards and database-centered HTML5 APIs -- and a W3C Candidate Recommendation means a Call for Implementations -- so let's absorb the recs and start pulling relational data into the semantic web!