An XQuery Module For Simplifying Semantic Namespaces

Learn how to use the Turtle syntax in the context of the MarkLogic database for remembering semantic namespaces and simplifying how you write them.

While I enjoy working with the MarkLogic 8 server, there are a number of features about the semantics library there that I still find a bit problematic. Declaring namespaces for semantics in particular is a pain—I normally have trouble remembering the namespaces for RDF or RDFS or OWL, even after working with them for several years, and once you start talking about namespaces that are specific to your own application domain, managing this list can get onerous pretty quickly.

I should point out, however, that namespaces within semantics can be very useful in helping to organize and design an ontology, even a non-semantic ontology, and as such, my applications tend to be namespace rich. When working with Turtle, SPARQL, RDFa, and other formats that rely on namespaces, though, the need to declare these namespaces everywhere can be a real showstopper for any developer. Thus, like any good developer, I decided to automate my pain points and create a library that would simplify this process.

The code given here is in Turtle and XQuery, but I hope to build out similar libraries for use in JavaScript shortly. When I do, I'll update this article to reflect those changes.

What MarkLogic Supports

Before going into my own code, I want to touch on what MarkLogic supports. Two functions are especially useful to the budding ontologist: sem:curie-expand() and sem:curie-shorten(). To understand their utility, it's first necessary to know what exactly a curie is.

A resource name usually has an extended form, or IRI, that consists of two parts: a term, such as "type", and a qualifying namespace, such as "http://www.w3.org/1999/02/22-rdf-syntax-ns#". The namespace is usually intended to be unique and somewhat descriptive (though not necessarily very descriptive), and it can be thought of as identifying a specific vocabulary. In this case the namespace describes the vocabulary originally associated with RDF. When you combine a namespace and a term within that namespace as a string, you have what is known as a qualified name:

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

The angle brackets here have nothing to do with XML. They are simply used by RDF to indicate that this particular string is in fact a qualified name, as well as an Internationalized Resource Identifier (or IRI, the natural successor of the Uniform Resource Identifier, URI, and the Uniform Resource Locator, URL). For now, let's talk IRIs and assume that they are more or less synonymous with qualified names.

These long names, while generally unique, are also a pain in the butt to write and remember. Consequently, in situations where you have the ability to control specific prefixes (such as within an organization) it makes sense to represent the namespace string with a prefix, e.g.,

rdf:type

Not only is this easier on the eyes, but it drives home that you have a vocabulary and a term within that vocabulary in ways that the longer strings don't. Now, people working with RDF, Turtle, SPARQL, or OWL end up working with a number of namespaces, and as such, have the biggest need to turn them into short forms, known as compact URIs, or CURIEs (or curies for those who don't need all the caps).

MarkLogic has the ability to expand curies to IRIs, or collapse IRIs to curies, via two functions, sem:curie-expand() and sem:curie-shorten(). For common namespaces, you can use a single argument for sem:curie-expand() to turn the curie into its associated IRI:

sem:curie-expand("rdf:type")
=>sem:iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

Similarly, you can reduce the fully qualified IRI to the reduced curie-form:

sem:curie-shorten(sem:iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
=>rdf:type

The problem arises when you end up with namespaces that aren't in the default list. In this case, you have to use an XQuery map (a kind of a hash entity) to create the associations, with the prefix as the key and the namespace as the value:

let $map := map:new((
map:entry("core","http://semanticalllc.com/ns/core#"),
map:entry("rdf","http://www.w3.org/1999/02/22-rdf-syntax-ns#")
))
return sem:curie-expand("core:gender",$map)
=> sem:iri("http://semanticalllc.com/ns/core#gender")

The double parentheses used for the map:new() function are there because you are passing a sequence (or list) of map:entry() objects, each of which has one key and one value. Note also that the output is itself a function: sem:iri("..."). This is how MarkLogic indicates that this is a sem:iri object, rather than simply a string. This distinction can be important, because it's tied into how information is indexed.
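As a rough illustration of what expansion and shortening involve, here is a minimal sketch in plain Python rather than XQuery. It is not MarkLogic's implementation: the function names expand_curie and shorten_iri and the PREFIXES table are assumptions made for this example, and the longest-match rule in shorten_iri is one reasonable design choice, not documented sem:curie-shorten() behavior.

```python
# Hypothetical prefix table; in MarkLogic this role is played by an XQuery map.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "core": "http://semanticalllc.com/ns/core#",
}


def expand_curie(curie, prefixes):
    """Turn 'prefix:term' into a full IRI string."""
    prefix, sep, term = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in %r" % curie)
    return prefixes[prefix] + term


def shorten_iri(iri, prefixes):
    """Turn a full IRI into 'prefix:term', preferring the longest matching namespace."""
    matches = [(ns, p) for p, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no known namespace: leave the IRI untouched
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return prefix + ":" + iri[len(ns):]


print(expand_curie("core:gender", PREFIXES))
print(shorten_iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", PREFIXES))
```

The two functions are inverses only as long as every namespace in the table is unique, which is one reason to keep the prefix definitions in a single place.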

There are a couple of problems with this map. First, the ability to expand and contract curies comes in very handy when you are either passing arguments into SPARQL statements or creating sem:triples, and the maps, along with the baggage of redefining them at various points, can get complicated. Ideally, MarkLogic would have a way of predefining these in the Admin section for a web application, but this does not yet exist. Second, namespaces have metadata properties with a certain degree of utility, and capturing these in a map can prove difficult (in part because it makes writing retrieval code more complex). Finally, doing something like creating a triple in XQuery can get to be very verbose:

let $namespace-map := map:new((
map:entry("core","http://semanticalllc.com/ns/core#"),
map:entry("gender","http://semanticalllc.com/ns/gender#"),
map:entry("person","http://semanticalllc.com/ns/person#"),
map:entry("rdf","http://www.w3.org/1999/02/22-rdf-syntax-ns#")
))
return sem:triple(sem:resolve-iri("jane_doe",map:get($namespace-map,"person")),
           sem:curie-expand("core:gender",$namespace-map),
           sem:curie-expand("gender:Female",$namespace-map)
           )
=> sem:triple(sem:iri("http://semanticalllc.com/ns/person#jane_doe"),
              sem:iri("http://semanticalllc.com/ns/core#gender"),
              sem:iri("http://semanticalllc.com/ns/gender#Female"))

There's a seemingly unrelated problem. When working with either SPARQL or Turtle, in order to use curie notation you need to declare the namespace prefixes ahead of time. The problem is that historically, Turtle and SPARQL evolved somewhat independently, and there's a syntactical difference in the way that the two specs do this declaration. For Turtle, the declaration just for the properties given above looks like the following:

@prefix core: <http://semanticalllc.com/ns/core#>.
@prefix gender: <http://semanticalllc.com/ns/gender#>.
@prefix person: <http://semanticalllc.com/ns/person#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

Meanwhile, the same declaration in SPARQL looks similar, but not quite identical:

PREFIX core: <http://semanticalllc.com/ns/core#>
PREFIX gender: <http://semanticalllc.com/ns/gender#>
PREFIX person: <http://semanticalllc.com/ns/person#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

This means that not only do you need to worry about keeping namespaces in sync between Turtle and SPARQL, but you also have to handle these notational differences. 
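One way to picture the fix is to keep a single prefix table and generate both preambles from it. The following plain-Python sketch does that; the function names turtle_preamble and sparql_preamble and the PREFIXES table are hypothetical and exist only for this illustration.

```python
# Hypothetical single source of truth for prefixes.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "core": "http://semanticalllc.com/ns/core#",
}


def turtle_preamble(prefixes):
    """Turtle form: lowercase @prefix keyword, terminated by a period."""
    return "\n".join(
        "@prefix %s: <%s> ." % (p, ns) for p, ns in sorted(prefixes.items())
    )


def sparql_preamble(prefixes):
    """SPARQL form: no leading @ and no trailing period."""
    return "\n".join(
        "PREFIX %s: <%s>" % (p, ns) for p, ns in sorted(prefixes.items())
    )


print(turtle_preamble(PREFIXES))
print(sparql_preamble(PREFIXES))
```

Because both strings come from the same table, the two syntaxes cannot drift out of sync.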

After a few weeks of banging my head against this limitation, I decided that it was time to automate my pain points.

Introducing the ns: Object

I've tried a number of different approaches towards creating a module that would give me the capability both to better manage curies and to simplify the process of writing declaration preambles: global objects, maps, XML structures, and so forth. But ironically, I found that the best solution seemed to be to make use of RDF and SPARQL to actually store the source. To that end, I created a specific Turtle file that defined namespace class instances:

@prefix ns: <http://semanticalllc.com/ns/namespace#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xs: <http://www.w3.org/2001/XMLSchema#>.
@prefix cts: <http://marklogic.com/cts>.
@prefix term: <http://semanticalllc.com/ns/canonical/term/>.
@prefix class: <http://semanticalllc.com/ns/canonical/class/>.
@prefix graph: <http://semanticalllc.com/ns/canonical/graph/>.
@prefix scheme: <http://semanticalllc.com/ns/canonical/scheme/>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix skosx: <http://semanticalllc.com/ns/canonical/skosx/>.
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#>.
@prefix core: <http://semanticalllc.com/ns/canonical/>.
@prefix membership: <http://semanticalllc.com/ns/canonical/membership/>.
@prefix individual: <http://semanticalllc.com/ns/canonical/individual/>.
@prefix document: <http://semanticalllc.com/ns/canonical/document/>.
@prefix metadata: <http://semanticalllc.com/ns/canonical/metadata/>.

ns:skos
    rdf:type class:Namespace;
    rdfs:label "SKOS";
    ns:prefix "skos";
    ns:namespace "http://www.w3.org/2004/02/skos/core#";
    skos:description "This namespace identifies concepts from the Simple Knowledge Organization System, used primarily for organizing classification hierarchies.";
    .

ns:skosx
    rdfs:label "SKOS Application Extension";
    rdf:type class:Namespace;
    ns:prefix "skosx";
    ns:namespace "http://semanticalllc.com/ns/canonical/skosx/";
skos:description "This namespace is a local extension to skos for certain system defined properties.";
    .

ns:skosxl
    rdfs:label "SKOS Extended Label Extension";
    rdf:type class:Namespace;
    ns:prefix "skosxl";
    ns:namespace "http://www.w3.org/2008/05/skos-xl#";
skos:description "This namespace describes the properties of the SKOS language extension.";
    .

ns:term
    rdfs:label "Term";
    rdf:type class:Namespace;
    ns:prefix "term";
    ns:namespace "http://semanticalllc.com/ns/canonical/term/";
skos:description "This namespace indicates SKOS-XL term identifiers, used to indirectly reference term strings.";
    .


ns:concept
    rdfs:label "Concept";
    rdf:type class:Namespace;
    ns:prefix "concept";
    ns:namespace "http://semanticalllc.com/ns/canonical/concept/";
skos:description "This namespace identifies controlled vocabulary content.";
    .

ns:scheme
    rdfs:label "Scheme";
    rdf:type class:Namespace;
    ns:prefix "scheme";
    ns:namespace "http://semanticalllc.com/ns/canonical/scheme/";
skos:description "A scheme is a collection of related vocabularies.";
    .

ns:ns
    rdf:type class:Namespace;
    rdfs:label "Namespace Object";
    ns:prefix "ns";
    ns:namespace "http://semanticalllc.com/ns/canonical/namespace/";
skos:description "This is used to identify the namespace of the ns: xquery library.";
    .

ns:class
    rdf:type class:Namespace;
    rdfs:label "Class";
    ns:prefix "class";
    ns:namespace "http://semanticalllc.com/ns/canonical/class/";
    skos:description "This namespace identifies classes, which inherit from the owl:Class object.";
    .

ns:graph
    rdf:type class:Namespace;
    rdfs:label "Graph";
    ns:prefix "graph";
    ns:namespace "http://semanticalllc.com/ns/canonical/graph/";
skos:description "This identifies the namespace of RDF graphs.";
    .

ns:rdf
    rdf:type class:Namespace;
    rdfs:label "RDF";
    ns:prefix "rdf";
    ns:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
skos:description "This is the default RDF namespace.";
    .

ns:rdfs
    rdf:type class:Namespace;
    rdfs:label "RDF-Schema";
    ns:prefix "rdfs";
    ns:namespace "http://www.w3.org/2000/01/rdf-schema#";
    skos:description "This is the namespace for RDF Schema.";
    .

ns:owl
    rdf:type class:Namespace;
    rdfs:label "Web Ontology Language";
    ns:prefix "owl";
    ns:namespace "http://www.w3.org/2002/07/owl#";
skos:description "This namespace identifies the core concepts of the Web Ontology Language.";
    .

ns:xs
    rdf:type class:Namespace;
    rdfs:label "XML Schema Definition Language";
    ns:prefix "xs";
    ns:namespace "http://www.w3.org/2001/XMLSchema#";
skos:description "This namespace identifies base types used in the XML Schema specification.";
    .

ns:cts
    rdf:type class:Namespace;
    rdfs:label "Marklogic CTS";
    ns:prefix "cts";
    ns:namespace "http://marklogic.com/cts#";
    skos:description "This identifies functions in the MarkLogic CTS search namespace.";
    .

ns:fn
    rdf:type class:Namespace;
    rdfs:label "W3C Functions";
    ns:prefix "fn";
    ns:namespace "http://www.w3.org/2005/xpath-functions#";
    skos:description "This identifies the namespace for W3C XPath functions. Used primarily for SPARQL queries.";
    .

ns:xdmp
    rdf:type class:Namespace;
    rdfs:label "Marklogic XDMP";
    ns:prefix "xdmp";
    ns:namespace "http://marklogic.com/xdmp#";
    skos:description "This identifies the namespace for the bulk of the MarkLogic xdmp functions. Used primarily for SPARQL queries.";
    .

ns:document
    rdf:type class:Namespace;
    rdfs:label "Document";
    ns:prefix "document";
    ns:namespace "http://semanticalllc.com/ns/canonical/document/";    
skos:description "This namespace identifies narrative documents stored as XML within the MarkLogic database.";
    .    

ns:person
    rdf:type class:Namespace;
    rdfs:label "Person";
    ns:prefix "person";
    ns:namespace "http://semanticalllc.com/ns/canonical/person/";    
skos:description "This namespace identifies people entities within the model";
.

This Turtle file would then be stored in the database as "/models/namespaces.ttl".

This approach had a couple of big benefits. The namespaces were dynamically accessed, so the system would automatically recognize any changes when the Turtle file was updated, without dealing with the problems of clearing caches (though caches could be written against the results). The same information was also available to other SPARQL queries, which could use it for lookups or for building user interfaces.

Note that this could also be written as a SPARQL Insert statement by changing the preamble to use the SPARQL form instead. It would look something like this (just showing part):

prefix ns: <http://semanticalllc.com/ns/namespace#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
prefix cts: <http://marklogic.com/cts>
prefix term: <http://semanticalllc.com/ns/canonical/term/>
prefix class: <http://semanticalllc.com/ns/canonical/class/>
prefix graph: <http://semanticalllc.com/ns/canonical/graph/>
prefix scheme: <http://semanticalllc.com/ns/canonical/scheme/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix skosx: <http://semanticalllc.com/ns/canonical/skosx/>
prefix skosxl: <http://www.w3.org/2008/05/skos-xl#>
prefix core: <http://semanticalllc.com/ns/canonical/>
prefix individual: <http://semanticalllc.com/ns/canonical/individual/>
prefix document: <http://semanticalllc.com/ns/canonical/document/>

insert data {
  graph graph:namespaces {
ns:skos
    rdf:type class:Namespace;
    rdfs:label "SKOS";
    ns:prefix "skos";
    ns:namespace "http://www.w3.org/2004/02/skos/core#";
    skos:description "This namespace identifies concepts from the Simple Knowledge Organization System, used primarily for organizing classification hierarchies.";
    .

ns:skosx
    rdfs:label "SKOS Application Extension";
    rdf:type class:Namespace;
    ns:prefix "skosx";
    ns:namespace "http://semanticalllc.com/ns/canonical/skosx/";
    skos:description "This namespace is a local extension to skos for certain system defined properties.";
    .

  # More namespace definitions go here
  }
}

The preamble for ns.xqy (at /lib/ns.xqy in the modules database) is given as:

xquery version "1.0-ml";
module namespace ns = "http://semanticalllc.com/ns/namespace#";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

Loading Namespaces

The first ns: function, ns:load(), parses the Turtle file as triples and puts these triples into the RDF graph graph:namespaces, clearing the prior contents beforehand. This ensures that there's only one active set of namespaces defined at any given time.

(: Provides default path to the namespaces turtle file :)
declare variable $ns:source-path := "/models/namespaces.ttl";

(: This parses the namespaces file and puts into the triple store :)
declare function ns:load($src as xs:string?){
(: If no source is provided, use the default source :)    
    let $src := if ($src) then $src else $ns:source-path
(: Parse the triples :)    
    let $triples := sem:rdf-parse(fn:unparsed-text($src),"turtle")
(: declare the namespace graph IRI :)    
    let $graph-ns := sem:iri("http://semanticalllc.com/ns/core/graph/namespace")
    return (
(: clear the old namespace graph :)    
    sem:graph-delete($graph-ns),    
(: insert the triples into the new namespace graph :)    
    sem:graph-insert($graph-ns,$triples),
(: with the newly formed triples, create a session field called "ns:prefix-map" :)
    xdmp:set-server-field("ns:prefix-map",ns:prefix-map())
)
    };

(: Function signature to call ns:load() with no arguments :)
declare function ns:load(){
    ns:load($ns:source-path)
    };


declare function ns:reload(){
  xdmp:eval('
  import module namespace ns="http://semanticalllc.com/ns/namespace#" at "/lib/ns.xqy";
  xdmp:set-server-field("ns:prefix-map",());
  import module namespace ns="http://semanticalllc.com/ns/namespace#" at "/lib/ns.xqy";
  ns:load()')
  };    

You could delete and insert triples directly through SPARQL Update, but sem:graph-delete() and sem:graph-insert() are generally faster.

The reload function combines two operations: clearing the server field named "ns:prefix-map", which stores the working prefix map, and reloading the namespace graph. It is used primarily to force a refresh of the map.

Serializing Namespaces

Once loaded, the triples can be queried via SPARQL to create a number of different configurations, depending upon need (this is one of the reasons this approach is so powerful). The ns:serialize() method provides this serialization, converting various prefix/namespace combinations from the source to generate forms appropriate to a number of different applications.

declare function ns:serialize($format as xs:string,$prefixes as xs:string*) as item()* {
    let $prefix-map := xdmp:get-server-field("ns:prefix-map")
    return
      if ($prefix-map instance of map:map) then
          let $prefixes := if (fn:empty($prefixes)) then map:keys($prefix-map) else $prefixes
          return
          switch($format)
          case "turtle" return
             fn:string-join(
               for $key in $prefixes order by $key return
                 ("@prefix "||$key||": <"||map:get($prefix-map,$key)||">."),
               "&#13;")
          case "sparql" return
             fn:string-join(
               for $key in $prefixes order by $key return
                 ("prefix "||$key||": <"||map:get($prefix-map,$key)||">"),
               "&#13;")
          case "prefix-map" return 
              $prefix-map
          case "namespace-map" return
              map:new(for $key in map:keys($prefix-map) return map:entry(map:get($prefix-map,$key),$key))
          case "xquery" return
              fn:string-join(for $key in $prefixes return
               ('declare namespace '||$key||' = "'||map:get($prefix-map,$key)||'";'),
                  "&#13;")
          case "rdfa" return       
            fn:string-join(for $key in $prefixes return 
              $key ||": "||map:get($prefix-map,$key)," ")
          case "xmlns" return
            (for $key in $prefixes return
               namespace {$key} {map:get($prefix-map,$key)})
          case "xmlns-str" return
            fn:string-join(
              for $key in $prefixes return 'xmlns:'||$key||'="'||map:get($prefix-map,$key)||'"',"&#13;"
              )
          default return fn:error(xs:QName("ns:ERR_UNKN_SERIAL_FORMAT"),"Unknown serialization format")          
      else
    let $preface := "prefix ns: <http://semanticalllc.com/ns/namespace#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix class: <http://semanticalllc.com/ns/canonical/class/>
    prefix graph: <http://semanticalllc.com/ns/core/graph/>
    "
    let $query := $preface || "select ?prefix ?namespace from graph:namespace where {
         ?ns rdf:type class:Namespace.
         ?ns ns:prefix ?prefix.
         ?ns ns:namespace ?namespace.
    } order by ?prefix"
    let $ns-maps := sem:sparql($query,())
    return
if ($format = "prefix-map") then
      let $prefix-map := xdmp:get-server-field("ns:prefix-map")
      return if ($prefix-map instance of map:map) then
        $prefix-map else
      let $prefix-map := 
      map:new(for $ns-map in $ns-maps return 
        map:entry(map:get($ns-map,"prefix"),map:get($ns-map,"namespace")))
      return ($prefix-map,xdmp:set-server-field("ns:prefix-map",$prefix-map))[last()]
   else fn:error(xs:QName("ns:ERR_NAMESPACES_NOT_LOADED"),"ns:load() must be called before invoking the '"||$format||"' serialization.")
};


declare function ns:sparql($prefixes as xs:string*) as xs:string? {
   xs:string(ns:serialize("sparql",$prefixes))
   };

declare function ns:sparql() as xs:string?{
    xs:string(ns:serialize("sparql",()))
    };

declare function ns:xquery($prefixes as xs:string*) as xs:string? {
   xs:string(ns:serialize("xquery",$prefixes))
   };

declare function ns:xquery() as xs:string? {
   xs:string(ns:serialize("xquery",()))
   };

declare function ns:turtle($prefixes as xs:string*) as xs:string? {
   xs:string(ns:serialize("turtle",$prefixes))
   };

declare function ns:turtle() as xs:string? {
   xs:string(ns:serialize("turtle",()))
   };

declare function ns:namespace-map() as map:map {
   ns:serialize("namespace-map",())
   };

declare function ns:prefix-map() as map:map* {
   ns:serialize("prefix-map",())
   };

declare function ns:rdfa($prefixes as xs:string*) as xs:string {
   xs:string(ns:serialize("rdfa",$prefixes))
   };

declare function ns:rdfa() as xs:string {
   xs:string(ns:serialize("rdfa",()))
   };


declare function ns:xmlns($prefixes as xs:string*) as node()* {
   ns:serialize("xmlns",$prefixes)
   };

declare function ns:xmlns() as node()* {
   ns:serialize("xmlns",())
   };


declare function ns:xmlns-str($prefixes as xs:string*) as xs:string {
   xs:string(ns:serialize("xmlns-str",$prefixes))
   };

declare function ns:xmlns-str() as xs:string {
   xs:string(ns:serialize("xmlns-str",()))
   };

This is a fairly complex function, but every branch ultimately draws on the same root: a SPARQL query that retrieves a sequence of maps holding prefix/namespace pairs.

select ?prefix ?namespace from graph:namespace where {
         ?ns rdf:type class:Namespace.
         ?ns ns:prefix ?prefix.
         ?ns ns:namespace ?namespace.
    } order by ?prefix

Here, the ?ns variable "floats": it's not bound. The query retrieves all objects of type class:Namespace, retrieves their prefixes and namespace strings, then sorts them by prefix.
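To make the join concrete, the following plain-Python sketch performs the same selection over a small in-memory list of triples. The triple data and the function name select_prefixes are assumptions for illustration; a real triple store evaluates the pattern with indexes rather than nested scans.

```python
# A tiny in-memory stand-in for the namespace graph (curies kept as plain strings).
TRIPLES = [
    ("ns:skos", "rdf:type", "class:Namespace"),
    ("ns:skos", "ns:prefix", "skos"),
    ("ns:skos", "ns:namespace", "http://www.w3.org/2004/02/skos/core#"),
    ("ns:rdf", "rdf:type", "class:Namespace"),
    ("ns:rdf", "ns:prefix", "rdf"),
    ("ns:rdf", "ns:namespace", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ("concept:Felidae", "rdf:type", "skos:Concept"),  # not a namespace, so ignored
]


def select_prefixes(triples):
    """Return one {prefix, namespace} row per class:Namespace subject, sorted by prefix."""
    # ?ns rdf:type class:Namespace
    subjects = {s for s, p, o in triples if p == "rdf:type" and o == "class:Namespace"}
    rows = []
    for subject in subjects:
        # ?ns ns:prefix ?prefix
        prefixes = [o for s, p, o in triples if s == subject and p == "ns:prefix"]
        # ?ns ns:namespace ?namespace
        namespaces = [o for s, p, o in triples if s == subject and p == "ns:namespace"]
        rows.extend({"prefix": pre, "namespace": ns} for pre in prefixes for ns in namespaces)
    # order by ?prefix
    return sorted(rows, key=lambda row: row["prefix"])


for row in select_prefixes(TRIPLES):
    print(row["prefix"], row["namespace"])
```

Each row corresponds to one of the maps that the SPARQL call returns, which is why the prefix map can be built by folding over the rows.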

The ns:serialize() function takes a string that identifies the type of output expected, plus a sequence of prefixes to include (pass the empty sequence to get all of them); ns:serialize("sparql", ()), for instance, returns output for a SPARQL query. These are shadowed by direct functions that turn the format strings into API calls (i.e., ns:serialize("sparql", ()) becomes ns:sparql()). This makes their use easier with development platforms such as Oxygen.

Table 1 shows the breakdown of these functions, what they do, and what they produce as output:

Table 1. Serialization functions

Function: ns:sparql() or ns:serialize("sparql", ())
Description: Serializes to the prefix format used by SPARQL and SPARQL Update.
Output:
prefix p1: <http://domain.com/ns/p1/>
prefix p2: <http://domain.com/ns/p2/>

Function: ns:turtle() or ns:serialize("turtle", ())
Description: Serializes to the Turtle format.
Output:
@prefix p1: <http://domain.com/ns/p1/>.
@prefix p2: <http://domain.com/ns/p2/>.

Function: ns:xquery() or ns:serialize("xquery", ())
Description: Generates XQuery namespace declarations, mostly useful when dealing with xdmp:eval() statements.
Output:
declare namespace p1 = "http://domain.com/ns/p1/";
declare namespace p2 = "http://domain.com/ns/p2/";

Function: ns:rdfa() or ns:serialize("rdfa", ())
Description: Creates, as a string, the space-delimited prefix/namespace pairs used by RDFa and certain XSD declarations.
Output:
p1: http://domain.com/ns/p1/ p2: http://domain.com/ns/p2/

Function: ns:prefix-map() or ns:serialize("prefix-map", ())
Description: Creates a map with prefixes as keys and namespaces as values. Note that this is the format that sem:curie-expand() and sem:curie-shorten() utilize.
Output:
map {
"p1":"http://domain.com/ns/p1/",
"p2":"http://domain.com/ns/p2/"
}

Function: ns:namespace-map() or ns:serialize("namespace-map", ())
Description: Creates a map with namespaces as keys and prefixes as values. This is essentially the reverse lookup for the prefix map.
Output:
map {
"http://domain.com/ns/p1/":"p1",
"http://domain.com/ns/p2/":"p2"
}

Function: ns:xmlns() or ns:serialize("xmlns", ())
Description: Creates a sequence of namespace nodes for use as constructors for an XML element in XQuery (see below).
Output:
(namespace {"p1"} {"http://domain.com/ns/p1/"}, ...)

Function: ns:xmlns-str() or ns:serialize("xmlns-str", ())
Description: Creates a string for text output of the xmlns declarations.
Output:
xmlns:p1="http://domain.com/ns/p1/"
xmlns:p2="http://domain.com/ns/p2/"

In general, the output of the ns:sparql() and ns:turtle() functions is prepended to a SPARQL query or update, or to a Turtle document, using fn:concat(), fn:string-join(), or the "||" concatenation operator. For instance, assuming that all of the namespaces have been declared in the initial Turtle file, you could run a SPARQL query using the following code:

let $maps := sem:sparql(ns:sparql()||'
  select ?label ?value from graph:foo where {
    ?s skos:prefLabel ?_label.
    ?item skos:broader ?s.
    ?item skos:prefLabel ?label.
    ?item skos:notation ?value.
    } order by ?label',
  map:entry("_label","carnivores"))
return $maps

The ns:sparql() function returns all of the relevant namespace declarations, including the graph: and skos: namespaces. The output here would then be a "table" (which is what a sequence of maps is) of all the classes of carnivores that might be found in a biological classification taxonomy.

?label ?value
Cats Felinidae
Dogs Canidae
Foxes Vulpidae


The ns:turtle() function works similarly, providing the preamble declarations for a Turtle data file.

Understanding the Prefix Map

The ns:prefix-map() function creates a key/value map where the prefixes are the keys and the namespaces are the values. Prefix maps can be passed as the second parameter to the sem:curie-expand() or sem:curie-shorten() functions:

sem:curie-expand("skos:prefLabel",ns:prefix-map())
=>sem:iri("http://www.w3.org/2004/02/skos/core#prefLabel")

Note that in general this map requires making a SPARQL call and then folding the map structure, so if you are doing a lot of sem:curie-*() operations, it is generally preferable to store the map in a variable and reference the variable, especially if you are creating triples or making calls within a loop. Here's an example illustrating a biology taxonomy:

let $prefix-map := ns:prefix-map()
let $triples := (
  sem:triple(
    sem:curie-expand("concept:Felidae",$prefix-map),
    sem:curie-expand("skos:broader",$prefix-map),
    sem:curie-expand("concept:Carnivora",$prefix-map)
  ),
  sem:triple(
    sem:curie-expand("concept:Canidae",$prefix-map),
    sem:curie-expand("skos:broader",$prefix-map),
    sem:curie-expand("concept:Carnivora",$prefix-map)
  )
)
return $triples

One important note here: a SPARQL query is required initially to generate the prefix map, but once generated, the map is cached in a server field. Reading that field is much faster than recalculating the map, not only for the ns:prefix-map() function but for just about every function in the ns: module.
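The caching pattern itself is simple enough to sketch in plain Python. The PrefixCache class below is hypothetical: its loader argument stands in for the SPARQL query, and its internal slot stands in for the "ns:prefix-map" server field. It illustrates the idea only and says nothing about how MarkLogic server fields behave across hosts or restarts.

```python
class PrefixCache:
    """Load the prefix map once, then serve it from memory until reload() is called."""

    def __init__(self, loader):
        self._loader = loader  # callable that runs the expensive query
        self._map = None       # stands in for the server field
        self.loads = 0         # counts how often the loader actually ran

    def get(self):
        if self._map is None:
            self._map = self._loader()
            self.loads += 1
        return self._map

    def reload(self):
        """Clear the cached map and rebuild it, as a reload would."""
        self._map = None
        return self.get()


def fake_query():
    # Placeholder for the SPARQL call; returns a prefix-to-namespace mapping.
    return {"skos": "http://www.w3.org/2004/02/skos/core#"}


cache = PrefixCache(fake_query)
cache.get()
cache.get()
print(cache.loads)
```

Repeated calls to get() reuse the stored map, so the cost of the query is paid only on first use and after an explicit reload.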

Working with Curies

To facilitate this, I defined two functions, ns:ciri() and ns:curie(), which directly access the ns:prefix-map() function, which in turn reads the "ns:prefix-map" server field. The ns:load() function must be invoked as part of a server set-up routine, or whenever the namespaces are changed, to make sure this field is available.

The ns:ciri() function is a wrapper around sem:curie-expand(), but implicitly uses the prefix map stored in memory if it's available (or loads it if it's not). The name "ciri" is short for "Convert to IRI". The ns:curie() function does the same for sem:curie-shorten():

let $ciri := ns:ciri("concept:Felidae")
let $curie := ns:curie($ciri)
return ($ciri,$curie)
=> sem:iri("http://semanticalllc.com/ns/canonical/concept/Felidae")
=> "concept:Felidae"

While you can use ns:ciri() and ns:curie() with the sem:triple() function, you would actually be better off using the built-in sem:rdf-builder() function, with ns:prefix-map() providing the base prefix map, and then using ns:ciri() and ns:curie() wherever working with curies would make sense:

let $fn-triple := sem:rdf-builder(ns:prefix-map())
let $triples := (
  $fn-triple("concept:Mammalia","skos:broader","concept:Chordates"),
  $fn-triple("concept:Carnivora","skos:broader","concept:Mammalia"),
  $fn-triple("concept:Insectivora","skos:broader","concept:Mammalia"),
  $fn-triple("concept:Felidae","skos:broader","concept:Carnivora"),
  $fn-triple("concept:Canidae","skos:broader","concept:Carnivora"),
  $fn-triple("concept:Vulpini","skos:broader","concept:Canidae"),
  $fn-triple("concept:Canis","skos:broader","concept:Canidae"),
  $fn-triple("concept:Felinae","skos:broader","concept:Felidae"),
  $fn-triple("concept:Pantherinae","skos:broader","concept:Felidae")
)
return sem:graph-insert(ns:ciri("graph:temp"),$triples)

The sem:rdf-builder() function is a factory - it generates a function that takes the subject, predicate and object curies (or values, if the third (object) expression can't be evaluated as a curie), expands them, then converts them into triples. The sem:graph-insert() function then places these newly generated triples into the graph:temp graph (which is again an IRI). 
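The factory idea can be shown with a closure in plain Python. This is only a sketch of the shape described above, not of sem:rdf-builder() itself: make_triple_builder is a hypothetical name, triples are modeled as plain tuples, and any value whose prefix is not in the map is passed through unchanged.

```python
def make_triple_builder(prefixes):
    """Return a function that expands curies and packs them into a triple."""

    def expand(value):
        prefix, sep, term = value.partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + term
        return value  # not a known curie: treat it as a literal value

    def build(subject, predicate, obj):
        return (expand(subject), expand(predicate), expand(obj))

    return build


# Hypothetical prefix map for the example.
PREFIX_MAP = {
    "concept": "http://semanticalllc.com/ns/canonical/concept/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

triple = make_triple_builder(PREFIX_MAP)
print(triple("concept:Felidae", "skos:broader", "concept:Carnivora"))
print(triple("concept:Felidae", "skos:prefLabel", "Cats"))
```

The prefix map is captured once when the builder is created, so each call site only has to supply the three curies.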

Cleaning Up Turtle

The ns:namespace-map() function isn't used nearly as often, but it occasionally comes in handy. It uses the namespace itself as a lookup for the prefix used within the system. This is especially helpful when getting source from external content that has the same namespaces but uses different prefixes.
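A small plain-Python sketch of that reverse lookup follows. The names invert and local_prefixes, and the sample data, are assumptions for illustration: given the prefix declarations found in an external source, the lookup reports which local prefix each one should be renamed to.

```python
def invert(prefix_map):
    """Build the namespace-to-prefix map from the prefix-to-namespace map."""
    return {ns: prefix for prefix, ns in prefix_map.items()}


def local_prefixes(external_decls, namespace_map):
    """Map each external prefix to the local prefix for the same namespace.

    Prefixes whose namespace is unknown locally are kept as they are.
    """
    return {ext: namespace_map.get(ns, ext) for ext, ns in external_decls.items()}


# Hypothetical local prefix map and external declarations.
LOCAL = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "concept": "http://semanticalllc.com/ns/canonical/concept/",
}
EXTERNAL = {
    "p0": "http://semanticalllc.com/ns/canonical/concept/",
    "sk": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/",
}

print(local_prefixes(EXTERNAL, invert(LOCAL)))
```

The inversion assumes each namespace has exactly one local prefix; if two prefixes shared a namespace, the later one would win.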

One of the most common uses is to work around what I consider a minor bug in MarkLogic. I've found that complex queries can end up involving dozens of namespaces, and so the namespace prefixes can actually be fairly important, especially when exporting content out to Turtle. The problem is that while MarkLogic's serialization out to Turtle is quite fast, there's no clean way of telling it to use my prefixes, so instead it serializes out using its own internal prefixes. For example, running the following against a SKOS-oriented database I manage:

let $triples := sem:sparql(ns:sparql()||'
describe ?concept from graph:concepts where {
  ?concept rdf:type skos:Concept.
  } limit 3')
return sem:rdf-serialize($triples,"turtle")

gives me a sample Turtle output that looks something like this:

@prefix p2: <http://semanticalllc.com/ns/canonical/scheme/> .
@prefix p4: <http://semanticalllc.com/ns/canonical/term/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix p0: <http://semanticalllc.com/ns/canonical/concept/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix p3: <http://www.w3.org/2008/05/skos-xl#> .
@prefix p6: <http://semanticalllc.com/ns/canonical/skosx/> .

p0:AddressTypes_HR_Mailing_Address
                a               skos:Concept ;
                skos:inScheme   p2:Root_Vocabularies ;
                p6:spellsLike   "rmlnktrs" ;
                p6:code         "8" ;
                p6:path         "concept/AddressTypes_HR_Mailing_Address" ;
                p6:weight       "8"^^xs:integer ;
                skos:broader    p0:Demographics_AddressTypes ;
                p3:prefLabel    p4:HR_Mailing_Address .

p0:AddressTypes_Corporate_Physical_Address
                a               skos:Concept ;
                skos:inScheme   p2:Root_Vocabularies ;
                p6:spellsLike   "krprtfskltrs" ;
                p6:code         "6" ;
                p6:path         "concept/AddressTypes_Corporate_Physical_Address" ;
                p6:weight       "6"^^xs:integer ;
                skos:broader    p0:Demographics_AddressTypes ;
                p3:prefLabel    p4:Corporate_Physical_Address .

p0:AddressTypes_Alternate_Address
                a               skos:Concept ;
                skos:inScheme   p2:Root_Vocabularies ;
                p6:spellsLike   "altrnttrs" ;
                p6:code         "5" ;
                p6:path         "concept/AddressTypes_Alternate_Address" ;
                p6:weight       "5"^^xs:integer ;
                skos:broader    p0:Demographics_AddressTypes ;
                p3:prefLabel    p4:Alternate_Address .

This can be incredibly hard to read, and while valid, I'd really prefer to see "concept:Demographics_AddressTypes" rather than "p0:Demographics_AddressTypes" (this becomes really exciting when you have GUIDs rather than understandable names).

The ns:remap-turtle() function makes use of ns:namespace-map() and some regular expression parsing of the Turtle output to replace the generated prefixes with the ones you have defined in your prefix map.

declare function ns:remap-turtle($turtle as xs:string) as xs:string {
    (: Collect the @prefix declarations that MarkLogic generated. :)
    let $prefix-lines :=
        for $line in fn:tokenize($turtle, "\n")
        where fn:starts-with($line, "@prefix ")
        return $line
    (: Lookup of namespace IRI to preferred prefix. :)
    let $namespace-map := ns:namespace-map()
    (: Mutable holder for the Turtle text as it is rewritten. :)
    let $turtle-map := map:entry("turtle", $turtle)
    (: Lookup of generated prefix to preferred prefix. :)
    let $prefix-map := map:new(
        for $prefix-line in $prefix-lines
        let $regex := fn:tokenize(fn:replace($prefix-line, "@prefix (.+?): <(.+?)>\s*\.", "$1|$2"), "\|")
        let $source-prefix := $regex[1]
        let $namespace := $regex[2]
        (: Keep the generated prefix when the namespace is not in the map. :)
        let $target-prefix := (map:get($namespace-map, $namespace), $source-prefix)[1]
        return map:entry($source-prefix, $target-prefix)
    )
    let $_ :=
        for $key in map:keys($prefix-map)
        return map:put($turtle-map, "turtle",
            fn:replace(map:get($turtle-map, "turtle"),
                $key || ":", map:get($prefix-map, $key) || ":"))
    return map:get($turtle-map, "turtle")
};

With this, you can modify the script to wrap ns:remap-turtle() around the sem:rdf-serialize() call

let $triples := sem:sparql(ns:sparql()||'
describe ?concept from graph:concepts where {
?concept rdf:type skos:Concept.
} limit 3')
return ns:remap-turtle(sem:rdf-serialize($triples,"turtle"))

to generate what I would consider clean output:

@prefix scheme: <http://semanticalllc.com/ns/canonical/scheme/> .
@prefix term: <http://semanticalllc.com/ns/canonical/term/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix concept: <http://semanticalllc.com/ns/canonical/concept/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix skosx: <http://semanticalllc.com/ns/canonical/skosx/> .

concept:AddressTypes_HR_Mailing_Address
  a skos:Concept ;
  skos:inScheme scheme:Root_Vocabularies ;
  skosx:spellsLike "rmlnktrs" ;
  skosx:code "8" ;
  skosx:path "concept/AddressTypes_HR_Mailing_Address" ;
  skosx:weight "8"^^xs:integer ;
  skos:broader concept:Demographics_AddressTypes ;
  skosxl:prefLabel term:HR_Mailing_Address .

concept:AddressTypes_Corporate_Physical_Address
  a skos:Concept ;
  skos:inScheme scheme:Root_Vocabularies ;
  skosx:spellsLike "krprtfskltrs" ;
  skosx:code "6" ;
  skosx:path "concept/AddressTypes_Corporate_Physical_Address" ;
  skosx:weight "6"^^xs:integer ;
  skos:broader concept:Demographics_AddressTypes ;
  skosxl:prefLabel term:Corporate_Physical_Address .

concept:AddressTypes_Alternate_Address
  a skos:Concept ;
  skos:inScheme scheme:Root_Vocabularies ;
  skosx:spellsLike "altrnttrs" ;
  skosx:code "5" ;
  skosx:path "concept/AddressTypes_Alternate_Address" ;
  skosx:weight "5"^^xs:integer ;
  skos:broader concept:Demographics_AddressTypes ;
  skosxl:prefLabel term:Alternate_Address .
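
ns:remap-turtle() relies on replacing each generated prefix, followed by a colon, wherever it appears in the text. The Python sketch below shows the same idea outside MarkLogic, with one hedged variation: it only matches a prefix at the start of a token, so "p0:" is left alone inside a longer name. The function and variable names are hypothetical, and like the original it still rewrites matches that happen to sit inside quoted literals.

```python
import re
from typing import Dict

# Captures the prefix and namespace of each "@prefix p0: <...> ." line.
PREFIX_LINE = re.compile(r"^@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.", re.MULTILINE)


def remap_turtle(turtle: str, namespace_to_prefix: Dict[str, str]) -> str:
    """Rename generated prefixes (p0, p2, ...) to the preferred ones."""
    renames: Dict[str, str] = {}
    for generated, namespace in PREFIX_LINE.findall(turtle):
        preferred = namespace_to_prefix.get(namespace)
        # Unknown namespaces keep whatever prefix the serializer chose.
        if preferred and preferred != generated:
            renames[generated] = preferred
    if not renames:
        return turtle
    # Match a generated prefix only when it starts a token.
    pattern = re.compile(
        r"(?<![\w.-])(" + "|".join(map(re.escape, renames)) + r"):"
    )
    # A single pass avoids renaming text that was already rewritten.
    return pattern.sub(lambda m: renames[m.group(1)] + ":", turtle)


sample = (
    "@prefix p0: <http://semanticalllc.com/ns/canonical/concept/> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "\n"
    "p0:AddressTypes_Alternate_Address\n"
    "    a skos:Concept ;\n"
    "    skos:broader p0:Demographics_AddressTypes .\n"
)
preferred = {
    "http://semanticalllc.com/ns/canonical/concept/": "concept",
    "http://www.w3.org/2004/02/skos/core#": "skos",
}
print(remap_turtle(sample, preferred))
```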

Supporting Easier RDFa

There are three additional serialization modes that the namespaces object can be used for. The first, ns:rdfa(), is intended to simplify the generation of RDFa within HTML documents; it produces a string of space-separated prefixes and namespaces for use in the prefix attribute. Using the prefix attribute together with curies makes what's happening within an HTML+RDFa document much more obvious.

<html prefix="{ns:rdfa(("concept","skos"))}">
<body>
<article about="concept:Carnivora">
<h1>Carnivores</h1>
<p>Carnivores include two primary clades, 
the first, <span about="concept:Canidiformia"><span property="skos:broader" resource="concept:Carnivora">dog-like carnivores or Canidiforms</span></span>, include
<span about="concept:Canis"><span property="skos:broader" resource="concept:Canidiformia">true dogs, wolves</span></span>, 
<span about="concept:Ursus"><span property="skos:broader" resource="concept:Canidiformia">bears</span></span>, and 
<span about="concept:Muskelids"><span property="skos:broader" resource="concept:Canidiformia">otters</span></span>.
</p>
<p>The second major clade or sub-order, are the 
<span about="concept:Feliformia">
<span property="skos:broader" resource="concept:Carnivora">cat-like carnivores or Feliforms</span></span>,
including the
<span about="concept:Felidae"><span property="skos:broader" resource="concept:Feliformia">cats, panthers and lions</span></span>,
<span about="concept:Hyenidae"><span property="skos:broader" resource="concept:Feliformia">hyenas</span></span> and
<span about="concept:Herpestidae"><span property="skos:broader" resource="concept:Feliformia">mongooses</span></span>.
</p>
</article>
</body>
</html>

Note here that the serialization function takes an argument consisting of a sequence of prefixes: ns:rdfa(("concept","skos")). All of the serialization functions can take such sequences, in which case the output includes only the prefixes given rather than all of the prefixes. This can be handy in cases like the one above, where only two namespaces are relevant to the RDFa.

The output is space separated, with the prefix ending in a colon (":").

<html prefix="concept: http://semanticalllc.com/ns/canonical/concept/
  skos: http://www.w3.org/2004/02/skos/core#">
  <body>
    <article about="concept:Carnivora">
      <h1>Carnivores</h1>
      <p>Carnivores include two primary clades,
        the first,
        <span about="concept:Canidiformia">
          <span property="skos:broader" resource="concept:Carnivora">
            dog-like carnivores or Canidiforms
          </span>
        </span>, include...
      </p>
    </article>
  </body>
</html>
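
The shape of that prefix attribute value is simple enough to sketch in Python. This is only an illustration of the output format, assuming a dictionary-based prefix table; `PREFIXES` and `rdfa_prefix` are hypothetical names rather than the module's own implementation.

```python
from html import escape
from typing import Dict, Iterable, Optional

# Hypothetical prefix table; the article's module loads its own.
PREFIXES: Dict[str, str] = {
    "concept": "http://semanticalllc.com/ns/canonical/concept/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def rdfa_prefix(prefixes: Dict[str, str],
                only: Optional[Iterable[str]] = None) -> str:
    """Build the value of an RDFa prefix attribute.

    With `only`, just those prefixes are emitted, in the order given;
    otherwise every prefix in the table is emitted. An unknown prefix
    raises KeyError.
    """
    selected = list(only) if only is not None else list(prefixes)
    pairs = [f"{name}: {prefixes[name]}" for name in selected]
    # Escape so the result is safe inside a double-quoted attribute.
    return escape(" ".join(pairs), quote=True)


print(f'<html prefix="{rdfa_prefix(PREFIXES, ["concept", "skos"])}">')
```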

RDFa offers a lot of potential, but the complexity of using full URIs tends to obscure what's actually going on, especially in larger documents. By the way, the above document can be run through an RDFa parser (such as the one written in Python at http://www.w3.org/2012/pyRdfa/Validator.html#distill_by_input) to generate RDF content via GRDDL. In this case, that content is given as:

@prefix concept: <http://semanticalllc.com/ns/canonical/concept/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

concept:Canis skos:broader concept:Canidiformia .
concept:Felidae skos:broader concept:Feliformia .
concept:Herpestidae skos:broader concept:Feliformia .
concept:Hyenidae skos:broader concept:Feliformia .
concept:Muskelids skos:broader concept:Canidiformia .
concept:Ursus skos:broader concept:Canidiformia .
concept:Canidiformia skos:broader concept:Carnivora .
concept:Feliformia skos:broader concept:Carnivora .

A Couple of XML Serializations

The ns:xquery(), ns:xmlns() and ns:xmlns-str() functions generate, respectively, preambles for XQuery, namespace nodes for assigning namespaces to XML elements, and the "xmlns:foo" strings for when HTML is being generated as text.

ns:xquery(("skos","skosxl"))
=> declare namespace skos = "http://www.w3.org/2004/02/skos/core#";
=> declare namespace skosxl = "http://www.w3.org/2008/05/skos-xl#";

<foo>{ns:xmlns(("skos","skosxl"))}<bar>5</bar><bat>blue</bat></foo>
=> <foo xmlns:skos="http://www.w3.org/2004/02/skos/core#"
        xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#">
     <bar>5</bar>
     <bat>blue</bat>
   </foo>

let $text := "<foo "||ns:xmlns-str(("skos","skosxl"))||
             "><skos:bar>5</skos:bar><skosxl:bat>blue</skosxl:bat></foo>"
return xdmp:unquote($text)
=> <foo xmlns:skos="http://www.w3.org/2004/02/skos/core#"
        xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#">
     <skos:bar>5</skos:bar>
     <skosxl:bat>blue</skosxl:bat>
   </foo>

The output of the ns:xmlns() function makes use of a new XQuery 3.0 feature, namespace node constructors. The function generates a sequence of namespace nodes with their associated prefixes, and is used most often when you want to take different XML nodes in varying namespaces and have them all share a single common declaration. As such, it should be the first item in the child sequence for the dynamic constructor of the element.
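
The two string-based serializations are easy to picture as plain text generation. The Python sketch below mirrors ns:xquery() and ns:xmlns-str() under the assumption of a dictionary-based prefix table; `PREFIXES`, `xquery_preamble` and `xmlns_str` are hypothetical names, and the parse at the end stands in for xdmp:unquote().

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable
from xml.sax.saxutils import quoteattr

# Hypothetical prefix table; the article's module loads its own.
PREFIXES: Dict[str, str] = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
}


def xquery_preamble(prefixes: Dict[str, str], only: Iterable[str]) -> str:
    """One 'declare namespace' line per requested prefix.

    Assumes the namespace IRIs contain no double quotes.
    """
    return "\n".join(
        f'declare namespace {name} = "{prefixes[name]}";' for name in only
    )


def xmlns_str(prefixes: Dict[str, str], only: Iterable[str]) -> str:
    """Space-separated xmlns:prefix="namespace" attributes as text."""
    return " ".join(
        f"xmlns:{name}={quoteattr(prefixes[name])}" for name in only
    )


print(xquery_preamble(PREFIXES, ["skos", "skosxl"]))

# Build markup as text, then parse it to confirm the prefixes resolve.
text = (
    "<foo " + xmlns_str(PREFIXES, ["skos", "skosxl"]) + ">"
    "<skos:bar>5</skos:bar><skosxl:bat>blue</skosxl:bat></foo>"
)
root = ET.fromstring(text)
print([child.tag for child in root])
```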

Conclusion

The ns: module is a workhorse library designed to be fast, flexible and usable throughout semantic applications (and even non-semantic XQuery applications). The full module is available as a gist at https://gist.github.com/kurtcagle/3e047942d2b4c2ee3b2e, and a sample namespace.ttl file is available as a gist at https://gist.github.com/kurtcagle/bf02000f8a96386080c1 . These are published under the Apache license. For more information, please contact me at kurt.cagle at gmail.com.


Topics:
semantics, marklogic, namespaces, sparql, xquery, turtle, rdf
