
How We Built an Asynchronous, Temporal RESTful API Based on Vert.x, Keycloak and Kotlin/Coroutines for Sirix.io (Open Source)


Check out how you can build an asynchronous RESTful API using Sirix, Vert.x, and Keycloak.

[Image: Using hierarchical edge bundles to visualize moves]


Why storing historical data is becoming feasible

Life is subject to constant evolution, and so is our data, be it in research, business, or personal information management. It is therefore surprising that databases usually keep only the current state. With the advent of flash drives such as SSDs, however, which are much faster at randomly accessing data than spinning disks but not very good at erasing or overwriting data, we are now able to develop clever versioning algorithms and storage systems that keep past states without impeding efficiency/performance. Search, insertion, and deletion operations should therefore run in logarithmic time (O(log(n))) to compete with commonly used index structures.

The temporal storage system SirixDB

Sirix is a versioned, temporal storage system, which is log-structured at its very core.

We support N read-only transactions, each bound to a single revision (a transaction may be started on any past revision), concurrently with one write transaction per resource. Our system is thus based on snapshot isolation. The write transaction can revert the most recent revision to any past revision. Changes to this past revision can then be committed to create a new snapshot and therefore a new revision.
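As an illustration only (the names here are hypothetical and this is not the Sirix API), the revert-then-commit semantics can be sketched with a toy in-memory revision history:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of snapshot semantics: committing after a revert to an old
// revision appends a new revision rather than rewriting history.
class ToyRevisionHistory {
    private final List<String> revisions = new ArrayList<>();
    private String workingCopy = "";

    ToyRevisionHistory() {
        revisions.add(workingCopy); // revision 0: the empty baseline
    }

    void revertTo(int revision) {
        // Reverting only changes the working copy; past revisions stay intact.
        workingCopy = revisions.get(revision);
    }

    void modify(String newContent) {
        workingCopy = newContent;
    }

    int commit() {
        // Every commit creates a new snapshot, i.e. a new revision.
        revisions.add(workingCopy);
        return revisions.size() - 1;
    }

    String read(int revision) {
        return revisions.get(revision);
    }
}
```

Reverting to revision 1 and committing produces, say, revision 3, while revision 2 remains readable unchanged.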



Writes are batched and synced to disk in a post-order traversal of the internal index tree structure during a transaction commit. Thus we are able to store hashes of the pages in parent pointers, just as ZFS does, for future integrity checks.

Snapshots, that is, new revisions, are created during every commit. Apart from a numerical revision-ID, the timestamp is serialized. A revision can afterwards be opened either by specifying the ID or the timestamp. Using a timestamp involves a binary search on an array of timestamps, stored persistently in a second file and loaded into memory on startup. The search ends when either the exact timestamp or the revision closest to the given point in time is found.

Data is never written back to the same place and thus never modified in place. Instead, Sirix uses copy-on-write (COW) semantics at the record level (it creates page fragments and usually doesn't copy whole pages). Every time a page has to be modified, the records which have changed, as well as some of the unchanged records, are written to a new location. Which records exactly are copied depends on the versioning algorithm used. Sirix is thus especially well suited for flash-based drives such as SSDs.

Changes to a resource within a database occur within the aforementioned resource-bound single write transaction. Therefore, a ResourceManager first has to be opened on the specific resource to start single-resource-wide transactions. Note that we have already started work on database-wide transactions. :-)
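The timestamp lookup described above can be sketched as a binary search that returns the exact revision or the one closest to the given point in time (a simplified standalone sketch, not the actual Sirix implementation):

```java
import java.util.Arrays;

// Simplified sketch of timestamp-based revision lookup: binary-search the
// sorted commit timestamps and return the index (revision number) of the
// exact match, or of the closest timestamp otherwise.
final class RevisionLookup {
    static int closestRevision(long[] commitTimestamps, long pointInTime) {
        int idx = Arrays.binarySearch(commitTimestamps, pointInTime);
        if (idx >= 0) {
            return idx; // exact timestamp found
        }
        int insertionPoint = -idx - 1;
        if (insertionPoint == 0) {
            return 0; // before the first commit
        }
        if (insertionPoint == commitTimestamps.length) {
            return commitTimestamps.length - 1; // after the last commit
        }
        // Pick whichever neighbor is closer to the requested point in time.
        long before = pointInTime - commitTimestamps[insertionPoint - 1];
        long after = commitTimestamps[insertionPoint] - pointInTime;
        return before <= after ? insertionPoint - 1 : insertionPoint;
    }
}
```

Since the timestamps are sorted by construction (commits are strictly ordered), the lookup is O(log n) in the number of revisions.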

We recently wrote another article with much more background information on the principles behind Sirix.

Simple, transaction-cursor-based API

The following simple Java code creates a database and a resource within the database, and imports an XML document. The document is shredded into our internal representation (which can be thought of as a persistent DOM implementation; thus both an in-memory layout and a binary serialization format are involved).


// Path to the database.
var file = Paths.get("sirix-database");

// Create the database.
var dbConfig = new DatabaseConfiguration(file);
Databases.createDatabase(dbConfig);

// Open the database.
try (var database = Databases.openDatabase(file)) {
  /* 
* Create an empty resource in the database with the name "resource". Store DeweyIDs (hierarchical node labels),
   * use text node compression, build a summary of all paths in the resource and use the SLIDING_SNAPSHOT versioning
   * algorithm.
   */
  database.createResource(
            ResourceConfiguration.newBuilder("resource", dbConfig)
                                 .useDeweyIDs(true)
                                 .useTextCompression(true)
                                 .buildPathSummary(true)
                                 .versioningApproach(Versioning.SLIDING_SNAPSHOT)
                                 .build());
  try (
      // Start a resource manager on the given resource.
      var manager = database.getXdmResourceManager("resource");
      // Start the single read/write transaction.
      var wtx = manager.beginNodeWriteTrx()) {
      // Import an XML-document.
      wtx.insertSubtreeAsFirstChild(XMLShredder.createFileReader(LOCATION.resolve("input.xml")));
  }
}

Native storage of JSON will be next. In general, any kind of data can be stored in Sirix, as long as it can be fetched via the generated sequential, stable record identifier, which is assigned by Sirix during insertion, and a custom serializer/deserializer is plugged in. However, we are working on several small layers for natively storing JSON data.

Vert.x, Kotlin/Coroutines and Keycloak

Vert.x, on the other hand, is closely modeled after Node.js, but for the JVM. Everything in Vert.x should be non-blocking, so that a single thread, called an event loop, can handle a lot of requests. Blocking calls have to be handled on a special thread pool. By default there are two event loops per CPU core (multi-reactor pattern).

We are using Kotlin because it is simple and concise. One feature which is really interesting is coroutines. Conceptually, they are like very lightweight threads. While creating threads is very expensive, creating a coroutine is not. Coroutines allow writing asynchronous code almost as if it were sequential. Whenever a coroutine is suspended due to blocking calls or long-running tasks, the underlying thread is not blocked and can be reused. Under the hood, the Kotlin compiler gives each suspending function an additional parameter, a continuation, which stores where to resume the function (normal resumption or resumption with an exception).
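The continuation idea can be illustrated in plain Java by rewriting a function into continuation-passing style. This is only an analogy to what the Kotlin compiler generates, with hypothetical names:

```java
import java.util.function.IntConsumer;

// Analogy only: a "suspending" computation takes an extra parameter, the
// continuation, which records where to resume once the result is available.
final class ContinuationSketch {
    // Direct style: the caller blocks until the result is returned.
    static int compute(int x) {
        return x * 2;
    }

    // Continuation-passing style: instead of returning, the result is handed
    // to the continuation. A scheduler could invoke it later on any thread,
    // so the calling thread is free in the meantime.
    static void computeCps(int x, IntConsumer continuation) {
        continuation.accept(x * 2);
    }
}
```

Kotlin's compiler performs a transformation in this spirit automatically for every `suspend` function, which is why the source code still reads sequentially.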

Keycloak is used as the authorization server via OAuth2 (Password Credentials Flow), as we decided not to implement authorization ourselves.

Things to consider when building the Server

First, we have to decide which OAuth2 flow best suits our needs. As we built a REST API which is usually not consumed by user agents/browsers, we decided to use the Password Credentials Flow. It is as simple as this: first get an access token, then send it with each request in the Authorization header.

In order to get the access token, a request first has to be made against a POST /login route, with the username/password credentials sent in the body as a JSON object.

The implementation looks like this:

post("/login").produces("application/json").coroutineHandler { rc ->
    val userJson = rc.bodyAsJson
    val user = keycloak.authenticateAwait(userJson)
    rc.response().end(user.principal().toString())
}


The coroutineHandler is a simple extension function:

/**
 * An extension method for simplifying coroutines usage with Vert.x Web routers.
 */
private fun Route.coroutineHandler(fn: suspend (RoutingContext) -> Unit) {
  handler { ctx ->
    launch(ctx.vertx().dispatcher()) {
      try {
        fn(ctx)
      } catch (e: Exception) {
        ctx.fail(e)
      }
    }
  }
}


Coroutines are launched on the Vert.x event loop (the dispatcher).

In order to execute a longer running handler we use:

vertxContext.executeBlockingAwait(Handler<Future<Nothing>> {
  // Do longer running task.
})

Vert.x uses a different thread pool for this kind of task, so the task is executed in another thread. Note that the event loop is not blocked; instead, the coroutine is suspended.
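The idea of offloading blocking work to a separate pool while the calling thread stays free can be sketched with a plain Java executor (a simplified analogy, not the Vert.x implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified analogy: the "event loop" thread hands a blocking task to a
// worker pool and is immediately free again; the result is picked up later.
final class BlockingOffload {
    static String runBlockingTask() {
        ExecutorService workerPool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = workerPool.submit(() -> {
                Thread.sleep(50); // stands in for a longer-running task
                return "done";
            });
            // The submitting thread could keep handling other events here.
            return result.get(); // rendezvous with the worker's result
        } catch (Exception e) {
            return "error: " + e.getMessage();
        } finally {
            workerPool.shutdown();
        }
    }
}
```

In Vert.x the rendezvous is non-blocking for the event loop: the coroutine suspends on the awaited result instead of calling a blocking `get()`.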

API design by example

Now we switch the focus to our API and show how it is designed, with examples. We first need to set up our server and Keycloak (see http://sirix.io for instructions on how to do this).

Once both servers are up and running, we are able to write a simple HTTP client. We first have to obtain a token from the /login endpoint with a given "username/password" JSON object. Using an asynchronous HTTP client (from Vert.x) in Kotlin, it looks like this:


val server = "https://localhost:9443"

val credentials = json {
  obj("username" to "testUser",
      "password" to "testPass")
}

val response = client.postAbs("$server/login").sendJsonAwait(credentials)

if (200 == response.statusCode()) {
  val user = response.bodyAsJsonObject()
  val accessToken = user.getString("access_token")
}


This access token must then be sent in the Authorization HTTP header with each subsequent request. Storing the first resource looks like this (a simple HTTP PUT request):


val xml = """
    <xml>
      foo
      <bar/>
    </xml>
""".trimIndent()

var httpResponse = client.putAbs("$server/database/resource1")
                         .putHeader(HttpHeaders.AUTHORIZATION.toString(), "Bearer $accessToken")
                         .sendBufferAwait(Buffer.buffer(xml))

if (200 == httpResponse.statusCode()) {
  println("Stored document.")
} else {
  println("Something went wrong: ${httpResponse.statusMessage()}")
}


First, an empty database named database is created with some metadata; second, the XML fragment is stored with the name resource1. The PUT HTTP request is idempotent: another PUT request to the same URL endpoint would simply delete the former database and resource and recreate them.

The HTTP response code should be 200 (everything went fine), in which case the HTTP body yields:


<rest:sequence xmlns:rest="https://sirix.io/rest">
  <rest:item>
    <xml rest:id="1">
      foo
      <bar rest:id="3"/>
    </xml>
  </rest:item>
</rest:sequence>


We serialize the IDs generated by our storage system for element nodes.

Via a GET HTTP-Request to https://localhost:9443/database/resource1 we are also able to retrieve the stored resource again.

However, this is not really interesting so far. We can update the resource via a POST-Request. Assuming we retrieved the access token as before, we can simply do a POST-Request and use the information we gathered before about the node-IDs:


val xml = """
    <test>
      yikes
      <bar/>
    </test>
""".trimIndent()

val url = "$server/database/resource1?nodeId=3&insert=asFirstChild"

val httpResponse = client.postAbs(url)
                         .putHeader(HttpHeaders.AUTHORIZATION.toString(), "Bearer $accessToken")
                         .sendBufferAwait(Buffer.buffer(xml))


The interesting part is the URL we are using as the endpoint. We simply say: select the node with ID 3, then insert the given XML fragment as its first child. This yields the following serialized XML document:


<rest:sequence xmlns:rest="https://sirix.io/rest">
  <rest:item>
    <xml rest:id="1">
      foo
      <bar rest:id="3">
        <test rest:id="4">
          yikes
          <bar rest:id="6"/>
        </test>
      </bar>
    </xml>
  </rest:item>
</rest:sequence>


The interesting part is that every PUT as well as POST request does an implicit commit of the underlying transaction. Thus, we are now able to send a GET request that retrieves the contents of the whole resource again, for instance by specifying a simple XPath query to select the root node in all revisions, GET https://localhost:9443/database/resource1?query=/xml/all-time::*, and get the following XPath result:


<rest:sequence xmlns:rest="https://sirix.io/rest">
  <rest:item rest:revision="1" rest:revisionTimestamp="2018-12-20T18:44:39.464Z">
    <xml rest:id="1">
      foo
      <bar rest:id="3"/>
    </xml>
  </rest:item>
  <rest:item rest:revision="2" rest:revisionTimestamp="2018-12-20T18:44:39.518Z">
    <xml rest:id="1">
      foo
      <bar rest:id="3">
        <xml rest:id="4">
          foo
          <bar rest:id="6"/>
        </xml>
      </bar>
    </xml>
  </rest:item>
</rest:sequence>


Note that we used a time-traveling axis in the query parameter. In general, we support several additional temporal XPath axes: future, future-or-self, past, past-or-self, previous, previous-or-self, next, next-or-self, first, last, and all-time.

Time axes are compatible with node tests: <time axis>::<node test> is defined as <time axis>::*/self::<node test>.
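Purely for illustration, this definition can be expressed as a tiny rewrite helper (the class and method names are hypothetical):

```java
// Illustration of the definition above: a temporal axis combined with a node
// test expands to the axis over all nodes followed by a self step.
final class TimeAxisRewrite {
    static String expand(String timeAxis, String nodeTest) {
        return timeAxis + "::*/self::" + nodeTest;
    }
}
```

For example, expanding the all-time axis with the node test xml produces all-time::*/self::xml.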

Of course, the usual way would be to first use one of the standard XPath axes to navigate to the nodes you are interested in, for instance the descendant and/or child axis, add predicate(s), and then navigate in time to watch how a node and its subtree changed. This is an incredibly powerful feature and might be the subject of a future article.

The same can be achieved through specifying a range of revisions to serialize (start- and end-revision parameters) in the GET-request:

GET https://localhost:9443/database/resource1?start-revision=1&end-revision=2

or via timestamps:

GET https://localhost:9443/database/resource1?start-revision-timestamp=2018-12-20T18:00:00&end-revision-timestamp=2018-12-20T19:00:00

However, if we first open a resource and then select individual nodes via a query, it is faster to use the time-traveling axes; otherwise the same query has to be parsed, compiled, and executed for each opened revision.

Of course, we are also able to delete the resource or any subtree thereof, either by an updating XQuery expression (which is not very RESTful) or with a simple DELETE HTTP request:


val url = "$server/database/resource1?nodeId=3"

val httpResponse = client.deleteAbs(url)
                         .putHeader(HttpHeaders.AUTHORIZATION.toString(), "Bearer $accessToken")
                         .sendAwait()

if (200 == httpResponse.statusCode()) {
  ...
}

This deletes the node with ID 3 and, as it is an element node in our case, the whole subtree. The deletion is committed as revision 3, and all old revisions can still be queried for the whole subtree as it was during the respective transaction commit (in the first revision it is only the element with the name "bar", without any subtree).

If we want to get a diff, currently in the form of an XQuery Update statement, we simply call the XQuery function sdb:diff, which is defined as:

sdb:diff($coll as xs:string, $res as xs:string, $rev1 as xs:int, $rev2 as xs:int) as xs:string

Other serialization formats could of course be specified as well.

For instance, we can send a GET-request like this on the database/resource1 we created above:

GET https://localhost:9443/?query=sdb%3Adiff%28%27database%27%2C%27resource1%27%2C1%2C2%29

Note that the query string has to be URL-encoded; decoded, it is

sdb:diff('database','resource1',1,2)

and we are comparing revisions 1 and 2 (diffing performance is in the same time complexity for any pair of revisions we compare). The output for the diff in our example is this XQuery Update statement, wrapped in an enclosing sequence element:


<rest:sequence xmlns:rest="https://sirix.io/rest">
  let $doc := sdb:doc('database','resource1', 1)
  return (
    insert nodes <xml>foo<bar/></xml> as first into sdb:select-node($doc, 3)
  )
</rest:sequence>


This means that resource1 from database is opened in the first revision. Then the subtree <xml>foo<bar/></xml> is inserted as the first child of the node with the stable node ID 3.
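The URL-encoding of the query parameter shown above can be reproduced with the JDK's URLEncoder:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Encode an XQuery expression for use as an HTTP query parameter.
final class QueryEncoding {
    static String encodeQuery(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Encoding sdb:diff('database','resource1',1,2) yields sdb%3Adiff%28%27database%27%2C%27resource1%27%2C1%2C2%29, exactly the query string used in the GET request above.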

If you like this, please give us a star on GitHub, and most importantly, check it out. We would love to hear any feedback, suggestions for future work (for instance on JSON support or horizontal scaling in the cloud), and bug reports; just get in contact. :-)

Our Open Source repository: http://sirix.io

