Asynchronous Datastore Manifesto

My work with NoSQL datastores over the last couple of years has given me some insight into the direction applications will inevitably take as NoSQL becomes the dominant data storage and retrieval method, at least for web and cloud-based applications (enterprise applications will get there eventually, but that’s going to take a lot longer). I’ve learned to trust my instincts over the years, and my instincts are screaming at me that the asynchronous approach described here has value and should be explored by someone, even if I don’t personally have the time to write this system.

Asynchronous Access

The entire approach is, I think, centered on non-blocking, asynchronous access to the data. Generally speaking, if we want asynchronous messaging in our applications, we have to enlist the help of a message broker, whose sole purpose in life is to route messages asynchronously. My favorite, as you no doubt know, is RabbitMQ. But having tried to extend some key functionality of RabbitMQ and come to grips with how excruciatingly difficult that is in practice, I’m guessing that progress can be made faster by leveraging a much lighter-weight asynchronous library that isn’t bound to a specific protocol the way RabbitMQ is bound to AMQP.

Node.js has changed the shape of web application development for the better, in my opinion. Though it’s true that asynchronous applications are more difficult to build (and so not yet as popular among the rank and file), they are demonstrably more scalable and work better in a cloud environment where a user might want lots of relatively small virtual machine instances that can coordinate with one another. Messaging is the logical choice to achieve this goal.

We could certainly enlist our handy message broker here, add some consumers, write a few producers in our web application, and call it good.

But I’m not content with good. Ever.

Data Is King

If we strip our application down to its purest form, the only thing we care about is the data. I might choose something like my favorite NoSQL datastore, Riak, because it can scale my data and I can execute distributed Map/Reduce on my data. But the data is the entire reason for the application in the first place. Whatever I do inside the application is done in the context of the data. If I use a message broker, the messages are data. If I put up a web form, I’m accepting data. If I generate a report, I’m reporting on data.

Nothing matters but the data.

But a message broker doesn’t care about the data. It’s simply a conductor. And the message store doesn’t care about the conductor. But that really shouldn’t be the case.

Since I think in code, maybe an example is in order here to illustrate my point.

Imagine I need to convert and scale an uploaded image into a thumbnail. To do this, I write a simple program that uses ImageMagick to scale, crop, and convert an image to a JPEG. I also create a web form that allows a user to upload their image. In this asynchronous datastore world, my image converter logic should be able to listen for INSERT or UPDATE events in the datastore and convert the incoming data, storing a thumbnail of the uploaded image automatically.

Image converter pseudo-code:

def db = asyncdb.connect("tcp://localhost:5555")
def img = request.get_upload_data("image")
def metadata = [ content_type: "image/jpeg" ]
// Push the upload; the closure runs once the datastore reports the outcome.
db.push("imagebucket", img.name, img.data, metadata, { event, data ->
  if(event.type == SUCCESS) {
    // Point the user's profile at the generated thumbnail.
    db.push("profilebucket", "$user.profile", [ avatar: data.key + ".thumbnail" ])
  }
})

What this does:

  • Connects to a node of the datastore.
  • Subscribes to datastore events by passing a filter Closure that returns true or false, along with a handler Closure to call when the filter Closure returns true.
  • When called, the handler creates the thumbnail automatically and stores a version of the original image under a special key.
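
The subscription described in these bullets can be sketched in runnable form. This is only an illustration in Python, not the API of any real datastore: `AsyncDatastore`, `subscribe`, `push`, and `make_thumbnail` are all hypothetical names, the store lives in memory, and the ImageMagick step is replaced by a placeholder. For brevity it stores only the thumbnail and skips the copy of the original under a special key.

```python
import asyncio


class AsyncDatastore:
    """Hypothetical in-memory stand-in for the datastore imagined here."""

    def __init__(self):
        self.data = {}       # (bucket, key) -> (value, metadata)
        self.listeners = []  # (filter_fn, handler) pairs

    def subscribe(self, filter_fn, handler):
        """Register a handler that runs for every event the filter accepts."""
        self.listeners.append((filter_fn, handler))

    async def push(self, bucket, key, value, metadata=None):
        """Store a value, then notify every listener whose filter matches."""
        event = {
            "type": "UPDATE" if (bucket, key) in self.data else "INSERT",
            "bucket": bucket,
            "key": key,
            "value": value,
            "metadata": metadata or {},
        }
        self.data[(bucket, key)] = (value, event["metadata"])
        matching = [h for f, h in self.listeners if f(event)]
        await asyncio.gather(*(h(event) for h in matching))


def make_thumbnail(image_bytes):
    """Placeholder for the ImageMagick scale/crop/convert step."""
    return image_bytes[:4]


def is_new_image(event):
    """Filter: accept image writes, but skip the thumbnails we store ourselves."""
    return (
        event["bucket"] == "imagebucket"
        and event["metadata"].get("content_type", "").startswith("image/")
        and not event["key"].endswith(".thumbnail")
    )


async def main():
    db = AsyncDatastore()

    async def store_thumbnail(event):
        # Handler: write the converted image next to the original.
        await db.push(
            "imagebucket",
            event["key"] + ".thumbnail",
            make_thumbnail(event["value"]),
            {"content_type": "image/jpeg"},
        )

    db.subscribe(is_new_image, store_thumbnail)
    await db.push("imagebucket", "me.png", b"0123456789",
                  {"content_type": "image/png"})
    return db


db = asyncio.run(main())
print(sorted(key for _, key in db.data))
```

The filter has to reject the listener's own writes; without the `.thumbnail` check, every stored thumbnail would trigger another conversion.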

In my web application controller, I would insert the uploaded image using the asynchronous datastore access client.

def db = asyncdb.connect("tcp://localhost:5555", [ name: "image.converter", 
                                            description: "Image thumbnailing listener" ])

What this does:

  • Connects to a node of the datastore.
  • Fashions a new datastore entry, including metadata that will be sufficient to trigger the thumbnailer listener.
  • Asynchronously “pushes” the image data into the datastore and registers an event handler so that when the listener successfully thumbnails the image, the client’s callback is invoked.
  • When the thumbnail has been successfully created, the user’s profile is updated by pushing new data into it that refers to the newly-converted thumbnail.
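
To make the ordering in these bullets concrete, here is a rough Python sketch of the controller side. Everything in it is an assumption made for illustration: the in-memory `AsyncDatastore`, the `callback` parameter on `push`, the `SUCCESS`/`FAILURE` constants, and the `handle_upload` helper are hypothetical, and the thumbnailer is a stub. The point is only that the profile write happens inside the callback, so it cannot run before the listener has finished.

```python
import asyncio

SUCCESS = "SUCCESS"
FAILURE = "FAILURE"


class AsyncDatastore:
    """Hypothetical in-memory stand-in; not a real client library."""

    def __init__(self):
        self.data = {}       # (bucket, key) -> value, in insertion order
        self.listeners = []  # (filter_fn, handler) pairs

    def subscribe(self, filter_fn, handler):
        self.listeners.append((filter_fn, handler))

    async def push(self, bucket, key, value, metadata=None, callback=None):
        """Store a value, run matching listeners, then report the outcome."""
        entry = {"bucket": bucket, "key": key, "value": value,
                 "metadata": metadata or {}}
        self.data[(bucket, key)] = value
        try:
            for filter_fn, handler in list(self.listeners):
                if filter_fn(entry):
                    await handler(entry)
            status = SUCCESS
        except Exception:
            status = FAILURE
        if callback is not None:
            await callback(status, entry)
        return status


async def handle_upload(db, user, name, image_bytes):
    """Controller side: push the image, update the profile only on success."""

    async def on_done(status, entry):
        if status == SUCCESS:
            avatar = {"avatar": entry["key"] + ".thumbnail"}
            await db.push("profilebucket", user, avatar)

    return await db.push("imagebucket", name, image_bytes,
                         {"content_type": "image/jpeg"}, on_done)


async def main():
    db = AsyncDatastore()

    async def thumbnailer(entry):
        await asyncio.sleep(0)  # stands in for the slow conversion work
        await db.push("imagebucket", entry["key"] + ".thumbnail",
                      entry["value"][:4])

    def is_original_image(entry):
        return (entry["bucket"] == "imagebucket"
                and not entry["key"].endswith(".thumbnail"))

    db.subscribe(is_original_image, thumbnailer)
    await handle_upload(db, "jane", "me.jpg", b"0123456789")
    return db


db = asyncio.run(main())
for bucket, key in db.data:  # dicts preserve insertion order
    print(bucket, key)
```

If the thumbnailer raised an exception, `push` would report `FAILURE` and the profile would be left untouched.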

Notice that everything is done in a non-blocking and asynchronous way. Data integrity is maintained in that the profile is not updated until the thumbnail image has been created. This system is also stateless. Every node knows about every other node, so a load balancer can send the first part of a request to one server and the second part to a different server. None of that matters, because an operation that depends on another simply waits for a particular event to be emitted.
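
The idea of one operation waiting on an event emitted by another can be shown in a few lines of Python. This is a single-process sketch using `asyncio.Event`; the `EventBus` class and the event name `thumbnail:me.jpg` are made up for the example, and a real multi-node system would need the events to travel between servers.

```python
import asyncio


class EventBus:
    """Hypothetical named-event registry built on asyncio.Event."""

    def __init__(self):
        self._events = {}  # event name -> asyncio.Event

    def _get(self, name):
        return self._events.setdefault(name, asyncio.Event())

    def emit(self, name):
        """Mark the named event as having happened."""
        self._get(name).set()

    async def wait_for(self, name):
        """Suspend the caller until the named event has been emitted."""
        await self._get(name).wait()


async def main():
    bus = EventBus()
    log = []

    async def update_profile():
        # Depends on the thumbnail, so it waits instead of polling or blocking.
        await bus.wait_for("thumbnail:me.jpg")
        log.append("profile updated")

    async def create_thumbnail():
        await asyncio.sleep(0.01)  # stands in for the conversion work
        log.append("thumbnail created")
        bus.emit("thumbnail:me.jpg")

    # Start the dependent operation first to show that ordering still holds.
    await asyncio.gather(update_profile(), create_thumbnail())
    return log


print(asyncio.run(main()))
```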

A Mix of Messaging and Data Handling

This form of data access makes a lot of sense to me. Although it may mix pieces of an application that have not traditionally gone together (asynchronous messaging and data storage), it can lead to very succinct and easy-to-understand applications.

The datastore should provide a web UI so that developers can interrogate the internals of the system and see if events are waiting to be delivered. Client methods should ideally also accept arbitrary metadata that the web UI could display to the developer so they could easily see what a reported listener actually does. Something like this:

def db = asyncdb.connect("tcp://localhost:5555", [
  name: "image.converter",
  description: "Image thumbnailing listener"
])
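
As a rough idea of what such a web UI could read from, the following Python sketch keeps the connection metadata and a queue of undelivered events per client. The `Datastore`, `Client`, `enqueue`, and `status` names are hypothetical, and no networking or actual delivery is modeled; it only shows the bookkeeping.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Client:
    """A connected listener plus the events it has not yet received."""
    name: str
    description: str
    pending: deque = field(default_factory=deque)


class Datastore:
    """Hypothetical registry of connected clients, for illustration only."""

    def __init__(self):
        self.clients = {}  # client name -> Client

    def connect(self, name, description=""):
        """Record the metadata a client supplied when it connected."""
        client = Client(name, description)
        self.clients[name] = client
        return client

    def enqueue(self, name, event):
        """Queue an event that is waiting to be delivered to a client."""
        self.clients[name].pending.append(event)

    def status(self):
        """Summary a web UI could render: who is listening and what is waiting."""
        return [
            {
                "name": c.name,
                "description": c.description,
                "pending_events": len(c.pending),
            }
            for c in self.clients.values()
        ]


store = Datastore()
store.connect("image.converter", "Image thumbnailing listener")
store.enqueue("image.converter", {"type": "INSERT", "key": "me.jpg"})
print(store.status())
```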

No datastore I know of currently supports these kinds of things. It could be that I’m the only one who would really like to have them. But as data access graduates to a more asynchronous, NoSQL world, our application development patterns will change. I can guarantee you that. The only question is “to what will it change?”


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
