Many APIs, Same Data
In “The Seven Most Popular APIs in Big Data—Part One,” I described the popular data management APIs and how they are typically used. As I noted toward the end of that post, each of these APIs is currently tied to its own data model and data store, separate from the ones behind the APIs that serve other use cases.
One of the main reasons is that each API represents a specific optimization that requires a fairly different data structure. What all of these techniques have in common, however, is that they were written under the assumption that disk is the bottleneck. That assumption led to the various point optimizations, architectures, and algorithms that each API uses to work around the disk.
Memory-Based Data Management Is Ready for Flash
Here's the good news: this need not always be the case. Flash devices are now fast enough to remove the disk bottleneck altogether, which makes it possible to explore APIs and data structures that serve the same use cases in a completely different way.
I would argue that there is no longer a need for point optimizations that are incompatible with one another; instead, we can use a common data structure to serve each of the APIs.
Same Data, Many APIs
Most existing data management solutions were designed on the assumption that disk will be the bottleneck. In-memory data grids, by contrast, are based on RAM and were therefore designed on the assumption that the storage device is fast. Best of all, moving a memory-based solution onto flash doesn't require any significant change to your current system.
To illustrate this, let's look at our solution, XAP, and what it's able to do:
This example shows how one can write the same data as an object, read it as a document, query it with SQL over a JDBC connection, navigate through its subfields as in an object graph, and so on. XAP even allows the same data store to serve both transactional workloads and analytics, which greatly reduces the need for complex data transformation.
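To make the idea concrete, here is a minimal sketch of one store answering several styles of request. It is not XAP's API: `CommonStore`, `Customer`, `Address`, and all method names are hypothetical, and the `select` method is only a stand-in for a real SQL query. The point is that every access path reads the same single copy of the data.

```python
from dataclasses import dataclass, asdict


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Customer:
    id: int
    name: str
    address: Address


class CommonStore:
    """Hypothetical single store; each 'API' below is a view over self._objects."""

    def __init__(self):
        self._objects = {}  # id -> object, the only copy of the data

    def write(self, obj):
        # Object API: store the typed object as-is.
        self._objects[obj.id] = obj

    def read_document(self, obj_id):
        # Document API: the same entry, returned as a nested dict.
        return asdict(self._objects[obj_id])

    def navigate(self, obj_id, path):
        # Graph-style navigation: follow a dotted path through subfields.
        value = self._objects[obj_id]
        for part in path.split("."):
            value = getattr(value, part)
        return value

    def select(self, field_path, equals):
        # Query API: stand-in for "SELECT * ... WHERE field_path = equals".
        return [
            obj for obj_id, obj in self._objects.items()
            if self.navigate(obj_id, field_path) == equals
        ]


store = CommonStore()
store.write(Customer(1, "Ada", Address("London", "UK")))
store.write(Customer(2, "Linus", Address("Helsinki", "FI")))

doc = store.read_document(1)                     # read as a document
city = store.navigate(2, "address.city")         # navigate subfields
matches = store.select("address.country", "UK")  # query by field value
```

Because no view keeps its own copy, there is nothing to transform or synchronize between the object, document, and query paths.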
Utopia vs. Reality
The ideal system would rely on just one common data store, outfitted with a set of lightweight services that expose different APIs and semantics for accessing the data, as illustrated in the diagram below:
In reality, however, flash-based solutions are not yet mature or cost-effective enough for such a transformation. A more realistic scenario is to use a flash-based device as a data bus or hub that acts as a front end for the various databases and directly serves the more performance- and latency-sensitive use cases. To reduce transformation complexity, however, the flash layer will need built-in synchronization with those databases and streaming technologies, so that this complexity is handled implicitly.
Memory-Based Data Management Systems Are More Suitable to Serve as High-Speed Data Buses
Many existing databases were designed with the assumption that they serve as the main data store; moving from MySQL to MongoDB, for example, therefore often means a complete rewrite.
In-memory data grids, on the other hand, were designed as an extension to the database: they hold the part of the data with high contention and serve it from memory to improve access time and scalability. This approach is more complementary, because it affects only the part of the application that needs high-speed access, while the rest of the application continues to work with the same underlying database as if nothing had changed. As a result, most existing data grid solutions already include fairly rich data synchronization plugins, which is why I would argue that they are more suitable to serve as a bus or a hub.
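The pattern described above can be sketched as a read-through, write-behind layer in front of an existing database. This is an illustration under stated assumptions rather than any product's synchronization plugin: `BackingDatabase`, `GridFrontEnd`, and the `cart:42` key are all hypothetical names.

```python
class BackingDatabase:
    """Stand-in for the existing system of record (for example, a relational database)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0  # counts how often the database is actually hit

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def save_many(self, items):
        self.rows.update(items)


class GridFrontEnd:
    """Hypothetical grid: serves hot keys from memory and syncs changes to the database."""

    def __init__(self, db):
        self._db = db
        self._memory = {}
        self._dirty = set()

    def get(self, key):
        # Read-through: only a miss reaches the database.
        if key not in self._memory:
            self._memory[key] = self._db.load(key)
        return self._memory[key]

    def put(self, key, value):
        # Write-behind: update memory now, remember the key for a later sync.
        self._memory[key] = value
        self._dirty.add(key)

    def flush(self):
        # Synchronization step: push pending changes to the database in one batch.
        self._db.save_many({k: self._memory[k] for k in self._dirty})
        self._dirty.clear()


db = BackingDatabase({"cart:42": ["book"]})
grid = GridFrontEnd(db)

grid.get("cart:42")
grid.get("cart:42")                    # second read is served from memory
grid.put("cart:42", ["book", "pen"])
stale = db.rows["cart:42"]             # database is unchanged until flush()
grid.flush()
```

Code that talks to the database directly keeps working unchanged; only the hot path goes through the grid, and `flush` is where a real synchronization plugin would do its work.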
Example: Using a High-Speed Data Bus with Storm
I’ll leave you with one example of where this approach is effective: Storm. This stream processing framework, which is used by Twitter, lets you plug in external feeds (spouts) while maintaining the state of processing (Trident State). The following illustration demonstrates how this integration can handle real-time processing of web page analytics, using Storm for processing and XAP as the high-speed data bus.
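As a rough code analogue of that flow, the sketch below uses plain Python rather than the Storm or XAP APIs. `DataBus`, `spout`, and `count_page_views` are hypothetical names that only mimic the roles of the event feed, the spout, and the processing state.

```python
from collections import Counter, deque


class DataBus:
    """Hypothetical in-memory bus: an event feed plus a shared state store."""

    def __init__(self):
        self.events = deque()         # incoming page-view events
        self.page_counts = Counter()  # processing state, kept on the bus

    def publish(self, event):
        self.events.append(event)


def spout(bus):
    # Plays the role of a spout: drains events from the bus into the stream.
    while bus.events:
        yield bus.events.popleft()


def count_page_views(bus):
    # Plays the role of the processing step: updates state held on the bus.
    for event in spout(bus):
        bus.page_counts[event["url"]] += 1


bus = DataBus()
for url in ["/home", "/pricing", "/home"]:
    bus.publish({"url": url})

count_page_views(bus)
top_page = bus.page_counts.most_common(1)[0]
```

Keeping both the feed and the counters on the same bus means the processing step has no separate store of its own to keep consistent.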