Since the dawn of software development, the relational database has served as the de facto solution for handling most of our data persistence needs. With the advent of a new breed of large scale, high performance architectures such as Facebook and Twitter however, the role of the traditional database is changing. DZone recently had a chance to catch up with Gigaspaces CTO Nati Shalom to get his thoughts on the 'NoSQL' phenomenon, some of its underlying principles, and to learn more about how this ties in to the Gigaspaces' 2010 product roadmap.
DZone: Nati, what are the common principles underlying the ‘NoSQL’ approach?
Nati Shalom: Most of the write-ups that I saw on that regard seemed to assume a specific implementation in mind and tried to draw a pattern through a reverse engineering of that specific implementation. For example I saw that in some cases people used the "No Schema" model that is often used for document and search engines as the common principle for all of the NoSQL alternatives, even the word NoSQL itself can be misleading as some of the NoSQL alternatives support a good portion of the SQL semantics. When I refer to common principles I'd like to distinguish between things that I would count generic and those that I would consider implementations specific. For example the demand for NoSQL alternatives was driven mainly by the demand for massive scaling – you don't have to become "schemaless" or drop SQL completely to provide a scalable data store. I would therefore classify the requirement for document oriented model on a different category of query requirements that is not necessarily related to the demand for scalable data store that the NoSQL movement represent.
With that in mind I would say that the set of principles that I would consider common are:
- Partition the Data,
- Keep Multiple Replicas of the Same Data
- Flexible performance/consistency model
- Dynamic Scaling
Query Support is an area where there is a fairly substantial difference between the various implementations. The common denominator is a key/value matching, as in a hash table. Some implementations provide more advanced query support, such as the document-oriented approach, where data is stored as blobs, with an associated list of key/value attributes. In this model you get a schema-less storage that makes it easy to add/remove attributes from your documents, without going through schema evolution etc.. Managing aggregated queries is another area of difference between the various NoSQL alternatives.
Map/Reduce (often associated with Hadoop) is also a pattern for parallel aggregated queries. Most of the NOSQL alternatives do not provide built-in support for map/reduce, and require an external framework to handle these kind of queries.
GigaSpaces,provides implicitmap/reduce support as part of our SQL query support, as well as explicitly through an API that is referred to as executors (See the diagram below). With this model, you can send the code to where the data is, and execute the complex query directly on that node.
(Click for larger image)
Figure 1 Implicit & Explicit Map/Reduce support with GigaSpaces
You can find the more details on the common NOSQL principles in one of my recent posts here.
DZone: We have seen the emergence of alternate forms of storage to the traditional, RDBMS, SQL model. What are some of the driving forces behind the ‘noSQL movement,’ as it were?
Nati: As I outlined in one of my posts on the subject: No to SQL? Anti-database movement gains steam – My Take the main driver behind the NOSQL movement are probably
- Demand for extremely large scale
- Complexity and cost of setting up database clusters
- Compromising reliability for better performance
As those limitations are not new it is interesting to analyze why those limitation becomes more critical today then before.
In previous years you had to be at the scale of Amazon or Google to face the need for extreme large scaling. Social computing moved the entire web experience from a fairly static, read mostly experience to a real time interactive environment with viral traffic behavior. Moving to more social web experience brought the demand for scaling to the masses. Cloud computing provided the mean to meet that scaling demand by reducing the cost barrier required to support such a large scale environment. It was clear that the existing set of databases were not designed for this sort of environment and it is therefore not surprising that they didn't fit to this class of applications.
DZone: What are some of the architectural benefits achieved by eliminating the RDBMS layer?
Nati: Personally I think that the RDBMS is not going away anytime soon. I do believe that the current "one size fits it all" databases thinking was and is wrong which opens a place for more specialized data management solutions alongside traditional SQL databases.
There are ways to combine the two together and get the best of both worlds, for example Google provide a JPA abstraction on their big-table implementation – even though there are lots of limitations on their JPA support it provides a smoother transition from the current centralized data store to a distributed data store. More importunately even though you couldn't port any existing JPA application into the Google JPA support you could still write your code in a way that would run with other JPA implementations in order to avoid potential lock-in. Our upcoming JPA support will be taking another step on that direction by providing JPA support directly on our distributed data model and not through an external modeling layer. With that we believe that we can make the use of JPA our distributed model significantly more efficient as well as reduce the limitations quite substantially.
Another example is our Hibernate integration. With our hibernate support users can use their existing database and use an In-Memory Data-Grid as second-level cache or as a complete a front-end to their existing database. In both options we use a combination of key-vale store in conjunction with existing SQL/RDBMS to provide the desired scalability while minimizing the change.
DZone: Two approaches to NoSQL are the file-based and the in-memory approach. When should I consider using one over the other?
Nati: NOSQL alternatives are available as a file-based approach, or as an in-memory-based approach. Some provide a hybrid model that combines memory and disk for overflow. The main difference between the two approaches comes down mostly to cost/GB of data and read/write performance.
An analysis done recently by Stanford University, titled “The Case for RAMCloud” provides an interesting comparison between the disk and memory-based approaches, in terms of cost/performance. In general, it shows that cost is also a function of performance. For low performance, the cost disk based storage is significantly lower then RAM-based approach, and with higher performance requirements, RAM storage becomes significantly cheaper.
The most obvious drawbacks of RAMClouds are high cost per bit and high energy usage per bit. For both of these metrics RAMCloud storage will be 50-100x worse than a pure disk-based system and 5-10x worse than a storage system based on flash memory (see  for sample configurations and metrics). A RAMCloud system will also require more floor space in a datacenter than a system based on disk or flash memory. Thus, if an application needs to store a large amount of data inexpensively and has a relatively low access rate, RAMCloud is not the best solution. However, RAMClouds become much more attractive for applications with high throughput requirements. When measured in terms of cost per operation or energy per operation, RAMClouds are 100-1000x more efficient than disk-based systems and 5-10x more efficient than systems based on flash memory. Thus for systems with high throughput requirements a RAM-Cloud can provide not just high performance but also energy efficiency. It may also be possible to reduce RAMCloud energy usage by taking advantage of the low-power mode offered by DRAM chips, particularly during periods of low activity. In addition to these disadvantages, some of RAM-Cloud's advantages will be lost for applications that require data replication across datacenters. In such environments the latency of updates will be dominated by speed-of-light delays between datacenters, so RAM-Clouds will have little or no latency advantage. In addition, cross-datacenter replication makes it harder for RAMClouds to achieve stronger consistency as described in Section 4.5. However, RAMClouds can still offer exceptionally low latency for reads even with cross-datacenter replication.
Based on that I would say that you could use a pure file-based approach as long as your latency and performance requirements fits the level that is provided by file system. As the research indicate memory based solution provides a better alternative for high performance scenarios. That's explains why in reality what I often see is a combination of the two where memory storage is used to store the real time data and disk based storage the long-lived data. This type of topology is often referred to as Write Behind and is supported by various In-memory-data-grid solutions. With GigaSpaces this model is supported through a Mirror Service. As can be seen in the diagram below:
DZone: In one of your presentations, you describe an approach for designing a ‘scalable Twitter’ application. What makes Twitter an interesting architectural case study? What are some of the key lessons coming out of this?
Nati: Twitter exposes an extreme challenge for scaling:
- The number of users grows continuously
- Each user maintains a list of followers that tends to grow continuously as well
- All communication is done on a many to many approach
The combination of all that makes the traffic pattern in a twitter environment fairly viral and un predictable and therefore stretches almost any boundaries of scaling that we can think of.
As I outlined in a recent paper the first lesson is that scaling should be dealt with a well defined methodology. There are too many cases where scaling is treated as an afterthoughts and through a very painful (and very costly) trial an error process. There are set of common principles, such as partitioning and share-nothing approach that are common to any scalable application. Realizing how to partition your application is probably the toughest challenge.
DZone: What is a ‘space-based architecture’ and how can it be used as an ‘SQL alternative’?
Nati: Space Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications, based on Yale’s Tuple-Space Model (Source Wikipedia). It takes a holistic end-to-end approach to scaling. SQL alternatives deals with scaling at the data layer. One of the core principles in SBA is to group your data and logic based on data affinity dependency. In this way we can reduce the number of moving parts in our system and therefore turn the make the scaling significantly simpler in addition to that we can reduce the number of network hops between our application and the data and therefore improve the performance and latency of our application. The diagram below show how that model may look like:
Figure 2 A scalable space based twitter architecture
As can be seen in the above diagram we collocate the reader and writer application logic with the user data and create a single partition that includes a reader/writer and data service in one single process. We then create multiple set of those instances and split the load between users among those partitions. The fact that the reader is collocated with a particular data partition doesn't mean that it can only access that particular partition. It can always access the entire cluster. For example in the case where we need to read the specific tweets that was updated on a specific user account the reader will execute a parallel read on all the spaces using an implicit map/reduce style query as outlined above and return the aggregated result to the user.
DZone: GigaSpaces will be unveiling a new ‘On-Demand Middleware Services fo the Enterprise’ offering in 2010. What will this comprise?
Nati: The underlying idea of this initiative is to enable users to use our data grid, messaging or Map/Reduce services by calling a single API, without worrying about installation, configuration, provisioning, sizing or cluster setup. GigaSpaces middleware, with its enterprise-grade transaction support, administration tools and security, will be just as simple to consume as services like Amazon SimpleDB, SQS, and such. Any application, whether it is running on-premises in a traditional or virtualized data center, or on the cloud, will be able to access our middleware services on-demand from their local environment.
The challenge that we are trying to address with this new initiative is the complexity and efficiency of existing middleware deployments. Deployment and management of large clusters is one of the key complexities of the datacenter. It includes installation, sizing, and continuous management and tuning. As of our 7.0 release we were the first to introduce an API that enables users to interact with every bit of our cluster. This API enables us to automate all of those steps by automatically detecting the available resources, deploying the relevant service/s on those resources and scaling them out or down based on the workload. In a multi-tenant environment users will be able to choose the right balance between resource sharing and isolation requirements.
In addition, offering these middleware services on demand will provide users with the choice to use only the relevant services offered by GigaSpaces XAP, e.g. our in-memory data grid or messaging services, and integrate these with their existing application platforms.
Bottom line, simplicity and ease of use for enterprise users, and shorter time to market for ISVs looking to run their SaaS applications on a multi-tenant environment. Both are expected to experience a fairly significant cost efficiency.
DZone: How will I be able to scale out my JPA and REST-based applications using this new offering?
Nati: JPA provides a standard API for data management. Currently most of the JPA implementations rely on a centralized database that are limited in scale. We intent to provide a JPA implementations that is implemented ontop of a distributed data store. In other words it gives the benefit of the NOSQL alternatives without the cost of re-writing your entire data model.
The REST interface will enable web based application as well as non Java/.Net application to access our services directly from the browser.
DZone: Many assume that NOSQL == lack of transaction support. How true is this?
Nati: As I argued earlier this argument is misleading. A large scalable environment requires a different way for handling transaction semantics then the traditional 2PC (two phase commit) model. This doesn't mean that those alternatives lack a transaction support it just means that transaction needs to treated differently. In a centralized environment 2PC transaction model ensure consistency through a central transaction coordination server. It is clear that this is not going to fit with many of the distributed data management alternatives.
In an earlier post, “Lessons from Pat Helland: Life Beyond Distributed Transactions,” Pat Helland suggested an alternative model to distribute transactions, a workflow model. Instead of executing a long transaction and blocking until everything is committed, break the operation into small individual steps (Transactions) where each step can fail or succeed individually. By breaking the transaction into small steps, it is easier to ensure that each step can be resolved within a single partition, thus avoiding all the network overhead associated with coordinating a transaction across multiple partitions.