Some of us may have already heard the terms Data Grid and Data Fabric; however, neither term has been well defined in the industry. In this blog, I will try to add some clarity to both terms by outlining the main features of data grids and data fabrics.
What is a Data Grid
Often when doing meetup presentations about Apache Ignite, I ask the crowd if anyone has ever heard of what a Data Grid is. I usually get only a few hands. However, when I flip the question and ask what Distributed Caching is, everyone in the room immediately raises their hands and nods in understanding. The reality is that a Data Grid can be viewed as a Distributed Cache with extra features, so if you do know what a Distributed Cache is, you probably already know a lot about Data Grids as well.
Generally, the term distributed cache means the ability to distribute data in memory across a cluster, so that it is accessible from anywhere in the cluster. Data Grids usually accomplish this by partitioning data in memory, where each cluster member is responsible only for its own subset of the data. You can also think of it as a distributed hash table. This way, the more servers you have in your cluster, the more data you can cache.
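To make the partitioning idea concrete, here is a minimal sketch in Python. It is not how Apache Ignite or any particular product implements partitioning; the class `PartitionedCache`, the node names, and the simple hash-modulo mapping are all assumptions made for illustration. Each "node" is just a local dictionary, and every key is assigned to exactly one owner.

```python
import hashlib


class PartitionedCache:
    """Toy partitioned cache: each node holds only the keys that hash to it."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One plain dict per node stands in for that node's memory.
        self.nodes = {name: {} for name in self.node_names}

    def owner(self, key):
        # Use a stable hash so that every client maps a key to the same node.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.node_names[index]

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key):
        return self.nodes[self.owner(key)].get(key)


# Hypothetical three-node cluster.
cache = PartitionedCache(["node-a", "node-b", "node-c"])
for i in range(1000):
    cache.put(i, f"value-{i}")

# Each node ends up holding only its own subset of the 1000 entries.
sizes = {name: len(data) for name, data in cache.nodes.items()}
```

The hash-modulo mapping is the simplest possible choice and would move most keys whenever a node joins or leaves. Real data grids generally use assignment schemes that limit how much data has to move in that case.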
Data grids are generally known for having a fairly rich feature set on top of in-memory caches. The 3 main features that are absolutely mandatory for any data grid solution are:
- distributed transactions
- distributed queries
- collocation of compute and data
Without these 3 features, you cannot really call a product a data grid. Many vendors also differentiate themselves by adding other popular features, including:
- SQL support
- Off-Heap Memory (to avoid lengthy GC pauses)
- WebSession Caching
- Hibernate Integration
- Database Integration
What is an In-Memory Data Fabric
In-Memory Data Fabrics represent the natural evolution of in-memory computing. Data Fabrics generally take a broader approach to in-memory computing, grouping the whole set of in-memory computing use cases into a collection of well-defined, independent components. Usually a Data Grid is just one of the components provided by a Data Fabric. In addition to the data grid functionality, an In-Memory Data Fabric typically also includes a Compute Grid, CEP Streaming, an In-Memory File System, and more.
The main advantage of an In-Memory Data Fabric is that all of the provided in-memory computing components can be used independently, while being well integrated with each other. For example, in Apache Ignite a Compute Grid knows how to load-balance and schedule computations within a cluster, but when used together with a Data Grid, the Compute Grid will also route all the computations that process data to the cluster members responsible for caching that data. The same goes for Streaming and CEP - when working with streamed data, all the processing happens on the cluster members responsible for caching that data as well.
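The routing of computations to the data they process can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Apache Ignite API: `ToyGrid`, `owner_of`, `affinity_run`, and the job function are hypothetical names, and the "cluster" is a set of local dictionaries. The point it illustrates is that the job is sent to the node that owns the key and reads only that node's local data.

```python
import hashlib

# Hypothetical cluster members.
NODES = ["node-a", "node-b", "node-c"]


def owner_of(key):
    """Map a key to the node responsible for caching it (stable hash)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class ToyGrid:
    def __init__(self):
        # One dict per node stands in for that node's in-memory partition.
        self.storage = {name: {} for name in NODES}
        # Log of (node, key) pairs, recording where each job was executed.
        self.executed_on = []

    def put(self, key, value):
        self.storage[owner_of(key)][key] = value

    def affinity_run(self, key, job):
        """Run job on the node that owns key, passing that node's local data."""
        node = owner_of(key)
        self.executed_on.append((node, key))
        return job(self.storage[node], key)


def balance_with_interest(local_data, key):
    # The job touches only data local to the node it runs on.
    return round(local_data[key] * 1.05, 2)


grid = ToyGrid()
grid.put("account-17", 200.0)
result = grid.affinity_run("account-17", balance_with_interest)
```

Because the job travels to the data rather than the data to the job, only the small result crosses the network, which is the main benefit of collocating compute and data.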
Commonly seen features of In-Memory Data Fabrics include:
- Data Grid (must have for any Data Fabric)
- Compute Grid
- Service Grid
- Streaming & CEP
- Distributed File System
- In-Memory Database
At the time of writing, Apache Ignite, an Apache Incubator project, is the only In-Memory Data Fabric available in the Open Source space. GridGain provides a commercial, enterprise edition of Apache Ignite that is targeted toward production, business-critical use cases.