Apache Cassandra and Apache Ignite: Selecting the Right Distributed Database Solution
Sometimes Apache Cassandra and Apache Ignite complement each other. But sometimes, a single solution is better.
Apache® Cassandra™ is a popular open-source, distributed NoSQL database that combines a key-value store with a column-based data model. Companies such as Netflix, eBay, and Expedia use it for key parts of their business. For Cassandra users who need ad-hoc SQL query capabilities but are otherwise happy with their database choice, Apache® Ignite™ can add those capabilities and improve Cassandra's performance. For current or prospective Cassandra users who find that it lacks the read speed or SQL capabilities they need in a distributed key-value store, Apache Ignite can be a powerful alternative.
Apache Cassandra Benefits and Limitations
The features that make Apache Cassandra so appealing include:
- Fully distributed, peer-to-peer architecture. Apache Cassandra has no single point of failure, so it is well-suited for high-availability applications. It supports multi-datacenter replication, which allows organizations, for example, to store data in the cloud across multiple Amazon Web Services (AWS) availability zones for greater resiliency.
- Massive and linear scalability. Any number of nodes can be added to (or removed from) any Cassandra cluster in any datacenter, enabling users to reliably store ever-growing amounts of structured and unstructured data.
- Flexible modeling. Apache Cassandra combines distributed systems technologies from the Amazon Dynamo key-value store and Google’s BigTable column-based data model, making it possible to model complex data structures that would be difficult to model in traditional relational databases.
- Tunable consistency. Users can configure replication to balance speed and reliability.
- Open source community. Apache Cassandra users benefit from a large and active community that continues to refine the solution and provides community-based support through a number of websites.
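The tunable consistency trade-off listed above is commonly reasoned about with a simple overlap rule: if the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) exceeds the replication factor (RF), every read set intersects the latest acknowledged write. The sketch below is only that arithmetic, not Cassandra driver code. The function names and the replication factor of 3 are assumptions chosen for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def overlaps(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


RF = 3  # assumed replication factor, for illustration only

# Majority writes plus majority reads: 2 + 2 > 3, so the read and write sets overlap.
quorum_is_consistent = overlaps(RF, quorum(RF), quorum(RF))

# One replica for writes and one for reads: 1 + 1 <= 3, so a read may hit a stale replica.
one_is_consistent = overlaps(RF, 1, 1)

print(quorum(RF), quorum_is_consistent, one_is_consistent)
```

Lower acknowledgement counts reduce latency because fewer replicas must respond, which is the speed side of the trade-off. Higher counts buy the overlap guarantee at the cost of waiting for more nodes.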
Common uses for Apache Cassandra include storing and analyzing sequentially captured measurements from sensors and application logs and storing key-value data with high availability. This can be very beneficial for use cases such as web-scale applications or IT monitoring, which require counts of high-velocity data.
While powerful, Apache Cassandra has some limitations:
- It is disk-based, which ultimately limits the speed of some operations because data needs to be written to and read from disks.
- It does not support ANSI SQL-99 (its query language, CQL, has no joins, for example), so running ad-hoc SQL queries is problematic.
- It is “eventually consistent” by default, so replicas can briefly disagree and conflicting writes can be lost: a challenge for applications involving high-value transactions.
Benefits of Using Apache Ignite With Apache Cassandra
Apache Ignite can be deployed as an in-memory computing layer between an organization’s existing data and application layers. An Ignite cluster can be inserted between Apache Cassandra and an existing application layer, providing support for ANSI SQL-99 and ACID transactions for the portion of data held in the Ignite server cluster.
Ad Hoc SQL Queries
Apache Ignite is powered by an ANSI SQL-99-compliant engine, which offers SQL and indexing of the data held in the Ignite cluster. Using the Ignite ODBC/JDBC API, standard SQL commands can be sent to Apache Ignite. Ad-hoc SQL queries can be run on the Cassandra data held in Apache Ignite, providing high-performance, highly flexible SQL for data stored in Apache Cassandra.
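As a rough illustration of what "ad-hoc" means here, the sketch below filters and aggregates on columns that are not the lookup key, which a pure key-value access path cannot do directly. It uses Python's built-in sqlite3 in-memory database purely as a stand-in for an indexed in-memory SQL engine. It is not Ignite, and the table, column, and index names are hypothetical.

```python
import sqlite3

# An in-memory SQL database standing in for an in-memory SQL layer (not Ignite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, region TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s1", "eu", 20.0), ("s2", "eu", 22.0), ("s3", "us", 30.0)],
)

# A secondary index on a non-key column supports queries that were not planned up front.
conn.execute("CREATE INDEX idx_region ON readings (region)")

# Ad-hoc query: filter on one column, group by another, aggregate a third.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(temperature) FROM readings "
    "WHERE temperature > ? GROUP BY region ORDER BY region",
    (19.0,),
).fetchall()
conn.close()

print(rows)
```

Against a real Ignite cluster, a comparable statement would be sent over the ODBC/JDBC API mentioned above rather than through sqlite3.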
Apache Ignite offers user-definable transaction guarantees for distributed transactions, which can be adjusted from eventual consistency to strong consistency for ACID transactions. Apache Ignite will write any changes to the in-memory dataset back to Apache Cassandra to maintain consistency between the Cassandra and Ignite datasets.
No Data Remodeling
Adding Apache Ignite does not require the data in an existing Cassandra database to be modified. Apache Ignite can read from Cassandra and other NoSQL databases just as well as it reads from relational databases. There is also no need to modify the schema, which migrates directly into Apache Ignite as-is.
Because Apache Ignite requires no “rip-and-replace,” it is also a convenient solution for organizations that are considering a move from a relational database to Apache Cassandra but are concerned about having to redo their data model to match Cassandra’s requirements. Instead of remodeling the data for a move directly to Apache Cassandra, an organization can deploy Apache Ignite on top of the relational database, change the application to interface with Apache Ignite, and then migrate the relational database to Apache Cassandra. As long as the application goes through Apache Ignite, it will see no difference between the original relational database and Apache Cassandra.
Apache Ignite works equally well with NoSQL, RDBMS, and Apache® Hadoop® data stores, so Apache Ignite can be used to speed them up and scale them out as well. Apache Ignite can also be used with Apache® Spark™, and the Ignite file system can be used to pin resilient distributed datasets (RDDs) or DataFrames into memory to make Spark RDDs mutable and to allow the sharing of state between Spark jobs.
A Mature Codebase
While Apache Ignite is fairly new to the Apache Software Foundation (ASF), it has a very mature codebase. It originated as a private project in 2007 and was donated to the ASF in 2014. Ignite graduated to a top-level project in about a year, making it the second-fastest Apache project to graduate (after Apache Spark). Apache Ignite has an active worldwide community and includes over one million lines of code with a robust feature set.
Integrating the Solutions
Architecturally, integrating Apache Ignite with Apache Cassandra is straightforward. Apache Cassandra users typically have some type of application that reads from and writes to the Cassandra cluster (possibly with Apache® Kafka™ or other clients). Apache Ignite slides between Apache Cassandra and the application and integrates using the Cassandra connector in Apache Ignite. The application then no longer reads from and writes to Apache Cassandra directly. Instead, it reads from and writes to Apache Ignite, so it is accessing data in memory instead of on disk. Apache Ignite handles the reads from and writes to Apache Cassandra.
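The pattern described here is often called read-through and write-through caching. The following is a minimal sketch of that idea using plain Python dictionaries. It does not use the Ignite or Cassandra APIs, and the class name, the counter, and the sample keys are all hypothetical stand-ins.

```python
class WriteThroughCache:
    """In-memory layer in front of a slower backing store (hypothetical stand-in)."""

    def __init__(self, store: dict):
        self._store = store      # stands in for the Cassandra cluster
        self._memory = {}        # stands in for the data held in memory
        self.store_reads = 0     # counts how often the backing store was consulted

    def get(self, key):
        # Read-through: serve from memory, consult the store only on a miss.
        if key not in self._memory:
            self.store_reads += 1
            if key not in self._store:
                return None
            self._memory[key] = self._store[key]
        return self._memory[key]

    def put(self, key, value):
        # Write-through: update the store first, then memory, so both stay in sync.
        self._store[key] = value
        self._memory[key] = value


cassandra_like_store = {"sensor-1": 20.5}
cache = WriteThroughCache(cassandra_like_store)

first = cache.get("sensor-1")    # miss: loaded from the backing store
second = cache.get("sensor-1")   # hit: served from memory
cache.put("sensor-2", 18.0)      # written to both layers
```

The application only ever calls `get` and `put` on the in-memory layer, which is why the backing store could in principle be swapped without changing application code.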
When a Single Alternative Is Better
While combining Apache Ignite and Apache Cassandra creates a powerful solution, it is not always the best solution for specific use cases. For example, for new applications that require the ability to run ad-hoc SQL queries, Apache Ignite includes an in-memory database that functions as a stand-alone, distributed, in-memory RDBMS with support for ACID transactions and ANSI SQL-99, including DML and DDL. As noted above, the ANSI SQL-99-compliant engine provides in-memory SQL with indexing, enabling ad-hoc SQL queries at in-memory computing speed. Apache Ignite also offers strong consistency for ACID transactions, making it appropriate for applications where capturing absolutely every transaction is a requirement.
Organizations whose applications have run up against the ad-hoc SQL or eventual-consistency limitations of Apache Cassandra may find that migrating to Apache Ignite (instead of adding it) provides the database capabilities and in-memory performance they need while maintaining a simple, two-tier application/database infrastructure.
Apache Ignite is also approximately 3-6x faster than Apache Cassandra for read-intensive applications, while Apache Cassandra offers superior write performance. When the driving requirement of an application or SLA will be write performance under a heavy load — and ad-hoc SQL queries and ACID transactions will not become an issue — Apache Cassandra may still be the best option for a standalone solution. However, for any mission-critical application requiring high read or mixed performance, a standalone Apache Ignite deployment will likely be the best choice.
Organizations that are using or considering Apache Cassandra, but are concerned about meeting the performance demands that today’s web-scale applications place on extreme OLTP and OLAP workloads, should consider taking advantage of the Apache Ignite in-memory computing platform. Combining the two solutions allows applications to access data in memory instead of on disk, an approach that can be as much as 1,000x faster than disk-based access. Adding Apache Ignite to Apache Cassandra maintains Cassandra’s high availability and horizontal scalability while also providing several additional benefits, including more flexible ANSI SQL-99-compliant ad-hoc query capabilities and more robust consistency. All this is achieved without the need to remodel the data. But in use cases where ad-hoc SQL queries, strong consistency, or maximum performance for primarily read or mixed read/write workloads are of paramount importance, organizations may find they are better served by either replacing Apache Cassandra with Apache Ignite or selecting Ignite instead of Cassandra at the beginning of the project.