Why Shared Storage Hinders Performance in Cassandra
Planet Cassandra receives a lot of questions about Cassandra (as one might guess), and one that keeps coming up, according to Jonathan Lacefield, is why shared storage is not recommended for Cassandra. The short version? Performance suffers, and it introduces a single point of failure. Lacefield's explanation, however, aims to clarify what "performance suffers" really means.
The reason performance can be an issue, he says, is that Cassandra can be pretty demanding when it comes to the underlying disk system:
To put it plainly, performance in Cassandra is directly correlated to the performance of a Cassandra node’s disk system.
This is because of a variety of disk-intensive operations: every write is appended to the commit log and later flushed from memtables to SSTables on disk, compaction continually reads and rewrites those SSTables, and reads, repairs, and streaming all compete for the same disk bandwidth.
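To get a feel for how those operations add up, here is a back-of-envelope sketch. All figures are illustrative assumptions (the compaction number matches Cassandra's historical `compaction_throughput_mb_per_sec: 16` default; the read and flush rates are invented for the example), not measurements from the article:

```python
# Back-of-envelope: aggregate throughput a shared array must absorb.
# All per-node figures below are assumptions for illustration only.
NODES = 12            # hypothetical cluster size
COMPACTION_MB_S = 16  # Cassandra's historical default compaction throttle
READ_MB_S = 40        # assumed steady-state read load per node
FLUSH_MB_S = 10       # assumed memtable-flush + commit log traffic per node

per_node = COMPACTION_MB_S + READ_MB_S + FLUSH_MB_S
cluster_total = per_node * NODES
print(f"per node: {per_node} MB/s, cluster: {cluster_total} MB/s")
# per node: 66 MB/s, cluster: 792 MB/s
```

With local disks, each node only needs its own drives to sustain roughly 66 MB/s; on shared storage, a single array has to sustain the whole ~792 MB/s concurrently, and every node's compaction and flush activity contends with every other node's reads.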
The key, though, is that Cassandra's disk-intensive nature is not just about IOPS but also about throughput (MB/s). According to Lacefield, heavy throughput is a major problem for shared-storage Cassandra implementations. And if "bad for performance" doesn't conjure enough images, Lacefield gets specific. These are the kinds of problems you might run into:
- Atrocious read performance
- Potential write performance issues
- System instability (nodes appear to go offline and/or are “flapping”)
- Client side failures on read and/or write operations
- Flushwriters are blocked
- Compactions are backed up
- Nodes won’t start
- Repair/Streaming won’t complete
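Several of the symptoms above (blocked flushwriters, backed-up compactions) are visible in the output of `nodetool tpstats`. As a hedged sketch, here is a small parser that flags suspect thread pools; the sample text and column layout are assumptions based on the typical tpstats format (Pool Name, Active, Pending, Completed, Blocked, All time blocked), so verify against your Cassandra version:

```python
# Sketch: spot blocked/backed-up pools in `nodetool tpstats` output.
# SAMPLE is fabricated example output, not from a real cluster.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     4        12        9437120         0                 0
MemtableFlushWriter               2         8         120443         3                97
CompactionExecutor                2       146          88211         0                 0
"""

def flagged_pools(tpstats: str, pending_limit: int = 50):
    """Return pools that are currently blocked or have a deep pending queue."""
    flags = []
    for line in tpstats.splitlines()[1:]:  # skip the header row
        parts = line.split()
        name, active, pending, completed, blocked = parts[0], *map(int, parts[1:5])
        if blocked > 0 or pending > pending_limit:
            flags.append(name)
    return flags

print(flagged_pools(SAMPLE))  # ['MemtableFlushWriter', 'CompactionExecutor']
```

On a healthy node with local disks, Blocked stays at zero and Pending drains quickly; sustained nonzero Blocked counts on flush writers, or a growing Pending queue on the compaction executor, are the on-node signatures of the storage contention described above.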
Check out the full article for details on how it all works and some stress tests to quantify it. If you're looking for more information on optimizing and understanding Cassandra's performance, take a look at these:
- Tuning the JVM to Improve Performance in Cassandra
- Cassandra vs. MongoDB: Why Your Database Should be Boring
- Benchmarking Cassandra: The Right & Wrong Way to Do it