Database storage is expensive, especially if you build a traditional SAN-based M+N cluster. The storage array, Fibre Channel switches, and Fibre Channel interfaces easily drive the cost per terabyte into the thousands. And while storage costs in general are plummeting, SAN storage costs are falling at a slower rate, widening the gap between SAN and direct-attached storage. Given the cost of SAN storage, it would be unfortunate to waste it, which is what we discovered we were doing.
Our platform makes a lot of third-party service calls. Several of these are very complex conversational interfaces that generate a lot of text. We retain these API interactions so that customer service can troubleshoot customer issues. Storing this third-party API request/response text has been part of our platform from the beginning. At that time, the logical place to save this data was in database CLOBs. When we recently analyzed our SAN storage, we discovered that 40% of it was consumed by these API logs. Clearly there was an opportunity to save money with a lower-cost solution.
We looked at alternatives for managing LOBs and we settled on Cassandra for a few reasons:
- It provides reliable storage on commodity hardware.
- The shared-nothing architecture of Cassandra brings an attractive availability model.
- Eventual consistency wasn't a big concern. This data is stored and then not retrieved for at least several hours, if at all.
- The ability to achieve quorum on writes is important when using commodity storage.
- Built-in expiration makes data life cycle management straightforward.
- The scale-out model allows storage and transactional capacity to be easily added.
- Data center awareness provides for clean multi-data-center deployments.
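To make the quorum point concrete, here is a minimal sketch of the usual quorum arithmetic for a replicated store such as Cassandra: a quorum is a majority of the replicas, so it is floor(RF / 2) + 1 for a replication factor RF. The function names are ours, for illustration only, and the replication factors in the loop are examples rather than the values we deployed.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a quorum write: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum write still succeeds."""
    return replication_factor - quorum(replication_factor)


# Illustrative replication factors only.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With three replicas, a quorum write needs two acknowledgements and survives the loss of one node, which is why an odd replication factor is a common choice on commodity hardware.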
There were a few concepts we wanted to standardize for the storage and management of large data objects. These were:
- Correlate the importance of the data to the business with the cost of storage.
- Ensure that a life cycle is applied to every LOB so that none is kept longer than it is meaningful.
- Ensure that data was as space efficient as possible.
The first was a new concept for us. The cost advantage of Cassandra comes from using commodity hardware with commodity drives. This hardware can and will fail, though, so to ensure that data cannot be lost there must be multiple copies. Redundancy comes at a high cost, however. For example, if the cost of storage is $150/TB and you keep 6 copies of the data (3 in each of 2 data centers), then your protected cost is $900/TB. Reducing redundancy increases the risk of a loss, but some data can afford to be lost, or at least to be temporarily unavailable. We wanted to be able to trade off data importance against cost, so we defined 3 levels of consistency and corresponding replication values for each.
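The protected-cost arithmetic is simple enough to sketch. The function below is hypothetical and only restates the example from the text; the "reduced" tier is an invented illustration of a cheaper level, not one of the three levels we actually defined.

```python
def protected_cost_per_tb(raw_cost_per_tb: float, copies_per_dc: int, data_centers: int) -> float:
    """Cost of keeping one TB of data once every replica is paid for."""
    return raw_cost_per_tb * copies_per_dc * data_centers


# The example from the text: $150/TB, 3 copies in each of 2 data centers.
full_protection = protected_cost_per_tb(150, copies_per_dc=3, data_centers=2)

# Hypothetical cheaper tier for data that can afford to be lost:
# 2 copies in a single data center.
reduced_protection = protected_cost_per_tb(150, copies_per_dc=2, data_centers=1)

print(f"full: ${full_protection:.0f}/TB, reduced: ${reduced_protection:.0f}/TB")
```

The point of the trade-off is visible in the numbers: each replica you drop removes one multiple of the raw storage cost.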
We also require that every LOB be given an expiration date. It can be set to never, but by providing a simple way to control the meaningful life of the data, we increase the likelihood that it is purged when it is no longer useful. For example, we can retain travel service debugging information for 6 months after the trip is complete. This is trivial for the caller to set, and Cassandra will clean up the data automatically after the expiration.
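As a rough sketch of how an expiration date can map onto Cassandra's expiring data, the helper below converts a caller-supplied date into a time-to-live in seconds. The function name, the 180-day approximation of "6 months", and the table named in the comment are assumptions made for illustration; they are not our service's actual interface.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_seconds(expires_at: Optional[datetime], now: Optional[datetime] = None) -> int:
    """Convert an expiration date into a TTL in seconds.

    Returns 0 for "never expire"; in CQL a TTL of 0 means no expiration.
    """
    if expires_at is None:
        return 0
    now = now or datetime.now(timezone.utc)
    remaining = int((expires_at - now).total_seconds())
    if remaining <= 0:
        raise ValueError("expiration date is already in the past")
    return remaining


# Keep debugging data for roughly 6 months (assumed here to be 180 days)
# after the trip is complete.
trip_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
ttl = ttl_seconds(trip_end + timedelta(days=180), now=trip_end)

# The TTL would then be bound into a write such as (hypothetical table):
#   INSERT INTO lob_store (id, body) VALUES (?, ?) USING TTL ?
print(ttl)
```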
Another observation was that much of the text wasn't compressed even though it was highly compressible. When we designed the LOB service, the interface included both content type and content transfer encoding. Based on the type and the transfer encoding, we transparently compress and decompress the data. Our Cassandra storage costs are about one third of the SAN costs at the highest replication level, and compression reduces the storage needed by 90%, giving us an order of magnitude improvement in storage costs for text LOBs.
Our early usage of the Cassandra-based LOB service has produced positive results. Cassandra has proven to be reliable and performant. We have even experienced a hardware failure during a peak transaction period without impact to the platform. Based on these early results, we plan on expanding our usage of both the LOB service and Cassandra.
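A minimal sketch of transparent compression might look like the following. It keys only on the content type, uses gzip, and treats a small set of types as text; all of these are assumptions for illustration, and our service also takes the content transfer encoding into account, which this sketch omits.

```python
import gzip
from typing import Tuple

# Assumed set of non-"text/" types that are still worth compressing.
TEXTUAL_TYPES = {"application/json", "application/xml"}


def is_textual(content_type: str) -> bool:
    """Decide from the content type whether the body is likely compressible text."""
    base = content_type.split(";")[0].strip().lower()
    return base.startswith("text/") or base in TEXTUAL_TYPES


def encode_for_storage(body: bytes, content_type: str) -> Tuple[bytes, bool]:
    """Return the bytes to store and a flag recording whether they were compressed."""
    if is_textual(content_type):
        return gzip.compress(body), True
    return body, False


def decode_from_storage(stored: bytes, compressed: bool) -> bytes:
    """Reverse encode_for_storage so callers always get the original bytes back."""
    return gzip.decompress(stored) if compressed else stored


# A repetitive API response body, similar in spirit to conversational XML logs.
payload = b"<response><status>OK</status></response>" * 200
stored, compressed = encode_for_storage(payload, "text/xml; charset=utf-8")
print(len(payload), len(stored), compressed)
```

Storing the flag alongside the data keeps the compression invisible to callers: they write and read plain bytes, and only the service knows what is on disk.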