Is Your Database Wasting the Ephemeral Drive?

It's important to avoid wasting your ephemeral drive, as this can become very costly. Learn how to use the Redis Enterprise architecture to avoid these costs.

By Cihan B. · Jul. 12, 17 · Tutorial

If you are running in a VM or a container, you typically get the following types of storage:

  • Network-attached durable storage. Even if your VM or container moves from one physical host to another, your drive is guaranteed to follow without losing committed data. This is typically what all databases use for data on disk. The downside is that, because the drive is network-attached, every regular disk write is a network write plus a disk write.

  • Local ephemeral storage. This drive is local, but it is "reset" every time the VM or container moves to another physical host. Thus, beyond some limited use, many disk-based databases waste this space.

Redis with Redis Enterprise Flash has a unique way of utilizing the ephemeral drive instead of wasting it. Let's explore the architecture.

Redis and Redis Enterprise Architecture

Redis is a high-speed, in-memory database with a "structures"-based data model that lets you express your problem to the database with great flexibility (read more at Redis data types). Redis Enterprise is a distributed, highly available, and scalable database platform that is built on Redis. Redis Enterprise is 100% compatible with Redis, so just change your connection string and you are good to go.

Redis Enterprise scales Redis in a few ways. You can find those details here. I'll focus on how Redis Enterprise uses the local ephemeral drive to extend RAM to scale data size. 

With Redis Enterprise Flash, a Redis database allocates a memory quota that spans RAM plus the local ephemeral drive. Values stored in the database are spread over RAM and the ephemeral drive. Redis Enterprise Flash smartly places frequently accessed values in RAM and less frequently accessed values on the ephemeral drive. To get as close as possible to RAM speeds, it is best to use instances with SSD-based ephemeral drives.
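The placement idea described above can be sketched with a toy two-tier store. This is only an illustration under stated assumptions: the class name `TieredStore`, the least-recently-used spill policy, and the plain dictionaries standing in for RAM and the ephemeral drive are all hypothetical, and nothing here reflects how Redis Enterprise Flash actually decides where a value lives.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded 'RAM' tier plus a 'flash' tier (hypothetical model)."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity  # max number of values held in RAM
        self.ram = OrderedDict()          # most recently used entries sit at the end
        self.flash = {}                   # stands in for the ephemeral drive

    def _touch(self, key, value):
        # Mark the value as most recently used, then spill the coldest entries.
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value

    def set(self, key, value):
        self.flash.pop(key, None)  # drop any stale copy on flash
        self._touch(key, value)

    def get(self, key):
        if key in self.ram:
            value = self.ram[key]
        elif key in self.flash:
            value = self.flash.pop(key)  # promote to RAM on access
        else:
            return None
        self._touch(key, value)
        return value


store = TieredStore(ram_capacity=2)
for k in ("a", "b", "c"):
    store.set(k, k.upper())
store.get("a")  # "a" had spilled to flash; reading it promotes it and spills "b"
print(sorted(store.ram), sorted(store.flash))
```

With a RAM capacity of two values, the third write spills the coldest value to the flash tier, and a later read of that value brings it back into RAM at the expense of the next-coldest one.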

Redis Enterprise Flash is built for the ephemeral drive. If the VM or container moves and the ephemeral drive is reset, all is well. Redis Enterprise maintains a durable copy on disk and a replicated copy on another node, and it can simply repopulate from these sources.

It is easy to set up a Redis Enterprise cluster. You can find instructions for using Docker on macOS or Windows. In the last step, instead of creating a regular database, you can create a Flash-based database.

In my case, I am using a MacBook to run a three-node Redis Enterprise Pack cluster. I configured a simple 10GB quota with 1GB of RAM plus 9GB on the ephemeral drive, meaning I am consuming only 1GB of RAM to store 10GB of total data in Redis.

As I populate data (in this case, with a 1KB value size), the data first lands in RAM. Once the first GB runs out, additional data is pushed to the ephemeral drive. In the picture below, the left-side stat shows the number of values in RAM and the right-side stat shows the total count of values in flash over a minute.

For transparency, I used the memtier benchmark tool to run the data load with the following arguments:

./memtier_benchmark --pipeline=100 -n allkeys --ratio=1:0 --data-size=1024 \
  --key-prefix A --key-minimum=1 --key-maximum=3000000 --key-pattern P:P \
  -c 2 -t 2 -s 10.0.0.2 -p 12000
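As a rough back-of-the-envelope check on this load, the arithmetic can be written out as below. The sketch assumes 1GB means 2^30 bytes and ignores key names and any per-value overhead, so it gives an upper bound on how many values fit in RAM rather than what the cluster will actually report.

```python
# Figures taken from the quota and the memtier arguments above.
VALUE_SIZE = 1024          # bytes per value (--data-size=1024)
KEY_COUNT = 3_000_000      # keys 1..3000000 (--key-minimum / --key-maximum)
RAM_QUOTA = 1 * 1024**3    # 1GB of RAM (assumes 1GB = 2**30 bytes)

# Total value payload written by the load, ignoring keys and overhead.
payload_bytes = VALUE_SIZE * KEY_COUNT

# Upper bound on the number of 1KB values that fit in the RAM part of the quota.
values_fitting_in_ram = RAM_QUOTA // VALUE_SIZE

# Share of the loaded values that could sit in RAM at once, at best.
ram_fraction = values_fitting_in_ram / KEY_COUNT

print(f"payload: {payload_bytes / 1024**3:.2f} GB")
print(f"values that fit in RAM (upper bound): {values_fitting_in_ram:,}")
print(f"share of values in RAM (upper bound): {ram_fraction:.1%}")
```

Under these assumptions the load writes a little under 3GB of values, of which roughly a third can be held in RAM at any time; the rest has to live on the ephemeral drive.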

Many workloads we look at here at Redis Labs show that not all keys and values are accessed with the same frequency. Most exhibit a "hot working set": the portion of the data that is accessed more frequently than the rest. As it turns out, choosing a RAM-to-Flash ratio at which your "hot working set" fits in RAM can provide the best latency characteristics when using Flash. To help you find this ratio, Redis Enterprise Flash provides another stat: the RAM hit ratio. It represents the percentage of accesses for which the requested value was found in RAM, much like the buffer cache hit ratio you may be familiar with from disk-based databases. Keeping this value high keeps latencies low. Over time, however, the working set may change.
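To make the two ideas in this paragraph concrete, here is a minimal sketch of how a hit ratio is computed from counters and how one might size a hot working set from an access trace. The function names `ram_hit_ratio` and `hot_set_size` and the tiny trace are hypothetical; Redis Enterprise exposes the RAM hit ratio as a stat, and this code does not read it.

```python
from collections import Counter


def ram_hit_ratio(ram_hits, flash_hits):
    """Fraction of accesses served from RAM (0.0 when there were no accesses)."""
    total = ram_hits + flash_hits
    return ram_hits / total if total else 0.0


def hot_set_size(access_counts, target_ratio):
    """Smallest number of hottest keys whose accesses cover target_ratio of all accesses."""
    total = sum(access_counts.values())
    covered = 0
    for n, (_, count) in enumerate(access_counts.most_common(), start=1):
        covered += count
        if covered >= target_ratio * total:
            return n
    return len(access_counts)


# A made-up access trace: "a" is hot, "b" is warm, "c" is cold.
trace = ["a"] * 6 + ["b"] * 3 + ["c"]
counts = Counter(trace)

print(ram_hit_ratio(ram_hits=85, flash_hits=15))
print(hot_set_size(counts, target_ratio=0.85))
```

In this trace, keeping the two hottest keys in RAM is enough to serve at least 85% of accesses, which is the kind of sizing question the RAM hit ratio helps answer.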

With the following graph, you can see the RAM hit ratio and latency. Please ignore the absolute latency values: the test was run on a laptop with an oversubscribed CPU, running all three nodes plus the load generator under heavy paging. The general idea still holds, however. The graph shows how easy it is to adjust the RAM-to-Flash ratio so you can get lower latencies by simply allocating more RAM to your database.

The memtier benchmark options looked like this:

./memtier_benchmark --pipeline=100 -n allkeys --ratio=2:8 --data-size=1024 \
  --key-prefix A --key-minimum=1000000 --key-pattern G:G --key-maximum=2000000 \
  --key-stddev=180000 --distinct-client-seed --randomize -c 2 -t 2 -x 10

In the picture above, I first ran a steady workload over a set of keys that achieved roughly an 85% RAM hit ratio. However, I wanted lower latencies. On the left graph, you can see the RAM hit ratio rise. That's the point where I changed the RAM size from 1GB to 2GB. With the additional RAM, more values were moved into RAM over time. To change the ratio, I simply moved the slider from 10% to 20% on the database configuration page in the UI. It takes a few seconds to settle, but it is easy to see latencies fall as the RAM hit ratio increases, with no downtime required!
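The trend described here can be reproduced in a small simulation. This is a sketch under loud assumptions: the key range and standard deviation are scaled down by 100 from the memtier arguments, RAM is modeled as a least-recently-used cache holding 10% or 20% of the key range, and reads and writes are treated alike. It is not expected to reproduce the 85% figure from the real run, only the direction of the change.

```python
import random
from collections import OrderedDict


def simulate_hit_ratio(ram_capacity, accesses, key_min=0, key_max=9_999,
                       stddev=1_800, seed=42):
    """Replay Gaussian-distributed key accesses against an LRU-managed RAM tier."""
    rng = random.Random(seed)  # fixed seed so both runs see the same trace
    mean = (key_min + key_max) / 2
    ram = OrderedDict()
    hits = 0
    for _ in range(accesses):
        # Draw a key around the middle of the range and clamp it to the range.
        key = min(key_max, max(key_min, round(rng.gauss(mean, stddev))))
        if key in ram:
            hits += 1
            ram.move_to_end(key)         # mark as most recently used
        else:
            ram[key] = True              # fault the value into RAM
            if len(ram) > ram_capacity:
                ram.popitem(last=False)  # push the coldest value out of RAM
    return hits / accesses


small = simulate_hit_ratio(ram_capacity=1_000, accesses=50_000)  # 10% in RAM
large = simulate_hit_ratio(ram_capacity=2_000, accesses=50_000)  # 20% in RAM
print(f"RAM hit ratio at 10%: {small:.1%}, at 20%: {large:.1%}")
```

Because both runs replay the same access trace, any difference between the two ratios comes only from the larger RAM tier.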

There are a few other important reasons why the RAM plus ephemeral drive approach works well. The ephemeral drive contains only values that do not fit in RAM. That means that when hot keys and values receive repeated writes, no repeated IO is needed to record those updates on the ephemeral drive. This reduces the IO to the drive and saves its IO bandwidth for real RAM faults. Databases that write durably to disk maintain write-ahead logs (WAL) or redo logs to protect against data loss. This causes write amplification: each value write ends up producing many additional writes to maintain structures such as the WAL. Redis Enterprise Flash, however, does not need to maintain a WAL and does not suffer from this type of write amplification.
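The difference in drive IO can be illustrated with a simple count. Both models below are assumptions made for illustration: the first sends a value to the drive only when it is pushed out of an LRU-managed RAM tier, and the second charges exactly one log write plus one data write per update, which is a deliberately simplified stand-in for a WAL-based design. Neither is a measurement of any real database.

```python
from collections import OrderedDict


def drive_writes_spill_only(writes, ram_capacity):
    """Count drive writes when a value reaches the drive only on eviction from RAM."""
    ram = OrderedDict()
    drive_writes = 0
    for key in writes:
        ram[key] = True
        ram.move_to_end(key)         # repeated writes to a hot key stay in RAM
        if len(ram) > ram_capacity:
            ram.popitem(last=False)  # coldest value spills to the drive
            drive_writes += 1
    return drive_writes


def drive_writes_with_wal(writes):
    """Count drive writes assuming one log record and one data write per update."""
    return 2 * len(writes)


# A hypothetical workload: one hot key updated 1,000 times, then 10 cold keys.
workload = ["hot"] * 1000 + [f"cold{i}" for i in range(10)]

print(drive_writes_spill_only(workload, ram_capacity=5))
print(drive_writes_with_wal(workload))
```

In the spill-only model the thousand updates to the hot key never touch the drive; only the values that no longer fit in RAM do.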

Obviously, wasting your ephemeral drive is costly. In fact, Redis Enterprise Flash is 80% cheaper in infrastructure costs (detailed cost comparisons can be found here).
