Improve Performance and Data Availability With Elastic Block Store (EBS)

With a careful selection of Elastic Block Store (EBS) types and clever optimizations, deploying DBaaS on EBS can achieve even better performance.

By Chenhao Huang · Bokang Zhang · Jun. 15, 22 · Tutorial

Many Database-as-a-Service (DBaaS) solutions now separate the computation layer from the storage layer; examples include Amazon Aurora and Google BigQuery. This design is attractive because data storage and data replication can be handled by existing services, so the DBaaS no longer has to manage that complexity itself. However, its performance is sometimes worse than that of the traditional approach, which uses a local disk as storage.

In this article, we show that with a careful selection of Elastic Block Store (EBS) types and clever optimizations, deploying DBaaS on EBS can achieve even better performance than on local disks.

Why Do We Consider EBS in the First Place?

To explain our motivation for using EBS, we’d like to briefly introduce TiDB. TiDB is a MySQL-compatible, distributed database. TiDB Servers are the computation nodes which process SQL requests. The Placement Driver (PD) is the brain for TiDB, which configures load balancing and provides metadata services. TiKV is a row-oriented key-value store that processes transactional queries. TiFlash is a columnar storage extension that handles analytical queries. Going forward, we will take a deep dive into TiKV.

TiKV provides distributed key-value service. First, it splits the data into several Regions, the smallest data unit for replication and load balancing. To achieve High Availability (HA), each Region is replicated three times and then distributed among different TiKV nodes. The replicas for one Region form a Raft group. Losing one node and thus losing one replica in some Regions is acceptable for TiDB. However, losing two replicas simultaneously causes problems because the majority of members of a Raft group are lost. This makes a Region unavailable; its data can no longer be accessed. Human intervention is needed to address such issues. 

When deploying TiDB Cloud, we use placement rules which guarantee that the replicas of each Region are spread across multiple Availability Zones (AZs). Losing one AZ does not have a huge impact on TiDB Cloud. However, with an AZ + 1 failure (one AZ fails and at least one node fails in another AZ), the affected Regions become unavailable. We had such a failure in production, and it took a lot of work to bring the TiDB cluster back online. To avoid repeating this painful experience, we turned to EBS.
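To make the AZ + 1 scenario concrete, here is a minimal sketch that counts surviving replicas for one Region and checks whether a Raft majority remains. The placement map, the AZ and node names, and both helper functions are hypothetical and chosen only for illustration; this is not how TiDB Cloud itself evaluates placement.

```python
from typing import Dict, Set, Tuple

# Hypothetical placement of one Region's three replicas:
# replica name -> (availability zone, TiKV node).
PLACEMENT: Dict[str, Tuple[str, str]] = {
    "replica-1": ("az-a", "tikv-0"),
    "replica-2": ("az-b", "tikv-1"),
    "replica-3": ("az-c", "tikv-2"),
}


def surviving_replicas(
    placement: Dict[str, Tuple[str, str]],
    failed_azs: Set[str],
    failed_nodes: Set[str],
) -> int:
    """Count replicas whose AZ and node are both still healthy."""
    return sum(
        1
        for az, node in placement.values()
        if az not in failed_azs and node not in failed_nodes
    )


def has_majority(live: int, total: int) -> bool:
    """A Raft group stays available while more than half its members live."""
    return live > total // 2


total = len(PLACEMENT)

# One AZ lost: two of three replicas survive, so the Region stays available.
live = surviving_replicas(PLACEMENT, {"az-a"}, set())
print("AZ failure:", live, "live, available =", has_majority(live, total))

# AZ + 1: one AZ plus one node in another AZ, so only one replica survives.
live = surviving_replicas(PLACEMENT, {"az-a"}, {"tikv-1"})
print("AZ + 1 failure:", live, "live, available =", has_majority(live, total))
```

In this sketch the second case leaves a single replica, which is below the majority of two that a three-member Raft group needs.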

AWS Elastic Block Store (EBS) is a block storage service provided by AWS whose volumes can be attached to EC2 instances. The data on an EBS volume is independent of the EC2 instance, so the data persists when the instance fails. In that case, Kubernetes can automatically remount the EBS volume on a working EC2 instance. Moreover, EBS volumes are designed for mission-critical systems, so they are replicated within an AZ. This means that EBS is less likely to fail, which gives us extra peace of mind.

Selecting a Suitable EBS Volume Type

In general, there are four SSD-based EBS volume types: gp2, gp3, io1, and io2. (When we designed and implemented TiDB Cloud, io2 Block Express was still in preview mode, so we didn’t consider it.) The following table summarizes the characteristics of these volume types.

| Volume type | Durability (%) | Bandwidth (MB/s) | IOPS | Cost | Comments |
|---|---|---|---|---|---|
| gp2 | 99.8-99.9 | 250 | 3 per GB, burstable | Low | A general purpose volume |
| gp3 | 99.8-99.9 | 125-1,000 | 3,000-16,000 | Low | A general purpose volume with flexible bandwidth |
| io1 | 99.8-99.9 | Up to 1,000 | Up to 64,000 | High | High IOPS |
| io2 | 99.999 | Up to 1,000 | Up to 64,000 | High | High IOPS; the best performance of the group |
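To see what the per-GB IOPS rule for gp2 means in practice, here is a minimal sketch that uses only two figures from the table: 3 IOPS per GB for gp2 and the 3,000 IOPS lower bound for gp3. It deliberately ignores gp2 bursting and any floor or cap that AWS applies, so treat the numbers as rough approximations. The function names are ours.

```python
GP2_IOPS_PER_GB = 3        # gp2 figure from the table
GP3_BASELINE_IOPS = 3_000  # lower end of the gp3 range in the table


def gp2_baseline_iops(size_gb: int) -> int:
    """Approximate gp2 baseline IOPS, which scales with volume size."""
    return GP2_IOPS_PER_GB * size_gb


def gp2_size_to_match_gp3_baseline() -> int:
    """Smallest gp2 size (GB) whose baseline reaches the gp3 lower bound."""
    # Ceiling division without floating point.
    return -(-GP3_BASELINE_IOPS // GP2_IOPS_PER_GB)


for size_gb in (100, 500, 1_000):
    print(f"gp2 {size_gb} GB: ~{gp2_baseline_iops(size_gb)} baseline IOPS")

print("gp2 size needed to match gp3 baseline:",
      gp2_size_to_match_gp3_baseline(), "GB")
```

Under these assumptions, a gp2 volume has to be fairly large before its baseline matches what gp3 offers at any size, which is one way to read the "flexible" comment in the gp3 row.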

Now, let’s get our hands dirty and compare performance. Note that in the following figures, the four EBS volume types are attached to an r5b instance, while the local disk measurements are conducted on an i3 instance. This is because r5b instances can only use EBS, so we use i3 as a close alternative. Each figure shows the average and 99th percentile latency for all operations.

We’ll start by benchmarking read and write latency. The first workload is a simple one: it issues 1,000 IOPS, and each I/O is 4 KB. The following figures show the average and 99th percentile latency for this workload.
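For readers who want to reproduce these two summary statistics from raw samples, here is a minimal sketch. It uses the nearest-rank definition of a percentile, which is an assumption on our part since the article does not say which definition the benchmark tool used, and the latency samples are made up.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [0.5] * 98 + [2.0, 10.0]

print("average latency (ms):", mean(latencies_ms))
print("99th percentile latency (ms):", percentile(latencies_ms, 99))
```

The average smooths over the outliers, while the 99th percentile exposes the slow tail, which is why the figures report both.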

[Figure: Write latency in a simple workload with one thread. (Lower numbers are better)]

[Figure: Read latency in a simple workload with one thread. (Lower numbers are better)]

[Figure: Write latency in a simple workload with eight threads. (Lower numbers are better)]

[Figure: Read latency in a simple workload with eight threads. (Lower numbers are better)]

We found that when the background I/O becomes more intense, foreground latency grows, and the latency gap between the local disk and the EBS becomes smaller. See the following figure.

[Figure: Average operation latency in some comprehensive workloads. (Lower numbers are better)]

[Figure: Transaction per minute (TPMC) in TPC-C workload. (Higher numbers are better)]

[Figure: Average operation latency (ms) in TPC-C workload. (Lower numbers are better)]

[Figure: 99-percentile operation latency (ms) in TPC-C workload. (Lower numbers are better)]

In the third figure (99-percentile operation latency in TPC-C workload), the 99-percentile latency with the gp2 volume type skyrockets at 800 threads. This is because gp2 reaches its bandwidth limit.
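A bandwidth ceiling like this can be reasoned about with simple arithmetic: throughput is IOPS multiplied by I/O size. The sketch below compares that product against the 250 MB/s gp2 figure from the volume type table. The heavy workload (16,000 IOPS of 16 KB each) is a made-up example and not the measured TPC-C I/O pattern, and 1 MB is treated as 1,000 KB for simplicity.

```python
GP2_BANDWIDTH_MB_S = 250  # gp2 bandwidth from the volume type table


def throughput_mb_s(iops: int, io_size_kb: float) -> float:
    """Throughput in MB/s implied by an I/O rate and an I/O size."""
    return iops * io_size_kb / 1000


def exceeds_bandwidth(
    iops: int, io_size_kb: float, limit_mb_s: float = GP2_BANDWIDTH_MB_S
) -> bool:
    """True if the workload asks for more bandwidth than the volume offers."""
    return throughput_mb_s(iops, io_size_kb) > limit_mb_s


# The simple workload from earlier: 1,000 IOPS of 4 KB each.
print("simple workload MB/s:", throughput_mb_s(1_000, 4))

# Hypothetical heavy workload used only to illustrate saturation.
print("heavy workload MB/s:", throughput_mb_s(16_000, 16))
print("exceeds gp2 bandwidth:", exceeds_bandwidth(16_000, 16))
```

Once demand exceeds the limit, requests queue up behind each other, and that queueing typically shows up first in tail latency.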

To conclude, we chose gp3 as our EBS volume type. We ruled out io2 because it was not available on r5b instances when we designed and implemented TiDB Cloud, and io2 Block Express was still in preview at the time. The io1 volume has latency comparable to gp2 overall and offers higher bandwidth and IOPS limits, but it incurs extra cost based on provisioned IOPS. The gp2 volume has limited bandwidth and IOPS that cannot be configured, which would place extra limitations on TiDB.

Published at DZone with permission of Chenhao Huang. See the original article here.
