
Cost accounting for SSDs – it’s RAM, not disk

By John Piekos · Jul. 11, 14

Originally written by Ariel Weisberg.

Most discussions I have seen about choosing SSDs vs. spinning disk arrays for databases tend to focus on SSDs as a replacement for disk. SSDs don’t replace disk; they replace the RAM you would be using to cache enough disk pages to make up for the terrible random IO performance of spinning disk arrays.

When you add a new disk to a disk array you get hundreds of IOPs, max. When you add an SSD to a RAID array – or better yet, packaged along with a new node in your cluster – you get hundreds of thousands more IOPs. For workloads where you will never fit everything in RAM, scaling IOPs matters, because there isn’t a power in the ’verse that will make your random reads sequential.
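To put numbers on that, here is a back-of-envelope sketch in Java. It is a minimal illustration using the per-device figures measured later in this article; the spindle comparison itself is not part of the benchmark:

    public class SpindleMath {
        public static void main(String[] args) {
            int ssdIops  = 33_000; // Crucial m4 128 GB, measured below
            int diskIops = 250;    // 7.2k SATA disk, measured below
            // Spinning disks needed to match one SSD's random-read IOPs:
            System.out.println("Spindles per SSD: " + ssdIops / diskIops); // 132
        }
    }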

For workloads that are focused on sequential IO performance, the delta between SSDs and spinning disks is not as great, and spinning disks are an excellent source of sequential IO and raw capacity. For workloads that stress random IO, however, SSDs end up being more cost effective in several dimensions.

Benchmarking the distinction

To demonstrate the difference between spinning disk and SSDs I put together a quick benchmark for random read performance. You can find the benchmark on GitHub along with a description and results for a 7.2k 3 terabyte disk and a Crucial m4 128 gigabyte SSD. The results are also available as a spreadsheet.

Hardware

I tested on two different desktops, one with the SSD and one with the disk. The desktops have different CPUs, which shows up as a difference in throughput for the in-memory data sets. For the larger-than-memory data sets that are the focus of this discussion, performance is dominated by available IOPs (as you will see in the graphs). Both desktops had 16 gigabytes of RAM and were running Ubuntu 12.04 with EXT4 mounted with noatime,nodiratime.

Read-ahead was disabled on the SSD using “blockdev --setra 0” and “hdparm -A0” in order to get the full 33k IOPs the device can deliver. Read-ahead was not disabled for the disk and was left at the default of 128 kilobytes.

The benchmark

The benchmark consists of 1k aligned reads from a portion of a pre-allocated file. The access distribution is scrambled Zipfian: the output of a Zipfian distribution is hashed using FNV to force hot and cold values to be stored on the same pages, as they would be in many real-world databases under real workloads. Reads are issued by a thread pool containing eight threads.

The pre-allocated file is accessed by memory-mapping. Even though reads are 1k, actual IOs are 4k, matching the page size of the page cache. To test different dataset sizes, a prefix of the pre-allocated file is used for each benchmark.

The benchmark is run with a 600 second warm-up followed by a 600 second measurement period. Only one run of each configuration is presented. I observed that 600 seconds was sufficient to warm-up the cache, even for the spinning disk. Caches are dropped between runs using “echo 1 > /proc/sys/vm/drop_caches”.
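For concreteness, the core of the benchmark looks roughly like the sketch below. This is a minimal Java reconstruction from the description above, not the code in the repository; in particular, the inverse-CDF line is a crude stand-in for a proper scrambled-Zipfian sampler.

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicLong;

    public class RandomReadBench {
        static final int READ_SIZE = 1024;  // 1k reads; the kernel still faults 4k pages
        static final int THREADS = 8;
        static final long CHUNK = 1L << 30; // map in 1 GiB chunks (buffer offsets are ints)

        // FNV-1a over the Zipfian rank, scattering hot keys across the whole file
        static long fnv1a(long x) {
            long h = 0xcbf29ce484222325L;
            for (int i = 0; i < 8; i++) { h ^= (x >>> (i * 8)) & 0xff; h *= 0x100000001b3L; }
            return h;
        }

        public static void main(String[] args) throws Exception {
            long fileBytes = Long.parseLong(args[1]); // dataset size = prefix of the file
            long slots = fileBytes / READ_SIZE;
            FileChannel ch = new RandomAccessFile(args[0], "r").getChannel();
            MappedByteBuffer[] maps = new MappedByteBuffer[(int) ((fileBytes + CHUNK - 1) / CHUNK)];
            for (int i = 0; i < maps.length; i++) {
                long off = (long) i * CHUNK;
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, off, Math.min(CHUNK, fileBytes - off));
            }
            AtomicLong ops = new AtomicLong();
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            for (int t = 0; t < THREADS; t++) {
                pool.submit(() -> {
                    byte[] buf = new byte[READ_SIZE];
                    while (!Thread.currentThread().isInterrupted()) {
                        // Approximate Zipf(s=1) rank via inverse CDF, then scramble with FNV
                        long rank = (long) Math.exp(ThreadLocalRandom.current().nextDouble() * Math.log(slots));
                        long slot = Long.remainderUnsigned(fnv1a(rank), slots);
                        long byteOff = slot * READ_SIZE;
                        ByteBuffer view = maps[(int) (byteOff / CHUNK)].duplicate();
                        view.position((int) (byteOff % CHUNK));
                        view.get(buf); // the page fault here is the actual 4k IO
                        ops.incrementAndGet();
                    }
                });
            }
            Thread.sleep(600_000); // 600 second warm-up
            long start = ops.get();
            Thread.sleep(600_000); // 600 second measurement period
            System.out.printf("reads/sec: %.1f%n", (ops.get() - start) / 600.0);
            pool.shutdownNow();
        }
    }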

Results

[Figure: SSD vs. disk throughput, random 1k reads, by dataset size]

Performance is identical for in-memory data sets modulo the difference in CPU performance. I truncated the vertical axis to make the differences between larger-than-memory workloads more visible. In-memory performance was several million operations a second, telling the story of why you don’t build in-memory databases the same way you build larger-than-memory databases.

Once even a small slice of the dataset is no longer in memory, at 16 gigabytes, performance drops sharply. Once 50% of the dataset is not in memory, at 24 gigabytes, performance drops again by 3x for the SSD.

Throughput for the disk drops to near zero as soon as things don’t fit in memory, since the device can only sustain 250 IOPs. The long tail of IOs in a Zipfian distribution rapidly consumes all available IOPs, exhausting the IO thread pool as threads block waiting for pages to come in.

[Figure: SSD vs. disk, random 1k reads, linear scale]

On a log scale you can see the performance of the disk:

[Figure: SSD vs. disk, random 1k reads, log scale]

Focusing on the performance of real larger-than-memory datasets, you can see the extra IOPs of the SSD allow it to hit well above its weight class in terms of operations performed. The SSD can only do 33k random reads a second, but with caching the workload manages to perform 4.45x to 3.15x the number of supported IOPs for 2x and 4x larger-than-memory workloads, respectively.

The spinning disk also hits above its weight class, but the multiplier is 2.12x to 1.5x. More critically, the throughput provided does not reach the threshold of what I would call useful.
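Those multipliers are just cache arithmetic. If every cache miss costs exactly one device IO, sustained throughput is the device’s IOPs divided by the miss rate. A minimal sketch, with hit rates implied by the multipliers above rather than measured directly:

    public class CacheArithmetic {
        // If each miss costs one device IO: opsPerSec = deviceIops / missRate
        static double opsPerSec(double deviceIops, double hitRate) {
            return deviceIops / (1.0 - hitRate);
        }

        public static void main(String[] args) {
            // The 4.45x SSD multiplier implies 1/4.45 ≈ 22% misses, i.e. the
            // page cache absorbed roughly 78% of reads in the 2x workload.
            System.out.printf("SSD,  78%% hits: %.0f ops/sec%n", opsPerSec(33_000, 0.78)); // ~150k
            System.out.printf("Disk, 78%% hits: %.0f ops/sec%n", opsPerSec(250, 0.78));    // ~1.1k
        }
    }

The same hit rate that turns 33k device IOPs into roughly 150k operations a second turns 250 disk IOPs into barely a thousand, which is why no multiplier makes the disk’s absolute throughput useful.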

Why this matters for databases

Databases typically have to cache two discrete types of data: indexes and values. A read will have to touch many index pages for each retrieval, but only one page to retrieve a row/document/column. If indexes fit in memory, exactly one IO will be consumed retrieving a value.

For many workloads, indexes are on the order of 10x smaller than values; thus, if you pay for the RAM to store your indexes and SSDs to store your values, you can support as many retrievals as you have IOPs available. If you commit additional RAM to caching values, you can have performance that exceeds the number of IOPs available.
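As a concrete and entirely hypothetical sizing exercise under those assumptions (indexes a tenth the size of the values, indexes pinned in RAM, one IO per uncached value):

    public class SizingSketch {
        public static void main(String[] args) {
            double valuesGb = 1000;          // hypothetical 1 TB of values on SSD
            double indexGb  = valuesGb / 10; // indexes ~10x smaller, pinned in RAM
            int ssdIops     = 33_000;        // per device, from the benchmark above
            System.out.printf("RAM needed to pin indexes: %.0f GB%n", indexGb);
            // With indexes resident, every retrieval costs exactly one SSD IO,
            // so each SSD supports ~33k uncached retrievals/sec; any RAM beyond
            // the indexes caches values and pushes throughput past raw IOPs.
            System.out.printf("Uncached retrievals/sec per SSD: %d%n", ssdIops);
        }
    }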

If your workloads are friendlier to caching than the scrambled Zipfian distribution used here, the potential gains are greater: you will be closer to in-memory performance because available IOPs will not be consumed as quickly.

Conclusion

Recognize what you need to be optimizing for when picking storage. If you have to bring in extra RAM, nodes, rack space, and power to get away with using disk arrays, that needs to be accounted for. You will be hit with a double whammy – not only are you scaling IOPs, you are also scaling enough RAM to make up for the IOPs. Also factor in hidden costs like power consumption, which accrue over the life of a disk array but are largely traded for an up-front purchase price with SSDs.

SSDs present their own challenges. Write amplification is a factor that must be accounted for when choosing which data structure to use. The additional sequential IO provided by SSDs makes log-structured data structures more attractive, especially if it means you can use less-expensive, lower-write-endurance SSDs. These data structures sometimes have their own warts in terms of inconsistent performance over time; this is an area where I still see room for improvement.

Remember: SSDs don’t replace disk; they replace RAM, offering a way to avoid the poor random IO performance of spinning disk. IOPs matter, and when you have larger-than-memory datasets, SSDs can provide a way to improve IOPs without adding RAM. Make the right choice for your workload by balancing IOPs, direct and hidden costs of SSDs, and disk performance metrics. Let me know your thoughts, here or on Twitter at @VoltDB.

Published at DZone with permission of John Piekos, DZone MVB.

Opinions expressed by DZone contributors are their own.
