Real-World SSD Wearout
Learn about the wearout rate for servers and methods for monitoring SSD wearout to prevent performance issues.
A year ago, we added SMART metrics collection to our monitoring agent, which collects disk drive attributes on our clients' servers. Here are a couple of interesting cases from the real world.
Because we needed it to work without installing any additional software, like Smartmontools, we implemented collection of only the basic attributes, not vendor-specific ones, so that we could provide a consistent experience. That way, we also skipped the burdensome task of maintaining a knowledge base of vendor-specific details, and I like that a lot.
This time we’ll discuss only one SMART attribute, called the “media wearout indicator.” It shows the percentage of write resource left in the device. Under the hood, it keeps track of the number of cycles the NAND media has undergone, and the percentage is calculated against the maximum number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0 to the maximum rated number of cycles.
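As a rough illustration of that linear mapping, here is a minimal sketch. The function name, the rounding, and the 3000-cycle rating used in the example are assumptions for illustration only; real firmware may compute the value differently.

```python
import math


def normalized_wearout(avg_erase_cycles: float, max_erase_cycles: float) -> int:
    """Map an average erase-cycle count to a normalized 100..1 value.

    Assumption: a plain linear decline with a floor of 1; actual firmware
    may round or clamp differently.
    """
    if max_erase_cycles <= 0:
        raise ValueError("max_erase_cycles must be positive")
    # Clamp the cycle count into the rated range before scaling.
    cycles = min(max(avg_erase_cycles, 0), max_erase_cycles)
    percent_used = math.floor(100 * cycles / max_erase_cycles)
    # The indicator never reports less than 1.
    return max(1, 100 - percent_used)


# Hypothetical drive rated for 3000 erase cycles.
print(normalized_wearout(0, 3000))     # brand-new device
print(normalized_wearout(240, 3000))   # 8% of rated cycles consumed
print(normalized_wearout(3000, 3000))  # rated cycles exhausted
```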
Are There Actually Any Dead SSDs?
Though SSDs are pretty common nowadays, just a couple of years ago, you could hear a lot of fearful talk about SSD wearout. We wanted to see if any of it was true, so we searched for the most worn-out device across all of our clients. Its media wearout indicator read just 1%.
According to the docs, the value never goes below 1%. So, this SSD was worn out.
We notified this client. Turns out it was a dedicated server in Hetzner. Their support replaced the device:
Do SSDs Die Fast?
As we introduced SMART monitoring for some of the clients some time ago, we have accumulated history, and now we can see it on a timeline.
The server with the highest wearout rate across our clients' servers was unfortunately added to okmeter.io monitoring only two months ago:
This chart indicates that in these two months alone, it burned through 8% of its write resource. At that rate, 100% of this SSD's lifetime would be used up in 100/(8/2) = 25 months, or about 2 years.
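The same extrapolation can be written as a small helper. This is only a sketch of the arithmetic above; the function name is made up, and it assumes the wear rate stays constant, which a real workload may not honor.

```python
def months_until_worn_out(percent_used: float,
                          months_observed: float,
                          percent_remaining: float = 100.0) -> float:
    """Linearly extrapolate how many months the remaining resource lasts.

    Assumes the observed wear rate (percent_used / months_observed)
    stays constant.
    """
    if percent_used <= 0 or months_observed <= 0:
        raise ValueError("need a positive observed wear and period")
    return percent_remaining * months_observed / percent_used


# 8% consumed in 2 months, as on the chart discussed above.
months = months_until_worn_out(8, 2)
print(f"{months:.0f} months, about {months / 12:.1f} years")
```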
Is that a lot or too little? I don’t know, but let’s check what kind of load it’s serving.
As you can see, it’s ceph doing all the disk writes, but it’s not doing these writes for itself: it’s a storage system for some application. This particular environment was running under Kubernetes, so let’s sneak a peek at what’s running inside:
It’s Redis! You might have noticed that the values here are about half those in the previous chart (probably due to ceph’s data replication), but the load profile is the same, so we conclude that it’s Redis after all.
Let’s see what Redis is doing:
On average, it's less than 100 write commands per second. As you might know, there are two ways Redis makes actual writes to disk:
- RDB — periodically snapshots all the dataset to the disk, and
- AOF — writes a log of all the changes.
Here it was clearly RDB, dumping the dataset once a minute.
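To see why a modest command rate can still translate into heavy disk writes, here is a back-of-the-envelope sketch comparing the two modes. All numbers are hypothetical, and the model is deliberately crude: it ignores RDB compression, AOF rewrites, and filesystem or replication overhead.

```python
SECONDS_PER_DAY = 86_400


def rdb_bytes_per_day(dataset_bytes: int, dump_interval_seconds: int) -> float:
    """Every snapshot rewrites the whole dataset, whatever the change rate."""
    dumps_per_day = SECONDS_PER_DAY / dump_interval_seconds
    return dataset_bytes * dumps_per_day


def aof_bytes_per_day(writes_per_second: float, avg_command_bytes: int) -> float:
    """The log grows only with the commands actually executed."""
    return writes_per_second * avg_command_bytes * SECONDS_PER_DAY


# Hypothetical instance: 1 GiB dataset, 1-minute dumps,
# 100 write commands per second of about 100 bytes each.
rdb = rdb_bytes_per_day(2**30, 60)
aof = aof_bytes_per_day(100, 100)
print(f"RDB: {rdb / 2**30:.0f} GiB/day, AOF: {aof / 2**30:.2f} GiB/day")
```

Under these made-up numbers, the snapshot volume scales with dataset size times dump frequency, while the log volume scales with the command rate.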
SSD + RAID
We see that there are three common patterns of server storage system setup with SSDs:
- Two SSDs in a RAID-1 that holds everything there is.
- Some HDDs + SSDs in a RAID-10 — we see that setup a lot on traditional RDBMS servers: OS, WAL and some “cold” data on HDD, while SSD array holds the hottest data.
- Just a bunch of SSDs (JBOD) for some NoSQL like Apache Cassandra.
In the first case with RAID-1, writes go to both disks symmetrically, and wearout happens at the same rate:
Looking for some anomalies, we found one server where it was completely different:
Checking mount options to understand this didn’t produce much insight: all the partitions were RAID-1.
But looking at per-device IO metrics, we see, again, that there’s a difference between the two disks, and /dev/sda gets more bytes written:
Turns out there’s swap configured on one of the /dev/sda partitions, and pretty decent swap IO on this server.
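A per-device comparison like this can be reproduced on Linux from `/proc/diskstats`. The following is a minimal sketch, not how the monitoring agent does it; the function name and the sample numbers are made up for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def bytes_written(diskstats_text: str) -> dict[str, int]:
    """Return cumulative bytes written per device from /proc/diskstats text."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields[2] is the device name, fields[9] is sectors written.
        result[fields[2]] = int(fields[9]) * SECTOR_BYTES
    return result


# Made-up sample in the /proc/diskstats layout.
sample = """\
   8       0 sda 1000 0 8000 500 2000 0 4096 900 0 700 1400
   8      16 sdb 1000 0 8000 500 1500 0 2048 800 0 600 1300
"""
for device, written in bytes_written(sample).items():
    print(device, written)
```

On a real server, the input would be the contents of `/proc/diskstats`. The counters are cumulative since boot, so a write rate needs two samples taken some time apart.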
SSD Wearout and PostgreSQL
This journey began with me wanting to check SSD wearout under different Postgres write load profiles. But I didn't have much luck: all of our clients' Postgres databases with at least somewhat high write load are configured pretty carefully, and writes go mostly to HDDs.
I found one pretty interesting case nevertheless:
These two SSDs in a RAID-1 lost 4% of their write resource in 3 months. My first guess, a high volume of WAL writes, turned out to be wrong, as WAL accounts for less than 100Kb/s:
I figured Postgres must be generating writes in some other way, and indeed it was, with temp files being written constantly:
Thanks to Postgres's elaborate internal statistics and okmeter.io’s rich support for it, we easily spotted the root cause of that:
It was a SELECT query generating all that load and wearout!
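One way to hunt for such queries yourself is the `pg_stat_statements` extension, which tracks `temp_blks_written` per statement. Whether the monitoring described here relies on that view is an assumption; the sketch below only shows the idea, and the helper name, the sample rows, and the 8 kB block size (the Postgres default) are illustrative.

```python
BLOCK_BYTES = 8192  # assumption: default PostgreSQL block size

# Query to run against a database with pg_stat_statements installed.
TOP_TEMP_WRITERS_SQL = """
SELECT query, calls, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10
"""


def top_temp_writers(rows, limit=3):
    """Rank (query, calls, temp_blks_written) rows by temp bytes written."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [(query, blocks * BLOCK_BYTES)
            for query, _calls, blocks in ranked[:limit]
            if blocks > 0]


# Hypothetical rows, shaped like the result of the query above.
rows = [
    ("SELECT a", 10, 0),
    ("SELECT b", 5, 1000),
    ("INSERT c", 7, 10),
]
for query, temp_bytes in top_temp_writers(rows):
    print(query, temp_bytes)
```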
SELECTs in Postgres can sometimes generate writes not only to temp files but also to regular data files. Read about it here.
- Redis+RDB generates a ton of disk writes, and the volume depends not on the number of changes in the Redis database but on the DB size and dump frequency. RDB seems to produce the highest write amplification of all the storage systems I know.
- Actively used swap on an SSD is probably a bad idea, unless you want to add some jitter to the wearout of your RAID-1 SSDs.
- It’s not only WAL and data files that dominate disk writes. Bad database design or access patterns might produce a lot of temp file writes.
That’s all for today. Be aware of your SSDs' wearout!
Published at DZone with permission of Pavel T . See the original article here.