
Optimizing a Seven-Year-Old Disk Test Machine


We had a major regression in performance on Linux — major as in 75% slower than what it used to be a few weeks ago. So what did we do?


We are testing RavenDB on a wide variety of software and hardware, and a few weeks ago, one of our guys came to me with a grave concern. We had a major regression in performance on Linux — major as in 75% slower than what it used to be a few weeks ago.

Testing at that point showed that there was indeed a big performance gap between the same benchmark on that Linux machine and a comparable machine running Windows. That was worrying, and it took us a while to figure out what was going on. Complicating matters, we had run into this exact scenario before: the I/O patterns that are most suitable for Linux are pretty bad for Windows, and vice versa, so optimizing for each requires a delicate hand. The working assumption was that we had done something that overloaded the system somehow and caused a major regression.

A major discovery was that it wasn’t Linux per se that was slow. Testing the same thing on a significantly smaller machine showed much better performance. We still had to rule out a bunch of other things, such as specific settings/behaviors that we would trigger on that particular machine, but it seemed promising. And that was the point when we looked at the hardware. That particular Linux machine is an old development machine that has gone through several developer upgrade cycles, and when it was rebuilt, we used the most easily available disk that we had on hand.

That turned out to be a Crucial SSD 128GB M22 disk. To those of you who don’t keep a catalog of all hard disks and their numbers, there is Google, which will tell you that this has been out for nearly a decade, and that particular disk has been shuffling bits in our offices for about seven years or so. In its life, it has been subject to literally thousands of database benchmarks, reading and writing very large amounts of data.

I’m frankly shocked that it is still working, and it is likely that there is a lot of internal error correction that is going on. But the end result is that it predictably generates very unpredictable I/O patterns, and it is a great machine to test what happens when things start to fail in a very ungraceful manner (a write to the local disk that takes five seconds but also blocks all other I/O operations in the system, for example).

I’m aware of things like NBD and Trickle, but it was a lot more fun to discover that we can just run stuff on that particular machine and find out what happens when a lot of our assumptions are broken.
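The failure mode described above (an occasional write that stalls for seconds while most writes complete quickly) is easy to spot with a simple latency probe. Below is a minimal sketch, not from the original article, of how one might time a series of synchronous writes to a suspect disk and look for outliers; the function name and parameters are my own illustration.

```python
import os
import tempfile
import time


def measure_write_latencies(path, block_size=4096, iterations=50):
    """Time a series of small synchronous writes to spot latency outliers.

    A healthy SSD should show tightly clustered latencies; a failing one
    may show occasional multi-second stalls, e.g. while internal error
    correction retries.
    """
    block = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # push the write through the OS cache to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        lats = measure_write_latencies(path)
        mean = sum(lats) / len(lats)
        worst = max(lats)
        print(f"mean {mean * 1000:.2f} ms, worst {worst * 1000:.2f} ms, "
              f"worst/mean ratio {worst / mean:.1f}x")
    finally:
        os.unlink(path)
```

On a disk like the one above, the worst/mean ratio would occasionally explode into the thousands; tools like fio can run far more sophisticated versions of this probe, but a crude script is often enough to confirm that the hardware, not the software, is misbehaving.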


Topics:
performance, optimizations, ravendb, regression

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

