Tuning Linux I/O Scheduler for SSDs


This post comes from Michael Rice at the NuoDB Techblog.

Hello Techblog readers!

I'm going to talk about tuning the Linux I/O scheduler to increase throughput and decrease latency on an SSD. I'll also cover a related topic: tuning NuoDB performance by specifying separate storage for the archive and journal directories.

Tuning the Linux I/O Scheduler

Linux gives you the option to select the I/O scheduler, and the scheduler can be changed without rebooting. You may be asking at this point, "why would I ever want to change the I/O scheduler?" Changing the scheduler makes sense when the work the default scheduler does to optimize I/O (re-ordering I/O requests) is unnecessary for the device and just adds overhead. This setting should be fine-tuned per storage device: the best setting for an SSD will not be a good setting for an HDD.

The current I/O scheduler can be viewed by typing the following command:

mrice@host:~$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
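
Since the setting is per device, a quick shell loop (just a sketch; your device names will differ) shows the active scheduler for every block device on the machine:

for f in /sys/block/*/queue/scheduler; do
    echo "$f: $(cat $f)"
done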

The current I/O scheduler (shown in brackets) for /dev/sda on this machine is CFQ, the Completely Fair Queuing scheduler. It became the default in kernel 2.6.18 and works well for HDDs. However, SSDs don't have rotating platters or magnetic heads, so the I/O optimization algorithms in CFQ don't apply to them. For an SSD, the NOOP I/O scheduler can reduce I/O latency, increase throughput, and eliminate the CPU time spent re-ordering I/O requests. This scheduler typically works well on SANs, SSDs, virtual machines, and even fancy Fusion I/O cards. At this point you're probably thinking, "OK, I'm sold! How do I change the scheduler already?" You can use the echo command as shown below:

mrice@host:~$ echo noop | sudo tee /sys/block/sda/queue/scheduler

To see the change, just cat the scheduler again.

mrice@host:~$ cat /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq 

Notice that noop is now selected in brackets. This change is only temporary and will revert to the default scheduler, CFQ in this case, when the machine reboots. To keep the setting permanently, you need to edit the Grub configuration. However, a kernel boot parameter changes the I/O scheduler for all block devices, and NOOP is not a good choice for HDDs. I'd only make the change permanent on a machine that has only SSDs.
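
If you're not sure which block devices are SSDs, the rotational flag in sysfs is a quick check: it reports 1 for a spinning disk and 0 for an SSD (the output below is just an example):

mrice@host:~$ cat /sys/block/sda/queue/rotational
0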

On Grub 2:

Edit: /etc/default/grub
Add "elevator=noop" to the GRUB_CMDLINE_LINUX_DEFAULT line.
sudo update-grub
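
For reference, the edited line in /etc/default/grub might end up looking something like this (the quiet splash options are just an example of what may already be on the line; keep whatever is there and append elevator=noop):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"

After running sudo update-grub, the new kernel parameter takes effect on the next reboot.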

At this point, you've changed the I/O scheduler to NOOP. How do you know it made a difference? You could run a benchmark and compare the numbers (just remember to flush the file-system cache between runs). The other way is to look at the output from iostat: I/O requests spend less time in the queue with the NOOP scheduler, which shows up in the "await" field. Here's an example of a large write operation with NOOP.

iostat -x 1 /dev/sda

Device:         rrqm/s    wrqm/s     r/s      w/s    rkB/s      wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00 143819.00    6.00  2881.00    24.00  586800.00   406.53     0.94    0.33    3.33    0.32   0.11  31.00
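
If you take the benchmarking route instead, one common way to flush the file-system cache between runs is to sync and drop the page cache (this clears cached data system-wide, so only do it on a test machine):

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches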

Tuning NuoDB Performance

Now that you've learned about the NOOP I/O scheduler, I'll talk about tuning NuoDB with an SSD. If you've read the tech blogs, you'll know that there are two building blocks for a NuoDB database: the Transaction Engine (TE for short) and the Storage Manager (SM). The TE is an in-memory-only copy of the database (actually a portion of the database), so an SSD won't help TE performance because the TE doesn't store atoms to disk. The SM contains two modules that write to disk: the archive and the journal. The archive stores atoms to disk when the archive configuration parameter points to a file system (versus HDFS or S3). The journal, on the other hand, synchronously writes messages to disk. If you read the blog post on durability, you may remember that the "Remote Commit with Journaling" setting provides the highest level of durability, but at the cost of speed. Using an SSD in this situation can drastically improve performance.

To tune this setting, we'll need to make a nuodb directory on the SSD:

mkdir -p /ssd/nuodb

The SSD in this example is mounted at /ssd on this machine. Easy, right? I'm assuming you've already set the Linux I/O scheduler for the SSD to NOOP. The next step is to configure NuoDB to use this path for the journal directory. The journal directly affects transaction throughput because the journal has to finish its disk write before the SM sends the ACK for the transaction commit back to the TE. What about the archive? The archive is decoupled from the transaction commit: atoms remain in memory and gradually make their way to disk, so the archive has very little effect on the TPS of the database. As a result, the archive directory can live on a regular HDD. The quick guide for tuning performance: put the journal on an SSD and the archive on an HDD.
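
Before starting the SM, it's worth double-checking that /ssd is actually mounted on the SSD and that the device is using NOOP (sdb is just an assumed device name here; substitute your own):

df -h /ssd/nuodb
cat /sys/block/sdb/queue/scheduler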

Here are the commands:

nuodbmgr --broker localhost --password bird
nuodb [domain] > start process sm
Database: hockey
Host: s1
Process command-line options: --journal enable --journal-dir /ssd/nuodb/demo-journal
Archive directory: /var/opt/nuodb/demo-archives
Initialize archive: true
Started: [SM] s1/127.0.0.1:37880 [ pid = 25036 ] ACTIVE

nuodb [domain/hockey] > start process te
Host: t1
Process command-line options: --dba-user dba --dba-password goalie
Started: [TE] t1/127.0.0.1:48315 [ pid = 25052 ] ACTIVE

The important point with this tuned configuration is that only the journal is on an SSD, so the SSD doesn't have to be one of those massive multi-TB drives. SSD prices have dropped significantly, and a single 128 GB or 256 GB SSD would be adequate for the journal data. This configuration should rival local-commit performance, which is wicked awesome considering it is the highest durability level! I encourage you to try it out and ask some questions.


