Tuning Linux I/O Scheduler for SSDs
Hello Techblog readers!
I'm going to talk about tuning the Linux I/O scheduler to increase throughput and decrease latency on an SSD. I'll also cover another interesting topic about tuning the performance of NuoDB by specifying storage for the archive and journal directories.
Tuning the Linux I/O Scheduler
Linux gives you the option to select the I/O scheduler. The scheduler can be changed without rebooting, too! You may be asking at this point, "why would I ever want to change the I/O scheduler?" Changing the scheduler makes sense when the overhead of optimizing the I/O (re-ordering I/O requests) is unnecessary and expensive. This setting should be fine-tuned per storage device. The best setting for an SSD will not be a good setting for an HDD.
The current I/O scheduler can be viewed by typing the following command:
mrice@host:~$ cat /sys/block/sda/queue/scheduler noop anticipatory deadline [cfq]
The current I/O scheduler (in brackets) for /dev/sda on this machine is CFQ, Completely Fair Queuing. This setting became the default in kernel 2.6.18 and it works well for HDDs. However, SSDs don't have rotating platters or magnetic heads. The I/O optimization algorithms in CFQ don't apply to SSDs. For an SSD, the NOOP I/O scheduler can reduce I/O latency and increase throughput as well as eliminate the CPU time spent re-ordering I/O requests. This scheduler typically works well on SANs, SSDs, Virtual Machines, and even fancy Fusion I/O cards. At this point you're probably thinking "OK, I'm sold! How do I change the scheduler already?" You can use the echo command as shown below:
mrice@host:~$ echo noop | sudo tee /sys/block/sda/queue/scheduler
To see the change, just cat the scheduler again.
mrice@host:~$ cat /sys/block/sda/queue/scheduler [noop] anticipatory deadline cfq
Notice that noop is now selected in brackets. This change is only temporary and will reset back to the default scheduler, CFQ in this case, when the machine reboots. You need to edit the Grub configuration in order to keep the setting permanently. However, this will change the I/O scheduler for all block devices. The problem is that NOOP is not a good choice for HDD. I'd only permanently change the setting if the machine only has SSDs.
On Grub 2:
Edit: /etc/default/grub Add "elevator=noop" to the GRUB_CMDLINE_LINUX_DEFAULT line. sudo update-grub
At this point, you've changed the I/O scheduler to NOOP. How do you know if it made a difference? You could run a benchmark against it and compare the numbers (just remember to flush the file-system cache). The other way is to take a look at the output from iostat. I/O requests spend less time in the queue with the NOOP I/O scheduler. This can be seen with in the "await" field from iostat. Here's an example of a larger write operation with NOOP.
iostat -x 1 /dev/sda
Tuning NuoDB Performance
Now that you've learned about the NOOP I/O scheduler, I'll talk about tuning NuoDB with an SSD. If you've read the tech blogs you'll know that there are two building blocks for a NuoDB database: the Transaction Engine, TE for short, and the Storage Manager, SM. The TE is an in-memory only copy of the database (actually a portion of the database). As a result, an SSD won't help the performance of a TE because, it doesn't store atoms to disk. The SM contains two modules that write to disk: the archive and the journal. The archive stores atoms to disk when archive configuration parameter points to a file system (versus HDFS and S3). The journal, on the other hand, synchronously writes messages to disk. If you read the blog post on durability, you may remember that the "Remote Commit with Journaling" setting provides the highest level of durability but at the cost of slower speed. Using an SSD in this situation can drastically improve performance.
To tune this setting, we'll need to make a nuodb directory on the SSD:
mkdir -p /ssd/nuodb
The SSD in this example has the mount point /ssd on this machine. Easy, right? I'm assuming you've already set the Linux I/O scheduler for the SSD to NOOP. The next step is to configure NuoDB to use this path when creating the journal directory. The journal has a direct correlation on the transaction throughput because the journal has to finish the disk write before an ACK for the transaction commit is sent by the SM back to the TE. What about the archive? The archive is decoupled from the transaction commit, the atoms will remain in memory and gradually make their way to disk. This will have very little effect on the TPS of the database. As a result, the archive directory can be placed on just a regular HDD. The quick guide for tuning performance is to put the journal on an SSD and archive can be placed on an HDD.
Here are the commands:
nuodbmgr --broker localhost --password bird nuodb [domain] > start process sm Database: hockey Host: s1 Process command-line options: --journal enable --journal-dir /ssd/nuodb/demo-journal Archive directory: /var/opt/nuodb/demo-archives Initialize archive: true Started: [SM] s1/127.0.0.1:37880 [ pid = 25036 ] ACTIVE nuodb [domain/hockey] > start process te Host: t1 Process command-line options: --dba-user dba --dba-password goalie Started: [TE] t1/127.0.0.1:48315 [ pid = 25052 ] ACTIVE
The important point with this tuned configuration is that only the journal is on an SSD. As a result, the SSD doesn't have to be one of these massive TB drives. The costs of SSDs have significantly dropped in price and a single 128 GB or 256 GB SSD would be adequate for the journal data. This configuration should rival the local commit performance which is wicked awesome considering it is the highest durability level! I encourage you to try it out and ask some questions.