I’m currently in the process of getting some benchmark numbers for a process we have, and I was watching some metrics along the way. I have mentioned before that disk speed can be affected by quite a lot of things. So here are two metrics, taken about a minute apart during the same benchmark.
This is using a Samsung PM871 512GB SSD, and it is currently running in a laptop, so not the best drive in the world, but certainly a respectable one.
Here is the steady state operation while we are doing a lot of write work. Note that the response time is very high; in computer terms, forever and a half:
And here is the same operation, but now we need to do some cleanup and push more data to the disk, in which case we get great throughput numbers.
But just look at the latency numbers that we are seeing here.
Same machine, local disk (and an SSD to boot), and we are seeing latency numbers that aren’t even funny.
In this case, the reason is that we are flushing the data file alongside the journal file. To allow the system to proceed as fast as possible, we try to parallelize the work, so even though the data file flush is currently holding most of the I/O, we are still able to proceed with minimal hiccups and stalls as far as the client is concerned.
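A minimal sketch of that idea, in Python with hypothetical file names (this is not our actual implementation, just the shape of it): the expensive data-file fsync runs on a background thread, while the foreground keeps appending to the journal.

```python
import os
import tempfile
import threading


def flush_in_background(path):
    """Run the expensive data-file fsync on a background thread,
    so the caller can keep writing to the journal in the meantime."""
    def worker():
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)  # the slow, I/O-heavy part
        finally:
            os.close(fd)

    t = threading.Thread(target=worker)
    t.start()
    return t  # join() once the flush must be known durable


# Usage: journal writes proceed while the data file is being flushed.
with tempfile.TemporaryDirectory() as d:
    data_path = os.path.join(d, "data.file")        # hypothetical names
    journal_path = os.path.join(d, "journal.file")
    with open(data_path, "wb") as f:
        f.write(b"x" * 1024 * 1024)

    flusher = flush_in_background(data_path)
    with open(journal_path, "ab") as journal:
        for i in range(100):       # client work continues with minimal stalls
            journal.write(b"txn %d\n" % i)
            journal.flush()
    flusher.join()                 # data file flush is now complete
```

The catch, of course, is that both the fsync and the journal writes still share the same physical pipe, which is exactly the problem the latency numbers above show.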
But this really brings home the fact that we are playing with a very limited pipe. There is little we can do to control the usage of that pipe at certain points (a single fsync can flush a lot of unrelated stuff), and there is no way to throttle things and let the OS know: this particular flush operation doesn’t need more than 100MB/s, and I’m fine with it taking a bit longer, as long as I have enough I/O bandwidth left for other stuff.
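Since the OS offers no such knob, one application-level workaround is to pace the flush ourselves: write and sync in small chunks, sleeping between them so the average rate stays under a budget. A minimal sketch (the 100MB/s figure and the chunk size are assumptions for illustration; on Linux the sync_file_range syscall can sync just a byte range, but it isn’t exposed in Python’s stdlib, so this uses plain os.fsync per chunk):

```python
import os
import time


def paced_write(fd, data, max_bytes_per_sec=100 * 1024 * 1024,
                chunk=4 * 1024 * 1024):
    """Write `data` in chunks, syncing each chunk and sleeping as needed so
    the average rate stays at or below max_bytes_per_sec. Note the kernel
    still decides what one fsync actually flushes; this only keeps our own
    dirty-data queue short instead of dumping it all at once."""
    start = time.monotonic()
    written = 0
    view = memoryview(data)
    while written < len(data):
        step = view[written:written + chunk]
        os.write(fd, step)
        os.fsync(fd)               # push this chunk out now, not in one big burst
        written += len(step)
        # pace: if we are ahead of the bandwidth budget, wait
        budgeted = written / max_bytes_per_sec
        elapsed = time.monotonic() - start
        if elapsed < budgeted:
            time.sleep(budgeted - elapsed)
```

This trades total flush time for a smoother load on the disk, which is exactly the deal I would like to be able to offer the OS directly.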