What is your control group?
one of the areas where we think voron can be improved is the free space utilization policy. in particular, smarter free space utilization can lead to better performance, since we won’t have to seek as much.
i spent some time working on that, and i got something that, on paper at least, looks much better performance-wise. but… actual benchmarks showed little to no improvement, and in some cases, actual degradation! that was the point when i realized that i needed some sort of a control, to see what the absolute optimal scenario would be for us. so i wrote a null free space policy. with no free space to reuse, voron will always go to the end of the file, giving us the best-case scenario of sequential writes.
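to make the idea of a control concrete, here is a minimal sketch in python (voron itself is not written in python, and every name below is hypothetical, not voron's api). it assumes the pager asks a policy for reusable pages first and only grows the file when the policy has nothing to offer. a null policy never offers anything, so every allocation lands at the end of the file.

```python
from typing import Optional


class NullFreeSpacePolicy:
    """Control policy (hypothetical): never reports any reusable free space."""

    def try_allocate(self, num_pages: int) -> Optional[int]:
        # Always "no free space", so the caller must extend the file.
        return None

    def register_free(self, page_number: int) -> None:
        # Freed pages are deliberately forgotten.
        pass


class Pager:
    """Toy allocator (hypothetical): asks the policy first, else appends."""

    def __init__(self, policy) -> None:
        self.policy = policy
        self.next_page = 0  # first page number past the current end of file

    def allocate(self, num_pages: int) -> int:
        reused = self.policy.try_allocate(num_pages)
        if reused is not None:
            return reused
        start = self.next_page
        self.next_page += num_pages
        return start
```

with this policy, `Pager(NullFreeSpacePolicy()).allocate(n)` hands out strictly increasing page numbers, which is what produces the purely sequential write pattern used as the baseline.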
this gives us the following behavior:
flush 1 with 2 pages - 8 kb writes and 1 seeks ( 2 leaves, 0 branches, 0 overflows)
flush 2 with 8 pages - 32 kb writes and 1 seeks ( 7 leaves, 1 branches, 0 overflows)
flush 3 with 10 pages - 40 kb writes and 1 seeks ( 9 leaves, 1 branches, 0 overflows)
flush 27 with 74 pages - 296 kb writes and 1 seeks ( 72 leaves, 2 branches, 0 overflows)
flush 28 with 74 pages - 296 kb writes and 1 seeks ( 72 leaves, 2 branches, 0 overflows)
flush 29 with 72 pages - 288 kb writes and 1 seeks ( 70 leaves, 2 branches, 0 overflows)
flush 1,153 with 155 pages - 620 kb writes and 1 seeks (102 leaves, 53 branches, 0 overflows)
flush 1,154 with 157 pages - 628 kb writes and 1 seeks (104 leaves, 53 branches, 0 overflows)
flush 1,155 with 165 pages - 660 kb writes and 1 seeks (108 leaves, 57 branches, 0 overflows)
flush 4,441 with 191 pages - 764 kb writes and 1 seeks (104 leaves, 87 branches, 0 overflows)
flush 4,442 with 196 pages - 784 kb writes and 1 seeks (107 leaves, 89 branches, 0 overflows)
flush 4,443 with 198 pages - 792 kb writes and 1 seeks (108 leaves, 90 branches, 0 overflows)
flush 7,707 with 200 pages - 800 kb writes and 1 seeks (106 leaves, 94 branches, 0 overflows)
flush 7,708 with 204 pages - 816 kb writes and 1 seeks (106 leaves, 98 branches, 0 overflows)
flush 7,709 with 211 pages - 844 kb writes and 1 seeks (113 leaves, 98 branches, 0 overflows)
flush 9,069 with 209 pages - 836 kb writes and 1 seeks (107 leaves, 102 branches, 0 overflows)
flush 9,070 with 205 pages - 820 kb writes and 1 seeks (106 leaves, 99 branches, 0 overflows)
flush 9,071 with 208 pages - 832 kb writes and 1 seeks (108 leaves, 100 branches, 0 overflows)
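the flush lines above are regular enough to check mechanically. the following python sketch (the regex and the helper names are illustrative assumptions, not part of voron) parses them and verifies two things that can be read off the log: each page accounts for 4 kb of writes, and the page count equals leaves plus branches plus overflows.

```python
import re
from typing import List, NamedTuple


class Flush(NamedTuple):
    number: int
    pages: int
    kb: int
    seeks: int
    leaves: int
    branches: int
    overflows: int


# Matches one flush line; counts may contain thousands separators ("1,153").
FLUSH_RE = re.compile(
    r"flush\s+([\d,]+)\s+with\s+([\d,]+)\s+pages\s+-\s+([\d,]+)\s+kb writes"
    r"\s+and\s+([\d,]+)\s+seeks\s+"
    r"\(\s*([\d,]+)\s+leaves,\s*([\d,]+)\s+branches,\s*([\d,]+)\s+overflows\)"
)


def parse_flushes(text: str) -> List[Flush]:
    """Extract every flush record found in the given log text."""
    return [
        Flush(*(int(group.replace(",", "")) for group in match.groups()))
        for match in FLUSH_RE.finditer(text)
    ]


def is_consistent(flush: Flush, page_kb: int = 4) -> bool:
    """True when the kb written and the page breakdown both add up."""
    return (
        flush.kb == flush.pages * page_kb
        and flush.pages == flush.leaves + flush.branches + flush.overflows
    )


sample = (
    "flush 1 with 2 pages - 8 kb writes and 1 seeks ( 2 leaves, 0 branches, 0 overflows)\n"
    "flush 1,153 with 155 pages - 620 kb writes and 1 seeks (102 leaves, 53 branches, 0 overflows)"
)
flushes = parse_flushes(sample)
```

the interesting column is the seek count: it stays at 1 for every flush, while the branch share of each flush grows as the tree gets deeper.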
and with this, 10,000 transactions with 100 random values each:
fill rnd buff separate tx : 106,383 ms 9,400 ops / sec
and that tells me that even in the best-case scenario, something else is causing this problem, and it ain’t the cost of doing seeks. i dropped the number of transactions to 500 and ran it through a profiler, and i got the following:
in other words, pretty much the entire time was spent just calling FlushViewOfFile. however, i think that we optimized that enough already, didn’t we? looking at the calls, it seems that we have just one FlushViewOfFile per transaction in this scenario.
in fact, looking at the actual system behavior, we can see:
so seeks-wise, we're good. what i can’t understand, however, is why we see those ReadFile calls. looking at the data, it appears that we run into this whenever we access a new portion of the file, so this is the mmap subsystem paging the file contents into memory before we start writing to it. it is actually pretty great that it is able to page 1 mb at a time.
next, let us see what else we can do here. i ran the 500 tx test on an hdd and it gave me the following result:
fill rnd sync separate tx : 25,540 ms 1,958 ops / sec
but note that each commit has two writes: one at the end of the file, and one at the beginning of the file (which is the actual final act of the commit). what would happen if we just removed that part?
this gave me a very different number:
fill rnd sync separate tx : 21,764 ms 2,297 ops / sec
so just seeking and writing a single page costs us a lot: throughput is about 17% higher without it (the run takes about 15% less time). here are the details from running this test:
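the arithmetic behind these numbers is easy to reproduce. the sketch below assumes, as the reported figures imply, that the 500 transaction runs also wrote 100 values per transaction; the function name is made up for illustration.

```python
def ops_per_sec(transactions: int, values_per_tx: int, elapsed_ms: float) -> float:
    """Throughput in operations per second for a benchmark run."""
    return transactions * values_per_tx / (elapsed_ms / 1000.0)


# 10,000 tx baseline run with the null free space policy
baseline = ops_per_sec(10_000, 100, 106_383)

# 500 tx runs on the hdd, with and without the header write
with_header = ops_per_sec(500, 100, 25_540)
without_header = ops_per_sec(500, 100, 21_764)

# Relative cost of the header write, seen two ways
throughput_gain = without_header / with_header - 1  # about 0.17
time_saved = 1 - 21_764 / 25_540                    # about 0.15

print(round(baseline), round(with_header), round(without_header))
print(f"{throughput_gain:.1%} more throughput, {time_saved:.1%} less time")
```

the two percentages differ because one is measured against the slower run's throughput and the other against its elapsed time; both describe the same single-page header write.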
now, this is a meaningless test, added just to check what the relative costs are. we have to do the header write, otherwise we can’t do real transactions.
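as a rough illustration of why the header write is both expensive and necessary, here is a python sketch of a two-step commit. this is not voron's file format: the header layout, the page size constant and the function names are all assumptions made for the example. the data pages are appended at the end of the file, and only after they are flushed does the commit seek back to page 0 and overwrite the header that makes them visible.

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096               # assumed page size
HEADER = struct.Struct("<QQ")  # hypothetical header: (tx id, total pages in file)


def commit(fd: int, tx_id: int, pages: list) -> None:
    """Append the pages, flush, then publish them by rewriting the header page."""
    assert all(len(page) == PAGE_SIZE for page in pages)
    # Write 1: sequential write at the end of the file (page 0 is reserved).
    end = max(os.lseek(fd, 0, os.SEEK_END), PAGE_SIZE)
    os.lseek(fd, end, os.SEEK_SET)
    os.write(fd, b"".join(pages))
    os.fsync(fd)
    # Write 2: seek back to the start of the file and overwrite the header.
    total_pages = end // PAGE_SIZE + len(pages)
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, HEADER.pack(tx_id, total_pages).ljust(PAGE_SIZE, b"\0"))
    os.fsync(fd)


def read_header(fd: int) -> tuple:
    """Return (tx id, total pages) as recorded by the last completed commit."""
    os.lseek(fd, 0, os.SEEK_SET)
    return HEADER.unpack(os.read(fd, HEADER.size))


def demo() -> tuple:
    """Run two commits against a scratch file and return the final header."""
    with tempfile.TemporaryDirectory() as directory:
        fd = os.open(os.path.join(directory, "data.bin"), os.O_RDWR | os.O_CREAT)
        try:
            commit(fd, 1, [b"a" * PAGE_SIZE] * 2)
            commit(fd, 2, [b"b" * PAGE_SIZE] * 3)
            return read_header(fd)
        finally:
            os.close(fd)
```

if the process dies between the two writes, the header still describes the previous transaction, so the half-written pages at the end are simply ignored. dropping the second write removes the seek, but it also removes the point at which a transaction becomes durable.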
for fun, i ran the same thing using sequential writes, giving me 3,619 ops / sec. since in both cases we are actually doing sequential writes, the major difference was how much we actually wrote. this is the view of writing sequentially:
as you can see, we only have to write 8 to 10 pages per transaction, compared to 110 to 130 in the random case. and that obviously has a lot of implications.
all of this has taught me something very important. in the end, the actual free space policy matters, but not that much. so i need to select something that is good, but that is about it.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.