[This article was written by Federico Lois]
In the previous post on the topic, after inspecting the decompiled source with ILSpy, we uncovered several potential optimizations.
Do you remember the loop from a couple of posts back?
This loop was responsible for checking the last bytes, either because the word comparison found a difference or because we reached the end of the memory block and the final chunk is not 8 bytes wide. You may remember it more easily because of the awful code it generated.
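To make the two exit paths concrete, here is a minimal sketch of the overall shape in C (pointer-based C# reads almost the same). It is not the original code: the function name `compare_blocks` and the use of `memcpy` for the 8-byte loads are assumptions made for illustration. The word phase skips identical 8-byte chunks; the byte phase is the "last bytes" loop, reached either when a word differed or when fewer than 8 bytes remain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: compare two buffers 8 bytes at a time, then fall back
 * to a byte-by-byte loop. Returns <0, 0 or >0 like memcmp. */
int compare_blocks(const unsigned char *lhs, const unsigned char *rhs, size_t n)
{
    /* Word phase: skip identical 8-byte chunks. */
    while (n >= 8) {
        uint64_t a, b;
        memcpy(&a, lhs, sizeof a);   /* memcpy sidesteps unaligned-access issues */
        memcpy(&b, rhs, sizeof b);
        if (a != b)
            break;                   /* the difference lies within these 8 bytes */
        lhs += 8;
        rhs += 8;
        n -= 8;
    }

    /* Byte phase: the "last bytes" loop. It runs either to locate the first
     * differing byte of a mismatched word, or over a tail shorter than 8. */
    while (n > 0) {
        int r = *lhs - *rhs;         /* bytes promote to int, so the sign is meaningful */
        if (r != 0)
            return r;
        lhs++;
        rhs++;
        n--;
    }
    return 0;
}
```

Either way, every call whose length is not a multiple of 8, and every call that finds a difference, ends up in the byte loop, which is why its cost matters.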
After the optimization work, that loop looked noticeably different:
We can also double-check that the optimization did produce compact MSIL.
Now let's look at the different operations involved. The instructions highlighted in red, blue and orange are unavoidable, since we need to perform the comparison in order to return the result.
That is 13 instructions doing the actual work and 14 for everything else, so roughly half of the operations are housekeeping to prepare for the next iteration.
An experienced developer would notice that, had we looked at the JIT output instead, each of the increments and decrements could be implemented with a single assembly instruction using the appropriate addressing mode. However, we shouldn't underestimate the impact of the memory loads and stores involved.
What would this loop look like in pseudo-assembler?
```
:START
    Load address of lhs into register
    Load address of R into raddr-inregister
    Move value of lhs into R
    Load byte into a 32-bit register (lhs-inregister)
    Subtract rhs-inregister, [raddr-inregister]
    Load int32 from R into r-inregister
    Jump :WEAREDONE if non zero r-inregister
    Load address of lhs into lhsaddr-register
    Add 4 into [lhsaddr-register] (immediate-mode)
    Load address of rhs into rhsaddr-register
    Add 4 into [rhsaddr-register] (immediate-mode)
    Load address of n into naddr-register
    Decrement [naddr-register]
    Load content of n into n-register
    Jump :START if bigger than zero n-register
    Push 0
    Return
:WEAREDONE
    Push r-inregister
    Return
```
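One way to see why the housekeeping is expensive is to write a loop in C where the variables are forced to live in memory, as they do in the pseudo-assembler. This is an illustrative sketch, not the original code: the name `tail_compare_in_memory` is hypothetical, byte pointers are assumed, and the `volatile` qualifiers are only a device to make every read and write of `l`, `r`, `count` and `diff` go through memory instead of staying in registers.

```c
#include <stddef.h>

/* Hypothetical sketch: a byte-compare loop whose locals are volatile, so
 * each access is a real load or store, similar to the pseudo-assembler. */
int tail_compare_in_memory(const unsigned char *lhs, const unsigned char *rhs, long n)
{
    const unsigned char *volatile l = lhs;  /* pointer kept in a memory slot */
    const unsigned char *volatile r = rhs;
    volatile long count = n;
    volatile int diff;

    while (count > 0) {
        /* Work: load both bytes, subtract, store the result, reload and test it. */
        diff = *l - *r;
        if (diff != 0)
            return diff;

        /* Housekeeping: each update is a load, a modify and a store. */
        l = l + 1;
        r = r + 1;
        count = count - 1;
    }
    return 0;
}
```

Compiling this with optimizations enabled and inspecting the output shows the loads and stores that `volatile` preserves; removing the qualifiers typically lets the compiler keep everything in registers and fold each update into a single instruction.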
As you can see, the housekeeping still accounts for a lot of operations.
Armed with this knowledge, how would you optimize this loop?