In one of my recent posts about performance, a suggestion was raised:
Just spotted a small thing, you could optimise the call to:
_buffer[pos++] = (byte)'/';
with a constant as it's always the same.
There are two problems with this suggestion. Let us start with the obvious one. Here is the disassembly of both versions:
b = (byte) '/';
00007FFC9DC84548 mov rcx,qword ptr [rbp+8]
00007FFC9DC8454C mov byte ptr [rcx],2Fh
b = 47;
00007FFC9DC8454F mov rcx,qword ptr [rbp+8]
00007FFC9DC84553 mov byte ptr [rcx],2Fh
As you can see, both versions produce exactly the same instructions: the value is stored directly as the immediate 2Fh (47).
That is because (byte)'/' is a constant expression, which the C# compiler evaluates at compile time. We are no longer using compilers that had 4KB of memory to work with and required hand-holding and intimate familiarity with how the specific compiler version we wrote the code for behaved.
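A minimal C# sketch to check this yourself: because (byte)'/' is a constant expression, it can initialize a const field, which only compiles if the compiler evaluates it up front. The class and field names are illustrative and not taken from the original code.

```csharp
using System;

public static class ConstantFoldingDemo
{
    // (byte)'/' is a compile-time constant, so it is legal as a const initializer.
    // If the compiler could not evaluate it up front, this line would not compile.
    public const byte Slash = (byte)'/';

    // The "optimized" numeric form suggested in the comment.
    public const byte SlashNumeric = 47;

    public static void Main()
    {
        var buffer = new byte[2];
        int pos = 0;
        buffer[pos++] = (byte)'/'; // readable form
        buffer[pos++] = 47;        // numeric form, same stored value

        Console.WriteLine(buffer[0] == buffer[1]); // True
        Console.WriteLine(Slash == SlashNumeric);  // True
    }
}
```

Both forms end up as the same constant in the compiled output, so the only difference between them is how easy they are to read.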
The other problem is closely related. I've been working with code for the past 20 years, and while I remember the ASCII codes for some characters, when I read b = 47 I have to go and look it up. That puts a really heavy burden on anyone reading a parser, since a parser is made up almost entirely of this kind of code.
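To illustrate the readability point, here is a hypothetical escaping routine of the kind a serializer or parser is full of. WriteEscaped is an illustrative name, not from any library, and the sketch assumes ASCII input. Written with char literals, each case says what it emits; the numeric equivalent would read buffer[pos++] = 92; buffer[pos++] = 34; and send the reader off to an ASCII table.

```csharp
using System;
using System.Text;

public static class EscapeDemo
{
    // Hypothetical helper: writes a backslash-escaped copy of `input` into `buffer`
    // and returns the number of bytes written. Assumes ASCII input and a large enough buffer.
    public static int WriteEscaped(string input, byte[] buffer)
    {
        int pos = 0;
        foreach (char c in input)
        {
            switch (c)
            {
                case '"':
                    buffer[pos++] = (byte)'\\'; // compiles to the constant 92
                    buffer[pos++] = (byte)'"';  // compiles to the constant 34
                    break;
                case '\\':
                    buffer[pos++] = (byte)'\\';
                    buffer[pos++] = (byte)'\\';
                    break;
                case '\n':
                    buffer[pos++] = (byte)'\\';
                    buffer[pos++] = (byte)'n';
                    break;
                default:
                    buffer[pos++] = (byte)c; // plain ASCII character, copied as-is
                    break;
            }
        }
        return pos;
    }

    public static void Main()
    {
        var buffer = new byte[64];
        int len = WriteEscaped("a\"b\\c\n", buffer);
        Console.WriteLine(Encoding.ASCII.GetString(buffer, 0, len));
    }
}
```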
I ran into this recently when I looked at the Smaz library. I ported it to C# and, along the way, made sure that it was much more understandable (at least in my opinion). This resulted in a totally unexpected pull request that ported my C# port to Java. Making the code more readable made it accessible and possible to work with, whereas before it was an impenetrable black box.
Consider what this means for larger projects, where there are large sections marked with "there be dragons and gnarly bugs". That kind of code really kills the productivity of both systems and teams.
In the case of the Smaz library port, because the code was easier to work with, Peter was able not just to port it to Java but also to repurpose it into a useful utility for compressing MIME types very efficiently.