Platinum Partner
dotnet,architects,opinion,high-perf,performance

Doing Micro-Benchmarking Wrong for the Wrong Reasons

This is a short excerpt from Chapter 2 (Performance Measurement) of Pro .NET Performance, scheduled to appear in August 2012. I might be publishing a few more of these before and after the book is out.

The purpose is to determine “which is faster”—using the is keyword and then casting to the desired type, or using the as keyword and relying on the result. Here’s a test class for this comparison I’ve seen lately:

//Test class
class Employee {
  public void Work() {}
}

//Fragment 1 – casting safely and then checking for null
static void Fragment1(object obj) {
  Employee emp = obj as Employee;
  if (emp != null) {
    emp.Work();
  }
}

//Fragment 2 – first checking the type and then casting
static void Fragment2(object obj) {
  if (obj is Employee) {
    Employee emp = obj as Employee;
    emp.Work();
  }
}

A rudimentary benchmarking framework might go along the following lines:

static void Main() {
  object obj = new Employee();
  Stopwatch sw = Stopwatch.StartNew();
  for (int i = 0; i < 500; i++) {
    Fragment1(obj);
  }
  Console.WriteLine(sw.ElapsedTicks);
  sw = Stopwatch.StartNew();
  for (int i = 0; i < 500; i++) {
    Fragment2(obj);
  }
  Console.WriteLine(sw.ElapsedTicks);
}

This is not a convincing microbenchmark, although the results are fairly reproducible. More often than not, the output is 4 ticks for the first loop and 200-400 ticks for the second loop. This might lead to the conclusion that the first fragment is 50-100 times faster. However, there are significant errors in this measurement and the conclusion that stems from it:

  1. The loop runs only once and 500 iterations are not enough to reach any meaningful conclusions—it takes a negligible amount of time to run the whole benchmark, so it can be affected by many environmental factors.
  2. No effort was made to prevent optimization, so the JIT compiler may have inlined and discarded both measurement loops completely.
  3. The Fragment1 and Fragment2 methods measure not only the cost of the is and as keywords, but also the cost of a method invocation (to the FragmentN method itself!). It may be the cast that invoking the method is significantly more expensive than the rest of the work.

Improving upon these problems, the following microbenchmark more closely depicts the actual costs of both operations:

class Employee {
  //Prevent inlining this method
  [MethodImpl(MethodImplOptions.NoInlining)]
  public void Work() {}
}

static void Measure(object obj) {
  const int OUTER_ITERATIONS = 10;
  const int INNER_ITERATIONS = 100000000;
  for (int i = 0; i < OUTER_ITERATIONS; ++i) {
    Stopwatch sw = Stopwatch.StartNew();
    for (int j = 0; j < INNER_ITERATIONS; ++j) {
      Employee emp = obj as Employee;
      if (emp != null)
        emp.Work();
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
  }
  for (int i = 0; i < OUTER_ITERATIONS; ++i) {
    Stopwatch sw = Stopwatch.StartNew();
    for (int j = 0; j < INNER_ITERATIONS; ++j) {
      if (obj is Employee) {
        Employee emp = obj as Employee;
        emp.Work();
      }
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
  }
}

The results on desktop (after discarding the first iteration) were around 410ms for the first loop and 440ms for the second loop, a reliable and reproducible performance difference, which might have you convinced that indeed, it’s more efficient to use just the as keyword for casts and checks.

However, the riddles aren’t over yet. If we add the virtual modifier to the Work method, the performance difference disappears completely, even if we increase the number of iterations. This cannot be explained by the virtues or maladies of our micro-benchmarking framework—it is a result from the problem domain. There is no way to understand this behavior without going to the assembly language level and inspecting the loop generated by the JIT compiler in both cases. First, before the virtual modifier:

; Disassembled loop body – the first loop
mov edx,ebx
mov ecx,163780h ;MT of Employee
call clr!JIT_IsInstanceOfClass
test eax,eax
je WRONG_TYPE
mov ecx,eax
call dword ptr ds:[163774h] ;Employee.Work()
WRONG_TYPE:

; Disassembled loop body – the second loop
mov edx,ebx
mov ecx,163780h ;MT of Employee
call clr!JIT_IsInstanceOfClass
test eax,eax
je WRONG_TYPE
mov ecx,ebx
cmp dword ptr [ecx],ecx
call dword ptr ds:[163774h] ;Employee.Work()
WRONG_TYPE:

The instruction sequence emitted by the JIT compiler to call a non-virtual method and to call a virtual method was discussed in one of my posts more than five years ago. When calling a non-virtual method, the JIT compiler must emit an instruction that makes sure we are not making a method call on a null reference. The CMP instruction in the second loop serves that task. In the first loop, the JIT compiler is smart enough to optimize this check away, because immediately prior to the call, there is a null reference check on the result of the cast (if (emp != null) ...). In the second loop, the JIT compiler’s optimization heuristics are not sufficient to optimize the check away (although it would have been just as safe), and this extra instruction is responsible for the extra 7-8% of performance overhead.

After adding the virtual modifier, however, the JIT compiler generates exactly the same code in both loop bodies:

; Disassembled loop body – both cases
mov edx,ebx
mov ecx,1A3794h ;MT of Employee
call clr!JIT_IsInstanceOfClass
test eax,eax
je WRONG_TYPE
mov ecx,eax
mov eax,dword ptr [ecx]
mov eax,dword ptr [eax+28h]
call dword ptr [eax+10h]
WRONG_TYPE:

The reason is that when invoking a virtual method, there’s no need to perform a null reference check explicitly—it is inherent in the method dispatch sequence. When the loop bodies are identical, so are the timing results.

This small example should not dissuade you from ever writing your own micro-benchmarks. However, the worst performance optimization is one that is based on incorrect measurements; unfortunately, manual benchmarking often leads into this trap.

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.

{{ tag }}, {{tag}},

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}
{{ parent.authors[0].realName || parent.author}}

{{ parent.authors[0].tagline || parent.tagline }}

{{ parent.views }} ViewsClicks
Tweet

{{parent.nComments}}