De-Virtualization in CoreCLR: Part II

How can you get the CLR to use the cheapest kind of method call for exactly what you are doing?

In my previous post, I discussed how the CLR handles various method calls, depending on exactly what we are doing (interface dispatch, virtual method call, and struct method call).

That was all fun and games, but how can we actually put this into practice? Let's take a look at a generic method and how it is actually translated to machine code:

Run<IActor>( ... ) ;
/*
 sub         rsp,28h  
 mov         rcx,r8  
 mov         r11,7FFACA010020h  
 cmp         dword ptr [rcx],ecx  
 call        qword ptr [r11]  
 nop  
 add         rsp,28h  
 ret  
*/


Run<ActorStruct>(...);
/*
 sub         rsp,28h  
 mov         rcx,27AB5E33068h  
 mov         rcx,qword ptr [rcx]  
 call        00007FFACA160750  
 nop  
 add         rsp,28h  
 ret  
*/

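In the case of an interface, we go through the standard virtual stub to dispatch the method: the target address is loaded into r11 and the call is made indirectly through it. In the case of a struct, the JIT compiles a version of the generic method that is specialized for that exact type, so it knows the call target up front. The listing shows a direct call to a fixed address, with no stub in between, and a call that the JIT can resolve this way is also one that it can inline.

I’ll let that sink in for a second. Using a struct generic argument, we were able to turn an interface dispatch into a direct call that the JIT is able to inline.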
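The listings above do not include the source behind them, so here is a minimal sketch of what such code might look like. Only the names `Run`, `IActor`, and `ActorStruct` come from the calls shown; the `Act` method, its signature, the `ActorClass` type, and the `Dispatch` container are assumptions made for illustration.

```csharp
using System;

// Hypothetical interface: the original listing only shows the type name.
public interface IActor
{
    int Act(int input);
}

// A reference-type implementation, used when calling Run<IActor>(...).
public sealed class ActorClass : IActor
{
    public int Act(int input) => input + 1;
}

// A value-type implementation, used when calling Run<ActorStruct>(...).
public struct ActorStruct : IActor
{
    public int Act(int input) => input + 1;
}

public static class Dispatch
{
    // Run<IActor>: T is a reference type, so the JIT uses code shared across
    // reference types and actor.Act goes through interface dispatch.
    // Run<ActorStruct>: the JIT compiles a body specific to the struct, so the
    // exact Act target is known at JIT time and the value is not boxed.
    public static int Run<T>(T actor, int input) where T : IActor
    {
        return actor.Act(input);
    }
}
```

Calling `Dispatch.Run<IActor>(new ActorClass(), 41)` and `Dispatch.Run<ActorStruct>(new ActorStruct(), 41)` returns the same value from the same source. The only difference is the type argument, and that is what decides which kind of call the JIT emits.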

Remember the previous post, where we talked about the cost of method dispatch in terms of the number of instructions, memory jumps, and references? We now have a way to replace those invocation costs with inlinable code.

When is this going to be useful? This technique is most beneficial when we are talking about code that is used a lot and is relatively small/efficient already, to the point where the cost of calling it is a large part of the execution time.

One situation that comes to mind and fits this scenario exactly is computing hash codes and checking equality in a dictionary. The number of such calls that we make is in the many billions per second, and the cost of indirection here is tremendous. We have our own dictionary implementation (with different defaults and design guidelines than the default one), and it is also meant to be used with this approach.

In other words, instead of passing an EqualityComparer instance, we pass the EqualityComparer as a struct generic parameter. That lets the JIT specialize the dictionary code for that comparer and inline the hash/equals calls. In a very hot code path, that can drastically reduce costs.
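To make the idea concrete, here is a rough sketch of a hash map that takes its comparer as a struct type parameter. This is not the dictionary implementation mentioned above, which is not shown in this post. `StructComparerMap`, `LongComparer`, and the simple linear-probing layout are all hypothetical and chosen only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct comparer. Because it is a value type, the JIT compiles a
// separate copy of the map code for it and can call these methods directly.
public struct LongComparer : IEqualityComparer<long>
{
    public bool Equals(long x, long y) => x == y;
    public int GetHashCode(long value) => value.GetHashCode();
}

// Hypothetical open-addressing map; the comparer is a type argument, not a field
// of interface type, so no interface dispatch is needed to reach it.
public sealed class StructComparerMap<TKey, TValue, TComparer>
    where TComparer : struct, IEqualityComparer<TKey>
{
    private struct Entry
    {
        public bool Used;
        public TKey Key;
        public TValue Value;
    }

    private Entry[] _entries = new Entry[16]; // length is always a power of two
    private int _count;
    private TComparer _comparer = default;    // stateless struct, no allocation

    public int Count => _count;

    public void Set(TKey key, TValue value)
    {
        // Keep the table at most half full so probing always finds a free slot.
        if ((_count + 1) * 2 > _entries.Length) Grow();
        if (Insert(_entries, key, value)) _count++;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int mask = _entries.Length - 1;
        int i = _comparer.GetHashCode(key) & mask;
        while (_entries[i].Used)
        {
            if (_comparer.Equals(_entries[i].Key, key))
            {
                value = _entries[i].Value;
                return true;
            }
            i = (i + 1) & mask; // linear probing
        }
        value = default;
        return false;
    }

    // Returns true if a new key was added, false if an existing one was updated.
    private bool Insert(Entry[] table, TKey key, TValue value)
    {
        int mask = table.Length - 1;
        int i = _comparer.GetHashCode(key) & mask;
        while (table[i].Used)
        {
            if (_comparer.Equals(table[i].Key, key))
            {
                table[i].Value = value;
                return false;
            }
            i = (i + 1) & mask;
        }
        table[i] = new Entry { Used = true, Key = key, Value = value };
        return true;
    }

    private void Grow()
    {
        var bigger = new Entry[_entries.Length * 2];
        foreach (var e in _entries)
            if (e.Used) Insert(bigger, e.Key, e.Value);
        _entries = bigger;
    }
}
```

A caller would write `new StructComparerMap<long, string, LongComparer>()` and never pass a comparer instance. The trade-off is that every distinct comparer type gets its own compiled copy of the map code, which is the price of letting the JIT see the exact `GetHashCode` and `Equals` targets.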


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
