Obtaining Reliable Thread Call Stacks of 64-bit Processes

The x64 calling convention is a great improvement over the state of affairs in x86, and few would argue otherwise. After all, remembering the differences between __stdcall and __cdecl, when to use each, which API defaults to which calling convention, and which specific variation of __fastcall JIT compilers use when given the choice is not the best use of developer time, and it does nothing for debugging productivity.

With that said, the x64 calling convention often makes it very difficult to retrieve parameter values from the call stack if you don't have private symbols for the relevant frame. In a nutshell, the problem is that the x64 calling convention passes the first four integer or pointer parameters in volatile registers (RCX, RDX, R8, and R9), which the callee is then free to modify. Often enough, the compiler spills parameters from volatile registers to a predefined location on the stack when these registers must be used for another purpose. In other cases, however, parameter values vanish without a trace, which makes stack reconstruction exceptionally difficult, especially when you're dealing with a dump file rather than a live process in which you can set breakpoints and examine the context at any point.

But, enough said: let's take a look at an example where we're interested in obtaining parameter values from the stack. In this case, we have a UI thread that called the WaitForMultipleObjects API, and we're interested in the first two parameters passed to WaitForMultipleObjects: the number of synchronization objects for which the thread is waiting, and the array of handles to these objects. A first attempt involves the kb command, which takes a guess at what the function's parameters are by dumping out the first four QWORDs stored on the stack immediately after the frame's return address (the space reserved for spilling the register parameters):

0:000> kb
RetAddr           : Args to Child                                                           : Call Site
000007f9`346212d2 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForMultipleObjects+0xa
000007f9`368e1292 : 00000000`00000001 000007f6`1cfc9000 00000000`00000001 000000b9`e75faa68 : KERNELBASE!WaitForMultipleObjectsEx+0xe5
000007f6`1d3e1da9 : 00000000`00000001 cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc : KERNEL32!WaitForMultipleObjects+0x12
000007f9`1f710c37 : 000000b9`e5cff3b0 000000b9`e5cfeba0 000000b9`e5cfe2b8 cccccccc`cccccccc : BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+0x159
...snipped for brevity...

These values are, of course, utter nonsense. If we had the source code for the BatteryMeter module, we could inspect it and try to identify the parameters. Without source code, however, we must resort to disassembling the function around the call to WaitForMultipleObjects:

0:000> uf BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged
  ...snipped for brevity...
  185 000007f6`1d3e1d8d 41b9ffffffff    mov     r9d,0FFFFFFFFh
  185 000007f6`1d3e1d93 41b801000000    mov     r8d,1
  185 000007f6`1d3e1d99 488d542448      lea     rdx,[rsp+48h]
  185 000007f6`1d3e1d9e b904000000      mov     ecx,4
  185 000007f6`1d3e1da3 ff1597df0200    call    qword ptr [BatteryMeter!_imp_WaitForMultipleObjects (000007f6`1d40fd40)]
  ...snipped for brevity...

Note that this is indeed the call site: the CALL instruction (which is six bytes long: ff1597df0200) is at 000007f6`1d3e1da3, whereas the return address recorded for WaitForMultipleObjects is 000007f6`1d3e1da9, six bytes later.
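The call-site check above is simple address arithmetic, and it can be sketched in a few lines of Python. This is only an illustration built from the values in the listing; the helper names are made up for the example. It also decodes the `FF 15 disp32` encoding (an indirect call through a RIP-relative memory operand) to confirm that the displacement points at the import slot the debugger printed.

```python
# Values copied from the uf listing and the kb output in this article.
CALL_ADDRESS = 0x000007F61D3E1DA3        # address of the CALL instruction
CALL_BYTES = bytes.fromhex("ff1597df0200")  # its six-byte encoding
RETURN_ADDRESS = 0x000007F61D3E1DA9      # RetAddr shown for WaitForMultipleObjects
IMPORT_SLOT = 0x000007F61D40FD40         # _imp_WaitForMultipleObjects


def next_instruction(address: int, encoding: bytes) -> int:
    """Address of the instruction that follows the one at `address`."""
    return address + len(encoding)


def rip_relative_operand(address: int, encoding: bytes) -> int:
    """Memory operand of `call qword ptr [rip+disp32]` (opcode FF 15)."""
    if encoding[:2] != b"\xff\x15":
        raise ValueError("not an FF 15 indirect call")
    displacement = int.from_bytes(encoding[2:6], "little", signed=True)
    # RIP-relative addressing is based on the address of the next instruction.
    return next_instruction(address, encoding) + displacement


print(hex(next_instruction(CALL_ADDRESS, CALL_BYTES)))      # 0x7f61d3e1da9
print(hex(rip_relative_operand(CALL_ADDRESS, CALL_BYTES)))  # 0x7f61d40fd40
```

Both results agree with the debugger output: the first is the return address, and the second is the import table slot named in the disassembly.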

Once we have the call site, we can recall the parameter order in the x64 calling convention. Specifically, WaitForMultipleObjects has four parameters: the number of synchronization objects (a DWORD), the array of handles, a Boolean indicating whether to wait for all the objects to become signaled or any of them, and finally a timeout (a DWORD). These parameters are passed in the ECX, RDX, R8D, and R9D registers, respectively. (Recall that RnD is an alias for the least-significant 32 bits of the 64-bit Rn register.)
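As a rough sketch of that parameter order, the following Python snippet pairs parameter positions with the registers the convention assigns to them. The parameter names (nCount, lpHandles, bWaitAll, dwMilliseconds) come from the Windows API documentation rather than from this article, and the function name is hypothetical. Floating-point parameters, which use XMM registers, are deliberately left out.

```python
# Registers used for the first four integer or pointer parameters on x64 Windows.
INTEGER_ARG_REGISTERS = ("rcx", "rdx", "r8", "r9")


def assign_parameter_locations(parameter_names):
    """Map each integer/pointer parameter to a register, or to the stack.

    Simplification: every parameter is assumed to be an integer or pointer.
    """
    locations = {}
    for index, name in enumerate(parameter_names):
        if index < len(INTEGER_ARG_REGISTERS):
            locations[name] = INTEGER_ARG_REGISTERS[index]
        else:
            locations[name] = "stack"
    return locations


# Parameter names as documented for WaitForMultipleObjects (an assumption here).
wait_parameters = ["nCount", "lpHandles", "bWaitAll", "dwMilliseconds"]
for name, location in assign_parameter_locations(wait_parameters).items():
    print(f"{name:15} -> {location}")
```

This matches the four instructions before the CALL in the listing: ECX receives the count, RDX the array address, R8D the wait-all flag, and R9D the timeout.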

At this point we know the number of objects in the array -- it is a constant, 4. Furthermore, even though the RDX register was probably clobbered by the callee, we can still determine the address of the array, because the LEA instruction computed it as RSP+48h in the caller's frame. To find the value of RSP, we can use the k command:

0:000> k
Child-SP          RetAddr           Call Site
000000b9`e5cfdbd8 000007f9`346212d2 ntdll!NtWaitForMultipleObjects+0xa
000000b9`e5cfdbe0 000007f9`368e1292 KERNELBASE!WaitForMultipleObjectsEx+0xe5
000000b9`e5cfdec0 000007f6`1d3e1da9 KERNEL32!WaitForMultipleObjects+0x12
000000b9`e5cfdf00 000007f9`1f710c37 BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+0x159
...snipped for brevity...

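The Child-SP value for the OnCPUSelectorChanged frame, 000000b9`e5cfdf00, is the caller's RSP at the time of the call. Adding 48h to it gives the address of the array, so we can now inspect the handles themselves: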

0:000> dq 000000b9`e5cfdf00+48 L4
000000b9`e5cfdf48  00000000`00000118 00000000`00000120
000000b9`e5cfdf58  00000000`00000128 00000000`0000012c
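The same computation can be written out as a small Python sketch. The stack pointer, offset, and count are taken from the k output and the disassembly; the `raw_memory` buffer is a stand-in rebuilt from the dq output, since there is no dump file to read here, and the function name is hypothetical.

```python
import struct

CALLER_RSP = 0x000000B9E5CFDF00  # Child-SP of OnCPUSelectorChanged (k output)
ARRAY_OFFSET = 0x48              # from: lea rdx,[rsp+48h]
HANDLE_COUNT = 4                 # from: mov ecx,4

array_address = CALLER_RSP + ARRAY_OFFSET
print(hex(array_address))  # 0xb9e5cfdf48

# Stand-in for the 32 bytes at array_address, rebuilt from the dq output.
raw_memory = struct.pack("<4Q", 0x118, 0x120, 0x128, 0x12C)


def read_handles(memory: bytes, count: int) -> list:
    """Decode `count` little-endian 64-bit handle values from raw memory."""
    return list(struct.unpack_from(f"<{count}Q", memory))


print([hex(handle) for handle in read_handles(raw_memory, HANDLE_COUNT)])
# ['0x118', '0x120', '0x128', '0x12c']
```

Handles are pointer-sized, which is why each array element is read as a QWORD on x64.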

...or even ask the debugger to print out the handle information for the handles in the array:

0:000> .foreach /pS 1 /ps 1 (h {dq /c 1 000000b9`e5cfdf00+48 L4}) {!handle h f}
Handle 118
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
  HandleCount   3
  PointerCount  786412
  Name          <none>
  Object Specific Information
    Thread Id   1940.1158
    Priority    12
    Base Priority 0
    Start Address 1d3e1fa0 BatteryMeter!CPUSelectorThread
Handle 120
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
  HandleCount   3
  PointerCount  786416
  Name          <none>
  Object Specific Information
    Thread Id   1940.12d0
    Priority    10
    Base Priority 0
    Start Address 1d3e1ff0 BatteryMeter!HardwareChangeDetectorThread
Handle 128
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
  HandleCount   3
  PointerCount  786420
  Name          <none>
  Object Specific Information
    Thread Id   1940.1220
    Priority    12
    Base Priority 0
    Start Address 1d3e2040 BatteryMeter!LocationAwarenessThread
Handle 12c
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
  HandleCount   3
  PointerCount  786424
  Name          <none>
  Object Specific Information
    Thread Id   1940.1814
    Priority    10
    Base Priority 0
    Start Address 1d3e20a0 BatteryMeter!TemperaturePropagationThread
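The .foreach command above relies on token skipping: as I understand the documented options, /pS 1 skips one token before the first iteration and /ps 1 skips one token after every processed token. With dq /c 1 printing one address and one value per line, this leaves only the values. The Python sketch below imitates that selection; the sample text is what dq /c 1 would presumably print for this array (it is not captured output), and the function name is hypothetical.

```python
# Presumed output of: dq /c 1 000000b9`e5cfdf00+48 L4 (one QWORD per line).
DQ_OUTPUT = """\
000000b9`e5cfdf48  00000000`00000118
000000b9`e5cfdf50  00000000`00000120
000000b9`e5cfdf58  00000000`00000128
000000b9`e5cfdf60  00000000`0000012c
"""


def select_tokens(text: str, initial_skip: int, skip: int) -> list:
    """Imitate .foreach /pS initial_skip /ps skip over whitespace-separated tokens."""
    tokens = text.split()
    selected = []
    index = initial_skip          # /pS: tokens skipped once, at the start
    while index < len(tokens):
        selected.append(tokens[index])
        index += 1 + skip         # /ps: tokens skipped after each processed token
    return selected


for value in select_tokens(DQ_OUTPUT, initial_skip=1, skip=1):
    print(f"!handle {value} f")
```

Each printed line corresponds to one !handle invocation that the debugger runs with the loop variable h replaced by a handle value.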

To summarize, with some effort we were able to discover that the WaitForMultipleObjects function was invoked with an array of four thread handles, which we can now go ahead and inspect. But this was an easy case, in which the parameter values could still be recovered from the caller's code and stack -- it boggles the mind to think that you have to go through disassembly listings every time you want to dump parameter values.

Enter CMKD -- a free debugging extension that streamlines the analysis of 64-bit call stacks. This extension performs some fairly clever analysis of the stack structure, non-volatile register storage areas, and function call sites to display parameter values or at least explain where they came from. In our particular case, this extension works brilliantly:

0:000> !stack -p -t
Call Stack : 44 frames
## Stack-Pointer    Return-Address   Call-Site       
...snipped for brevity...
01 000000b9e5cfdbe0 000007f9368e1292 KERNELBASE!WaitForMultipleObjectsEx+e5 
  Parameter[0] = 0000000000000004 : rcx saved in current frame into NvReg rbx which is saved by child frames
  Parameter[1] = 000000b9e5cfdf48 : rdx saved in current frame into NvReg r13 which is saved by child frames
  Parameter[2] = 0000000000000001 : r8  saved in current frame into stack 
  Parameter[3] = 0000000000000000 : r9  saved in current frame into NvReg r14 which is saved by child frames
02 000000b9e5cfdec0 000007f61d3e1da9 KERNEL32!WaitForMultipleObjects+12 
  Parameter[0] = 0000000000000004 : rcx setup in parent frame by movb instruction @ 000007f61d3e1d9e from immediate data 
  Parameter[1] = 000000b9e5cfdf48 : rdx setup in parent frame by lea instruction @ 000007f61d3e1d99 from mem @ 000000b9e5cfdf48 
  Parameter[2] = 0000000000000001 : r8  setup in parent frame by movb instruction @ 000007f61d3e1d93 from immediate data 
  Parameter[3] = 00000000ffffffff : r9  setup in parent frame by movb instruction @ 000007f61d3e1d8d from immediate data 
03 000000b9e5cfdf00 000007f91f710c37 BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+159 
  Parameter[0] = (unknown)        : 
  Parameter[1] = (unknown)        : 
  Parameter[2] = (unknown)        : 
  Parameter[3] = (unknown)        : 
...snipped for brevity...

In the preceding output, the parameters listed for frame 02 (KERNEL32!WaitForMultipleObjects) correspond to what we previously discovered with manual labor. CMKD deduced the parameter values and explained where they came from -- the specific MOVB/LEA instructions that initialized the registers.

To conclude: CMKD makes it much easier to analyze function calls that use the x64 calling convention. It's a valuable addition to your arsenal if you need to debug dumps of optimized binaries for which you do not have private symbols and source code.

Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.
