Yet Another Way to Determine the Sync Object for Which Your Thread Is Waiting
Join the DZone community and get the full member experience.
Join For FreeA couple of years ago I gave a cursory visit to the subject of diagnosing a Monitor-based deadlock in a managed application, and then a few months ago I demonstrated in greater detail how to locate the synchronization object your thread is waiting for using SOS.
[Side note: It’s always easy to see the list of currently owned Monitors (sync blocks) using the SOS !SyncBlk command, which also tells you which thread owns each synchronization object. The hard part is to find the synchronization object for which your thread is waiting if you don’t have any prior knowledge of this particular lock your application is using.]
It’s always good to have another way of doing the same thing, in case something doesn’t work out in the debugger. This post shows a third approach to detecting the synchronization object for which your thread is waiting.
After attaching to the process, use !Threads, !CLRStack, and !SyncBlk as usual to see the general picture. In this example, here are the threads (edited for brevity):
0:004> !Threads
ThreadCount: 3
...
ID OSID APT Exception
0 1 1320 MTA
2 2 fec MTA (Finalizer)
3 3 18d8 MTA (Threadpool Worker)
OK, so we have a few of threads here, one is the main thread, the other is the finalizer thread, and the third is a thread pool thread. Here’s the call stack for the application’s threads (edited for brevity, e.g. I removed the unmanaged threads):
0:004> ~* e !CLRStack
OS Thread Id: 0x1320 (0)
Child-SP RetAddr Call Site
000000000013ee10 000007fef7d8d502 LocksAndLocks.Program.Main()
OS Thread Id: 0xfec (2)
Failed to start stack walk: 80004005
OS Thread Id: 0x18d8 (3)
Child-SP RetAddr Call Site
000000001aecf510 000007fef6c72bbb …c__DisplayClass1.<Main>b__0(…)
000000001aecf560 000007fef6ce7411 System.Threading.ExecutionContext.Run(…)
000000001aecf5b0 000007fef6ce724f …PerformWaitCallbackInternal(…)
000000001aecf600 000007fef7d8d502 …PerformWaitCallback(…)
“Failed to start stack walk”—in other words, the finalizer thread is not currently executing any managed code. Not interesting.
And here are the sync blocks (edited for brevity):
0:004> !syncblk
SyncBlock Owning Thread SyncBlock Owner
0000000000c38408 3 0000000002683a98 System.String
What is this string anyway?
0:003> !do 0000000002683a98
Name: System.String
MethodTable: 000007fef6daec90
EEClass: 000007fef69bb038
Size: 36(0x24) bytes
(C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: Hello
…
This is a 64-bit process, so from looking at the managed call stacks it wouldn’t immediately become obvious that there’s a synchronization issue here. Fortunately, the unmanaged call stack comes to the rescue. Here’s thread 3:
0:003> kc
Call Site
ntdll!NtDelayExecution
KERNELBASE!SleepEx
mscorwks!EESleepEx
mscorwks!Thread::UserSleep
mscorwks!ThreadNative::Sleep
0x0
mscorlib_ni
mscorlib_ni
mscorlib_ni
mscorwks!CallDescrWorker
mscorwks!CallDescrWorkerWithHandler
mscorwks!DispatchCallDebuggerWrapper
mscorwks!DispatchCallNoEH
mscorwks!QueueUserWorkItemManagedCallback
mscorwks!Thread::DoADCallBack
mscorwks!SVR::gc_heap::make_heap_segment
mscorwks!AssemblySecurityDescriptor::GetZone
mscorwks!ThreadNative::KickOffThread
mscorwks!ManagedPerAppDomainTPCount::DispatchWorkItem
mscorwks!ThreadpoolMgr::WorkerThreadStart
Not particularly interesting, other than that it’s currently in a sleep call. It’s fairly easy to figure out the timeout by inspecting the managed IL:
0:003> !Name2EE *!LocksAndLocks.Program+<>c__DisplayClass1.<Main>b__0
…
Module: 000007ff000533d0 (LocksAndLocks.exe)
Token: 0x0000000006000004
MethodDesc: 000007ff00053a58
Name: LocksAndLocks.Program+<>c__DisplayClass1.<Main>b__0(System.Object)
JITTED Code Address: 000007ff001a02d0
0:003> !DumpIL 000007ff00053a58
ilAddr = 00000000011c2058
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldfld <>c__DisplayClass1::s
IL_0007: dup
IL_0008: stloc.0
IL_0009: call System.Threading.Monitor::Enter
IL_000e: nop
.try
{
IL_000f: nop
IL_0010: ldc.i4.m1
IL_0011: call System.Threading.Thread::Sleep
IL_0016: nop
IL_0017: nop
IL_0018: leave.s IL_0022
} // end .try
.finally
{
IL_001a: ldloc.0
IL_001b: call System.Threading.Monitor::Exit
IL_0020: nop
IL_0021: endfinally
} // end .finally
IL_0022: nop
IL_0023: ret
Note the highlighted line—the parameter passed to Thread.Sleep is –1, which is the infinite timeout. So this thread isn’t going anywhere for the time being, and from the !SyncBlk output we remember that it owns sync block 0000000000c38408, whose owner is a string at 0000000002683a98.
What about thread 0?
0:000> kc
Call Site
ntdll!NtWaitForMultipleObjects
KERNELBASE!WaitForMultipleObjectsEx
KERNEL32!WaitForMultipleObjectsExImplementation
mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT
mscorwks!Thread::DoAppropriateAptStateWait
mscorwks!Thread::DoAppropriateWaitWorker
mscorwks!Thread::DoAppropriateWait
mscorwks!CLREvent::WaitEx
mscorwks!AwareLock::EnterEpilog
mscorwks!AwareLock::Enter
mscorwks!JIT_MonEnterWorker_Portable
0x0
mscorwks!CallDescrWorker
mscorwks!CallDescrWorkerWithHandler
mscorwks!MethodDesc::CallDescr
mscorwks!ClassLoader::RunMain
mscorwks!Assembly::ExecuteMainMethod
mscorwks!SystemDomain::ExecuteMainMethod
mscorwks!ExecuteEXE
mscorwks!CorExeMain
This is more interesting. This thread is currently waiting for a lock. This is where we deviate from previous examples. The kb command would give the culprit away because the address of the sync block is passed to the AwareLock::EnterEpilog method as a parameter. Instead, let’s examine the code that calls JIT_MonEnterWorker_Portable and try to figure it out from there:
0:000> !name2ee *!*Program.Main
…
Module: 000007ff000533d0 (LocksAndLocks.exe)
Token: 0x0000000006000001
MethodDesc: 000007ff00053990
Name: LocksAndLocks.Program.Main(System.String[])
JITTED Code Address: 000007ff001a0120
0:000> !u 000007ff001a0120
Normal JIT generated code
LocksAndLocks.Program.Main(System.String[])
Begin 000007ff001a0120, size 12f
…
…mov rax,qword ptr [rbp+8]
…mov rax,qword ptr [rax+8]
…mov qword ptr [rbp+30h],rax
…mov rax,qword ptr [rbp+30h]
…mov qword ptr [rbp+10h],rax
…mov rcx,qword ptr [rbp+30h]
…call mscorwks!JIT_MonEnter (000007fe`f7d8bc60)
…nop
…nop
OK, so the sync object’s address is passed in the RCX register. However, before it goes into the RCX register, which is probably cleared out by now, it’s fetched from the stack at RBP+30. This is something we have a shot at recovering:
0:000> kn
# … Call Site
…
07 … mscorwks!CLREvent::WaitEx+0xbe
08 … mscorwks!AwareLock::EnterEpilog+0xc9
09 … mscorwks!AwareLock::Enter+0x72
0a … mscorwks!JIT_MonEnterWorker_Portable+0xf5
0b … 0x7ff`001a01fa
0c … mscorwks!CallDescrWorker+0x82
0d … mscorwks!CallDescrWorkerWithHandler+0xd3
0e … mscorwks!MethodDesc::CallDescr+0x24f
0f … mscorwks!ClassLoader::RunMain+0x22b
…
0:000> .frame /r 0b
0b 00000000`0013ee10 000007fe`f7d8d502 0x7ff`001a01fa
rax=0000000000c0e100 rbx=000007ff0005c050 rcx=0000000000c0e100
rdx=0000000000000001 rsi=000000000013f020 rdi=000000000013ee98
rip=000007ff001a01fa rsp=000000000013ee10 rbp=000000000013ee30
r8=0000000000c38420 r9=0000000000000000 r10=0000000000000000
r11=0000000000000202 r12=0000000000000001 r13=0000000000000000
r14=000000000000001d r15=0000000000000001
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000244
000007ff`001a01fa 90 nop
0:000> dq 000000000013ee30+0x30 L1
00000000`0013ee60 00000000`02683a98
Voila—this is the sync object we saw earlier in the !SyncBlk output (if you don’t believe me, scroll up and see for yourself). Now we know that thread 0 is waiting for this synchronization object that’s held by thread 3, and thread 3 is in an infinite sleep.
This concludes yet another way to determine the synchronization object for which your thread is waiting.
Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments