
A Loop of Nested Exceptions


It was a pretty incredible coincidence. Only a few days apart, I had to tackle two problems that had to do with nested exception handlers. Specifically, an infinite loop of nested exceptions that led to a stack overflow. And that’s a pretty fatal combination. A stack overflow is an extremely nasty error to debug; a nested exception means the exception handler encountered an exception, which can’t be pretty; and to add insult to injury, a stack corruption was also involved behind the scenes. Read on to learn some more about the trickiness of diagnosing nested exceptions and what can cause them in the first place.

Case 1: Read error in VC’s exception filter

The client had multiple dump files for me to look at, and they all exhibited a pretty crazy pattern. An exception would occur in the application — one that is perfectly expected and handled. However, instead of being handled normally, it would cause an infinite cascade of nested exceptions that eventually crashed the process with a stack overflow. Here’s a quick illustration of what it looked like, edited for brevity:

0:000> kcn 10000
... <repeated hundreds more times>
19e9 <Unloaded_Helper.dll>
19ea NestedExceptions1!exception_filter
19eb NestedExceptions1!trigger_exception
19ec MSVCR120D!_EH4_CallFilterFunc
19ed MSVCR120D!_except_handler4_common
19ee NestedExceptions1!_except_handler4
19ef ntdll!ExecuteHandler2
19f0 ntdll!ExecuteHandler
19f1 ntdll!KiUserExceptionDispatcher
19f2 <Unloaded_Helper.dll>
19f3 NestedExceptions1!exception_filter
19f4 NestedExceptions1!trigger_exception
19f5 MSVCR120D!_EH4_CallFilterFunc
19f6 MSVCR120D!_except_handler4_common
19f7 NestedExceptions1!_except_handler4
19f8 ntdll!ExecuteHandler2
19f9 ntdll!ExecuteHandler
19fa ntdll!KiUserExceptionDispatcher
19fb <Unloaded_Helper.dll>
19fc NestedExceptions1!exception_filter
19fd NestedExceptions1!trigger_exception
19fe MSVCR120D!_EH4_CallFilterFunc
19ff MSVCR120D!_except_handler4_common
1a00 NestedExceptions1!_except_handler4
1a01 ntdll!ExecuteHandler2
1a02 ntdll!ExecuteHandler
1a03 ntdll!KiUserExceptionDispatcher
1a04 NestedExceptions1!trigger_exception
1a05 NestedExceptions1!main
1a06 NestedExceptions1!__tmainCRTStartup
1a07 NestedExceptions1!mainCRTStartup
1a08 kernel32!BaseThreadInitThunk
1a09 ntdll!__RtlUserThreadStart
1a0a ntdll!_RtlUserThreadStart

In the preceding call stack, it’s pretty clear that exception_filter is trying to call a function in an unloaded DLL (Helper.dll). That in turn causes an exception (an access violation, most likely), which transfers control back to exception_filter, and we’re ankle-deep in the nested exception loop. By the way, it’s pretty easy to follow the exception chain if you know what to look for. Here’s the kb output for a few frames:

006deb54 00a85706 00000000 00000000 00000000 <Unloaded_Helper.dll>+0x1115e
006dec28 00a85eab 006dec4c 0f4e3924 00000000 NestedExceptions1!exception_filter+0x26
006dec30 0f4e3924 00000000 00000000 00000000 NestedExceptions1!trigger_exception+0x6b
006dec44 0f4e9268 006deda0 006dedf0 00000001 MSVCR120D!_EH4_CallFilterFunc+0x12
006dec7c 00a866d2 00a90000 00a81041 006deda0 MSVCR120D!_except_handler4_common+0xb8
006dec9c 7794c881 006deda0 006df734 006dedf0 NestedExceptions1!_except_handler4+0x22
006decc0 7794c853 006deda0 006df734 006dedf0 ntdll!ExecuteHandler2+0x26
006ded88 7794c6bb 006deda0 006dedf0 006deda0 ntdll!ExecuteHandler+0x24
006ded88 58b3115e 006deda0 006dedf0 006deda0 ntdll!KiUserExceptionDispatcher+0xf
006df0d4 00a85706 00000000 00000000 00000000 <Unloaded_Helper.dll>+0x1115e
006df1a8 00a85eab 006df1cc 0f4e3924 00000000 NestedExceptions1!exception_filter+0x26
006df1b0 0f4e3924 00000000 00000000 00000000 NestedExceptions1!trigger_exception+0x6b
006df1c4 0f4e9268 006df324 006df374 00000001 MSVCR120D!_EH4_CallFilterFunc+0x12
006df1fc 00a866d2 00a90000 00a81041 006df324 MSVCR120D!_except_handler4_common+0xb8
006df21c 7794c881 006df324 006df734 006df374 NestedExceptions1!_except_handler4+0x22
006df240 7794c853 006df324 006df734 006df374 ntdll!ExecuteHandler2+0x26
006df30c 7794c6bb 006df324 006df374 006df324 ntdll!ExecuteHandler+0x24
006df30c 00a85e8f 006df324 006df374 006df324 ntdll!KiUserExceptionDispatcher+0xf
006df744 00a86068 00000000 00000000 7ebab000 NestedExceptions1!trigger_exception+0x4f
006df818 00a86a79 00000001 00c37b88 00c35178 NestedExceptions1!main+0x28
006df868 00a86c6d 006df87c 76dc919f 7ebab000 NestedExceptions1!__tmainCRTStartup+0x199
006df870 76dc919f 7ebab000 006df8c0 77960bbb NestedExceptions1!mainCRTStartup+0xd
006df87c 77960bbb 7ebab000 35ed4a97 00000000 kernel32!BaseThreadInitThunk+0xe
006df8c0 77960b91 ffffffff 7794c9d2 00000000 ntdll!__RtlUserThreadStart+0x20
006df8d0 00000000 00a812d5 7ebab000 00000000 ntdll!_RtlUserThreadStart+0x1b

On each ntdll!ExecuteHandler line, the second argument (006dedf0 and 006df374 above) is the context record, and the argument before it (006deda0 and 006df324) is the exception record. You can inspect them in WinDbg using the .cxr and .exr commands:

0:000> .exr 006df324 
ExceptionAddress: 00a85e8f (NestedExceptions1!trigger_exception+0x0000004f)
 ExceptionCode: c0000005 (Access violation)
 ExceptionFlags: 00000000
NumberParameters: 2
 Parameter[0]: 00000001
 Parameter[1]: 00000000
Attempt to write to address 00000000
0:000> .cxr 006dedf0 
eax=cccccccc ebx=00000000 ecx=00000000 edx=00000000 esi=006df0dc edi=006df1a8
eip=58b3115e esp=006df0d8 ebp=006df1a8 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
<Unloaded_Helper.dll>+0x1115e:
58b3115e ?? ???
0:000> .exr 006deda0 
ExceptionAddress: 58b3115e (<Unloaded_Helper.dll>+0x0001115e)
 ExceptionCode: c0000005 (Access violation)
 ExceptionFlags: 00000010
NumberParameters: 2
 Parameter[0]: 00000008
 Parameter[1]: 58b3115e
Attempt to execute non-executable address 58b3115e

This shows that the original exception was an access violation in the trigger_exception function, but it was shadowed by another access violation that was caused by executing code from the unloaded DLL.
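When the chain is hundreds of frames long, pulling the record addresses out by hand gets tedious. The following is a minimal sketch, not part of the original investigation, that extracts the exception record and context record addresses from kb output. It assumes the five-column 32-bit kb layout shown above (ChildEBP, RetAddr, three arguments, symbol); the function name exception_chain is made up for illustration.

```python
import re

# One 32-bit kb line: ChildEBP, RetAddr, three arguments, then the symbol.
KB_LINE = re.compile(
    r"^(?P<ebp>[0-9a-f]{8}) (?P<ret>[0-9a-f]{8}) "
    r"(?P<a0>[0-9a-f]{8}) (?P<a1>[0-9a-f]{8}) (?P<a2>[0-9a-f]{8}) "
    r"(?P<sym>\S+)$"
)


def exception_chain(kb_output):
    """Return (exception_record, context_record) address pairs.

    Pairs are returned in the order the frames appear in the trace,
    so the most recent (innermost) exception comes first.
    """
    pairs = []
    for line in kb_output.splitlines():
        m = KB_LINE.match(line.strip())
        # Match ntdll!ExecuteHandler but not ntdll!ExecuteHandler2.
        if m and m.group("sym").startswith("ntdll!ExecuteHandler+"):
            pairs.append((m.group("a0"), m.group("a1")))
    return pairs


# A few lines copied from the kb output above.
sample = """\
006decc0 7794c853 006deda0 006df734 006dedf0 ntdll!ExecuteHandler2+0x26
006ded88 7794c6bb 006deda0 006dedf0 006deda0 ntdll!ExecuteHandler+0x24
006df240 7794c853 006df324 006df734 006df374 ntdll!ExecuteHandler2+0x26
006df30c 7794c6bb 006df324 006df374 006df324 ntdll!ExecuteHandler+0x24
"""

# Print ready-to-paste WinDbg commands for each exception in the chain.
for exr, cxr in exception_chain(sample):
    print(f".exr {exr}; .cxr {cxr}")
```

The last pair in the list belongs to the oldest exception, which is usually the one you care about.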

But the original problem was somewhat more subtle than an unloaded DLL. In the actual application, exception_filter was the exception filter installed by Visual C++ to handle C++ exceptions: msvcr*!__InternalCxxFrameHandler. Somehow it would trigger a nested exception with exception code 0xc0000006 (IN_PAGE_ERROR: in-page I/O error) and I/O error code 0xc000020c (STATUS_CONNECTION_DISCONNECTED). That nested exception would transfer control to __InternalCxxFrameHandler again, which would hit the same nested error again and again.

Armed with this information, we began to investigate. The actual memory access that caused the nested exception was to a data structure passed to __InternalCxxFrameHandler, which is stored inside the binary and contains exception-handling information emitted by the C++ compiler. This information isn’t frequently accessed — it is only required when an exception occurs.

The final piece of the puzzle was that the application was running from a network drive, which was being accessed by a large number of machines, putting great load on the server’s network connection. The working hypothesis, then, was the following:

  1. An exception occurred early in the application’s initialization path
  2. A connectivity error also occurred, causing the network drive where the application’s binary resides to be temporarily disconnected
  3. The Visual C++ exception filter needed access to a data structure stored in the application’s binary, but that data structure wasn’t yet cached in RAM from the network drive because it is only accessed when an exception occurs
  4. The exception filter then failed with an I/O error, which transferred control to the exception filter again — causing the nested loop of exceptions

I was able to reconstruct this issue by writing an exception filter that accesses a large read-only global data structure (read-only globals are also stored in the application binary), and running the sample app from a network drive. When I disabled the network adapter prior to causing the exception, we got exactly the symptoms described above.

How did we fix the problem? It turns out Visual C++ has a linker setting called /SWAPRUN (specifically, /SWAPRUN:NET), which instructs the system to load the application binary completely into memory before executing it. Once the application has started running, no part of the binary can suddenly become unavailable because of a connectivity problem. It’s obviously better to avoid the connectivity problem in the first place, but given that networks are inherently unreliable, this is a must-use switch if you’re running an application from a network drive.

Case 2: The stack corruption that caused a stack overflow

The second case I had to look at also exhibited a nested exception chain leading up to a stack overflow. The only difference was that the stack was also corrupted. Here are some frames from the bottom of the stack:

0:000> kn
... <repeated hundreds more times>
f04 002be8c4 7794c881 0xcccccccc
f05 002be8e8 7794c853 ntdll!ExecuteHandler2+0x26
f06 002be9b0 7794c6bb ntdll!ExecuteHandler+0x24
f07 002be9b0 cccccccc ntdll!KiUserExceptionDispatcher+0xf
f08 002becfc 7794c881 0xcccccccc
f09 002bed20 7794c853 ntdll!ExecuteHandler2+0x26
f0a 002bede8 7794c6bb ntdll!ExecuteHandler+0x24
f0b 002bede8 cccccccc ntdll!KiUserExceptionDispatcher+0xf
f0c 002bf134 7794c881 0xcccccccc
f0d 002bf158 7794c853 ntdll!ExecuteHandler2+0x26
f0e 002bf224 7794c6bb ntdll!ExecuteHandler+0x24
f0f 002bf224 010b51d5 ntdll!KiUserExceptionDispatcher+0xf
f10 002bf674 cccccccc NestedExceptions2!trigger_exception+0x65
f11 002bf748 010b5cd9 0xcccccccc
f12 002bf798 010b5ecd NestedExceptions2!__tmainCRTStartup+0x199
f13 002bf7a0 76dc919f NestedExceptions2!mainCRTStartup+0xd
f14 002bf7ac 77960bbb kernel32!BaseThreadInitThunk+0xe
f15 002bf7f0 77960b91 ntdll!__RtlUserThreadStart+0x20
f16 002bf800 00000000 ntdll!_RtlUserThreadStart+0x1b

The 0xcccccccc addresses on the stack look like a sure symptom of a stack corruption. In fact, you probably already have a working hypothesis: the stack has been corrupted with 0xcccccccc, and the exception handling code in ntdll somehow attempts to execute code from the address 0xcccccccc. This causes a nested exception, and we’re in the same situation as in case #1. If we look at the exception records, we can confirm that’s the case (the first exception record is the root cause, and the second exception record is due to the invalid exception handler):

0:000> .exr 002bf23c 
ExceptionAddress: 010b51d5 (NestedExceptions2!trigger_exception+0x00000065)
 ExceptionCode: c0000005 (Access violation)
 ExceptionFlags: 00000000
NumberParameters: 2
 Parameter[0]: 00000001
 Parameter[1]: 00000000
Attempt to write to address 00000000
0:000> .exr 002bee00 
ExceptionAddress: cccccccc
 ExceptionCode: c0000005 (Access violation)
 ExceptionFlags: 00000010
NumberParameters: 2
 Parameter[0]: 00000000
 Parameter[1]: cccccccc
Attempt to read from address cccccccc
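The fields in these records can also be decoded mechanically. The sketch below is an illustration only: the flag values are the EXCEPTION_* constants from winnt.h, and the access kinds are the documented meanings of the first access violation parameter, while the helper names decode_flags and describe_access_violation are made up here.

```python
# ExceptionFlags bit values as defined in winnt.h.
EXCEPTION_FLAGS = {
    0x01: "EXCEPTION_NONCONTINUABLE",
    0x02: "EXCEPTION_UNWINDING",
    0x04: "EXCEPTION_EXIT_UNWIND",
    0x08: "EXCEPTION_STACK_INVALID",
    0x10: "EXCEPTION_NESTED_CALL",
    0x20: "EXCEPTION_TARGET_UNWIND",
    0x40: "EXCEPTION_COLLIDED_UNWIND",
}

# Meaning of Parameter[0] for an access violation (0xc0000005).
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute (DEP)"}


def decode_flags(flags):
    """Return the names of all flag bits set in an ExceptionFlags value."""
    return [name for bit, name in sorted(EXCEPTION_FLAGS.items()) if flags & bit]


def describe_access_violation(params):
    """Describe an access violation from its two exception parameters."""
    kind = ACCESS_KINDS.get(params[0], f"unknown access type {params[0]}")
    return f"{kind} at {params[1]:08x}"


# The two records shown above: the root cause, then the nested exception.
print(decode_flags(0x00), describe_access_violation([0x1, 0x00000000]))
print(decode_flags(0x10), describe_access_violation([0x0, 0xCCCCCCCC]))
```

The nested record carries EXCEPTION_NESTED_CALL (0x10), which is a quick way to tell the records raised during exception dispatch apart from the one that started the chain.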

But there are a couple of subtleties. Why would ntdll try to execute code from the invalid address 0xcccccccc? The answer is that in 32-bit Windows applications the exception filter’s address is stored on the stack, as part of a data structure called the exception registration record. If that structure is corrupted, the exception handling code in ntdll might attempt to execute an invalid address, thinking it’s a pointer to an exception filter.

[Figure: Normal stack structure on x86, with exception registration records]

But that’s actually a pretty serious security vulnerability! If an attacker can overwrite the exception registration record and then trigger an exception, they can achieve arbitrary code execution. In fact, exploiting the exception registration record is one of the ways to overcome stack canary defenses introduced in Visual C++ 2003 (the /GS flag).

[Figure: Corrupted exception registration record, which leads to exploitation]

Fortunately, Windows has a few tricks up its sleeve. First, Visual C++ 2003 introduced a linker flag called /SAFESEH. When this flag is used, the linker embeds a directory of all valid exception handlers in the binary. At runtime, Windows can check whether the exception handler that’s about to run is in the list of valid exception handlers. If it isn’t, it will flatly refuse to execute the invalid exception handler and halt the process, which is far safer than trying to execute an unknown exception handler.
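The idea behind that check can be sketched as a lookup against a table of known handlers. This is a deliberately simplified model and not the actual logic in ntdll: it covers a single image, the image base, size, and handler offset are hypothetical values, and it assumes any handler address outside the image is rejected.

```python
def is_valid_handler(handler, image_base, image_size, safe_handler_rvas):
    """Simplified model of a SafeSEH-style check for one image.

    safe_handler_rvas is the set of handler offsets (relative to the
    image base) that the linker recorded in the binary.
    """
    if not (image_base <= handler < image_base + image_size):
        # Assumption for this sketch: addresses outside the image are
        # never accepted as exception handlers.
        return False
    return (handler - image_base) in safe_handler_rvas


# Hypothetical image layout, for illustration only.
IMAGE_BASE = 0x00A80000
IMAGE_SIZE = 0x00020000
SAFE_HANDLERS = {0x66B0}  # hypothetical offset of a registered handler

# A registered handler passes the check.
print(is_valid_handler(0x00A866B0, IMAGE_BASE, IMAGE_SIZE, SAFE_HANDLERS))
# A handler inside the image that was never registered is rejected.
print(is_valid_handler(0x00A81000, IMAGE_BASE, IMAGE_SIZE, SAFE_HANDLERS))
# The corrupted pointer from the crash is rejected as well.
print(is_valid_handler(0xCCCCCCCC, IMAGE_BASE, IMAGE_SIZE, SAFE_HANDLERS))
```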

Second, Windows Vista SP1 and Windows Server 2008 introduced another system-wide protection mechanism called SEHOP (Structured Exception Handling Overwrite Protection). By default, SEHOP is disabled on client operating systems (Windows Vista, Windows 7, Windows 8) and enabled on server operating systems (Windows Server 2008 and Windows Server 2012). SEHOP is a fairly simple defense: when it’s on, every thread has a dummy exception registration record at the beginning of the chain. When an exception occurs, the exception handling code in ntdll verifies that the current exception handler chain terminates with that dummy exception registration record. If it doesn’t, the system assumes the exception handler chain has been tampered with, and terminates the process.
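The chain walk described above can be modeled in a few lines. This is a sketch of the idea only, not the ntdll implementation: the exception registration records are represented as a dictionary, the sentinel handler name and the addresses are hypothetical, and the bound on the chain length is an assumption added to keep the walk from looping forever.

```python
END_OF_CHAIN = 0xFFFFFFFF           # "next" value that terminates the chain
SENTINEL_HANDLER = "final_handler"  # hypothetical name for the dummy record's handler


def chain_is_intact(records, head, max_records=1024):
    """Walk a model of the exception registration chain.

    records maps a record address to a (next_address, handler) tuple.
    The chain is intact only if the walk ends at the dummy record.
    """
    addr = head
    for _ in range(max_records):
        if addr not in records:
            return False  # a "next" pointer leads somewhere unexpected
        next_addr, handler = records[addr]
        if next_addr == END_OF_CHAIN:
            return handler == SENTINEL_HANDLER
        addr = next_addr
    return False  # chain is circular or implausibly long


# A healthy chain: one real handler, then the dummy record.
healthy = {
    0x002BF734: (0x002BF7E0, "_except_handler4"),
    0x002BF7E0: (END_OF_CHAIN, SENTINEL_HANDLER),
}

# The same chain after the first record was overwritten with 0xcccccccc.
corrupted = dict(healthy)
corrupted[0x002BF734] = (0xCCCCCCCC, 0xCCCCCCCC)

print(chain_is_intact(healthy, 0x002BF734))
print(chain_is_intact(corrupted, 0x002BF734))
```

Note that the check never has to know which handlers are legitimate. It only verifies that the chain still ends where the system put its dummy record.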

When either of these defenses is turned on, the situation described above is simply impossible. The control transfer to 0xcccccccc as if it were an exception handler would be prevented and the process would abruptly terminate. Therefore, we can conclude that the system on which this crash occurred is not up to par with modern security guidance: the application should have been compiled with /SAFESEH, and the system should have had SEHOP enabled. By the way, starting with Windows 7, you can configure SEHOP for an individual process without affecting the rest of the system using the Image File Execution Options registry key.

As an illustration, here’s what happens when the binary is compiled with /SAFESEH (note that the /NXCOMPAT flag, which enables software DEP, is also required):

0:000> kn
 # ChildEBP RetAddr 
00 008aee54 779c625f ntdll!NtWaitForMultipleObjects+0xc
01 008af2b8 779c5e38 ntdll!RtlReportExceptionEx+0x3eb
02 008af314 779e81bf ntdll!RtlReportException+0x9b
03 008af394 7798b2e3 ntdll!RtlInvalidHandlerDetected+0x4e
04 008af3ec 7797734a ntdll!RtlIsValidHandler+0x13f1a
05 008af484 7794c6bb ntdll!RtlDispatchException+0xfc
06 008af484 01284ef5 ntdll!KiUserExceptionDispatcher+0xf
07 008af8d4 cccccccc NestedExceptions2!trigger_exception+0x65
WARNING: Frame IP not in any known module. Following frames may be wrong.
08 008af9a8 012858c9 0xcccccccc
09 008af9f8 01285a0d NestedExceptions2!__tmainCRTStartup+0x199
0a 008afa00 76dc919f NestedExceptions2!mainCRTStartup+0xd
0b 008afa0c 77960bbb kernel32!BaseThreadInitThunk+0xe
0c 008afa50 77960b91 ntdll!__RtlUserThreadStart+0x20
0d 008afa60 00000000 ntdll!_RtlUserThreadStart+0x1b
0:000> !analyze -v
... <removed for brevity>
DEFAULT_BUCKET_ID: APPLICATION_FAULT
PROCESS_NAME: NestedExceptions2.exe
ERROR_CODE: (NTSTATUS) 0xc00001a5 - An invalid exception handler routine has been detected.
EXCEPTION_CODE: (NTSTATUS) 0xc00001a5 - An invalid exception handler routine has been detected.
... <removed for brevity>

Importantly, there is no infinite loop of nested exceptions. The process is immediately terminated and Windows Error Reporting is invoked.

On the other hand, when SEHOP is enabled for that particular process, it prevents the invalid exception handler from executing. In a typical WER dump file, the result appears as though the original exception, which was supposed to be handled normally, ends up unhandled:

0:000> !analyze -v
...
FAULTING_IP: 
NestedExceptions2!trigger_exception+65
002c51d5 c705000000002a000000 mov dword ptr ds:[0],2Ah

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
  ExceptionAddress: 002c51d5 (NestedExceptions2!trigger_exception+0x00000065)
  ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000008
  NumberParameters: 2
  Parameter[0]: 00000001
  Parameter[1]: 00000000
  Attempt to write to address 00000000
 ...
 BUGCHECK_STR: APPLICATION_FAULT_NULL_POINTER_WRITE_SEHOP
 PRIMARY_PROBLEM_CLASS: NULL_POINTER_WRITE_SEHOP
 DEFAULT_BUCKET_ID: NULL_POINTER_WRITE_SEHOP
 ...
 FAILURE_BUCKET_ID: NULL_POINTER_WRITE_SEHOP_c0000005_NestedExceptions2.exe!trigger_exception
 BUCKET_ID: APPLICATION_FAULT_NULL_POINTER_WRITE_SEHOP_nestedexceptions2!trigger_exception+65
 ANALYSIS_SOURCE: UM
 FAILURE_ID_HASH_STRING: um:null_pointer_write_sehop_c0000005_nestedexceptions2.exe!trigger_exception
 ...

But there is a good hint that SEHOP was involved: the word “SEHOP” appears multiple times in the analysis. I’m not sure if there’s a better way of identifying SEHOP’s involvement, but that’s good enough for me.

Conclusions

First, when dealing with an infinite loop of nested exceptions, don’t panic. You need to identify the original exception that started the chain, and then look for a repeating pattern. An exception filter/handler at some point in the chain must have failed, and a series of control transfers leads back to the same exception filter.
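Spotting the repeating pattern can be automated when the trace has thousands of frames. The following is a minimal sketch under the assumption that the frames have already been reduced to symbol names, top of the stack first; the function name repeating_period is made up for illustration.

```python
def repeating_period(frames):
    """Return the length of the shortest frame sequence that repeats
    back-to-back at the top of the stack, or None if nothing repeats."""
    for period in range(1, len(frames) // 2 + 1):
        if frames[:period] == frames[period:2 * period]:
            return period
    return None


# The cycle from case 1, followed by the frames that started it.
cycle = [
    "<Unloaded_Helper.dll>",
    "NestedExceptions1!exception_filter",
    "NestedExceptions1!trigger_exception",
    "MSVCR120D!_EH4_CallFilterFunc",
    "MSVCR120D!_except_handler4_common",
    "NestedExceptions1!_except_handler4",
    "ntdll!ExecuteHandler2",
    "ntdll!ExecuteHandler",
    "ntdll!KiUserExceptionDispatcher",
]
tail = [
    "NestedExceptions1!trigger_exception",
    "NestedExceptions1!main",
]
frames = cycle * 3 + tail

period = repeating_period(frames)
print(period)
# The frames in one period are the loop; the failing filter is among them.
print(frames[:period])
```

Once the period is known, the first frames below the last repetition are where the original exception was raised.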

Second, structured exception handling is a vulnerable mechanism, especially in 32-bit applications. Make sure you’re using all the protection afforded by your compiler and operating system: SafeSEH, DEP, and SEHOP.

I am posting short links and updates on Twitter as well as on this blog. You can follow me: @goldshtn


Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.
