Searching for and Displaying C++ Heap Objects in WinDbg

This is something I am pretty excited to tell you about.

But first, some motivation. In managed applications, there are many tools and techniques for inspecting the contents of the managed heap. You can use a memory profiler to see references between objects and inspect individual objects. You can use WinDbg with the SOS extension to dump all objects of a particular type and run additional scripts and commands for each object. You can even write C# code that uses the ClrMd library to parse the heap contents and build your own diagnostic tools.

C++ has nothing of that sort. And this is something I wanted to change. C++ heap objects are often classes. Classes often have virtual methods. And classes with virtual methods have a vtable pointer as their first field, which makes it possible to identify them and display them.
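
To make the idea concrete, here is a minimal Python sketch of the identification step. It is not the actual heap_stat.py code. It assumes a 32-bit little-endian process (the transcripts below show 32-bit addresses), a dictionary mapping known vtable addresses to type names, and a heap block available as raw bytes; the names `vtable_map` and `identify_block` and the sample addresses are hypothetical.

```python
import struct

# Hypothetical map of vtable address -> "module!type" (32-bit addresses).
vtable_map = {
    0x00403100: "Payroll!employee",
    0x00403120: "Payroll!manager",
}

def identify_block(block: bytes, vtables: dict, ptr_size: int = 4):
    """Return the type name if the block's first pointer-sized word is a
    known vtable address, or None otherwise."""
    if len(block) < ptr_size:
        return None
    fmt = "<I" if ptr_size == 4 else "<Q"   # little-endian unsigned pointer
    (first_word,) = struct.unpack_from(fmt, block, 0)
    return vtables.get(first_word)

# A 32-byte block whose first field is the employee vtable pointer.
emp_block = struct.pack("<I", 0x00403100) + bytes(28)
print(identify_block(emp_block, vtable_map))   # Payroll!employee
print(identify_block(bytes(32), vtable_map))   # None
```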

Enter heap_stat.py, a script that you can run in WinDbg to search for and display heap objects (provided that they have a vtable).

To use heap_stat.py, you need to install PyKD, a free WinDbg extension that lets you write Python scripts that access most of the debugger engine API. Writing a PyKD script is much easier than writing a full-fledged C++ extension, and much more powerful than writing a WinDbg script.

Next, you load pykd.pyd and start inspecting your C++ heap:

0:001> .load pykd.pyd
0:001> !py heap_stat.py
Running x /2 *!*`vftable' command...DONE
Running !heap -h 0 command...DONE
Enumerating 218 heap blocks
005c4170    MSVCP110!std::locale::_Locimp
005c6a70    MSVCR110!std::bad_alloc
005ccc00    Payroll!employee
005ccc28    Payroll!employee
005ccc50    Payroll!employee
005ccc78    Payroll!employee
... snipped ...
005cee90    Payroll!employee

Statistics:
                                         Type name         Count    Size
                                  Payroll!employee           100    3200
                         MSVCP110!std::ctype<char>             1    Unknown
                      MSVCP110!std::ctype<wchar_t>             1    Unknown
... snipped ...
                                   Payroll!manager             1    44
                     MSVCP110!std::locale::_Locimp             1    Unknown
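
The first command the script runs, x /2 *!*`vftable', lists the address and name of every vtable symbol in the loaded modules. A sketch of turning that output into an address-to-type map might look like the following. It assumes each line has the form `<address> <module>!<type>::`vftable'` (with a backtick separator inside 64-bit addresses); `parse_vftable_symbols` and the sample addresses are hypothetical, not taken from heap_stat.py.

```python
import re

# Assumed line shape: "<address> <module>!<type>::`vftable'", possibly
# followed by a suffix such as "{for `Base'}" under multiple inheritance.
# 64-bit addresses contain a backtick, e.g. 00007ff6`12345678.
VFTABLE_LINE = re.compile(r"^\s*([0-9a-fA-F`]+)\s+(.+?)::`vftable'")

def parse_vftable_symbols(output: str) -> dict:
    """Map each vtable address (int) to its 'module!type' name."""
    vtables = {}
    for line in output.splitlines():
        match = VFTABLE_LINE.match(line)
        if match:
            address = int(match.group(1).replace("`", ""), 16)
            vtables[address] = match.group(2)
    return vtables

sample_output = """\
00403100          Payroll!employee::`vftable'
00403120          Payroll!manager::`vftable'
6d2b4a10          MSVCP110!std::ctype<unsigned short>::`vftable'
"""
for address, type_name in parse_vftable_symbols(sample_output).items():
    print(f"{address:08x}  {type_name}")
```

The non-greedy `(.+?)` stops at the first `::`vftable'`, so type names containing spaces or template arguments, such as std::ctype<unsigned short>, survive intact.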

You can also ask for a statistics-only display using the -stat switch:

0:001> !py heap_stat.py -stat
Running x /2 *!*`vftable' command...DONE
Running !heap -h 0 command...DONE
Enumerating 218 heap blocks
    Enumerated 100 heap blocks
    Enumerated 200 heap blocks 


Statistics: 

                                         Type name         Count    Size
                                  Payroll!employee           100    3200
                         MSVCP110!std::ctype<char>             1    Unknown
                      MSVCP110!std::ctype<wchar_t>             1    Unknown
               MSVCP110!std::ctype<unsigned short>             1    Unknown
                           MSVCR110!std::bad_alloc             1    Unknown
                                   Payroll!manager             1    44
                     MSVCP110!std::locale::_Locimp             1    Unknown
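
The Size column appears to be the object count multiplied by the type's size from debug information (3200 for 100 employees implies 32 bytes each), with Unknown when no size is available. The sketch below reproduces that aggregation under this assumption; the function names are hypothetical, the 32- and 44-byte sizes are read off the output above, and sorting by count is a guess rather than the script's documented order.

```python
from collections import Counter

def heap_statistics(block_types, type_sizes):
    """Count objects per type; the total size is count * sizeof(type) when
    the type size is known, otherwise the string "Unknown"."""
    rows = []
    for type_name, count in Counter(block_types).most_common():
        size = type_sizes.get(type_name)
        total = count * size if size is not None else "Unknown"
        rows.append((type_name, count, total))
    return rows

def print_statistics(rows):
    print(f"{'Type name':>50} {'Count':>13} {'Size':>7}")
    for type_name, count, total in rows:
        print(f"{type_name:>50} {count:>13} {total:>7}")

types = ["Payroll!employee"] * 100 + ["Payroll!manager", "MSVCR110!std::bad_alloc"]
sizes = {"Payroll!employee": 32, "Payroll!manager": 44}
print_statistics(heap_statistics(types, sizes))
```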

OK, now suppose you’re interested in these Payroll!employee objects. The -type switch filters the output to matching types only. It accepts any regular expression that Python understands, so a pattern such as Payroll!(employee|manager) also works. Here it is with a plain type name:

0:001> !py heap_stat.py -type Payroll!employee
Running x /2 *!*`vftable' command...DONE
Running !heap -h 0 command...DONE
Enumerating 218 heap blocks
005ccc00    Payroll!employee
005ccc28    Payroll!employee
005ccc50    Payroll!employee
005ccc78    Payroll!employee
... snipped ...
005cee90    Payroll!employee

Statistics:
                                         Type name         Count    Size
                                  Payroll!employee           100    3200
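
In Python terms, a -type filter could be as simple as the following sketch. The article does not say whether heap_stat.py uses re.match, re.search, or re.fullmatch; this sketch assumes re.search, which matches substrings, so anchoring the pattern (for example ^Payroll!employee$) is the way to request an exact type. `filter_by_type` and the manager address are hypothetical.

```python
import re

def filter_by_type(objects, pattern):
    """Keep (address, type_name) pairs whose type name matches the regex."""
    regex = re.compile(pattern)
    return [(addr, name) for addr, name in objects if regex.search(name)]

objects = [
    (0x005ccc00, "Payroll!employee"),
    (0x005c4170, "MSVCP110!std::locale::_Locimp"),
    (0x005cd000, "Payroll!manager"),   # hypothetical address
]
for addr, name in filter_by_type(objects, r"Payroll!(employee|manager)"):
    print(f"{addr:08x}    {name}")
```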

You can also request short output with the -short switch, which prints only the object addresses so that you can run other commands and scripts for each object. For example, suppose you want the names and salaries of all employees earning more than $97,500:

0:001> .foreach (emp {!py heap_stat.py -type Payroll!employee -short}) { .block { r? $t0=(Payroll!employee*)0x${emp}; .if (@@c++(@$t0->_salary) > 0n97500) { .printf "%ma earns $%d\n", @@c++(@$t0->_name._Bx._Buf), @@c++(@$t0->_salary) } } }

Kate earns $97757
Lyanna earns $97662 

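Whoa, there’s a lot to digest here. The script simply returns the addresses of the employee objects. Most of the work happens outside the script: for each address, .foreach runs a few additional debugger commands that cast the address to Payroll!employee*, read the employee’s name and salary, and print them if the salary exceeds the threshold. (Reading _name._Bx._Buf works because these short names fit in std::string’s internal buffer; a longer string would be stored through _Bx._Ptr instead.)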
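
If the .foreach syntax is hard to follow, the same logic can be written out in Python. In this sketch the -short output is split on whitespace (as .foreach tokenizes it), and a hypothetical `read_employee` callback stands in for the debugger reads of _name and _salary. Only Kate and Lyanna come from the output above; the in-memory table and the third employee are made-up stand-ins.

```python
def high_earners(short_output, read_employee, threshold=97500):
    """Mimic the .foreach loop: split the -short output on whitespace,
    read each employee, and keep those above the salary threshold."""
    lines = []
    for token in short_output.split():
        address = int(token, 16)            # -short prints bare hex addresses
        name, salary = read_employee(address)
        if salary > threshold:              # same test as the .if in the command
            lines.append(f"{name} earns ${salary}")
    return lines

# Hypothetical stand-in for debugger memory: address -> (name, salary).
fake_memory = {
    0x005ccc00: ("Kate", 97757),
    0x005ccc28: ("Bran", 61000),
    0x005ccc50: ("Lyanna", 97662),
}
short_output = "005ccc00\n005ccc28\n005ccc50\n"
for line in high_earners(short_output, lambda addr: fake_memory[addr]):
    print(line)
```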

Finally, the -save and -load switches can save you time. The -save switch writes some debug information (type sizes, vtable addresses, etc.) to a file, and the -load switch reads it back on later runs. This can reduce execution time considerably, especially when you have many modules with many vtables.
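
The article does not describe the file format used by -save. As an illustration only, here is a sketch that caches vtable addresses and type sizes as JSON; `save_debug_info`, `load_debug_info`, and the file name are hypothetical.

```python
import json
import os
import tempfile

def save_debug_info(path, vtables, type_sizes):
    """Write vtable addresses and type sizes to a JSON file."""
    data = {
        # JSON object keys must be strings, so store addresses as hex text.
        "vtables": {hex(address): name for address, name in vtables.items()},
        "type_sizes": type_sizes,
    }
    with open(path, "w") as f:
        json.dump(data, f)

def load_debug_info(path):
    """Read the file back, converting hex keys to integer addresses."""
    with open(path) as f:
        data = json.load(f)
    vtables = {int(address, 16): name for address, name in data["vtables"].items()}
    return vtables, data["type_sizes"]

cache_path = os.path.join(tempfile.mkdtemp(), "heap_stat_cache.json")
save_debug_info(cache_path, {0x00403100: "Payroll!employee"}, {"Payroll!employee": 32})
print(load_debug_info(cache_path))
```

Loading a small file like this skips rerunning x over every module, which is where the savings described above come from.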

Admittedly, this script relies on a set of heuristics, and it can be fooled by certain kinds of multiple inheritance, or simply by a bit pattern that happens to equal the address of a vtable. Still, I believe this approach can be quite useful to many C++ application developers. I intend to keep adding features to the script as the need arises, and if you’d like to contribute, pull requests are welcome on GitHub.
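
The bit-pattern problem is easy to reproduce. In the sketch below, an integer array whose first element coincidentally equals a vtable address is reported as an employee. The block-size check is an illustrative addition that rejects some, but not all, false positives; it is not necessarily something heap_stat.py does. The vtable address and `classify` are hypothetical.

```python
import struct

VTABLES = {0x00403100: "Payroll!employee"}   # hypothetical vtable address
TYPE_SIZES = {"Payroll!employee": 32}        # size as reported by debug info

def classify(block, vtables, type_sizes):
    """Vtable heuristic plus a block-size sanity check (illustrative only)."""
    if len(block) < 4:
        return None
    (first_word,) = struct.unpack_from("<I", block, 0)
    type_name = vtables.get(first_word)
    if type_name is None:
        return None
    size = type_sizes.get(type_name)
    if size is not None and len(block) < size:
        return None   # block too small to hold this type: reject the match
    return type_name

# Eight 32-bit integers whose first element happens to equal the vtable
# address: still (wrongly) reported as an employee.
int_array = struct.pack("<8I", 0x00403100, 1, 2, 3, 4, 5, 6, 7)
# The same coincidental value in a 16-byte block fails the size check.
small_block = struct.pack("<4I", 0x00403100, 0, 0, 0)

print(classify(int_array, VTABLES, TYPE_SIZES))    # Payroll!employee (false positive)
print(classify(small_block, VTABLES, TYPE_SIZES))  # None
```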

Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.
