How to Debug a Segmentation Fault Without a Core Dump
Use dmesg and catchsegv to identify the fault location and callers. Use objdump or GDB to analyze the scenario and map instructions to source code with debug symbols.
In the past, I had to deal with this kind of restriction on several occasions. A segmentation fault or, more generally, abnormal process termination had to be investigated with the caveat that a core dump was not available.
For Linux, our platform of choice for this walkthrough, a few reasons come to mind:
- Core dump generation is disabled altogether (e.g., via a core file size limit of 0, typically set with ulimit -c 0).
- The target location (the current working directory or the path configured in /proc/sys/kernel/core_pattern) does not exist or is inaccessible due to filesystem permissions or SELinux.
- The target filesystem has insufficient disk space, resulting in a partial dump.
For all of those, the net result is the same: there's no (valid) core dump to use for analysis. Fortunately, a workaround exists for post-mortem debugging that has the potential to save the day, but given its inherent limitations, your mileage may vary from case to case.
Identifying the Faulting Instruction
The following example contains a classic use-after-free memory error:
After delete value, the reference stored in Test::m_value points to inaccessible memory. Therefore, running the program results in a segmentation fault.
When a process terminates due to an access violation, the Linux kernel creates a log entry accessible via dmesg and, depending on the system's configuration, the syslog (usually /var/log/messages). The example, compiled without optimization (-O0), creates such an entry.
The corresponding Linux kernel source (the x86 page-fault handler) shows how this entry is constructed. The error code (error_code) reveals what the trigger was; it's a CPU-specific bit set (x86). In our case, the value 5 (101 in binary) indicates that the page represented by the faulting address 0xffffffffffffffe8 was mapped but inaccessible due to page protection, and that a read was attempted.
The log message identifies the module that executed the faulting instruction: libstdc++.so.6.0.1. The sample was compiled without optimization, so the call to the stream output operator for std::string (operator<<(std::ostream&, std::string const&)) was not inlined.
The STL performs the read access. Knowing those basics, how can we identify where exactly the segmentation fault occurred? The log entry features the two essential addresses we need: the first is the instruction pointer (rip) at the time of the access violation, the second is the address the .text section of the library is mapped to. By subtracting the .text base address from rip, we get the relative address of the instruction within the library (0x7f9f2c2b56a3 - 0x7f9f2c220000 = 0x956a3) and can disassemble the implementation using objdump, simply searching for that offset in the output.
Is that the correct instruction? We can consult GDB to confirm our analysis: disassembling the same location there shows the very same instruction. A debugging session can also be used to verify the read address, and the value matches the read address in the log entry.
Identifying the Callers
So, despite the absence of a core dump, the kernel output enables us to identify the exact location of the segmentation fault. In many scenarios, though, that is far from being enough. For one thing, we're missing the list of calls that got us to that point — the call stack or stack trace.
Without a dump in the backpack, you have two options to get hold of the callers: you can start your process using
catchsegv (a glibc utility) or you can implement your own signal handler.
catchsegv serves as a wrapper: it generates the stack trace and also dumps register values and the memory map. How does catchsegv work? It essentially injects a signal handler using LD_PRELOAD and the library libSegFault.so. If your application already happens to install a signal handler for SIGSEGV and you intend to take advantage of libSegFault.so, your signal handler needs to forward the signal to the original handler (as returned by sigaction()).
The second option is to implement the stack trace functionality yourself using a custom signal handler and
backtrace(). This allows you to customize the output location and the output itself.
Based on that information, we can essentially do the same as before (0x7f1794fd36a3 - 0x7f1794f3e000 = 0x956a3). This time around, we can go back to the callers to dig deeper. The second frame is represented by the return address 0x400bf4: that is the address control returns to after Test::print(), and it's located in the executable itself. We can visualize the call site by disassembling the executable.
Note that the output of objdump matches the address in this instance because we run it against the executable, which has a default base address of
0x400000 on x86_64 — objdump takes that into account. With address space layout randomization (ASLR) enabled (compiled with
-fpie, linked with
-pie), the base address has to be taken into account as outlined before.
Going back further in the call stack involves the same steps.
Until now, we've been manually translating the absolute address to a relative address. Instead, the base address of the module can be passed to objdump via
--adjust-vma=<base-address>. Now, the value of
rip or a caller's address can be used directly.
Adding Debug Symbols
We've come a long way without a dump. For debugging to be effective, another critical puzzle piece is absent, however: debug symbols. Without them, it can be difficult to map the assembly to the corresponding source code. Compiling the sample with
-O3 and without debug information illustrates the problem.
As a consequence of inlining, the log entry now points to our executable as the trigger, and objdump paints a less obvious picture.
Part of the stream implementation was inlined, making it harder to identify the associated source code. Without symbols, you have to rely on exported symbols, calls (like operator delete(void*)), and the surrounding instructions (e.g., mov $0x6020a0 loads the address of 00000000006020a0 <std::cout@@GLIBCXX_3.4>) for orientation.
With debug symbols (-g), more context is available by instructing objdump to interleave source lines with the disassembly (--source/-S).
That worked as expected. In the real world, debug symbols are not embedded in the binaries — they are managed in separate
debuginfo packages. In those circumstances,
objdump ignores debug symbols even if they are installed. To address this limitation, symbols have to be re-added to the affected binary. The following procedure creates detached symbols and re-adds them using
elfutils to the benefit of objdump.
Using GDB Instead of objdump
Thus far, we've been using objdump because it's usually available, even on production systems. Can we just use GDB instead? Yes, by executing
gdb with the module of interest. I use the address 0x400a4b as in the previous objdump invocation.
In contrast to objdump, GDB can deal with external symbol information without a hitch.
disass /m corresponds to objdump's source-interleaved disassembly. In the case of an optimized binary, GDB might skip instructions in this mode if the source code cannot be mapped unambiguously. Our instruction at 0x400a4b is not listed for that very reason. objdump never skips instructions and might skip the source context instead — an approach that I prefer for debugging at this level. This does not mean that GDB isn't useful for this task; it's just something to be aware of.
Termination reason, registers, memory map, and stack trace. It's all there without even a trace of a core dump. While definitely useful (I fixed quite a few crashes that way), you have to keep in mind that you're still missing valuable information by going that route, most notably the stack and heap as well as per-thread data (thread metadata, registers, stack).
So, whatever the scenario may be, you should seriously consider enabling core dump generation and ensure that dumps can be generated successfully if push comes to shove. Debugging in itself is complex enough; debugging without the information you could technically have needlessly increases complexity and turnaround time and, more importantly, significantly lowers the probability that the root cause can be found and addressed in a timely manner.
Published at DZone with permission of George R. See the original article here.