Wednesday, October 15, 2008

gdb: examining the core dumps

When your program receives SIGSEGV (segmentation fault), the kernel terminates it automatically unless the application handles SIGSEGV itself.
After a couple of long nights of debugging, tracing and drinking coffee, you finally find the line in the sources where your application makes the system send this crafty signal.

The most common problem is that you are usually unable to run the application within a debugger. The fault may be caused by special circumstances, and it's really painful to sit in front of the debugger trying to reproduce it. Multi-threading, multiple processes and network interaction add even more complexity.

Core dumps are a good solution here.
The Linux kernel is able to write a core dump when an application crashes. The core dump records the state of the process at the time of the crash.

Later you can use gdb to analyze the core dump.

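On many Linux distributions core dumps are disabled by default, because the core file size limit is set to 0.
To enable them for the current shell and the programs started from it, run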

ulimit -c unlimited
By default the kernel writes the core dump into the current working directory of the process. You can customize the file path pattern for core dumps by writing it to /proc/sys/kernel/core_pattern.
According to the current documentation, the pattern may contain the following specifiers
%%  A single % character 
%p  PID of dumped process 
%u  real UID of dumped process 
%g  real GID of dumped process 
%s  number of signal causing dump 
%t  time of dump (secs since 0:00h, 1 Jan 1970) 
%h  hostname (same as the 'nodename' returned by uname(2)) 
%e  executable filename
So with (run as root)
echo /tmp/%e-%p.core > /proc/sys/kernel/core_pattern
Linux should put core dumps into /tmp, named <executable>-<PID>.core.
Let's try all this.
Say we have this code
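#include <stdlib.h>   /* free() */

void
crash()
{
    char a[0];   /* lives on the stack, never came from malloc() */
    free(a);     /* bug: freeing a stack address */
}
int
main(int argc, char **argv)
{
    crash();
    return 0;
}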
As you can see, the application should cause a segmentation violation on the free call. Let's compile it
gcc test.c -g -o test
and execute
./test 
Segmentation fault (core dumped)
The system tells us that a core was dumped. Let's see what we have
ll /tmp/*core
-rw------- 1 niam niam  151552 2008-10-15 15:19 /tmp/test-25301.core
Got it. Now I'm going to run gdb
gdb --core /tmp/test-25301.core ./test
gdb clearly tells us that the application was terminated with SIGSEGV
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
#0  0xb7e4ff97 in free () from /lib/libc.so.6
Now we can use the power of gdb to find the problem code
(gdb) bt
#0  0xb7e4ff97 in free () from /lib/libc.so.6
#1  0x08048392 in crash () at 1.c:9
#2  0x080483aa in main () at 1.c:15
(gdb) up
#1  0x08048392 in crash () at 1.c:9
9  free(a);
(gdb) p a
$1 = 0xbfc5daf8 "\b�ſ�\203\004\b�D�� �ſx�ſ��߷�����\203\004\bx�ſ��߷\001"
(gdb) whatis a
type = char [0]
(gdb) info frame
Stack level 1, frame at 0xbfc5db00:
 eip = 0x8048392 in crash (1.c:9);
 saved eip 0x80483aa called by frame at 0xbfc5db10, caller of frame at 0xbfc5daf0
 source language c.
 Arglist at 0xbfc5daf8, args:
 Locals at 0xbfc5daf8, Previous frame's sp is 0xbfc5db00
 Saved registers:
  ebp at 0xbfc5daf8, eip at 0xbfc5dafc
Here we can see that free was asked to release memory that lives on the stack. 'whatis a' shows that a is an array, not a pointer returned by malloc, and 'info frame' shows that its address is on the stack: the previous frame's sp is 0xbfc5db00 and a is stored at 0xbfc5daf8, just below it, at the very top of crash's frame.
gdb gave us all the information needed for further investigation. The only thing left is to understand who taught you to free an array on the stack o_O.
