kk Blog —— 通用基础

date [-d @int|str] [+%s|"+%F %T"]

How source debuggers work?

http://blog.techveda.org/howsourcedebuggerswork/

Application binaries are a result of compile and build operations performed on a single or a set of source files. Program Source files contain functions, data variables of various types (local, global, register, static), and abstract data objects, all written and neatly indented with nested control structures as per high level programming language syntax (C/C++). Compilers translate code in each source file into machine instructions (1’s and 0’s) as per target processors Instruction set Architecture and bury that code into object files. Further, Linkers integrate compiled object files with other pre-compiled objects files (libraries, runtime binaries) to create end application binary image called executable.

Source debuggers are tools used to trace execution of an application executable binary. Most amazing feature of a source debugger is its ability to list source code of the program being debugged; it can show the line or expression in the source code that resulted in a particular machine code instruction of a running program loaded in memory. This helps the programmer to analyze a program’s behavior in the high-level terms like source-level flow control constructs, procedure calls, named variables, etc, instead of machine instructions and memory locations. Source-level debugging also makes it possible to step through execution a line at a time and set source-level breakpoints. (If you do not have any prior hands on experience with source debuggers I suggest you to look at this before continuing with following.)

lets explore how source debuggers like gnu gdb work ? So how does a debugger know where to stop when you ask it to break at the entry to some function? How does it manage to find what to show you when you ask it for the value of a variable? The answer is – debug information. All modern compilers are designed to generate Debug information together with the machine code of the source file. It is a representation of the relationship between the executable program and the original source code. This information is encoded as per a pre-defined format and stored alongside the machine code. Many such formats were invented over the years for different platforms and executable files (aim of this article isn’t to survey the history of these formats, but rather to show how they work). Gnu compiler and ELF executable on Linux/ UNIX platforms use DWARF, which is widely used today as default debugging information format.

Word of Advice : Does an Application/ Kernel programmer need to know Dwarf?

Obvious answer to this question is a big NO. It is purely subject matter for developers involved in implementation of a Debugger tool. A normal Application developer using debugger tools would never need to learn or dig into binary files for debug information. This in no way adds any edge to your debugging skills nor adds any new skills into your armory. However, if you are a developer using debuggers for years and curious about how debuggers work read this document for an outline into debug information. If you are a beginner to systems programming or fresher’s learning programming I would suggest not to waste your time as you can safely ignore this.

ELF -DWARF sections

Gnu compiler generates debug information which is organized into various sections of the ELF object file. Let’s use the following source file for compiling and observing DWARF sections

root@techveda:~# vim sample.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include <stdio.h>
#include <stdlib.h>
 
int add(int x, int y)
{
	return x + y;
}
int main()
{
	int a = 10, b = 20;
	int result;
	int (*fp) (int, int);
 
	fp = add;
	result = (*fp) (a, b);
	printf(" %dn", result);
}

root@techveda:~# gcc -c -g sample.c -o sample.o
root@techveda:~# objdump -h sample.o | more

sample.o:     file format elf32-i386
 
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000005f  00000000  00000000  00000034  2**2
              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000094  2**2
              CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000094  2**2
              ALLOC
  3 .debug_abbrev 000000a2  00000000  00000000  00000094  2**0
              CONTENTS, READONLY, DEBUGGING
  4 .debug_info   00000114  00000000  00000000  00000136  2**0
              CONTENTS, RELOC, READONLY, DEBUGGING
  5 .debug_line   00000040  00000000  00000000  0000024a  2**0
              CONTENTS, RELOC, READONLY, DEBUGGING
  6 .rodata       00000005  00000000  00000000  0000028a  2**0
              CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .debug_loc    00000070  00000000  00000000  0000028f  2**0
              CONTENTS, READONLY, DEBUGGING
  8 .debug_pubnames 00000023  00000000  00000000  000002ff  2**0
              CONTENTS, RELOC, READONLY, DEBUGGING
  9 .debug_pubtypes 00000012  00000000  00000000  00000322  2**0
              CONTENTS, RELOC, READONLY, DEBUGGING
 10 .debug_aranges 00000020  00000000  00000000  00000334  2**0
              CONTENTS, RELOC, READONLY, DEBUGGING
 11 .debug_str    000000b0  00000000  00000000  00000354  2**0
              CONTENTS, READONLY, DEBUGGING
 12 .comment      0000002b  00000000  00000000  00000404  2**0
              CONTENTS, READONLY
 13 .note.GNU-stack 00000000  00000000  00000000  0000042f  2**0
              CONTENTS, READONLY
 14 .debug_frame  00000054  00000000  00000000  00000430  2**2
              CONTENTS, RELOC, READONLY, DEBUGGING

All of the sections with naming debug_xxx are debugging information sections. Information in these sections is interpreted by source debugger like gdb. Each debug_ section holds specific information like

1
2
3
4
5
6
7
8
9
10
11
.debug_info               core DWARF data containing DIEs
.debug_line               Line Number Program
.debug_frame              Call Frame Information
.debug_macinfo            lookup table for global objects and functions
.debug_pubnames           lookup table for global objects and functions
.debug_pubtypes           lookup table for global types
.debug_loc                Macro descriptions
.debug_abbrev             Abbreviations used in the .debug_info section
.debug_aranges            mapping between memory address and compilation
.debug_ranges             Address ranges referenced by DIEs
.debug_str                String table used by .debug_info

Debugging Information Entry (DIE)

Dwarf format organizes debug data in all of the above sections using special objects (program descriptive entities) called Debugging Information Entry (DIE). Each DIE has a tag filed whose value specifies its type, and a set of attributes. DIEs are interlinked via sibling and child links, and values of attributes can point at other DIEs. Now let’s dig into ELF file to view how a DIE looks like. We will begin our exploration with .debug_info section of the ELF file since core DIE’s are listed in it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
root@techveda:~# objdump --dwarf=info . /sample

Above operation shows long list of DIE’s. Let’s limit ourselves to relevant information

./sample:     file format elf32-i386
 
Contents of the .debug_info section:
 
  Compilation Unit @ offset 0x0:
   Length:        0x110 (32-bit)
   Version:       2
   Abbrev Offset: 0
   Pointer Size:  4
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    < c>   DW_AT_producer    : (indirect string, offset: 0xe): GNU C 4.5.2   
    <10>   DW_AT_language    : 1  (ANSI C)
    <11>   DW_AT_name        : (indirect string, offset: 0x44): sample.c 
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x71): /root
    <19>   DW_AT_low_pc      : 0x80483c4 
    <1d>   DW_AT_high_pc     : 0x8048423 
    <21>   DW_AT_stmt_list   : 0x0

Each source file in the application is referred in dwarf terminology as a “compilation unit”. Dwarf data for each compilation unit (source file) starts with a compilation unit DIE. Above dump shows the first DIE’s and tag value “DW_TAG_compile_unit “. This DIE provides general information about compilation unit like source file name (DW_AT_name : (indirect string, offset: 0×44): sample.c), high level programming language used to write source file(DW_AT_language : 1 (ANSI C)) , directory of the source file(DW_AT_comp_dir : (indirect string, offset: 0×71): /root) , compiler and producer of dwarf data( DW_AT_producer : (indirect string, offset: 0xe): GNU C 4.5.2) , start virtual address of the compilation unit (DW_AT_low_pc : 0x80483c4), end virtual address of the unit (DW_AT_high_pc : 0×8048423).

Compilation Unit DIE is the parent for all the other DIE’s that describe elements of source file. Generally, the list of DIE’s that follow will describe data types, followed by global data, then the functions that make up the source file. The DIEs for variables and functions are in the same order in which they appear in the source file.

How does debugger locate Function Information ?

While using source debuggers we often instruct debugger to insert or place break point at some function, expecting the debugger to pause program execution at functions. To be able to perform this task, debugger must have some mapping between a function name in the high-level code and the address in the machine code where the instructions for this function begin. For this mapping information debuggers rely on DIE’s that describes specified function. DIE’s describing functions in a compilation unit are assigned tag value “DW_TAG_subprogram” subprogram as per dwarf terminology is a function.

In our sample application source we have two functions (main, add), dwarf should generate a “DW_TAG_subprogram” DIE’s for each function, these DIE attributes would define function mapping information that debugger needs for resolving machine code addresses with function name.

1
2
3
4
5
6
7
8
9
10
Each “DW_TAG_subprogam” DIE contains

    function scope
    function name
    source file or compilation unit in which function is located
     line no in the source file where the function starts
    Functions return type
    Start address of the fucntion
    End address of the function
    Frame information of the function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
root@techveda:~# objdump --dwarf=info ./sample | grep "DW_TAG_subprogram"
 <1><72>: Abbrev Number: 4 (DW_TAG_subprogram)
 <1><a8>: Abbrev Number: 6 (DW_TAG_subprogram)
 
<1><72>: Abbrev Number: 4 (DW_TAG_subprogram)
    <73>   DW_AT_external    : 1 
    <74>   DW_AT_name        : add   
    <78>   DW_AT_decl_file   : 1 
    <79>   DW_AT_decl_line   : 4 
    <7a>   DW_AT_prototyped  : 1 
    <7b>   DW_AT_type        : <0x4f>  
    <7f>   DW_AT_low_pc      : 0x80483c4 
    <83>   DW_AT_high_pc     : 0x80483d1 
    <87>   DW_AT_frame_base  : 0x0    (location list)
    <8b>   DW_AT_sibling     : <0xa8>  
 
<1><a8>: Abbrev Number: 6 (DW_TAG_subprogram)
    <a9>   DW_AT_external    : 1 
    <aa>   DW_AT_name        : (indirect string, offset: 0x1a): main 
    <ae>   DW_AT_decl_file   : 1 
    <af>   DW_AT_decl_line   : 8 
    <b0>   DW_AT_type        : <0x4f>  
    <b4>   DW_AT_low_pc      : 0x80483d2 
    <b8>   DW_AT_high_pc     : 0x8048423 
    <bc>   DW_AT_frame_base  : 0x38   (location list)
    <c0>   DW_AT_sibling     : <0xf8>

We now have accessed DIE description of function’s main and add. Let’s analyze attribute information of add fucntions DIE.

Function scope: DW_AT_external : 1 (scope external)

Function name: DW_AT_name : add

Source file or compilation unit in which function is located: DW_AT_decl_file : 1 (indicates 1st compilation unit which is sample.c)

line no in the source file where the function starts: DW_AT_decl_line : 4 ( indicates line no 4 in source file)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
1#include <stdio.h>
2#include <stdlib.h>
3 
4int add(int x, int y)
5{
6        return x + y;
7}
8int main()
9{
10        int a = 10, b = 20;
11        int result;
12        int (*fp) (int, int);
13 
14        fp = add;
15        result = (*fp) (a, b);
16        printf(" %dn", result);
17}

Function’s source line no matched with DIE description of line no. let’s continue with rest of the attribute values

Functions return type: DW_AT_type : <0x4f>

As we have already understood that values of attributes can point to other DIE , here is an example of it. Value DW_AT_type : <0x4f> indicates that return type description is stored in other DIE at offset 0x4f.

1
2
3
4
<4f>: Abbrev Number: 3 (DW_TAG_base_type)
    <50>   DW_AT_byte_size   : 4 
    <51>   DW_AT_encoding    : 5  (signed)
    <52>   DW_AT_name        : int

This DIE describes data type and composition of return type of the function add, as per DIE attribute values return type is signed int of size 4 bytes.

Start address of the function : DW_AT_low_pc : 0x80483c4

End address of the function: DW_AT_high_pc : 0x80483d1

Above values indicate start and end virtual address of the machine instructions of add function, we can verify that with binary dump of the function

1
2
3
4
5
6
7
8
080483c4 <add>:
 80483c4:   55                      push   %ebp
 80483c5:   89 e5                   mov    %esp,%ebp
 80483c7:   8b 45 0c                mov    0xc(%ebp),%eax
 80483ca:   8b 55 08                mov    0x8(%ebp),%edx
 80483cd:   8d 04 02                lea    (%edx,%eax,1),%eax
 80483d0:   5d                      pop    %ebp
 80483d1:   c3                      ret

How does debugger find program data (variables…) Information?

When the program hits assigned break point in a function, debugger pauses the program execution, at this time we can instruct debugger to show or print values of variables, by using debugger commands like print or display followed by variable name (ex: print a) How does debugger know where to find memory location of the variable ? Variables can be located in global storage, on the stack, and even in registers.The debugging information has to be able to reflect all these variations, and indeed DWARF does. As an example let’s take a look at complete DIE information set for main function.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
<1><a8>: Abbrev Number: 6 (DW_TAG_subprogram)
    <a9>   DW_AT_external    : 1 
    <aa>   DW_AT_name        : (indirect string, offset: 0x1a): main 
    <ae>   DW_AT_decl_file   : 1 
    <af>   DW_AT_decl_line   : 8 
    <b0>   DW_AT_type        : <0x4f>  
    <b4>   DW_AT_low_pc      : 0x80483d2 
    <b8>   DW_AT_high_pc     : 0x8048423 
    <bc>   DW_AT_frame_base  : 0x38   (location list)
    <c0>   DW_AT_sibling     : <0xf8>  
 <2><c4>: Abbrev Number: 7 (DW_TAG_variable)
    <c5>   DW_AT_name        : a 
    <c7>   DW_AT_decl_file   : 1 
    <c8>   DW_AT_decl_line   : 10
    <c9>   DW_AT_type        : <0x4f>  
    <cd>   DW_AT_location    : 2 byte block: 74 1c    (DW_OP_breg4 (esp): 28)
 <2><d0>: Abbrev Number: 7 (DW_TAG_variable)
    <d1>   DW_AT_name        : b 
    <d3>   DW_AT_decl_file   : 1 
    <d4>   DW_AT_decl_line   : 10
    <d5>   DW_AT_type        : <0x4f>  
    <d9>   DW_AT_location    : 2 byte block: 74 18    (DW_OP_breg4 (esp): 24)
 <2><dc>: Abbrev Number: 8 (DW_TAG_variable)
    <dd>           : (indirect string, offset: 0x6a): result 
    <e1>   DW_AT_decl_file   : 1 
    <e2>   DW_AT_decl_line   : 11
    <e3>   DW_AT_type        : <0x4f>  
    <e7>   DW_AT_location    : 2 byte block: 74 10    (DW_OP_breg4 (esp): 16)
 <2><ea>: Abbrev Number: 7 (DW_TAG_variable)
    <eb>   DW_AT_name        : fp
    <ee>   DW_AT_decl_file   : 1 
    <ef>   DW_AT_decl_line   : 12
    <f0>   DW_AT_type        : <0x10d> 
    <f4>   DW_AT_location    : 2 byte block: 74 14    (DW_OP_breg4 (esp): 20)

Note the first number inside the angle brackets in each entry. This is the nesting level – in this example entries with <2> are children of the entry with <1>. main function has three integer variables a,b and result each of these variables are described with DW_TAG_variable nested DIE’s (0xc4, 0xd0, 0xdc). main function also has a function pointer fp described in DIE 0xea . Variable DIE attributes specify variable name (DW_AT_name), declaration line no in source function (DW_AT_decl_line ), pointer to address of DIE describing variables data type (DW_AT_type) and relative location of the variable within function’s frame (DW_AT_location).

To locate the variable in the memory image of the executing process, the debugger will look at the DW_AT_location attribute of DIE. For a its value is DW_OP_fbreg4 (esp):28. This means that the variable is stored at offset 28 from the top in the frame of containing function. The DW_AT_frame_base attribute of main has the value 0×38(location list), which means that this value actually has to be looked up in the location list section. Let’s look at it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
root@techveda:~# objdump --dwarf=loc sample
 
sample:     file format elf32-i386
 
Contents of the .debug_loc section:
 
    Offset   Begin    End      Expression
    00000000 080483c4 080483c5 (DW_OP_breg4 (esp): 4)
    00000000 080483c5 080483c7 (DW_OP_breg4 (esp): 8)
    00000000 080483c7 080483d1 (DW_OP_breg5 (ebp): 8)
    00000000 080483d1 080483d2 (DW_OP_breg4 (esp): 4)
    00000000 <End of list>
    00000038 080483d2 080483d3 (DW_OP_breg4 (esp): 4)
    00000038 080483d3 080483d5 (DW_OP_breg4 (esp): 8)
    00000038 080483d5 08048422 (DW_OP_breg5 (ebp): 8)
    00000038 08048422 08048423 (DW_OP_breg4 (esp): 4)
    00000038 <End of list>

Offset column 0×38 values are the entries for main function variables. Each entry here describes possible frame base address with respect to where debugger may be paused by break point within function instructions; it specifies the current frame base from which offsets to variables are to be computed as an offset from a register. For x86, bpreg4 refers to esp and bpreg5 refers to ebp. Before analyzing further lets look at disassemble dump for main function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
080483d2 <main>:
 80483d2:   55                      push   %ebp
 80483d3:   89 e5                   mov    %esp,%ebp
 80483d5:   83 e4 f0                and    $0xfffffff0,%esp
 80483d8:   83 ec 20                sub    $0x20,%esp
 80483db:   c7 44 24 1c 0a 00 00    movl   $0xa,0x1c(%esp)
 80483e2:   00
 80483e3:   c7 44 24 18 14 00 00    movl   $0x14,0x18(%esp)
 80483ea:   00
 80483eb:   c7 44 24 14 c4 83 04    movl   $0x80483c4,0x14(%esp)
 80483f2:   08
 80483f3:   8b 44 24 18             mov    0x18(%esp),%eax
 80483f7:   89 44 24 04             mov    %eax,0x4(%esp)
 80483fb:   8b 44 24 1c             mov    0x1c(%esp),%eax
 80483ff:   89 04 24                mov    %eax,(%esp)
 8048402:   8b 44 24 14             mov    0x14(%esp),%eax
 8048406:   ff d0                               call   *%eax
 8048408:   89 44 24 10             mov    %eax,0x10(%esp)
 804840c:   b8 f0 84 04 08          mov    $0x80484f0,%eax
 8048411:   8b 54 24 10             mov    0x10(%esp),%edx
 8048415:   89 54 24 04             mov    %edx,0x4(%esp)
 8048419:   89 04 24                mov    %eax,(%esp)
 804841c:   e8 d3 fe ff ff          call   80482f4 <printf@plt>
 8048421:   c9                      leave 
 8048422:   c3                      ret

First two instructions deal with function’s preamble, function’s stack frame base pointer is determined after pre-amble instructions are executed. Ebp remains constant throughout function’s execution and esp keeps changing with data being pushed and popped from the stack frame. From the above dump instructions at offset 80483db and 80483e3 are assigning values 10 and 20 to variables a, and b. These variables are being accessed their offset in stack frame relative to location of current esp( variable a: 0x1c(%esp), variable b: 0×18(%esp)). Now let’s assume that break point was set after initializing a and b variables and program paused, and we have run print command to view conents of a or b variables. Debugger would access 3rd record of the main function’s dwarf debug.loc table since our break point falls between 080483d5 – 08048422 region of the function code.

1
2
3
4
00000038 080483d2 080483d3 (DW_OP_breg4 (esp): 4)
00000038 080483d3 080483d5 (DW_OP_breg4 (esp): 8)
00000038 080483d5 08048422 (DW_OP_breg5 (ebp): 8)--- 3rd record
00000038 08048422 08048423 (DW_OP_breg4 (esp): 4)

Now as per records debugger will locate a with esp + 28 , b with esp +24 and so on…

Looking up line number information

We can set breakpoints mentioning line no’s Lets now look at how debuggers resolve line no’s to machine instruction’s? DWARF encodes a full mapping between lines in the C source code and machine code addresses in the executable. This information is contained in the .debug_line section and can be extracted using objdump

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
root@techveda:~# objdump --dwarf=decodedline ./sample
 
./sample:     file format elf32-i386
 
Decoded dump of debug contents of section .debug_line:
 
CU: sample.c:
File name                            Line number    Starting address                   
sample.c                                       5          0x80483c4
sample.c                                       6          0x80483c7
sample.c                                       7          0x80483d0
sample.c                                       9          0x80483d2
sample.c                                      10          0x80483db
sample.c                                      14          0x80483eb
sample.c                                      15          0x80483f3
sample.c                                      16          0x804840c
sample.c                                      17          0x8048421

This dump shows machine instruction line no’s for our program. It is quite obvious that line no’s for non-executable statements of the source file need not be tracked by dwarf .we can map the above dump to our source code for analysis.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 int add(int x, int y)
 5 {
 6         return x + y;
 7 }
 8 int main()
 9 {
10         int a = 10, b = 20;
11         int result;
12         int (*fp) (int, int);
13
14         fp = add;
15         result = (*fp) (a, b);
16         printf(" %dn", result);
17 }

From the above it should be clear of what debugger does it is instructed to set breakpoint at entry into function add, it would insert break point at line no 6 and pause after pre-amble of function add is executed.

What’s next?

if you are into implementation of debugging tools or involved in writing programs/tools that simulate debugger facilities/ read binary files , you may be interested in specific programming libraries libbfd or libdwarf.

Binary File Descriptor library (BFD) or libbfd as it is called provides ready to use functions to read into ELF and other popular binary files. BFD works by presenting a common abstract view of object files. An object file has a “header” with descriptive info; a variable number of “sections” that each has a name, some attributes, and a block of data; a symbol table; relocation entries; and so forth. Gnu binutils package tools like objdump, readelf and others have been written using these libraries.

Libdwarf is a C library intended to simplify reading (and writing) applications built with DWARF2, DWARF3 debug information.

Dwarfdump is an application written using libdwarf to print dwarf information in a human readable format. It is also open sourced and is copyrighted GPL. It provides an example of using libdwarf to read DWARF2/3 information as well as providing readable text output.

References

Dwarf Standard
Libdwarf Wiki
Libbfd draft

GDB MI接口相关

所谓GDB MI就是GNU Debugger Machine-Interface,是GNU设计来给其它前端使用的交互协议.

说实在的,这个接口设计得并不是很好,仅仅是能用而已.它的指令和GDB/CLI即GDB Command Line Interface基本是对应的.
为 了方便机器交互,它把允许所有的指令有一个前缀,比如901-stack-list-frames 0 99.这样,在GDB返回的结果前,也会有同样的前缀901,我们可以根据这个前缀进行命令/结果匹配. 同时它还保证结果格式的统一性.不会出现CLI那种百花齐放的结果.当然,如果你安了GDB的python插件来做变量格式化,就可能出现例外的情况.

它的命令有同步的,也有异步的.这对于前端的设计是个很大的障碍,而且除了命令结果以外,它还会有很多异步的消息,事件…这些东西混在一起,处理起来会相当麻烦.

这是 GDB/MI的官方文档 有兴趣的话可以仔细研究一下.

调试器工作原理之三——调试信息

调试器工作原理之一——基础篇
调试器工作原理之二——实现断点
调试器工作原理之三——调试信息

本篇主要内容

在本文中我将向大家解释关于调试器是如何在机器码中寻找C函数以及变量的,以及调试器使用了何种数据能够在C源代码的行号和机器码中来回映射。

调试信息

现代的编译器在转换高级语言程序代码上做得十分出色,能够将源代码中漂亮的缩进、嵌套的控制结构以及任意类型的 变量全都转化为一长串的比特流——这就是机器码。这么做的唯一目的就是希望程序能在目标CPU上尽可能快的运行。大多数的C代码都被转化为一些机器码指 令。变量散落在各处——在栈空间里、在寄存器里,甚至完全被编译器优化掉。结构体和对象甚至在生成的目标代码中根本不存在——它们只不过是对内存缓冲区中 偏移量的抽象化表示。

那么当你在某些函数的入口处设置断点时,调试器如何知道该在哪里停止目标进程的运行呢?当你希望查看一个变量的值时,调试器又是如何找到它并展示给你呢?答案就是——调试信息。

调试信息是在编译器生成机器码的时候一起产生的。它代表着可执行程序和源代码之间的关系。这个信息以预定义的格式进行编码,并同机器码一起存储。许 多年以来,针对不同的平台和可执行文件,人们发明了许多这样的编码格式。由于本文的主要目的不是介绍这些格式的历史渊源,而是为您展示它们的工作原理,所 以我们只介绍一种最重要的格式,这就是DWARF。作为Linux以及其他类Unix平台上的ELF可执行文件的调试信息格式,如今的DWARF可以说是 无处不在。

ELF文件中的DWARF格式

根据维基百科上的词条解释,DWARF是同ELF可执行文件格式一同设计出来的,尽管在理论上DWARF也能够嵌入到其它的对象文件格式中。

DWARF是一种复杂的格式,在多种体系结构和操作系统上经过多年的探索之后,人们才在之前的格式基础上创建了DWARF。它肯定是很复杂的,因为 它解决了一个非常棘手的问题——为任意类型的高级语言和调试器之间提供调试信息,支持任意一种平台和应用程序二进制接口(ABI)。要完全解释清楚这个主 题,本文就显得太微不足道了。说实话,我也不理解其中的所有角落。本文我将采取更加实践的方法,只介绍足量的DWARF相关知识,能够阐明实际工作中调试 信息是如何发挥其作用的就可以了。

ELF文件中的调试段

首先,让我们看看DWARF格式信息处在ELF文件中的什么位置上。ELF可以为每个目标文件定义任意多个段(section)。而Section header表中则定义了实际存在有哪些段,以及它们的名称。不同的工具以各自特殊的方式来处理这些不同的段,比如链接器只寻找它关注的段信息,而调试器 则只关注其他的段。 我们通过下面的C代码构建一个名为traceprog2的可执行文件来做下实验。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#include <stdio.h>
 
void do_stuff(int my_arg)
{
	int my_local = my_arg + 2;
	int i;
 
	for (i = 0; i < my_local; ++i)
		printf("i = %d\n", i);
}
 
int main()
{
	do_stuff(2);
	return 0;
}

通过objdump –h导出ELF可执行文件中的段头信息,我们注意到其中有几个段的名字是以.debug_打头的,这些就是DWARF格式的调试段:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
26 .debug_aranges 00000020  00000000  00000000  00001037
                  CONTENTS, READONLY, DEBUGGING
27 .debug_pubnames    00000028  00000000  00000000  00001057
                  CONTENTS, READONLY, DEBUGGING
28 .debug_info    000000cc  00000000  00000000  0000107f
                  CONTENTS, READONLY, DEBUGGING
29 .debug_abbrev  0000008a  00000000  00000000  0000114b
                  CONTENTS, READONLY, DEBUGGING
30 .debug_line    0000006b  00000000  00000000  000011d5
                  CONTENTS, READONLY, DEBUGGING
31 .debug_frame   00000044  00000000  00000000  00001240
                  CONTENTS, READONLY, DEBUGGING
32 .debug_str     000000ae  00000000  00000000  00001284
                  CONTENTS, READONLY, DEBUGGING
33 .debug_loc     00000058  00000000  00000000  00001332
                  CONTENTS, READONLY, DEBUGGING

每行的第一个数字表示每个段的大小,而最后一个数字表示距离ELF文件开始处的偏移量。调试器就是利用这个信息来从可执行文件中读取相关的段信息。现在,让我们通过一些实际的例子来看看如何在DWARF中找寻有用的调试信息。

定位函数

当我们在调试程序时,一个最为基本的操作就是在某些函数中设置断点,期望调试器能在函数入口处将程序断下。要完成这个功能,调试器必须具有某种能够从源代码中的函数名称到机器码中该函数的起始指令间相映射的能力。

这个信息可以通过从DWARF中的.debug_info段获取到。在我们继续之前,先说点背景知识。DWARF的基本描述实体被称为调试信息表项 (Debugging Information Entry —— DIE),每个DIE有一个标签——包含它的类型,以及一组属性。各个DIE之间通过兄弟和孩子结点互相链接,属性值可以指向其他的DIE。
我们运行

1
objdump –dwarf=info traceprog2

得到的输出非常长,对于这个例子,我们只用关注这几行就可以了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
  <72>   DW_AT_external : 1
  <73>   DW_AT_name     : (...): do_stuff
  <77>   DW_AT_decl_file    : 1
  <78>   DW_AT_decl_line    : 4
  <79>   DW_AT_prototyped   : 1
  <7a>   DW_AT_low_pc       : 0x8048604
  <7e>   DW_AT_high_pc  : 0x804863e
  <82>   DW_AT_frame_base   : 0x0     (location list)
  <86>   DW_AT_sibling  : <0xb3>
 
<1><b3>: Abbrev Number: 9 (DW_TAG_subprogram)
  <b4>   DW_AT_external : 1
  <b5>   DW_AT_name     : (...): main
  <b9>   DW_AT_decl_file  : 1
  <ba>   DW_AT_decl_line  : 14
  <bb>   DW_AT_type     : <0x4b>
  <bf>   DW_AT_low_pc       : 0x804863e
  <c3>   DW_AT_high_pc  : 0x804865a
<c7>   DW_AT_frame_base     : 0x2c   (location list)

这里有两个被标记为DW_TAG_subprogram的DIE,从DWARF的角度看这就是函数。注意,这里do_stuff和main都各有一 个表项。这里有许多有趣的属性,但我们感兴趣的是DW_AT_low_pc。这就是函数起始处的程序计数器的值(x86下的EIP)。注意,对于 do_stuff来说,这个值是0×8048604。现在让我们看看,通过objdump –d做反汇编后这个地址是什么:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
08048604 <do_stuff>:
 8048604:       55           push   ebp
 8048605:       89 e5        mov    ebp,esp
 8048607:       83 ec 28     sub    esp,0x28
 804860a:       8b 45 08     mov    eax,DWORD PTR [ebp+0x8]
 804860d:       83 c0 02     add    eax,0x2
 8048610:       89 45 f4     mov    DWORD PTR [ebp-0xc],eax
 8048613:       c7 45 (...)  mov    DWORD PTR [ebp-0x10],0x0
 804861a:       eb 18        jmp    8048634 <do_stuff+0x30>
 804861c:       b8 20 (...)  mov    eax,0x8048720
 8048621:       8b 55 f0     mov    edx,DWORD PTR [ebp-0x10]
 8048624:       89 54 24 04  mov    DWORD PTR [esp+0x4],edx
 8048628:       89 04 24     mov    DWORD PTR [esp],eax
 804862b:       e8 04 (...)  call   8048534 <printf@plt>
 8048630:       83 45 f0 01  add    DWORD PTR [ebp-0x10],0x1
 8048634:       8b 45 f0     mov    eax,DWORD PTR [ebp-0x10]
 8048637:       3b 45 f4     cmp    eax,DWORD PTR [ebp-0xc]
 804863a:       7c e0        jl     804861c <do_stuff+0x18>
 804863c:       c9           leave
 804863d:       c3           ret

没错,从反汇编结果来看0×8048604确实就是函数do_stuff的起始地址。因此,这里调试器就同函数和它们在可执行文件中的位置确立了映射关系。

定位变量

假设我们确实在do_stuff中的断点处停了下来。我们希望调试器能够告诉我们my_local变量的值,调试器怎么知道去哪里找到相关的信息 呢?这可比定位函数要难多了,因为变量可以在全局数据区,可以在栈上,甚至是在寄存器中。另外,具有相同名称的变量在不同的词法作用域中可能有不同的值。 调试信息必须能够反映出所有这些变化,而DWARF确实能做到这些。

我不会涵盖所有的可能情况,作为例子,我将只展示调试器如何在do_stuff函数中定位到变量my_local。我们从.debug_info段开始,再次看看do_stuff这一项,这一次我们也看看其他的子项:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
    <72>   DW_AT_external    : 1
    <73>   DW_AT_name        : (...): do_stuff
    <77>   DW_AT_decl_file   : 1
    <78>   DW_AT_decl_line   : 4
    <79>   DW_AT_prototyped  : 1
    <7a>   DW_AT_low_pc      : 0x8048604
    <7e>   DW_AT_high_pc     : 0x804863e
    <82>   DW_AT_frame_base  : 0x0      (location list)
    <86>   DW_AT_sibling     : <0xb3>
 <2><8a>: Abbrev Number: 6 (DW_TAG_formal_parameter)
    <8b>   DW_AT_name        : (...): my_arg
    <8f>   DW_AT_decl_file   : 1
    <90>   DW_AT_decl_line   : 4
    <91>   DW_AT_type        : <0x4b>
    <95>   DW_AT_location    : (...)       (DW_OP_fbreg: 0)
 <2><98>: Abbrev Number: 7 (DW_TAG_variable)
    <99>   DW_AT_name        : (...): my_local
    <9d>   DW_AT_decl_file   : 1
    <9e>   DW_AT_decl_line   : 6
    <9f>   DW_AT_type        : <0x4b>
    <a3>   DW_AT_location    : (...)      (DW_OP_fbreg: -20)
<2><a6>: Abbrev Number: 8 (DW_TAG_variable)
    <a7>   DW_AT_name        : i
    <a9>   DW_AT_decl_file   : 1
    <aa>   DW_AT_decl_line   : 7
    <ab>   DW_AT_type        : <0x4b>
<af>   DW_AT_location    : (...)      (DW_OP_fbreg: -24)

注意每一个表项中第一个尖括号里的数字,这表示嵌套层次——在这个例子中带有<2>的表项都是表项<1>的子项。因此我们 知道变量my_local(以DW_TAG_variable作为标签)是函数do_stuff的一个子项。调试器同样还对变量的类型感兴趣,这样才能正 确的显示变量的值。这里my_local的类型根据DW_AT_type标签可知为<0x4b>。如果查看objdump的输出,我们会发现 这是一个有符号4字节整数。

要在执行进程的内存映像中实际定位到变量,调试器需要检查DW_AT_location属性。对于my_local来说,这个属性为 DW_OP_fberg: -20。这表示变量存储在从所包含它的函数的DW_AT_frame_base属性开始偏移-20处,而DW_AT_frame_base正代表了该函数 的栈帧起始点。

函数do_stuff的DW_AT_frame_base属性的值是0×0(location list),这表示该值必须要在location list段去查询。我们看看objdump的输出:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ objdump --dwarf=loc tracedprog2
 
tracedprog2:     file format elf32-i386
 
Contents of the .debug_loc section:
 
    Offset   Begin    End      Expression
    00000000 08048604 08048605 (DW_OP_breg4: 4 )
    00000000 08048605 08048607 (DW_OP_breg4: 8 )
    00000000 08048607 0804863e (DW_OP_breg5: 8 )
    00000000 <End of list>
    0000002c 0804863e 0804863f (DW_OP_breg4: 4 )
    0000002c 0804863f 08048641 (DW_OP_breg4: 8 )
    0000002c 08048641 0804865a (DW_OP_breg5: 8 )
0000002c <End of list>

关于位置信息,我们这里感兴趣的就是第一个。对于调试器可能定位到的每一个地址,它都会指定当前栈帧到变量间的偏移量,而这个偏移就是通过寄存器来计算的。对于x86体系结构,bpreg4代表esp寄存器,而bpreg5代表ebp寄存器。

让我们再看看do_stuff的开头几条指令:

1
2
3
4
5
6
7
08048604 <do_stuff>:
 8048604:       55          push   ebp
 8048605:       89 e5       mov    ebp,esp
 8048607:       83 ec 28    sub    esp,0x28
 804860a:       8b 45 08    mov    eax,DWORD PTR [ebp+0x8]
 804860d:       83 c0 02    add    eax,0x2
 8048610:       89 45 f4    mov    DWORD PTR [ebp-0xc],eax

注意,ebp只有在第二条指令执行后才与我们建立起关联,对于前两个地址,基地址由前面列出的位置信息中的esp计算得出。一旦得到了ebp的有效值,就可以很方便的计算出与它之间的偏移量。因为之后ebp保持不变,而esp会随着数据压栈和出栈不断移动。

那么这到底为我们定位变量my_local留下了什么线索?我们感兴趣的只是在地址0×8048610上的指令执行过后my_local的值(这里 my_local的值会通过eax寄存器计算,而后放入内存)。因此调试器需要用到DW_OP_breg5: 8 基址来定位。现在回顾一下my_local的DW_AT_location属性:DW_OP_fbreg: -20。做下算数:从基址开始偏移-20,那就是ebp – 20,再偏移+8,我们得到ebp – 12。现在再看看反汇编输出,注意到数据确实是从eax寄存器中得到的,而ebp – 12就是my_local存储的位置。

定位到行号

当我说到在调试信息中寻找函数时,我撒了个小小的谎。当我们调试C源代码并在函数中放置了一个断点时,我们通常并不会对第一条机器码指令感兴趣。我们真正感兴趣的是函数中的第一行C代码。

这就是为什么DWARF在可执行文件中对C源码到机器码地址做了全部映射。这部分信息包含在.debug_line段中,可以按照可读的形式进行解读:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ objdump --dwarf=decodedline tracedprog2
 
tracedprog2:     file format elf32-i386
 
Decoded dump of debug contents of section .debug_line:
 
CU: /home/eliben/eli/eliben-code/debugger/tracedprog2.c:
File name           Line number    Starting address
tracedprog2.c                5           0x8048604
tracedprog2.c                6           0x804860a
tracedprog2.c                9           0x8048613
tracedprog2.c               10           0x804861c
tracedprog2.c                9           0x8048630
tracedprog2.c               11           0x804863c
tracedprog2.c               15           0x804863e
tracedprog2.c               16           0x8048647
tracedprog2.c               17           0x8048653
tracedprog2.c               18           0x8048658

不难看出C源码同反汇编输出之间的关系。第5行源码指向函数do_stuff的入口点——地址0×8040604。接下第6行源码,当在 do_stuff上设置断点时,这里就是调试器实际应该停下的地方,它指向地址0x804860a——刚过do_stuff的开场白。这个行信息能够方便 的在C源码的行号同指令地址间建立双向的映射关系。
1. 当在某一行上设定断点时,调试器将利用行信息找到实际应该陷入的地址(还记得前一篇中的int 3指令吗?)
2. 当某个指令引起段错误时,调试器会利用行信息反过来找出源代码中的行号,并告诉用户。

libdwarf —— 在程序中访问DWARF

通过命令行工具来访问DWARF信息这虽然有用但还不能完全令我们满意。作为程序员,我们希望知道应该如何写出实际的代码来解析DWARF格式并从中读取我们需要的信息。

自然的,一种方法就是拿起DWARF规范开始钻研。还记得每个人都告诉你永远不要自己手动解析HTML,而应该使用函数库来做吗?没错,如果你要手 动解析DWARF的话情况会更糟糕,DWARF比HTML要复杂的多。本文展示的只是冰山一角而已。更困难的是,在实际的目标文件中,这些信息大部分都以 非常紧凑和压缩的方式进行编码处理。

因此我们要走另一条路,使用一个函数库来同DWARF打交道。我知道的这类函数库主要有两个:
1. BFD(libbfd),GNU binutils就是使用的它,包括本文中多次使用到的工具objdump,ld(GNU链接器),以及as(GNU汇编器)。
2. libdwarf —— 同它的老大哥libelf一样,为Solaris以及FreeBSD系统上的工具服务。
我这里选择了libdwarf,因为对我来说它看起来没那么神秘,而且license更加自由(LGPL,BFD是GPL)。
由于libdwarf自身非常复杂,需要很多代码来操作。我这里不打算把所有代码贴出来,但你可以下载,然后自己编译运行。要编译这个文件,你需要安装libelf以及libdwarf,并在编译时为链接器提供-lelf以及-ldwarf标志。

这个演示程序接收一个可执行文件,并打印出程序中的函数名称同函数入口点地址。下面是本文用以演示的C程序产生的输出:

1
2
3
4
5
6
7
$ dwarf_get_func_addr tracedprog2
DW_TAG_subprogram: 'do_stuff'
low pc  : 0x08048604
high pc : 0x0804863e
DW_TAG_subprogram: 'main'
low pc  : 0x0804863e
high pc : 0x0804865a

libdwarf的文档非常好(见本文的参考文献部分),花点时间看看,对于本文中提到的DWARF段信息你处理起来就应该没什么问题了。

结论及下一步

调试信息只是一个简单的概念,具体实现细节可能相当复杂。但最终我们知道了调试器是如何从可执行文件中找出同源代码之间的关系。有了调试信息在手,调试器为用户所能识别的源代码和数据结构同可执行文件之间架起了一座桥。

本文加上之前的两篇文章总结了调试器内部的工作原理。通过这一系列文章,再加上一点编程工作就应该可以在Linux下创建一个具有基本功能的调试器。

至于下一步,我还不确定。也许我会就此终结这一系列文章,也许我会再写一些高级主题比如backtrace,甚至Windows系统上的调试。读者们也可以为今后这一系列文章提供意见和想法。不要客气,请随意在评论栏或通过Email给我提些建议吧。