kk Blog —— 通用基础

date [-d @int|str] [+%s|"+%F %T"]

offsetof宏 container_of宏

Linux内核中,用两个非常巧妙地宏实现了,一个是offsetof宏,另一个是container_of宏,下面讲解一下这两个宏。

1. offsetof宏

【定义】:
1
#define offsetof(TYPE, MEMBER) ((size_t) & ((TYPE *)0)->MEMBER )
【功能】: 获得一个结构体变量成员在此结构体中的偏移量。
【例子】:
1
2
3
4
5
6
7
8
9
10
11
struct A 
	{ 
	int x ; 
	int y; 
	int z; 
}; 

void main() 
{ 
	printf("the offset of z is %d",offsetof( struct A, z )  ); 
} 

// 输出结果为 8

【分析】:

该宏,TYPE为结构体类型,MEMBER 为结构体内的变量名。
(TYPE )0) 是欺骗编译器说有一个指向结构TYPE 的指针,其地址值0
(TYPE
)0)->MEMBER 是要取得结构体TYPE中成员变量MEMBER的地址. 因为基址为0,所以,这时MEMBER的地址当然就是MEMBER在TYPE中的偏移了。

2. container_of宏(即实现了题目中的功能)

【定义】:
1
#define container_of(ptr, type, member)   ({const typeof( ((type *)0)->member ) *__mptr = (ptr); (type *)( (char *)__mptr - offsetof(type,member) );})
【功能】:

从结构体(type)某成员变量(member)指针(ptr)来求出该结构体(type)的首指针。

【例子】:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct A 
{ 
	int x ; 
	int y; 
	int z; 
}; 
 
struct A myTest; 
 
int *pz = &myTest.z; 
 
struct A* getHeaderPtr( int *pz ) 
{ 
	return container_of( pz , struct A, z ); 
} 
【分析】:

(1) typeof( ( (type )0)->member )为取出member成员的变量类型。
(2) 定义__mptr指针ptr为指向该成员变量的指针(即指向ptr所指向的变量处)
(3) (char
)__mptr - offsetof(type,member)) 用该成员变量的实际地址减去该变量在结构体中的偏移,来求出结构体起始地址。
(4) ({ })这个扩展返回程序块中最后一个表达式的值。

Universal Build-ID

http://fedoraproject.org/wiki/Summer_Coding_2010_ideas_-_Universal_Build-ID

Summary

Build-IDs are currently being put into binaries, shared libraries, core files and related debuginfo files to uniquely identify the build a user or developer is working with. There are a couple of conventions in place to use this information to identify “currently running” or “distro installed” builds. This helps with identifying what was being run and match it to the corresponding package, sources and debuginfo for tools that want to help the user show what is going on (at the moment mostly when things break). We would like to extend this to a more universial approach, that helps people identify historical, local, non- or cross-distro or organisational builds. So that Build-IDs become useful outside the current “static” setup and retain information over time and across upgrades.

Build-ID background

Build-IDs are unique identifiers of “builds”. A build is an executable, a shared library, the kernel, a module, etc. You can also find the build-id in a running process, a core file or a separate debuginfo file.

The main idea behind Build-IDs is to make elf files “self-identifying”. This means that when you have a Build-ID it should uniquely identify a final executable or shared library. The default Build-ID calculation (done through ld –build-id, see the ld manual) calculates a sha1 hash (160 bits/20 bytes) based on all the ELF header bits and section contents in the file. Which means that it is unique among the set of meaningful contents for ELF files and identical when the output file would otherwise have been identical. GCC now passes –build-id to the linker by default.

When an executable or shared library is loaded into memory the Build-ID will also be loaded into memory, a core dump of a process will also have the Build-IDs of the executable and the shared libraries embedded. And when separating debuginfo from the main executable or shared library into .debug files the original Build-ID will also be copied over. This means it is easy to match a core file or a running process to the original executable and shared library builds. And that matching those against the debuginfo files that provide more information for introspection and debugging should be trivial.

Fedora has had full support for build-ids since Fedora Core 8: https://fedoraproject.org/wiki/Releases/FeatureBuildId

Getting Build-IDs

A simple way to get the build-id(s) is through eu-unstrip (part of elfutils).

build-id from an executable, shared library or separate debuginfo file:
$ eu-unstrip -n -e <exec|.sharedlib|.debug>

build-ids of an executable and all shared libraries from a core file:
$ eu-unstrip -n –core

build-ids of an executable and all shared libraries of a running process:
$ eu-unstrip -n –pid

build-id of the running kernel and all loaded modules:
$ eu-unstrip -n -k

强制内联和强制不内联

1.强制不内联

一个函数,如果代码量比较少的话,用 -O3优化开关的话,gcc有可能将这个函数强制内联(inline)即使,你在函数前没有写inline助记符。
如果是一个手写汇编的函数,那样的话很有可能破坏参数。gcc里有强制不内联的,用法如下

1
void foo() __attribute__((noinline));

但是有的gcc可能会忽略 noinline。
那么你可以将你实现的这个函数写到调用函数之后,就不会被inline了。这是因为编译器gcc只内联当前函数之前可见(实现代码在前)的函数。

2.优化时无法识别inline函数中的ASM汇编

当GCC尝试内联一个函数时,如果该函数中存在内联汇编,则该汇编语句块可能被丢弃;

1
2
3
4
5
6
7
8
9
10
11
12
13
__inline__ __attribute__((always_inline))int Increment(int volatile *add, int inc)
{
    int res;
    __asm__
    (
    "lock \n\t"
    "xaddl %0,(%1)\n\t"
    :"=r"(res)
    :"r"(add),"0"(inc)
    :"memory"
    );
    return res;
}

kdump el5 --dump-dmesg 错误

原因:

http://vault.centos.org/5.11/os/SRPMS/kexec-tools-1.102pre-165.el5.src.rpm
这个包的一个patch(kexec-tools-1.102pre-makedumpfile-dump-dmesg.patch)是为了得到dmesg的,
但是它判断dmesg的结束是用logged_chars(看kernel/printk.c),logged_chars应该是输出的结束,所以不对。
改成log_end就行,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
diff --git a/kexec-tools-1.102pre-makedumpfile-dump-dmesg.patch b/kexec-tools-1.102pre-makedumpfile-dump-dmesg.patch
index 3938280..76c402a 100644
--- a/kexec-tools-1.102pre-makedumpfile-dump-dmesg.patch
+++ b/kexec-tools-1.102pre-makedumpfile-dump-dmesg.patch
@@ -68,7 +68,7 @@ diff -up kexec-tools-testing-20070330/makedumpfile/makedumpfile.c.orig kexec-too
 +dump_dmesg()
 +{
 +      int log_buf_len, length_log, length_oldlog, ret = FALSE;
-+      unsigned long log_addr, logged_chars, index;
++      unsigned long log_addr, logged_chars, log_end, index;
 +      char *log_buffer = NULL;
 +
 +      if (!open_files_for_creating_dumpfile())
@@ -101,10 +101,15 @@ diff -up kexec-tools-testing-20070330/makedumpfile/makedumpfile.c.orig kexec-too
 +               printf("Failed to get logged_chars.\n");
 +               return FALSE;
 +      }
++      if (!readmem(VADDR, SYMBOL(log_end), &log_end, sizeof(log_end))) {
++               printf("Failed to get log_end.\n");
++               return FALSE;
++      }
 +      DEBUG_MSG("\n");
 +      DEBUG_MSG("log_addr     : %lx\n", log_addr);
 +      DEBUG_MSG("log_buf_len  : %d\n", log_buf_len);
 +      DEBUG_MSG("logged_chars : %ld\n", logged_chars);
++      DEBUG_MSG("log_end      : %ld\n", log_end);
 +
 +      if ((log_buffer = malloc(log_buf_len)) == NULL) {
 +               ERRMSG("Can't allocate memory for log_buf. %s\n",
@@ -112,21 +117,16 @@ diff -up kexec-tools-testing-20070330/makedumpfile/makedumpfile.c.orig kexec-too
 +               return FALSE;
 +       }
 +
-+      if (logged_chars < log_buf_len) {
++      if (log_end < log_buf_len) {
 +               index = 0;
-+               length_log = logged_chars;
++               length_log = log_end;
 +
 +               if(!readmem(VADDR, log_addr, log_buffer, length_log)) {
 +                        printf("Failed to read dmesg log.\n");
 +                        goto out;
 +               }
 +      } else {
-+               if (!readmem(VADDR, SYMBOL(log_end), &index, sizeof(index))) {
-+                        printf("Failed to get log_end.\n");
-+                        goto out;
-+               }
-+               DEBUG_MSG("log_end      : %lx\n", index);
-+               index &= log_buf_len - 1;
++               index = log_end & (log_buf_len - 1);
 +               length_log = log_buf_len;
 +               length_oldlog = log_buf_len - index;
 +


如果不修改上面bug,kdump得到vmcore后用 makedumpfile –dump-dmesg 无法解得dmesg,补救办法如下:

kdump得到vmcore后
1、vmlinux没有debuginfo,crash不能运行
2、makedumpfile -F –dump-dmesg vmcore > dmesg 只能显示开头一下部分dmesg (不懂为什么)
解决:

方法一、

通过/boot/System.map 或者 /proc/kallsyms 找到 log_buf 地址,例如 0xffffffff81a9ac30

1
2
3
4
5
6
7
8
9
10
11
12
13
14
gdb vmlinux vmcore

set print repeats 100
set print elements 0
set logging file XXX
set pagination off
set logging on
p {char*} 0xffffffff81a9ac30
quit

---

vi XXX
:%s/\\n/\r/g
方法二、

是另一命令,但不好用

1
2
3
4
5
6
7
8
9
10
11
cat /proc/kallsyms | grep log_end
ffffffff81e30de0 b log_end

x/1dw 0xffffffff81e30de0
0xffffffff81e30de0:     85689

x/1xg 0xffffffff81a9ac30
0xffffffff81a9ac30:     0xffffffff81e30ee0

显示最后4000字符
x/5s 0xffffffff81e30ee0+85689-4000

gdb 没有debug信息step单步调试

1
step <count>

单步跟踪,如果有函数调用,他会进入该函数。进入函数的前提是,此函数被编译有 debug信息。很像 VC等工具中的 step in。后面可以加 count也可以不加,不加表示一条条地执行,加表示执行后面的 count条指令,然后再停住。

1
next <count>

同样单步跟踪,如果有函数调用,他不会进入该函数。很像 VC等工具中的 step over。后面可以加 count也可以不加,不加表示一条条地执行,加表示执行后面的 count条指令,然后再停住。

1
2
3
4
5
6
set step-mode [on/off]
set step-mode on
打开 step-mode模式,于是,在进行单步跟踪时,程序不会因为没有 debug信息而不停住。这个参数有很利于查看机器码。

set step-mod off
关闭 step-mode模式。