kk Blog —— 通用基础


date [-d @int|str] [+%s|"+%F %T"]
netstat -ltunp
sar -n DEV 1

offsetof宏 container_of宏

Linux内核中,用两个非常巧妙地宏实现了,一个是offsetof宏,另一个是container_of宏,下面讲解一下这两个宏。

1. offsetof宏

【定义】:
1
#define offsetof(TYPE, MEMBER) ((size_t) & ((TYPE *)0)->MEMBER )
【功能】: 获得一个结构体变量成员在此结构体中的偏移量。
【例子】:
1
2
3
4
5
6
7
8
9
10
11
struct A 
	{ 
	int x ; 
	int y; 
	int z; 
}; 

void main() 
{ 
	printf("the offset of z is %d",offsetof( struct A, z )  ); 
} 

// 输出结果为 8

【分析】:

该宏,TYPE为结构体类型,MEMBER 为结构体内的变量名。
(TYPE )0) 是欺骗编译器说有一个指向结构TYPE 的指针,其地址值0
(TYPE
)0)->MEMBER 是要取得结构体TYPE中成员变量MEMBER的地址. 因为基址为0,所以,这时MEMBER的地址当然就是MEMBER在TYPE中的偏移了。

2. container_of宏(即实现了题目中的功能)

【定义】:
1
#define container_of(ptr, type, member)   ({const typeof( ((type *)0)->member ) *__mptr = (ptr); (type *)( (char *)__mptr - offsetof(type,member) );})
【功能】:

从结构体(type)某成员变量(member)指针(ptr)来求出该结构体(type)的首指针。

【例子】:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct A 
{ 
	int x ; 
	int y; 
	int z; 
}; 
 
struct A myTest; 
 
int *pz = &myTest.z; 
 
struct A* getHeaderPtr( int *pz ) 
{ 
	return container_of( pz , struct A, z ); 
} 
【分析】:

(1) typeof( ( (type )0)->member )为取出member成员的变量类型。
(2) 定义__mptr指针ptr为指向该成员变量的指针(即指向ptr所指向的变量处)
(3) (char
)__mptr - offsetof(type,member)) 用该成员变量的实际地址减去该变量在结构体中的偏移,来求出结构体起始地址。
(4) ({ })这个扩展返回程序块中最后一个表达式的值。

Universal Build-ID

http://fedoraproject.org/wiki/Summer_Coding_2010_ideas_-_Universal_Build-ID

Summary

Build-IDs are currently being put into binaries, shared libraries, core files and related debuginfo files to uniquely identify the build a user or developer is working with. There are a couple of conventions in place to use this information to identify “currently running” or “distro installed” builds. This helps with identifying what was being run and match it to the corresponding package, sources and debuginfo for tools that want to help the user show what is going on (at the moment mostly when things break). We would like to extend this to a more universial approach, that helps people identify historical, local, non- or cross-distro or organisational builds. So that Build-IDs become useful outside the current “static” setup and retain information over time and across upgrades.

Build-ID background

Build-IDs are unique identifiers of “builds”. A build is an executable, a shared library, the kernel, a module, etc. You can also find the build-id in a running process, a core file or a separate debuginfo file.

The main idea behind Build-IDs is to make elf files “self-identifying”. This means that when you have a Build-ID it should uniquely identify a final executable or shared library. The default Build-ID calculation (done through ld –build-id, see the ld manual) calculates a sha1 hash (160 bits/20 bytes) based on all the ELF header bits and section contents in the file. Which means that it is unique among the set of meaningful contents for ELF files and identical when the output file would otherwise have been identical. GCC now passes –build-id to the linker by default.

When an executable or shared library is loaded into memory the Build-ID will also be loaded into memory, a core dump of a process will also have the Build-IDs of the executable and the shared libraries embedded. And when separating debuginfo from the main executable or shared library into .debug files the original Build-ID will also be copied over. This means it is easy to match a core file or a running process to the original executable and shared library builds. And that matching those against the debuginfo files that provide more information for introspection and debugging should be trivial.

Fedora has had full support for build-ids since Fedora Core 8: https://fedoraproject.org/wiki/Releases/FeatureBuildId

Getting Build-IDs

A simple way to get the build-id(s) is through eu-unstrip (part of elfutils).

build-id from an executable, shared library or separate debuginfo file:
$ eu-unstrip -n -e <exec|.sharedlib|.debug>

build-ids of an executable and all shared libraries from a core file:
$ eu-unstrip -n –core

build-ids of an executable and all shared libraries of a running process:
$ eu-unstrip -n –pid

build-id of the running kernel and all loaded modules:
$ eu-unstrip -n -k

the meaning of '?' in Linux kernel panic call trace

  • ‘?’ means that the information about this stack entry is probably not reliable.

The stack output mechanism (see the implementation of dump_trace() function) was unable to prove that the address it has found is a valid return address in the call stack.

‘?’ itself is output by printk_stack_address().

The stack entry may be valid or not. Sometimes one may simply skip it. It may be helpful to investigate the disassembly of the involved module to see which function is called at ClearFunctionName+0x88 (or, on x86, immediately before that position).

Concerning reliability

On x86, when dump_stack() is called, the function that actually examines the stack is print_context_stack() defined in arch/x86/kernel/dumpstack.c. Take a look at its code, I’ll try to explain it below.

I assume DWARF2 stack unwind facilities are not available in your Linux system (most likely, they are not, if it is not OpenSUSE or SLES). In this case, print_context_stack() seems to do the following.

It starts from an address (‘stack’ variable in the code) that is guaranteed to be an address of a stack location. It is actually the address of a local variable in dump_stack().

The function repeatedly increments that address (while (valid_stack_ptr …) { … stack++}) and checks if what it points to could also be an address in the kernel code (if (__kernel_text_address(addr)) …). This way it attempts to find the functions' return addresses pushed on stack when these functions were called.

Of course, not every unsigned long value that looks like a return address is actually a return address. So the function tries to check it. If frame pointers are used in the code of the kernel (%ebp/%rbp registers are employed for that if CONFIG_FRAME_POINTER is set), they can be used to traverse the stack frames of the functions. The return address for a function lies just above the frame pointer (i.e. at %ebp/%rbp + sizeof(unsigned long)). print_context_stack checks exactly that.

If there is a stack frame for which the value ‘stack’ points to is the return address, the value is considered a reliable stack entry. ops->address will be called for it with reliable == 1, it will eventually call printk_stack_address() and the value will be output as a reliable call stack entry. Otherwise the address will be considered unreliable. It will be output anyway but with ‘?’ prepended.

[NB] If frame pointer information is not available (e.g. like it was in Debian 6 by default), all call stack entries will be marked as unreliable for this reason.

The systems with DWARF2 unwinding support (and with CONFIG_STACK_UNWIND set) is a whole another story.