kk Blog: General Basics

date [-d @int|str] [+%s|"+%F %T"]

Layered structure of the Linux kernel network stack

http://liucw.blog.51cto.com/6751239/1221140

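Kernel network structure

In the Linux kernel, the networking code is designed as a layered system: the network protocol layer, the network device layer, the device driver function layer, and the network media layer.

The device driver function layer is implemented mainly by network drivers.

The kernel abstracts every network device as an interface, and that interface provides all network operations.

The net_device structure describes a network device inside the kernel; it is the network device interface. Such interfaces include software-only virtual devices, such as the loopback device, as well as hardware devices, such as Ethernet cards.

The kernel has a global pointer, dev_base, that points to a linked list of all network devices in the system. Each node in the list is one network device.

net_device provides many device methods for the system and the protocol layers to call, including initialization, opening and closing the device, and sending and receiving packets.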

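Network-related data structures

All kernel processing of network packets is built on the sk_buff structure, the most important data structure in the kernel networking code.

Each protocol layer in the stack adds or removes its own protocol data by operating on this structure. Using sk_buff avoids the inefficiency of copying data back and forth between the layers.

An sk_buff has two parts: the buffer that stores the packet, labelled Packet Data in the original figure, and a set of pointers the kernel uses for management.
The four most important of these pointers are:

head points to the start of the data buffer (Packet Data) in kernel memory;
data points to the start of the current packet data;
tail points to the end of the current packet data;
end points to the end of the data buffer in kernel memory.
The packet size changes as the packet moves through the protocol stack, so data and tail keep changing, while head and end never change.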

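Taking a TCP packet as an example, sk_buff also provides pointers that point directly at each layer's header: mac points to the MAC header; nh points to the network-layer header, usually the IP header; h points to the transport-layer header, here the TCP header.

These per-layer pointers make it easier for the protocol stack to process packets.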
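
The pointer behaviour described above can be sketched in user space. This is only a toy model, not kernel code: struct toy_skb and the toy_* helpers are hypothetical names made up for illustration, loosely mirroring what the real skb_reserve(), skb_put(), skb_push() and skb_pull() helpers do to the four pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the four sk_buff pointers. */
struct toy_skb {
	unsigned char *head;  /* start of the buffer, never moves */
	unsigned char *data;  /* start of the current packet data */
	unsigned char *tail;  /* end of the current packet data */
	unsigned char *end;   /* end of the buffer, never moves */
};

/* Attach a caller-provided buffer; the packet starts out empty. */
static void toy_init(struct toy_skb *skb, unsigned char *buf, size_t size)
{
	assert(skb != NULL && buf != NULL);
	skb->head = skb->data = skb->tail = buf;
	skb->end = buf + size;
}

/* Leave headroom for headers that will be prepended later. */
static int toy_reserve(struct toy_skb *skb, size_t len)
{
	if ((size_t)(skb->end - skb->tail) < len)
		return -1;
	skb->data += len;
	skb->tail += len;
	return 0;
}

/* Append len bytes of payload: only tail moves. NULL if no room. */
static unsigned char *toy_put(struct toy_skb *skb, size_t len)
{
	unsigned char *old_tail = skb->tail;

	if ((size_t)(skb->end - skb->tail) < len)
		return NULL;
	skb->tail += len;
	return old_tail;
}

/* Prepend a len-byte header on send: data moves towards head. */
static unsigned char *toy_push(struct toy_skb *skb, size_t len)
{
	if ((size_t)(skb->data - skb->head) < len)
		return NULL;
	skb->data -= len;
	return skb->data;
}

/* Strip a len-byte header on receive: data moves towards tail. */
static unsigned char *toy_pull(struct toy_skb *skb, size_t len)
{
	if ((size_t)(skb->tail - skb->data) < len)
		return NULL;
	skb->data += len;
	return skb->data;
}

/* Length of the current packet data. */
static size_t toy_len(const struct toy_skb *skb)
{
	return (size_t)(skb->tail - skb->data);
}
```

On the send side a caller would reserve headroom, put the payload, then push the UDP and IP headers in turn; on the receive side each layer pulls its own header. In both cases no payload bytes are copied, only data and tail move.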

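The net_device structure

net_device is the most important data structure for network devices in the Linux kernel and the central part of any network driver.
It is defined in the include/linux/netdevice.h header; understanding it helps a great deal in understanding network device drivers.
All information about, and operations on, a network device live in its net_device structure, which is used both when registering a device and when setting its parameters.
Its main groups of data members are:

Device name
Bus parameters
Protocol parameters
Link-layer variables
Interface flags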

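Packet receive path

In the Linux kernel, a packet travelling from the NIC to user space passes through the link layer, the network (IP) layer, the transport layer, and the socket layer before it reaches user space.

Taking the DM9000 NIC as an example: when the NIC receives a packet, the interrupt handler dm9000_interrupt() is called. It checks the interrupt type and, for a receive interrupt, calls dm9000_rx() to copy the packet into kernel space.

Once dm9000_rx() has received the packet, the kernel calls netif_rx(), which hands the data received by the NIC to the protocol stack.

The protocol stack processes the receive queue in net_rx_action(), which passes IP packets to ip_rcv(). ip_rcv() mainly validates the IP header and then hands the packet to ip_local_deliver() and ip_local_deliver_finish(). The work is split in two because the kernel's firewall code needs to be hooked in dynamically at this point.

After the IP header has been handled, a UDP packet (for example) goes to udp_rcv(). Like ip_rcv(), it validates the UDP header, then passes the packet to udp_queue_rcv_skb(), which finally submits it to sock_queue_rcv_skb().

The first function on the socket side is skb_recv_datagram(), which takes the packet off the kernel socket queue and gives it to udp_recvmsg(), the function responsible for UDP data. After sock_recvmsg() has processed it, the data is handed to sock_read().

sock_read() reads the received data buffer and returns the data to the sys_read() system call, which finally copies it to user space for the application to use.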

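Packet transmit path

Taking UDP as an example, this is how a packet is sent on a DM9000 NIC.

When a user-space application sends UDP data with the socket function sendto(), the kernel's sock_writev() function is called and the data is then handled by sock_sendmsg(). sock_sendmsg() calls inet_sendmsg(), which hands the data to the transport layer's udp_sendmsg().

udp_sendmsg() prepends the UDP header and passes the data to ip_build_xmit(), which builds the IP header from the destination IP and port supplied by the socket and then calls output_maybe_reroute(). output_maybe_reroute() checks whether the packet needs to be routed and finally hands it to ip_output(), which writes it to the transmit queue; once that is done, ip_finish_output() takes care of the remaining work.

At the link layer, dev_queue_xmit() processes the transmit queue and calls the DM9000 transmit function dm9000_xmit() to send the packet; when transmission completes, dm9000_xmit_done() is called to handle the result.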

Sending a packet from inside the Linux kernel

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/timer.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/socket.h>
#include <net/sock.h>

static struct socket *sock;

static unsigned char buffer[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

/* Send len bytes of data to ip_addr, UDP port 65530, over the module socket. */
static int ker_send_udp(const char *ip_addr, unsigned char *data, size_t len)
{
	int ret;
	struct sockaddr_in sin = {
		.sin_family = AF_INET,
		.sin_port = htons(65530),
		.sin_addr = { .s_addr = in_aton(ip_addr) },
	};
	struct kvec iov = { .iov_base = data, .iov_len = len };
	/* Designated initializer zeroes every field that is not set here. */
	struct msghdr udpmsg = {
		.msg_name = &sin,
		.msg_namelen = sizeof(sin),
	};

	ret = kernel_sendmsg(sock, &udpmsg, &iov, 1, len);
	printk(KERN_INFO "rets = %d\n", ret);

	/* kernel_sendmsg() returns the byte count or a negative errno. */
	return ret < 0 ? ret : 0;
}

static int __init socket_init(void)
{
	int ret;

	/* Kernels from 4.2 on take a struct net * first, e.g. &init_net. */
	ret = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
	printk(KERN_INFO "retc = %d\n", ret);
	if (ret < 0)
		return ret;	/* no socket: refuse to load the module */

	ker_send_udp("192.168.1.253", buffer, sizeof(buffer));
	return 0;
}

static void __exit socket_exit(void)
{
	sock_release(sock);
}

module_init(socket_init);
module_exit(socket_exit);
MODULE_LICENSE("GPL");

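Tainted information in an Oops

Let's look at the Oops above and see whether the Linux kernel has left us any other useful information.

Oops: 0002 [#1]
  • Here 0002 is the Oops error code (a write fault that happened in kernel space), and #1 means the error has occurred once.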

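The Oops error code is defined differently depending on the cause of the error. For the example in this article, refer to the definition below (if an Oops you run into does not match it, it is best to look it up in the kernel source):
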
 * error_code:
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 0 means data, 1 means instruction
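
As a quick way to read such a code, the four bits can be unpacked with a few shifts. The sketch below is a user-space illustration only; struct fault_info, decode_error_code() and print_fault() are hypothetical names, and the bit meanings are taken directly from the comment above.

```c
#include <assert.h>
#include <stdio.h>

/* One field per bit of the error code, as listed in the comment above. */
struct fault_info {
	int protection;   /* bit 0: 0 = no page found, 1 = protection fault */
	int write;        /* bit 1: 0 = read, 1 = write */
	int user_mode;    /* bit 2: 0 = kernel, 1 = user-mode */
	int instruction;  /* bit 3: 0 = data, 1 = instruction */
};

/* Split an Oops error code into its four documented bits. */
static struct fault_info decode_error_code(unsigned int code)
{
	struct fault_info f;

	f.protection  = (code >> 0) & 1;
	f.write       = (code >> 1) & 1;
	f.user_mode   = (code >> 2) & 1;
	f.instruction = (code >> 3) & 1;
	return f;
}

/* Print the decoded bits as one human-readable line. */
static void print_fault(const struct fault_info *f)
{
	assert(f != NULL);
	printf("%s, %s, %s mode, %s\n",
	       f->protection ? "protection fault" : "no page found",
	       f->write ? "write" : "read",
	       f->user_mode ? "user" : "kernel",
	       f->instruction ? "instruction" : "data");
}
```

For the 0002 in the example, only bit 1 is set, which matches the reading given earlier: a write, in kernel mode, to an address with no page behind it.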

Sometimes an Oops also prints Tainted information, which indicates why the kernel has become tainted. The flags are defined as follows:

  1: 'G' if all modules loaded have a GPL or compatible license, 'P' if any proprietary module has been loaded.  Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary.
  2: 'F' if any module was force loaded by "insmod -f", ' ' if all modules were loaded normally.
  3: 'S' if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
  4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all modules were unloaded normally.
  5: 'M' if any processor has reported a Machine Check Exception, ' ' if no Machine Check Exceptions have occurred.
  6: 'B' if a page-release function has found a bad page reference or some unexpected page flags.
  7: 'U' if a user or user application specifically requested that the Tainted flag be set, ' ' otherwise.
  8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
  9: 'A' if the ACPI table has been overridden.
 10: 'W' if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.)
 11: 'C' if a staging driver has been loaded.
 12: 'I' if the kernel is working around a severe bug in the platform firmware (BIOS or similar).
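
To make the positional layout of the list concrete, here is a small sketch that builds the flag string from a bitmask. It rests on one assumption that the list itself does not state: bit i of the mask corresponds to entry i+1 above. taint_string() is a hypothetical helper written for this note, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define TAINT_FLAGS 12	/* number of entries in the list above */

/*
 * Build the 12-character flag string for a taint bitmask.
 * Assumption: bit i of mask corresponds to entry i+1 of the list.
 * Position 1 shows 'G' when its bit is clear; the others show ' '.
 */
static void taint_string(unsigned long mask, char out[TAINT_FLAGS + 1])
{
	static const char set_char[TAINT_FLAGS + 1] = "PFSRMBUDAWCI";
	int i;

	assert(out != NULL);
	for (i = 0; i < TAINT_FLAGS; i++)
		out[i] = (mask & (1UL << i)) ? set_char[i] : ' ';
	if (!(mask & 1UL))
		out[0] = 'G';
	out[TAINT_FLAGS] = '\0';
}
```

With a mask of 0 the result starts with 'G' followed by spaces; setting bits 0 and 9 gives 'P' in the first position and 'W' in the tenth.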

How to do a TopCoder Marathon match

As with an SRM, you just write a class and a method.

Take this problem as an example: http://community.topcoder.com/longcontest/?module=ViewProblemStatement&rd=15683&pm=12593

Requirements:

Definition
      
Class:    CirclesSeparation
Method:   minimumWork
Parameters:   double[], double[], double[], double[]
Returns:  double[]
Method signature: double[] minimumWork(double[] x, double[] y, double[] r, double[] m)
(be sure your method is public)
A very simple solution could be:
import java.util.*;
import java.io.*;
import java.math.*;
public class CirclesSeparation {
  int N, now;
  double ox[] = new double[1000], oy[] = new double[1000];
  double x[] = new double[1000], y[] = new double[1000];
  double r[] = new double[1000], m[] = new double[1000];
  boolean touch(int i,int j)
  {
      double dis = (x[i]-x[j])*(x[i]-x[j]) + (y[i]-y[j])*(y[i]-y[j]);
      if (dis > (r[i]+r[j]) * (r[i]+r[j])) {
          return false;
      }
      return true;
  }
  void dfsMove(int ok, int j)
  {
      double px = x[j] - x[ok];
      double py = y[j] - y[ok];
      double dis = Math.sqrt((x[j]-x[ok])*(x[j]-x[ok]) + (y[j]-y[ok])*(y[j]-y[ok]));
      double dd = r[ok] + r[j] - dis + 0.001;
      x[j] += dd * px / dis;
      y[j] += dd * py / dis;
      //System.out.println(x[j] + "\t" + y[j]);
      int i;
      for (i=0;i<=now;i++) {
          if (i != j && touch(i, j)) {
              dfsMove(j, i);
          }
      }
  }
  public double[] minimumWork(double[] ix, double[] iy, double[] ir, double[] im) {
      int i,j,k,l;
      N = ix.length;
      for (i=0;i<N;i++) {
          ox[i] = ix[i];
          oy[i] = iy[i];
          x[i] = ix[i];
          y[i] = iy[i];
          r[i] = ir[i];
          m[i] = im[i];
      }
      for (i=1;i<N;i++)
      {
          now = i;
          for (j=0;j<i;j++) {
              if (!touch(i, j)) continue;
              dfsMove(i, j);
          }
      }
      double ret[] = new double[N+N];
      for (i=0;i<N;i++) {
          ret[i+i] = x[i];
          ret[i+i+1] = y[i];
      }
      return ret;
  }
}

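Write the code in this format and return the result. That is the bare minimum.

In fact, we can debug first with the tools they provide.

Each problem usually has a link labelled something like "available."; after following it:

1. First download CirclesSeparationVis.jar from the top of the page, along with any other files (if there are any).
2. After the line "In other words, you should implement the following pseudocode in the main method of your solution:" the page lists the input and output steps. Translate them into input and output code in your language and put them in the main function. For this problem they are: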
N = parseInt(readLine())
for (i=0; i < N; i++)
	x[i] = parseDouble(readLine())
for (i=0; i < N; i++)
	y[i] = parseDouble(readLine())
for (i=0; i < N; i++)
	r[i] = parseDouble(readLine())
for (i=0; i < N; i++)
	m[i] = parseDouble(readLine())
ret = minimumWork(x, y, r, m)
for (i=0; i < 2*N; i++)
	printLine(ret[i])
flush(stdout)

Translated into Java:

public static void main(String[] args) {
	Scanner cin = new Scanner(System.in);
	double x[], y[], r[], m[], ret[];
	int N, i;
	N = cin.nextInt();
	x = new double[N];
	y = new double[N];
	r = new double[N];
	m = new double[N];
	for (i=0;i<N;i++) x[i] = cin.nextDouble();
	for (i=0;i<N;i++) y[i] = cin.nextDouble();
	for (i=0;i<N;i++) r[i] = cin.nextDouble();
	for (i=0;i<N;i++) m[i] = cin.nextDouble();
	CirclesSeparation rrr = new CirclesSeparation();
	ret = rrr.minimumWork(x, y, r, m);
	for (i=0;i<N+N;i++) {
		System.out.println(ret[i]);
	}
}

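Add this method to the basic class above to get a complete executable program, then compile it:

$ javac CirclesSeparation.java
3. Further down the page you will find a line like:
java -jar CirclesSeparationVis.jar -exec "<command>"

For Java, <command> is java CirclesSeparation

So run:

java -jar CirclesSeparationVis.jar -exec "java CirclesSeparation"

and you will see the result.

Use -seed=X to choose which test case to run, and -novis to turn off the graphical display.

4. While running under this tool, anything printed with System.out.println() is captured by the tool; print debugging output with System.err.println() instead.
5. Sometimes the CirclesSeparationVis.jar code needs to be changed to suit our debugging. Download CirclesSeparationVis.java, compile it with javac, and then run:
java CirclesSeparationVis -exec "java CirclesSeparation"
6. Use long t=System.currentTimeMillis() to measure time; the unit is milliseconds.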

abrt: application core dumps

I. Installation

yum install abrt

II. Configuration

ulimit -c
ulimit -c unlimited
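
The same limit can also be inspected and raised from inside a C program through the POSIX getrlimit() and setrlimit() calls with RLIMIT_CORE. The helper names below are made up for this sketch. Note that it only raises the soft limit up to the hard limit, so the result is "unlimited" only when the hard limit already is.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Format a core-size limit into buf as a byte count or "unlimited". */
static void format_core_limit(rlim_t limit, char *buf, size_t n)
{
	assert(buf != NULL && n > 0);
	if (limit == RLIM_INFINITY)
		snprintf(buf, n, "unlimited");
	else
		snprintf(buf, n, "%llu bytes", (unsigned long long)limit);
}

/* Read the current soft core limit. Returns 0 on success, -1 on error. */
static int current_core_limit(rlim_t *out)
{
	struct rlimit rl;

	assert(out != NULL);
	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}

/* Raise the soft core limit to the hard limit. Returns 0 on success. */
static int raise_core_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

A program that wants its own crashes to leave a core file could call raise_core_limit() early in main(), which avoids depending on the shell that launched it.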

III. Common errors

1. ERROR
$ tail -f /var/log/messages
abrtd: Package 'XXX' isn't signed with proper key

$ vim /etc/abrt/abrt.conf
OR
$ vim /etc/abrt/abrt-action-save-package-data.conf
OpenGPGCheck = no

2. ERROR
$ tail -f /var/log/messages
abrtd: Duplicate: UUID

Whenever a problem is detected, ABRT compares it with all 
existing problem data and determines whether that same problem 
has been recorded. If it has been, the existing problem data 
is updated and the most recent (duplicate) problem is not recorded again.

3.
ProcessUnpackaged = <yes/no>
This directive tells ABRT whether to process crashes 
in executables that do not belong to any package. 

abrt

http://docs.fedoraproject.org/en-US/Fedora/14/html/Deployment_Guide/configuring.html
https://fedorahosted.org/releases/a/b/abrt/Deployment_Guide.html

21.6. Configuring ABRT

ABRT's main configuration file is /etc/abrt/abrt.conf. 
ABRT plugins can be configured through their config files, 
located in the /etc/abrt/plugins/ directory.

After changing and saving the abrt.conf configuration file, 
you must restart the abrtd daemon—as root—for the new settings to take effect:

~]# service abrtd restart

The following configuration directives are currently supported in /etc/abrt/abrt.conf.

[ Common ] Section Directives

OpenGPGCheck = <yes/no>

Setting the OpenGPGCheck directive to yes (the default setting) tells 
ABRT to only analyze and handle crashes in applications provided by 
packages which are signed by the GPG keys whose locations are listed 
in the /etc/abrt/gpg_keys file. Setting OpenGPGCheck to no tells 
ABRT to catch crashes in all programs.

BlackList = nspluginwrapper, valgrind, strace, avant-window-navigator, [<additional_packages> ]

Crashes in packages and binaries listed after the BlackList directive 
will not be handled by ABRT. If you want ABRT to ignore other packages 
and binaries, list them here separated by commas.

ProcessUnpackaged = <yes/no>

This directive tells ABRT whether to process crashes in executables 
that do not belong to any package.    

BlackListedPaths = /usr/share/doc/*, */example*

Crashes in executables in these paths will be ignored by ABRT.

Database = SQLite3

This directive instructs ABRT to store its crash data in the SQLite3 database. 
Other databases are not currently supported. However, 
ABRT's plugin architecture allows for future support for alternative databases.

#WatchCrashdumpArchiveDir = /var/spool/abrt-upload/

This directive is commented out by default. 
Enable (uncomment) it if you want abrtd to auto-unpack crashdump tarballs 
which appear in the specified directory — in this case /var/spool/abrt-upload/ — 
(for example, uploaded via ftp, scp, etc.). You must ensure that whatever 
directory you specify in this directive exists and is writable for abrtd. 
abrtd will not create it automatically.

MaxCrashReportsSize = <size_in_megabytes>

This option sets the amount of storage space, in megabytes, 
used by ABRT to store all crash information from all users. 
The default setting is 1000 MB. Once the quota specified here has been met, 
ABRT will continue catching crashes, and in order to make room for the new crash dumps, 
it will delete the oldest and largest ones.

ActionsAndReporters = SOSreport, [<additional_plugins> ]

This option tells ABRT to run the specified plugin(s) immediately 
after a crash is detected and saved. For example, the SOSreport plugin runs 
the sosreport tool which adds the data collected by it to the created crash dump. 
You can turn this behavior off by commenting out this line. For further fine-tuning, 
you can add SOSreport (or any other specified plugin) to either the CCpp or 
Python options to make ABRT run sosreport (or any other specified plugin) after 
any C and C++ or Python applications crash, respectively. For more information 
on various Action and Reporter plugins, refer to Section 21.3, “ABRT Plugins”.

[ AnalyzerActionsAndReporters ] Section Directives

This section allows you to associate certain analyzer actions and reporter 
actions to run when ABRT catches kernel oopses or crashes in C, C++ or Python programs. 
The actions and reporters specified in any of the directives below will run only 
if you run abrt-gui or abrt-cli and report the crash that occurred. 
If you do not specify any actions and reporters in these directives, 
you will not be able to report a crash via abrt-gui or abrt-cli. 
The order of actions and reporters is important. Commenting out a directive, 
will cause ABRT not to catch the crashes associated with that directive. 
For example, commenting out the Kerneloops line will cause ABRT not to catch kernel oopses.

Kerneloops = RHTSupport, Logger

This directive specifies that, for kernel oopses, 
both the RHTSupport and Logger reporters will be run.

CCpp = RHTSupport, Logger

This directive specifies that, when C or C++ program crashes occur, 
both the RHTSupport and Logger reporters will be run.

Python = RHTSupport, Logger

This directive specifies that, when Python program crashes occur, 
both the RHTSupport and Logger reporters will be run.

Each of these destinations' details can be specified in the corresponding 
plugins/*.conf file. For example, plugins/RHTSupport.conf specifies 
which RHTSupport URL to use (set to https://api.access.redhat.com/rs by default), 
the user's login name, password for logging in to the RHTSupport site, 
etc. All these options can also be configured through the abrt-gui application 
(for more information on plugin configuration refer to Section 21.3, “ABRT Plugins”).

[ Cron ] Section Directives

<time> = <action_to_run>

The [ Cron ] section of abrt.conf allows you to specify the exact time, 
or elapsed amount of time between, when ABRT should run a certain action, 
such as scanning for kernel oopses or performing file transfers. 
You can list further actions to run by appending them to the end of this section.

Example 21.1. [ Cron ] section of /etc/abrt/abrt.conf

# Which Action plugins to run repeatedly
[ Cron ]
# h:m - at h:m
# s - every s seconds
120 = KerneloopsScanner
#02:00 = FileTransfer


The format for an entry is either 
<time_in_seconds> = <action_to_run> or <hh:mm> = <action_to_run> , 
where hh (hour) is in the range 00-23 
(all hours less than 10 should be zero-filled, i.e. preceded by a 0), 
and mm (minute) is 00-59, zero-filled likewise.
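
The two entry formats are easy to tell apart mechanically. The sketch below is a strict reading of the rule above, written for illustration; it is not ABRT's own parser, and struct cron_time and parse_cron_time() are hypothetical names. It rejects hh:mm values that are not zero-filled, which may be stricter than what abrtd actually accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum cron_kind { CRON_INVALID, CRON_INTERVAL, CRON_AT };

struct cron_time {
	enum cron_kind kind;
	long seconds;      /* meaningful when kind == CRON_INTERVAL */
	int hour, minute;  /* meaningful when kind == CRON_AT */
};

/* Parse the left-hand side of a [ Cron ] entry: "<seconds>" or "hh:mm". */
static struct cron_time parse_cron_time(const char *s)
{
	struct cron_time t = { CRON_INVALID, 0, 0, 0 };
	long secs = 0;
	size_t i;

	assert(s != NULL);

	/* hh:mm form: two digits, a colon, two digits, nothing else. */
	if (isdigit((unsigned char)s[0]) && isdigit((unsigned char)s[1]) &&
	    s[2] == ':' &&
	    isdigit((unsigned char)s[3]) && isdigit((unsigned char)s[4]) &&
	    s[5] == '\0') {
		int hour = (s[0] - '0') * 10 + (s[1] - '0');
		int minute = (s[3] - '0') * 10 + (s[4] - '0');

		if (hour <= 23 && minute <= 59) {
			t.kind = CRON_AT;
			t.hour = hour;
			t.minute = minute;
		}
		return t;
	}

	/* Seconds form: one to nine decimal digits (nine keeps it in a long). */
	for (i = 0; s[i] != '\0'; i++) {
		if (!isdigit((unsigned char)s[i]) || i >= 9)
			return t;
		secs = secs * 10 + (s[i] - '0');
	}
	if (i > 0) {
		t.kind = CRON_INTERVAL;
		t.seconds = secs;
	}
	return t;
}
```

Applied to Example 21.1, "120" parses as an interval of 120 seconds and "02:00" as the clock time 02:00, while "2:00" and "24:00" are rejected.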