kk Blog: General Basics


date [-d @int|str] [+%s|"+%F %T"]
netstat -ltunp
sar -n DEV 1

tcp_read_sock BUG

commit baff42ab1494528907bf4d5870359e31711746ae
Author: Steven J. Magnani <steve@digidescorp.com>
Date:   Tue Mar 30 13:56:01 2010 -0700

	net: Fix oops from tcp_collapse() when using splice()

	tcp_read_sock() can have a eat skbs without immediately advancing copied_seq.
	This can cause a panic in tcp_collapse() if it is called as a result
	of the recv_actor dropping the socket lock.

	A userspace program that splices data from a socket to either another
	socket or to a file can trigger this bug.

	Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6afb6d8..2c75f89 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1368,6 +1368,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
      sk_eat_skb(sk, skb, 0);
      if (!desc->count)
          break;
+     tp->copied_seq = seq;
  }
  tp->copied_seq = seq;
 

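If tcp_read_sock() eats an skb with sk_eat_skb() without updating copied_seq at the same moment, copied_seq can end up smaller than the seq of the first skb still in sk_receive_queue.
The function that recv_actor points to (for example tcp_splice_data_recv) may release the socket lock. If the receive softirq runs in that window and memory is tight, tcp_collapse() gets called.
In tcp_collapse():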

start = copied_seq
...
int offset = start - TCP_SKB_CB(skb)->seq;

BUG_ON(offset < 0);

tcp_match_skb_to_sack BUG

commit 2cd0d743b05e87445c54ca124a9916f22f16742e
Author: Neal Cardwell <ncardwell@google.com>
Date:   Wed Jun 18 21:15:03 2014 -0400

	tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb

	If there is an MSS change (or misbehaving receiver) that causes a SACK
	to arrive that covers the end of an skb but is less than one MSS, then
	tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
	the skb ("Round if necessary..."), then chopping all bytes off the skb
	and creating a zero-byte skb in the write queue.

	This was visible now because the recently simplified TLP logic in
	bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
	skb at the end of the write queue, and now that we do not check that
	skb's length we could send it as a TLP probe.

	Consider the following example scenario:

	 mss: 1000
	 skb: seq: 0 end_seq: 4000  len: 4000
	 SACK: start_seq: 3999 end_seq: 4000

	The tcp_match_skb_to_sack() code will compute:

	 in_sack = false
	 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
	 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
	 new_len += mss = 4000

	Previously we would find the new_len > skb->len check failing, so we
	would fall through and set pkt_len = new_len = 4000 and chop off
	pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
	afterward in the write queue.

	With this new commit, we notice that the new new_len >= skb->len check
	succeeds, so that we return without trying to fragment.

	Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
	Reported-by: Eric Dumazet <edumazet@google.com>
	Signed-off-by: Neal Cardwell <ncardwell@google.com>
	Cc: Eric Dumazet <edumazet@google.com>
	Cc: Yuchung Cheng <ycheng@google.com>
	Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
	Acked-by: Eric Dumazet <edumazet@google.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 40661fc..b5c2375 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1162,7 +1162,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
          unsigned int new_len = (pkt_len / mss) * mss;
          if (!in_sack && new_len < pkt_len) {
              new_len += mss;
-             if (new_len > skb->len)
+             if (new_len >= skb->len)
                  return 0;
          }
          pkt_len = new_len;

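GRO receive

Linux kernel network stack: GRO (Generic Receive Offload)

GRO can merge several packets whose gso_size values differ, and it sets gso_size of the merged skb to the gso_size of the first packet.

If that merged skb is then sent out, the following relation may no longer hold: skb->gso_size * (skb->segs-1) < skb->len <= skb->gso_size * skb->segs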

When that relation is violated, the following three functions can go wrong.

1. tcp_shift_skb_data

mss = skb->gso_size
len = len/mss * mss

|---|-------|-------|
 mss    |
        V
|---|---|

2. tcp_mark_head_lost

len = (packets - cnt) * mss

|--------|--|--|
   mss   |
         V
|--------|--------|

3. tcp_match_skb_to_sack

new_len = (pkt_len/mss)*mss
in_sack = 1
pkt_len = new_len

|---|-------|-------|
 mss    |
        V
|---|---|

Fix

Before the skb is added to the send queue:

skb_shinfo(skb)->gso_size = 0;
skb_shinfo(skb)->gso_segs = 0;
skb_shinfo(skb)->gso_type = 0;