http://blog.csdn.net/zhangskd/article/details/7699081
Overview
In computer networking, large segment offload (LSO) is a technique for increasing outbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by queuing up large buffers and letting the network interface card (NIC) split them into separate packets. The technique is also called TCP segmentation offload (TSO) when applied to TCP, or generic segmentation offload (GSO).
The inbound counterpart of large segment offload is large receive offload (LRO).
When large chunks of data are to be sent over a computer network, they first need to be broken down into smaller segments that can pass through all the network elements, such as routers and switches, between the source and destination computers. This process is referred to as segmentation. Segmentation is often done by the TCP protocol in the host computer. Offloading this work to the NIC is called TCP segmentation offload (TSO).
For example, a unit of 64KB (65,536 bytes) of data is usually split into 46 segments (45 of 1448 bytes each plus a final 376-byte segment) before it is sent over the network through the NIC. With some intelligence in the NIC, the host CPU can hand over the 64KB of data to the NIC in a single transmit request. The NIC then breaks that data down into segments of at most 1448 bytes, adds the TCP, IP, and data link layer protocol headers to each segment (according to a template provided by the host's TCP/IP stack), and sends the resulting frames over the network. This significantly reduces the work done by the CPU. Many new NICs on the market today support TSO. [1]
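The arithmetic in the example above can be checked with a minimal Python sketch. It is only an illustration of the splitting step, not NIC firmware: the helper names segment and build_frames are hypothetical, and the 66-byte header template is an assumed placeholder (14 bytes Ethernet + 20 bytes IP + 32 bytes TCP with options) that a real NIC would patch per segment.

```python
def segment(payload, mss):
    """Split payload into consecutive chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def build_frames(payload, mss, header_template):
    """Prepend a copy of the header template to every segment.

    A real NIC would also update per-segment fields such as the
    sequence number, lengths and checksums; this sketch only copies
    the template unchanged.
    """
    return [header_template + seg for seg in segment(payload, mss)]


data = bytes(64 * 1024)              # 65,536 bytes of zeroes
segments = segment(data, 1448)
frames = build_frames(data, 1448, bytes(66))  # assumed 66-byte header

print(len(segments), len(segments[-1]))  # number of segments, size of the last one
```

With TSO, the host performs one transmit request for the whole buffer and the loop above runs in hardware instead of on the CPU.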
Details
TSO reduces the CPU workload of cutting data into 1500-byte packets by asking the hardware to perform that work instead.
1. The TSO feature relies on hardware support: the hardware must be able to segment the data into packets of at most 1500 bytes and attach a header to every packet.
2. Every network device is represented by a netdevice structure in the kernel. If the hardware supports TSO, the driver enables the segmentation offload features in the netdevice, mainly represented by NETIF_F_TSO and other fields. [2]
TCP Segmentation Offload is supported in Linux by the network device layer. A driver that wants to offer TSO needs to set the NETIF_F_TSO bit in the network device structure. In order for a device to support TSO, it also needs to support TCP checksum offloading and scatter-gather.
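The dependency between these features can be sketched as a small bitmask check. This is a Python illustration under stated assumptions, not kernel code: the bit values are made up for the example (the real NETIF_F_* values come from the kernel headers), CSUM_OFFLOAD is a hypothetical stand-in for the kernel's checksum offload flags, and fix_features is a hypothetical helper name.

```python
# Hypothetical bit positions chosen only for this sketch.
NETIF_F_SG = 1 << 0      # scatter-gather I/O
CSUM_OFFLOAD = 1 << 1    # stand-in for the TCP checksum offload flags
NETIF_F_TSO = 1 << 2     # TCP segmentation offload


def fix_features(features):
    """Drop TSO from the feature mask if its prerequisites are missing."""
    required = NETIF_F_SG | CSUM_OFFLOAD
    if features & NETIF_F_TSO and (features & required) != required:
        features &= ~NETIF_F_TSO
    return features


full = NETIF_F_SG | CSUM_OFFLOAD | NETIF_F_TSO
print(fix_features(full) == full)                  # all prerequisites present
print(bool(fix_features(NETIF_F_TSO | NETIF_F_SG) & NETIF_F_TSO))  # checksum offload missing
```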
The driver will then receive super-sized skbs. These are indicated to the driver by skb_shinfo(skb)->gso_size being non-zero. gso_size is the size into which the hardware should fragment the TCP data. TSO may change how and when TCP decides to send data. [3]
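To make the role of gso_size concrete, here is a hedged Python sketch of how a super-sized buffer could be planned into segments. The function plan_segments and its parameters are hypothetical; only the rule "gso_size non-zero means the data must be fragmented into pieces of that size" is taken from the text above.

```python
def plan_segments(seq, payload_len, gso_size):
    """Return a (sequence number, length) pair for every segment to emit.

    gso_size == 0 marks an ordinary packet that is sent as it is.
    """
    if gso_size == 0:
        return [(seq, payload_len)]
    plan = []
    offset = 0
    while offset < payload_len:
        length = min(gso_size, payload_len - offset)
        # TCP sequence numbers wrap around at 2**32.
        plan.append(((seq + offset) % 2**32, length))
        offset += length
    return plan


for entry in plan_segments(seq=1000, payload_len=4000, gso_size=1448):
    print(entry)
```

Each segment starts where the previous one ended, so the receiver sees the same byte stream it would have seen if the host had segmented the data itself.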
Implementation
sk_gso_max_size
The NIC also specifies the maximum segment size it can handle in the sk_gso_max_size field, which is usually set to 64k. This means that if TCP has more than 64k of data to send, TCP must still split it into chunks of at most 64k before pushing them to the interface.
Related variable, in struct sock: unsigned int sk_gso_max_size.
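The effect of this limit can be sketched in a few lines of Python. This is a simplification under stated assumptions: split_for_device is a hypothetical helper, the 64k limit is taken from the text above, and details a real stack has to handle (headers, rounding to a multiple of the MSS) are ignored.

```python
SK_GSO_MAX_SIZE = 64 * 1024  # assumed device limit, the "64k" mentioned above


def split_for_device(total_len, gso_max_size=SK_GSO_MAX_SIZE):
    """Return the sizes of the super-packets handed to the interface."""
    if gso_max_size <= 0:
        raise ValueError("gso_max_size must be positive")
    sizes = []
    remaining = total_len
    while remaining > 0:
        chunk = min(remaining, gso_max_size)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(split_for_device(150000))
```

For 150,000 bytes this gives two full 64k super-packets and one smaller one; the NIC then segments each of them on its own.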
TSO Nagle
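GSO (Generic Segmentation Offload) is a strategy the protocol stack uses to improve efficiency.
It defers segmentation for as long as possible, ideally until the NIC driver. The driver splits the large packet (super-packet) apart and either builds an SG list or reassembles the segments in a preallocated block of memory, then hands them to the NIC.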
The idea behind GSO seems to be that many of the performance benefits of LSO (TSO/UFO) can be obtained in a hardware-independent way, by passing large "superpackets" around for as long as possible and deferring segmentation to the last possible moment. For devices without hardware segmentation/fragmentation support, this would be when data is actually handed to the device driver; for devices with hardware support, it could even be done in hardware.
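The "segment as late as possible" idea can be illustrated with a small Python sketch. It is a toy model, not the kernel's GSO path: hand_to_driver and its device_has_tso flag are hypothetical names, and headers are left out.

```python
def hand_to_driver(payload, mss, device_has_tso):
    """Return the list of buffers the driver receives for one super-packet.

    With hardware support the super-packet travels down the stack intact
    and the NIC splits it. Without it, the split happens in software, but
    only at the last moment before the driver is called.
    """
    if device_has_tso:
        return [payload]
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


superpacket = bytes(4000)
print(len(hand_to_driver(superpacket, 1448, device_has_tso=True)))
print(len(hand_to_driver(superpacket, 1448, device_has_tso=False)))
```

In both cases the upper layers of the stack handled a single large buffer, which is where the hardware-independent saving comes from.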
Try to defer sending, if possible, in order to minimize the amount of TSO splitting we do. View it as a kind of TSO Nagle test.
Deferring the transmission of packets reduces the number of TSO segmentation operations, which in turn lowers the CPU load.
tcp_tso_win_divisor: controls the fraction of the congestion window that a single TSO segment may consume. The default value is 3.
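If any of the following conditions holds, TSO deferral is skipped and the data can be sent immediately:
(1) The packet carries the FIN flag. The transfer is about to end, so deferring is not appropriate.
(2) The sender is not in the Open congestion state. Deferring is not appropriate in an abnormal state.
(3) The previous skb was deferred, and at least 2ms have passed since then. A deferral must not last longer than 2ms.
(4) min(send_win, cong_win) > full-sized TSO skb. The amount of data allowed to be sent exceeds the maximum that TSO can handle at once, so there is no need to defer.
(5) The skb is in the middle of the send queue, and the whole skb is allowed to be sent. An skb in the middle of the send queue cannot receive any more data, so there is no need to defer.
(6) When tcp_tso_win_divisor is set: limit > the amount of data a single TSO segment may consume, i.e. min(snd_wnd, snd_cwnd * mss_cache) / tcp_tso_win_divisor.
(7) When tcp_tso_win_divisor is not set: limit > tcp_max_burst(tp) * mss_cache, which is generally 3 packets.
Conditions 4, 5, and 6/7 all have the form limit > some threshold, in which case the data can be sent immediately. The reason is that these conditions establish that sending is currently limited by the application rather than by the advertised window or the congestion window. When the application sends only a small amount of data, TSO Nagle should not be applied, because it would hurt such applications.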
Note that the comment in tcp_is_cwnd_limited() says "This is the inverse of cwnd check in tcp_tso_should_defer", so tcp_tso_should_defer() can be regarded as containing a tcp_is_not_cwnd_limited (or tcp_is_application_limited) check.
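The threshold used in conditions 6 and 7 can be worked through with a short Python sketch. It follows the formulas quoted above; deferral_threshold is a hypothetical helper name, the sample window values are made up, and max_burst=3 is an assumption based on the "generally 3 packets" remark.

```python
def deferral_threshold(snd_wnd, snd_cwnd, mss_cache, win_divisor, max_burst=3):
    """Return the byte count that limit must exceed for an immediate send."""
    if win_divisor:
        # Condition 6: a share of the usable window, in bytes.
        return min(snd_wnd, snd_cwnd * mss_cache) // win_divisor
    # Condition 7: a fixed burst of a few full-sized segments.
    return max_burst * mss_cache


# Made-up example: 65535-byte advertised window, cwnd of 10 segments, MSS 1448.
print(deferral_threshold(65535, 10, 1448, win_divisor=3))
print(deferral_threshold(65535, 10, 1448, win_divisor=0))
```

With these numbers the congestion window (14480 bytes) is the smaller of the two windows, so a divisor of 3 gives a threshold of 4826 bytes, while an unset divisor gives 3 * 1448 = 4344 bytes.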
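TSO deferral happens only when all of the following conditions hold:
(1) The packet does not carry the FIN flag.
(2) The sender is in the Open congestion state.
(3) Less than 2ms have passed since the last deferral.
(4) The amount of data allowed to be sent is smaller than sk_gso_max_size.
(5) The skb is at the tail of the send queue, or the skb cannot be sent in its entirety.
(6) When tcp_tso_win_divisor is set, the amount of data allowed to be sent is no larger than what a single TSO segment may consume.
(7) When tcp_tso_win_divisor is not set, the amount of data allowed to be sent is no larger than 3 packets.
As can be seen, the conditions that trigger TSO deferral are not particularly strict, which is why the call site is not marked with unlikely.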
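The decision described by the two lists can be summarized as a Python sketch. It models the conditions as listed here, not the kernel source itself: the function name mirrors tcp_tso_should_defer, but every parameter is a hypothetical simplification (for example, limit stands for min(send_win, cong_win) in bytes, and threshold is the value from condition 6 or 7).

```python
def tso_should_defer(has_fin, ca_state_open, was_deferred, ms_since_deferral,
                     limit, full_tso_size, in_queue_middle, fits_entirely,
                     threshold):
    """Return True to defer sending, False to send immediately."""
    if has_fin:                                   # (1) transfer is ending
        return False
    if not ca_state_open:                         # (2) abnormal congestion state
        return False
    if was_deferred and ms_since_deferral >= 2:   # (3) deferred long enough
        return False
    if limit > full_tso_size:                     # (4) a full-sized TSO skb fits
        return False
    if in_queue_middle and fits_entirely:         # (5) skb cannot grow any more
        return False
    if limit > threshold:                         # (6)/(7) enough data allowed
        return False
    return True


base = dict(has_fin=False, ca_state_open=True, was_deferred=False,
            ms_since_deferral=0, limit=2000, full_tso_size=65536,
            in_queue_middle=False, fits_entirely=False, threshold=4344)

print(tso_should_defer(**base))                        # every defer condition holds
print(tso_should_defer(**dict(base, has_fin=True)))    # FIN forces an immediate send
```

Each early return corresponds to one "send immediately" condition, so the final return True is reached only when all seven "defer" conditions hold at once.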
Usage
(1) Disable TSO
(2) Enable TSO
TSO is enabled by default.
Reference
[1] http://en.wikipedia.org/wiki/Large_segment_offload
[2] http://tejparkash.wordpress.com/2010/03/06/tso-explained/
[3] http://www.linuxfoundation.org/collaborate/workgroups/networking/tso