.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.
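As a rough illustration of the two conditions above, the following
user-space sketch models the relevant fields with a hypothetical struct
ex_shinfo.  The EX_GSO_* flag values and every ex_* name are assumptions
made for this example; the authoritative definitions are in
include/linux/skbuff.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; check include/linux/skbuff.h for the
 * real SKB_GSO_TCPV4 / SKB_GSO_TCPV6 definitions of a given kernel. */
#define EX_GSO_TCPV4 (1u << 0)
#define EX_GSO_TCPV6 (1u << 4)

/* Hypothetical stand-in for the GSO fields of struct skb_shared_info. */
struct ex_shinfo {
    uint16_t gso_size;  /* payload bytes per resulting frame */
    uint32_t gso_type;  /* bitmask of EX_GSO_* flags */
};

/* True if the frame carries a well-formed TCP segmentation request:
 * one of the TCP type bits is set and gso_size is non-zero. */
bool ex_tso_requested(const struct ex_shinfo *si)
{
    assert(si != NULL);
    return (si->gso_type & (EX_GSO_TCPV4 | EX_GSO_TCPV6)) != 0 &&
           si->gso_size != 0;
}

/* Number of frames that payload_len bytes of TCP payload turn into. */
uint32_t ex_tso_segments(const struct ex_shinfo *si, uint32_t payload_len)
{
    if (!ex_tso_requested(si))
        return 1;  /* not a TSO frame: transmitted as a single frame */
    /* Round up: the last frame may carry less than gso_size bytes. */
    return (payload_len + si->gso_size - 1u) / si->gso_size;
}
```

With gso_size set to 1448, a 64000 byte payload yields 44 full frames plus
one short frame in this model.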

TCP segmentation is dependent on support for the use of partial checksum
offload.  For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers are able to determine the offsets of the IP or IPv6 header and the
TCP header.  In addition, as CHECKSUM_PARTIAL is required, csum_start should
also point to the TCP header of the packet.
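The offsets involved can be sketched for a plain Ethernet, IPv4 and TCP
frame.  The struct ex_offsets type and the ex_tcp4_offsets() function are
hypothetical, and offsets are measured from the start of the frame to keep
the example simple, whereas the kernel measures csum_start from skb->head.
The csum_offset field is not discussed above; it is included on the
assumption that the reader wants to see where the TCP checksum field sits
relative to csum_start.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ETH_HLEN 14u  /* Ethernet header without a VLAN tag */

/* Hypothetical subset of the offsets a driver reads from an skb,
 * expressed here relative to the start of the frame. */
struct ex_offsets {
    uint16_t network_header;    /* start of the IPv4 header */
    uint16_t transport_header;  /* start of the TCP header */
    uint16_t csum_start;        /* where checksumming begins */
    uint16_t csum_offset;       /* checksum field, relative to csum_start */
};

/* ihl is the IPv4 header length in 32-bit words (5 to 15). */
struct ex_offsets ex_tcp4_offsets(uint8_t ihl)
{
    struct ex_offsets o;

    assert(ihl >= 5 && ihl <= 15);
    o.network_header = (uint16_t)EX_ETH_HLEN;
    o.transport_header = (uint16_t)(EX_ETH_HLEN + ihl * 4u);
    o.csum_start = o.transport_header;  /* points at the TCP header */
    o.csum_offset = 16;  /* byte offset of the checksum in a TCP header */
    return o;
}
```

For an IPv4 header without options (ihl of 5) this places the TCP header,
and therefore csum_start, 34 bytes into the frame.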

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment.  If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID.  If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.
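The two IP ID behaviors can be modelled in a few lines.  The function
ex_fill_ip_ids() and its fixed_id parameter are hypothetical names that
stand in for a frame with or without SKB_GSO_TCP_FIXEDID; this is a sketch
of the rule, not of any driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write the IPv4 ID of each of count segments into ids[].
 * fixed_id == false: default behavior, the ID increments per segment.
 * fixed_id == true:  models SKB_GSO_TCP_FIXEDID, every segment reuses
 *                    the ID of the first one. */
void ex_fill_ip_ids(uint16_t *ids, uint32_t count,
                    uint16_t first_id, bool fixed_id)
{
    uint32_t i;

    assert(ids != NULL || count == 0);
    for (i = 0; i < count; i++) {
        /* The IPv4 ID is a 16-bit field, so the sum wraps modulo 65536. */
        ids[i] = fixed_id ? first_id : (uint16_t)(first_id + i);
    }
}
```

A device with NETIF_F_TSO_MANGLEID set may produce either of the two
sequences, so code relying on one particular pattern should not assume it
survives such a device.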


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments.  Many of the requirements for UDP
fragmentation offload are the same as TSO.  However, the IPv4 ID for
fragments should not increment, because all fragments belong to a single
IPv4 datagram.
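To make the contrast with TSO concrete, the sketch below describes the
IPv4 fragments of one UDP datagram.  It relies on standard IPv4
fragmentation rules (offsets counted in 8-byte units, a "more fragments"
flag on all but the last fragment) rather than on anything specific to a
device, and it assumes a 20 byte IPv4 header without options.  All ex_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_IPV4_HLEN 20u  /* IPv4 header without options */

struct ex_fragment {
    uint16_t ip_id;         /* identical for every fragment */
    uint16_t offset_units;  /* fragment offset in 8-byte units */
    uint16_t data_len;      /* bytes of the UDP datagram carried */
    bool more_fragments;    /* MF flag: set on all but the last */
};

/* Describe fragment number index (0-based) of a datagram of dgram_len
 * bytes (UDP header plus payload) sent over a link with the given MTU. */
struct ex_fragment ex_udp_fragment(uint16_t ip_id, uint32_t dgram_len,
                                   uint32_t mtu, uint32_t index)
{
    struct ex_fragment f;
    uint32_t per_frag, start;

    assert(mtu >= EX_IPV4_HLEN + 8u);
    per_frag = (mtu - EX_IPV4_HLEN) & ~7u;  /* round down to 8 bytes */
    start = index * per_frag;
    assert(start < dgram_len);

    f.ip_id = ip_id;  /* never incremented within one datagram */
    f.offset_units = (uint16_t)(start / 8u);
    f.more_fragments = start + per_frag < dgram_len;
    f.data_len = (uint16_t)(f.more_fragments ? per_frag
                                             : dgram_len - start);
    return f;
}
```

A 4008 byte datagram on a 1500 byte MTU link becomes three fragments in
this model, all carrying the same IPv4 ID.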

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel.  In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL.  These extra segmentation types are used to identify
cases where there is more than one set of headers.  For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported.  The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.  Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header
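As a worked example of the UDP tunnel layout, the sketch below computes
where each of the six headers would start for a VXLAN frame carried over
IPv4 with an IPv4 inner packet.  VXLAN is chosen here only as a familiar
UDP tunnel with an 8 byte tunnel header; the struct and function names are
hypothetical and the offsets are relative to the start of the frame.

```c
#include <assert.h>
#include <stdint.h>

enum {
    EX_ETH_HLEN   = 14,  /* Ethernet header without a VLAN tag */
    EX_UDP_HLEN   = 8,
    EX_VXLAN_HLEN = 8,
};

/* One offset per accessor listed in the UDP/GRE table above. */
struct ex_tunnel_offsets {
    uint16_t mac, network, transport;                    /* outer */
    uint16_t inner_mac, inner_network, inner_transport;  /* inner */
};

/* outer_ihl and inner_ihl are IPv4 header lengths in 32-bit words. */
struct ex_tunnel_offsets ex_vxlan4_offsets(uint8_t outer_ihl,
                                           uint8_t inner_ihl)
{
    struct ex_tunnel_offsets o;

    assert(outer_ihl >= 5 && outer_ihl <= 15);
    assert(inner_ihl >= 5 && inner_ihl <= 15);

    o.mac = 0;
    o.network = (uint16_t)(o.mac + EX_ETH_HLEN);
    o.transport = (uint16_t)(o.network + outer_ihl * 4);
    /* The inner frame starts after the outer UDP and VXLAN headers. */
    o.inner_mac = (uint16_t)(o.transport + EX_UDP_HLEN + EX_VXLAN_HLEN);
    o.inner_network = (uint16_t)(o.inner_mac + EX_ETH_HLEN);
    o.inner_transport = (uint16_t)(o.inner_network + inner_ihl * 4);
    return o;
}
```

With both IPv4 headers free of options, the inner transport header starts
84 bytes into the frame.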

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM.  These two additional tunnel types reflect the
fact that the outer header also requests a non-zero checksum.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload.  In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.  What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.
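The splitting step can be pictured with a small user-space model that
only tracks payload ranges.  It is a sketch of the idea, not of the
kernel's skb_segment(); struct ex_segment and ex_gso_split() are
hypothetical names, and header replication is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_segment {
    uint32_t offset;  /* payload offset within the original frame */
    uint32_t len;     /* payload bytes carried by this segment */
};

/* Fill out[] with the payload ranges produced when payload_len bytes are
 * cut into pieces of at most gso_size bytes.  Returns the number of
 * segments written, or 0 if out[] cannot hold all of them. */
size_t ex_gso_split(uint32_t payload_len, uint16_t gso_size,
                    struct ex_segment *out, size_t out_cap)
{
    size_t n = 0;
    uint32_t off = 0;

    assert(gso_size != 0 && out != NULL);
    while (off < payload_len) {
        uint32_t left = payload_len - off;

        if (n == out_cap)
            return 0;  /* caller's array is too small */
        out[n].offset = off;
        /* Every segment is gso_size long except possibly the last. */
        out[n].len = left < gso_size ? left : gso_size;
        off += out[n].len;
        n++;
    }
    return n;
}
```

Each returned range would become the payload of one new skbuff, preceded
by a copy of the original headers with lengths and checksums adjusted.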

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO.  Otherwise it becomes possible for a frame to
be re-routed between devices and end up being impossible to transmit.


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO.  Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO.  The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is sequentially incrementing when a frame assembled via
GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO.  What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated.  This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the correct
values if the header is simply duplicated.  The one exception to this
is the outer IPv4 ID field.  It is up to the device drivers to guarantee
that the IPv4 ID field is incremented in the case that a given header does
not have the DF bit set.
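The per-segment work left to the device can be sketched as follows.  The
model assumes that the only inner transport field of interest is the TCP
sequence number, which advances by one MSS per segment, and it ignores
checksums and TCP flags.  The names ex_partial_fields and
ex_partial_segment() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two fields this sketch treats as varying between segments; every
 * other header byte is assumed to be duplicated unchanged. */
struct ex_partial_fields {
    uint32_t inner_tcp_seq;
    uint16_t outer_ip_id;
};

/* Values for segment number index (0-based) of a partially offloaded
 * frame.  outer_df reflects the DF bit of the outer IPv4 header. */
struct ex_partial_fields ex_partial_segment(uint32_t first_seq,
                                            uint16_t first_id,
                                            bool outer_df,
                                            uint16_t mss,
                                            uint32_t index)
{
    struct ex_partial_fields f;

    assert(mss != 0);
    /* Inner transport header: sequence space wraps modulo 2^32. */
    f.inner_tcp_seq = first_seq + index * (uint32_t)mss;
    /* Outer IPv4 ID: must increment when DF is clear; this sketch
     * leaves it untouched when DF is set. */
    f.outer_ip_id = outer_df ? first_id : (uint16_t)(first_id + index);
    return f;
}
```

The sketch leaves the ID constant when DF is set only to keep the two
cases distinct; the text above places a requirement on the DF-clear case
alone.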


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
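The reason the sentinel needs special handling shows up as soon as
gso_size is used in arithmetic.  The user-space sketch below models a
size check in the spirit of the skb_gso_validate_*_len helpers; it is not
their implementation.  struct ex_skb, its frag_len array (standing in for
the chained skbs) and ex_gso_fits() are hypothetical, while the value
0xFFFF is assumed to match the kernel's GSO_BY_FRAGS definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_GSO_BY_FRAGS 0xFFFFu  /* assumed to mirror GSO_BY_FRAGS */

/* Hypothetical stand-in for an skb.  frag_len[] models the chained skbs
 * built by SCTP, one entry per already padded segment. */
struct ex_skb {
    uint16_t gso_size;
    uint32_t hdr_len;          /* headers placed in front of each segment */
    const uint32_t *frag_len;  /* only used with EX_GSO_BY_FRAGS */
    size_t nr_frags;
};

/* True if every segment produced from skb fits within max_len bytes. */
bool ex_gso_fits(const struct ex_skb *skb, uint32_t max_len)
{
    size_t i;

    assert(skb != NULL);
    if (skb->gso_size != EX_GSO_BY_FRAGS)
        return skb->hdr_len + skb->gso_size <= max_len;

    /* Sentinel case: gso_size is not a length, so walk the fragments. */
    for (i = 0; i < skb->nr_frags; i++) {
        if (skb->hdr_len + skb->frag_len[i] > max_len)
            return false;
    }
    return true;
}
```

Treating the sentinel as an ordinary size would make every SCTP GSO skb
look like it produces segments of more than 65535 bytes, which is why code
that reads gso_size directly has to test for GSO_BY_FRAGS first.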