.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).

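The device-side behaviour can be modelled in user space.  The following is a
minimal sketch, not kernel code: ones_sum() and emulate_tx_csum() are
hypothetical helper names, the packet is treated as a flat byte array rather
than an SKB, and the range being summed is assumed to start on a 16-bit
boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words with end-around carry.
 * An odd trailing byte is padded with a zero byte.
 */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical model of what the device is asked to do: sum from
 * csum_start to the end of the packet (including whatever is currently
 * in the checksum field), complement, and store the result at
 * csum_start + csum_offset.
 */
static void emulate_tx_csum(uint8_t *pkt, size_t len,
                            size_t csum_start, size_t csum_offset)
{
    uint16_t csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start);

    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

If the checksum field held zero before the call, summing the same range again
afterwards yields 0xffff, which is the usual test for a valid IP-style
checksum.
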
Because csum_offset cannot be negative, the previous value of the checksum
field is always included in the checksum computation, so it can be used to
supply any needed corrections to the checksum (such as the sum of the
pseudo-header for UDP or TCP).

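As an illustration of such a correction, the sketch below computes the
pseudo-header sum for UDP over IPv4 so that it can be placed in the checksum
field before the packet is handed to the device.  It is a user-space model
under stated assumptions: fold16(), ones_sum() and udp4_pseudo_sum() are
hypothetical names, and addresses are passed as raw bytes in network order.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones-complement sum of big-endian 16-bit words; an odd trailing
 * byte is padded with a zero byte.
 */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return fold16(sum);
}

/* Uncomplemented sum of the IPv4 UDP pseudo-header: source address,
 * destination address, a zero byte plus the protocol number, and the
 * UDP length.  This is the value to seed the checksum field with.
 */
static uint16_t udp4_pseudo_sum(const uint8_t src[4], const uint8_t dst[4],
                                uint16_t udp_len)
{
    uint32_t sum = 0;

    sum += ((uint32_t)src[0] << 8) | src[1];
    sum += ((uint32_t)src[2] << 8) | src[3];
    sum += ((uint32_t)dst[0] << 8) | dst[1];
    sum += ((uint32_t)dst[2] << 8) | dst[3];
    sum += 17;       /* zero byte followed by the UDP protocol number */
    sum += udp_len;  /* UDP header plus payload, in bytes */
    return fold16(sum);
}
```

With the seed stored in the UDP checksum field, a device that sums from the
start of the UDP header to the end of the packet and writes back the
complement produces a checksum that also covers the pseudo-header, even
though the device never sees the pseudo-header itself.
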
This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.  Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

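The kind of check a driver might make can be sketched as follows.  This is a
simplified user-space model, not driver code: struct tx_request_model and
hw_deduction_matches() are hypothetical, and the hardware is assumed to
locate the checksum field purely from the transport protocol and the offset
of the transport header it parses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the values a driver would compare. */
struct tx_request_model {
    size_t hw_transport_offset; /* where the hardware would find the L4 header */
    size_t csum_start;          /* as requested by the stack */
    size_t csum_offset;         /* as requested by the stack */
    uint8_t l4_proto;           /* 6 = TCP, 17 = UDP */
};

/* Return true if the hardware's own deduction would land on the same
 * checksum field that the stack asked for.
 */
static bool hw_deduction_matches(const struct tx_request_model *req)
{
    size_t expected_offset;

    switch (req->l4_proto) {
    case 6:
        expected_offset = 16;   /* checksum field offset in the TCP header */
        break;
    case 17:
        expected_offset = 6;    /* checksum field offset in the UDP header */
        break;
    default:
        return false;           /* protocol this model does not handle */
    }

    return req->csum_start == req->hw_transport_offset &&
           req->csum_offset == expected_offset;
}
```

When such a check fails, for example because csum_start refers to an inner
header of an encapsulated packet, the driver would take the software
fallback path named above rather than let the hardware write to the wrong
field.
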
The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software.  In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

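The cancellation can be written out explicitly.  The symbols below are
notation introduced here for illustration only: P is the sum of the pseudo
header, D is the sum of the transport header and payload with the checksum
field set to zero, and C is the value written to the checksum field.  All
additions are ones-complement, where complementing a value is the same as
negating it.

```latex
\begin{aligned}
C     &= \mathord{\sim}(P + D) = -(P + D) \\
D + C &= D - (P + D) = -P = \mathord{\sim}P
\end{aligned}
```

The left-hand side of the second line is the sum over the packet as
transmitted, so it depends only on the pseudo header and not on the payload.
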
More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

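A user-space model of this computation is sketched below.  It is not the
kernel's helper: lco_sum_model() and ones_sum() are hypothetical names, the
packet is a flat byte array, the outer checksum field is assumed to be zero
while summing, and the outer header is assumed to start on a 16-bit boundary
and have even length.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words; an odd trailing
 * byte is padded with a zero byte.
 */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum the packet from outer_start as it will look after the inner
 * checksum has been filled in, without reading anything past
 * csum_start except the 16-bit word in the inner checksum field.
 * The caller complements the result to obtain the outer checksum.
 */
static uint16_t lco_sum_model(const uint8_t *pkt, size_t outer_start,
                              size_t csum_start, size_t csum_offset)
{
    size_t field = csum_start + csum_offset;
    uint16_t seed = (uint16_t)(((uint16_t)pkt[field] << 8) | pkt[field + 1]);
    uint32_t sum = (uint16_t)~seed;  /* stands in for everything from csum_start on */

    sum += ones_sum(pkt + outer_start, csum_start - outer_start);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

The cost of the model is proportional to the header length between
outer_start and csum_start, not to the size of the payload.
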
Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.