.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset. The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).
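
As a rough illustration of the device-side behaviour just described, the
following userspace C sketch sums a buffer from csum_start to the end of the
packet and stores the complemented result at (csum_start + csum_offset). The
helpers ones_sum(), emulate_tx_csum() and tx_csum_demo(), and the sample packet
bytes, are invented for this example and are not kernel interfaces. The packet
is only UDP-like, and the checksum field is seeded with zero rather than a real
pseudo-header sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes, read as big-endian 16-bit words. */
static uint16_t ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* zero-pad odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	return (uint16_t)sum;
}

/*
 * Hypothetical model of the device side: sum from csum_start to the end
 * of the packet (including whatever is already in the checksum field),
 * complement, and store the result at csum_start + csum_offset.
 */
static void emulate_tx_csum(uint8_t *pkt, size_t len,
			    size_t csum_start, size_t csum_offset)
{
	uint16_t csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start);

	pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}

/* Returns the checksum that the model wrote into the sample packet. */
uint16_t tx_csum_demo(void)
{
	uint8_t pkt[] = {
		0xde, 0xad, 0xbe, 0xef,	/* bytes before csum_start, not summed */
		0x12, 0x34, 0x00, 0x35,	/* UDP-like source and destination port */
		0x00, 0x0c, 0x00, 0x00,	/* length, checksum field seeded with 0 */
		'a', 'b', 'c', 'd',	/* payload */
	};
	size_t csum_start = 4, csum_offset = 6;

	emulate_tx_csum(pkt, sizeof(pkt), csum_start, csum_offset);

	/* With a zero seed, the checksummed region now sums to 0xffff. */
	assert(ones_sum(pkt + csum_start, sizeof(pkt) - csum_start) == 0xffff);

	return (uint16_t)((pkt[csum_start + csum_offset] << 8) |
			  pkt[csum_start + csum_offset + 1]);
}
```

For this sample packet, tx_csum_demo() returns 0x28c4. Note that the model
sums the checksum field along with everything else after csum_start, so any
value placed there beforehand contributes to the result.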

Because csum_offset cannot be negative, this ensures that the previous value of
the checksum field is included in the checksum computation, thus it can be used
to supply any needed corrections to the checksum (such as the sum of the
pseudo-header for UDP or TCP).

This interface only allows a single checksum to be offloaded. Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software. This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive. It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment. See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.
Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

The stack should, for the most part, assume that checksum offload is supported
by the underlying device. The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly. That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software. In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.
This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand. This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.
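
The arithmetic behind LCO can be checked with a small userspace model. The
sketch below is not the kernel implementation: ones_sum(), ones_add(),
lco_model() and lco_demo() are names made up for this example, the packet bytes
and the inner seed value 0x1a2b are arbitrary, and the outer pseudo-header is
left out for brevity. It writes the outer checksum before the inner one exists,
then fills in the inner checksum and confirms that the whole packet verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes, read as big-endian 16-bit words. */
static uint16_t ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* zero-pad odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	return (uint16_t)sum;
}

/* Ones-complement addition of two 16-bit values. */
static uint16_t ones_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

static uint16_t get16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static void put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Model of the LCO arithmetic (an assumption-laden sketch, not the
 * kernel's lco_csum()): sum the outer header up to csum_start, then add
 * the complement of the word currently in the inner checksum field.
 */
static uint16_t lco_model(const uint8_t *outer, size_t csum_start,
			  size_t csum_offset)
{
	uint16_t seed = get16(outer + csum_start + csum_offset);

	return ones_add(ones_sum(outer, csum_start), (uint16_t)~seed);
}

/* Returns 1 if the packet verifies end to end after both fill-ins. */
int lco_demo(void)
{
	uint8_t pkt[] = {
		0x80, 0x00, 0x12, 0xb5,	/* outer ports */
		0x00, 0x14, 0x00, 0x00,	/* outer length, outer checksum (0) */
		0x12, 0x34, 0x00, 0x35,	/* inner ports */
		0x00, 0x0c, 0x1a, 0x2b,	/* inner length, inner checksum seed */
		'a', 'b', 'c', 'd',	/* payload, never read by lco_model() */
	};
	size_t csum_start = 8, csum_offset = 6, outer_csum_at = 6;
	uint16_t seed = get16(pkt + csum_start + csum_offset);

	/* Stack: write the outer checksum without touching the payload. */
	put16(pkt + outer_csum_at,
	      (uint16_t)~lco_model(pkt, csum_start, csum_offset));

	/* Device or software fallback: fill in the inner checksum later. */
	put16(pkt + csum_start + csum_offset,
	      (uint16_t)~ones_sum(pkt + csum_start, sizeof(pkt) - csum_start));

	/* The inner region now sums to the complement of the old seed. */
	assert(ones_sum(pkt + csum_start, sizeof(pkt) - csum_start) ==
	       (uint16_t)~seed);

	/* A valid outer checksum makes the whole packet sum to 0xffff. */
	return ones_sum(pkt, sizeof(pkt)) == 0xffff;
}
```

In this model lco_demo() returns 1: the outer checksum written in the first
step becomes valid only once the inner checksum has been filled in, even
though the payload was never summed on the outer pass.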

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded. It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use. For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.