.. SPDX-License-Identifier: GPL-2.0

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to expose the hardware offload to
the network stack.

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
        int     (*xdo_dev_state_add) (struct xfrm_state *x);
        void    (*xdo_dev_state_delete) (struct xfrm_state *x);
        void    (*xdo_dev_state_free) (struct xfrm_state *x);
        bool    (*xdo_dev_offload_ok) (struct sk_buff *skb,
                                       struct xfrm_state *x);
        void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
  };

The NIC driver offering IPsec offload will need to implement these
callbacks to make the offload available to the network stack's
XFRM subsystem.  Additionally, the feature bits NETIF_F_HW_ESP and
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

  adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
  adapter->netdev->features |= NETIF_F_HW_ESP;
  adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should:

 - verify the algorithm is supported for offloads
 - store the SA information (key, salt, target-ip, protocol, etc)
 - enable the HW offload of the SA
 - return status value:

     ===========   ===================================
     0             success
     -EOPNOTSUPP   offload not supported, try SW IPsec
     other         fail the request
     ===========   ===================================

The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests::

  xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver whether the
offload will be serviceable.  The callback can check the packet
information to be sure the offload can be supported (e.g. IPv4 or IPv6,
no IPv4 options, etc) and return true or false to signify its support.
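
The add and offload-check steps above can be sketched in plain userspace C.
This is only an illustrative mock under stated assumptions, not driver code:
``struct mock_sa``, ``struct mock_pkt``, ``mock_state_add()``,
``mock_offload_ok()`` and the two-entry table are hypothetical stand-ins for
``struct xfrm_state``, ``struct sk_buff`` and the real callbacks, and the
choice of ``-ENOSPC`` for a full table is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_SA 2   /* hypothetical size of the NIC's SA table */

/* Hypothetical stand-in for the xfrm_state fields a driver would read. */
struct mock_sa {
    const char *aead_name;         /* e.g. "rfc4106(gcm(aes))" */
    unsigned long offload_handle;  /* opaque per-SA context set by the driver */
};

/* Hypothetical stand-in for the packet facts a driver might inspect. */
struct mock_pkt {
    int ip_version;  /* 4 or 6 */
    int ihl_words;   /* IPv4 header length in 32-bit words; 5 = no options */
};

static struct mock_sa *mock_hw_table[MOCK_MAX_SA];

/* Plays the role of xdo_dev_state_add(): verify, store, enable, report. */
int mock_state_add(struct mock_sa *sa)
{
    size_t i;

    /* Unsupported algorithm: tell the stack to fall back to SW IPsec. */
    if (strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0)
        return -EOPNOTSUPP;

    for (i = 0; i < MOCK_MAX_SA; i++) {
        if (!mock_hw_table[i]) {
            mock_hw_table[i] = sa;       /* store the SA, "enable" the slot */
            sa->offload_handle = i + 1;  /* context for the fast path */
            return 0;
        }
    }
    return -ENOSPC;  /* any other error fails the request */
}

/* Plays the role of xdo_dev_offload_ok(): per-packet serviceability check. */
bool mock_offload_ok(const struct mock_pkt *pkt, const struct mock_sa *sa)
{
    if (!sa->offload_handle)
        return false;                /* this SA was never offloaded */
    if (pkt->ip_version == 4)
        return pkt->ihl_words == 5;  /* reject packets with IPv4 options */
    return pkt->ip_version == 6;
}
```

A real driver would program hardware registers where the mock fills a table
slot, and would read the algorithm and addresses from the SA it is handed.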

When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

  xs = xfrm_input_state(skb);
  context = xs->xso.offload_handle;
  /* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

 find and hold the SA that was used for the Rx skb::

   /* get spi, protocol, and destination IP from packet headers */
   xs = find xs from (spi, protocol, dest_IP)
   xfrm_state_hold(xs);

 store the state information into the skb::

   sp = secpath_set(skb);
   if (!sp) return;
   sp->xvec[sp->len++] = xs;
   sp->olen++;

 indicate the success and/or error status of the offload::

   xo = xfrm_offload(skb);
   xo->flags = CRYPTO_DONE;
   xo->status = crypto_status;

 hand the packet to napi_gro_receive() as usual

In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn().  The driver checks the packet sequence number
and updates the HW ESN state machine if needed.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
is asked to disable the offload.  Later, xdo_dev_state_free() is called
from a garbage collection routine after all reference counts to the state
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

When a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete() and xdo_dev_state_free() on any remaining offloaded
states.
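
The two-step teardown described above (disable first, release later) can be
illustrated with a small userspace mock.  This is a sketch under stated
assumptions rather than kernel code: ``struct mock_offload`` and every
``mock_*`` function are hypothetical, and the plain integer reference count
only imitates the state reference counting the XFRM core performs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-SA offload record kept by a driver. */
struct mock_offload {
    bool hw_enabled;  /* the NIC is actively using this SA */
    int refcnt;       /* references still held, e.g. by in-flight packets */
    void *hw_ctx;     /* driver-private resources for the SA */
};

struct mock_offload *mock_offload_new(void)
{
    struct mock_offload *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    o->hw_ctx = malloc(64);  /* placeholder for real per-SA resources */
    if (!o->hw_ctx) {
        free(o);
        return NULL;
    }
    o->hw_enabled = true;
    o->refcnt = 1;
    return o;
}

void mock_state_hold(struct mock_offload *o)
{
    o->refcnt++;
}

/* Role of xdo_dev_state_delete(): stop offloading, keep resources. */
void mock_state_delete(struct mock_offload *o)
{
    o->hw_enabled = false;
}

/* Role of xdo_dev_state_free(): release what is left for this SA. */
void mock_state_free(struct mock_offload *o)
{
    free(o->hw_ctx);
    o->hw_ctx = NULL;
}

/* Drop one reference; returns true if it was the last and free ran. */
bool mock_state_put(struct mock_offload *o)
{
    if (--o->refcnt > 0)
        return false;
    mock_state_free(o);
    return true;
}
```

Splitting the work this way lets packets that still hold a reference finish
safely after the offload has been disabled; the resources only go away once
the last reference is dropped.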