.. SPDX-License-Identifier: GPL-2.0

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload, which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to give the stack access to the
hardware offload.

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
	int	(*xdo_dev_state_add) (struct xfrm_state *x);
	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
	void	(*xdo_dev_state_free) (struct xfrm_state *x);
	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
				       struct xfrm_state *x);
	void	(*xdo_dev_state_advance_esn) (struct xfrm_state *x);
  };

The NIC driver offering IPsec offload will need to implement these
callbacks to make the offload available to the network stack's
XFRM subsystem.  Additionally, the feature bits NETIF_F_HW_ESP and
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.

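As a rough illustration of what filling in this table might look like,
here is a minimal userspace sketch.  The struct definitions are simplified
stand-ins rather than the kernel's own, and every ``mydrv_`` name is
hypothetical; a real driver would program the NIC in these callbacks.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types, for illustration only. */
struct xfrm_state { bool hw_enabled; };
struct sk_buff { int placeholder; };

struct xfrmdev_ops {
	int	(*xdo_dev_state_add)(struct xfrm_state *x);
	void	(*xdo_dev_state_delete)(struct xfrm_state *x);
	void	(*xdo_dev_state_free)(struct xfrm_state *x);
	bool	(*xdo_dev_offload_ok)(struct sk_buff *skb,
				      struct xfrm_state *x);
	void	(*xdo_dev_state_advance_esn)(struct xfrm_state *x);
};

/* Hypothetical callbacks; a real driver would program the NIC here. */
static int mydrv_state_add(struct xfrm_state *x)
{
	x->hw_enabled = true;
	return 0;
}

static void mydrv_state_delete(struct xfrm_state *x)
{
	x->hw_enabled = false;
}

static void mydrv_state_free(struct xfrm_state *x)
{
	(void)x;	/* nothing left to release in this sketch */
}

static bool mydrv_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
{
	(void)skb;	/* a real driver would inspect the packet */
	return x->hw_enabled;
}

static void mydrv_advance_esn(struct xfrm_state *x)
{
	(void)x;	/* no ESN handling in this sketch */
}

static const struct xfrmdev_ops mydrv_xfrmdev_ops = {
	.xdo_dev_state_add		= mydrv_state_add,
	.xdo_dev_state_delete		= mydrv_state_delete,
	.xdo_dev_state_free		= mydrv_state_free,
	.xdo_dev_offload_ok		= mydrv_offload_ok,
	.xdo_dev_state_advance_esn	= mydrv_advance_esn,
};
```

Keeping the table ``static const`` means a single pointer to it is all the
driver has to hand to the stack.
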


Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

		adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
		adapter->netdev->features |= NETIF_F_HW_ESP;
		adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should:

	- verify the algorithm is supported for offloads
	- store the SA information (key, salt, target-ip, protocol, etc)
	- enable the HW offload of the SA
	- return status value:

		===========   ===================================
		0             success
		-EOPNOTSUPP   offload not supported, try SW IPsec
		other         fail the request
		===========   ===================================

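The steps and return values above can be sketched in plain C as follows.
This is only an illustration under stated assumptions: ``struct sa_request``
and ``struct mydrv_sa_table`` are hypothetical stand-ins for the SA and for
the device's SA table, the key length is given without the salt, and
treating a full table as a hard failure is a choice made for this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MYDRV_MAX_SA	4	/* hypothetical size of the HW SA table */

/* Simplified stand-in for the SA fields a driver would look at. */
struct sa_request {
	const char	*alg_name;	/* e.g. "rfc4106(gcm(aes))" */
	unsigned int	key_bits;	/* AES key length, without the salt */
	uint32_t	spi;
};

/* Hypothetical driver-side shadow of the hardware SA table. */
struct mydrv_sa_table {
	uint32_t	spi[MYDRV_MAX_SA];
	int		used[MYDRV_MAX_SA];
};

/*
 * Returns 0 on success, -EOPNOTSUPP when the SA cannot be offloaded
 * (so that software IPsec can be tried instead), or another negative
 * errno to fail the request.
 */
static int mydrv_state_add(struct mydrv_sa_table *tbl,
			   const struct sa_request *req)
{
	int i;

	/* 1. verify the algorithm is supported for offloads */
	if (strcmp(req->alg_name, "rfc4106(gcm(aes))") != 0 ||
	    req->key_bits != 128)
		return -EOPNOTSUPP;

	/* 2. store the SA information in a free slot */
	for (i = 0; i < MYDRV_MAX_SA; i++) {
		if (!tbl->used[i]) {
			tbl->spi[i] = req->spi;
			/* 3. enable the HW offload (stubbed out here) */
			tbl->used[i] = 1;
			return 0;
		}
	}

	/* no free slot: treated as a hard failure in this sketch */
	return -ENOSPC;
}
```

Returning -EOPNOTSUPP only for "cannot offload this SA" keeps the software
fallback distinct from genuine errors such as running out of table space.
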
The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests::

		xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  The callback can check the packet information to be
sure the offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc)
and return true or false to signify its support.

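The kind of packet check described above might look like the following
sketch.  It works on a raw IP header in a byte buffer instead of an skb,
and the policy it applies (IPv4 without options, IPv6 without a leading
extension header) is an assumption about a hypothetical device, not a rule
taken from any particular driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether a packet can use the offload, given a pointer to the
 * start of its IP header and the number of valid bytes behind it.
 */
static bool mydrv_offload_ok(const uint8_t *ip_hdr, size_t len)
{
	uint8_t version;

	if (len < 1)
		return false;

	version = ip_hdr[0] >> 4;

	if (version == 4) {
		/* IHL counts 32-bit words; 5 means 20 bytes, no options */
		uint8_t ihl = ip_hdr[0] & 0x0f;

		return len >= 20 && ihl == 5;
	}

	if (version == 6) {
		uint8_t nexthdr;

		if (len < 40)
			return false;

		/* byte 6 of the fixed IPv6 header is Next Header */
		nexthdr = ip_hdr[6];

		/* hop-by-hop, routing, fragment, destination options */
		return nexthdr != 0 && nexthdr != 43 &&
		       nexthdr != 44 && nexthdr != 60;
	}

	return false;
}
```

A false return simply tells the stack that this packet cannot use the
offload.
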
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

		xs = xfrm_input_state(skb);
		context = xs->xso.offload_handle;
		/* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

	find and hold the SA that was used for the Rx skb::

		get spi, protocol, and destination IP from packet headers
		xs = find xs from (spi, protocol, dest_IP)
		xfrm_state_hold(xs);

	store the state information into the skb::

		sp = secpath_set(skb);
		if (!sp) return;
		sp->xvec[sp->len++] = xs;
		sp->olen++;

	indicate the success and/or error status of the offload::

		xo = xfrm_offload(skb);
		xo->flags = CRYPTO_DONE;
		xo->status = crypto_status;

	hand the packet to napi_gro_receive() as usual

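The "find and hold" step above leaves the lookup itself to the driver.  One
simple way to do it, sketched here in userspace C, is a linear scan of a
small per-device table keyed on (spi, protocol, destination IP).  The table
layout, the ``mydrv_`` names, and the plain integer reference count are all
hypothetical; a real driver would take the reference with
xfrm_state_hold() and would likely use a hash table.

```c
#include <stddef.h>
#include <stdint.h>

#define MYDRV_RX_SLOTS	8	/* hypothetical table size */

struct mydrv_rx_sa {
	uint32_t	spi;
	uint32_t	daddr;		/* IPv4 destination address */
	uint8_t		proto;		/* IP protocol number, 50 for ESP */
	int		in_use;
	int		refcnt;		/* stands in for the SA refcount */
};

/* Return the matching SA with a reference taken, or NULL if none. */
static struct mydrv_rx_sa *mydrv_find_rx_sa(struct mydrv_rx_sa *tbl,
					    uint32_t spi, uint8_t proto,
					    uint32_t daddr)
{
	size_t i;

	for (i = 0; i < MYDRV_RX_SLOTS; i++) {
		struct mydrv_rx_sa *sa = &tbl[i];

		if (sa->in_use && sa->spi == spi &&
		    sa->proto == proto && sa->daddr == daddr) {
			sa->refcnt++;	/* mirrors xfrm_state_hold() */
			return sa;
		}
	}

	return NULL;
}
```

A NULL result means the hardware reported an offload for an SA the driver
does not know about, and the caller has to decide how to handle that packet.
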
In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

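What "update if needed" means depends on the hardware.  One possible shape,
shown here purely as an assumption, is for the driver to remember the
high-order 32 bits of the sequence number it last programmed and to touch
the hardware only when the stack's value has moved on.  Both structs below
are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the stack's view of the 64-bit sequence number. */
struct esn_view {
	uint32_t	seq;		/* low 32 bits, carried in the packet */
	uint32_t	seq_hi;		/* high 32 bits, tracked by the stack */
};

/* Hypothetical per-SA driver state. */
struct mydrv_esn {
	uint32_t	hw_seq_hi;	/* value last written to the HW */
	unsigned int	hw_writes;	/* counts simulated HW updates */
};

/* Returns true when the (simulated) hardware had to be updated. */
static bool mydrv_advance_esn(struct mydrv_esn *hw,
			      const struct esn_view *sw)
{
	if (hw->hw_seq_hi == sw->seq_hi)
		return false;

	hw->hw_seq_hi = sw->seq_hi;
	hw->hw_writes++;
	return true;
}
```

Comparing before writing keeps register or firmware traffic off the common
path, where the high-order bits have not changed.
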
When the SA is removed by the user, the driver's xdo_dev_state_delete()
is asked to disable the offload.  Later, xdo_dev_state_free() is called
from a garbage collection routine after all reference counts to the state
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

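The split between the two callbacks can be pictured with a small sketch:
delete stops the hardware from using the entry, while free releases what
is left once nobody references the state.  ``struct mydrv_sa`` and its
fields are hypothetical, and the division of work shown here is only one
plausible arrangement.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver-side record for one offloaded SA. */
struct mydrv_sa {
	bool	hw_enabled;	/* HW is actively using this entry */
	bool	slot_held;	/* driver resources are still allocated */
	uint8_t	key[16];	/* key material kept for the HW */
};

/* Phase 1: the user removed the SA, so disable the offload. */
static void mydrv_state_delete(struct mydrv_sa *sa)
{
	sa->hw_enabled = false;
}

/* Phase 2: all references are gone, so clear remaining resources. */
static void mydrv_state_free(struct mydrv_sa *sa)
{
	memset(sa->key, 0, sizeof(sa->key));
	sa->slot_held = false;
}
```

Between the two calls the entry is disabled but still allocated, which
gives in-flight packets that reference the state time to drain.
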
When a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete() and xdo_dev_state_free() on any remaining offloaded
states.