.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

==================
Kernel TLS offload
==================

Kernel TLS operation
====================

The Linux kernel provides TLS connection offload infrastructure. Once a TCP
connection is in ``ESTABLISHED`` state user space can enable the TLS Upper
Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.

``ktls`` can operate in three modes:

 * Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
   In most basic cases only crypto operations synchronous with the CPU
   can be used, but depending on calling context CPU may utilize
   asynchronous crypto accelerators. The use of accelerators introduces extra
   latency on socket reads (decryption only starts when a read syscall
   is made) and additional I/O load on the system.
 * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
   on a packet by packet basis, provided the packets arrive in order.
   This mode integrates best with the kernel stack and is described in detail
   in the remaining part of this document
   (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
 * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
   NIC driver and firmware replace the kernel networking stack
   with their own TCP handling. It is not usable in production environments
   which rely on the Linux networking stack, for example for firewalling,
   QoS or packet scheduling (``ethtool`` flag ``tls-hw-record``).

The operation mode is selected automatically based on device configuration.
Offload opt-in or opt-out on a per-connection basis is not currently supported.

TX
--

At a high level, user write requests are turned into a scatter list. The TLS ULP
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
mode) and then hands the modified scatter list to the TCP layer. From this
point on the TCP stack proceeds as normal.

In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
Instead, packets reach a device driver, which marks the packets
for crypto offload based on the socket the packet is attached to,
and sends them to the device for encryption and transmission.

RX
--

On the receive side, if the device handled decryption and authentication
successfully, the driver will set the decrypted bit in the associated
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
are handled normally. ``ktls`` is informed when data is queued to the socket
and the ``strparser`` mechanism is used to delineate the records. Upon read
request, records are retrieved from the socket and passed to the decryption
routine. If the device decrypted all the segments of the record the decryption
is skipped, otherwise the software path handles decryption.
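
The per-record decision described above can be illustrated with a minimal
user-space sketch. It is not kernel code: ``struct rx_segment`` and
``record_fully_decrypted()`` are hypothetical names which only model
the rule that software decryption is skipped when every segment of the record
carries the decrypted mark.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical stand-in for the per-segment "decrypted" mark. */
	struct rx_segment {
		bool decrypted;	/* set when the device handled the segment */
	};

	/*
	 * Returns true only if every segment making up the record was
	 * decrypted by the device, so the software path can be skipped.
	 */
	static bool record_fully_decrypted(const struct rx_segment *segs,
					   size_t n)
	{
		size_t i;

		for (i = 0; i < n; i++)
			if (!segs[i].decrypted)
				return false;
		return n > 0;
	}

A single segment without the mark is enough to send the whole record
to the software path in this model.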

.. kernel-figure::  tls-offload-layers.svg
   :alt:	TLS offload layers
   :align:	center
   :figwidth:	28em

   Layers of Kernel TLS stack

Device configuration
====================

During driver initialization the device sets the ``NETIF_F_HW_TLS_RX`` and
``NETIF_F_HW_TLS_TX`` features and installs its
:c:type:`struct tlsdev_ops <tlsdev_ops>`
pointer in the :c:member:`tlsdev_ops` member of the
:c:type:`struct net_device <net_device>`.

When TLS cryptographic connection state is installed on a ``ktls`` socket
(note that it is done twice, once for RX and once for TX direction,
and the two are completely independent), the kernel checks if the underlying
network device is offload-capable and attempts the offload. In case offload
fails the connection is handled entirely in software using the same mechanism
as if the offload was never tried.

The offload request is performed via the :c:member:`tls_dev_add` callback of
:c:type:`struct tlsdev_ops <tlsdev_ops>`:

.. code-block:: c

	int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
			   enum tls_offload_ctx_dir direction,
			   struct tls_crypto_info *crypto_info,
			   u32 start_offload_tcp_sn);

``direction`` indicates whether the cryptographic information is for
the received or transmitted packets. The driver uses the ``sk`` parameter
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
Cryptographic information in ``crypto_info`` includes the key, iv, salt
as well as TLS record sequence number. ``start_offload_tcp_sn`` indicates
which TCP sequence number corresponds to the beginning of the record with
sequence number from ``crypto_info``. The driver can add its state
at the end of kernel structures (see :c:member:`driver_state` members
in ``include/net/tls.h``) to avoid additional allocations and pointer
dereferences.

TX
--

After TX state is installed, the stack guarantees that the first segment
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
number, simplifying TCP sequence number matching.

TX offload being fully initialized does not imply that all segments passing
through the driver and which belong to the offloaded socket will be after
the expected sequence number and will have kernel record information.
In particular, already encrypted data may have been queued to the socket
before installing the connection state in the kernel.

RX
--

In the RX direction the local networking stack has little control over
the segmentation, so the initial record's TCP sequence number may be anywhere
inside the segment.

Normal operation
================

At the minimum the device maintains the following state for each connection, in
each direction:

 * crypto secrets (key, iv, salt)
 * crypto processing state (partial blocks, partial authentication tag, etc.)
 * record metadata (sequence number, processing offset and length)
 * expected TCP sequence number
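
The list above could map onto a per-connection, per-direction structure
along the following lines. This is only a sketch: the structure, its field
names and the buffer sizes (chosen with an AES-GCM-like cipher in mind) are
assumptions made for illustration and do not describe any particular device.

.. code-block:: c

	#include <stdint.h>
	#include <string.h>

	/* All names and sizes are illustrative assumptions. */
	struct conn_dir_state {
		/* crypto secrets */
		uint8_t  key[32];
		uint8_t  iv[8];
		uint8_t  salt[4];
		/* crypto processing state */
		uint8_t  partial_block[16];
		uint8_t  partial_block_len;
		uint8_t  partial_tag[16];
		uint8_t  partial_tag_len;
		/* record metadata */
		uint64_t rec_seq;	/* TLS record sequence number */
		uint32_t rec_offset;	/* bytes of the record already processed */
		uint32_t rec_len;	/* record length, 0 if not yet known */
		/* expected TCP sequence number */
		uint32_t expected_tcp_sn;
	};

	/* Copying of the secrets is omitted for brevity. */
	static void conn_dir_state_init(struct conn_dir_state *st,
					uint64_t rec_seq,
					uint32_t start_offload_tcp_sn)
	{
		memset(st, 0, sizeof(*st));
		st->rec_seq = rec_seq;
		st->expected_tcp_sn = start_offload_tcp_sn;
	}

The two arguments of the hypothetical init helper correspond to the record
sequence number from ``crypto_info`` and to ``start_offload_tcp_sn``.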

There are no guarantees on record length or record segmentation. In particular
segments may start at any point of a record and contain any number of records.
Assuming segments are received in order, the device should be able to perform
crypto operations and authentication regardless of segmentation. For this
to be possible the device has to keep a small amount of segment-to-segment
state. This includes at least:

 * partial headers (if a segment carried only a part of the TLS header)
 * partial data block
 * partial authentication tag (all data has been seen but part of the
   authentication tag has to be written to or read from the subsequent segment)

Record reassembly is not necessary for TLS offload. If the packets arrive
in order the device should be able to handle them separately and make
forward progress.
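
As an illustration of why a partial header has to be carried between
segments, the sketch below delineates records in a byte stream that is fed
one segment at a time. It only tracks record boundaries (no crypto), relies
on the 5 byte TLS record header whose last two bytes hold the record length,
and uses hypothetical names throughout.

.. code-block:: c

	#include <stddef.h>
	#include <stdint.h>

	#define TLS_HDR_LEN	5

	struct delineator {
		uint8_t  hdr[TLS_HDR_LEN];	/* partial header kept between segments */
		uint8_t  hdr_have;		/* header bytes collected so far */
		uint32_t body_left;		/* record bytes expected after the header */
		uint64_t records_done;		/* records fully consumed */
	};

	static void feed_segment(struct delineator *d, const uint8_t *data,
				 size_t len)
	{
		size_t take;

		while (len) {
			if (d->hdr_have < TLS_HDR_LEN) {
				/* still collecting the record header */
				d->hdr[d->hdr_have++] = *data++;
				len--;
				if (d->hdr_have < TLS_HDR_LEN)
					continue;
				d->body_left = ((uint32_t)d->hdr[3] << 8) | d->hdr[4];
			} else {
				/* consume as much of the record body as is present */
				take = len < d->body_left ? len : d->body_left;
				data += take;
				len -= take;
				d->body_left -= take;
			}
			if (d->body_left == 0) {
				/* record complete, expect a new header next */
				d->hdr_have = 0;
				d->records_done++;
			}
		}
	}

Each segment is processed on its own and then discarded; only the small
``struct delineator`` survives from one segment to the next.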

TX
--

The kernel stack performs record framing, reserving space for the authentication
tag and populating all other TLS header and trailer fields.

Both the device and the driver maintain expected TCP sequence numbers
due to the possibility of retransmissions and the lack of software fallback
once the packet reaches the device.
For segments passed in order, the driver marks the packets with
a connection identifier (note that a 5-tuple lookup is insufficient to identify
packets requiring HW offload, see the :ref:`5tuple_problems` section)
and hands them to the device. The device identifies the packet as requiring
TLS handling and confirms the sequence number matches its expectation.
The device performs encryption and authentication of the record data.
It replaces the authentication tag and TCP checksum with correct values.

RX
--

Before a packet is DMAed to the host (but after NIC's embedded switching
and packet transformation functions) the device validates the Layer 4
checksum and performs a 5-tuple lookup to find any TLS connection the packet
may belong to (technically a 4-tuple
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
is always TCP). If a connection is matched the device confirms that the TCP
sequence number is the expected one and proceeds to TLS handling (record
delineation, decryption, authentication for each record in the packet).
The device leaves the record framing unmodified; the stack takes care of
record decapsulation. The device indicates successful handling of TLS offload
in the per-packet context (descriptor) passed to the host.
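
The order of checks on the receive side can be sketched as follows. The
example is IPv4-only and uses a linear table search purely for brevity;
the structures and function names are hypothetical, and a real device
would use its own lookup structure.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	struct flow4 {			/* 4-tuple, protocol is always TCP */
		uint32_t saddr, daddr;
		uint16_t sport, dport;
	};

	struct rx_conn {
		struct flow4 flow;
		uint32_t expected_tcp_sn;
	};

	static struct rx_conn *rx_lookup(struct rx_conn *tbl, size_t n,
					 const struct flow4 *f)
	{
		size_t i;

		for (i = 0; i < n; i++)
			if (tbl[i].flow.saddr == f->saddr &&
			    tbl[i].flow.daddr == f->daddr &&
			    tbl[i].flow.sport == f->sport &&
			    tbl[i].flow.dport == f->dport)
				return &tbl[i];
		return NULL;
	}

	/* Returns true if TLS handling should be attempted for the packet. */
	static bool rx_should_offload(struct rx_conn *tbl, size_t n,
				      const struct flow4 *f, uint32_t tcp_sn,
				      bool csum_ok)
	{
		struct rx_conn *c;

		if (!csum_ok)
			return false;
		c = rx_lookup(tbl, n, f);
		return c && c->expected_tcp_sn == tcp_sn;
	}

Packets for which the helper returns false would be passed to the host
unmodified in this model.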

Upon reception of a TLS offloaded packet, the driver sets
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
corresponding to the segment. The networking stack makes sure decrypted
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
and takes care of partial decryption.

Resync handling
===============

In the presence of packet drops or network packet reordering, the device may
lose synchronization with the TLS stream, and require a resync with the kernel's
TCP stack.

Note that resync is only attempted for connections which were successfully
added to the device table and are in ``TLS_HW`` mode. For example,
if the table was full when cryptographic state was installed in the kernel,
such a connection will never get offloaded. Therefore the resync request
does not carry any cryptographic connection state.

TX
--

Segments transmitted from an offloaded socket can get out of sync
in ways similar to the receive side: retransmissions and local drops
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
This means most likely that the part of the record preceding the current
segment has to be passed to the device as part of the packet context,
together with its TCP sequence number and TLS record number. The device
can then initialize its crypto state, process and discard the preceding
data (to be able to insert the authentication tag) and move onto handling
the actual packet.

In this mode, depending on the implementation, the driver can either ask
for a continuation with the crypto state and the new sequence number
(next expected segment is the one after the out of order one), or continue
with the previous stream state - assuming that the out of order segment
was just a retransmission. The former is simpler and does not require
retransmission detection, therefore it is the recommended method until
such time as it is proven inefficient.

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, which
may imply a local drop, the driver asks the stack to sync the device
to the next record state and falls back to software.
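
The three cases (in order, retransmission, segment in the future) can be
told apart with a wraparound-safe comparison of 32-bit TCP sequence
numbers. The sketch below is a user-space illustration with hypothetical
names; it uses the usual signed-difference idiom and assumes two's
complement conversion.

.. code-block:: c

	#include <stdint.h>

	enum tx_seg_class {
		TX_SEG_IN_ORDER,	/* hand to the device as is */
		TX_SEG_RETRANSMIT,	/* software fallback, device state untouched */
		TX_SEG_FUTURE,		/* software fallback and resync request */
	};

	static enum tx_seg_class tx_classify(uint32_t seq, uint32_t expected)
	{
		/* signed difference stays correct across sequence wraparound */
		int32_t diff = (int32_t)(seq - expected);

		if (diff == 0)
			return TX_SEG_IN_ORDER;
		return diff < 0 ? TX_SEG_RETRANSMIT : TX_SEG_FUTURE;
	}

For example, with an expected sequence number of ``0xfffffff0`` a segment
starting at sequence number 5 is classified as being in the future, not
as a retransmission.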

Resync request is indicated with:

.. code-block:: c

  void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq);

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

  bool tls_offload_tx_resync_pending(struct sock *sk);

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).

RX
--

A small number of RX reorder events may not require a full resynchronization.
In particular the device should not lose synchronization
when record boundary can be recovered:

.. kernel-figure::  tls-offload-reorder-good.svg
   :alt:	reorder of non-header segment
   :align:	center

   Reorder of non-header segment

Green segments are successfully decrypted, blue ones are passed
as received on wire, red stripes mark start of new records.

In the above case segment 1 is received and decrypted successfully.
Segment 2 was dropped so 3 arrives out of order. The device knows
the next record starts inside 3, based on record length in segment 1.
Segment 3 is passed untouched, because due to lack of data from segment 2
the remainder of the previous record inside segment 3 cannot be handled.
The device can, however, collect the authentication algorithm's state
and partial block from the new record in segment 3 and when 4 and 5
arrive continue decryption. Finally when 2 arrives it's completely outside
of the expected window of the device so it's passed as is without special
handling. ``ktls`` software fallback handles the decryption of the record
spanning segments 1, 2 and 3. The device did not get out of sync,
even though two segments did not get decrypted.

Kernel synchronization may be necessary if the lost segment contained
a record header and arrived after the next record header has already passed:

.. kernel-figure::  tls-offload-reorder-bad.svg
   :alt:	reorder of header segment
   :align:	center

   Reorder of segment with a TLS header

In this example segment 2 gets dropped, and it contains a record header.
The device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
for TLS 1.2 and TLS 1.3 two consecutive bytes of value ``0x03 0x03`` occur
in the SSL/TLS version field of the header. Once the pattern is matched
the device continues attempting to parse headers at expected locations
(based on the length fields at guessed locations).
Whenever the expected location does not contain a valid header the scan
is restarted.
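
A software model of the scan might look like the sketch below. Only the
``0x03 0x03`` version match comes from the description above; the content
type range (20 to 23) and the upper bound on the length field are extra
plausibility checks assumed for this example, and all names are
hypothetical.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	#define TLS_HDR_LEN	5
	#define TLS_MAX_LEN	(16384 + 256)	/* assumed bound on the length field */

	static bool tls_hdr_plausible(const uint8_t *p)
	{
		uint32_t len = ((uint32_t)p[3] << 8) | p[4];

		return p[0] >= 20 && p[0] <= 23 &&	/* assumed content types */
		       p[1] == 0x03 && p[2] == 0x03 &&	/* version field */
		       len > 0 && len <= TLS_MAX_LEN;
	}

	/* Returns the offset of the first plausible header, or -1. */
	static long tls_scan(const uint8_t *buf, size_t len)
	{
		size_t off;

		for (off = 0; off + TLS_HDR_LEN <= len; off++)
			if (tls_hdr_plausible(buf + off))
				return (long)off;
		return -1;
	}

	/* Follows the length fields and counts consecutive plausible headers. */
	static unsigned int tls_chain(const uint8_t *buf, size_t len, size_t off)
	{
		unsigned int n = 0;

		while (off + TLS_HDR_LEN <= len && tls_hdr_plausible(buf + off)) {
			n++;
			off += TLS_HDR_LEN +
			       (((size_t)buf[off + 3] << 8) | buf[off + 4]);
		}
		return n;
	}

Unlike this model, which looks at one contiguous buffer, a device has to
run the same logic across segment boundaries.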

When the header is matched the device sends a confirmation request
to the kernel, asking if the guessed location is correct (if a TLS record
really starts there), and which record sequence number the given header had.
The kernel confirms the guessed location was correct and tells the device
the record sequence number. Meanwhile, the device has been parsing
and counting all records since the just-confirmed one; it adds the number
of records it has seen to the record number provided by the kernel.
At this point the device is in sync and can resume decryption at the next
segment boundary.

In a pathological case the device may latch onto a sequence of matching
headers and never hear back from the kernel (there is no negative
confirmation from the kernel). The implementation may choose to periodically
restart scan. Given how unlikely a falsely-matching stream is, however,
periodic restart is not deemed necessary.

Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream, as the record may get processed
by the kernel before the confirmation request.

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted, the stack retries the
synchronization with an exponential backoff (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
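
The retry schedule can be modelled with a small counter. The sketch below
is not the kernel implementation; the names are hypothetical, and it assumes
that each threshold counts records received since the previous attempt.

.. code-block:: c

	#include <stdbool.h>
	#include <stdint.h>

	#define RESYNC_BACKOFF_MAX	128

	struct resync_backoff {
		uint32_t encrypted_seen;	/* encrypted records since last attempt */
		uint32_t threshold;		/* records to wait before next attempt */
	};

	/* Also used to reset the state once records get decrypted again. */
	static void resync_backoff_init(struct resync_backoff *b)
	{
		b->encrypted_seen = 0;
		b->threshold = 2;
	}

	/*
	 * Call once per fully encrypted record. Returns true when a resync
	 * should be attempted and doubles the threshold, capped at 128.
	 */
	static bool resync_backoff_record(struct resync_backoff *b)
	{
		if (++b->encrypted_seen < b->threshold)
			return false;
		b->encrypted_seen = 0;
		if (b->threshold < RESYNC_BACKOFF_MAX)
			b->threshold *= 2;
		return true;
	}

Under this assumption the first three attempts happen after 2, 6 and 14
encrypted records in total.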

Error handling
==============

TX
--

Packets may be redirected or rerouted by the stack to a different
device than the selected TLS offload device. The stack will handle
such condition using the :c:func:`sk_validate_xmit_skb` helper
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
Offload maintains information about all records until the data is
fully acknowledged, so if skbs reach the wrong device they can be handled
by software fallback.

Any device TLS offload handling error on the transmission side must result
in the packet being dropped. For example, if a packet got out of order
due to a bug in the stack or the device, reached the device and can't
be encrypted, such packet must be dropped.

RX
--

If the device encounters any problems with TLS offload on the receive
side it should pass the packet to the host's networking stack as it was
received on the wire.

For example authentication failure for any record in the segment should
result in passing the unmodified packet to the software fallback. This means
packets should not be modified "in place". Splitting segments to handle partial
decryption is not advised. In other words, either all records in the packet
have been handled successfully and authenticated, or the packet has to be passed
to the host's stack as it was on the wire (recovering the original packet in the
driver if the device provides a precise error is sufficient).

The Linux networking stack does not provide a way of reporting per-packet
decryption and authentication errors; packets with errors must simply not
have the :c:member:`decrypted` mark set.

A packet should also not be handled by the TLS offload if it contains
incorrect checksums.
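
The all-or-nothing rule for the receive side can be summarized in a few
lines. The per-packet result structure below is hypothetical; it stands
for whatever the device reports in its descriptor.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical summary of what the device reports for one packet. */
	struct rx_pkt_result {
		bool   csum_ok;		/* checksums validated */
		size_t nr_records;	/* records (or parts) present in the packet */
		size_t nr_handled_ok;	/* of those, handled without any error */
	};

	/* The decrypted mark is set only if nothing at all went wrong. */
	static bool rx_mark_decrypted(const struct rx_pkt_result *r)
	{
		return r->csum_ok && r->nr_records > 0 &&
		       r->nr_handled_ok == r->nr_records;
	}

When the helper returns false the packet has to reach the stack exactly
as it was on the wire.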

Performance metrics
===================

TLS offload can be characterized by the following basic metrics:

 * max connection count
 * connection installation rate
 * connection installation latency
 * total cryptographic performance

Note that each TCP connection requires a TLS session in both directions,
so the performance may be reported treating each direction separately.

Max connection count
--------------------

The number of connections the device can support can be exposed via
the ``devlink resource`` API.

Total cryptographic performance
-------------------------------

Offload performance may depend on segment and record size.

Overload of the cryptographic subsystem of the device should not have
significant performance impact on non-offloaded streams.

Statistics
==========

The following minimum set of TLS-related statistics should be reported
by the driver:

 * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
   which were part of a TLS stream.
 * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
   which were successfully decrypted.
 * ``rx_tls_ctx`` - number of TLS RX HW offload contexts added to device for
   decryption.
 * ``rx_tls_del`` - number of TLS RX HW offload contexts deleted from device
   (connection has finished).
 * ``rx_tls_resync_req_pkt`` - number of received TLS packets with a resync
   request.
 * ``rx_tls_resync_req_start`` - number of times the TLS async resync request
   was started.
 * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
   properly ended with providing the HW tracked tcp-seq.
 * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
   procedure was started but not properly ended.
 * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
   the driver was successfully handled.
 * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
   the driver was terminated unsuccessfully.
 * ``rx_tls_err`` - number of RX packets which were part of a TLS stream
   but were not decrypted due to unexpected error in the state machine.
 * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
   for encryption of their TLS payload.
 * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
   passed to the device for encryption.
 * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
   encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
   but did not arrive in the expected order.
 * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
   a TLS stream and arrived out-of-order, but skipped the HW offload routine
   and went to the regular transmit flow as they were retransmissions of the
   connection handshake.
 * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
   a TLS stream dropped, because they arrived out of order and associated
   record could not be found.
 * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
   stream dropped, because they contain both data that has been encrypted by
   software and data that expects hardware crypto offload.

Notable corner cases, exceptions and additional requirements
============================================================

.. _5tuple_problems:

5-tuple matching limitations
----------------------------

The device can only recognize received packets based on the 5-tuple
of the socket. Current ``ktls`` implementation will not offload sockets
routed through software interfaces such as those used for tunneling
or virtual networking. However, many packet transformations performed
by the networking stack (most notably any BPF logic) do not require
any intermediate software device, therefore a 5-tuple match may
consistently miss at the device level. In such cases the device
should still be able to perform TX offload (encryption) and should
fall back cleanly to software decryption (RX).

Out of order
------------

Introducing extra processing in NICs should not cause packets to be
transmitted or received out of order, for example pure ACK packets
should not be reordered with respect to data segments.

Ingress reorder
---------------

A device is permitted to perform packet reordering for consecutive
TCP segments (i.e. placing packets in the correct order) but any form
of additional buffering is disallowed.

Coexistence with standard networking offload features
-----------------------------------------------------

Offloaded ``ktls`` sockets should support standard TCP stack features
transparently. Enabling device TLS offload should not cause any difference
in packets as seen on the wire.

Transport layer transparency
----------------------------

The device should not modify any packet headers for the purpose
of simplifying TLS offload.

The device should not depend on any packet headers beyond what is strictly
necessary for TLS offload.

Segment drops
-------------

Dropping packets is acceptable only in the event of catastrophic
system errors and should never be used as an error handling mechanism
in cases arising from normal operation. In other words, reliance
on TCP retransmissions to handle corner cases is not acceptable.

TLS device features
-------------------

Drivers should ignore the changes to the TLS device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads; old connections will remain active after flags are cleared.