.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

==================
Kernel TLS offload
==================

Kernel TLS operation
====================

The Linux kernel provides TLS connection offload infrastructure. Once a TCP
connection is in ``ESTABLISHED`` state, user space can enable the TLS Upper
Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.

``ktls`` can operate in three modes:

 * Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
   In the most basic cases only crypto operations synchronous with the CPU
   can be used, but depending on the calling context the CPU may utilize
   asynchronous crypto accelerators. The use of accelerators introduces extra
   latency on socket reads (decryption only starts when a read syscall
   is made) and additional I/O load on the system.
 * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
   on a packet by packet basis, provided the packets arrive in order.
   This mode integrates best with the kernel stack and is described in detail
   in the remaining part of this document
   (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
 * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
   the NIC driver and firmware replace the kernel networking stack
   with their own TCP handling. It is not usable in production environments
   that make use of the Linux networking stack, for example any firewalling
   abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).

The operation mode is selected automatically based on device configuration;
offload opt-in or opt-out on a per-connection basis is not currently supported.

TX
--

At a high level user write requests are turned into a scatter list, the TLS ULP
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
mode) and then hands the modified scatter list to the TCP layer. From this
point on the TCP stack proceeds as normal.

In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
Instead the packets reach a device driver, and the driver marks the packets
for crypto offload based on the socket the packet is attached to,
and sends them to the device for encryption and transmission.

RX
--

On the receive side, if the device handled decryption and authentication
successfully, the driver will set the decrypted bit in the associated
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
are handled normally.
``ktls`` is informed when data is queued to the socket
and the ``strparser`` mechanism is used to delineate the records. Upon a read
request, records are retrieved from the socket and passed to the decryption
routine. If the device decrypted all the segments of the record the decryption
is skipped, otherwise the software path handles decryption.

.. kernel-figure:: tls-offload-layers.svg
   :alt: TLS offload layers
   :align: center
   :figwidth: 28em

   Layers of Kernel TLS stack

Device configuration
====================

During driver initialization the device sets the ``NETIF_F_HW_TLS_RX`` and
``NETIF_F_HW_TLS_TX`` features and installs its
:c:type:`struct tlsdev_ops <tlsdev_ops>`
pointer in the :c:member:`tlsdev_ops` member of the
:c:type:`struct net_device <net_device>`.

When TLS cryptographic connection state is installed on a ``ktls`` socket
(note that it is done twice, once for RX and once for TX direction,
and the two are completely independent), the kernel checks if the underlying
network device is offload-capable and attempts the offload. In case offload
fails the connection is handled entirely in software using the same mechanism
as if the offload was never tried.

The offload request is performed via the :c:member:`tls_dev_add` callback of
:c:type:`struct tlsdev_ops <tlsdev_ops>`:

.. code-block:: c

   int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
                      enum tls_offload_ctx_dir direction,
                      struct tls_crypto_info *crypto_info,
                      u32 start_offload_tcp_sn);

``direction`` indicates whether the cryptographic information is for
the received or transmitted packets. The driver uses the ``sk`` parameter
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
Cryptographic information in ``crypto_info`` includes the key, iv, salt
as well as the TLS record sequence number. ``start_offload_tcp_sn`` indicates
which TCP sequence number corresponds to the beginning of the record with
the sequence number from ``crypto_info``. The driver can add its state
at the end of kernel structures (see :c:member:`driver_state` members
in ``include/net/tls.h``) to avoid additional allocations and pointer
dereferences.

TX
--

After TX state is installed, the stack guarantees that the first segment
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
number, simplifying TCP sequence number matching.

TX offload being fully initialized does not imply that all segments passing
through the driver and which belong to the offloaded socket will be after
the expected sequence number and will have kernel record information.
In particular, already encrypted data may have been queued to the socket
before installing the connection state in the kernel.
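
As an illustration of the last point, a driver could compare a segment's
starting TCP sequence number against ``start_offload_tcp_sn`` before deciding
whether kernel record information can be expected for it. The sketch below
is a user-space model, not kernel code: ``seq_before()``,
``struct example_tx_ctx`` and ``example_tx_predates_offload()`` are
hypothetical names, and the comparison is assumed to be the usual wrap-safe
signed difference of 32-bit sequence numbers.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Wrap-safe "a is before b" test for 32-bit TCP sequence numbers. */
   static bool seq_before(uint32_t a, uint32_t b)
   {
           return (int32_t)(a - b) < 0;
   }

   /* Hypothetical per-connection TX state kept by a driver. */
   struct example_tx_ctx {
           uint32_t start_offload_tcp_sn; /* value given to tls_dev_add() */
   };

   /*
    * Returns true if the segment starting at seq lies before the point
    * where the offload was installed, i.e. no kernel record information
    * should be expected for it.
    */
   static bool example_tx_predates_offload(const struct example_tx_ctx *ctx,
                                           uint32_t seq)
   {
           return seq_before(seq, ctx->start_offload_tcp_sn);
   }

A segment for which the helper returns true would most likely carry data
queued before the connection state was installed, so a driver following this
sketch would send it without requesting crypto offload.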

RX
--

In the RX direction the local networking stack has little control over the
segmentation, so the initial record's TCP sequence number may be anywhere
inside the segment.

Normal operation
================

At the minimum the device maintains the following state for each connection, in
each direction:

 * crypto secrets (key, iv, salt)
 * crypto processing state (partial blocks, partial authentication tag, etc.)
 * record metadata (sequence number, processing offset and length)
 * expected TCP sequence number

There are no guarantees on record length or record segmentation. In particular
segments may start at any point of a record and contain any number of records.
Assuming segments are received in order, the device should be able to perform
crypto operations and authentication regardless of segmentation. For this
to be possible the device has to keep a small amount of segment-to-segment
state. This includes at least:

 * partial headers (if a segment carried only a part of the TLS header)
 * partial data block
 * partial authentication tag (all data had been seen but part of the
   authentication tag has to be written or read from the subsequent segment)

Record reassembly is not necessary for TLS offload.
If the packets arrive
in order the device should be able to handle them separately and make
forward progress.

TX
--

The kernel stack performs record framing, reserving space for the
authentication tag and populating all other TLS header and trailer fields.

Both the device and the driver maintain expected TCP sequence numbers
due to the possibility of retransmissions and the lack of software fallback
once the packet reaches the device.
For segments passed in order, the driver marks the packets with
a connection identifier (note that a 5-tuple lookup is insufficient to identify
packets requiring HW offload, see the :ref:`5tuple_problems` section)
and hands them to the device. The device identifies the packet as requiring
TLS handling and confirms the sequence number matches its expectation.
The device performs encryption and authentication of the record data.
It replaces the authentication tag and TCP checksum with correct values.

RX
--

Before a packet is DMAed to the host (but after NIC's embedded switching
and packet transformation functions) the device validates the Layer 4
checksum and performs a 5-tuple lookup to find any TLS connection the packet
may belong to (technically a 4-tuple
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
is always TCP).
If a connection is matched the device confirms that the TCP sequence
number is the expected one and proceeds to TLS handling (record delineation,
decryption, authentication for each record in the packet). The device leaves
the record framing unmodified, the stack takes care of record decapsulation.
The device indicates successful handling of TLS offload in the per-packet
context (descriptor) passed to the host.

Upon reception of a TLS offloaded packet, the driver sets
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
corresponding to the segment. The networking stack makes sure decrypted
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
and takes care of partial decryption.

Resync handling
===============

In the presence of packet drops or network packet reordering, the device may
lose synchronization with the TLS stream, and require a resync with the
kernel's TCP stack.

Note that resync is only attempted for connections which were successfully
added to the device table and are in ``TLS_HW`` mode. For example,
if the table was full when cryptographic state was installed in the kernel,
such a connection will never get offloaded. Therefore the resync request
does not carry any cryptographic connection state.

TX
--

Segments transmitted from an offloaded socket can get out of sync
in ways similar to the receive side: retransmissions and local drops
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
This means most likely that the part of the record preceding the current
segment has to be passed to the device as part of the packet context,
together with its TCP sequence number and TLS record number. The device
can then initialize its crypto state, process and discard the preceding
data (to be able to insert the authentication tag) and move on to handling
the actual packet.

In this mode depending on the implementation the driver can either ask
for a continuation with the crypto state and the new sequence number
(next expected segment is the one after the out of order one), or continue
with the previous stream state - assuming that the out of order segment
was just a retransmission. The former is simpler, and does not require
retransmission detection, therefore it is the recommended method until
such time as it is proven inefficient.
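
The context passed with an out of order segment under the former option can
be modelled roughly as follows. This is a user-space sketch only: the
structures and ``example_tx_rebuild()`` are hypothetical and do not
correspond to any kernel or driver API. It assumes the driver has already
found the record containing the segment and knows that record's starting TCP
sequence number and TLS record number.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical description of the record covering the segment. */
   struct example_record {
           uint32_t start_seq; /* TCP sequence number of the record start */
           uint64_t rec_no;    /* TLS record sequence number */
   };

   /* Hypothetical context handed to the device along with the packet. */
   struct example_rebuild_ctx {
           uint32_t prefix_seq;        /* TCP seq of the replayed prefix */
           uint32_t prefix_len;        /* record bytes preceding the segment */
           uint64_t rec_no;            /* TLS record number to start from */
           uint32_t next_expected_seq; /* expectation after this segment */
   };

   static struct example_rebuild_ctx
   example_tx_rebuild(const struct example_record *rec,
                      uint32_t seg_seq, uint32_t seg_len)
   {
           struct example_rebuild_ctx ctx = {
                   .prefix_seq = rec->start_seq,
                   /* unsigned subtraction stays correct across seq wrap */
                   .prefix_len = seg_seq - rec->start_seq,
                   .rec_no = rec->rec_no,
                   /* continue right after the out of order segment */
                   .next_expected_seq = seg_seq + seg_len,
           };

           return ctx;
   }

Setting ``next_expected_seq`` to the end of the out of order segment reflects
the simpler of the two options described above; a driver that instead keeps
the previous stream state would leave its expected sequence number untouched.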

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, it
may imply a local drop, so the driver asks the stack to sync the device
to the next record state and falls back to software.

Resync request is indicated with:

.. code-block:: c

  void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

  bool tls_offload_tx_resync_pending(struct sock *sk)

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).

RX
--

A small number of RX reorder events may not require a full resynchronization.
In particular the device should not lose synchronization
when the record boundary can be recovered:

.. kernel-figure:: tls-offload-reorder-good.svg
   :alt: reorder of non-header segment
   :align: center

   Reorder of non-header segment

Green segments are successfully decrypted, blue ones are passed
as received on wire, red stripes mark start of new records.

In the above case segment 1 is received and decrypted successfully.
Segment 2 was dropped so 3 arrives out of order. The device knows
the next record starts inside 3, based on record length in segment 1.
Segment 3 is passed untouched, because due to lack of data from segment 2
the remainder of the previous record inside segment 3 cannot be handled.
The device can, however, collect the authentication algorithm's state
and partial block from the new record in segment 3 and when 4 and 5
arrive continue decryption. Finally when 2 arrives it's completely outside
of the expected window of the device so it's passed as is without special
handling. ``ktls`` software fallback handles the decryption of the record
spanning segments 1, 2 and 3. The device did not get out of sync,
even though two segments did not get decrypted.

Kernel synchronization may be necessary if the lost segment contained
a record header and arrived after the next record header has already passed:

.. kernel-figure:: tls-offload-reorder-bad.svg
   :alt: reorder of header segment
   :align: center

   Reorder of segment with a TLS header

In this example segment 2 gets dropped, and it contains a record header.
The device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
in the SSL/TLS version field of the header. Once the pattern is matched
the device continues attempting to parse headers at expected locations
(based on the length fields at guessed locations).
Whenever the expected location does not contain a valid header the scan
is restarted.

When the header is matched the device sends a confirmation request
to the kernel, asking if the guessed location is correct (if a TLS record
really starts there), and which record sequence number the given header had.
The kernel confirms the guessed location was correct and tells the device
the record sequence number.
Meanwhile, the device had been parsing
and counting all records since the just-confirmed one, so it adds the number
of records it had seen to the record number provided by the kernel.
At this point the device is in sync and can resume decryption at the next
segment boundary.

In a pathological case the device may latch onto a sequence of matching
headers and never hear back from the kernel (there is no negative
confirmation from the kernel). The implementation may choose to periodically
restart the scan. Given how unlikely a falsely-matching stream is, however,
periodic restart is not deemed necessary.

Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and the record may get processed
by the kernel before the confirmation request.

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number.
If the
records continue to be received fully encrypted the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).

Error handling
==============

TX
--

Packets may be redirected or rerouted by the stack to a different
device than the selected TLS offload device. The stack will handle
such a condition using the :c:func:`sk_validate_xmit_skb` helper
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
Offload maintains information about all records until the data is
fully acknowledged, so if skbs reach the wrong device they can be handled
by software fallback.

Any device TLS offload handling error on the transmission side must result
in the packet being dropped. For example if a packet got out of order
due to a bug in the stack or the device, reached the device and can't
be encrypted, such a packet must be dropped.

RX
--

If the device encounters any problems with TLS offload on the receive
side it should pass the packet to the host's networking stack as it was
received on the wire.

For example authentication failure for any record in the segment should
result in passing the unmodified packet to the software fallback.
This means
packets should not be modified "in place". Splitting segments to handle partial
decryption is not advised. In other words either all records in the packet
had been handled successfully and authenticated or the packet has to be passed
to the host's stack as it was on the wire (recovering the original packet in
the driver if the device provides a precise error is sufficient).

The Linux networking stack does not provide a way of reporting per-packet
decryption and authentication errors; packets with errors must simply not
have the :c:member:`decrypted` mark set.

A packet should also not be handled by the TLS offload if it contains
incorrect checksums.

Performance metrics
===================

TLS offload can be characterized by the following basic metrics:

 * max connection count
 * connection installation rate
 * connection installation latency
 * total cryptographic performance

Note that each TCP connection requires a TLS session in both directions,
so the performance may be reported treating each direction separately.

Max connection count
--------------------

The number of connections a device can support can be exposed via
the ``devlink resource`` API.

Total cryptographic performance
-------------------------------

Offload performance may depend on segment and record size.

Overload of the cryptographic subsystem of the device should not have
significant performance impact on non-offloaded streams.

Statistics
==========

The following minimum set of TLS-related statistics should be reported
by the driver:

 * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
   which were part of a TLS stream.
 * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
   which were successfully decrypted.
 * ``rx_tls_ctx`` - number of TLS RX HW offload contexts added to device for
   decryption.
 * ``rx_tls_del`` - number of TLS RX HW offload contexts deleted from device
   (connection has finished).
 * ``rx_tls_resync_req_pkt`` - number of received TLS packets with a resync
   request.
 * ``rx_tls_resync_req_start`` - number of times the TLS async resync request
   was started.
 * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
   properly ended with providing the HW tracked tcp-seq.
 * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
   procedure was started but not properly ended.
 * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
   the driver was successfully handled.
 * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
   the driver was terminated unsuccessfully.
 * ``rx_tls_err`` - number of RX packets which were part of a TLS stream
   but were not decrypted due to unexpected error in the state machine.
 * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
   for encryption of their TLS payload.
 * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
   passed to the device for encryption.
 * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
   encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
   but did not arrive in the expected order.
 * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
   a TLS stream and arrived out-of-order, but skipped the HW offload routine
   and went to the regular transmit flow as they were retransmissions of the
   connection handshake.
 * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
   a TLS stream dropped, because they arrived out of order and associated
   record could not be found.
 * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
   stream dropped, because they contain both data that has been encrypted by
   software and data that expects hardware crypto offload.

Notable corner cases, exceptions and additional requirements
============================================================

.. _5tuple_problems:

5-tuple matching limitations
----------------------------

The device can only recognize received packets based on the 5-tuple
of the socket. The current ``ktls`` implementation will not offload sockets
routed through software interfaces such as those used for tunneling
or virtual networking. However, many packet transformations performed
by the networking stack (most notably any BPF logic) do not require
any intermediate software device, therefore a 5-tuple match may
consistently miss at the device level. In such cases the device
should still be able to perform TX offload (encryption) and should
fall back cleanly to software decryption (RX).

Out of order
------------

Introducing extra processing in NICs should not cause packets to be
transmitted or received out of order, for example pure ACK packets
should not be reordered with respect to data segments.

Ingress reorder
---------------

A device is permitted to perform packet reordering for consecutive
TCP segments (i.e. placing packets in the correct order) but any form
of additional buffering is disallowed.

Coexistence with standard networking offload features
-----------------------------------------------------

Offloaded ``ktls`` sockets should support standard TCP stack features
transparently. Enabling device TLS offload should not cause any difference
in packets as seen on the wire.

Transport layer transparency
----------------------------

The device should not modify any packet headers for the purpose
of simplifying TLS offload.

The device should not depend on any packet headers beyond what is strictly
necessary for TLS offload.

Segment drops
-------------

Dropping packets is acceptable only in the event of catastrophic
system errors and should never be used as an error handling mechanism
in cases arising from normal operation. In other words, reliance
on TCP retransmissions to handle corner cases is not acceptable.

TLS device features
-------------------

Drivers should ignore changes to the TLS device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads; old connections will remain active after flags are cleared.