.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel. After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. There are separate socket options for moving
the transmit path and the receive path into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

Sending TLS application data
----------------------------

After setting the TLS_TX socket option all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

send() data is directly encrypted from the userspace buffer provided
to the encrypted kernel send buffer if possible.

The sendfile system call will send the file's data over TLS records of maximum
length (2^14 bytes).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

TLS records are created and sent after each send() call, unless
MSG_MORE is passed. MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed. If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided.
A full TLS record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, sizeof(buffer), 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur. If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
          struct msghdr msg = {0};
          int cmsg_len = sizeof(record_type);
          struct cmsghdr *cmsg;
          char buf[CMSG_SPACE(cmsg_len)];
          struct iovec msg_iov;   /* Vector of data to send/receive into.  */

          msg.msg_control = buf;
          msg.msg_controllen = sizeof(buf);
          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_TLS;
          cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
          cmsg->cmsg_len = CMSG_LEN(cmsg_len);
          *CMSG_DATA(cmsg) = record_type;
          msg.msg_controllen = cmsg->cmsg_len;

          msg_iov.iov_base = data;
          msg_iov.iov_len = length;
          msg.msg_iov = &msg_iov;
          msg.msg_iovlen = 1;

          return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg. If no cmsg buffer is provided, an error is
returned if a control message is received. Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  /* Named cmsg_buf so it does not clash with the cmsg pointer below. */
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

recv() never returns data from TLS records of different types in a single
call.

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Statistics
==========

TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography
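
The counters listed above can be read from userspace with ordinary file I/O.
The following is a minimal sketch, not part of the kernel interface itself:
the helper name ``tls_stat_lookup`` is hypothetical, and the code assumes that
each line of ``/proc/net/tls_stat`` holds a counter name followed by whitespace
and an unsigned decimal value.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one counter by name in a stream laid out
 * like /proc/net/tls_stat. Assumption: every line is "<name> <value>".
 * Returns 0 and stores the counter in *value, or -1 if it is not found.
 */
static int tls_stat_lookup(FILE *fp, const char *name, unsigned long *value)
{
        char line[128];
        char key[64];
        unsigned long val;

        while (fgets(line, sizeof(line), fp)) {
                /* %63s reads the name up to the first whitespace. */
                if (sscanf(line, "%63s %lu", key, &val) == 2 &&
                    strcmp(key, name) == 0) {
                        *value = val;
                        return 0;
                }
        }
        return -1;
}
```

A caller would typically open the file with ``fopen("/proc/net/tls_stat", "r")``
and pass the stream along with a name such as ``"TlsDecryptError"``. Taking a
``FILE *`` rather than a path keeps the parsing logic usable on captured
output as well.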