============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.

* IpInReceives

Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is increased at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It is increased even if the packet is dropped
later (e.g. because the IP header is invalid or the checksum is wrong).
It counts the aggregated segments after GRO/LRO.

* IpInDelivers

Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP, UDP,
ICMP and so on. If no one listens on a raw socket, only packets of kernel
supported protocols are delivered; if someone listens on a raw
socket, all valid IP packets are delivered.
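
On Linux these counters can be read from /proc/net/snmp (the IpExt and
TcpExt ones from /proc/net/netstat). Both files use pairs of lines: a
header line with field names and a value line, each starting with the same
prefix. The sketch below is only an illustration of that layout. The
function name ``parse_proc_net_snmp`` and the abridged sample text are
assumptions made for this example, and the sample values are made up.

```python
def parse_proc_net_snmp(text):
    """Parse text laid out like /proc/net/snmp or /proc/net/netstat.

    The layout is pairs of lines: "Prefix: name name ..." followed by
    "Prefix: value value ...". Returns a dict such as {"IpInReceives": 120}.
    """
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Even-indexed lines are headers, odd-indexed lines are the values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("header/value lines do not match: %r" % prefix)
        for name, number in zip(names.split(), numbers.split()):
            # "Ip" + "InReceives" gives "IpInReceives", the name used here.
            counters[prefix + name] = int(number)
    return counters


# Abridged, made-up sample; a real file has many more fields per line.
SAMPLE = """\
Ip: InReceives InDelivers OutRequests
Ip: 120 118 97
Icmp: InMsgs OutMsgs
Icmp: 4 4
"""

sample_counters = parse_proc_net_snmp(SAMPLE)
print(sample_counters["IpInReceives"], sample_counters["IpInDelivers"])
```

To read the live counters, pass the content of /proc/net/snmp to the
function instead of the sample text.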

* IpOutRequests

Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with
IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets

They are Linux kernel extensions and have no RFC definitions. Please note
that RFC1213 does define ifInOctets and ifOutOctets, but they
are different things. The ifInOctets and ifOutOctets counters include the MAC
layer header size, but IpExtInOctets and IpExtOutOctets don't; they
only include the IP layer header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts

They indicate the number of four kinds of ECN IP packets. Please refer to
`Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These 4 counters count how many packets are received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.

* IpInHdrErrors

Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
dropped due to an IP header error.
It might happen in both the IP input
and IP forward paths.

.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpInAddrErrors

Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
scenarios: (1) The IP address is invalid. (2) The destination IP
address is not a local address and IP forwarding is not enabled.

.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInNoRoutes

This counter means the packet is dropped when the IP stack receives a
packet and can't find a route for it in the route table. It might
happen when IP forwarding is enabled, the destination IP address is
not a local address and there is no route for the destination IP
address.

* IpInUnknownProtos

Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
layer 4 protocol is unsupported by the kernel. If an application is using
a raw socket, the kernel will always deliver the packet to the raw socket
and this counter won't be increased.

.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInTruncatedPkts

For an IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header.

* IpInDiscards

Defined in `RFC1213 ipInDiscards`_.
It indicates the packet is dropped
in the IP receiving path due to kernel internal reasons (e.g. not
enough memory).

.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutDiscards

Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
dropped in the IP sending path due to kernel internal reasons.

.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutNoRoutes

Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
dropped in the IP sending path because no route is found for it.

.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs

Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors; they
would be increased even if the ICMP packet has an invalid type. The
ICMP output path will check the header of a raw socket, so
IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program.

* ICMP named types

| These counters include most of the common ICMP types. They are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'.
E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward. The 'In' counter means the kernel receives such a packet
and the 'Out' counter means the kernel sends such a packet.

* ICMP numeric types

They are IcmpMsgInType[N] and IcmpMsgOutType[N], where [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definitions can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 would increase by 1. And if the kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 would increase by 1.

* IcmpInCsumErrors

This counter indicates the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.

* IcmpInErrors and IcmpOutErrors

Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters would be updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too.

Relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the kernel
receives an ICMP packet, it follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error, update
   IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If the errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.

General TCP counters
====================
* TcpInSegs

Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
error, invalid TCP header and so on. Only one error won't be included:
if the layer 2 destination address is not the NIC's layer 2
address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets would be delivered to the TCP layer, but the TCP layer will discard
these packets before increasing TcpInSegs. The TcpInSegs counter
isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase by 1.

* TcpOutSegs

Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets. But it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of
GSO, so if a packet would be split into 2 by GSO, TcpOutSegs will
increase by 2.

* TcpActiveOpens

Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and enters the SYN-SENT
state.
Every time TcpActiveOpens increases by 1, TcpOutSegs should always
increase by 1.

* TcpPassiveOpens

Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK and enters
the SYN-RCVD state.

* TcpExtTCPRcvCoalesce

When packets are received by the TCP layer and have not been read by the
application, the TCP layer will try to merge them. This counter
indicates how many packets are merged in such a situation. If GRO is
enabled, lots of packets would be merged by GRO; these packets
wouldn't be counted in TcpExtTCPRcvCoalesce.

* TcpExtTCPAutoCorking

When sending packets, the TCP layer will try to merge small packets into
a bigger one. This counter increases by 1 for every packet merged in such
a situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/

* TcpExtTCPOrigDataSent

This counter is explained by `kernel commit f19c29e3e391`_; the
explanation is pasted below::

  TCPOrigDataSent: number of outgoing packets with original data (excluding
  retransmission but including data-in-SYN). This counter is different from
  TcpOutSegs because TcpOutSegs also tracks pure ACKs.
  TCPOrigDataSent is
  more useful to track the TCP retransmission rate.

* TCPSynRetrans

This counter is explained by `kernel commit f19c29e3e391`_; the
explanation is pasted below::

  TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
  retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

* TCPFastOpenActiveFail

This counter is explained by `kernel commit f19c29e3e391`_; the
explanation is pasted below::

  TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
  the remote does not accept it or the attempts timed out.

.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd

* TcpExtListenOverflows and TcpExtListenDrops

When the kernel receives a SYN from a client and the TCP accept queue
is full, the kernel will drop the SYN and add 1 to TcpExtListenOverflows.
At the same time the kernel will also add 1 to TcpExtListenDrops. When a
TCP socket is in the LISTEN state and the kernel needs to drop a packet,
the kernel would always add 1 to TcpExtListenDrops. So an increase of
TcpExtListenOverflows always comes with an increase of TcpExtListenDrops,
but TcpExtListenDrops can also increase without
TcpExtListenOverflows increasing, e.g. a memory allocation failure would
also let TcpExtListenDrops increase.
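
Because every TcpExtListenOverflows increment is also counted in
TcpExtListenDrops, the difference between the two deltas over an interval
gives the drops that had another cause. The sketch below only illustrates
that arithmetic. The helper name ``listen_drop_breakdown`` and the sample
snapshots are hypothetical, the values are made up, and the two counters
are assumed to come from snapshots taken close enough together that the
subtraction is meaningful.

```python
def listen_drop_breakdown(before, after):
    """Split the TcpExtListenDrops delta between two counter snapshots.

    'before' and 'after' are dicts holding TcpExtListenDrops and
    TcpExtListenOverflows, e.g. read from /proc/net/netstat.
    """
    drops = after["TcpExtListenDrops"] - before["TcpExtListenDrops"]
    overflows = after["TcpExtListenOverflows"] - before["TcpExtListenOverflows"]
    return {
        # Drops caused by a full accept queue.
        "accept_queue_full": overflows,
        # Remaining LISTEN-state drops, e.g. memory allocation failures.
        "other_reasons": drops - overflows,
    }


# Made-up snapshots, for illustration only.
snapshot_before = {"TcpExtListenDrops": 10, "TcpExtListenOverflows": 8}
snapshot_after = {"TcpExtListenDrops": 25, "TcpExtListenOverflows": 20}

breakdown = listen_drop_breakdown(snapshot_before, snapshot_after)
print(breakdown)
```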

Note: The above explanation is based on kernel 4.10 or later. On
an older kernel, the TCP stack behaves differently when the TCP accept
queue is full. On the old kernel, the TCP stack won't drop the SYN; it
would complete the 3-way handshake. As the accept queue is full, the TCP
stack will keep the socket in the TCP half-open queue. As it is in the
half-open queue, the TCP stack will send SYN+ACK on an exponential backoff
timer. After the client replies with an ACK, the TCP stack checks whether
the accept queue is still full. If it is not full, it moves the socket to
the accept queue. If it is full, it keeps the socket in the half-open
queue, and the next time the client replies with an ACK, this socket will
get another chance to move to the accept queue.


TCP Fast Open
=============
* TcpEstabResets

Defined in `RFC1213 tcpEstabResets`_.

.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48

* TcpAttemptFails

Defined in `RFC1213 tcpAttemptFails`_.

.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48

* TcpOutRsts

Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
the 'segments sent containing the RST flag', but in the Linux kernel, this
counter indicates the segments the kernel tried to send. The sending
process might fail due to some errors (e.g.
memory allocation failed).

.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52

* TcpExtTCPSpuriousRtxHostQueues

When the TCP stack wants to retransmit a packet and finds that the packet
is not lost in the network but has not been sent yet, the TCP
stack would give up the retransmission and update this counter. It
might happen if a packet stays too long in a qdisc or driver
queue.

* TcpEstabResets

The socket receives a RST packet in the Establish or CloseWait state.

* TcpExtTCPKeepAlive

This counter indicates how many keepalive packets were sent. Keepalive
is not enabled by default. A userspace program could enable it by
setting the SO_KEEPALIVE socket option.

* TcpExtTCPSpuriousRTOs

The spurious retransmission timeout detected by the `F-RTO`_
algorithm.

.. _F-RTO: https://tools.ietf.org/html/rfc5682

TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: one is the fast path, the other is the slow path. The comment in
the kernel code provides a good explanation of them; it is pasted below::

  It is split into a fast path and a slow path.
  The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above conditions
is satisfied. If the packets are out of order, the kernel will handle
them in the slow path, which means the performance might not be very
good. The kernel would also enter the slow path if "Delayed ack" is
used, because when using "Delayed ack", the data is sent in both
directions. When the TCP window scale option is not used, the kernel will
try to enable the fast path immediately when the connection enters the
established state, but if the TCP window scale option is used, the kernel
will disable the fast path at first, and try to enable it after the kernel
receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks

If a packet has the ACK flag set and carries no data, it is a pure ACK
packet. If the kernel handles it in the fast path, TcpExtTCPHPAcks will
increase by 1; if the kernel handles it in the slow path,
TcpExtTCPPureAcks will increase by 1.

* TcpExtTCPHPHits

If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase by 1.


TCP abort
=========
* TcpExtTCPAbortOnData

It means the TCP layer has data in flight, but needs to close the
connection. So the TCP layer sends a RST to the other side, indicating the
connection is not closed gracefully. An easy way to increase this
counter is using the SO_LINGER option. Please refer to the SO_LINGER
section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
will return immediately and the kernel will try to send the in-flight data
asynchronously. If you use the SO_LINGER option, set l_onoff to 1 and
l_linger to a positive number, the close function won't return immediately,
but will wait until the in-flight data is acked by the other side; the max
wait time is l_linger seconds.
If l_onoff is set to 1 and l_linger to 0,
when the application closes a connection, the kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter.

* TcpExtTCPAbortOnClose

This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel will send a RST to the other side of the TCP connection.

* TcpExtTCPAbortOnMemory

When an application closes a TCP connection, the kernel still needs to track
the connection and let it complete the TCP disconnect process. E.g. an
app calls the close method of a socket, and the kernel sends a FIN to the
other side of the connection. The app then has no relationship with the
socket any more, but the kernel needs to keep the socket; this socket
becomes an orphan socket. The kernel waits for the reply of the other side
and would finally reach the TIME_WAIT state. When the kernel does not have
enough memory to keep the orphan socket, it would send a RST to
the other side and delete the socket. In such a situation, the kernel will
add 1 to TcpExtTCPAbortOnMemory. Two conditions would trigger
TcpExtTCPAbortOnMemory:

1. the memory used by the TCP protocol is higher than the third value of
   tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_:

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html

2. the orphan socket count is higher than net.ipv4.tcp_max_orphans


* TcpExtTCPAbortOnTimeout

This counter will increase when any of the TCP timers expire. In such
a situation, the kernel won't send a RST; it just gives up the connection.

* TcpExtTCPAbortOnLinger

When a TCP connection enters the FIN_WAIT_2 state, instead of waiting
for the FIN packet from the other side, the kernel could send a RST and
delete the socket immediately. This is not the default behavior of the
Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let the kernel follow this behavior.

* TcpExtTCPAbortFailed

The kernel TCP layer will send a RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

TCP Hybrid Slow Start
=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
approached. The two pieces of information are ACK train length and
increase in packet delay. For detailed information, please refer to the
`Hybrid Slow Start paper`_.
When either the ACK train length or the packet delay hits a specific
threshold, the congestion control algorithm enters the Congestion
Avoidance state. As of v4.20, two congestion control algorithms use
Hybrid Slow Start: cubic (the default congestion control algorithm)
and cdg. Four SNMP counters relate to the Hybrid Slow Start algorithm.

.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf

* TcpExtTCPHystartTrainDetect

How many times the ACK train length threshold is detected.

* TcpExtTCPHystartTrainCwnd

The sum of the CWND values detected by ACK train length. Dividing this
value by TcpExtTCPHystartTrainDetect gives the average CWND detected by
the ACK train length.

* TcpExtTCPHystartDelayDetect

How many times the packet delay threshold is detected.

* TcpExtTCPHystartDelayCwnd

The sum of the CWND values detected by packet delay. Dividing this
value by TcpExtTCPHystartDelayDetect gives the average CWND detected by
the packet delay.

TCP retransmission and congestion control
=========================================
The TCP protocol has two retransmission mechanisms: SACK and fast
recovery. They are mutually exclusive.
When SACK is enabled, the kernel TCP stack uses SACK; otherwise it uses
fast recovery. SACK is a TCP option, which is defined in `RFC2018`_.
Fast recovery is defined in `RFC6582`_ and is also called 'Reno'.

The TCP congestion control is a big and complex topic. To understand
the related SNMP counters, we need to know the states of the congestion
control state machine. There are 5 states: Open, Disorder, CWR,
Recovery and Loss. For details about these states, please refer to
page 5 and page 6 of this document:
https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf

.. _RFC2018: https://tools.ietf.org/html/rfc2018
.. _RFC6582: https://tools.ietf.org/html/rfc6582

* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery

When the congestion control enters the Recovery state,
TcpExtTCPSackRecovery increases by 1 if SACK is used, and
TcpExtTCPRenoRecovery increases by 1 if SACK is not used. These two
counters mean the TCP stack begins to retransmit the lost packets.

* TcpExtTCPSACKReneging

A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit this packet. In this
situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
could drop a packet which has been acknowledged by SACK; although it is
unusual, it is allowed by the TCP protocol.
The sender doesn't really know what happened on the receiver side. The
sender just waits until the RTO expires for this packet, then assumes
this packet has been dropped by the receiver.

* TcpExtTCPRenoReorder

A reordered packet is detected by fast recovery. This counter is only
used if SACK is disabled. The fast recovery algorithm detects
reordering by the number of duplicate ACKs. E.g., if retransmission is
triggered, and the original retransmitted packet is not lost but just
out of order, the receiver acknowledges multiple times: once for the
retransmitted packet, and once for the arrival of the original out of
order packet. Thus the sender finds more ACKs than it expects, and
knows that reordering occurred.

* TcpExtTCPTSReorder

A reordered packet is detected when a hole is filled. E.g., assume the
sender sends packets 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which fills
the hole), either of two conditions lets TcpExtTCPTSReorder increase by
1: (1) packet 3 has not been retransmitted yet. (2) packet 3 has been
retransmitted, but the timestamp of packet 3's ACK is earlier than the
retransmission timestamp.

* TcpExtTCPSACKReorder

A reordered packet is detected by SACK. SACK has two methods to
detect reordering: (1) DSACK is received by the sender.
It means the sender sent the same packet more than once, and the only
reason is that the sender believed an out of order packet was lost, so
it sent the packet again. (2) Assume packets 1,2,3,4,5 are sent by the
sender, and the sender has received SACKs for packets 2 and 5. If the
sender now receives a SACK for packet 4 and has not retransmitted that
packet yet, the sender knows packet 4 is out of order. The kernel TCP
stack increases TcpExtTCPSACKReorder for both of the above scenarios.

* TcpExtTCPSlowStartRetrans

The TCP stack wants to retransmit a packet and the congestion control
state is 'Loss'.

* TcpExtTCPFastRetrans

The TCP stack wants to retransmit a packet and the congestion control
state is not 'Loss'.

* TcpExtTCPLostRetransmit

A SACK points out that a retransmitted packet is lost again.

* TcpExtTCPRetransFail

The TCP stack tries to deliver a retransmitted packet to the lower
layers, but the lower layers return an error.

* TcpExtTCPSynRetrans

The TCP stack retransmits a SYN packet.

DSACK
=====
The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
duplicate packets to the sender.
There are two kinds of duplication: (1) a packet which has been
acknowledged is duplicated. (2) an out of order packet is duplicated.
The TCP stack counts these two kinds of duplication on both the
receiver side and the sender side.

.. _RFC2883: https://tools.ietf.org/html/rfc2883

* TcpExtTCPDSACKOldSent

The TCP stack receives a duplicate packet which has been acked, so it
sends a DSACK to the sender.

* TcpExtTCPDSACKOfoSent

The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender.

* TcpExtTCPDSACKRecv

The TCP stack receives a DSACK, which indicates that an acknowledged
duplicate packet was received.

* TcpExtTCPDSACKOfoRecv

The TCP stack receives a DSACK, which indicates that an out of order
duplicate packet was received.

invalid SACK and DSACK
======================
When a SACK (or DSACK) block is invalid, a corresponding counter is
updated. The validation method is based on the start/end sequence
numbers of the SACK block. For more details, please refer to the
comment of the function tcp_is_sackblock_valid in the kernel source
code. A SACK option can have up to 4 blocks, which are checked
individually. E.g., if 3 blocks of a SACK are invalid, the
corresponding counter is updated 3 times.
The comment of the `Add counters for discarded SACK blocks`_ patch has
an additional explanation.

.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32

* TcpExtTCPSACKDiscard

This counter indicates how many SACK blocks are invalid. If the invalid
SACK block is caused by ACK reordering, the TCP stack will only ignore
it and won't update this counter.

* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo

When a DSACK block is invalid, one of these two counters is updated.
Which counter is updated depends on the undo_marker flag of the TCP
socket. If the undo_marker is not set, the TCP stack is unlikely to
have retransmitted any packets; if an invalid DSACK block is received
anyway, the reason might be that the packet was duplicated in the
middle of the network. In such a scenario, TcpExtTCPDSACKIgnoredNoUndo
is updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld is
updated. As implied by its name, it might be an old packet.

SACK shift
==========
The Linux networking stack stores data in the sk_buff struct (skb for
short). If a SACK block spans multiple skbs, the TCP stack will try
to re-arrange data in these skbs. E.g. assume a SACK block acknowledges
seq 10 to 15, skb1 has seq 10 to 13, and skb2 has seq 14 to 20.
Seq 14 and 15 in skb2 would be moved to skb1. This operation is called
'shift'. If a SACK block acknowledges seq 10 to 20, skb1 has seq 10 to
13, and skb2 has seq 14 to 20, all data in skb2 will be moved to skb1
and skb2 will be discarded. This operation is called 'merge'.

* TcpExtTCPSackShifted

An skb is shifted.

* TcpExtTCPSackMerged

An skb is merged.

* TcpExtTCPSackShiftFallback

An skb should be shifted or merged, but the TCP stack doesn't do it for
some reason.

TCP out of order
================
* TcpExtTCPOFOQueue

The TCP layer receives an out of order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop

The TCP layer receives an out of order packet but doesn't have enough
memory, so it drops the packet. Such packets won't be counted in
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge

The received out of order packet overlaps with the previous
packet. The overlapping part will be dropped. All TcpExtTCPOFOMerge
packets are also counted in TcpExtTCPOFOQueue.

TCP PAWS
========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets.
It depends on the TCP timestamps. For detailed information, please
refer to the `timestamp wiki`_ and the `RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive

Packets are dropped by PAWS in the Syn-Sent state.

* TcpExtPAWSEstab

Packets are dropped by PAWS in any state other than Syn-Sent.

TCP ACK skip
============
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate in which scenario the ACK is skipped. The ACK is only
skipped if the received packet is either a SYN packet or has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst

* TcpExtTCPACKSkippedSynRecv

The ACK is skipped in the Syn-Recv state. The Syn-Recv state means the
TCP stack has received a SYN and replied with a SYN+ACK, and is now
waiting for an ACK. Generally, the TCP stack doesn't need to send an
ACK in the Syn-Recv state. But in several scenarios, the TCP stack
needs to send an ACK.
E.g., the TCP stack receives the same SYN packet repeatedly, the
received packet does not pass the PAWS check, or the received packet's
sequence number is out of window. In these scenarios, the TCP stack
needs to send an ACK. If the ACK sending frequency is higher than
tcp_invalid_ratelimit allows, the TCP stack skips sending the ACK and
increases TcpExtTCPACKSkippedSynRecv.


* TcpExtTCPACKSkippedPAWS

The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in the Syn-Recv,
Fin-Wait-2 or Time-Wait state, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait. In all other states, the skipped ACK
is counted in TcpExtTCPACKSkippedPAWS.

* TcpExtTCPACKSkippedSeq

The sequence number is out of window, the timestamp passes the PAWS
check, and the TCP state is not Syn-Recv, Fin-Wait-2 or Time-Wait.

* TcpExtTCPACKSkippedFinWait2

The ACK is skipped in the Fin-Wait-2 state. The reason is either that
the PAWS check fails or that the received sequence number is out of
window.

* TcpExtTCPACKSkippedTimeWait

The ACK is skipped in the Time-Wait state. The reason is either that
the PAWS check fails or that the received sequence number is out of
window.
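
The rules for the state-specific ACK skip counters described so far can
be summarized as a small decision function. The following Python sketch
only restates the text above and is not taken from the kernel source.
The function name ``ack_skip_counter`` and the string labels for states
and reasons are assumptions made for this illustration. It also assumes
the ACK has already been suppressed by tcp_invalid_ratelimit, and it
does not model the duplicate SYN case.

```python
# Illustrative model of the text above; not taken from the kernel source.
# States that have their own counter regardless of the reason.
STATE_COUNTERS = {
    "Syn-Recv": "TcpExtTCPACKSkippedSynRecv",
    "Fin-Wait-2": "TcpExtTCPACKSkippedFinWait2",
    "Time-Wait": "TcpExtTCPACKSkippedTimeWait",
}


def ack_skip_counter(state, reason):
    """Return the name of the counter updated for a rate-limited ACK.

    state  -- TCP state label, e.g. "Established" (hypothetical labels)
    reason -- "paws" (PAWS check failed) or "seq" (sequence out of window)
    """
    if reason not in ("paws", "seq"):
        raise ValueError("unknown reason: %r" % (reason,))
    # Syn-Recv, Fin-Wait-2 and Time-Wait use the state-specific counter.
    if state in STATE_COUNTERS:
        return STATE_COUNTERS[state]
    # In all other states the counter depends on the reason.
    if reason == "paws":
        return "TcpExtTCPACKSkippedPAWS"
    return "TcpExtTCPACKSkippedSeq"
```

For example, a PAWS failure in the Established state maps to
TcpExtTCPACKSkippedPAWS, while the same failure in the Time-Wait state
maps to TcpExtTCPACKSkippedTimeWait.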

* TcpExtTCPACKSkippedChallenge

The ACK is skipped if the ACK is a challenge ACK. RFC 5961 defines
3 kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP states, the Linux TCP stack also
sends challenge ACKs if the ACK number is before the first
unacknowledged number (stricter than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11

TCP receive window
==================
* TcpExtTCPWantZeroWindowAdv

Depending on current memory usage, the TCP stack tries to set the
receive window to zero. But the receive window might still be a
non-zero value. For example, if the previous window size is 10, and the
TCP stack receives 3 bytes, the current window size is 7 even if the
window size calculated from the memory usage is zero.

* TcpExtTCPToZeroWindowAdv

The TCP receive window is set to zero from a non-zero value.

* TcpExtTCPFromZeroWindowAdv

The TCP receive window is set to a non-zero value from zero.
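
The interaction of the three receive window counters can be sketched as
a simplified model. This is not the kernel implementation: the function
``advertise_window`` and its parameters are hypothetical, and the model
assumes, following the example above, that the advertised window is not
shrunk below what remains of the previous advertisement.

```python
from collections import Counter


def advertise_window(old_window, received, wanted, counters):
    """Model one window advertisement and return the advertised window.

    old_window -- previously advertised window
    received   -- bytes received since that advertisement
    wanted     -- window the stack would like to offer, based on memory
    counters   -- Counter updated with the SNMP counter names
    """
    # What is left of the previously advertised window.
    remaining = max(old_window - received, 0)
    new_window = wanted
    if new_window < remaining:
        # Assumption of this sketch: the offered window is not shrunk.
        if new_window == 0:
            counters["TcpExtTCPWantZeroWindowAdv"] += 1
        new_window = remaining
    if new_window == 0:
        if old_window != 0:
            counters["TcpExtTCPToZeroWindowAdv"] += 1
    elif old_window == 0:
        counters["TcpExtTCPFromZeroWindowAdv"] += 1
    return new_window
```

With a previous window of 10, 3 received bytes and a wanted window of
zero, the model returns 7 and only TcpExtTCPWantZeroWindowAdv is
increased, which matches the example in the TcpExtTCPWantZeroWindowAdv
description.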


Delayed ACK
===========
The TCP Delayed ACK is a technique which is used for reducing the
packet count in the network. For more details, please refer to the
`Delayed ACK wiki`_.

.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment

* TcpExtDelayedACKs

A delayed ACK timer expires. The TCP stack will send a pure ACK packet
and exit the delayed ACK mode.

* TcpExtDelayedACKLocked

A delayed ACK timer expires, but the TCP stack can't send an ACK
immediately because the socket is locked by a userspace program. The
TCP stack will send a pure ACK later (after the userspace program
unlocks the socket). When the TCP stack sends the pure ACK later, it
will also update TcpExtDelayedACKs and exit the delayed ACK mode.

* TcpExtDelayedACKLost

It is updated when the TCP stack receives a packet which has already
been ACKed. A lost delayed ACK might cause this, but it can also be
triggered by other reasons, such as a packet being duplicated in the
network.

Tail Loss Probe (TLP)
=====================
TLP is an algorithm which is used to detect TCP packet loss. For more
details, please refer to the `TLP paper`_.

.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

* TcpExtTCPLossProbes

A TLP probe packet is sent.

* TcpExtTCPLossProbeRecovery

A packet loss is detected and recovered by TLP.

TCP Fast Open description
=========================
TCP Fast Open is a technology which allows data transfer before the
3-way handshake completes. Please refer to the `TCP Fast Open wiki`_
for a general description.

.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open

* TcpExtTCPFastOpenActive

When the TCP stack receives an ACK packet in the SYN-SENT state, and
the ACK packet acknowledges the data in the SYN packet, the TCP stack
understands that the TFO cookie is accepted by the other side, and it
updates this counter.

* TcpExtTCPFastOpenActiveFail

This counter indicates that the TCP stack initiated a TCP Fast Open,
but it failed. This counter is updated in three scenarios: (1)
The other side doesn't acknowledge the data in the SYN packet. (2) The
SYN packet which has the TFO cookie times out at least once. (3)
After the 3-way handshake, the retransmission timeout happens
net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
fast open after the handshake.

* TcpExtTCPFastOpenPassive

This counter indicates how many times the TCP stack accepts the fast
open request.

* TcpExtTCPFastOpenPassiveFail

This counter indicates how many times the TCP stack rejects the fast
open request. The cause is either that the TFO cookie is invalid or
that the TCP stack finds an error during the socket creation process.

* TcpExtTCPFastOpenListenOverflow

When the pending fast open request number is larger than
fastopenq->max_qlen, the TCP stack will reject the fast open request
and update this counter. When this counter is updated, the TCP stack
won't update TcpExtTCPFastOpenPassive or
TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
TCP_FASTOPEN socket option and it cannot be larger than
net.core.somaxconn. For example::

  setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

* TcpExtTCPFastOpenCookieReqd

This counter indicates how many times a client wants to request a TFO
cookie.

SYN cookies
===========
SYN cookies are used to mitigate SYN floods. For details, please refer
to the `SYN cookies wiki`_.

.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies

* TcpExtSyncookiesSent

It indicates how many SYN cookies are sent.

* TcpExtSyncookiesRecv

How many reply packets of the SYN cookies the TCP stack receives.

* TcpExtSyncookiesFailed

The MSS decoded from the SYN cookie is invalid. When this counter is
updated, the received packet won't be treated as a SYN cookie and the
TcpExtSyncookiesRecv counter won't be updated.

Challenge ACK
=============
For details of challenge ACK, please refer to the explanation of
TcpExtTCPACKSkippedChallenge.

* TcpExtTCPChallengeACK

The number of challenge ACKs sent.

* TcpExtTCPSYNChallenge

The number of challenge ACKs sent in response to SYN packets. After
updating this counter, the TCP stack might send a challenge ACK and
update the TcpExtTCPChallengeACK counter, or it might skip sending the
challenge ACK and update TcpExtTCPACKSkippedChallenge.

prune
=====
When a socket is under memory pressure, the TCP stack will try to
reclaim memory from the receiving queue and out of order queue.
One of the reclaiming methods is 'collapse', which means allocating a
big skb, copying the contiguous skbs to the single big skb, and freeing
these contiguous skbs.

* TcpExtPruneCalled

The TCP stack tries to reclaim memory for a socket. After updating this
counter, the TCP stack will try to collapse the out of order queue and
the receiving queue. If the memory is still not enough, the TCP stack
will try to discard packets from the out of order queue (and update the
TcpExtOfoPruned counter).

* TcpExtOfoPruned

The TCP stack tries to discard packets from the out of order queue.

* TcpExtRcvPruned

After 'collapse' and discarding packets from the out of order queue, if
the actually used memory is still larger than the max allowed memory,
this counter is updated. It means the 'prune' fails.

* TcpExtTCPRcvCollapsed

This counter indicates how many skbs are freed during 'collapse'.

examples
========

ping test
---------
Run the ping command against the public DNS server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms

The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0

The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo
Reply was passed to the ICMP layer via the IP layer, so IpInDelivers was
increased by 1. The default ping data size is 56 bytes, so an ICMP Echo
packet and its corresponding Echo Reply packet consist of:

* 14 bytes MAC header
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

So IpExtInOctets and IpExtOutOctets are 20+8+56=84.
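
The octet arithmetic can be checked with a few lines of Python. This is
only a worked example: the helper names are hypothetical, and it assumes
an IPv4 header without options (20 bytes) and an Ethernet header without
a VLAN tag (14 bytes). The ping output above reports ``56(84)``, so the
ICMP message (header plus data) is 84 - 20 = 64 bytes.

```python
IP_HEADER = 20     # IPv4 header without options (assumption)
MAC_HEADER = 14    # Ethernet header without VLAN tag (assumption)


def ip_layer_octets(icmp_message_len, ip_header=IP_HEADER):
    """Bytes counted by IpExtInOctets/IpExtOutOctets for one packet."""
    return ip_header + icmp_message_len


def link_layer_octets(icmp_message_len, ip_header=IP_HEADER,
                      mac_header=MAC_HEADER):
    """Bytes of the same packet when the MAC header is included."""
    return mac_header + ip_layer_octets(icmp_message_len, ip_header)
```

For the 64-byte ICMP message of this test, ``ip_layer_octets(64)`` is 84,
the value shown for IpExtInOctets and IpExtOutOctets, while a counter
that includes the MAC header would see ``link_layer_octets(64)``, which
is 98 bytes for the same packet.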

TCP 3-way handshake
-------------------
On the server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and
they completed the 3-way handshake.

On the server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On the client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0

When the server received the first SYN, it replied with a SYN+ACK and
entered the SYN-RCVD state, so TcpPassiveOpens increased by 1. The
server received the SYN, sent the SYN+ACK and received the ACK, so it
sent 1 packet and received 2 packets: TcpInSegs increased by 2 and
TcpOutSegs increased by 1. The last ACK of the 3-way handshake is a
pure ACK without data, so TcpExtTCPPureAcks increased by 1.
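
The counter values in the two nstat outputs above can be reproduced with
a toy model of a loss-free handshake. The following Python sketch is an
illustration only: the event list and the function ``count_handshake``
are hypothetical, and the model ties TcpActiveOpens and TcpPassiveOpens
to the SYN simply because that is how this document describes them.

```python
from collections import Counter

# (sender, receiver, segment) for a handshake without loss or retransmits.
HANDSHAKE = [
    ("client", "server", "SYN"),
    ("server", "client", "SYN+ACK"),
    ("client", "server", "ACK"),
]


def count_handshake(events=HANDSHAKE):
    """Return per-host counters for the given list of segments."""
    stats = {"client": Counter(), "server": Counter()}
    for sender, receiver, segment in events:
        # Every segment is one outgoing and one incoming segment.
        stats[sender]["TcpOutSegs"] += 1
        stats[receiver]["TcpInSegs"] += 1
        if segment == "SYN":
            # Counted once per connection, following the text above.
            stats[sender]["TcpActiveOpens"] += 1
            stats[receiver]["TcpPassiveOpens"] += 1
        elif segment == "ACK":
            # The final ACK of the handshake carries no data.
            stats[receiver]["TcpExtTCPPureAcks"] += 1
    return stats
```

Calling ``count_handshake()`` gives TcpInSegs 2 and TcpOutSegs 1 for the
server, and TcpInSegs 1 and TcpOutSegs 2 for the client, the same values
as in the nstat outputs.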

When the client sent the SYN, it entered the SYN-SENT state, so
TcpActiveOpens increased by 1. The client sent SYN, received SYN+ACK and
sent ACK, so the client sent 2 packets and received 1 packet: TcpInSegs
increased by 1 and TcpOutSegs increased by 2.

TCP normal traffic
------------------
Run nc on server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Input a string in the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Input a string in the nc client again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

Client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0


Server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Comparing the first client-side nstat with the second client-side nstat,
we could find one difference: the first one had a 'TcpExtTCPPureAcks',
but the second one had a 'TcpExtTCPHPAcks'. The first server-side
nstat and the second server-side nstat had a difference too: the
second server-side nstat had a TcpExtTCPHPHits, but the first
server-side nstat didn't have it. The network traffic patterns were
exactly the same: the client sent a packet to the server, and the server
replied with an ACK. But the kernel handled them in different ways.
When the
TCP window scale option is not used, the kernel will try to enable the
fast path immediately when the connection enters the established state,
but if the TCP window scale option is used, the kernel will disable the
fast path at first, and try to enable it after the kernel receives
packets. We could use the 'ss' command to verify whether the window
scale option is used. E.g. run the command below on either server or
client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q     Send-Q            Local Address:Port             Peer Address:Port
  tcp      0          0               192.168.122.250:40654         192.168.122.251:9000
             ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98

The 'wscale:7,7' means both server and client set the window scale
option to 7. Now we could explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet
and the server replied with an ACK. When the kernel handled this ACK,
the fast path was not enabled, so the ACK was counted into
'TcpExtTCPPureAcks'.
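
If the window scale check needs to be scripted, the ``wscale`` field of
the ``ss -i`` output can be extracted with a regular expression. This is
a minimal sketch: the helper name ``parse_wscale`` is hypothetical, and
it assumes that the field is simply missing from the line when the
window scale option was not negotiated:

```python
import re


def parse_wscale(ss_info_line):
    """Return the two wscale values from an `ss -i` info line, or None."""
    match = re.search(r"\bwscale:(\d+),(\d+)", ss_info_line)
    if match is None:
        return None  # assumed: no window scale option on this connection
    return int(match.group(1)), int(match.group(2))


line = "ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448"
print(parse_wscale(line))
```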

In the second nstat output of the client side, the client sent a packet
again, and received another ACK from the server. This time, the fast
path was enabled and the ACK qualified for the fast path, so it was
handled by the fast path and counted into TcpExtTCPHPAcks.

In the first nstat output of the server side, the fast path was not
enabled, so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was enabled,
and the packet received from the client qualified for the fast path, so
it was counted into 'TcpExtTCPHPHits'.

TcpExtTCPAbortOnClose
---------------------
On the server side, we run the python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # Keep the connection open, but never read from it.
  while True:
      time.sleep(9999999)

This python script listens on port 9000, but doesn't read anything from
the connection.

On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the
"hello" packet, and the TCP layer has acked this packet, but the
application hasn't read it yet. We type Ctrl-C to terminate the server
script. Then we could find TcpExtTCPAbortOnClose increased by 1 on the
server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we could find that the server sent
a RST after we typed Ctrl-C.

TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
---------------------------------------------------
Below is an example which lets the orphan socket count exceed
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"

Client code (creates 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  # Open the connections and keep them referenced so they stay open.
  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  while True:
      time.sleep(99999)

Server code (accepts 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # Accept every connection and keep it open without reading from it.
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))

Run the python scripts on server and client.

On server::

  python3 server_orphan.py

On client::

  python3 client_orphan.py

Run iptables on server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on client to stop client_orphan.py.

Check TcpExtTCPAbortOnMemory on client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0

The explanation of the test: after running server_orphan.py and
client_orphan.py, we set up 64 connections between server and
client. After the iptables command is run, the server drops all packets
from the client. When we type Ctrl-C on client_orphan.py, the client
system tries to close these connections, and before they are closed
gracefully, these connections become orphan sockets.
As the iptables
rule on the server blocked packets from the client, the server won't
receive the FIN from the client, so all connections on the client would
be stuck in the FIN_WAIT_1 state, and they will stay orphan sockets until
they time out. We have echoed 10 to /proc/sys/net/ipv4/tcp_max_orphans,
so the client system would only keep 10 orphan sockets; for all other
orphan sockets, the client system sent RST and deleted them. We have 64
connections, so the 'ss -s' command shows the system has 10 orphan
sockets, and the value of TcpExtTCPAbortOnMemory was 54.

An additional explanation about the orphan socket count: you could find
the exact orphan socket count by the 'ss -s' command, but when the kernel
decides whether to increase TcpExtTCPAbortOnMemory and send RST, it
doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first; only if the
approximate count is more than tcp_max_orphans does the kernel check the
exact count. So if the approximate count is less than
tcp_max_orphans, but the exact count is more than tcp_max_orphans, you
would find TcpExtTCPAbortOnMemory is not increased at all. If
tcp_max_orphans is large enough, this won't occur, but if you decrease
tcp_max_orphans to a small value like in our test, you might find this
issue. That is why, in our test, the client set up 64 connections
although tcp_max_orphans is 10. If the client only set up 11
connections, we couldn't find the change of TcpExtTCPAbortOnMemory.
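
The numbers of this test, and the two-step check described above, can be
sketched as follows. Both function names are hypothetical and the model
is deliberately simplified (it assumes every connection becomes an orphan
at once); it is not the kernel implementation:

```python
def aborted_on_memory(connections, max_orphans):
    """Orphans beyond the limit, which are reset instead of being kept."""
    return max(0, connections - max_orphans)


def too_many_orphans(approx_count, exact_count, max_orphans):
    """Two-step check: the exact count is consulted only when the cheap
    approximate count already exceeds the limit."""
    if approx_count <= max_orphans:
        return False  # exact count is never looked at
    return exact_count > max_orphans


print(aborted_on_memory(64, 10))
print(too_many_orphans(approx_count=8, exact_count=11, max_orphans=10))
```

The first call gives the 54 aborted sockets seen in the test. The second
call returns False even though the exact count is over the limit, which
is the case where TcpExtTCPAbortOnMemory is not increased.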

Continuing the previous test, we wait for several minutes. Because the
iptables rule on the server blocked the traffic, the server wouldn't
receive the FIN, and all the client's orphan sockets would finally time
out in the FIN_WAIT_1 state. So after a few minutes, we could find 10
timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0

TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

The client side code::

  nstatuser@nstat-a:~$ cat client_linger.py
  import socket
  import struct

  server = 'nstat-b' # server address
  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
  s.connect((server, port))
  s.close()
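
The two socket options in client_linger.py are passed as packed C
values. As a small sketch of the first one: SO_LINGER takes a
``struct linger`` with two integer fields, ``l_onoff`` and ``l_linger``
(the timeout in seconds). The helper name ``pack_linger`` is
hypothetical and only illustrates the layout:

```python
import struct


def pack_linger(onoff, seconds):
    """Pack a C `struct linger` (l_onoff, l_linger) for SO_LINGER."""
    return struct.pack('ii', onoff, seconds)


value = pack_linger(1, 10)  # lingering enabled, 10 second timeout
print(struct.unpack('ii', value))
```

TCP_LINGER2 takes a single integer, which is why the client script
packs it with the ``'i'`` format instead.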

Run server_linger.py on server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0

TcpExtTCPRcvCoalesce
--------------------
On the server, we run a program which listens on TCP port 9000, but
doesn't read any data::

  import socket
  import time
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

Save the above code as server_coalesce.py, and run::

  python3 server_coalesce.py

On the client, save the code below as client_coalesce.py::

  import socket
  server = 'nstat-b'
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((server, port))

Run::

  nstatuser@nstat-a:~$ python3 -i client_coalesce.py

We use '-i' to come into the
interactive mode, then send a packet::

  >>> s.send(b'foo')
  3

Send a packet again::

  >>> s.send(b'bar')
  3

On the server, run nstat::

  ubuntu@nstat-b:~$ nstat
  #kernel
  IpInReceives                    2                  0.0
  IpInDelivers                    2                  0.0
  IpOutRequests                   2                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      2                  0.0
  TcpExtTCPRcvCoalesce            1                  0.0
  IpExtInOctets                   110                0.0
  IpExtOutOctets                  104                0.0
  IpExtInNoECTPkts                2                  0.0

The client sent two packets, and the server didn't read any data. When
the second packet arrived at the server, the first packet was still in
the receive queue. So the TCP layer merged the two packets, and we
could find that TcpExtTCPRcvCoalesce increased by 1.

TcpExtListenOverflows and TcpExtListenDrops
-------------------------------------------
On server, run the nc command, listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On client, run 3 nc commands in different terminals::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

The nc command only accepts 1 connection, and the accept queue length
is 1.
In the current Linux implementation, setting the queue length to n means
the actual queue length is n+1. Now we create 3 connections: 1 is
accepted by nc and 2 are in the accept queue, so the accept queue is
full.

Before running the 4th nc, we clean the nstat history on the server::

  nstatuser@nstat-b:~$ nstat -n

Run the 4th nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000

If the nc server is running on kernel 4.10 or a higher version, you
won't see the "Connection to ... succeeded!" string, because the kernel
will drop the SYN if the accept queue is full. If the nc server is
running on an old kernel, you would see that the connection succeeded,
because the kernel would complete the 3-way handshake and keep the socket
on the half-open queue. This test was done on kernel 4.15. Below is the
nstat output on the server::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    4                  0.0
  IpInDelivers                    4                  0.0
  TcpInSegs                       4                  0.0
  TcpExtListenOverflows           4                  0.0
  TcpExtListenDrops               4                  0.0
  IpExtInOctets                   240                0.0
  IpExtInNoECTPkts                4                  0.0

Both TcpExtListenOverflows and TcpExtListenDrops were 4. If the time
between the 4th nc and the nstat was longer, the values of
TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped and the client kept retrying.
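
The queue arithmetic of this test can be written down as a toy model.
The function name ``simulate_accept_queue`` is hypothetical, and the
model only encodes the rule described above (a backlog of n holds n+1
connections); it does not reproduce the SYN retries that made both
counters reach 4:

```python
def simulate_accept_queue(backlog, accepted_by_app, attempts):
    """Return (established, dropped) connection counts for a listener.

    Assumes the accept queue holds backlog + 1 connections, in addition
    to the connections the application has already accepted.
    """
    capacity = accepted_by_app + backlog + 1
    established = min(attempts, capacity)
    return established, attempts - established


# nc uses a queue length of 1 and accepts a single connection.
print(simulate_accept_queue(backlog=1, accepted_by_app=1, attempts=4))
```

With 4 connection attempts, 3 are established and the 4th is dropped,
which is the connection whose SYNs were counted by
TcpExtListenOverflows and TcpExtListenDrops.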

IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
-------------------------------------------------
* Server A IP address: 192.168.122.250
* Server B IP address: 192.168.122.251

Prepare on server A, add a route to server B::

  $ sudo ip route add 8.8.8.8/32 via 192.168.122.251

Prepare on server B, disable send_redirects for all interfaces::

  $ sudo sysctl -w net.ipv4.conf.all.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.ens3.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.lo.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.default.send_redirects=0

We want to let server A send a packet to 8.8.8.8, and route the packet
to server B. When server B receives such a packet, it might send an ICMP
Redirect message to server A; setting send_redirects to 0 disables
this behavior.

First, generate IpInAddrErrors.
On server B, we disable IP forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=0

On server A, we send packets to 8.8.8.8::

  $ nc -v 8.8.8.8 53

On server B, we check the output of nstat::

  $ nstat
  #kernel
  IpInReceives                    3                  0.0
  IpInAddrErrors                  3                  0.0
  IpExtInOctets                   180                0.0
  IpExtInNoECTPkts                3                  0.0

As we have let server A route 8.8.8.8 to server B, and we disabled IP
forwarding on server B, server A sent packets to server B, then server B
dropped the packets and increased IpInAddrErrors. As the nc command would
re-send the SYN packet if it didn't receive a SYN+ACK, we could find
multiple IpInAddrErrors.

Second, generate IpExtInNoRoutes.
On server B, we enable IP
forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=1

Check the route table of server B and remove the default route::

  $ ip route show
  default via 192.168.122.1 dev ens3 proto static
  192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.251
  $ sudo ip route delete default via 192.168.122.1 dev ens3 proto static

On server A, we contact 8.8.8.8 again::

  $ nc -v 8.8.8.8 53
  nc: connect to 8.8.8.8 port 53 (tcp) failed: Network is unreachable

On server B, run nstat::

  $ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutDestUnreachs             1                  0.0
  IcmpMsgOutType3                 1                  0.0
  IpExtInNoRoutes                 1                  0.0
  IpExtInOctets                   60                 0.0
  IpExtOutOctets                  88                 0.0
  IpExtInNoECTPkts                1                  0.0

We enabled IP forwarding on server B, so when server B received a packet
whose destination IP address is 8.8.8.8, server B tried to forward
this packet. We have deleted the default route, so there was no route for
8.8.8.8; server B increased IpExtInNoRoutes and sent the "ICMP
Destination Unreachable" message to server A.

Third, generate IpOutNoRoutes.
Run the ping command on server B::

  $ ping -c 1 8.8.8.8
  connect: Network is unreachable

Run nstat on server B::

  $ nstat
  #kernel
  IpOutNoRoutes                   1                  0.0

We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.

TcpExtTCPACKSkippedSynRecv
--------------------------
In this test, we send 3 identical SYN packets from client to server. The
first SYN will let the server create a socket, set it to the Syn-Recv
state, and reply with a SYN/ACK. The second SYN will let the server reply
with the SYN/ACK again, and record the reply time (the duplicate ACK
reply time). The third SYN will let the server check the previous
duplicate ACK reply time, decide to skip the duplicate ACK, and then
increase the TcpExtTCPACKSkippedSynRecv counter.

Run tcpdump to capture a SYN packet::

  nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal, run the nc command::

  nstatuser@nstat-a:~$ nc nstat-b 9000

As nstat-b didn't listen on port 9000, it should reply with a RST, and
the nc command exited immediately.
It was enough for the tcpdump
command to capture a SYN packet. A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap
might not be correct. We call tcprewrite to fix it::

  nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we block the packets from port 9000, otherwise nstat-a would
send a RST to nstat-b::

  nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send 3 SYNs repeatedly to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check the snmp counter on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSynRecv      1                  0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.

TcpExtTCPACKSkippedPAWS
-----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect to nstat-b::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Now tcpdump has captured the SYN and exited. We should fix the
checksum::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the snmp counter::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedPAWS         1                  0.0

We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
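
The "reply to the first, skip the second" behavior can be pictured as a
rate limiter on ACKs sent in response to invalid packets. The sketch
below is only a model, not the kernel code: the class name
``DupAckLimiter`` is hypothetical, and the 500 ms interval is an
assumption based on the default value of the
net.ipv4.tcp_invalid_ratelimit sysctl:

```python
class DupAckLimiter:
    """Send at most one ACK per interval in reply to invalid packets."""

    def __init__(self, interval_ms=500):  # assumed tcp_invalid_ratelimit default
        self.interval_ms = interval_ms
        self.last_sent_ms = None
        self.skipped = 0  # stands in for a TcpExtTCPACKSkipped* counter

    def on_invalid_packet(self, now_ms):
        """Return True if an ACK is sent, False if it is skipped."""
        recently_sent = (
            self.last_sent_ms is not None
            and now_ms - self.last_sent_ms < self.interval_ms
        )
        if recently_sent:
            self.skipped += 1
            return False
        self.last_sent_ms = now_ms
        return True


limiter = DupAckLimiter()
# Two replayed SYNs arriving 10 ms apart: the first is answered,
# the second is skipped and counted.
print(limiter.on_invalid_packet(0), limiter.on_invalid_packet(10), limiter.skipped)
```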

TcpExtTCPACKSkippedSeq
----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but whose sequence number is out of
window. The Linux TCP stack does not skip the ACK if the packet has
data, so we need a pure ACK packet. To generate such a packet, we
could create two sockets: one on port 9000, another on port 9001. Then
we capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. Then we can trigger
TcpExtTCPACKSkippedSeq via this packet.

On nstat-b, open two terminals and run two nc commands to listen on
both port 9000 and port 9001::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

  nstatuser@nstat-a:~$ nc -v nstat-b 9001
  Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. In our example, we
send the string 'foo'::

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)
  Connection from nstat-a 42132 received!
  foo

On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::

  nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
  State  Recv-Q   Send-Q         Local Address:Port           Peer Address:Port
  ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
  ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001

Run tcprewrite to change port 9001 to port 9000 and port 42132 to
port 50208::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now /tmp/seq.pcap is the packet we need.
Send it to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSeq          1                  0.0
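
The port rewrite used in this test can be derived from the ss output
instead of being read off by hand. The following is a minimal sketch.
It assumes that ss prints the state in the first column and the local
and peer address:port pairs in the fourth and fifth columns, as in the
ss listing shown earlier. portmap_args is a hypothetical helper name;
it is not part of tcpreplay or iproute2.

```shell
# portmap_args FROM TO: read ss-style output on stdin and print the
# tcprewrite -r options that turn a packet of the connection to server
# port FROM into a packet of the connection to server port TO.
# Exits with status 1 if either connection is not listed.
# (Hypothetical helper for illustration.)
portmap_args() {
    awk -v from="$1" -v to="$2" '
        $1 == "ESTAB" {
            n = split($4, l, ":")       # local address:port
            m = split($5, p, ":")       # peer address:port
            local_port[p[m]] = l[n]     # client port, indexed by server port
        }
        END {
            if (!(from in local_port) || !(to in local_port)) exit 1
            printf "-r %s:%s -r %s:%s\n", from, to, local_port[from], local_port[to]
        }'
}
```

With the ss output from this example, "portmap_args 9001 9000" prints
"-r 9001:9000 -r 42132:50208", which are the options passed to
tcprewrite in the step that creates /tmp/seq.pcap.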