.. SPDX-License-Identifier: GPL-2.0

=============
DCCP protocol
=============


.. Contents
   - Introduction
   - Missing features
   - Socket options
   - Sysctl variables
   - IOCTLs
   - Other tunables
   - Notes


Introduction
============
Datagram Congestion Control Protocol (DCCP) is an unreliable, connection-oriented
protocol designed to solve issues present in UDP and TCP, particularly
for real-time and multimedia (streaming) traffic.
It is divided into a base protocol (RFC 4340) and pluggable congestion control
modules called CCIDs. Like pluggable TCP congestion control, at least one CCID
needs to be enabled in order for the protocol to function properly. In the Linux
implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as
the TCP-friendly CCID3 (RFC 4342), are optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
given applications, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html


Missing features
================
The Linux DCCP implementation does not currently support all the features that are
specified in RFCs 4340 to 4342.

The known bugs are at:

	http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking this out are on:
http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree


Socket options
==============
DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
a policy ID as argument and can only be set before the connection is established
(i.e. changes during an established connection are not supported). Currently, two
policies are defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing
special, and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows
passing a u32 priority value as ancillary data to sendmsg(), where higher numbers
indicate a higher packet priority (similar to SO_PRIORITY).
This ancillary data needs to
be formatted using a cmsg(3) message header filled in as follows::

	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type	 = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len	 = CMSG_LEN(sizeof(uint32_t));	/* or CMSG_LEN(4) */

DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
value is always interpreted as an unbounded queue length. If different from zero,
the interpretation of this parameter depends on the current dequeuing policy
(see above): the "simple" policy will enforce a fixed queue size by returning
EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
lowest-priority packet first. The default value for this parameter is
initialised from /proc/sys/net/dccp/default/tx_qlen.

DCCP_SOCKOPT_SERVICE sets the service code. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). On active sockets this is set before connect(); specifying more
than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().

DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes, see RFC 4340, section 14.
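
The ancillary-data layout described above for DCCPQ_POLICY_PRIO can be sketched
as a small helper that prepares a ``struct msghdr`` for sendmsg(). This is only
an illustration under stated assumptions: the function name and the
``EXAMPLE_DCCP_SCM_PRIORITY`` constant are hypothetical, the latter mirroring
the value of DCCP_SCM_PRIORITY in <linux/dccp.h>, and the fallback value for
SOL_DCCP is only used when the C library does not already define it.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fallback mirrors the value used by Linux; prefer the system header. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif

/* Hypothetical local alias for DCCP_SCM_PRIORITY from <linux/dccp.h>. */
#define EXAMPLE_DCCP_SCM_PRIORITY 1

/*
 * Attach `prio` to `msg` as ancillary data, using the header fields listed
 * in the text. `cbuf` must be suitably aligned for struct cmsghdr and hold
 * at least CMSG_SPACE(sizeof(uint32_t)) bytes.
 * Returns 0 on success, -1 if the control buffer is too small.
 */
int example_set_priority(struct msghdr *msg, void *cbuf, size_t cbuf_len,
                         uint32_t prio)
{
    struct cmsghdr *cmsg;

    if (cbuf_len < CMSG_SPACE(sizeof(prio)))
        return -1;

    memset(cbuf, 0, cbuf_len);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(prio));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_DCCP;
    cmsg->cmsg_type = EXAMPLE_DCCP_SCM_PRIORITY;
    cmsg->cmsg_len = CMSG_LEN(sizeof(prio));
    memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
    return 0;
}
```

A caller would fill in msg_iov with the payload, call this helper, and then
pass the message to sendmsg() on a socket whose dequeuing policy was set to
DCCPQ_POLICY_PRIO before connecting.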

DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
supported by the endpoint. The option value is an array of type uint8_t whose
size is passed as option length. The minimum array size is 4 elements; the
value returned in the optlen argument always reflects the true number of
built-in CCIDs.

DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
time, combining the operation of the next two socket options. This option is
preferable to the latter two, since applications will often use the same
type of CCID for both directions, and mixed use of CCIDs is not currently well
understood. This socket option takes as argument at least one uint8_t value, or
an array of uint8_t values, which must match available CCIDs (see above). CCIDs
must be registered on the socket before calling connect() or listen().

DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is ``int``, not uint8_t.

DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.

DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state.
When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.

DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
always cover the entire packet and that only fully covered application data is
accepted by the receiver. Hence, when using this feature on the sender, it must
be enabled at the receiver too, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
	range 0..15 are acceptable. The default setting is 0 (full coverage),
	values between 1..15 indicate partial coverage.

DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
	sets a threshold, where again values 0..15 are acceptable. The default
	of 0 means that all packets with a partial coverage will be discarded.
	Values in the range 1..15 indicate that packets with at least that
	coverage value are also acceptable. The higher the number, the more
	restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage
	settings are inherited by the child socket after accept().

The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.

DCCP_SOCKOPT_CCID_RX_INFO
	Returns a ``struct tfrc_rx_info`` in optval; the optval buffer and
	optlen must be at least sizeof(struct tfrc_rx_info).

DCCP_SOCKOPT_CCID_TX_INFO
	Returns a ``struct tfrc_tx_info`` in optval; the optval buffer and
	optlen must be at least sizeof(struct tfrc_tx_info).

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.


Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
	The number of active connection initiation retries (the number of
	Requests minus one) before timing out. In addition, it also governs
	the behaviour of the other, passive side: this variable also sets
	the number of times DCCP repeats sending a Response when the initial
	handshake does not progress from RESPOND to OPEN (i.e. when no Ack
	is received after the initial Request). This value should be greater
	than 0; a value below 10 is suggested. Analogue of tcp_syn_retries.

retries1
	How often a DCCP Response is retransmitted until the listening DCCP
	side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
	The number of times a general DCCP packet is retransmitted. This has
	importance for retransmitted acknowledgments and feature negotiation;
	data packets are never retransmitted. Analogue of tcp_retries2.

tx_ccid = 2
	Default CCID for the sender-receiver half-connection. Depending on the
	choice of CCID, the Send Ack Vector feature is enabled automatically.

rx_ccid = 2
	Default CCID for the receiver-sender half-connection; see tx_ccid.

seq_window = 100
	The initial sequence window (sec. 7.5.2) of the sender. This influences
	the local ackno validity and the remote seqno validity windows (7.5.1).
	Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set.

tx_qlen = 5
	The size of the transmit buffer in packets. A value of 0 corresponds
	to an unbounded transmit buffer.

sync_ratelimit = 125 ms
	The timeout between subsequent DCCP-Sync packets sent in response to
	sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
	of this parameter is milliseconds; a value of 0 disables rate-limiting.


IOCTLs
======
FIONREAD
	Works as in udp(7): returns in the ``int`` argument pointer the size of
	the next pending datagram in bytes, or 0 when no datagram is pending.

SIOCOUTQ
	Returns the number of unsent data bytes in the socket send queue as ``int``
	into the buffer specified by the argument pointer.

Other tunables
==============
Per-route rto_min support
	CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
	of the RTO timer. This setting can be modified via the 'rto_min' option
	of iproute2; for example::

		> ip route change 10.0.0.0/24 rto_min 250j dev wlan0
		> ip route add 10.0.0.254/32 rto_min 800j dev wlan0
		> ip route show dev wlan0

	CCID-3 also supports the rto_min setting: it is used to define the lower
	bound for the expiry of the nofeedback timer. This can be useful on LANs
	with very low RTTs (e.g., loopback, Gbit ethernet).


Notes
=====
DCCP does not travel through NAT successfully at present on many boxes. This is
because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT
support for DCCP has been added.
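
To illustrate why address rewriting interferes with the checksum, the following
sketch computes a 16-bit one's complement checksum over an IPv4 pseudo-header
followed by the packet bytes. It is a simplified illustration, not the kernel's
implementation: the function names are hypothetical, full checksum coverage is
assumed, and the caller is assumed to have zeroed the checksum field in the
packet beforehand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_IPPROTO_DCCP 33	/* IP protocol number assigned to DCCP */

/* Add `len` bytes, read as big-endian 16-bit words, to a running sum. */
uint32_t example_sum16(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)			/* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
    return sum;
}

/*
 * Checksum over the IPv4 pseudo-header (source address, destination address,
 * zero byte, protocol, length) and `len` bytes of packet data. `len` is
 * assumed to fit in 16 bits.
 */
uint16_t example_dccp_csum(const uint8_t src[4], const uint8_t dst[4],
                           const uint8_t *pkt, size_t len)
{
    uint8_t pseudo[12];
    uint32_t sum;

    memcpy(pseudo, src, 4);
    memcpy(pseudo + 4, dst, 4);
    pseudo[8] = 0;
    pseudo[9] = EXAMPLE_IPPROTO_DCCP;
    pseudo[10] = (uint8_t)(len >> 8);
    pseudo[11] = (uint8_t)(len & 0xff);

    sum = example_sum16(0, pseudo, sizeof(pseudo));
    sum = example_sum16(sum, pkt, len);
    while (sum >> 16)		/* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Because the source and destination addresses are part of the summed data, a
NAT box that rewrites an address without also updating the DCCP checksum
leaves the receiver with a packet that fails verification.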