.. SPDX-License-Identifier: GPL-2.0

=============
DCCP protocol
=============


.. Contents
   - Introduction
   - Missing features
   - Socket options
   - Sysctl variables
   - IOCTLs
   - Other tunables
   - Notes


Introduction
============
Datagram Congestion Control Protocol (DCCP) is an unreliable, connection-oriented
protocol designed to solve issues present in UDP and TCP, particularly
for real-time and multimedia (streaming) traffic.
It is divided into a base protocol (RFC 4340) and pluggable congestion control
modules called CCIDs. As with pluggable TCP congestion control, at least one CCID
needs to be enabled for the protocol to function properly. In the Linux
implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as
the TCP-friendly CCID3 (RFC 4342), are optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
given applications, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html


Missing features
================
The Linux DCCP implementation does not currently support all the features that are
specified in RFCs 4340 to 4342.

The known bugs are listed at:

	http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking this out are at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree


Socket options
==============
DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
a policy ID as argument and can only be set before the connection is established
(i.e. changes during an established connection are not supported). Currently, two
policies are defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing
special, and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows a
u32 priority value to be passed as ancillary data to sendmsg(), where higher numbers
indicate a higher packet priority (similar to SO_PRIORITY). This ancillary data needs
to be formatted using a cmsg(3) message header filled in as follows::

	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type	 = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len	 = CMSG_LEN(sizeof(uint32_t));	/* or CMSG_LEN(4) */
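
The header fields above can be put into a complete sendmsg() call roughly as
follows. This is a minimal sketch, not a reference implementation: the helper
names (dccp_attach_priority, dccp_use_prio_policy, dccp_send_prio) are
hypothetical, and the fallback value for SOL_DCCP is only used when the C
library headers do not provide it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/dccp.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269	/* fallback; value taken from the kernel's socket.h */
#endif

/*
 * Attach a DCCP_SCM_PRIORITY control message to *msg, using the
 * caller-supplied buffer cbuf. Returns 0 on success, or -1 if cbuf is
 * smaller than CMSG_SPACE(sizeof(uint32_t)).
 */
static int dccp_attach_priority(struct msghdr *msg, void *cbuf,
				size_t cbuflen, uint32_t prio)
{
	struct cmsghdr *cmsg;

	if (cbuflen < CMSG_SPACE(sizeof(prio)))
		return -1;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(prio));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type  = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(prio));
	memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
	return 0;
}

/* Select the priority-based dequeuing policy; call before connecting. */
static int dccp_use_prio_policy(int fd)
{
	int policy = DCCPQ_POLICY_PRIO;

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_QPOLICY_ID,
			  &policy, sizeof(policy));
}

/* Send one datagram with the given priority as ancillary data. */
static ssize_t dccp_send_prio(int fd, const void *data, size_t len,
			      uint32_t prio)
{
	/* The union keeps the control buffer aligned for struct cmsghdr. */
	union {
		char buf[CMSG_SPACE(sizeof(uint32_t))];
		struct cmsghdr align;
	} u;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

	if (dccp_attach_priority(&msg, u.buf, sizeof(u.buf), prio) < 0)
		return -1;
	return sendmsg(fd, &msg, 0);
}
```

A caller would typically invoke dccp_use_prio_policy() once after socket()
and then use dccp_send_prio() in place of send() on the connected socket.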

DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
value is always interpreted as an unbounded queue length. If different from zero,
the interpretation of this parameter depends on the current dequeuing policy
(see above): the "simple" policy will enforce a fixed queue size by returning
EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
lowest-priority packet first. The default value for this parameter is
initialised from /proc/sys/net/dccp/default/tx_qlen.

DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). On active sockets this is set before connect(); specifying more
than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().
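
As an illustration, setting one service code on an active socket, or a list
of codes on a passive socket, might look like the sketch below. The helper
names and the MAX_SERVICE_CODES constant are hypothetical (the limit of 32 is
taken from the paragraph above), and passing the codes in network byte order
is an assumption based on common usage, not something this document states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/dccp.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269	/* fallback; value taken from the kernel's socket.h */
#endif

#define MAX_SERVICE_CODES 32	/* hypothetical name; limit from the text */

/* Active socket: set a single service code before connect(). */
static int dccp_set_service(int fd, uint32_t code)
{
	uint32_t be = htonl(code);	/* assumption: network byte order */

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
			  &be, sizeof(be));
}

/* Passive socket: set up to 32 service codes before bind(). */
static int dccp_set_service_list(int fd, const uint32_t *codes, size_t n)
{
	uint32_t be[MAX_SERVICE_CODES];
	size_t i;

	if (n == 0 || n > MAX_SERVICE_CODES) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < n; i++)
		be[i] = htonl(codes[i]);

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
			  be, (socklen_t)(n * sizeof(be[0])));
}
```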

DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes, see RFC 4340, section 14.

DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
supported by the endpoint. The option value is an array of type uint8_t whose
size is passed as option length. The minimum array size is 4 elements; the
value returned in the optlen argument always reflects the true number of
built-in CCIDs.
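
Both read-only options can be queried with plain getsockopt() calls. The
following sketch uses hypothetical helper names and assumes that the maximum
packet size is returned as an ``int``.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/dccp.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269	/* fallback; value taken from the kernel's socket.h */
#endif

/*
 * Fill ccids[] with the CCIDs supported by the endpoint. On entry *count is
 * the array size (at least 4 elements); on return it holds the number of
 * built-in CCIDs reported by the kernel.
 */
static int dccp_available_ccids(int fd, uint8_t *ccids, socklen_t *count)
{
	return getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_AVAILABLE_CCIDS,
			  ccids, count);
}

/* Return the current maximum packet size in bytes, or -1 on error. */
static int dccp_cur_mps(int fd)
{
	int mps;
	socklen_t len = sizeof(mps);

	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS,
		       &mps, &len) < 0)
		return -1;
	return mps;
}
```

An application would normally check dccp_cur_mps() before sending, since
payloads larger than the reported size cannot be carried in one packet.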

DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
time, combining the operation of the next two socket options. This option is
preferable to the latter two, since applications will often use the same
type of CCID for both directions, and mixed use of CCIDs is not currently well
understood. This socket option takes as argument at least one uint8_t value, or
an array of uint8_t values, which must match available CCIDs (see above). CCIDs
must be registered on the socket before calling connect() or listen().

DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is ``int``, not uint8_t.

DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
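
A sketch of registering a CCID preference list and reading back the TX CCID
is given below. The helper names are hypothetical; note the differing
argument types (uint8_t array for setting, ``int`` for reading).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/dccp.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269	/* fallback; value taken from the kernel's socket.h */
#endif

/*
 * Register a preference list for both the TX and RX CCID.
 * Must be called before connect() or listen().
 */
static int dccp_set_ccids(int fd, const uint8_t *prefs, size_t n)
{
	if (n == 0) {
		errno = EINVAL;	/* at least one CCID is required */
		return -1;
	}
	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_CCID,
			  prefs, (socklen_t)n);
}

/* Return the current TX CCID, or -1 on error (e.g. if none is set). */
static int dccp_get_tx_ccid(int fd)
{
	int ccid;	/* int, not uint8_t, for getsockopt() */
	socklen_t len = sizeof(ccid);

	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_TX_CCID, &ccid, &len) < 0)
		return -1;
	return ccid;
}
```

For example, a preference list of ``{ 3, 2 }`` would ask for CCID3 first and
fall back to CCID2, provided both appear in the list of available CCIDs.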

DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state. When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.

DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
always cover the entire packet and that only fully covered application data is
accepted by the receiver. Hence, when using this feature on the sender, it must
be enabled at the receiver too, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
	range 0..15 are acceptable. The default setting is 0 (full coverage),
	values between 1..15 indicate partial coverage.

DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
	sets a threshold, where again values 0..15 are acceptable. The default
	of 0 means that all packets with a partial coverage will be discarded.
	Values in the range 1..15 indicate that packets with at least this
	coverage value are also acceptable. The higher the number, the more
	restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage
	settings are inherited by the child socket after accept().
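
The receiver threshold rule can be expressed as a small predicate, shown here
together with a setter for both options. This is only a sketch: the function
names are hypothetical, and the predicate restates the rule from the text
above; it is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/dccp.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269	/* fallback; value taken from the kernel's socket.h */
#endif

/*
 * Set DCCP_SOCKOPT_SEND_CSCOV or DCCP_SOCKOPT_RECV_CSCOV (passed as
 * optname) to a value in the range 0..15.
 */
static int dccp_set_cscov(int fd, int optname, int cscov)
{
	if (cscov < 0 || cscov > 15) {
		errno = EINVAL;
		return -1;
	}
	return setsockopt(fd, SOL_DCCP, optname, &cscov, sizeof(cscov));
}

/*
 * Would a packet carrying CsCov value pkt_cscov be accepted by a receiver
 * whose threshold (DCCP_SOCKOPT_RECV_CSCOV) is min_cscov?
 * Full coverage (0) is always accepted; partial coverage is accepted only
 * if the threshold is non-zero and the packet's value is at least as large.
 */
static int cscov_acceptable(int pkt_cscov, int min_cscov)
{
	if (pkt_cscov == 0)
		return 1;
	return min_cscov != 0 && pkt_cscov >= min_cscov;
}
```

With these rules, a sender using CsCov 3 needs a receiver threshold between
1 and 3; a threshold of 0 or of 4 and above would cause its packets to be
discarded.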

The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.

DCCP_SOCKOPT_CCID_RX_INFO
	Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and
	optlen must be set to at least sizeof(struct tfrc_rx_info).

DCCP_SOCKOPT_CCID_TX_INFO
	Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and
	optlen must be set to at least sizeof(struct tfrc_tx_info).

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.


Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
	The number of active connection initiation retries (the number of
	Requests minus one) before timing out. In addition, it also governs
	the behaviour of the other, passive side: this variable also sets
	the number of times DCCP repeats sending a Response when the initial
	handshake does not progress from RESPOND to OPEN (i.e. when no Ack
	is received after the initial Request).  This value should be greater
	than 0; a value below 10 is suggested. Analogue of tcp_syn_retries.

retries1
	How often a DCCP Response is retransmitted until the listening DCCP
	side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
	The number of times a general DCCP packet is retransmitted. This
	matters for retransmitted acknowledgments and feature negotiation;
	data packets are never retransmitted. Analogue of tcp_retries2.

tx_ccid = 2
	Default CCID for the sender-receiver half-connection. Depending on the
	choice of CCID, the Send Ack Vector feature is enabled automatically.

rx_ccid = 2
	Default CCID for the receiver-sender half-connection; see tx_ccid.

seq_window = 100
	The initial sequence window (sec. 7.5.2) of the sender. This influences
	the local ackno validity and the remote seqno validity windows (7.5.1).
	Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set.

tx_qlen = 5
	The size of the transmit buffer in packets. A value of 0 corresponds
	to an unbounded transmit buffer.

sync_ratelimit = 125 ms
	The timeout between subsequent DCCP-Sync packets sent in response to
	sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
	of this parameter is milliseconds; a value of 0 disables rate-limiting.
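
These defaults can also be read from a program through the /proc interface
named above. The sketch below uses a hypothetical helper,
dccp_read_default(), and assumes that each variable is exposed as a single
decimal integer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read /proc/sys/net/dccp/default/<name> into *value.
 * Returns 0 on success, -1 if the file cannot be opened or parsed
 * (for instance when DCCP support is not loaded).
 */
static int dccp_read_default(const char *name, long long *value)
{
	char path[128];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path),
		     "/proc/sys/net/dccp/default/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* name too long for the path buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%lld", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, dccp_read_default("tx_qlen", &v) would be expected to store the
default transmit queue length in v on a system with DCCP enabled.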


IOCTLs
======
FIONREAD
	Works as in udp(7): returns in the ``int`` argument pointer the size of
	the next pending datagram in bytes, or 0 when no datagram is pending.

SIOCOUTQ
	Returns the number of unsent data bytes in the socket send queue as ``int``
	into the buffer specified by the argument pointer.
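
Both requests take a pointer to ``int``, so thin wrappers such as the
following are sufficient. The wrapper names are hypothetical.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Size in bytes of the next pending datagram (0 if none), or -1 on error. */
static int dccp_next_datagram_size(int fd)
{
	int bytes;

	if (ioctl(fd, FIONREAD, &bytes) < 0)
		return -1;
	return bytes;
}

/* Number of unsent data bytes in the send queue, or -1 on error. */
static int dccp_unsent_bytes(int fd)
{
	int bytes;

	if (ioctl(fd, SIOCOUTQ, &bytes) < 0)
		return -1;
	return bytes;
}
```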

Other tunables
==============
Per-route rto_min support
	CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
	of the RTO timer. This setting can be modified via the 'rto_min' option
	of iproute2; for example::

		> ip route change 10.0.0.0/24   rto_min 250j dev wlan0
		> ip route add    10.0.0.254/32 rto_min 800j dev wlan0
		> ip route show dev wlan0

	CCID-3 also supports the rto_min setting: it is used to define the lower
	bound for the expiry of the nofeedback timer. This can be useful on LANs
	with very low RTTs (e.g., loopback, Gbit ethernet).


Notes
=====
DCCP does not travel through NAT successfully at present on many boxes. This is
because the checksum covers the pseudo-header, as with TCP and UDP. Linux NAT
support for DCCP has been added.