From: Mauro Carvalho Chehab Date: Mon, 27 Apr 2020 22:01:28 +0000 (+0200) Subject: docs: networking: convert dccp.txt to ReST X-Git-Url: http://git.maquefel.me/?a=commitdiff_plain;h=33155bac6519f545137d9c46d2e59e5f8332dd50;p=linux.git docs: networking: convert dccp.txt to ReST - add SPDX header; - adjust title markup; - comment out text-only TOC from html/pdf output; - mark code blocks and literals as such; - adjust identation, whitespaces and blank lines; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: David S. Miller --- diff --git a/Documentation/networking/dccp.rst b/Documentation/networking/dccp.rst new file mode 100644 index 0000000000000..dde16be044562 --- /dev/null +++ b/Documentation/networking/dccp.rst @@ -0,0 +1,216 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +DCCP protocol +============= + + +.. Contents + - Introduction + - Missing features + - Socket options + - Sysctl variables + - IOCTLs + - Other tunables + - Notes + + +Introduction +============ +Datagram Congestion Control Protocol (DCCP) is an unreliable, connection +oriented protocol designed to solve issues present in UDP and TCP, particularly +for real-time and multimedia (streaming) traffic. +It divides into a base protocol (RFC 4340) and pluggable congestion control +modules called CCIDs. Like pluggable TCP congestion control, at least one CCID +needs to be enabled in order for the protocol to function properly. In the Linux +implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as +the TCP-friendly CCID3 (RFC 4342), are optional. +For a brief introduction to CCIDs and suggestions for choosing a CCID to match +given applications, see section 10 of RFC 4340. + +It has a base protocol and pluggable congestion control IDs (CCIDs). + +DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol +is at http://www.ietf.org/html.charters/dccp-charter.html + + +Missing features +================ +The Linux DCCP implementation does not currently support all the features that are +specified in RFCs 4340...42. + +The known bugs are at: + + http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP + +For more up-to-date versions of the DCCP implementation, please consider using +the experimental DCCP test tree; instructions for checking this out are on: +http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree + + +Socket options +============== +DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes +a policy ID as argument and can only be set before the connection (i.e. changes +during an established connection are not supported). Currently, two policies are +defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special, +and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an +u32 priority value as ancillary data to sendmsg(), where higher numbers indicate +a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to +be formatted using a cmsg(3) message header filled in as follows:: + + cmsg->cmsg_level = SOL_DCCP; + cmsg->cmsg_type = DCCP_SCM_PRIORITY; + cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */ + +DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero +value is always interpreted as unbounded queue length. If different from zero, +the interpretation of this parameter depends on the current dequeuing policy +(see above): the "simple" policy will enforce a fixed queue size by returning +EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the +lowest-priority packet first. The default value for this parameter is +initialised from /proc/sys/net/dccp/default/tx_qlen. + +DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of +service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, +the socket will fall back to 0 (which means that no meaningful service code +is present). On active sockets this is set before connect(); specifying more +than one code has no effect (all subsequent service codes are ignored). The +case is different for passive sockets, where multiple service codes (up to 32) +can be set before calling bind(). + +DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet +size (application payload size) in bytes, see RFC 4340, section 14. + +DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs +supported by the endpoint. The option value is an array of type uint8_t whose +size is passed as option length. The minimum array size is 4 elements, the +value returned in the optlen argument always reflects the true number of +built-in CCIDs. + +DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same +time, combining the operation of the next two socket options. This option is +preferable over the latter two, since often applications will use the same +type of CCID for both directions; and mixed use of CCIDs is not currently well +understood. This socket option takes as argument at least one uint8_t value, or +an array of uint8_t values, which must match available CCIDS (see above). CCIDs +must be registered on the socket before calling connect() or listen(). + +DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets +the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID. +Please note that the getsockopt argument type here is ``int``, not uint8_t. + +DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID. + +DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold +timewait state when closing the connection (RFC 4340, 8.3). The usual case is +that the closing server sends a CloseReq, whereupon the client holds timewait +state. When this boolean socket option is on, the server sends a Close instead +and will enter TIMEWAIT. This option must be set after accept() returns. + +DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the +partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums +always cover the entire packet and that only fully covered application data is +accepted by the receiver. Hence, when using this feature on the sender, it must +be enabled at the receiver, too with suitable choice of CsCov. + +DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the + range 0..15 are acceptable. The default setting is 0 (full coverage), + values between 1..15 indicate partial coverage. + +DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it + sets a threshold, where again values 0..15 are acceptable. The default + of 0 means that all packets with a partial coverage will be discarded. + Values in the range 1..15 indicate that packets with minimally such a + coverage value are also acceptable. The higher the number, the more + restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage + settings are inherited to the child socket after accept(). + +The following two options apply to CCID 3 exclusively and are getsockopt()-only. +In either case, a TFRC info struct (defined in ) is returned. + +DCCP_SOCKOPT_CCID_RX_INFO + Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_rx_info). + +DCCP_SOCKOPT_CCID_TX_INFO + Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_tx_info). + +On unidirectional connections it is useful to close the unused half-connection +via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. + + +Sysctl variables +================ +Several DCCP default parameters can be managed by the following sysctls +(sysctl net.dccp.default or /proc/sys/net/dccp/default): + +request_retries + The number of active connection initiation retries (the number of + Requests minus one) before timing out. In addition, it also governs + the behaviour of the other, passive side: this variable also sets + the number of times DCCP repeats sending a Response when the initial + handshake does not progress from RESPOND to OPEN (i.e. when no Ack + is received after the initial Request). This value should be greater + than 0, suggested is less than 10. Analogue of tcp_syn_retries. + +retries1 + How often a DCCP Response is retransmitted until the listening DCCP + side considers its connecting peer dead. Analogue of tcp_retries1. + +retries2 + The number of times a general DCCP packet is retransmitted. This has + importance for retransmitted acknowledgments and feature negotiation, + data packets are never retransmitted. Analogue of tcp_retries2. + +tx_ccid = 2 + Default CCID for the sender-receiver half-connection. Depending on the + choice of CCID, the Send Ack Vector feature is enabled automatically. + +rx_ccid = 2 + Default CCID for the receiver-sender half-connection; see tx_ccid. + +seq_window = 100 + The initial sequence window (sec. 7.5.2) of the sender. This influences + the local ackno validity and the remote seqno validity windows (7.5.1). + Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set. + +tx_qlen = 5 + The size of the transmit buffer in packets. A value of 0 corresponds + to an unbounded transmit buffer. + +sync_ratelimit = 125 ms + The timeout between subsequent DCCP-Sync packets sent in response to + sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit + of this parameter is milliseconds; a value of 0 disables rate-limiting. + + +IOCTLS +====== +FIONREAD + Works as in udp(7): returns in the ``int`` argument pointer the size of + the next pending datagram in bytes, or 0 when no datagram is pending. + + +Other tunables +============== +Per-route rto_min support + CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value + of the RTO timer. This setting can be modified via the 'rto_min' option + of iproute2; for example:: + + > ip route change 10.0.0.0/24 rto_min 250j dev wlan0 + > ip route add 10.0.0.254/32 rto_min 800j dev wlan0 + > ip route show dev wlan0 + + CCID-3 also supports the rto_min setting: it is used to define the lower + bound for the expiry of the nofeedback timer. This can be useful on LANs + with very low RTTs (e.g., loopback, Gbit ethernet). + + +Notes +===== +DCCP does not travel through NAT successfully at present on many boxes. This is +because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT +support for DCCP has been added. diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt deleted file mode 100644 index 55c575fcaf17d..0000000000000 --- a/Documentation/networking/dccp.txt +++ /dev/null @@ -1,207 +0,0 @@ -DCCP protocol -============= - - -Contents -======== -- Introduction -- Missing features -- Socket options -- Sysctl variables -- IOCTLs -- Other tunables -- Notes - - -Introduction -============ -Datagram Congestion Control Protocol (DCCP) is an unreliable, connection -oriented protocol designed to solve issues present in UDP and TCP, particularly -for real-time and multimedia (streaming) traffic. -It divides into a base protocol (RFC 4340) and pluggable congestion control -modules called CCIDs. Like pluggable TCP congestion control, at least one CCID -needs to be enabled in order for the protocol to function properly. In the Linux -implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as -the TCP-friendly CCID3 (RFC 4342), are optional. -For a brief introduction to CCIDs and suggestions for choosing a CCID to match -given applications, see section 10 of RFC 4340. - -It has a base protocol and pluggable congestion control IDs (CCIDs). - -DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol -is at http://www.ietf.org/html.charters/dccp-charter.html - - -Missing features -================ -The Linux DCCP implementation does not currently support all the features that are -specified in RFCs 4340...42. - -The known bugs are at: - http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP - -For more up-to-date versions of the DCCP implementation, please consider using -the experimental DCCP test tree; instructions for checking this out are on: -http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree - - -Socket options -============== -DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes -a policy ID as argument and can only be set before the connection (i.e. changes -during an established connection are not supported). Currently, two policies are -defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special, -and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an -u32 priority value as ancillary data to sendmsg(), where higher numbers indicate -a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to -be formatted using a cmsg(3) message header filled in as follows: - cmsg->cmsg_level = SOL_DCCP; - cmsg->cmsg_type = DCCP_SCM_PRIORITY; - cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */ - -DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero -value is always interpreted as unbounded queue length. If different from zero, -the interpretation of this parameter depends on the current dequeuing policy -(see above): the "simple" policy will enforce a fixed queue size by returning -EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the -lowest-priority packet first. The default value for this parameter is -initialised from /proc/sys/net/dccp/default/tx_qlen. - -DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of -service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, -the socket will fall back to 0 (which means that no meaningful service code -is present). On active sockets this is set before connect(); specifying more -than one code has no effect (all subsequent service codes are ignored). The -case is different for passive sockets, where multiple service codes (up to 32) -can be set before calling bind(). - -DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet -size (application payload size) in bytes, see RFC 4340, section 14. - -DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs -supported by the endpoint. The option value is an array of type uint8_t whose -size is passed as option length. The minimum array size is 4 elements, the -value returned in the optlen argument always reflects the true number of -built-in CCIDs. - -DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same -time, combining the operation of the next two socket options. This option is -preferable over the latter two, since often applications will use the same -type of CCID for both directions; and mixed use of CCIDs is not currently well -understood. This socket option takes as argument at least one uint8_t value, or -an array of uint8_t values, which must match available CCIDS (see above). CCIDs -must be registered on the socket before calling connect() or listen(). - -DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets -the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID. -Please note that the getsockopt argument type here is `int', not uint8_t. - -DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID. - -DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold -timewait state when closing the connection (RFC 4340, 8.3). The usual case is -that the closing server sends a CloseReq, whereupon the client holds timewait -state. When this boolean socket option is on, the server sends a Close instead -and will enter TIMEWAIT. This option must be set after accept() returns. - -DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the -partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums -always cover the entire packet and that only fully covered application data is -accepted by the receiver. Hence, when using this feature on the sender, it must -be enabled at the receiver, too with suitable choice of CsCov. - -DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the - range 0..15 are acceptable. The default setting is 0 (full coverage), - values between 1..15 indicate partial coverage. -DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it - sets a threshold, where again values 0..15 are acceptable. The default - of 0 means that all packets with a partial coverage will be discarded. - Values in the range 1..15 indicate that packets with minimally such a - coverage value are also acceptable. The higher the number, the more - restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage - settings are inherited to the child socket after accept(). - -The following two options apply to CCID 3 exclusively and are getsockopt()-only. -In either case, a TFRC info struct (defined in ) is returned. -DCCP_SOCKOPT_CCID_RX_INFO - Returns a `struct tfrc_rx_info' in optval; the buffer for optval and - optlen must be set to at least sizeof(struct tfrc_rx_info). -DCCP_SOCKOPT_CCID_TX_INFO - Returns a `struct tfrc_tx_info' in optval; the buffer for optval and - optlen must be set to at least sizeof(struct tfrc_tx_info). - -On unidirectional connections it is useful to close the unused half-connection -via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. - - -Sysctl variables -================ -Several DCCP default parameters can be managed by the following sysctls -(sysctl net.dccp.default or /proc/sys/net/dccp/default): - -request_retries - The number of active connection initiation retries (the number of - Requests minus one) before timing out. In addition, it also governs - the behaviour of the other, passive side: this variable also sets - the number of times DCCP repeats sending a Response when the initial - handshake does not progress from RESPOND to OPEN (i.e. when no Ack - is received after the initial Request). This value should be greater - than 0, suggested is less than 10. Analogue of tcp_syn_retries. - -retries1 - How often a DCCP Response is retransmitted until the listening DCCP - side considers its connecting peer dead. Analogue of tcp_retries1. - -retries2 - The number of times a general DCCP packet is retransmitted. This has - importance for retransmitted acknowledgments and feature negotiation, - data packets are never retransmitted. Analogue of tcp_retries2. - -tx_ccid = 2 - Default CCID for the sender-receiver half-connection. Depending on the - choice of CCID, the Send Ack Vector feature is enabled automatically. - -rx_ccid = 2 - Default CCID for the receiver-sender half-connection; see tx_ccid. - -seq_window = 100 - The initial sequence window (sec. 7.5.2) of the sender. This influences - the local ackno validity and the remote seqno validity windows (7.5.1). - Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set. - -tx_qlen = 5 - The size of the transmit buffer in packets. A value of 0 corresponds - to an unbounded transmit buffer. - -sync_ratelimit = 125 ms - The timeout between subsequent DCCP-Sync packets sent in response to - sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit - of this parameter is milliseconds; a value of 0 disables rate-limiting. - - -IOCTLS -====== -FIONREAD - Works as in udp(7): returns in the `int' argument pointer the size of - the next pending datagram in bytes, or 0 when no datagram is pending. - - -Other tunables -============== -Per-route rto_min support - CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value - of the RTO timer. This setting can be modified via the 'rto_min' option - of iproute2; for example: - > ip route change 10.0.0.0/24 rto_min 250j dev wlan0 - > ip route add 10.0.0.254/32 rto_min 800j dev wlan0 - > ip route show dev wlan0 - CCID-3 also supports the rto_min setting: it is used to define the lower - bound for the expiry of the nofeedback timer. This can be useful on LANs - with very low RTTs (e.g., loopback, Gbit ethernet). - - -Notes -===== -DCCP does not travel through NAT successfully at present on many boxes. This is -because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT -support for DCCP has been added. diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 4c8e896490e05..3894043332dea 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -48,6 +48,7 @@ Contents: cdc_mbim cops cxacru + dccp .. only:: subproject and html