Centre for Advanced Internet Architectures,
Swinburne University of Technology,
Melbourne, Australia

15th March, 2013

Multipath TCP For FreeBSD Kernel Patch v0.2

----------------------------------------------
OVERVIEW
----------------------------------------------

RFC6824 [1] proposes extensions to TCP [2] whereby multiple addresses
(and potentially paths) can be used over a single TCP connection. This
is referred to as 'Multipath TCP'. The extension is designed to maintain
compatibility with existing TCP Socket APIs and is therefore
backwards-compatible with existing TCP applications.

Readers should become familiar with the Multipath TCP RFC before
attempting to apply the kernel patch.

At the time of writing, a single Linux reference implementation is
available from [3] as kernel sources, or as a pre-compiled package for
Debian-based Linux distributions.

This distribution contains the v0.2 implementation of Multipath TCP for
FreeBSD. It is applied as a kernel patch against revision 248226 of
FreeBSD-10. Instructions for acquiring the FreeBSD source and applying
the patch are provided in the INSTALLATION section.

As this is the initial release of the Multipath Kernel it should be
considered for experimental use only. In addition, this release is not
fully compliant with the RFC (see KNOWN LIMITATIONS).

----------------------------------------------
CHANGES SINCE LAST RELEASE
----------------------------------------------

Please see changelog at:

    http://caia.swin.edu.au/urp/newtcp/mptcp/tools/mptcp-changelog-v0.2.txt

----------------------------------------------
UNDER DEVELOPMENT/TESTING FOR NEXT RELEASE
----------------------------------------------

o Congestion Control Hooks for per-subflow CC adjustment
o Complete compatibility with Linux implementation [3]
o Planned release date: 29th March 2013

----------------------------------------------
LICENCE
----------------------------------------------

The FreeBSD multipath kernel patch is released under a BSD licence.
Refer to licence headers in each source file for further details.

----------------------------------------------
INSTALLATION
----------------------------------------------

Prerequisites:

o FreeBSD-10.x. We recommend installing the following snapshot:

    ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/amd64/amd64/ISO-IMAGES/10.0/FreeBSD-10.0-CURRENT-amd64-20130302-r247640-release.iso

o Install the devel/subversion port with the default options:

    cd /usr/ports/devel/subversion/
    make install clean

To obtain the correct revision of the FreeBSD source tree that this
patch applies to, and store it in the local directory "/path/to/src",
run:

    svn co -r 248226 http://svn.freebsd.org/base/head /path/to/src

We have developed and tested the patch against this revision of
FreeBSD-10. Our patches might apply to later revisions, but we cannot
be sure they will apply cleanly.

Issuing the following commands will build and install the mptcp-enabled
distribution:

    cd /path/to/src
    fetch http://caia.swin.edu.au/urp/newtcp/mptcp/tools/mptcp_v0.2_10.x.248226.patch
    patch -p1 < mptcp_v0.2_10.x.248226.patch
    make -j`sysctl -n hw.ncpu` buildworld buildkernel installkernel installworld
    mergemaster -iF -m /path/to/src
    shutdown -r now

Upon reboot MPTCP will be enabled by default, and the host will attempt
to use MPTCP when setting up new connections.
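
After rebooting, it can be useful to confirm that the patched kernel is
actually running. The following is a minimal sketch, assuming that the
presence of the net.inet.tcp.mptcp.max_subflows variable (described
under RUN TIME CONFIGURATION) indicates an MPTCP-enabled kernel. The
function name mptcp_status and the SYSCTL override variable are
hypothetical and are not part of the patch.

```shell
#!/bin/sh
# Sketch: report whether the running kernel exposes the MPTCP sysctl tree.
# SYSCTL may be overridden (e.g. for testing); it defaults to sysctl(8).
SYSCTL="${SYSCTL:-sysctl}"

mptcp_status() {
    # Assumption: a readable max_subflows variable means the patched
    # kernel is running. "sysctl -n" prints only the value.
    if value=$($SYSCTL -n net.inet.tcp.mptcp.max_subflows 2>/dev/null); then
        echo "MPTCP kernel detected (max_subflows=${value})"
    else
        echo "MPTCP sysctl tree not found; is the patched kernel running?"
        return 1
    fi
}
```

Calling mptcp_status from a shell on the patched host prints one of the
two messages and returns a non-zero status if the variable is missing.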

----------------------------------------------
RUN TIME CONFIGURATION
----------------------------------------------

There are four sysctl variables that provide configuration options:

net.inet.tcp.mptcp.linux_compat

    Changes some code paths to allow for improved interoperability with
    the Linux implementation. Enabled by default.

net.inet.tcp.mptcp.mp_addresses

    Additional addresses are made available using this variable. A list
    of addresses is provided as input, and these will be advertised to
    the remote host when a multipath connection becomes established.
    This setting can be left empty if you only wish to use a single
    address on the local host (the default address, or master subflow
    address, is determined by the route table).

    For example, on a host with two addresses, 192.168.0.10 and
    192.168.0.11, you can add the '.11' address to be used as a slave
    subflow in multipath connections with the following command:

        sysctl net.inet.tcp.mptcp.mp_addresses="192.168.0.11"

    In this case '.10' will act as the primary subflow, while '.11'
    will be advertised with ADD_ADDR once multipath is established, and
    then an MP_JOIN will be sent.

    Multiple addresses can be added as a space delimited string:

        sysctl net.inet.tcp.mptcp.mp_addresses="192.168.0.11 10.0.0.20"

net.inet.tcp.mptcp.max_subflows

    Specifies the maximum number of subflows that can be attached to a
    single multipath connection. The default value is 8.

net.inet.tcp.mptcp.mp_debug

    The kernel features multi-level debugging info, the depth and class
    of which is set using this sysctl variable. There are currently
    four classes of debug info that can be displayed:

        MPSESSION - General session information (such as hashes and keys)
        DSMAP     - data-sequence map info (e.g. map lengths etc)
        SBSTATUS  - the status of the socket buffers
        REASS     - reassembly-related information

    Each of these classes has a level of verbosity, which ranges from
    0 (no output) to 5 (fully verbose).

    An example of usage is shown below (enables full verbosity DSMAP):

        sysctl net.inet.tcp.mptcp.mp_debug="DSMAP:5"

    In this case we use the format "CLASS:LEVEL" to enable debugging.
    To turn off debugging, the following command would be issued:

        sysctl net.inet.tcp.mptcp.mp_debug="DSMAP:0"

    Entering the following will print a string with the current debug
    configuration:

        sysctl net.inet.tcp.mptcp.mp_debug

----------------------------------------------
EXAMPLE USAGE
----------------------------------------------

This patch supports FreeBSD-to-FreeBSD multipath sessions. MP_JOINs are
issued from the 'active opener' (client) side of the connection.
ADD_ADDR options are sent immediately after the MP session becomes
established. After all addresses have been advertised, SYN packets with
an MP_JOIN option are sent.

A simple scenario is illustrated below. In the figure, Host A has two
addresses and is connected to Host B via a switch. In this case both
hosts are on the same subnet.

    A - ACTIVE                        B - PASSIVE
      CLIENT          SWITCH            (SERVER)
      +----+                             +----+
      |  A1| <=======> +---+             |    |
      |    |           |   | <=========> |B1  |
      |  A2| <-------> +---+             |    |
      +----+                             +----+

Host A opens a connection to Host B via address A1. A multipath
connection is then established between addresses A1 and B1. Once
established, Host A sends an ADD_ADDR option to Host B, with the details
of address A2. Host B associates this address with the established
connection. Host A then sends a SYN from A2 to B1, with the MP_JOIN
option. Host B recognises that this address is associated with the
existing MPTCP session, and a handshake occurs that adds this new path
to the connection.
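
In the scenario above, address A2 is only advertised because it has been
listed in net.inet.tcp.mptcp.mp_addresses. The following is a minimal
sketch of a helper that checks a set of addresses and prints the
corresponding sysctl command, rather than running it. The function
names is_ipv4 and mp_addresses_cmd are hypothetical and are not part of
the patch. Only dotted-quad IPv4 addresses are accepted, since this
release is IPv4 only (see KNOWN LIMITATIONS).

```shell
#!/bin/sh
# Sketch: build the space-delimited value for
# net.inet.tcp.mptcp.mp_addresses from a list of IPv4 addresses.

# Return 0 if $1 is a dotted quad: four numeric fields, each 0-255.
is_ipv4() {
    case "$1" in
        # Reject empty input, non-digit/dot characters, and empty fields.
        ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into fields on '.'
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}

# Print (not execute) the sysctl command for the given addresses.
mp_addresses_cmd() {
    list=""
    for addr in "$@"; do
        is_ipv4 "$addr" || {
            echo "invalid IPv4 address: $addr" >&2
            return 1
        }
        list="${list:+$list }$addr"   # join with single spaces
    done
    echo "sysctl net.inet.tcp.mptcp.mp_addresses=\"$list\""
}
```

For example, "mp_addresses_cmd 192.168.0.11 10.0.0.20" prints the
two-address command shown under RUN TIME CONFIGURATION. Printing the
command instead of running it lets the value be reviewed before it is
applied as root.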

This scenario can be tested with any TCP application, for example with
Iperf*:

(1) Set the addresses on Host A: A1 = 192.168.0.10, A2 = 192.168.0.11

        ifconfig A1 inet 192.168.0.10/24
        ifconfig A2 inet 192.168.0.11/24

(2) Set the address on Host B: B1 = 192.168.0.20

        ifconfig B1 inet 192.168.0.20/24

(3) On Host A use the sysctl variable to set 192.168.0.11 as a slave
    subflow:

        sysctl net.inet.tcp.mptcp.mp_addresses="192.168.0.11"

(4) Run Iperf on Host B (iperf server):

        iperf -s

(5) Run Iperf on Host A (the client), connecting to the server for
    5 seconds:

        iperf -c 192.168.0.20 -t 5

* Iperf is a network throughput testing utility. It can be installed
  from ports: ports/benchmarks/iperf

----------------------------------------------
CHANGES TO THE KERNEL
----------------------------------------------

Enabling MPTCP support in the FreeBSD kernel required substantial
changes to the TCP stack, in particular the TCP connection setup, input
and output paths and socket buffer access methods.

The changes in brief (CAPABILITIES AND FEATURES provides some additional
depth):

o Creation of a multipath Session Control Block (SCB) and the
  redefinition of the existing TCP Control Block (TCPCB) to act as an
  MPTCP subflow.

o Changes to how control blocks are attached and detached from a socket.
  A single socket can now support multiple IP and TCP control blocks.

o Changes to socket buffer access routines and accounting. Mechanisms
  for shared access to socket buffers from multiple TCP subflows.

o Option adding and parsing code for MPTCP in input and output paths.

o Locking mechanisms to handle concurrent access to data-structures used
  in MPTCP connections.

----------------------------------------------
CAPABILITIES AND FEATURES
----------------------------------------------

o Compatible with Standard TCP: The implementation can establish
  standard TCP connections with non-MPTCP hosts.

o Multipath Capable: Can establish, add additional subflows to, and
  terminate a multipath session.

o Basic Linux interoperability: Can establish and carry out a
  single-subflow MPTCP connection, without Data-level FIN handshake.

o MPTCP signalling: MP_CAPABLE, ADD_ADDR, MP_JOIN and DSS exchanges are
  implemented and functional. Other options are currently parsed but
  not acted upon.

o Deferred Reassembly: TCP segment reassembly lists have been replaced
  with a single data-structure. Data-level reassembly is deferred to
  when data is copied out of the receive buffer and into the
  application.

o Mediated Socket Buffer Access: Access to the socket and socket buffers
  is now restricted to the multipath control block. The multipath
  control block 'subdivides' the socket buffers and allocates portions
  to individual subflows.

o Multi-packet DSS Maps: A single DSS map can cover multiple segments,
  including up to the size of the send buffer.

o ADD_ADDR and JOIN issued by the active opener after connection is
  established

----------------------------------------------
KNOWN LIMITATIONS
----------------------------------------------

o TCP Segmentation Offload (TSO) disabled
  The MPTCP code has not been tested and debugged with TSO enabled,
  thus it has been disabled by default. Enabling TSO may cause
  unpredictable behaviour.

o Delayed ACK is disabled
  Multipath connections will work with or without Delayed ACK enabled.
  However, when testing with smaller socket send buffer sizes, delayed
  ACK can severely limit the throughput of a connection.

o Fallback to 'infinite map' not handled
  A fully established multipath connection will not fall back into
  standard TCP "infinite map" mode if an error is detected.

o Data Sequence closing states not fully implemented
  The Data-FIN closing sequence is not carried out in the current
  implementation. When an application calls close() on the socket,
  each of the subflows is disconnected (standard TCP close) and the
  connection is closed.
  We do not wait for outstanding data-level segments to be acknowledged
  (however, subflows are able to finish sending any data that has
  already been mapped).

o There is no connection timeout at the Data Level. The connection may
  stall indefinitely if all subflows have stalled.

o Can initiate but not tear down slave subflows
  (a) The implementation will issue ADD_ADDR and JOIN signals, but will
      not attempt to remove advertised addresses, and does not close
      any subflows once they have been established.
  (b) Misbehaving subflows are not reset (RST) during a multipath
      session.

o No automated path discovery, basic path management
  (a) Addresses are not automatically discovered. They are added via a
      sysctl variable (see usage details above). Setting this sysctl
      makes the address available to any multipath connection that
      becomes established.
  (b) Addresses learnt during a connection (via the ADD_ADDR option)
      are stored locally in the 'multipath layer', rather than in an
      independent, globally accessible path manager.

o No coupled congestion control
  Coupled congestion control, as defined in [4], is not implemented.

o Security (HMACs, etc) only at most basic level for operation
  Hashes and keys are generated and exchanged where required, but are
  not validated internally. The third packet of the MP_JOIN handshake
  does not carry a 160-bit HMAC.

o Only 32-bit DSNs on the wire
  Data sequence numbers are tracked as 64-bit values internally, but
  only the lower 32 bits are sent over the wire.

o Checksumming is disabled
  Checksumming is not implemented in this version of the patch.

o IPv4 only
  IPv6 code paths have not been fully implemented and tested as of this
  version.

o Basic packet scheduler
  The current packet scheduler features no 'intelligence' and assigns
  data to subflows on a first-come first-served basis. A single subflow
  may monopolise sending data when using small send buffer sizes.

o Simultaneous Opens/Closes
  These have not been tested and may result in unpredictable behaviour.

o Performance not optimised
  Performance testing and profiling is in progress.

o Sequence number wrapping
  Not all sequence number wraps are accounted for. Sequence number
  wraps that are not handled will call panic().

o Occasional Stalls (with multiple subflows)

o Occasional panic/KASSERT trigger
  Some KASSERT and panic conditions will occasionally trigger and drop
  the system into gdb.

----------------------------------------------
ACKNOWLEDGEMENTS
----------------------------------------------

This project has been made possible in part by a gift from The Cisco
University Research Program Fund, a corporate advised fund of Silicon
Valley Community Foundation.

----------------------------------------------
RELATED READING
----------------------------------------------

This software was developed at Swinburne University's Centre for
Advanced Internet Architectures, under the umbrella of the NewTCP
research project. More information on the project is available at:

    http://caia.swin.edu.au/urp/newtcp/

The FreeBSD MPTCP implementation homepage can be found at:

    http://caia.swin.edu.au/urp/newtcp/mptcp

[5], written by the RFC authors, outlines the process of designing the
MPTCP protocol.

----------------------------------------------
REFERENCES
----------------------------------------------

[1] Ford, A. et al, "TCP Extensions for Multipath Operation with
    Multiple Addresses", RFC 6824, January 2013.

[2] Postel, J., "Transmission Control Protocol", RFC 793, September
    1981.

[3] "MultiPath TCP - Linux Kernel implementation", Homepage,
    http://multipath-tcp.org/, March 2013.

[4] Raiciu, C. et al, "Coupled Congestion Control for Multipath
    Transport Protocols", RFC 6356, October 2011.

[5] Raiciu, C. et al, "How Hard Can It Be? Designing and Implementing a
    Deployable Multipath TCP", USENIX Symposium on Networked Systems
    Design and Implementation (NSDI'12), San Jose (CA), 2012.

----------------------------------------------
AUTHORS
----------------------------------------------

The FreeBSD MPTCP implementation was first released in 2013 by Nigel
Williams and Lawrence Stewart whilst working on the Multipath TCP
research project at Swinburne University's Centre for Advanced Internet
Architectures, Melbourne, Australia.