||Exploring High Speed TCP Incast Congestion Issues in Large Data Centres
||EN615, Level 6, EN Building
||Transmission Control Protocol (TCP) is the de facto reliable transport-layer protocol, carrying the bulk of traffic across most common types of data networks, including today’s high-speed, low-latency data centres. The main problem faced by commodity data centres is TCP incast congestion, a network transport pathology that affects many-to-one communication patterns. In an incast communication pattern, front-end nodes distribute client queries over TCP connections to multiple back-end servers, and the replies coming back from these synchronized servers (responders) tend to be highly correlated in time. This highly correlated reply traffic traverses the backplane of the Ethernet switch in a many-to-one fashion, causing microsecond-timescale congestion (microburst congestion) in the multi-gigabit Ethernet switch buffers along the return path. The synchronized responses can overflow the switch buffers, resulting in packet losses. TCP reacts far too slowly to packet losses in high-bandwidth, low-RTT data centre networks. Consequently, any packet loss caused by microburst congestion triggers TCP retransmissions and stalls the higher layers' data-gathering pipeline, unnecessarily increasing the query response completion time. The result is catastrophic TCP throughput collapse: the queuing delay of the flows increases and the perceived application-level throughput falls far below the link bandwidth.
This seminar introduces the nature of TCP incast traffic and shows how CAIA's NS-3/FreeBSD Network Stack VM Appliance (Version 0.1) can be used as a test bed for simulating various TCP incast scenarios. It presents experimental results and analysis of the TCP incast simulations, showing how the query response completion time is affected by the number of concurrent responders, the ratio of response size to queue buffer size, and packet losses. It also explores possible solutions and shows how the M/D/1 queuing model can be used for theoretical analysis of a system under load in the context of TCP incast.
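As a rough illustration of the M/D/1 analysis mentioned above, the sketch below evaluates the standard M/D/1 formulas for mean queueing delay and mean queue length. It is not the seminar's own analysis. The function name `md1_metrics`, the 1500-byte packet size and the 1 Gbit/s link rate are assumptions chosen for illustration only. The model itself assumes Poisson arrivals, a fixed service time and a single server, which synchronized incast bursts may not satisfy.

```python
def md1_metrics(arrival_rate, service_time):
    """Return (utilisation, mean wait in queue, mean queue length) for M/D/1.

    arrival_rate: mean packet arrival rate (packets per second), Poisson assumed
    service_time: fixed time to transmit one packet (seconds)
    """
    rho = arrival_rate * service_time  # utilisation of the output link
    if not 0 <= rho < 1:
        raise ValueError("M/D/1 is only stable for utilisation below 1")
    # Pollaczek-Khinchine result for deterministic service times
    wait = rho * service_time / (2 * (1 - rho))
    # Little's law: mean number waiting = arrival rate * mean wait
    queue_len = arrival_rate * wait
    return rho, wait, queue_len


if __name__ == "__main__":
    # Assumed example values: 1500-byte packets on a 1 Gbit/s link
    service_time = 1500 * 8 / 1e9
    for load in (0.5, 0.9, 0.99):
        rho, wait, queue_len = md1_metrics(load / service_time, service_time)
        print(f"load={rho:.2f} wait={wait * 1e6:.1f} us queue={queue_len:.2f} packets")
```

Both quantities grow without bound as utilisation approaches 1, which matches the abstract's description of queuing delay rising sharply under load.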
||Jonathan is in the final year of the Bachelor of Engineering (Telecommunication and Network Engineering) at Swinburne University of Technology. He is currently working on the Incast TCP Congestion Control research project as part of the 2013/2014 summer internship program at the Centre for Advanced Internet Architectures (CAIA).