TCP Pace Yourself

TCP congestion control strives to optimize network goodput while minimizing packet loss by moderating transmission speed. Unfortunately, the way that congestion control moderates transmission often results in microbursts, which can overflow buffers on switches and routers. The resulting packet loss in turn causes congestion control to slow transmission by reducing the congestion window; TCP then ramps up again and the cycle repeats. This oscillation between under- and over-utilization of the network, combined with retransmission delays due to packet loss, causes application performance problems and a poor user experience.

How does congestion control work?

TCP moderates its transmission speed by calculating a target number of packets to keep in flight. This target is the congestion window (cwnd). Typically, the congestion window starts out small and is increased each time the remote system acknowledges receipt of a packet. In addition to adjusting the congestion window, TCP compares it to the number of packets currently in flight on the network. Whenever the congestion window is greater than the number of packets in flight, the stack transmits the difference between the two.
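The accounting described above can be sketched as a toy model. This is a minimal sketch, not F5's or any real stack's implementation: it assumes a slow-start sender that counts whole segments (real stacks count bytes) and grows the window by one segment per acknowledged segment. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Toy model of TCP send-side accounting, counted in whole segments."""
    cwnd: int = 10       # congestion window in segments (hypothetical start value)
    in_flight: int = 0   # segments sent but not yet acknowledged

    def segments_to_send(self) -> int:
        # Transmit the difference between the window and what is already in flight.
        return max(self.cwnd - self.in_flight, 0)

    def send(self) -> int:
        n = self.segments_to_send()
        self.in_flight += n
        return n

    def on_ack(self, segments_acked: int) -> int:
        # Acknowledged segments have left the network...
        self.in_flight -= segments_acked
        # ...and, in this slow-start model, cwnd grows by one per segment acked.
        self.cwnd += segments_acked
        # The window is refilled immediately; this is where bursts come from.
        return self.send()


s = Sender()
print(s.send())      # 10: the whole initial window goes out at once
print(s.on_ack(1))   # 2: one freed slot plus one slot of window growth
```

Each ACK for a single segment releases two new segments in this model, which is the mechanism behind the doubling described later in the article.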

When do packet bursts occur?

Stretch ACKs are acknowledgements that cover two or more segments of unacknowledged data. Depending on network conditions, a single stretch ACK can acknowledge all of the packets in flight. When this happens, the TCP stack’s count of packets in flight drops to zero, and the stack transmits a sudden burst of traffic to bring the in-flight total back up to the congestion window.
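To make the stretch ACK effect concrete, the sketch below counts how many segments are sent back-to-back after each ACK. It assumes a fixed congestion window (no growth, as in steady state) that starts full; the function name is hypothetical.

```python
def bursts(cwnd: int, ack_sizes: list[int]) -> list[int]:
    """Segments sent back-to-back after each ACK.

    Assumes a fixed congestion window (no growth) that starts completely full.
    ack_sizes[i] is how many segments the i-th ACK covers.
    """
    in_flight = cwnd
    sent = []
    for acked in ack_sizes:
        in_flight -= acked
        burst = cwnd - in_flight  # refill the window to cwnd
        in_flight += burst
        sent.append(burst)
    return sent


# Ten segments in flight, acknowledged one at a time: one new segment per ACK.
print(bursts(10, [1] * 10))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# The same ten segments covered by a single stretch ACK: one 10-segment burst.
print(bursts(10, [10]))      # [10]
```

The total amount sent is identical in both cases; only the spacing differs, and it is the spacing that determines whether a downstream buffer overflows.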

The exponential growth phase of TCP (slow start) can also trigger packet bursts. During this phase, the TCP stack doubles the number of packets in flight each round-trip time. On high-bandwidth, high-delay networks, these packets travel as a tightly packed group. When they arrive at the receiver, ACKs are generated in response, and because the data packets arrived so close together, the ACKs also travel as a group. When this tight cluster of returning ACKs reaches the sending system, it triggers a burst twice as large as the previous one.
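The doubling can be traced round by round. This is an idealized sketch that assumes, as the paragraph describes, that each round's ACKs return in one tight cluster, that every ACK covers exactly one segment, and that no loss occurs; the function name is hypothetical.

```python
def slow_start_bursts(initial_cwnd: int, rounds: int) -> list[int]:
    """Segments sent back-to-back in each round trip during slow start.

    Assumes all ACKs for a round arrive as one cluster, each ACK covers one
    segment, cwnd grows by one segment per ACK, and nothing is lost.
    """
    cwnd = initial_cwnd
    burst_sizes = [cwnd]  # round 0: the initial window leaves at once
    for _ in range(rounds - 1):
        acks = cwnd  # one ACK per segment sent in the previous round
        # Each ACK frees one slot and adds one slot: two segments per ACK.
        cwnd += acks
        burst_sizes.append(2 * acks)
    return burst_sizes


print(slow_start_bursts(10, 5))  # [10, 20, 40, 80, 160]
```

By the fifth round trip, a flow that started with a 10-segment window is releasing 160 segments in a single cluster, which a shallow switch buffer may not be able to absorb.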

Rate Pacing to the Rescue

To mitigate this bursty behavior, F5 introduced rate pacing to TCP Express in BIG-IP v11.5. Rate pacing analyzes traffic on a per-flow basis to determine the best speed at which to transmit packets. It sends packets at the rate of the slowest-draining buffer, resulting in much smoother packet transmission, as illustrated in the accompanying figure. By sending data at a steady pace, rate pacing keeps large bursts of packets off the network, where they would otherwise cause buffer overflows.
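The general idea of pacing can be illustrated by spreading one window of segments evenly across a round trip instead of sending it back-to-back. This sketch uses a common textbook heuristic, pacing rate = cwnd × MSS / SRTT; it is NOT F5's algorithm, which the article describes only as matching the slowest-draining buffer on the path. The function name and the example values (20 segments, 1460-byte MSS, 40 ms SRTT) are assumptions for illustration.

```python
def paced_send_times(cwnd_segments: int, mss_bytes: int, srtt_s: float,
                     start_s: float = 0.0) -> list[float]:
    """Send timestamps (seconds) that spread one window across one SRTT.

    Uses the heuristic pacing rate cwnd * MSS / SRTT. This is an assumption
    for illustration, not the pacing algorithm used by BIG-IP.
    """
    rate_bytes_per_s = cwnd_segments * mss_bytes / srtt_s
    gap_s = mss_bytes / rate_bytes_per_s  # equivalent to srtt_s / cwnd_segments
    return [start_s + i * gap_s for i in range(cwnd_segments)]


times = paced_send_times(cwnd_segments=20, mss_bytes=1460, srtt_s=0.040)
print(f"gap between segments: {(times[1] - times[0]) * 1e3:.1f} ms")  # 2.0 ms
```

Without pacing, all 20 segments would leave at line rate in a fraction of a millisecond; with this heuristic they leave 2 ms apart. Real stacks commonly pace somewhat faster than cwnd/SRTT so the window can keep growing (Linux, for example, scales its pacing rate by the `tcp_pacing_ss_ratio` and `tcp_pacing_ca_ratio` sysctls).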






Published Mar 14, 2014
Version 1.0



  • What about increasing the initial congestion window size to 10? Would it have the same effect?



  • Jose, rate pacing works in conjunction with the initial congestion window. Many of our TCP profiles set the initial congestion window to 10, as that helps significantly with short-lived TCP connections like web transactions. A larger initial congestion window speeds up TCP start-up, and many web transactions complete before exiting slow start. However, traffic bursts can still occur on the network with the initial congestion window set to 10. I recommend using both an initial congestion window of 10 and rate pacing for the best user experience.
  • Paul_Szabo_9016 (Historic F5 Account):
    If you're transmitting anything larger than ~15 KB, rate pacing helps significantly when the network is congested. Below that size, the initial window is what matters.



    For example, with this update to our TCP stack, I've seen that at 8:1 congestion, connected to a white-box TOR switch with a very small buffer, no RED, and no flow control, we caused the switch to drop ~0.01% of packets at fairly high concurrency. A TCP stack without rate pacing would perform more than 50x worse than that at 8:1 congestion with the same small switch buffer.



    This was a constant, overdriven load of 80 Gbps of traffic aimed at a 10 Gbps interface on a white-box TOR switch.
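The ~15 KB threshold in the thread lines up with an initial window of 10 segments: at a typical Ethernet MSS of 1460 bytes (an assumption; MSS varies by path), ten segments carry 14,600 bytes. The sketch below counts how many round trips an idealized slow start needs to deliver a response, comparing an initial window of 3 segments (the RFC 3390 limit for a 1460-byte MSS) with 10. It assumes no loss, no delayed ACKs, no receive-window limit, and a window that doubles every round trip; the function name is hypothetical.

```python
import math


def rtts_to_deliver(size_bytes: int, initial_cwnd: int, mss: int = 1460) -> int:
    """Round trips an idealized slow start needs to send size_bytes.

    Assumes no loss, no receive-window limit, and a congestion window that
    doubles every round trip starting from initial_cwnd segments.
    """
    if initial_cwnd < 1:
        raise ValueError("initial_cwnd must be at least one segment")
    segments_left = math.ceil(size_bytes / mss)
    cwnd, rtts = initial_cwnd, 0
    while segments_left > 0:
        segments_left -= cwnd  # one full window per round trip
        cwnd *= 2
        rtts += 1
    return rtts


print(10 * 1460)  # 14600 bytes fit in the first flight with IW10
for size in (14_600, 50_000, 200_000):
    print(size, rtts_to_deliver(size, 3), rtts_to_deliver(size, 10))
```

Under these assumptions, a 14,600-byte response takes 1 round trip with an initial window of 10 versus 3 with an initial window of 3, while a 200,000-byte response takes 4 versus 6. The initial window's head start shrinks relative to the total transfer as responses grow, which fits the observation that avoiding loss through pacing matters more for larger transfers.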