The TCP Send Buffer, In-Depth

Earlier this year, my guide to TCP Profile tuning set out some guidelines on how to set send-buffer-size in the TCP profile. Today I'll dive a little deeper into how the send buffer works, and why it's important. I also want to call your attention to cases where the setting doesn't do what you probably think it does.

What is the TCP Send Buffer?

The TCP send buffer contains all data sent to the remote host but not yet acknowledged by that host. With a few isolated exceptions*, data not yet sent is not in the send buffer; it remains in the proxy buffer, which is governed by different profile parameters.

The send buffer exists because sent data might need to be retransmitted. When an acknowledgment for some data arrives, that data no longer needs to be retransmitted, so TCP can free it.**

Each TCP connection takes system memory only when it has data to store in the buffer, but the profile sets a limit, called send-buffer-size, to cap the memory footprint of any one connection.
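To make the concept concrete, here is a toy model in Python (not BIG-IP code; the class and method names are mine) of a send buffer that holds only sent-but-unacknowledged data, frees it on acknowledgment, and is capped at a fixed size:

```python
# Toy model of the send buffer described above: it holds only data that has
# been sent but not yet acknowledged, and its total size is capped (analogous
# to send-buffer-size). This is an illustration, not BIG-IP source code.

class SendBuffer:
    def __init__(self, max_size: int):
        self.max_size = max_size      # cap on this connection's buffer memory
        self.unacked = bytearray()    # sent-but-unacknowledged bytes

    def space(self) -> int:
        return self.max_size - len(self.unacked)

    def send(self, data: bytes) -> bytes:
        """Buffer (and 'send') as much as fits; return what must wait upstream."""
        take = data[:self.space()]
        self.unacked += take
        return data[len(take):]

    def ack(self, n_bytes: int) -> None:
        """A cumulative acknowledgment frees the acknowledged prefix."""
        del self.unacked[:n_bytes]


buf = SendBuffer(max_size=5)
leftover = buf.send(b"0123456789")   # only 5 bytes fit; 5 wait in the "proxy buffer"
buf.ack(5)                           # an ACK arrives about one RTT later
leftover = buf.send(leftover)        # now the remaining bytes can be sent
```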

Note that there are two send buffers in most connections, as indicated in the figure above: one for data sent to the client, regulated by the clientside TCP profile; and one for data sent to the server, regulated by the serverside TCP profile.

Cases Where the Configured Send Buffer Limit Doesn't Apply

Through TMOS v12.1, there are important cases where the configured send buffer limit is not operative. It does not apply when the system variable tm.tcpprogressive.autobuffertuning is enabled, which is the default, AND at least one of the following attributes is set in the TCP profile:

  • MPTCP is enabled
  • Rate Pacing is enabled
  • Tail Loss Probe is enabled
  • TCP Fast Open is enabled
  • Nagle's Algorithm is in 'Auto' mode
  • Congestion Metrics Cache Timeout is greater than 0
  • The Congestion Control algorithm is Vegas, Illinois, Woodside, CHD, CDG, Cubic, or Westwood
  • The virtual server executes an iRule with the 'TCP::autowin enable' command
  • The system variable tm.tcpprogressive is set to 'enable' or 'mptcp' (the default value is 'negotiate')

Note that none of these settings apply to the default TCP profile, so the default profile enforces the send buffer limit.

Given the conditions above, the send buffer maximum is one of two values, instead of the configured one:

  1. If the configured send buffer size AND the configured receive buffer size are 64K or less, the maximum send buffer size is 64K.
  2. Otherwise, the maximum send buffer size is equal to the system variable tm.tcpprogressive.sndbufmax, which defaults to 16MB.
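To pull the pieces together, here is a minimal Python sketch (not BIG-IP source; the parameter names are my own shorthand) of the selection logic described above:

```python
# Sketch of the send-buffer ceiling selection described above (through v12.1).
# Parameter names are illustrative shorthand, not actual BIG-IP identifiers.

KB = 1024

def effective_send_buffer_max(
    configured_send_buf: int,              # the profile's send-buffer-size, in bytes
    configured_recv_buf: int,              # the profile's configured receive buffer, in bytes
    autobuffertuning: bool = True,         # tm.tcpprogressive.autobuffertuning (enabled by default)
    any_override_attribute: bool = False,  # MPTCP, Rate Pacing, TLP, Fast Open, etc. (see list above)
    sndbufmax: int = 16 * 1024 * KB,       # tm.tcpprogressive.sndbufmax, default 16MB
) -> int:
    if autobuffertuning and any_override_attribute:
        # The configured limit is ignored; one of two values applies instead.
        if configured_send_buf <= 64 * KB and configured_recv_buf <= 64 * KB:
            return 64 * KB
        return sndbufmax
    # Otherwise the configured send-buffer-size is enforced as-is.
    return configured_send_buf


# Example: a 32K/32K profile with, say, Rate Pacing enabled gets a 64K ceiling,
# not the configured 32K.
print(effective_send_buffer_max(32 * KB, 32 * KB, any_override_attribute=True))
```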

We fully recognize that this is not an intuitive way to operate, and we have plans to streamline it soon. However, note that you can force the configured send buffer limit to always apply by setting tm.tcpprogressive.autobuffertuning to 'disabled', or force it to never apply by enabling tm.tcpprogressive.

What if send-buffer-size is too small?

The send buffer size is a practical limit on how much data can be in flight at once. Say you have 10 packets to send (allowed by both congestion control and the peer's receive window) but only room for 5 in the send buffer. The other 5 will have to wait in the proxy buffer until at least the first 5 are acknowledged, which takes one full Round Trip Time (RTT).

Generally, this means your sending rate is firmly limited to

(Sending Rate) = (send-buffer-size) / RTT

regardless of the available bandwidth, the congestion and peer receive windows, and so on. Therefore, we recommend that your send buffer be set to at least (maximum achievable sending rate) * RTT, better known as the Bandwidth-Delay Product (BDP). There's more on getting the proper RTT below.
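To put numbers on that, here is a quick Python calculation (the figures are illustrative, not measurements):

```python
# Worked example of the ceiling above: rate <= send-buffer-size / RTT,
# and the buffer needed to sustain a target rate is the BDP.

def max_rate_bps(send_buffer_bytes: int, rtt_seconds: float) -> float:
    """Sending-rate ceiling imposed by the send buffer, in bits per second."""
    return send_buffer_bytes * 8 / rtt_seconds

def bdp_bytes(target_rate_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-Delay Product: buffer needed to sustain target_rate_bps."""
    return target_rate_bps / 8 * rtt_seconds

# A 64KB send buffer on a 100 ms path caps throughput at about 5.2 Mbps...
print(max_rate_bps(64 * 1024, 0.100))   # ~5.24e6 bits/s
# ...while sustaining 100 Mbps over that RTT needs roughly 1.25 MB of send buffer.
print(bdp_bytes(100e6, 0.100))          # 1.25e6 bytes
```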

What if send-buffer-size is too large?

If the configured size is larger than the bandwidth-delay product, your BIG-IP may allocate more memory per connection than it can actually use at any given time, reducing the capacity of your system.

A sending rate that exceeds the uncongested BDP of the path will cause router queues to build up and possibly overflow, resulting in packet losses. Although this is intrinsic to TCP's design, a sufficiently low send buffer size prevents TCP congestion control from reaching sending rates where it will obviously cause congestion losses.

An over-large send buffer may not matter, depending on the remote host's advertised receive window: BIG-IP will not bring data into the send buffer if the receive window says it can't be sent. The size of that receive window is limited by the TCP window scale option (see RFC 7323, Section 2.2) in the SYN packet.
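For reference, the window scale shift bounds how large that advertised window can ever be; the arithmetic (straight from RFC 7323, not BIG-IP specific) looks like this:

```python
# Largest receive window a peer can advertise for a given window scale shift.
# RFC 7323: the 16-bit window field is left-shifted by the negotiated shift,
# and the shift count itself is capped at 14.

def max_advertised_window(scale_shift: int) -> int:
    shift = min(scale_shift, 14)   # RFC 7323 caps the shift at 14
    return 0xFFFF << shift         # 65535-byte field, scaled up

for shift in (0, 7, 14):
    print(shift, max_advertised_window(shift))
# 0  ->        65,535 bytes (no scaling)
# 7  ->     8,388,480 bytes (~8 MB)
# 14 -> 1,073,725,440 bytes (~1 GB)
```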

How Do I Get the Bandwidth-Delay Product?

The profile tuning article gives some pointers on using iRules to figure out the bandwidth and RTT on certain paths, which I won't repeat here. TCP Analytics can also generate some useful data. And you may have various third-party tools (the simplest of which is "ping") to get one or both of these metrics.

When computing BDP, beware of the highest RTTs you observe on a path. Why? Bufferbloat. Over some intervals, TCP will send data faster than the bottleneck bandwidth, which fills up queues and adds to RTT. As a result, TCP's peak bandwidth will exceed the path's, and the highest RTT will include a lot of queueing time. This isn't good. A sending rate that includes self-induced queueing delay isn't getting data there any faster; instead, it's just increasing latency for itself and everybody else.

I wish I could give you more precise advice, but there are no easy answers here. To the extent you can probe the characteristics of the networks you operate on, you want to take (max bandwidth) * (min RTT) to find each path's BDP, and then take the maximum of all those path BDPs. Easier said than done! But perhaps this article has given you enough intuition about the problem to choose a better send-buffer-size setting.
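If you can collect per-path measurements, the arithmetic itself is straightforward; here is a short sketch (the paths and figures below are invented purely for illustration):

```python
# Sizing rule from above: per path, BDP = (max observed bandwidth) * (min observed RTT),
# then take the largest BDP across paths. All figures below are made-up examples.

paths = {
    # name: (max observed bandwidth in bits/s, min observed RTT in seconds)
    "local LAN":     (1e9,  0.001),
    "transatlantic": (50e6, 0.080),
    "mobile users":  (20e6, 0.120),
}

def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    return bandwidth_bps / 8 * rtt_seconds

largest = max(bdp_bytes(bw, rtt) for bw, rtt in paths.values())
print(f"set send-buffer-size to at least {largest:,.0f} bytes")   # 500,000 here
```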

In case a measurement program is not in the cards, I'll leave you with the chart below, which plots 64KB, 128KB, and 256KB buffer sizes against representative BDPs of various link types.

* If TCP blocks transmission due to the Nagle Algorithm or Rate Pacing, the unsent data will already be in the send buffer.

** A client can renege on a SACK (Selective Acknowledgment), so a SACK alone is not sufficient to free the SACKed data.
Published Oct 05, 2016
Version 1.0