TCP Configuration Just Got Easier: Autobuffer Tuning
One of the hardest things about configuring a TCP profile for optimum performance is picking the right buffer sizes. Guess too small, and your connection can't utilize the available bandwidth. Guess too large, and you're wasting system memory and potentially adding to path latency through the phenomenon of "bufferbloat." But if you get the path Bandwidth-Delay Product (BDP) right, you're in Nirvana: close to full link utilization without packet loss or latency spikes due to overflowing queues.

Beginning in F5® TMOS® 13.0, help has arrived with F5's new "autobuffer tuning" feature. Check the "Auto Proxy Buffer", "Auto Receive Window", and "Auto Send Buffer" boxes in your TCP profile configuration, and you need not worry about those buffer sizes any more.

What it Does

The concept is simple. To get a bandwidth-delay product, we need the bandwidth and the delay. We have a good idea of the delay from TCP's round-trip-time (RTT) measurement. In particular, the minimum observed RTT is a good indicator of the delay when queues aren't built up from over-aggressive flows. The bandwidth is a little trickier to measure. For the send buffer, the algorithm looks at long-term averages of arriving acks to estimate how quickly data is arriving at the destination. For the receive buffer, it's fairly straightforward to count the incoming bytes.

The buffers start at 64 KB. When the BDP calculation suggests that's not enough, the algorithm increments the buffers upwards and takes new measurements. After a few iterations, your connection buffer sizes should converge on something approaching the path BDP, plus a small bonus to cover measurement imprecision and leave space for later bandwidth increases.

Knobs! Lots of Knobs!

There are no configuration options in the profile to control autotuning except to turn it on and off. We figure you don't want to tune your autotuning! However, for inveterate optimizers, there are some sys db variables under the hood to make this feature behave exactly how you want.

For send buffers, the algorithm computes bandwidth and updates the buffer size every tm.tcpprogressive.sndbufmininterval milliseconds (default 100). The send buffer size is determined by (bandwidth_max * RTTmin) * tm.tcpprogressive.sndbufbdpmultiplier + tm.tcpprogressive.sndbufincr. The defaults for the multiplier and increment are 1 and 64 KB, respectively. Both of these quantities exist to provide a little "wiggle room" to discover newly available bandwidth and provision for measurement imprecision. The initial send buffer size starts at tm.tcpprogressive.sndbufmin (default 64 KB) and is limited to tm.tcpprogressive.sndbufmax (default 16 MB). For receive buffers, replace 'sndbuf' with 'rcvbuf' above.

For proxy buffers, the high watermark is MAX(send buffer size, peer TCP's receive window + tm.tcpprogressive.proxybufoffset) and the low watermark is (proxy buffer high) - tm.tcpprogressive.proxybufoffset. The proxy buffer high is limited by tm.tcpprogressive.proxybufmin (default 64 KB) and tm.tcpprogressive.proxybufmax (default 2 MB). When send or receive buffers change, proxy buffers are updated too.
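To make the arithmetic concrete, here is a minimal Tcl sketch of those formulas applied to a hypothetical path: 20 Mbit/s of estimated bandwidth and a 40 ms minimum RTT. The path numbers and the proxybufoffset value are assumptions chosen for illustration, not values taken from a real system.

# Illustrative arithmetic only; inputs below are assumptions, not measurements.
set bw_bytes_per_sec [expr {20000000 / 8}]   ;# assumed estimated bandwidth: 20 Mbit/s
set rtt_min 0.040                            ;# assumed minimum observed RTT: 40 ms
set multiplier 1                             ;# tm.tcpprogressive.sndbufbdpmultiplier default
set increment 65536                          ;# tm.tcpprogressive.sndbufincr default (64 KB)

# Send buffer target: (bandwidth_max * RTTmin) * multiplier + increment
set sndbuf [expr {int($bw_bytes_per_sec * $rtt_min) * $multiplier + $increment}]
puts "send buffer target: $sndbuf bytes"     ;# BDP (~100 KB) + 64 KB = 165,536 bytes

# Proxy buffer watermarks, assuming the peer's receive window is roughly the BDP
# and using an illustrative 64 KB stand-in for tm.tcpprogressive.proxybufoffset.
set peer_rwnd [expr {int($bw_bytes_per_sec * $rtt_min)}]
set proxybuf_offset 65536
set proxy_high [expr {max($sndbuf, $peer_rwnd + $proxybuf_offset)}]
set proxy_low [expr {$proxy_high - $proxybuf_offset}]
puts "proxy buffer high/low: $proxy_high / $proxy_low bytes"

On a real system the measurement and the multiply/add are repeated every sndbufmininterval, so the buffer ratchets up over a few iterations rather than jumping straight to the final value.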
This May Not Be For Some Users

Some of you out there already have a great understanding of your network, have solid estimates of BDPs, and have configured your buffers accordingly. You may be better off sticking with your carefully measured settings. Autobuffer tuning starts out with no knowledge of the network and converges on the right setting. That's inferior to knowing the correct setting beforehand and going right to it.

Autotuning Simplifies TCP Configuration

We've heard from the field that many people find TCP profiles too hard to configure. Together with the Autonagle option, autobuffer tuning is designed to take some of the pain out of getting the most from your TCP stack. If you don't know where to start with setting buffer sizes, turn on autotuning and let the BIG-IP® set them for you.

The TCP Proxy Buffer
The proxy buffer is probably the least intuitive of the three TCP buffer sizes that you can configure in F5's TCP Optimization offering. Today I'll describe what it does, and how to set the "high" and "low" buffer limits in the profile.

The proxy buffer is the place BIG-IP stores data that isn't ready to go out to the remote host. The send buffer, by definition, holds data already sent but unacknowledged. Everything else is in the proxy buffer. That's really all there is to it.

From this description, it should be clear why we need limits on the size of this buffer. Probably the most common deployment of a BIG-IP has a connection to the server that is far faster than the connection to the client. In these cases, data will simply accumulate at the BIG-IP as it waits to pass through the bottleneck of the client connection. This consumes precious resources on the BIG-IP, instead of commodity servers. So proxy-buffer-high is simply a limit where the BIG-IP will tell the server, "enough." proxy-buffer-low is when it will tell the server to start sending data again. The gap between the two is simply hysteresis: if proxy-buffer-high were the same as proxy-buffer-low, we'd generate tons of start/stop signals to the server as the buffer level bounced above and below the threshold. We like that gap to be about 64 KB, as a rule of thumb.

So how does it tell the server to stop? TCP simply stops increasing the receive window: once the advertised bytes available have been sent, TCP will advertise a zero receive window. This stops server transmissions (except for some probes) until the BIG-IP signals it is ready again by sending an acknowledgment with a non-zero receive window advertisement.

Setting a very large proxy-buffer-high will obviously increase the potential memory footprint of each connection. But what is the impact of setting a low one? On the sending side, the worst-case scenario is that a large chunk of the send buffer clears at once, probably because a retransmitted packet allows acknowledgement of a missing packet and a bunch of previously received data. At worst, this could cause the entire send buffer to empty and cause the sending TCP to ask the proxy buffer to accept a whole send buffer's worth of data. So if you're not that worried about the memory footprint, the safe thing is to set proxy-buffer-high to the same size as the send buffer.

The limits on proxy-buffer-low are somewhat more complicated to derive, but the issue is that if a proxy buffer at proxy-buffer-low suddenly drains, it will take one serverside Round Trip Time (RTT) to send the window update and start getting data again. So the total amount of data that has to be in the proxy buffer at the low point is the RTT of the serverside times the bandwidth of the clientside. If the proxy buffer is filling up, the serverside rate generally exceeds the clientside data rate, so that will be sufficient.

If you're not deeply concerned about the memory footprint of connections, the minimum proxy buffer settings that will prevent any impairment of throughput are as follows for the clientside:

proxy-buffer-high = send-buffer-size = (clientside bandwidth) * (clientside RTT)
proxy-buffer-low = (clientside bandwidth) * (serverside RTT)

proxy-buffer-low must be sufficiently below proxy-buffer-high to avoid flapping. If you are running up against memory limits, then cutting back on these settings will only hurt you in the cases above.
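As a worked example, here is a small Tcl sketch of those two formulas for a hypothetical deployment: a 50 Mbit/s clientside link with a 60 ms clientside RTT, and a 2 ms serverside RTT to a nearby server. All of the path numbers are assumptions for illustration only.

# Illustrative arithmetic only; the path numbers below are assumptions, not measurements.
set client_bw [expr {50000000 / 8}]    ;# clientside bandwidth: 50 Mbit/s -> 6,250,000 bytes/s
set client_rtt 0.060                   ;# clientside RTT: 60 ms
set server_rtt 0.002                   ;# serverside RTT: 2 ms (server on the local network)

# proxy-buffer-high = send-buffer-size = clientside bandwidth * clientside RTT
set proxy_high [expr {int($client_bw * $client_rtt)}]    ;# 375,000 bytes
# proxy-buffer-low = clientside bandwidth * serverside RTT
set proxy_low [expr {int($client_bw * $server_rtt)}]     ;# 12,500 bytes

# Keep the gap at least ~64 KB to avoid flapping, per the rule of thumb above.
if { $proxy_high - $proxy_low < 65536 } {
    set proxy_low [expr {$proxy_high - 65536}]
}
puts "proxy-buffer-high: $proxy_high  proxy-buffer-low: $proxy_low"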
Economizing on proxy buffer space is definitely preferable to limiting the send rate by making the send buffer too small.

Is TCP's Nagle Algorithm Right for Me?
Of all the settings in the TCP profile, the Nagle algorithm may get the most questions. Designed to avoid sending small packets wherever possible, the question of whether it's right for your application rarely has an easy, standard answer.

What does Nagle do?

Without the Nagle algorithm, in some circumstances TCP might send tiny packets. In the case of BIG-IP®, this would usually happen because the server delivers packets that are small relative to the clientside Maximum Transmission Unit (MTU). If Nagle is disabled, BIG-IP will simply send them, even though waiting for a few milliseconds would allow TCP to aggregate data into larger packets.

The result can be pernicious. Every TCP/IP packet has at least 40 bytes of header overhead, and in most cases 52 bytes. If payloads are small enough, most of your network traffic will be overhead, reducing the effective throughput of your connection. Second, clients with battery limitations really don't appreciate turning on their radios to send and receive packets more frequently than necessary. Lastly, some routers in the field give preferential treatment to smaller packets. If your data has a series of differently-sized packets, and the misfortune to encounter one of these routers, it will experience severe packet reordering, which can trigger unnecessary retransmissions and severely degrade performance.

Specified in RFC 896 all the way back in 1984, the Nagle algorithm gets around this problem by holding sub-MTU-sized data until the receiver has acked all outstanding data. In most cases, the next chunk of data is coming up right behind, and the delay is minimal.

What are the Drawbacks?

The benefits of aggregating data in fewer packets are pretty intuitive. But under certain circumstances, Nagle can cause problems:

In a proxy like BIG-IP, rewriting arriving packets in memory into a different, larger spot in memory taxes the CPU more than simply passing payloads through without modification.

If an application is "chatty," with message traffic passing back and forth, the added delay could add up to a lot of time. For example, imagine a network with a 1500-byte MTU and an application that needs a reply from the client after each 2000-byte message. Without Nagle, BIG-IP sends all the data in one shot, and the reply comes in one round trip, allowing it to deliver four messages in four round trips. With Nagle enabled, Nagle withholds the 500-byte packet until the client acks the 1500-byte packet, meaning it takes two round trips to get the reply that allows the application to proceed. Thus sending four messages takes eight round trips. This scenario is a somewhat contrived worst case, but if your application is more like this than not, then Nagle is a poor choice. (The sketch after this list walks through the arithmetic.)

If the client is using delayed acks (RFC 1122), it might not send an acknowledgment until up to 500 ms after receipt of the packet. That's time BIG-IP is holding your data, waiting for acknowledgment. This multiplies the effect on chatty applications described above.
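Here is a short Tcl sketch of that worst-case arithmetic. It simply counts round trips under the assumptions in the example above (1500-byte MTU, 2000-byte messages, one client reply required per message, and classic Nagle holding the sub-MTU tail until the previous segment is acked). It illustrates the example; it is not a model of the TMOS implementation.

# Count round trips needed to exchange N request/reply messages.
# Assumptions from the example above: 1500-byte MTU, 2000-byte messages,
# and the application waits for a client reply after every message.
set mtu 1500
set msg_size 2000
set messages 4

# Without Nagle: the 1500-byte segment and the 500-byte tail go out together,
# so each message costs one round trip for the reply.
set rtts_without [expr {$messages * 1}]

# With classic Nagle: the 500-byte tail is held until the 1500-byte segment
# is acked, so each message costs two round trips before the reply arrives.
set segments_per_msg [expr {int(ceil(double($msg_size) / $mtu))}]   ;# 2
set rtts_with [expr {$messages * $segments_per_msg}]

puts "without Nagle: $rtts_without round trips"   ;# 4
puts "with Nagle:    $rtts_with round trips"      ;# 8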
F5 Has Improved on Nagle

The drawbacks described above sound really scary, but I don't want to talk you out of using Nagle at all. The benefits are real, particularly if your application servers deliver data in small pieces and the application isn't very chatty. More importantly, F5® has made a number of enhancements that remove a lot of the pain while keeping the gain:

Nagle-aware HTTP Profiles: all TMOS HTTP profiles send a special control message to TCP when they have no more data to send. This tells TCP to send what it has without waiting for more data to fill out a packet.

Autonagle: in TMOS v12.0, users can configure Nagle as "autotuned" instead of simply enabling or disabling it in their TCP profile. This mechanism starts out not executing the Nagle algorithm, but uses heuristics to test whether the receiver is using delayed acknowledgments on a connection; if not, it applies Nagle for the remainder of the connection. If delayed acks are in use, TCP will not wait to send packets but will still try to concatenate small packets into MSS-size packets when all are available. [UPDATE: v13.0 substantially improves this feature.]

One small packet allowed per RTT: beginning with TMOS® v12.0, when 'auto' mode has enabled Nagle, TCP will allow one unacknowledged undersize packet at a time, rather than zero. This speeds up sending the sub-MTU tail of any message while not allowing a continuous stream of undersized packets. This averts the nightmare scenario above completely.

Given these improvements, the Nagle algorithm is suitable for a wide variety of applications and environments. It's worth looking at both your applications and the behavior of your servers to see if Nagle is right for you.

Stop Using the Base TCP Profile!
[Update 1 Mar 2017: F5 has new built-in profiles in TMOS v13.0. Although the default profile settings still haven't changed, there is good news on that front as well.]

If the customer data I've seen is any indication, the vast majority of our customers are using the base 'tcp' profile to configure their TCP optimization. This has poor performance consequences, and I strongly encourage you to replace it immediately.

What's wrong with it?

The Buffers are too small. Both the receive and send buffers are limited to 64 KB, and the proxy buffer won't exceed 48 KB. If the bandwidth-delay product of your connection exceeds the send or receive buffer, which it will in most of today's internet for all but the smallest files and shortest delays, your applications will be limited not by the available bandwidth but by an arbitrary memory limitation. (The sketch after this list shows the resulting throughput ceiling.)

The Initial Congestion Window is too small. As the early thin-pipe, small-buffer days of the internet recede, the Internet Engineering Task Force (see IETF RFC 6928) increased the allowed size of a sender's initial burst. This allows more file transfers to complete in a single round trip and allows TCP to discover the true available bandwidth faster.

Delayed ACKs. The base profile enables Delayed ACK, which tries to reduce ACK traffic by waiting 200 ms to see if more data comes in. This incurs a serious performance penalty on SSL, among other upper-layer protocols.
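To see what that 64 KB cap costs, here is a minimal Tcl sketch of the window-limited throughput ceiling, which is roughly the window size divided by the round-trip time. The RTT values are illustrative assumptions.

# Window-limited throughput ceiling: roughly (window size) / RTT,
# regardless of how much bandwidth the path actually has.
set window 65536    ;# 64 KB send/receive buffer cap in the base 'tcp' profile

foreach rtt_ms {10 50 100 200} {
    set ceiling_bps [expr {($window * 8.0) / ($rtt_ms / 1000.0)}]
    puts [format "RTT %3d ms -> at most %.1f Mbit/s" $rtt_ms [expr {$ceiling_bps / 1e6}]]
}
# 10 ms -> ~52 Mbit/s, 100 ms -> ~5.2 Mbit/s: on longer paths the 64 KB cap,
# not the link, sets your top speed.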
What should you do instead?

The best answer is to build a custom profile based on your specific environment and requirements. But we recognize that some of you will find that daunting! So we've created a variety of profiles customized for different environments. Frankly, we should do some work to improve these profiles, but even today there are much better choices than the base 'tcp'.

If you have an HTTP profile attached to the virtual, we recommend you use tcp-mobile-optimized. This is true even if your clients aren't mobile. The name is misleading! As I said, the default profiles need work.

If you're just a bit more adventurous with your virtual with an HTTP profile, then mptcp-mobile-optimized will likely outperform the above. Besides enabling Multipath TCP (MPTCP) for clients that ask for it, it uses a more advanced congestion control ("Illinois") and rate pacing. We recognize, however, that if you're still using the base 'tcp' profile today then you're probably not comfortable with the newest, most innovative enhancements to TCP. So plain old tcp-mobile-optimized might be a more gentle step forward.

If your virtual doesn't have an HTTP profile, the best decision is to use a modified version of tcp-mobile-optimized or mptcp-mobile-optimized. Just derive a profile from whichever you prefer and disable the Nagle algorithm. That's it!

If you are absolutely dead set against modifying a default profile, then wam-tcp-lan-optimized is the next best choice. It doesn't really matter if the attached network is actually a LAN or the open internet.

Why did we create a default profile with undesirable settings? That answer is lost in the mists of time. But now it's hard to change: altering the profile from which all other profiles are derived would cause sudden changes in customer TCP behavior when they upgrade. Most would benefit, and many might not even notice, but we try not to surprise people.

Nevertheless, if you want a quick, cheap, and easy boost to your application performance, simply switch your TCP profile from the base to one of our other defaults. You won't regret it.

F5 Unveils New Built-In TCP Profiles
[Update 3/17: Some representative performance results are at the bottom.]

Longtime readers know that F5's built-in TCP profiles were in need of a refresh. I'm pleased to announce that in TMOS® version 13.0, available now, there are substantial improvements to the built-in profile scheme. Users expect defaults to reflect best common practice, and we've made a huge step towards that being true.

New Built-in Profiles

We've kept virtually all of the old built-in profiles, for those of you who are happy with them or have built other profiles that derive from them. But there are four new ones to load directly into your virtual servers or use as a basis for your own tuning.

The first three are optimized for particular network use cases: f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile are updated versions of tcp-wan-optimized, tcp-lan-optimized, and tcp-mobile-optimized. These adapt all settings to the appropriate link types, except that they don't enable the very newest features. If the hosts you're communicating with tend to use one kind of link, these are a great choice.

The fourth is f5-tcp-progressive. This is meant to be a general-use profile (like the tcp default), but it contains the very latest features for early adopters.

In our benchmark testing, we had the following criteria:

f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile achieved throughput at least as high, and often better, than the default tcp profile for that link type.

f5-tcp-progressive had equal or higher throughput than default TCP across all representative network types.

The relative performance of f5-tcp-wan/lan/mobile and f5-tcp-progressive on each link type will vary given the new features that f5-tcp-progressive enables.

Living, Read-Only Profiles

These four new profiles, and the default 'tcp' profile, are now "living." This means that we'll continually update them with best practices as they evolve. Brand-new features, if they are generally applicable, will immediately appear in f5-tcp-progressive. For our more conservative users, these new features will appear in the other four living profiles after a couple of releases. The default tcp profile hasn't changed yet, but it will in future releases!

These five profiles are also now read-only, meaning that to make modifications you'll have to create a new profile that descends from them. This will aid in troubleshooting. If there are any settings that you like so much that you never want them to change, simply click the "custom" button in the child profile, and the changes we push out in the future won't affect your settings.

How This Affects Your Existing Custom Profiles

If you've put thought into your TCP profiles, we aren't going to mess with them. If your profile descends from any of the previous built-ins besides the default 'tcp,' there is no change to settings whatsoever. Upgrades to 13.0 will automatically prevent disruptions to your configuration. We've copied all of the default tcp profile settings to tcp-legacy, which is not a "living" profile. All of the old built-in profiles (like tcp-wan-optimized), and any custom profiles descended from the default tcp, will now descend instead from tcp-legacy and never change due to upgrades from F5. tcp-legacy will also include any modifications you made to the default tcp profile, as this profile is not read-only.

Our data shows that few, if any, users are using the current (TMOS 12.1 and before) tcp-legacy settings. If you are, it is wise to make a note of those settings before you upgrade.
How This Affects Your Existing Virtual Servers

As the section above describes, if your virtual server uses any profile other than the default 'tcp' or tcp-legacy, there will be no settings change at all. Given the weaknesses of the current default settings, we believe most users who use virtuals with the TCP default are not carefully considering their settings. Those virtuals will continue to use the default profile, and therefore their settings will begin to evolve as we modernize the default profile in 13.1 and later releases. If you very much like the default TCP profile, perhaps because you customized it when it wasn't read-only, you should manually change the virtual to use tcp-legacy, with no change in behavior.

Use the New Profiles for Better Performance

The internet changes. Bandwidths increase, we develop better algorithms to automatically tune your settings, and the TCP standard itself evolves. If you use the new profile framework, you'll keep up with the state of the art and maximize the throughput your applications receive.

Below, I've included some throughput measurements from our in-house testing. We used parameters representative of seven different link types and measured the throughput using some relevant built-in profiles. Obviously, the performance in your deployment may vary. Aside from LANs, where frankly tuning isn't all that hard, the benefits are pretty clear.

Investigating the LTM TCP Profile: Max Syn Retransmissions & Idle Timeout
Introduction

The LTM TCP profile has over thirty settings that can be manipulated to enhance the experience between client and server. Because the TCP profile is applied to the virtual server, the flexibility exists to customize the stack (in both the client and server directions) for every application delivered by the LTM. In this series, we will dive into several of the configurable options and discuss the pros and cons of their inclusion in delivering applications.

Nagle's Algorithm
Max Syn Retransmissions & Idle Timeout
Windows & Buffers
Timers
QoS
Slow Start
Congestion Control Algorithms
Acknowledgements
Extended Congestion Notification & Limited Transmit Recovery
The Finish Line

A quick aside for those unfamiliar with TCP: the transmission control protocol (layer 4) rides on top of the internet protocol (layer 3) and is responsible for establishing connections between clients and servers so data can be exchanged reliably between them. Normal TCP communication consists of a client and a server, a three-way handshake, reliable data exchange, and a four-way close. With the LTM as an intermediary in the client/server architecture, the session setup/teardown is duplicated, with the LTM playing the role of server to the client and client to the server. These sessions are completely independent, even though the LTM can duplicate the TCP source port over to the serverside connection in most cases and, depending on your underlying network architecture, can also duplicate the source IP.

Max Syn Retransmission

This option specifies the maximum number of times the LTM will resend a SYN packet without receiving a corresponding SYN ACK from the server. The default value was four in versions 9.0 - 9.3, and is three in versions 9.4+. This option has iRules considerations with the LB_FAILED event. One of the triggers for the event is an unresponsive server, but the timeliness of this trigger is directly related to the max syn retransmission setting. The back-off timer algorithm for SYN packets effectively doubles the wait time from the previous SYN, so the delay grows excessive with each additional retransmission allowed before the LTM closes the connection:

Retransmission Timers    v9.0-v9.3   v9.4   Custom-2   Custom-1
Initial SYN              0s          0s     0s         0s
1st Retransmitted SYN    3s          3s     3s         3s
2nd Retransmitted SYN    6s          6s     6s         NA
3rd Retransmitted SYN    12s         12s    NA         NA
4th Retransmitted SYN    24s         NA     NA         NA
LB_FAILED triggered      45s         21s    9s         3s

Tuning this option down may result in faster response on your LB_FAILED trigger, but keep in mind the opportunity for false positives if your server gets too busy. Note that monitors are the primary means to ensure available services, but the max syn retransmission setting can assist. If the LB_FAILED event does trigger, you can check the monitor status in your iRule, and if the monitor has not yet been marked down, you can do so to prevent other new connections from waiting:

when LB_FAILED {
  # If the member still shows as up, mark it down so new connections don't wait
  if { [LB::status pool [LB::server pool] member [LB::server addr]] eq "up" } {
    LB::down
  }
}
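The timings in the table above are just a doubling back-off that starts at 3 seconds. This short Tcl sketch reproduces the LB_FAILED trigger times for a given max syn retransmission setting; it is a sketch of the arithmetic, not of TMOS internals.

# Time until LB_FAILED fires, assuming a 3s initial SYN retransmit timer
# that doubles on each retry (3s, 6s, 12s, 24s, ...).
proc lb_failed_delay {max_syn_retrans} {
    set delay 0
    set timer 3
    for {set i 0} {$i < $max_syn_retrans} {incr i} {
        set delay [expr {$delay + $timer}]
        set timer [expr {$timer * 2}]
    }
    return $delay
}

foreach setting {1 2 3 4} {
    puts "max syn retransmissions $setting -> LB_FAILED after [lb_failed_delay $setting]s"
}
# 1 -> 3s, 2 -> 9s, 3 -> 21s, 4 -> 45s, matching the table above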
Idle Timeout

The explanation of the idle timeout is fairly intuitive. This setting controls the number of seconds the connection remains idle before the LTM closes it. For most applications, the default 300 seconds is more than enough, but for applications with long-lived connections like remote desktop protocol, the user may want to leave the desk and get a cup of coffee without getting dumped, while the administrators don't want to enable keepalives. The option can be configured with a numeric setting in seconds, or can be set to indefinite, in which case abandoned connections will sit idle until a reaper reclaims them or services are restarted.

I try to isolate applications onto their own virtual servers so I can maximize the profile settings, but in the case where a wildcard virtual is utilized, the idle timeout can be set in an iRule with the IP::idle_timeout command:

when CLIENT_ACCEPTED {
  # Set the idle timeout per destination port on a wildcard virtual
  switch [TCP::local_port] {
    "22"    { IP::idle_timeout 600 }
    "23"    { IP::idle_timeout 600 }
    "3389"  { IP::idle_timeout 3600 }
    default { IP::idle_timeout 120 }
  }
}

If you look at the connection table, the current and the maximum (in parentheses) idle values are shown:

b conn client 10.1.1.1 show all | grep -v pkts
VIRTUAL 10.1.1.100:3389 <-> NODE any6:any
CLIENTSIDE 10.1.1.1:36023 <-> 10.1.1.100:3389
SERVERSIDE 10.1.1.100:36023 <-> 10.1.1.50:3389
PROTOCOL tcp UNIT 1 IDLE 124 (3600) LASTHOP 1 00:02:0a:13:ef:80

Next week, we'll take a look at windows and buffers.