Series: The TCP Profile
The TCP Send Buffer, In-Depth
Earlier this year, my guide to TCP Profile tuning set out some guidelines on how to set send-buffer-size in the TCP profile. Today I'll dive a little deeper into how the send buffer works, and why it's important. I also want to call your attention to cases where the setting doesn't do what you probably think it does.

What is the TCP Send Buffer?

The TCP send buffer contains all data sent to the remote host but not yet acknowledged by that host. With a few isolated exceptions*, data not yet sent is not in the buffer; it remains in the proxy buffer, which is the subject of different profile parameters. The send buffer exists because sent data might need to be retransmitted. When an acknowledgment for some data arrives, there will be no retransmission, and TCP can free that data.**

Each TCP connection will only take system memory when it has data to store in the buffer, but the profile sets a limit called send-buffer-size to cap the memory footprint of any one connection. Note that there are two send buffers in most connections, as indicated in the figure above: one for data sent to the client, regulated by the clientside TCP profile; and one for data sent to the server, regulated by the serverside TCP profile.

Cases Where the Configured Send Buffer Limit Doesn't Apply

Through TMOS v12.1, there are important cases where the configured send buffer limit is not operative. It does not apply when the system variable tm.tcpprogressive.autobuffertuning is enabled (which is the default) AND at least one of the following attributes is set in the TCP profile:

MPTCP enabled
Rate Pacing enabled
Tail Loss Probe enabled
TCP Fast Open enabled
Nagle's Algorithm in 'Auto' Mode
Congestion Metrics Cache Timeout > 0
The Congestion Control algorithm is Vegas, Illinois, Woodside, CHD, CDG, Cubic, or Westwood
The virtual server executes an iRule with the 'TCP::autowin enable' command
The system variable tm.tcpprogressive is set to 'enable' or 'mptcp' (the default value is 'negotiate')

Note that none of these settings apply to the default TCP profile, so the default profile enforces the send buffer limit. Given the conditions above, the send buffer maximum is one of two values, instead of the configured one:

If the configured send buffer size AND the configured receive buffer size are 64K or less, the maximum send buffer size is 64K.
Otherwise, the maximum send buffer size is equal to the system variable tm.tcpprogressive.sndbufmax, which defaults to 16MB.

We fully recognize that this is not an intuitive way to operate, and we have plans to streamline it soon. In the meantime, note that you can force the configured send buffer limit to always apply by setting tm.tcpprogressive.autobuffertuning to 'disabled', or force it to never apply by enabling tm.tcpprogressive.

What if send-buffer-size is too small?

The send buffer size is a practical limit on how much data can be in flight at once. Say you have 10 packets to send (allowed by both congestion control and the peer's receive window) but only 5 spaces in the send buffer. Then the other 5 will have to wait in the proxy buffer until at least the first 5 are acknowledged, which will take one full Round Trip Time (RTT). Generally, this means your sending rate is firmly limited to

(Sending Rate) = (send-buffer-size) / RTT

regardless of whatever available bandwidth there happens to be, the congestion and peer receive windows, and so on. The short example below makes this limit concrete.
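As a rough illustration of that ceiling, here is a minimal Python sketch. The 40 ms RTT and the buffer sizes are hypothetical values chosen only to show the relationship, not anything taken from a real profile.

# Illustration: the send buffer caps throughput at send-buffer-size / RTT,
# no matter how fast the underlying link is. All values below are made up.
def max_send_rate_mbps(send_buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on sending rate imposed by the send buffer, in Mb/s."""
    return (send_buffer_bytes * 8) / rtt_seconds / 1_000_000

if __name__ == "__main__":
    rtt = 0.040  # 40 ms round trip time (hypothetical)
    for buf in (65536, 131072, 262144, 1048576):
        print(f"send-buffer-size {buf:>8} bytes -> at most "
              f"{max_send_rate_mbps(buf, rtt):7.1f} Mb/s at {rtt * 1000:.0f} ms RTT")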
Therefore, we recommend that your send buffer be set to at least your (maximum achievable sending rate) * RTT, more generally known as the Bandwidth-Delay Product (BDP). There's more on getting the proper RTT below.

What if send-buffer-size is too large?

If the configured size is larger than the bandwidth-delay product, your BIG-IP may use more memory per connection than it can use at any given time, reducing the capacity of your system. A sending rate that exceeds the uncongested BDP of the path will cause router queues to build up and possibly overflow, resulting in packet losses. Although this is intrinsic to TCP's design, a sufficiently low send buffer size prevents TCP congestion control from reaching sending rates where it will obviously cause congestion losses.

An over-large send buffer may not matter, depending on the remote host's advertised receive window. BIG-IP will not bring data into the send buffer if the receive window says it can't be sent. The size of that receive window is limited by the TCP window scale option (see RFC 7323, Section 2.2) in the SYN packet.

How Do I Get the Bandwidth-Delay Product?

The profile tuning article gives some pointers on using iRules to figure out the bandwidth and RTT on certain paths, which I won't repeat here. TCP Analytics can also generate some useful data here. And you may have various third-party tools (the simplest of which is "ping") to get one or both of these metrics.

When computing BDP, beware of the highest RTTs you observe on a path. Why? Bufferbloat. Over some intervals, TCP will send data faster than the bottleneck bandwidth, which fills up queues and adds to RTT. As a result, TCP's peak bandwidth will exceed the path's, and the highest RTT will include a lot of queueing time. This isn't good. A sending rate that includes self-induced queueing delay isn't getting data there any faster; instead, it's just increasing latency for itself and everybody else.

I wish I could give you more precise advice, but there are no easy answers here. To the extent you can probe the characteristics of the networks you operate on, you want to take (max bandwidth) * (min RTT) to find each path's BDP, and take the maximum of all those path BDPs. Easier said than done! But perhaps this article has given you enough intuition about the problem to create a better send-buffer-size setting. In case a measurement program is not in the cards, I'll leave you with the chart below, which plots the BDP of 64KB, 128KB, and 256KB buffer sizes against representative BDPs of various link types.

* If TCP blocks transmission due to the Nagle Algorithm or Rate Pacing, the unsent data will already be in the send buffer.
** A client can renege on a SACK (Selective Acknowledgment), so a SACK alone is not sufficient to free the SACKed data.
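As a closing illustration, here is a small Python sketch of the "take (max bandwidth) * (min RTT) per path, then take the maximum across paths" recipe described above. The path names and measurements are invented placeholders; in practice you would gather them with ping, TCP Analytics, or iRules.

# Hedged sketch: derive a send-buffer-size candidate from per-path measurements.
# The measurement numbers below are invented for illustration only.
paths = {
    # path name: (max observed bandwidth in Mb/s, minimum observed RTT in ms)
    "office-lan":   (1000.0,  0.5),
    "regional-wan": (  50.0, 30.0),
    "mobile-users": (  10.0, 80.0),
}

def bdp_bytes(bandwidth_mbps: float, min_rtt_ms: float) -> int:
    """Bandwidth-Delay Product in bytes: (bits/second * seconds) / 8."""
    return int(bandwidth_mbps * 1_000_000 * (min_rtt_ms / 1000.0) / 8)

per_path = {name: bdp_bytes(bw, rtt) for name, (bw, rtt) in paths.items()}
for name, bdp in per_path.items():
    print(f"{name:>13}: BDP ~ {bdp:>9,} bytes")
# A send-buffer-size that avoids limiting any of these paths:
print("suggested send-buffer-size >=", max(per_path.values()), "bytes")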
F5 Unveils New Built-In TCP Profiles

[Update 3/17: Some representative performance results are at the bottom.]

Longtime readers know that F5's built-in TCP profiles were in need of a refresh. I'm pleased to announce that in TMOS® version 13.0, available now, there are substantial improvements to the built-in profile scheme. Users expect defaults to reflect best common practice, and we've made a huge step towards that being true.

New Built-in Profiles

We've kept virtually all of the old built-in profiles, for those of you who are happy with them, or have built other profiles that derive from them. But there are four new ones to load directly into your virtual servers or use as a basis for your own tuning.

The first three are optimized for particular network use cases: f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile are updated versions of tcp-wan-optimized, tcp-lan-optimized, and tcp-mobile-optimized. These adapt all settings to the appropriate link types, except that they don't enable the very newest features. If the hosts you're communicating with tend to use one kind of link, these are a great choice.

The fourth is f5-tcp-progressive. This is meant to be a general-use profile (like the tcp default), but it contains the very latest features for early adopters.

In our benchmark testing, we had the following criteria:

f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile achieved throughput at least as high, and often better, than the default tcp profile for that link type.
f5-tcp-progressive had equal or higher throughput than default TCP across all representative network types.

The relative performance of f5-tcp-wan/lan/mobile and f5-tcp-progressive in each link type will vary given the new features that f5-tcp-progressive enables.

Living, Read-Only Profiles

These four new profiles, and the default 'tcp' profile, are now "living." This means that we'll continually update them with best practices as they evolve. Brand-new features, if they are generally applicable, will immediately appear in f5-tcp-progressive. For our more conservative users, these new features will appear in the other four living profiles after a couple of releases. The default tcp profile hasn't changed yet, but it will in future releases!

These five profiles are also now read-only, meaning that to make modifications you'll have to create a new profile that descends from these. This will aid in troubleshooting. If there are any settings that you like so much that you never want them to change, simply click the "custom" button in the child profile and the changes we push out in the future won't affect your settings.

How This Affects Your Existing Custom Profiles

If you've put thought into your TCP profiles, we aren't going to mess with it. If your profile descends from any of the previous built-ins besides default 'tcp,' there is no change to settings whatsoever.

Upgrades to 13.0 will automatically prevent disruptions to your configuration. We've copied all of the default tcp profile settings to tcp-legacy, which is not a "living" profile. All of the old built-in profiles (like tcp-wan-optimized), and any custom profiles descended from default tcp, will now descend instead from tcp-legacy, and never change due to upgrades from F5. tcp-legacy will also include any modifications you made to the default tcp profile, as that profile is not read-only.

Our data shows that few, if any, users are using the current (TMOS 12.1 and before) tcp-legacy settings. If you are, it is wise to make a note of those settings before you upgrade.
How This Affects Your Existing Virtual Servers

As the section above describes, if your virtual server uses any profile other than default 'tcp' or tcp-legacy, there will be no settings change at all.

Given the weaknesses of the current default settings, we believe most users who use virtuals with the TCP default are not carefully considering their settings. Those virtuals will continue to use the default profile, and therefore settings will begin to evolve as we modernize the default profile in 13.1 and later releases. If you very much like the default TCP profile, perhaps because you customized it when it wasn't read-only, you should manually change the virtual to use tcp-legacy, with no change in behavior.

Use the New Profiles for Better Performance

The internet changes. Bandwidths increase, we develop better algorithms to automatically tune your settings, and the TCP standard itself evolves. If you use the new profile framework, you'll keep up with the state of the art and maximize the throughput your applications receive.

Below, I've included some throughput measurements from our in-house testing. We used parameters representative of seven different link types and measured the throughput using some relevant built-in profiles. Obviously, the performance in your deployment may vary. Aside from LANs, where frankly tuning isn't all that hard, the benefits are pretty clear.
The TCP Proxy Buffer

The proxy buffer is probably the least intuitive of the three TCP buffer sizes that you can configure in F5's TCP optimization offering. Today I'll describe what it does, and how to set the "high" and "low" buffer limits in the profile.

The proxy buffer is the place BIG-IP stores data that isn't ready to go out to the remote host. The send buffer, by definition, holds data already sent but unacknowledged. Everything else is in the proxy buffer. That's really all there is to it.

From this description, it should be clear why we need limits on the size of this buffer. Probably the most common deployment of a BIG-IP has a connection to the server that is way faster than the connection to the client. In these cases, data will simply accumulate at the BIG-IP as it waits to pass through the bottleneck of the client connection. This consumes precious resources on the BIG-IP, instead of on commodity servers. So proxy-buffer-high is simply a limit where the BIG-IP will tell the server, "enough." proxy-buffer-low is the point at which it will tell the server to start sending data again. The gap between the two is simply hysteresis: if proxy-buffer-high were the same as proxy-buffer-low, we'd generate tons of start/stop signals to the server as the buffer level bounced above and below the threshold. We like that gap to be about 64KB, as a rule of thumb.

So how does it tell the server to stop? TCP simply stops increasing the receive window: once the advertised bytes available have been sent, TCP will advertise a zero receive window. This stops server transmissions (except for some probes) until the BIG-IP signals it is ready again by sending an acknowledgment with a non-zero receive window advertisement.

Setting a very large proxy-buffer-high will obviously increase the potential memory footprint of each connection. But what is the impact of setting a low one? On the sending side, the worst-case scenario is that a large chunk of the send buffer clears at once, probably because a retransmitted packet allows acknowledgement of a missing packet and a bunch of previously received data. At worst, this could cause the entire send buffer to empty and cause the sending TCP to ask the proxy buffer to accept a whole send buffer's worth of data. So if you're not that worried about the memory footprint, the safe thing is to set proxy-buffer-high to the same size as the send buffer.

The limits on proxy-buffer-low are somewhat more complicated to derive, but the issue is that if a proxy buffer at proxy-buffer-low suddenly drains, it will take one serverside Round Trip Time (RTT) to send the window update and start getting data again. So the total amount of data that has to be in the proxy buffer at the low point is the RTT of the serverside times the bandwidth of the clientside. If the proxy buffer is filling up, the serverside rate generally exceeds the clientside data rate, so that will be sufficient.

If you're not deeply concerned about the memory footprint of connections, the minimum proxy buffer settings that will prevent any impairment of throughput are as follows for the clientside:

proxy-buffer-high = send-buffer-size = (clientside bandwidth) * (clientside RTT)
proxy-buffer-low = (clientside bandwidth) * (serverside RTT)

proxy-buffer-low must be sufficiently below proxy-buffer-high to avoid flapping. If you are running up against memory limits, then cutting back on these settings will only hurt you in the cases above.
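Here is a minimal Python sketch of those two clientside formulas. The bandwidth and RTT numbers are invented placeholders; plug in your own measurements.

# Hedged sketch of the clientside proxy buffer formulas above.
# Example inputs are invented; substitute measured values.
def clientside_buffer_settings(client_bw_mbps, client_rtt_ms, server_rtt_ms):
    """Return (proxy-buffer-high, proxy-buffer-low, send-buffer-size) in bytes."""
    client_bw_Bps = client_bw_mbps * 1_000_000 / 8            # bytes per second
    send_buffer = int(client_bw_Bps * client_rtt_ms / 1000)   # clientside BDP
    proxy_high = send_buffer                                   # high = send buffer size
    proxy_low = int(client_bw_Bps * server_rtt_ms / 1000)      # clientside bw * serverside RTT
    return proxy_high, proxy_low, send_buffer

high, low, sndbuf = clientside_buffer_settings(
    client_bw_mbps=20.0,   # 20 Mb/s client link (hypothetical)
    client_rtt_ms=60.0,    # 60 ms client RTT (hypothetical)
    server_rtt_ms=2.0,     # 2 ms server RTT (hypothetical)
)
print(f"send-buffer-size  ~ {sndbuf:,} bytes")
print(f"proxy-buffer-high ~ {high:,} bytes")
print(f"proxy-buffer-low  ~ {low:,} bytes (keep well below high to avoid flapping)")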
Economizing on proxy buffer space is definitely preferable to limiting the send rate by making the send buffer too small.
Stop Using the Base TCP Profile!

[Update 1 Mar 2017: F5 has new built-in profiles in TMOS v13.0. Although the default profile settings still haven't changed, there is good news on that front as well.]

If the customer data I've seen is any indication, the vast majority of our customers are using the base 'tcp' profile to configure their TCP optimization. This has poor performance consequences, and I strongly encourage you to replace it immediately.

What's wrong with it?

The buffers are too small. Both the receive and send buffers are limited to 64KB, and the proxy buffer won't exceed 48KB. If the bandwidth-delay product of your connection exceeds the send or receive buffer, which it will in most of today's internet for all but the smallest files and shortest delays, your applications will be limited not by the available bandwidth but by an arbitrary memory limitation.

The initial congestion window is too small. As the early thin-pipe, small-buffer days of the internet recede, the Internet Engineering Task Force (see IETF RFC 6928) increased the allowed size of a sender's initial burst. This allows more file transfers to complete in a single round trip time and allows TCP to discover the true available bandwidth faster.

Delayed ACKs. The base profile enables Delayed ACK, which tries to reduce ACK traffic by waiting 200ms to see if more data comes in. This incurs a serious performance penalty on SSL, among other upper-layer protocols.

What should you do instead?

The best answer is to build a custom profile based on your specific environment and requirements. But we recognize that some of you will find that daunting! So we've created a variety of profiles customized for different environments. Frankly, we should do some work to improve these profiles, but even today there are much better choices than base 'tcp'.

If you have an HTTP profile attached to the virtual, we recommend you use tcp-mobile-optimized. This is true even if your clients aren't mobile. The name is misleading! As I said, the default profiles need work.

If you're just a bit more adventurous with your virtual with an HTTP profile, then mptcp-mobile-optimized will likely outperform the above. Besides enabling Multipath TCP (MPTCP) for clients that ask for it, it uses a more advanced congestion control ("Illinois") and rate pacing. We recognize, however, that if you're still using the base 'tcp' profile today then you're probably not comfortable with the newest, most innovative enhancements to TCP. So plain old tcp-mobile-optimized might be a more gentle step forward.

If your virtual doesn't have an HTTP profile, the best decision is to use a modified version of tcp-mobile-optimized or mptcp-mobile-optimized. Just derive a profile from whichever you prefer and disable the Nagle algorithm. That's it! If you are absolutely dead set against modifying a default profile, then wam-tcp-lan-optimized is the next best choice. It doesn't really matter if the attached network is actually a LAN or the open internet.

Why did we create a default profile with undesirable settings? That answer is lost in the mists of time. But now it's hard to change: altering the profile from which all other profiles are derived will cause sudden changes in customer TCP behavior when they upgrade. Most would benefit, and many may not even notice, but we try not to surprise people. Nevertheless, if you want a quick, cheap, and easy boost to your application performance, simply switch your TCP profile from the base to one of our other defaults.
You won't regret it.
Tuning the TCP Profile, Part One

A few months ago I pointed out some problems with the existing F5-provided TCP profiles, especially the default one. Today I'll begin a pass through the (long) TCP profile to point out the latest thinking on how to get the most performance for your applications. We'll go in the order you see these profile options in the GUI.

But first, a note about programmability: in many cases below, I'm going to ask you to generalize about the clients or servers you interact with, and the nature of the paths to those hosts. In a perfect world, we'd detect that stuff automatically and set it for you, and in fact we're rolling that out setting by setting. In the meantime, you can customize your TCP parameters on a per-connection basis using iRules for many of the settings described below, something I'll explain further where applicable.

In general, when I refer to "performance" below, I'm referring to the speed at which your customer gets her data. Performance can also refer to the scalability of your application delivery due to CPU and memory limitations, and when that's what I mean, I'll say so.

Timer Management

The one here with a big performance impact is Minimum RTO. When TCP computes its Retransmission Timeout (RTO), it takes the average measured Round Trip Time (RTT) and adds a few standard deviations to make sure it doesn't falsely detect loss. (False detections have very negative performance implications.) But if RTT is low and stable that RTO may be too low, and the minimum is designed to catch known fluctuations in RTT that the connection may not have observed.

Set Minimum RTO too low, and TCP may improperly enter congestion response and reduce the sending rate all the way down to one packet per round trip. Set it too high, and TCP sits idle when it ought to retransmit lost data. So what's the right value? Obviously, if you have a sense of the maximum RTT to your clients (which you can get with the ping command), that's a floor for your value. Furthermore, many clients and servers will implement some sort of Delayed ACK, which reduces ACK volume by sometimes holding them back for up to 200ms to see if it can aggregate more data in the ACK. RFC 5681 actually allows delays of up to 500ms, but this is less common. So take the maximum RTT and add 200 to 500 ms.

Another group of settings isn't really about throughput; instead, it helps clients and servers close gracefully, at the cost of consuming some system resources. Long Close Wait, Fin Wait 1, Fin Wait 2, and Time Wait timers will keep connection state alive to make sure the remote host got all the connection close messages. Enabling Reset On Timeout sends a message that tells the peer to tear down the connection. Similarly, disabling Time Wait Recycle will prevent new connections from using the same address/port combination, making sure that the old connection with that combination gets a full close.

The last group of settings keeps possibly dead connections alive, using system resources to maintain state in case they come back to life. Idle Timeout and Zero Window Timeout commit resources until the timer expires. If you set Keep Alive Interval to a value less than the Idle Timeout, then on the clientside BIG-IP will keep the connection alive as long as the client keeps responding to keepalives and the server doesn't terminate the connection itself. In theory, this could be forever!

Memory Management

In terms of high throughput performance, you want all of these settings to be as large as possible up to a point.
The tradeoff is that setting them too high may waste memory and reduce the number of supportable concurrent connections. I say "may" waste because these are limits on memory use, and BIG-IP doesn't allocate the memory until it needs it for buffered data. Even so, the trick is to set the limits large enough that there are no performance penalties, but no larger.

Send Buffer and Receive Window are easy to set in principle, but can be tricky in practice. For both, answer these questions:

What is the maximum bandwidth (Bytes/second) that BIG-IP might experience sending or receiving?
Out of all paths data might travel, what minimum delay among those paths is the highest? (What is the "maximum of the minimums"?)

Then you simply multiply Bytes/second by seconds of delay to get a number of bytes. This is the maximum amount of data that TCP ought to have in flight at any one time, which should be enough to prevent TCP connections from idling for lack of memory. If your application doesn't involve sending or receiving much data on that side of the proxy, you can probably get away with lowering the corresponding buffer size to save on memory. For example, a traditional HTTP proxy's clientside can probably afford a smaller receive buffer if memory-constrained.

There are three principles to follow in setting Proxy Buffer Limits:

Proxy Buffer High should be at least as big as the Send Buffer. Otherwise, if a large ACK clears the send buffer all at once, there may be less data available than TCP can send.
Proxy Buffer Low should be at least as big as the Receive Window on the peer TCP profile (i.e. for the clientside profile, use the receive window on the serverside profile). If not, when the peer connection exits the zero-window state, new data may not arrive before BIG-IP sends all the data it has.
Proxy Buffer High should be significantly larger than Proxy Buffer Low (we like to use a 64 KB gap) to avoid constant flapping to and from the zero-window state on the receive side.

Obviously, figuring out bandwidth and delay before a deployment can be tricky. This is a place where some iRule mojo can really come in handy. The TCP::rtt and TCP::bandwidth* commands can give you estimates of both quantities you need, even though the RTT isn't a minimum RTT. Alternatively, if you've enabled cmetrics-cache in the profile, you can also obtain historical data for a destination using the ROUTE::cwnd* command, which is a good (possibly low) guess at the value you should plug into the send and receive buffers. You can then set buffer limits directly using TCP::sendbuf**, TCP::recvwnd**, and TCP::proxybuffer**. Getting this to work very well will be difficult, and I don't have any examples where someone worked it through and proved a benefit. But if your application travels highly varied paths and you have the inclination to tinker, you could end up with an optimized configuration. If not, set the buffer sizes using conservatively high inputs and carry on. A simple sanity check for the three proxy buffer principles appears after the notes below.

* These iRule commands are only supported in TMOS® version 12.0.0 and later.
** These iRule commands are only supported in TMOS® version 11.6.0 and later.
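As a minimal sketch of those three principles, here is a small Python check you could run against candidate profile values. The numbers in the example call are hypothetical, and the 64 KB gap is the rule of thumb from above rather than a hard requirement.

# Hedged sketch: sanity-check proxy buffer limits against the three principles above.
# All example values are hypothetical.
def check_proxy_buffers(proxy_high, proxy_low, send_buffer, peer_receive_window):
    """Return a list of warnings for settings that violate the guidelines."""
    warnings = []
    if proxy_high < send_buffer:
        warnings.append("Proxy Buffer High is smaller than the Send Buffer.")
    if proxy_low < peer_receive_window:
        warnings.append("Proxy Buffer Low is smaller than the peer profile's Receive Window.")
    if proxy_high - proxy_low < 65536:
        warnings.append("Gap between Proxy Buffer High and Low is under 64 KB; expect flapping.")
    return warnings

for msg in check_proxy_buffers(proxy_high=131072, proxy_low=98304,
                               send_buffer=131072, peer_receive_window=65536):
    print("warning:", msg)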
Introducing TCP Analytics

I'm pleased to announce that F5® TMOS® 12.1 is the first release to contain our TCP Analytics package. If you would like to measure how well TCP is delivering application data, perform A/B testing, or get diagnostic help for slow data delivery, this tool is a new and powerful way to do those things. TCP Analytics works with virtual servers that use either a TCP profile or a FastL4 profile.

What stats are available?

We're collecting several different statistics, some of them conventional TCP metrics and some of them brand new concepts. BIG-IP collects statistics over five-minute intervals and then reports them through our AVR (Application Visibility and Reporting) module.

RTT Minimum, Mean, and Maximum measure the best-case, average, and worst-case delay (Round Trip Time, or RTT) experienced by packets. Although packet delay is meaningful to users in its own right, if the mean RTT is closer to the maximum than the minimum RTT, it's an indication that router queues in the path are filling up. This congestion might be due to cross traffic, or caused by the profile settings on your BIG-IP.

RTTVAR Mean is the mean value of the "RTT Variation," which is a concept defined in the TCP specification. RTT Variation is a close analogue of the statistical variance but is easier to compute. RTTVAR is a measurement of the variation in delay, or "jitter," that your applications experience. High jitter can have all sorts of negative consequences, but is most damaging to streaming applications, where sudden jumps in latency deliver packets too late given the amount of buffered audio or video.

Goodput is the total amount of data delivered to the upper layer, which might be SSL, HTTP, or some other application. This is the aggregate goodput for the set of connections you choose to display, not the goodput experienced by a particular connection. In other words, if there are more connections there is usually more goodput. The definition of goodput specifically excludes data that TCP retransmits or cannot deliver due to gaps in received packets.

Connections Opened and Closed, and Packets Sent, Received, and Lost, are self-explanatory. TCP Analytics measures lost packets by tracking packet retransmissions. In isolated cases, retransmissions turn out to be unnecessary, but Analytics will report the packet as lost anyway.

Mean Connection Time measures the average time from completion of the three-way handshake to the remote host's acknowledgment of the BIG-IP FIN flag, unless an iRule causes collection to start late or finish early.

Delay State Analysis is an attempt to answer the question "why is my TCP so slow?". It divides the entire life of a TCP connection into a total of 10 states based on why the connection is sending (or, more accurately, not sending) data at any given time. By reporting the time connections spend in each state, users can immediately see what the principal causes of delay are. In some cases, users can easily address these causes by tuning their TCP profile. In other cases, the root cause is a client, the network path itself, or an application delay outside the scope of TCP. (Delay State Analysis is not available with FastL4 profiles.)

Aggregate stats aren't that useful to me. How can I look at specific sets of connections?

Many BIG-IPs deal with millions of connections at a time, so storing statistics for every connection individually uses a lot of memory.
However, the AVR engine is designed to hash relevant connection attributes and attach them to its data storage, so that users can pull out statistics for particular subsets of data. In the TCP Analytics profile you attach to a virtual, you decide what connection attributes you want to store. We don't track them all automatically, because in many cases a BIG-IP with many connections that is storing all attributes will run into memory limitations and lose some attribute granularity. The connection attributes available are virtual server, IP address or /24 IP subnet address, nexthop MAC address, geolocation, and whether the connection is clientside or serverside.

In addition, you can use the power of iRules to be more selective. First of all, you can turn statistics collection on and off via iRule, meaning you can collect stats only for connections that meet specific conditions laid out in the iRule. Secondly, iRules can attach an arbitrary string as an attribute. In other words, you can create arbitrary connection categories based on any iRule logic you like, and then display separate statistics for those categories. For example, you could separate download connections out by file size and display the stats for them separately.

This is great. How do I get started?

There are four main steps:

1) Provision AVR on your BIG-IP under System > Resource Provisioning. If you are licensed for LTM, you are licensed for AVR.
2) Create a TCP Analytics profile.
3) Attach the TCP Analytics profile to one or more virtual servers.
4) Watch the stats come in!

There are many more details in the online help for the product. Obviously, there are many nuances we could explore in this space. Based on your feedback, I may dive into those nuances in a future article. Until then, enjoy!
Investigating the LTM TCP Profile: Windows & Buffers

Introduction

The LTM TCP profile has over thirty settings that can be manipulated to enhance the experience between client and server. Because the TCP profile is applied to the virtual server, the flexibility exists to customize the stack (in both client and server directions) for every application delivered by the LTM. In this series, we will dive into several of the configurable options and discuss the pros and cons of their inclusion in delivering applications.

Nagle's Algorithm
Max Syn Retransmissions & Idle Timeout
Windows & Buffers
Timers
QoS
Slow Start
Congestion Control Algorithms
Acknowledgements
Extended Congestion Notification & Limited Transmit Recovery
The Finish Line

Quick aside for those unfamiliar with TCP: the transmission control protocol (layer 4) rides on top of the internet protocol (layer 3) and is responsible for establishing connections between clients and servers so data can be exchanged reliably between them. Normal TCP communication consists of a client and a server, a 3-way handshake, reliable data exchange, and a four-way close. With the LTM as an intermediary in the client/server architecture, the session setup/teardown is duplicated, with the LTM playing the role of server to the client and client to the server. These sessions are completely independent, even though the LTM can duplicate the TCP source port over to the server-side connection in most cases, and depending on your underlying network architecture, can also duplicate the source IP.

TCP Windows

The window field is a flow control mechanism built into TCP that limits the amount of unacknowledged data on the wire. Without the concept of a window, every packet sent would have to be acknowledged before sending another one, so the max transmission speed would be MaxSegmentSize / RoundTripTime. For example, the largest unfragmented packet on my link is 1500 bytes (1472 bytes of ping payload plus 28 bytes of ICMP/IP overhead), and the RTT to ping Google is 37ms. You can see below, when setting the don't fragment flag, the payload size at which the data can no longer be passed.

C:\Documents and Settings\rahm>ping -f www.google.com -l 1472 -n 2

Pinging www.l.google.com [74.125.95.104] with 1472 bytes of data:
Reply from 74.125.95.104: bytes=56 (sent 1472) time=38ms TTL=241
Reply from 74.125.95.104: bytes=56 (sent 1472) time=36ms TTL=241

Ping statistics for 74.125.95.104:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 36ms, Maximum = 38ms, Average = 37ms

C:\Documents and Settings\rahm>ping -f www.google.com -l 1473 -n 2

Pinging www.l.google.com [74.125.95.104] with 1473 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

So the max transmission speed without windows would be roughly 40 KB/sec (about 1500 bytes every 37 ms). Not a terribly efficient use of my cable internet pipe. The window is a 16-bit field (offset 14 in the TCP header), so the max window is 64k (2^16 = 65536). RFC 1323 introduced a window scaling option that extends window sizes from a max of 64k to a max of 1G. This extension is enabled by default with the Extensions for High Performance (RFC 1323) checkbox in the profile.
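To connect this to window scaling, here is a small Python sketch that estimates the window needed to keep a link busy at a given RTT (roughly bandwidth * RTT), and whether that requires the RFC 1323 scaling option, i.e. a window larger than 65,535 bytes. The link speeds and RTTs are just illustrative.

# Hedged sketch: the window required to keep a link busy is roughly bandwidth * RTT.
# If that exceeds 65,535 bytes, RFC 1323 window scaling is needed.
import math

def required_window_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    return int(bandwidth_mbps * 1_000_000 / 8 * rtt_ms / 1000)

for mbps, rtt in [(100, 0.1), (100, 10), (100, 100), (1000, 40)]:
    window = required_window_bytes(mbps, rtt)
    if window > 65535:
        shift = math.ceil(math.log2(window / 65535))
        note = f"needs window scaling (shift count ~{shift})"
    else:
        note = "fits in the classic 16-bit window"
    print(f"{mbps:>5} Mb/s at {rtt:>6} ms RTT -> ~{window:>10,} bytes; {note}")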
If we stay with the original window sizes, you can see that as latency increases, the max transmission speed decreases significantly (numbers in Mb/s):

TCP Max Throughput - Fast Ethernet (Mb/s)

Window Size   0.1 ms RTT   1 ms RTT   10 ms RTT   100 ms RTT
4k                73.605     24.359       3.167        0.327
8k                82.918     38.770       6.130        0.651
16k               88.518     55.055      11.517        1.293
32k               91.611     69.692      20.542        2.551
64k               93.240     80.376      33.775        4.968

Larger window sizes are possible, but remember the LTM is a proxy for the client and server, and must sustain connections on both sides for each connection the LTM services. Increasing the max window size is a potential increase in the memory utilization per connection. The send buffer setting is the maximum amount of data the LTM will send before receiving an acknowledgement, and the receive window setting is the maximum size window the LTM will advertise. This is true for each side of the proxy. The connection speed can be quite different between the client and the server, and this is where the proxy buffer comes in.

Proxy Buffers

For equally fast clients and servers, there is no need to buffer content between them. However, if the client or server falls behind in acknowledging data, or there are lossy conditions, the proxy will begin buffering data. The proxy buffer high setting is the threshold at which the LTM stops advancing the receive window. The proxy buffer low setting is a falling trigger (from the proxy high setting) that will re-open the receive window once passed. Like the window, increasing the proxy buffer high setting will increase the potential for additional memory utilization per connection.

Typically the clientside of a connection is slower than the serverside, and without buffering the data the client forces the server to slow down its delivery. Buffering the data on the LTM allows the server to deliver its data so it can move on to service other connections while the LTM feeds the data to the client as quickly as possible. This is also true the other way around, in a fast client/slow server scenario.

Optimized profiles for the LAN & WAN environments

With version 9.3, the LTM began shipping with pre-configured optimized TCP profiles for the WAN and LAN environments. The send buffer and the receive window maximums are both set to the max non-scaled window size at 64k (65535), and the proxy buffer high is set at 131072. For the tcp-lan-optimized profile, the proxy buffer low is set at 98304, and for the tcp-wan-optimized profile, the proxy buffer low is set the same as the high at 131072. So for the LAN-optimized profile, the receive window for the server is not opened until there is less than 98304 bytes to send to the client, whereas in the WAN-optimized profile, the server receive window is opened as soon as any data is sent to the client. Again, this is good for WAN environments where the clients are typically slower.

Conclusion

Hopefully this has given some insight into the inner workings of the TCP window and the proxy buffers. If you want to do some additional research, I highly recommend the TCP/IP Illustrated volumes by W. Richard Stevens, and a very useful TCP tutorial at http://www.tcpipguide.com/.
TCP Configuration Just Got Easier: Autobuffer Tuning

One of the hardest things about configuring a TCP profile for optimum performance is picking the right buffer sizes. Guess too small, and your connection can't utilize the available bandwidth. Guess too large, and you're wasting system memory, and potentially adding to path latency through the phenomenon of "bufferbloat." But if you get the path Bandwidth-Delay Product right, you're in Nirvana: close to full link utilization without packet loss or latency spikes due to overflowing queues.

Beginning in F5® TMOS® 13.0, help has arrived with F5's new 'autobuffer tuning' feature. Click the "Auto Proxy Buffer", "Auto Receive Window", and "Auto Send Buffer" boxes in your TCP profile configuration, and you need not worry about those buffer sizes any more.

What it Does

The concept is simple. To get a bandwidth-delay product, we need the bandwidth and delay. We have a good idea of the delay from TCP's round-trip-time (RTT) measurement. In particular, the minimum observed RTT is a good indicator of the delay when queues aren't built up from over-aggressive flows. The bandwidth is a little trickier to measure. For the send buffer, the algorithm looks at long-term averages of arriving acks to estimate how quickly data is arriving at the destination. For the receive buffer, it's fairly straightforward to count the incoming bytes.

The buffers start at 64 KB. When the Bandwidth-Delay Product (BDP) calculation suggests that's not enough, the algorithm increments the buffers upwards and takes new measurements. After a few iterations, your connection buffer sizes should converge on something approaching the path BDP, plus a small bonus to cover measurement imprecision and leave space for later bandwidth increases.

Knobs! Lots of Knobs!

There are no configuration options in the profile to control autotuning except to turn it on and off. We figure you don't want to tune your autotuning! However, for inveterate optimizers, there are some sys db variables under the hood to make this feature behave exactly how you want.

For send buffers, the algorithm computes bandwidth and updates the buffer size every tm.tcpprogressive.sndbufmininterval milliseconds (default 100). The send buffer size is determined by (bandwidth_max * RTTmin) * tm.tcpprogressive.sndbufbdpmultiplier + tm.tcpprogressive.sndbufincr. The defaults for the multiplier and increment are 1 and 64KB, respectively. Both of these quantities exist to provide a little "wiggle room" to discover newly available bandwidth and provision for measurement imprecision. The initial send buffer size starts at tm.tcpprogressive.sndbufmin (default 64KB) and is limited to tm.tcpprogressive.sndbufmax (default 16MB). For receive buffers, replace 'sndbuf' with 'rcvbuf' above.

For proxy buffers, the high watermark is MAX(send buffer size, peer TCP's receive window + tm.tcpprogressive.proxybufoffset) and the low watermark is (proxy buffer high) - tm.tcpprogressive.proxybufoffset. The proxy buffer high is limited by tm.tcpprogressive.proxybufmin (default 64KB) and tm.tcpprogressive.proxybufmax (default 2MB). When send or receive buffers change, proxy buffers are updated too.

This May Not Be For Some Users

Some of you out there already have a great understanding of your network, have solid estimates of BDPs, and have configured your buffers accordingly. You may be better off sticking with your carefully measured settings. Autobuffer tuning starts out with no knowledge of the network and converges on the right setting.
That's inferior to knowing the correct setting beforehand and going right to it.

Autotuning Simplifies TCP Configuration

We've heard from the field that many people find TCP profiles too hard to configure. Together with the Autonagle option, autobuffer tuning is designed to take some of the pain out of getting the most from your TCP stack. If you don't know where to start with setting buffer sizes, turn on autotuning and let the BIG-IP® set them for you.
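For readers who like to see the arithmetic, here is a minimal Python model of the buffer formulas described in the "Knobs" section above, applying the published defaults to a measured bandwidth and minimum RTT. It is an illustration of the formulas only, not the actual TMOS implementation, and the proxybufoffset default used below is an assumption since the article doesn't list one.

# Hedged model of the autobuffer formulas above, using the documented defaults.
# This is an illustration only, not the actual BIG-IP implementation.
SNDBUF_MIN = 64 * 1024          # tm.tcpprogressive.sndbufmin default
SNDBUF_MAX = 16 * 1024 * 1024   # tm.tcpprogressive.sndbufmax default
SNDBUF_INCR = 64 * 1024         # tm.tcpprogressive.sndbufincr default
BDP_MULTIPLIER = 1              # tm.tcpprogressive.sndbufbdpmultiplier default
PROXYBUF_MIN = 64 * 1024        # tm.tcpprogressive.proxybufmin default
PROXYBUF_MAX = 2 * 1024 * 1024  # tm.tcpprogressive.proxybufmax default

def send_buffer(bandwidth_max_Bps, rtt_min_s):
    target = bandwidth_max_Bps * rtt_min_s * BDP_MULTIPLIER + SNDBUF_INCR
    return int(min(max(target, SNDBUF_MIN), SNDBUF_MAX))

def proxy_buffers(send_buf, peer_receive_window, proxybuf_offset=64 * 1024):
    # proxybuf_offset stands in for tm.tcpprogressive.proxybufoffset; its default
    # isn't stated in the article, so 64 KB here is an assumption.
    high = max(send_buf, peer_receive_window + proxybuf_offset)
    high = min(max(high, PROXYBUF_MIN), PROXYBUF_MAX)
    return high, high - proxybuf_offset

# Example: 25 Mb/s of measured bandwidth at a 50 ms minimum RTT (hypothetical).
sndbuf = send_buffer(25_000_000 / 8, 0.050)
high, low = proxy_buffers(sndbuf, peer_receive_window=sndbuf)
print(f"send buffer ~ {sndbuf:,} bytes; proxy high ~ {high:,}; proxy low ~ {low:,}")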
Investigating the LTM TCP Profile: Nagle's Algorithm

Introduction

The LTM TCP profile has over thirty settings that can be manipulated to enhance the experience between client and server. Because the TCP profile is applied to the virtual server, the flexibility exists to customize the stack (in both client and server directions) for every application delivered by the LTM. In this series, we will dive into several of the configurable options and discuss the pros and cons of their inclusion in delivering applications.

Nagle's Algorithm
Max Syn Retransmissions & Idle Timeout
Windows & Buffers
Timers
QoS
Slow Start
Congestion Control Algorithms
Acknowledgements
Extended Congestion Notification & Limited Transmit Recovery
The Finish Line

Quick aside for those unfamiliar with TCP: the transmission control protocol (layer 4) rides on top of the internet protocol (layer 3) and is responsible for establishing connections between clients and servers so data can be exchanged reliably between them. Normal TCP communication consists of a client and a server, a 3-way handshake, reliable data exchange, and a four-way close. With the LTM as an intermediary in the client/server architecture, the session setup/teardown is duplicated, with the LTM playing the role of server to the client and client to the server. These sessions are completely independent, even though the LTM can duplicate the TCP source port over to the server-side connection in most cases, and depending on your underlying network architecture, can also duplicate the source IP.

Nagle's Algorithm, defined in RFC 896, is a congestion control mechanism designed to bundle smaller chunks of data for delivery in one big packet. The algorithm:

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if

Sending packets with 40 bytes of overhead to carry little data is very inefficient, and Nagle's was created to address this inefficiency. Efficiency, however, is not the only consideration. Delay-sensitive applications such as remote desktop protocol can be severely impacted by Nagle's. An RDP user connecting to a terminal server expects real-time movement on the desktop presentation, but with Nagle's enabled, the sending station will queue the content if there is additional data coming, which can be perceived as the network being slow, when in actuality it is performing as expected.

Even for non-real-time applications, there can be a noticeable difference on the wire, even if the end user is oblivious to the performance gain. This can come into play with automated performance scripts that enable thresholds. For example, in one installation a first-generation load balancer was scheduled to be replaced. All TCP was simply passed by the load balancer, so the controlled optimization points were isolated to the servers. The server TCP stacks were tuned with the help of a couple of monitoring tools: one that measured the time to paint the front page of the application, and one to perform a transaction within the application. During testing, inserting the LTM with the default tcp profile negated the optimizations performed on the server TCP stacks, and the tools alerted the administrators accordingly with a twofold drop in performance.
Disabling Nagle's alone resulted in a significant improvement from the default profile, but the final configuration included additional options, which will be discussed in the coming weeks. One warning: Nagle's and delayed acknowledgements do not play well in the same sandbox. There's a good analysis here and a commentary on their interactivity by Mr. Nagle himself here. In conclusion, Nagle's algorithm can make your bandwidth utilization more effective in relation to packet overhead, but a careful analysis of the overall architecture will help in deciding if you should enable it.
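For readers who prefer running code to pseudocode, here is a minimal Python sketch of the decision logic shown earlier in this article. It models only the Nagle send-or-queue decision (MSS, window, and the unacknowledged-data state are plain parameters here), not a full TCP stack.

# Hedged sketch of Nagle's decision logic from the pseudocode above.
# This models only the send-or-queue decision, not a real TCP stack.
def nagle_decision(new_data_len: int, mss: int, window: int,
                   unacked_data_in_flight: bool) -> str:
    """Return 'send' or 'queue' for newly written application data."""
    if new_data_len <= 0:
        return "nothing to do"
    if window >= mss and new_data_len >= mss:
        return "send"   # a full-sized segment can go out immediately
    if unacked_data_in_flight:
        return "queue"  # hold small data until the outstanding data is acked
    return "send"       # nothing in flight, so small data goes out right away

# A few example calls (values are arbitrary):
print(nagle_decision(new_data_len=1460, mss=1460, window=65535, unacked_data_in_flight=True))
print(nagle_decision(new_data_len=48, mss=1460, window=65535, unacked_data_in_flight=True))
print(nagle_decision(new_data_len=48, mss=1460, window=65535, unacked_data_in_flight=False))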
Tuning the TCP Profile: Part Three

Since February I've walked you through most of the TCP profile configuration page. This month I'll finish the tour. As I said previously, when I refer to "performance" below, I'm referring to the speed at which your customer gets her data. Performance can also refer to the scalability of your application delivery due to CPU and memory limitations, and when that's what I mean, I'll say so.

Congestion Control

Appropriate Byte Counting increases the congestion window based on the number of bytes received rather than the number of acks received. If there are multiple acks per full-size packet, ABC will grow the congestion window less aggressively; if the remote host is using delayed acks, ABC will grow more aggressively. In any case, ABC means TCP is using a more genuine estimate of the demonstrated link capacity.

Congestion Metrics Cache deserves a whole article to explain in its intricacies. However, it basically stores historic congestion data from previous connections with that IP address. In general, it helps your connections converge more quickly on the maximum sustainable data rate, although in paths with low bandwidth-delay products and large queues, that may not be the case. The Congestion Metrics Cache Timeout allows you to define the length of time that old congestion data is likely to be useful (default 600 seconds).

Congestion Control is also an extensive subject. It allows selection of a congestion control algorithm for sending data. Except for the F5-proprietary "Woodside" congestion control, all of these have their designs described either in an IETF document or in a peer-reviewed paper. In general, we'd recommend Woodside, especially if your path is partly wireless, and use Highspeed in cases where the path incurs a total delay of only a couple of milliseconds.

Delay Window Control is a means of adding delay awareness to congestion control protocols, so that they back off if the current sending rate appears to be filling queues instead of unlocking unused bandwidth. In all honesty, it's probably better to use a congestion control algorithm that has delay sensitivity built in, like Illinois or Woodside.

Explicit Congestion Notification allows routers to mark packets, instead of dropping them, during periods of congestion. TCP adjusts its send rate as if the packet were lost, which in theory would allow the internet to largely avoid packet losses. In practice, most routers are not configured to support ECN marking.

The Initial Congestion Window Size controls the amount of data TCP will send at connection start, if there is no congestion metrics cache entry. In practice, a large initial cwnd value will simply limit the initial burst to the advertised receive window of the remote host. As internet speeds increase, typical initial cwnds are creeping upwards. It can make a big difference in the number of round trips necessary to deliver small files (see the short illustration below). However, on a congested path a large initial window may cause multiple packet losses.

Slow Start is the normal way that TCP operates. Turning it off places no restrictions on the send rate except the remote host's advertised receive window, effectively setting the initial cwnd to infinity. Disabling slow start is not recommended except in private LANs where congestion is not an issue.

The Packet Loss parameters are appropriate in situations where there is a known probability of packet losses not due to congestion effects. When enabled, TCP will retransmit many lost packets without reducing its sending rate.
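As a rough illustration of the Initial Congestion Window point above, here is a short Python sketch that counts how many round trips an idealized slow-start sender (congestion window doubling each round, no losses, no receive-window limit) needs to deliver a small response for different initial windows. It's a simplified model, not a claim about any particular stack's behavior.

# Simplified model: round trips needed to deliver a file under ideal slow start
# (cwnd doubles every RTT, no losses, receiver window not a factor).
def round_trips_to_deliver(file_bytes: int, initial_cwnd_segments: int, mss: int = 1460) -> int:
    cwnd, sent, rounds = initial_cwnd_segments * mss, 0, 0
    while sent < file_bytes:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds

for icw in (1, 4, 10):
    print(f"initial cwnd {icw:>2} segments -> "
          f"{round_trips_to_deliver(50_000, icw)} round trips for a 50 KB response")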
Rate Pacing spreads out packet transmissions over time instead of sending all at once and overloading the queue, using a rate based on detected packet losses. It reduces queue overflow without having significant downsides. Rate Pace Max Rate places a ceiling on the computed rate. The max rate should not be any more than the maximum bandwidth of your path.

Timestamps are a nearly universally adopted extension to TCP. F5's delay-based congestion control protocols, including Illinois and Woodside, don't use timestamps for round-trip-time estimation, so this setting has no effect on effective data transfer. However, timestamps may still be useful for the remote host, so we strongly recommend supporting this option.

Loss Detection and Recovery

Limited Transmit Recovery and Selective ACKs are extremely common TCP extensions that take appropriate opportunities to be more aggressive, and better determine which packets are missing, respectively. Selective ACKs consume slightly more resources per connection. D-SACK uses the SACK framework to help TCP identify duplicate packets; BIG-IP doesn't really need or use this info, but the remote host might. I wrote about Early Retransmit and Tail Loss Probe last year.

Maximum Syn Retransmissions and Maximum Segment Retransmissions tell BIG-IP how many times to keep trying on a connection it hasn't heard from. They represent a tradeoff between connection success rate and resources devoted to dead remote hosts. The concisely named Initial Retransmission Timeout Base Multiplier for SYN Retransmission is just the timeout applied to the first SYN or SYN-ACK packet; again, it trades off success against resources. Selective NACK is an F5-specific option to enhance communication between BIG-IPs. It is not recommended you enable this option unless directed by F5.

Security and MPTCP

IP ToS and Link ToS set bits in IP and link headers sent for this connection to execute various preferential forwarding behaviors. Hardware SYN Cookie Protection, which you will only see on platforms that support it, uses hardware to handle the computational load of responding to SYN flood attacks. SYN Cookie White List is useful in cases where packets from the server do not pass through the BIG-IP. Otherwise, just leave it disabled. MD5 Signature and MD5 Signature Passphrase support RFC 2385, a security mechanism for Border Gateway Protocol (BGP). Multipath TCP is another rich subject I can't treat fully here. If your clients are so enabled, MPTCP allows connections to survive an IP address change at your client.

**************

That concludes our pass through the TCP profile. If there are any features you'd like to see get the full-article treatment, please say so in the comments.