tcp
60 TopicsTCP Configuration Just Got Easier: Autobuffer Tuning
One of the hardest things about configuring a TCP profile for optimum performance is picking the right buffer sizes. Guess too small, and your connection can't utilize the available bandwidth. Guess too large, and you're wasting system memory, and potentially adding to path latency through the phenomenon of "bufferbloat." But if you get the path Bandwidth-Delay Product right, you're in Nirvana: close to full link utilization without packet loss or latency spikesdue to overflowing queues. Beginning in F5® TMOS® 13.0, help has arrived with F5's new 'autobuffer tuning' feature. Click the "Auto Proxy Buffer", "Auto Receive Window", and "Auto Send Buffer" boxes in your TCP profile configuration, and you need not worry about those buffer sizes any more. What it Does The concept is simple. To get a bandwidth-delay product, we need the bandwidth and delay. We have a good idea of the delay from TCP's round-trip-time (RTT) measurement. In particular, the minimum observed RTT is a good indicator ofthe delay when queues aren't built up from over-aggressive flows. The bandwidth is a little trickier to measure. For the send buffer, the algorithm looks at long term averages of arriving acks to estimate how quickly data is arriving at the destination. For the receive buffer, it's fairly straightforward to count the incoming bytes. The buffers start at 64 KB. When the Bandwidth-Delay Product (BDP) calculation suggests that's not enough, the algorithm increments the buffers upwards and takes new measurements. After a few iterations, your connection buffer sizes should converge on something approaching the path BDP, plus a small bonus to cover measurement imprecision and leave space for later bandwidth increases. Knobs! Lots of Knobs! There are no configuration options in the profile to control autotuning except to turn it on and off. We figure you don't want to tune your autotuning! However, for inveterate optimizers, there are some sys db variables under the hood to make this feature behave exactly howyou want. For send buffers,the algorithm computes bandwidthand updates the buffer size every tm.tcpprogressive.sndbufminintervalmilliseconds (default 100). The send buffer size is determined by (bandwidth_max * RTTmin) * tm.tcpprogressive.sndbufbdpmultiplier + tm.tcpprogressive.sndbufincr. The defaults for the multiplier and increment are 1 and 64KB, respectively. Both of these quantities exist to provide a little "wiggle room" to discovernewly available bandwidth and provision for measurement imprecision. The initial send buffer size starts attm.tcpprogressive.sndbufmin(default 64KB) and is limited to tm.tcpprogressive.sndbufmax (default 16MB). For receive buffers, replace 'sndbuf' with 'rcvbuf' above. For proxy buffers, the high watermarkisMAX(send buffer size, peer TCP's receive window + tm.tcpprogressive.proxybufoffset) and the low watermarkis (proxy buffer high) - tm.tcpprogressive.proxybufoffset. The proxy buffer high is limited by tm.tcpprogressive.proxybufmin (default 64KB) andtm.tcpprogressive.proxybufmax (default 2MB). When send or receive buffers change, proxy buffers are updated too. This May Not Be For SomeUsers Some of you out there already have a great understanding of your network, have solid estimates of BDPs, and have configured your buffers accordingly. You may be better off sticking with your carefully measured settings.Autobuffer tuning starts out with no knowledge of the network and converges on the right setting. That's inferior to knowing the correct setting beforehand and going right to it. Autotuning Simplifies TCP Configuration We've heard from the field that many people find TCP profiles too hard to configure. Together with the Autonagle option, autobuffer tuning is designed to take some of the pain of getting the most out of your TCP stack. If you don't know where to start with setting buffer sizes, turn on autotuning and let the BIG-IP® set them for you.1.8KViews4likes3CommentsTCP Internals: 3-way Handshake and Sequence Numbers Explained
In this article, I will explain and show you what really happens during a TCP 3-way handshake as captured by tcpdump tool. We'll go deeper into details of TCP 3-way handshake (SYN, SYN/ACK and ACK) and how Sequence Numbers and Acknowledgement Numbers actually work. Moreover, I'll also briefly explain using real data how TCP Receive Window and Maximum Segment Size play an important role in TCP connection. As a side note, I will not touchTCP SACKandTCP Timestampsthis time as they should be covered in a future article about TCP retransmissions. FYI, the TCP capture was generated by a simpleHTTP GETrequest to BIG-IP to get hold of a file on/cgi-bin/directory calledscript.plusingHTTP/1.1protocol: BIG-IP then responds withHTTP/1.1 200 OKwith the requested data. This is not very relevant as we'll be looking at TCP layer but it's good to understand the capture's context to fully understand what's going on. This is what a TCP 3-way handshake looks like on Wireshark: Aswe can see, the first 3 packets are exchanged less than 1 second apart from each other. TheIN/OUTportion ofInfofield on BIG-IP's capture tells us if the packet is coming IN or being sent OUT by BIG-IP (as capture was taken on BIG-IP). As this is a slightly more in-depth explanation of TCP internals, I am assuming you know at least what a TCP 3-way handshake is conceptually. The TCP SYN, SYN/ACK and ACK Segments We can see that first packet is[SYN], second one is[SYN/ACK]and last one is[SYN/ACK]as displayed on Wireshark. TheInfosection as a whole only shows the summary of the most relevant fields copied from the TCP header. It is just enough to make us understand the context of the TCP segment. Let's now have a look what these fields mean with the exception ofSACK_PERMandTSval. When we double click on the[SYN]packet below, we find the same information again in the actual TCP header: The most important thing to understand here is that[SYN],[SYN/ACK]and[ACK]are all part of theFlagsheader above. They're just 1's and 0's. WhenSYNflag is enabled (i.e its value is 1), the receiving end (in this case BIG-IP) should automatically understand that someone (my client PC in this case) is trying to establish aTCPconnection. The response from BIG-IP (SYN/ACK) is an acknowledgement to theSYNpacket and therefore it has bothSYNandACKflags set to 1. Client's last response is just anACKas seen below: As per RFC, both sides should now assume a TCP connection is established. For plain-textHTTP/1.1protocol, there should now be a GET request in another layer as a payload of (or encapsulated by) TCP layer. If our traffic it is protected byTLSthenTLSlayer should come first as the payload of TCP layer and HTTP would be the payload of TLS layer. Does it make sense? That's how things work in the real world. TCP Sequence numbers A side note,Wireshark shows that our first SYN segment's Sequence number is 0 (Seq=0): It also shows that it isrelativesequence numberbut this is not the real TCP sequence number. Wireshark automatically zeroes it for you to make it easier to visualise and/or troubleshoot. In reality, the real sequence number is a much longer number that is calculated by your OS using current time and other random parameters for security purposes. This is how we see the real sequence number in Wireshark: Now back to business. Some people say if Client sends a TCP segment to BIG-IP, BIG-IP's ACK should be client's sequence number + 1 right? Wrong! Instead of +1 it should be+ number of bytes last received from peer or +1 if SYN or FIN segments. To clarify, here's thefull Flow Graphof our capture using relativesequence numbersto make it easier to grasp (.135= Client and .143 =BIG-IP): On 4th segment above (PSH, ACK - Len: 93), client sends TCP segment withSeq = 1and TCP payload data length (comprised of HTTP layer) of93 bytes. In this case, BIG-IP's response isnotACK = 2 (1 + 1) as some might think. Instead, BIG-IP responds with whatever client's last Sequence number wasplusnumber of bytes last received. As last sequence number was 1 and client also sent a TCP payload of 93 bytes, thenACKis 94! This is the most important concept to grasp for understanding sequence numbers and ACKs. SEQsandACKsonly increment whenthere is a TCP payload involved(by the number of bytes). SYN, FIN or ZeroWindow segments count as 1 byte for SEQs/ACKs. I added a full analysis using real TCPSEQs/ACKsto anAppendixsection if you'd like to go deeper into it. For the moment let's shift our attention towardsTCP Receive Window. TCP Receive Window and Maximum Segment Size (MSS) During 3-way handshake, the Receive Window (Window size valueon Wireshark) tells each side of the connection the maximum receiving buffer in bytes each side can handle: So it's literally like this (read red lines first please): [1]→ Hey, BIG-IP! My receiving buffer size is 29200 bytes. That means, you caninitiallysend me up to 29200 bytes before you even bother waiting for an ACK from me to send further data. [2]→ This should be the same as[1], unless Window Scale TCP Option is active. Window Scale should be the subject of a different article but I briefly touch it on[3]. [3]→ Original TCP Window Size field is limited to 16 bits so maximum buffer size is just65,535 bytes which is too little for today's speedy connections. This option extends the 16-bit window to 32-bit window but because BIG-IP did not advertise Window Scale option for this connection, it is disabled as both sides must support it for it to be used. [4]→ Hey, client! My receiving buffer size is 4380 bytes. That means, you caninitiallysend me up to 4328 bytesbefore you even bother waiting for an ACK from me to send further data. The reason why the wordinitiallyisunderlined on [1] and [3] is because Window size typically changes during the connection. For example, client's initial window size is 29200 bytes, right? This means that if it receives 200 bytes from BIG-IP it should go down to 2900 bytes. Easy, eh? But that's not whatalwayshappens in real life. In fact, in our capture it's the opposite! Bytes in flightcolumn shows the data BIG-IP (*.143) is sending in bytes to our client (*.135) that has not yet been acknowledged. I've added a column withWindow Size valueto make it easier to spot how variable this field is: It is the OS TCP Flow control implementation that dictates theReceive Windowsize taking into account the current "health" of its TCP stack and of course your configuration. Yes, in many cases, especially in the middle of a connection, the Window Size does decrease based on amount of data received/buffered so our first explanation also makes sense! How does BIG-IP know that client has freed up it's buffer again? As we can see above, when Client ACKs the receipt of BIG-IP's data, it also informs the size of its buffer in theWindow Size valuefield. That's how BIG-IP knows how much data it can send to Client before it receives another ACK. What about the Maximum Segment Size? Each side also displays aTCP Option - Maximum Segment sizeof 1460 bytes. This informs the maximum size of the TCP payload each side can send at a time (per TCP segment). Looking at the picture above, BIG-IP sent 334 bytes of TCP payload to client, right? In theory, this could've been up to 1460 bytes as it's also within client's initial buffer of 29200 bytes. So apart from informing each other about the maximum buffer, the maximum size of TCP segment is also informed. TCP Len vs Bytes in Flight Column (BIF) If we look at our last picture, we can see that whatever is inLenfield matches what's in ourBIFcolumn, right? Are they the same? No! Lenshows the current size of TCP payload (excluding the size of TCP header). Remember that TCP payload in this case is the whole HTTP portion that our TCP segment is carrying. Bytes in flightis not really part of TCP header but that's something Wireshark adds to make it easier for us to troubleshoot. It just means the number of bytes sent that have not yet been acknowledged by receiver. In our capture, data is acknowledged immediately so bothLenandBIFare the same. I've picked a different capture here where there are 3 TCP segments sent with no acknowledgement soBIFcolumn increments for each unacknowledged data segment but goes back to zero as soon as anACKis received by receiver: Notice thatBIFvalues now differ from TCP payload (the equivalent toLeninInfocolumn). That's it for now. The next article would be about TCP retransmission. Appendix - Going in depth into TCP sequence numbers! Here's a full explanation about what actually takes place on TCP layer from the point of view of BIG-IP: Just follow along from [1] to [10]. That's it.9.7KViews4likes1CommentThe TCP Proxy Buffer
The proxy buffer is probably the least intuitive of the three TCP buffer sizes that you can configure in F5's TCP Optimization offering. Today I'll describe what it does, and how to set the "high" and "low" buffer limits in the profile. The proxy buffer is the place BIG-IP stores data that isn't ready to go out to the remote host. The send buffer, by definition, is data already sent but unacknowledged. Everything else is in the proxy buffer. That's really all there is to it. From this description, it should be clear why we need limits on the size of this buffer. Probably the most common deployment of a BIG-IP has a connection to the server that is way faster than the connection to the client. In these cases, data will simply accumulate at the BIG-IP as it waits to pass through the bottleneck of the client connection. This consumes precious resources on the BIG-IP, instead of commodity servers. So proxy-buffer-high is simply a limit where the BIG-IP will tell the server, "enough." proxy-buffer-low is when it will tell the server to start sending data again. The gap between the two is simply hysteresis: if proxy-buffer-high were the same as proxy-buffer-low, we'd generate tons of start/stop signals to the server as the buffer level bounced above and below the threshold. We like that gap to be about 64KB, as a rule of thumb. So how does it tell the server to stop? TCP simply stops increasing the receive window: once advertised bytes avaiable have been sent, TCP will advertise a zero receive window. This stops server transmissions (except for some probes) until the BIG-IP signals it is ready again by sending an acknowledgment with a non-zero receive window advertisement. Setting a very large proxy-buffer-high will obviously increase the potential memory footprint of each connection. But what is the impact of setting a low one? On the sending side, the worst-case scenario is that a large chunk of the send buffer clears at once, probably because a retransmitted packet allows acknowledgement of a missing packet and a bunch of previously received data. At worst, this could cause the entire send buffer to empty and cause the sending TCP to ask the proxy buffer to accept a whole send buffer's worth of data. So if you're not that worried about the memory footprint, the safe thing is to set proxy-buffer-high to the same size as the send buffer. The limits on proxy-buffer-low are somewhat more complicated to derive, but the issue is that if a proxy buffer at proxy-buffer-low suddenly drains, it will take oneserversideRound Trip Time (RTT) to send the window update and start getting data again. So the total amount of data that has to be in the proxy buffer at the low point is the RTT of the serverside times the bandwidth of the clientside. If the proxy buffer is filling up, the serverside rate generally exceeds the clientside data rate, so that will be sufficient. If you're not deeply concerned about the memory footprint of connections, the minimum proxy buffer settings that will prevent any impairment of throughput are as follows for the clientside: proxy-buffer-high = send-buffer-size = (clientside bandwidth) * (clientside RTT) proxy-buffer-low = (clientside bandwidth) * (serverside RTT) proxy-buffer-low must be sufficiently below proxy-buffer-high to avoid flapping. If youarerunning up against memory limits, then cutting back on these settings will only hurt you in the cases above. Economizing on proxy buffer space is definitely preferable to limiting the send rate by making the send buffer too small.4.3KViews3likes14CommentsIs TCP's Nagle Algorithm Right for Me?
Of all the settings in the TCP profile, the Nagle algorithm may get the most questions. Designed to avoid sending small packets wherever possible, the question of whether it's right for your application rarely has an easy, standard answer. What does Nagle do? Without the Nagle algorithm, in some circumstances TCP might send tiny packets. In the case of BIG-IP®, this would usually happen because the server delivers packets that are small relative to the clientside Maximum Transmission Unit (MTU). If Nagle is disabled, BIG-IP will simply send them, even though waiting for a few milliseconds would allow TCP to aggregate data into larger packets. The result can be pernicious. Every TCP/IP packet has at least 40 bytes of header overhead, and in most cases 52 bytes. If payloads are small enough, most of the your network traffic will be overhead and reduce the effective throughput of your connection. Second, clients with battery limitations really don't appreciate turning on their radios to send and receive packets more frequently than necessary. Lastly, some routers in the field give preferential treatment to smaller packets. If your data has a series of differently-sized packets, and the misfortune to encounter one of these routers, it will experience severe packet reordering, which can trigger unnecessary retransmissions and severely degrade performance. Specified in RFC 896 all the way back in 1984, the Nagle algorithm gets around this problem by holding sub-MTU-sized data until the receiver has acked all outstanding data. In most cases, the next chunk of data is coming up right behind, and the delay is minimal. What are the Drawbacks? The benefits of aggregating data in fewer packets are pretty intuitive. But under certain circumstances, Nagle can cause problems: In a proxy like BIG-IP, rewriting arriving packets in memory into a different, larger, spot in memory taxes the CPU more than simply passing payloads through without modification. If an application is "chatty," with message traffic passing back and forth, the added delay could add up to a lot of time. For example, imagine a network has a 1500 Byte MTU and the application needs a reply from the client after each 2000 Byte message. In the figure at right, the left diagram shows the exchange without Nagle. BIG-IP sends all the data in one shot, and the reply comes in one round trip, allowing it to deliver four messages in four round trips. On the right is the same exchange with Nagle enabled. Nagle withholds the 500 byte packet until the client acks the 1500 byte packet, meaning it takes two round trips to get the reply that allows the application to proceed. Thus sending four messages takes eight round trips. This scenario is a somewhat contrived worst case, but if your application is more like this than not, then Nagle is poor choice. If the client is using delayed acks (RFC 1122), it might not send an acknowledgment until up to 500ms after receipt of the packet. That's time BIG-IP is holding your data, waiting for acknowledgment. This multiplies the effect on chatty applications described above. F5 Has Improved on Nagle The drawbacks described above sound really scary, but I don't want to talk you out of using Nagle at all. The benefits are real, particularly if your application servers deliver data in small pieces and the application isn't very chatty. More importantly, F5® has made a number of enhancements that remove a lot of the pain while keeping the gain: Nagle-aware HTTP Profiles: all TMOS HTTP profiles send a special control message to TCP when they have no more data to send. This tells TCP to send what it has without waiting for more data to fill out a packet. Autonagle:in TMOS v12.0, users can configure Nagle as "autotuned" instead of simply enabling or disabling it in their TCP profile. This mechanism starts out not executing the Nagle algorithm, but uses heuristics to test if the receiver is using delayed acknowledgments on a connection; if not, it applies Nagle for the remainder of the connection. If delayed acks are in use, TCP will not wait to send packets but will still try to concatenate small packets into MSS-size packets when all are available. [UPDATE:v13.0 substantially improves this feature.] One small packet allowed per RTT: beginning with TMOS® v12.0, when in 'auto' mode that has enabled Nagle, TCP will allow one unacknowledged undersize packet at a time, rather than zero. This speeds up sending the sub-MTU tail of any message while not allowing a continuous stream of undersized packets. This averts the nightmare scenario above completely. Given these improvements, the Nagle algorithm is suitable for a wide variety of applications and environments. It's worth looking at both your applications and the behavior of your servers to see if Nagle is right for you.1.3KViews2likes5CommentsStop Using the Base TCP Profile!
[Update 1 Mar 2017:F5 has new built-in profiles in TMOS v13.0. Although the default profile settings still haven't changed, there is good news on that from as well.] If the customer data I've seen is any indication, the vast majority of our customers are using the base 'tcp' profile to configure their TCP optimization. This haspoor performance consequencesand I strongly encourage you to replace it immediately. What's wrong with it? The Buffers are too small.Both the receive and send buffers are limited to 64KB, and the proxy buffer won't exceed 48K . If the bandwidth/delay product of your connection exceeds the send or receive buffer, which it will in most of today's internet for all but the smallest files and shortest delays, your applications will be limited not by the available bandwidth but by an arbitrary memory limitation. The Initial Congestion Window is too small.As the early thin-pipe, small-buffer days of the internet recede, the Internet Engineering Task Force (see IETFRFC 6928) increased the allowed size of a sender's initial burst. This allows more file transfers to complete in single round trip time and allows TCP to discover the true available bandwidth faster. Delayed ACKs.The base profile enables Delayed ACK, which tries to reduce ACK traffic by waiting 200ms to see if more data comes in. This incurs a serious performance penalty on SSL, among other upper-layer protocols. What should you do instead? The best answer is to build a custom profile based on your specific environment and requirements. But we recognize that some of you will find that daunting! So we've created a variety of profiles customized for different environments. Frankly, we should do some work to improve these profiles, but even today there are much better choices than base 'tcp'. If you have an HTTP profile attached to the virtual, we recommend you use tcp-mobile-optimized. This is trueeven if your clients aren't mobile. The name is misleading! As I said, the default profiles need work. If you're just a bit more adventurous with your virtual with an HTTP profile, then mptcp-mobile-optimizedwill likely outperform the above. Besides enabling Multipath TCP (MPTCP)for clients that ask for it, it uses a more advanced congestion control ("Illinois") and rate pacing. We recognize, however, that if you're still using the base 'tcp' profile today then you're probably not comfortable with the newest, most innovative enhancements to TCP. So plain old tcp-mobile-optimized might be a more gentle step forward. If your virtual doesn't have an HTTP profile, the best decision is to use a modified version of tcp-mobile-optimized or mptcp-mobile-optimized. Just derive a profile from whichever you prefer and disable the Nagle algorithm. That's it! If you are absolutely dead set against modifying a default profile, then wam-tcp-lan-optimized is the next best choice. It doesn't really matter if the attached network is actually a LAN or the open internet. Why did we create a default profile with undesirable settings? That answer is lost in the mists of time. But now it's hard to change: altering the profile from which all other profiles are derived will cause sudden changes in customer TCP behavior when they upgrade. Most would benefit, and many may not even notice, but we try to not to surprise people. Nevertheless, if you want a quick, cheap, and easy boost to your application performance, simply switch your TCP profile from the base to one of our other defaults. You won't regret it.4KViews1like27CommentsF5 Unveils New Built-In TCP Profiles
[Update 3/17:Some representative performance results are at the bottom] Longtime readers know thatF5's built-in TCP profileswere in need of a refresh. I'm pleased to announce that inTMOS® version13.0, available now, there are substantial improvements to the built-in profile scheme. Users expect defaults to reflect best common practice, and we've made a huge step towards that being true. New Built-in Profiles We've kept virtually all of the old built-in profiles, for those of you who are happy with them, or have built other profiles that derive from them. But there are four new ones to load directly into your virtual servers or use a basis for your own tuning. The first three are optimized for particular network use cases: f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile are updated versions of tcp-wan-optimized, tcp-lan-optimized, and tcp-mobile-optimized. These adapt all settings to the appropriate link types, except that they don't enable the very newest features. If the hosts you're communicating with tend to use one kind of link, these are a great choice. The fourth isf5-tcp-progressive.This is meant to be a general-use profile (like the tcp default), but it contains the very latest features for early adopters. In our benchmark testing, we had the following criteria: f5-tcp-wan, f5-tcp-lan, and f5-tcp-mobile achieved throughput at least as high, and often better, than the default tcp profile for that link type. f5-tcp-progressive had equal or higher throughput than default TCP across all representative network types. The relative performance of f5-tcp-wan/lan/mobile and progressive in each link type will vary given the new features that f5-tcp-progressive enables. Living, Read-Only Profiles These four new profiles,and the default 'tcp' profile,are now "living." This means that we'll continually update them with best practices as they evolve. Brand-new features, if they are generally applicable, will immediately appear in f5-tcp-progressive. For our more conservative users, these new features will appear in the other four living profiles after a couple of releases. The default tcp profile hasn't changed yet, but it will in future releases! These five profiles are also now read-only, meaning that to make modifications you'll have to create a new profile that descends from these. This will aid in troubleshooting. If there are any settings that you like so much that you never want them to change, simply click the "custom" button in the child profile and the changes we push out in the future won't affect your settings. How This Affects Your Existing Custom Profiles If you've put thought into your TCP profiles, we aren't going to mess with it. If your profile descends from any of the previous built-ins besides default 'tcp,' there is no change to settings whatsoever. Upgrades to 13.0 will automatically prevent disruptions to your configuration.We've copied all of the default tcp profile settings to tcp-legacy, which is not a "living" profile. All of the old built-in profiles (like tcp-wan-optimized), and any custom profiles descended from default tcp, will now descend instead from tcp-legacy, and never change due to upgrades from F5. tcp-legacy will also include any modifications you made to the default tcp profile, as this profile is not read-only. Our data shows that few, if any, users are using the current (TMOS 12.1 and before) tcp-legacy settings.If you are, it is wise to make a note of those settings before you upgrade. How This Affects Your Existing Virtual Servers As the section above describes, if your virtual server uses any profile other than default 'tcp' or tcp-legacy, there will be no settings change at all. Given the weaknesses of the current default settings, we believe most users who use virtuals with the TCP default are not carefully considering their settings. Those virtuals will continue to use the default profile, and therefore settings will begin to evolve as we modernize the default profile in 13.1 and later releases. If you very much like the default TCP profile, perhaps because you customized it when it wasn't read-only, you should manually change the virtual to use tcp-legacy with no change in behavior. Use the New Profiles for Better Performance The internet changes. Bandwidths increase, we develop better algorithms to automatically tune your settings, and the TCP standard itself evolves. If you use the new profile framework, you'll keep up with the state of the art and maximize the throughput your applications receive. Below, I've included some throughput measurements from our in-house testing. We used parameters representative of seven different link types and measured the throughput using some relevant built-in profiles. Obviously, the performance in your deployment may vary. Aside from LANs, where frankly tuning isn't all that hard, the benefits are pretty clear.4.4KViews1like9CommentsIntroducing QUIC and HTTP/3
QUIC [1] is a new transport protocol that provides similar service guarantees to TCP, and then some, operating over a UDP substrate. It has important advantages over TCP: Streams: QUIC provides multiple reliable ordered byte streams, which has several advantages for user experience and loss response over the single stream in TCP. The stream concept was used in HTTP/2, but moving it into the transport further amplifies the benefits. Latency: QUIC can complete the transport and TLS handshakes in a single round trip. Under some conditions, it can complete the application handshake (e.g. HTTP requests) in a single round-trip as well. Privacy and Security: QUIC always uses TLS 1.3, the latest standard in application security, and hides much more data about the connection from prying eyes. Moreover, it is much more resistant than TCP to various attacks on the protocol, because almost all of its packets are authenticated. Mobility: If put in the right sort of data center infrastructure, QUIC seamlessly adjusts to changes in IP address without losing connectivity. [2] Extensibility: Innovation in TCP is frequently hobbled by middleboxes peering into packets and dropping anything that seems non-standard. QUIC’s encryption, authentication, and versioning should make it much easier to evolve the transport as the internet evolves. Google started experimenting with early versions of QUIC in 2012, eventually deploying it on Chrome browsers, their mobile apps, and most of their server applications. Anyone using these tools together has been using QUIC for years! The Internet Engineering Task Force (IETF) has been working to standardize it since 2016, and we expect that work to complete in a series of Internet Requests for Comment (RFCs) standards documents in late 2020. The first application to take advantage of QUIC is HTTP. The HTTP/3 standard will publish at the same time as QUIC, and primarily revises HTTP/2 to move the stream multiplexing down into the transport. F5 has been tracking the development of the internet standard. In TMOS 15.1.0.1, we released clientside support for draft-24 of the standard. That is, BIG-IP can proxy your HTTP/1 and HTTP/2 servers so that they communicate with HTTP/3 clients. We rolled out support for draft-25 in 15.1.0.2 and draft-27 in 15.1.0.3. While earlier drafts are available in Chrome Canary and other experimental browser builds, draft-27 is expected to see wide deployment across the internet. While we won’t support all drafts indefinitely going forward, our policy will be to support two drafts in any given maintenance release. For example, 15.1.0.2 supports both draft-24 and draft-25. If you’re delivering HTTP applications, I hope you take a look at the cutting edge and give HTTP/3 a try! You can learn more about deploying HTTP/3 on BIG-IP on our support page at K60235402: Overview of the BIG-IP HTTP/3 and QUIC profiles. -----[1] Despite rumors to the contrary, QUIC is not an acronym. [2] F5 doesn’t yet support QUIC mobility features. We're still in the midst of rolling out improvements.1.2KViews1like0CommentsInvestigating the LTM TCP Profile: Max Syn Retransmissions & Idle Timeout
Introduction The LTM TCP profile has over thirty settings that can be manipulated to enhance the experience between client and server. Because the TCP profile is applied to the virtual server, the flexibility exists to customize the stack (in both client & server directions) for every application delivered by the LTM. In this series, we will dive into several of the configurable options and discuss the pros and cons of their inclusion in delivering applications. Nagle's Algorithm Max Syn Retransmissions & Idle Timeout Windows & Buffers Timers QoS Slow Start Congestion Control Algorithms Acknowledgements Extended Congestion Notification & Limited Transmit Recovery The Finish Line Quick aside for those unfamiliar with TCP: the transmission controlprotocol (layer4) rides on top of the internetprotocol (layer3) and is responsible for establishing connections between clients and servers so data can be exchanged reliably between them. Normal TCP communication consists of a client and a server, a 3-way handshake, reliable data exchange, and a four-way close. With the LTM as an intermediary in the client/server architecture, the session setup/teardown is duplicated, with the LTM playing the role of server to the client and client to the server. These sessions are completely independent, even though the LTM can duplicate the tcp source port over to the server side connection in most cases, and depending on your underlying network architecture, can also duplicate the source IP. Max Syn Retransmission This option specifies the maximum number of times the LTM will resend a SYN packet without receiving a corresponding SYN ACK from the server. The default value was four in versions 9.0 - 9.3, and is three in versions 9.4+. This option has iRules considerations with the LB_FAILED event. One of the triggers for the event is an unresponsive server, but the timeliness of this trigger is directly related to the max syn retransmission setting. The back-off timer algorithm for SYN packets effectively doubles the wait time from the previous SYN, so the delay grows excessive with each additional retransmission allowed before the LTM closes the connection: Retransmission Timers v9.0-v9.3 v9.4 Custom-2 Custom-1 Initial SYN 0s 0s 0s 0s 1st Retransmitted SYN 3s 3s 3s 3s 2nd Retransmitted SYN 6s 6s 6s NA 3rd Retransmitted SYN 12s 12s NA NA 4th Retransmitted SYN 24s NA NA NA LB_FAILED triggered 45s 21s 9s 3s Tuning this option down may result in faster response on your LB_FAILED trigger, but keep in mind the opportunity for false positives if your server gets too busy. Note that monitors are the primary means to ensure available services, but the max syn retransmission setting can assist. If the LB_FAILED event does trigger, you can check the monitor status in your iRule, and if the monitor has not yet been marked down, you can do so to prevent other new connections from waiting: when LB_FAILED { if { [LB::status pool [LB::server pool] member [LB::server addr] eq "up"] } { LB::down } } Idle Timeout The explanation of the idle timeout is fairly intuitive. This setting controls the number of seconds the connection remains idle before the LTM closes it. For most applications, the default 300 seconds is more than enough, but for applications with long-lived connections like remote desktop protocol, the user may want to leave the desk and get a cup of coffee without getting dumped but the administrators don't want to enable keepalives. The option can be configured with a numeric setting in seconds, or can be set to indefinite, in which case the abandoned connections will sit idle until a reaper reclaims them or services are restarted. I try to isolate applications onto their own virtual servers so I can maximize the profile settings, but in the case where a wildcard virtual is utilized, the idle timeout can be set in an iRule with the IP::idle_timeout command: when CLIENT_ACCEPTED { switch [TCP::local_port] { "22" { IP::idle_timeout 600 } "23" { IP::idle_timeout 600 } "3389" { IP::idle_timeout 3600 } default { IP::idle_timeout 120 } } If you look at the connection table, the current and the maximum (in parentheses) idle values are shown: b conn client 10.1.1.1 show all | grep –v pkts VIRTUAL 10.1.1.100:3389 <-> NODE any6:any CLIENTSIDE 10.1.1.1:36023 <-> 10.1.1.100:3389 SERVERSIDE 10.1.1.100:36023 <-> 10.1.1.50:3389 PROTOCOL tcp UNIT 1 IDLE 124 (3600) LASTHOP 1 00:02:0a:13:ef:80 Next week, we'll take a look at windows and buffers.1.2KViews1like1CommentTuning TCP
In the previous few posts I’ve discussed the new congestion control algorithms and rate pacing features that are available in BIG-IP 11.5; but if you’re not ready to move to 11.5 there is still plenty that you can do to optimize your TCP profiles. Adjusting the Initial Congestion Window Size The initial congestion window is a key component of slow start, or the exponential growth phase. Historically the initial congestion window has been 2, which means that slow start ramps up from 2 to 4 to 8 to 16 to 32. In this scenario, it takes 3 round trips before 8 segments are transmitted. The majority of web transactions are on the small side (under 16 Kb), very short lived, and are completed before slow start has a chance to ramp up. As a result of the significant increase in bandwidth available to users, increasing the initial congestion window to 10 has been proposed. We have increased the initial congestion window size in some of our default profiles (depending on version); but this is a modification you can easily make to any profile in any version. Ignoring Packet Loss By default, all packet loss events are passed directly to congestion control, which considers this an indicator of congestion and slow down transmission. In some networks, such as wireless, packet loss may not be a reliable indicator of congestion. In such cases, ignoring a small percentage of background loss can be beneficial. There are two settings in the BIG-IP TCP profile that can be adjusted to ignore loss: Packet loss ignore rate and packet loss ignore burst. Packet loss ignore rate allows you to specify the percentage of loss to ignore to prevent congestion control from kicking in and altering the transmission of packets. This is very useful when there is intermittent or stray packet drops on an uncongested network. If the network normally experiences a high degree of congestion, it is not recommended to configure this as it can be too aggressive and cause more packet loss. The packet loss ignore burst setting provides an exception value for the ignore rate loss parameter. If the connection sees a consecutive number of packet drops but the rate has not exceeded the ignore rate value; congestion control should kick in. A consecutive number of packet drops is evidence of a tail drop and should be considered a loss event. When configuring packet loss ignore rate, it is suggested that the packet loss ignore burst be set to a value between 6 and 12. Adjusting for SSL TCP profiles on SSL virtual servers should have Nagle and delayed ACKs disabled as this can create stalls. If your virtual server has a mixture of SSL and non-SSL traffic, Nagle can be disabled via an iRule when the SSL connection is detected.616Views1like3Comments