Investigating the LTM TCP Profile: Nagle’s Algorithm

Introduction

The LTM TCP profile has over thirty settings that can be manipulated to enhance the experience between client and server.  Because the TCP profile is applied to the virtual server, the flexibility exists to customize the stack (in both client & server directions) for every application delivered by the LTM.  In this series, we will dive into several of the configurable options and discuss the pros and cons of their inclusion in delivering applications.

  1. Nagle's Algorithm
  2. Max SYN Retransmissions & Idle Timeout
  3. Windows & Buffers
  4. Timers
  5. QoS
  6. Slow Start
  7. Congestion Control Algorithms
  8. Acknowledgements
  9. Explicit Congestion Notification & Limited Transmit Recovery
  10. The Finish Line

Quick aside for those unfamiliar with TCP: the Transmission Control Protocol (layer 4) rides on top of the Internet Protocol (layer 3) and is responsible for establishing connections between clients and servers so data can be exchanged reliably between them.

Normal TCP communication consists of a client and a server, a 3-way handshake, reliable data exchange, and a four-way close.  With the LTM as an intermediary in the client/server architecture, the session setup/teardown is duplicated, with the LTM playing the role of server to the client and client to the server.  These sessions are completely independent, even though the LTM can duplicate the TCP source port over to the server-side connection in most cases and, depending on your underlying network architecture, can also duplicate the source IP.

Nagle's Algorithm, defined in RFC 896, is a congestion control mechanism designed to bundle smaller chunks of data for delivery in one big packet. The algorithm:


if there is new data to send
  if the window size >= MSS and available data is >= MSS
    send complete MSS segment now
  else
    if there is unconfirmed data still in the pipe
      enqueue data in the buffer until an acknowledge is received
    else
      send data immediately
    end if
  end if
end if
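
To make the pseudocode concrete, below is a minimal Python sketch of the same decision logic.  The function and parameter names (nagle_should_send_now, unacked_bytes, and so on) are purely illustrative and not part of any real TCP stack.

# Illustrative sketch of Nagle's decision logic; all names here are hypothetical.
def nagle_should_send_now(data_len, window, mss, unacked_bytes):
    """Return True if the segment should go out now, False if it should be buffered."""
    if data_len == 0:
        return False                       # nothing new to send
    if window >= mss and data_len >= mss:
        return True                        # full-sized segment: send immediately
    if unacked_bytes > 0:
        return False                       # small segment with data in flight: buffer it
    return True                            # small segment, nothing in flight: send immediately

# A 100-byte update with 1460 bytes still unacknowledged is held back:
print(nagle_should_send_now(data_len=100, window=65535, mss=1460, unacked_bytes=1460))  # False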


Sending packets with 40 bytes of overhead to carry little data is very inefficient, and Nagle's was created to address this inefficiency.  Efficiency, however, is not the only consideration.  Delay-sensitive applications such as remote desktop protocol can be severely impacted by Nagle's.  An RDP user connecting to a terminal server expects real-time movement on the desktop presentation, but with Nagle's enabled, the sending station will queue small updates while previously sent data remains unacknowledged, which can be perceived as the network being slow, when in actuality it is performing as expected.
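
On the BIG-IP, Nagle's is toggled in the TCP profile itself, but applications can also opt out of Nagle's on their own sockets with the standard TCP_NODELAY socket option.  A minimal Python sketch, with a placeholder host and port:

import socket

# Minimal sketch: disable Nagle's algorithm on an application socket.
# The host and port below are placeholders for illustration only.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # small writes go out immediately
sock.connect(("app.example.com", 3389))
sock.sendall(b"small, latency-sensitive update")
sock.close()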

Even for non-real-time applications, there can be a noticeable difference on the wire, even if the end user is oblivious to the performance gain.  This can come into play with automated performance scripts that alert on thresholds.  For example, in one installation a first-generation load balancer was scheduled to be replaced.  All TCP was simply passed through by the load balancer, so the controlled optimization points were isolated to the servers.  The server TCP stacks were tuned with the help of a couple of monitoring tools: one that measured the time to paint the front page of the application, and one that performed a transaction within the application.  During testing, inserting the LTM with the default TCP profile negated the optimizations performed on the server TCP stacks, and the tools alerted the administrators to a twofold drop in performance.  Disabling Nagle's alone resulted in a significant improvement over the default profile, but the final configuration included additional options, which will be discussed in the coming weeks.

One warning: Nagle's and delayed acknowledgements do not play well in the same sandbox.  There is a good analysis of their interaction available, along with commentary on it from Mr. Nagle himself.
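
The classic trigger is a write-write-read pattern: the sender issues two small writes and then waits for a reply.  The second write is held by Nagle's until the first is acknowledged, while the receiver's delayed ACK timer holds that acknowledgement back.  A rough Python sketch of the pattern, using a placeholder server address:

import socket

# Sketch of the write-write-read pattern that exposes the Nagle / delayed ACK stall.
# The server address and port are placeholders; actual behavior depends on both stacks' settings.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("server.example.com", 9000))

sock.sendall(b"request-header")   # first small write: nothing in flight, so it is sent at once
sock.sendall(b"request-body")     # second small write: Nagle's holds it until the header is ACKed,
                                  # but the server is waiting for the body before it replies,
                                  # so its ACK sits behind the delayed ACK timer
reply = sock.recv(4096)           # the exchange completes only after that timer fires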

In conclusion, Nagle's algorithm can make your bandwidth utilization more efficient relative to packet overhead, but a careful analysis of the overall architecture will help you decide whether to enable it.

 

Updated Nov 30, 2023
Version 3.0
  • Nice write up Jason, as usual. One note - the real big issue with Nagling is that delayed ACK problem.

    Check out line 5 of the algorithm above. The line "if there is unconfirmed data still in the pipe" is the specific bit that doesn't play well with delayed ACKs: if there's an ACK outstanding, Nagle will buffer. If it gets one, it'll immediately send the data. But now you've got delayed ACK waiting for data before it sends an ACK! Then you're stuck - you're buffering on the send side (Nagle), but waiting for that delayed ACK timer to fire on the receive side.

    Also, that delayed ACK timer is set on bootstrap and it can fire at any time between 1-500ms (the RFC states no more than 500ms) when the ACK is being delayed on the receive side. So you may not get totally clean and predictable stalls - you'll just know that it's not performing well.

    One last bit, regarding BigIP: if you disable Nagle, you may also want to enable "Acknowledge on Push" and test. I've seen dramatic improvements when these are done together...

    --Matt
  • If Delayed ACK and Nagle are better not enabled simultaneously, why is it that in TMOS 11.1 the tcp-wan-optimized profile has both options enabled?

    --Frank
  • That's a good question, Frank. My experience has been to keep Nagle's disabled, but the product development team tests far more scenarios than I do on what makes an overall more stable and performant stack. Best practice is to test with your app and tune as necessary. Each app impacts TCP differently.
  • This should really be incorporated into or mentioned in the F5 - RDP deployment guide.

    It definitely makes a huge difference with the RDP experience if Nagle's Algorithm is disabled.
  • Hi Joe, I'll pass along your recommendation to the group responsible for the deployment guides.
  • Hi Joe, the deployment guides are being updated based on your feedback! Nice work!
  • Thanks a lot for this post.

    Nagle's caused us a lot of issues over several months. Finally I managed to find this post mentioning that it could have a severe impact on remote desktop.

    Once we disabled it, everything worked smoothly, with no latency issues.

    The whitepapers / deployment guide mention that an optimized LAN profile should be used. For some reason we changed this back to the default TCP profile while troubleshooting "not responding" issues with MS Exchange on the RDS servers, and by that we introduced new issues without really realizing it.

    Once again, thanks for an excellent post and information on what the impact might be.