performance
Troubleshooting high CPU utilisation on BIG-IP systems
Introduction
This is not really a step-by-step troubleshooting guide. What I'm sharing here is the result of reverse engineering the kind of knowledge that led me to succeed at troubleshooting CPU issues during the time I worked in the Engineering Services department at F5. Here's what I'll cover, sequentially, with a mix of what we should know and where to find the problem:
- Know what HyperThreading (HT) is
- Know how HT is used within F5
- Find out if your F5 box supports HyperThreading (HT)
- Know the difference between forwarding plane (TMM) and control plane (Linux) CPU consumption
- Confirm whether the problem is TMM or another daemon
- Where to look further when TMM CPU is high
- What if it's a control plane daemon?
- Learn how to interpret graphs
- High CPU in non-HT boxes
- High CPU in HT+ boxes
- Use scripts when necessary to collect real-time data

1. Know what HyperThreading (HT) is
- A physical core, as the name implies, is a physical CPU core connected to the motherboard's socket.
- A physical CPU core has several execution units (modules) capable of performing different tasks, e.g. one for basic integer maths, another for more advanced maths, one for loading and storing data from/to memory, etc.
- HT adds 2 or more logical CPU cores so that execution units not being utilised by process A can be used by process B if needed.
- When 2 programs want to use the same part of the physical core, it's inevitable that one of them will have to wait.
- The Operating System (OS) scheduler decides which process gets execution priority in this case.
- This is when 2 (or more) actual physical cores would perform better, as this limitation is not present, i.e. 2 physical cores are able to concurrently perform tasks using their own execution units.

2. Know how HT is used within F5
Before BIG-IP v11.5.0, on systems with HyperThreading (HT) Technology we would have:
- 1 TMM per logical core
- Each logical core processes both data plane (TMM) and control plane (Linux) tasks
v11.5.0+ (affects only processors with HT Technology):
- Data plane (TMM) resides on even-numbered cores (0, 2, 4, etc.)
- Control plane (Linux) resides on odd-numbered cores (1, 3, 5, etc.)
- When TMM reaches 80% of actual CPU utilisation, the odd-numbered cores limit control plane tasks so they can only use up to 20% of CPU capacity, allowing the remainder to be used by the overloaded forwarding plane (TMM).
- A vCMP host must also be running v11.5.0 or newer in order for guests to use the HTSplit technology.
We can disable it manually by issuing the following command:

3. Find out if your box supports HyperThreading (HT)
The hardware platforms listed with HT+ in K14358 all support HyperThreading technology. Here's how to check the number of cores in a given BIG-IP box (this is a VIPRION C2200 chassis with a 2250 blade installed):
The above box is able to run 2 threads per physical core (Thread(s) per core), with a total of 10 physical cores (Core(s) per socket) and a total of 20 (logical) cores (CPU(s)).
Here's the same output from a 3900 series box that does not support HT:
The above box is able to run 1 thread per physical core (Thread(s) per core), with a total of 4 physical cores (Core(s) per socket) and a total of 4 cores (CPU(s)).

4. Know the difference between forwarding plane (TMM) and control plane (Linux) CPU consumption
4.1 Confirming whether it's TMM or Linux
BIG-IP's forwarding plane is TMM. TMM is a daemon/process within Linux space. If tmm CPU usage is high, then we know high CPU utilisation is a forwarding plane issue. The other daemons are part of BIG-IP's control plane (e.g. bigd, the monitoring daemon).
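To confirm which plane is busy from the command line, a quick batch-mode top snapshot is usually enough (a minimal sketch; the flags are standard procps options, and the grep pattern is just the two daemons discussed here):

```
# one-shot snapshot; see whether tmm or a control-plane daemon tops the list
top -bn 1 | head -30
# or narrow the output down to the usual suspects
top -bn 1 | grep -E 'tmm|bigd'
```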
In the example I was looking at, both tmm (102.3%) and bigd (51.8%) were high. If TMM CPU utilisation is high, we will need to troubleshoot the CPU usage of internal TMM components. For other daemons there are different places to look; for example, for bigd (the monitoring daemon) we need to check BIG-IP's monitors. AskF5 has a nice how-to guide here, and here's a list of BIG-IP daemons.

4.2 TMM CPU utilisation, or forwarding plane CPU utilisation
- Check tmsh show ltm virtual <virtual server name> to confirm whether a particular virtual server is eating up tmm CPU cycles:
- Check iRules
- Check tmsh show sys tmm-info to see the breakdown of TMM CPU utilisation per tmm:

4.3 Linux CPU utilisation, or control plane CPU utilisation
For anything else apart from TMM, top output is your best friend for confirming which daemon is the culprit. tmsh show sys proc-info is another command we can use to gather process-specific CPU information. Here I'm checking information for bigd, the monitoring daemon:

5. Learn how to interpret graphs
5.1 High CPU in non-HT boxes
- The graph below is just an example taken from a 3900 box, which doesn't have HT split.
- Because graphs are generated based on average CPU utilisation, we can assume that CPU utilisation is very high at times.
- Because there is no HT split, the CPU utilisation below could be due either to TMM or to some other Linux daemon.
- We can confirm using the top command.
- In this graph it was due to both tmm and bigd.
- To confirm normal usage we always try to match it against other numbers in the graph (e.g. active connections, etc.).
Note: this is a graph as seen in a qkview (clicking System > Support), which takes a snapshot of the system. It can then be uploaded to iHealth and is mostly used for sharing a snapshot of a BIG-IP system with F5 Support. The graph here is used for illustrative purposes, to understand CPU utilisation as seen in graphs.

5.2 High CPU in HT+ boxes
- This other graph was taken from a 4200 series box, which has HT split enabled.
- Notice that CPU cores 0, 2, 4 and 6 (tmm/data plane) show CPU at about 60%.
- Cores 1, 3, 5 and 7 show very minimal CPU utilisation, with some spikes.
- Spikes can be due to AVR/ASM daemons, described in K16469 and K15606, or because TMM has reached 80% of CPU utilisation and is now using the control plane's cores.
- This is an example of mostly normal/regular CPU utilisation.
- When HT split is enabled and the TMM cores use less than 80% of CPU, the control-plane cores remain mostly 'quiet'.

6. Use scripts when necessary to collect real-time data
Sometimes just looking at the graphs and command output is not enough to determine why CPU is high. Here's an example of a script to collect real-time TMM/Linux CPU stats on BIG-IP every 60 seconds and copy the output to /var/log/cpu-average.log; top command output is also copied to /var/log/top-output.log:
Output should be similar to this:
The number after "Counter64" is the percentage value representing how busy each CPU core is. For example, TMM0.0 and TMM0.1 are both at 1% of capacity.
We can add H to the top command in the script above (e.g. top -Hcbn 1) to show the individual threads of a process, including TMM threads.
When opening a support case with F5, it may be useful to include the full tmctl table, as it contains roughly all the raw data about everything we can possibly find on a BIG-IP system. Below is an example of a script that collects all tmctl information every 5 seconds:
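The original script isn't reproduced here, so the following is only a minimal sketch of such a collector, assuming basic tmctl usage (tmctl <table> prints one table; the table names and log path below are placeholders, so check tmctl's help output and adjust before using it):

```
#!/bin/bash
# Sketch: append selected tmctl tables to a log every 5 seconds.
TABLES="placeholder_table_1 placeholder_table_2"   # substitute real table names
OUT=/var/log/tmctl-capture.log
while true; do
    echo "===== $(date) =====" >> "$OUT"
    for t in $TABLES; do
        echo "--- $t ---" >> "$OUT"
        tmctl "$t"        >> "$OUT" 2>&1
    done
    sleep 5
done
```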
Apart from knowing where to look, understanding the CPU usage pattern of our own organisation's production traffic is really important. It enables us to compare, for example, the number of active connections with a spike in CPU in the graphs, to understand whether the spike is related to a sudden and sharp increase in traffic.
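From the CLI, a point-in-time view that puts CPU, connection counts and throughput side by side can help with that comparison (a sketch; I'm assuming the all-stats component is available in your version):

```
tmsh show sys performance all-stats
```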
tcpdump portrange option
Hi everyone, I'm trying to capture traffic directed to a certain range of TCP ports with tcpdump. When using the "portrange" expression I get a syntax error:
tcpdump -i -s0 -w capture_file.trc portrange 8080-8082
tcpdump: syntax error in filter expression
Is this expression supported on BIG-IP (1600, 10.2.4 HF5)? Thanks in advance. Regards, moog67
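If the libpcap build in that release doesn't recognise portrange, an equivalent filter can usually be written out explicitly (a sketch; the interface argument is a placeholder - on BIG-IP, 0.0 is commonly used to capture on all VLANs, or substitute a VLAN name):

```
# list the ports individually
tcpdump -i 0.0 -s0 -w capture_file.trc '(port 8080 or port 8081 or port 8082)'
# or match the TCP destination-port bytes directly
tcpdump -i 0.0 -s0 -w capture_file.trc 'tcp[2:2] >= 8080 and tcp[2:2] <= 8082'
```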
URL rewrite through iRule
Hi Guys, I have one "Performance (HTTP)" virtual server on an F5 1600 series, and I want to change the URL "http://www.abc.com" to "http://partner.abc.com/xyz". I have tried all of the scripts below:
1- when HTTP_REQUEST { if {([string tolower [HTTP::host]] equals "http://www.abc.com")}{ HTTP::header replace Host "http://partner.abc.com/xyz" } }
2- when HTTP_REQUEST { if { not ([HTTP::uri] starts_with "/xyz") } { HTTP::uri /xyz[HTTP::uri] } }
3- when HTTP_REQUEST { if {[HTTP::uri] equals {http://www.abc.com}} {HTTP::uri {http://partner.abc.com/xyz} } }
but I wasn't successful! Can anyone help me with how I can do this through an iRule?
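Two things stand out, so here's a hedged sketch rather than a definitive answer. The Host header only ever carries the hostname (www.abc.com), never the "http://" scheme, so comparisons against "http://www.abc.com" will never match; and if the goal is to send the client to the other hostname, a redirect is the usual approach:

```
when HTTP_REQUEST {
    # Host contains only the hostname, never the scheme
    if { [string tolower [HTTP::host]] equals "www.abc.com" } {
        HTTP::respond 301 Location "http://partner.abc.com/xyz[HTTP::uri]"
    }
}
```

Also note that Performance (HTTP) virtual servers use the Fast HTTP profile, which supports only a subset of HTTP iRule commands, so if a rule like this doesn't fire it may be worth testing it on a Standard virtual server with an http profile.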
Offline (Enabled) - The children pool member(s) are down
Hi Friends, I am a novice to F5 and am following CBT Nuggets to understand LTM in a better way. I have completed the basic configuration, i.e. defined nodes, defined a pool and assigned pool members to my pool. Now the problem is that I have enabled the "http" health monitor, and right after I click 'Finished' the icon transitions from 'blue square' to 'red rectangle' - Offline (Enabled) - The children pool member(s) are down - when I hover over the pool in the 'Pool List'. This is a very basic setup with 3 pre-configured .OVA web servers which I received in my Nuggetlabs. I am able to log in to the servers using my browser, and with telnet 10.2.0.11 80 and curl http://10.2.0.11, but the servers are showing as Offline (Enabled) - Pool member has been marked down by a monitor - in the 'Members' list. I need your help to proceed further please. Thanks in advance, Sagar
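A common culprit in this situation is the monitor's send/receive strings rather than basic reachability, so it can help to replay exactly what the monitor sends (a sketch; the address is taken from the question, and the default "GET /\r\n" send string is an assumption that should be confirmed on your version):

```
# see exactly what the built-in http monitor sends and expects
tmsh list ltm monitor http http
# replay the monitor's send string by hand and watch what the server returns
echo -e "GET /\r\n" | nc 10.2.0.11 80
# check why the member was marked down
grep -i monitor /var/log/ltm | tail
```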
Priority Group Activation, why use it?
If you have 3 members that are active within a pool, select the round robin LB method, and set Priority Group Activation to 'less than 2', the virtual server is only going to use two of the three nodes for round robin. Why on earth would you configure this? This is an example in the F5 university. Wouldn't the goal be to use all active nodes within the pool and not limit it to using two of the three?
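For context (not an answer from the thread, just an illustration): the usual reason is to keep lower-priority members as warm spares that only take traffic when the number of available higher-priority members drops below the activation threshold. A tmsh sketch with placeholder names and addresses:

```
# Two primary members (priority-group 10) and one backup (priority-group 5).
# The backup only receives traffic when fewer than 2 primaries are available.
tmsh create ltm pool app_pool \
    min-active-members 2 \
    members add { 10.1.1.10:80 { priority-group 10 } \
                  10.1.1.11:80 { priority-group 10 } \
                  10.1.1.12:80 { priority-group 5 } }
```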
The Disadvantages of DSR (Direct Server Return)
I read a very nice blog post yesterday discussing some of the traditional pros and cons of load-balancing configurations. The author comes to the conclusion that if you can use direct server return, you should. I agree with the author's list of pros and cons; DSR is the least intrusive method of deploying a load-balancer in terms of network configuration. But there are quite a few disadvantages missing from the author's list.
Author's List of Disadvantages of DSR
The disadvantages of Direct Routing are:
- The backend server must respond to both its own IP (for health checks) and the virtual IP (for load balanced traffic).
- Port translation or cookie insertion cannot be implemented.
- The backend server must not reply to ARP requests for the VIP, otherwise it will steal all the traffic from the load balancer (see the sketch further down for what this typically involves).
- Prior to Windows Server 2008, some odd routing behavior could occur.
- In some situations either the application or the operating system cannot be modified to utilise Direct Routing.
Some additional disadvantages:
- Protocol sanitization can't be performed. This means vulnerabilities introduced due to manipulation or lax enforcement of RFCs and protocol specifications can't be addressed.
- Application acceleration can't be applied. Even the simplest of acceleration techniques, e.g. compression, can't be applied because the traffic is bypassing the load-balancer (a.k.a. application delivery controller).
- Implementing caching solutions becomes more complex. With a DSR configuration, the routing that makes it so easy to implement requires that caching solutions be deployed elsewhere, such as via WCCP on the router. This requires additional configuration and changes to the routing infrastructure, and introduces another point of failure as well as an additional hop, increasing latency.
- Error/Exception/SOAP fault handling can't be implemented. In order to address failures in applications, such as missing files (404) and SOAP faults (500), it is necessary for the load-balancer to inspect outbound messages. With a DSR configuration this ability is lost, which means errors are passed directly back to the user without the ability to retry a request, write an entry in the log, or notify an administrator.
- Data Leak Prevention can't be accomplished. Without the ability to inspect outbound messages, you can't prevent sensitive data (SSNs, credit card numbers) from leaving the building.
- Connection optimization functionality is lost. TCP multiplexing can't be accomplished in a DSR configuration because it relies on separating client connections from server connections. This reduces the efficiency of your servers and minimizes the value added to your network by a load balancer.
There are more disadvantages than you're likely willing to read, so I'll stop there. Suffice to say that the problem with the suggestion to use DSR whenever possible is that if you're an application-aware network administrator, you know that most of the time DSR isn't the right solution, because it restricts the ability of the load-balancer (application delivery controller) to perform additional functions that improve the security, performance, and availability of the applications it is delivering. DSR is well-suited, and always has been, to UDP-based streaming applications such as audio and video delivered via RTSP. However, in the increasingly sensitive environment that is application infrastructure, it is necessary to do more than just "load balancing" to improve the performance and reliability of applications.
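For reference, the backend-side plumbing called out in the author's list - the server answering for the VIP locally while staying silent on ARP for it - typically looks something like this on a Linux real server (a sketch only; the VIP address is a placeholder and the exact sysctls vary by distribution):

```
# bind the virtual IP to the loopback so the server accepts traffic addressed to it
ip addr add 203.0.113.10/32 dev lo
# keep the server from answering ARP for the VIP (classic LVS-DR style settings)
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```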
Additional application delivery techniques are an integral component of a well-performing, efficient application infrastructure. DSR may be easier to implement and, in some cases, may be the right solution. But in most cases it's going to leave you simply serving applications, instead of delivering them. Just because you can, doesn't mean you should.
Application Web Pages Not Being Served Correctly by F5
Hi, One of our customers has an application that doesn't appear to perform very well when load-balanced by the F5. The application is currently using a Standard VS profile, which is not doing SSL offload, uses cookie persistence and a SNAT pool with a single IP address, and pretty much everything else is default. We have recently applied a Web Acceleration profile to the VS to attempt to address the problem, but it doesn't appear to have solved anything. The WA profile is only set to cache and serve up static CSS and JS files. The major issue, we believe, is that the client fails to receive some of the JavaScript that is necessary for the page to render correctly. This was the case before the WA profile was applied as well as after. The application used to be load-balanced, in a very rudimentary way, by iptables, and these issues were not seen then. I'm very keen to find any clue as to where to look on the F5 for what could be causing the problem. I'm considering changing the profile to Perf L4 to see if it helps, but there are two problems with that: 1. I don't get to learn what was causing the problem. 2. I think the client wants to have the F5 do SSL offload in the near future. Any help would be greatly appreciated. Thanks in advance, Ben
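One low-tech way to narrow down whether the BIG-IP (or the WA profile) is altering or truncating the JavaScript is to fetch the same object directly from a pool member and through the virtual server and compare them (a sketch; the addresses, hostname and path are placeholders):

```
# same object, two paths
curl -s -o direct.js http://10.1.1.10/static/app.js         # straight from a pool member
curl -s -o via_vs.js http://www.example.com/static/app.js    # through the virtual server
diff direct.js via_vs.js
# also compare status, Content-Length and caching headers
curl -sI http://www.example.com/static/app.js
```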
Slow application performance when using BIG-IP LTM VE for load balancing
The Question: Is there some limitation on the BIG-IP F5 LTM that prevents it from being able to support a large user load, and/or are there any configurations that could specifically interfere with the performance of an application?
Supporting Information: I am load testing our application and want to utilize our virtual F5 as the load balancer for the various clusters. Currently I am seeing slow response times when using a virtual F5 as the load balancer. The slow performance is most apparent when the system is under load. While I can see slow performance in single-user tests, it is not as extreme as when the system is under a large user load. When using other load balancing options (in our specific case, NLB) I am not seeing the same slow performance. I have tried a variety of debugging steps, but none have really helped me put a finger on a solution to the problem.
1. Checking VM resource usage: All VMs show acceptable resource consumption and availability (no VMs appear to be strained... this includes the F5 VM.)
2. Checking ESX host usage: All ESX hosts show acceptable resource consumption and availability (no ESX hosts appear to be strained.)
3. Adjustments to F5 configuration: Disabled OneConnect for the F5 virtual server configuration. There was an issue discovered earlier with using OneConnect with our application and the 11.x versions of F5. This change did not have any obvious effect on test results.
4. Test validation: We have run the same test with another load balancer set-up. That set-up shows acceptable response times. Is there perhaps a limitation with the virtual server as to the max number of users we can expect to support? If so, what might this number be?
5. Comparison against other environments: I have compared configurations between an environment using a physical F5 and the virtual F5 setup I am having issues with. I am not seeing any noticeable differences that would potentially cause issues with performance. It should be noted that using the physical F5 is yielding expected response time performance.
6. Analytics from the F5 console: Monitoring latency of pool members (while under load) is showing an average latency of 1,000+ ms. This seems high... particularly for a virtualized environment. Is this perhaps an F5 VM limitation, or is there perhaps something at an F5 configuration level that we are overlooking that could cause this?
7. Changing vNIC type: It had been suggested that the particular vNIC being used may be a possible source of bad behavior. As such, the vNIC type was changed from Intel to vmxnet3. Current test executions thus far have not shown any noticeable change, but there are additional tests to be executed in this avenue.
8. DynaTrace analysis: This is the latest testing being done, and as such the results are still under analysis. However, initial test runs suggest that the majority of the extra time is being spent at two points: requests between load test agents, and requests between the Java tier and the IIS host. This seems to suggest that the F5 is somehow bottlenecking the request.
Version Information: F5 version: BIG-IP v11.3.0 (Build 2806.0); Node OS version: Windows 2008 R2; VMware Tools version: 9.4.0, build-1280544.
Thank you in advance for any help that can be provided.
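One VE-specific possibility worth ruling out (an assumption on my part, not something established above): LTM VE licenses are rate-limited on throughput, and hitting that ceiling during a load test can look exactly like this. A quick check from the CLI:

```
# look for the licensed throughput limit on the VE
tmsh show sys license | grep -i perf
# compare against what the unit is actually pushing while the test runs
tmsh show sys performance throughput
```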
HTTP Pipelining: A security risk without real performance benefits
Everyone wants web sites and applications to load faster, and there's no shortage of folks out there looking for ways to do just that. But all that glitters is not gold, and not all acceleration techniques actually do all that much to accelerate the delivery of web sites and applications. Worse, some actually incur risk in the form of leaving servers open to exploitation.
A BRIEF HISTORY
Back in the day when HTTP was still evolving, someone came up with the concept of persistent connections. See, in ancient times - when administrators still wore togas in the data center - HTTP 1.0 required one TCP connection for every object on a page. That was okay, until pages started comprising ten, twenty, and more objects. So someone added an HTTP header, Keep-Alive, which basically told the server not to close the TCP connection until (a) the browser told it to or (b) it didn't hear from the browser for X number of seconds (a time out). This eventually became the default behavior when HTTP 1.1 was written and became a standard. I told you it was a brief history.
This capability is known as a persistent connection, because the connection persists across multiple requests. This is not the same as pipelining, though the two are closely related. Pipelining takes the concept of persistent connections, then ignores the traditional request-reply relationship inherent in HTTP and throws it out the window. The general line of thought goes like this: "Whoa. What if we just shoved all the requests from a page at the server and then waited for them all to come back rather than doing it one at a time? We could make things even faster!" Tada! HTTP pipelining.
In technical terms, HTTP pipelining is initiated by the browser by opening a connection to the server and then sending multiple requests to the server without waiting for a response. Once the requests are all sent, the browser starts listening for responses. The reason this is considered an acceleration technique is that by shoving all the requests at the server at once you essentially save the RTT (Round Trip Time) on the connection waiting for a response after each request is sent.
WHY IT JUST DOESN'T MATTER ANYMORE (AND MAYBE NEVER DID)
Unfortunately, pipelining was conceived of and implemented before broadband connections were widely utilized as a method of accessing the Internet. Back then, the RTT was significant enough to have a negative impact on application and web site performance, and the overall user experience was improved by the use of pipelining. Today, however, most folks have a comfortable speed at which they access the Internet, and the RTT impact on most web applications' performance, despite the increasing number of objects per page, is relatively low. There is no arguing, however, that some reduction in time to load is better than none. Too, anyone who's had to access the Internet via high-latency links can tell you anything that makes that experience faster has got to be a Good Thing.
So what's the problem? The problem is that pipelining isn't actually treated any differently on the server than regular old persistent connections. In fact, the HTTP 1.1 specification requires that a "server MUST send its responses to those requests in the same order that the requests were received." In other words, the responses are returned serially, despite the fact that some web servers may actually process those requests in parallel.
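For reference, this is what a pipelined exchange looks like at the socket level - both requests are written back to back before anything is read, and the responses then come back in request order (a sketch; the host and paths are placeholders):

```
printf 'GET /a.html HTTP/1.1\r\nHost: example.com\r\n\r\nGET /b.html HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' \
  | nc example.com 80
```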
Because the server MUST return responses in the order the requests were received, it has to do some extra processing to ensure compliance with this part of the HTTP 1.1 specification. It has to queue up the responses and make certain they are returned properly, which essentially negates the performance gained by reducing the number of round trips through pipelining. Depending on the order in which requests are sent, if a request requiring particularly lengthy processing - say a database query - were sent relatively early in the pipeline, this could actually cause a degradation in performance, because all the other responses have to wait for the lengthy one to finish before they can be sent back.
Application intermediaries such as proxies, application delivery controllers, and general load-balancers can and do support pipelining, but they, too, will adhere to the protocol specification and return responses in the proper order according to how the requests were received. This limitation on the server side actually inhibits a potentially significant boost in performance, because we know that processing dynamic requests takes longer than processing a request for static content. If this limitation were removed, it is possible that the server would become more efficient and the user would experience non-trivial improvements in performance. Or, if intermediaries were smart enough to rearrange requests such that their execution was optimized (I seem to recall I was required to design and implement a solution to a similar example in graduate school), then we'd maintain the performance benefits gained by pipelining. But that would require an understanding of the application that goes far beyond what even today's most intelligent application delivery controllers are capable of providing.
THE SILVER LINING
At this point it may be fairly disappointing to learn that HTTP pipelining today does not result in as significant a performance gain as it might at first seem to offer (except over high-latency links like satellite or dial-up, which are rapidly dwindling in usage). But that may very well be a good thing. As miscreants have become smarter and more intelligent about exploiting protocols, and not just application code, they've learned to take advantage of the protocol to "trick" servers into believing their requests are legitimate, even though the desired result is usually malicious. In the case of pipelining, it would be a simple thing to exploit the capability to enact a layer 7 DoS attack on the server in question. Because pipelining assumes that requests will be sent one after the other and that the client is not waiting for the response until the end, the server would have a difficult time distinguishing between someone attempting to consume resources and a legitimate request. Consider that the server has no understanding of a "page". It understands individual requests. It has no way of knowing that a "page" consists of only 50 objects, and therefore a client pipelining requests for the maximum allowed - by default 100 for Apache - may not be seen as out of the ordinary. Several clients opening connections and pipelining hundreds or thousands of requests every second, without caring if they receive any of the responses, could quickly consume the server's resources or available bandwidth and result in a denial of service to legitimate users.
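On Apache httpd, the ceiling mentioned above corresponds to the MaxKeepAliveRequests directive, which bounds how many requests a single persistent (and therefore pipelined) connection can issue (a minimal sketch of the relevant settings):

```
KeepAlive On
MaxKeepAliveRequests 100   # the default "100" ceiling referenced above
KeepAliveTimeout 5
```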
So perhaps the fact that pipelining is not really all that useful to most folks is a good thing, as server administrators can disable the feature without too much concern and thereby mitigate the risk of the feature being leveraged as an attack method against them. Pipelining, as it is specified and implemented today, is more of a security risk than it is a performance enhancement. There are, however, tweaks to the specification that could be made in the future that might make it more useful. Those tweaks do not address the potential security risk, however, so perhaps, given that there are so many other optimizations and acceleration techniques that can be used to improve performance while incurring no measurable security risk, we should simply let sleeping dogs lie.
IMAGES COURTESY WIKIPEDIA COMMONS
Back to Basics: Least Connections is Not Least Loaded
#webperf #ado When load balancing, "least connections" does not mean "least loaded".
Performance is important, and that means it's important that our infrastructure support the need for speed. Load balancing algorithms are an integral piece of the performance equation and can improve - or degrade - performance. That's why it's important to understand more about the algorithms than their general selection mechanism. Understanding that round robin is basically an iterative choice, traversing a list one by one, is good - but understanding what that means in terms of performance and capacity on different types of applications and application workloads is even better. We last checked out "fastest response time", and today we're diving into "least connections" which, as stated above, does not mean "least loaded."
INTRA-APPLICATION WORKLOADS
The industry-standard "least connections" load balancing algorithm uses the number of current connections to each application instance (member) to make its load balancing decision. The member with the least number of active connections is chosen. Pretty simple, right? The premise of this algorithm is a general assumption that fewer connections (and thus fewer users) means less load and therefore better performance. That's operational axiom #2 at work - if performance decreases as load increases, it stands to reason that performance increases as load decreases. That would be true (and in the early days of load balancing it was true) if all intra-application workloads required the same resources. Unfortunately, that's no longer true, and the result is uneven load distribution that leads to unpredictable performance fluctuations as demand increases.
Consider a simple example: a user logging into a system takes at least one, if not more, database queries to validate credentials and then update the system to indicate the activity. Depending on the nature of the application, other intra-application activities will require different quantities of resources. Some are RAM-heavy, others CPU-heavy, others file- or database-heavy. Furthermore, depending on the user in question, the usage pattern will vary greatly. One hundred users can be logged into the same system (requiring at a minimum ten connections), but if they're all relatively idle, the system will be lightly loaded and performing well. Conversely, another application instance may boast only 50 connections, but all fifty users are heavily active, with database queries returning large volumes of data. That system is far more heavily loaded and performance may already be beginning to suffer. When the next request comes in, however, a load balancer using a "least connections" algorithm will choose the latter member, increasing the burden on that member and likely further degrading performance.
The premise of the least connections algorithm is that the application instance with the fewest number of connections is the least loaded. Except it's not. The only way to know which application instance is the least loaded is to monitor its system variables directly, gathering CPU utilization and memory and comparing them against known maximums. That generally requires SNMP, agents, or other active monitoring mechanisms, which can unduly tax the system in and of themselves by virtue of consuming resources. This is a quandary for operations, because "application workload" is simply too broad a generalization. Certainly some applications are more I/O heavy than others; still others are more CPU or connection heavy.
But all applications have both a general workload profile and an intra-application workload profile. Understanding the usage patterns - the intra-application workload profile - of an application is critical to determining how best to not only choose a load balancing algorithm but also specify any limitations that may provide better overall performance and use of capacity during execution. As always, being aware of the capabilities and the limitations of a given load balancing algorithm will assist in choosing one that is best able to meet the performance and availability requirements of an application (and thus the business).
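For what it's worth, on BIG-IP this is the setting the article is describing - the pool's load-balancing mode - and, as argued above, it counts active connections per member rather than measuring CPU or memory on the servers (a sketch; the pool name and addresses are placeholders):

```
tmsh create ltm pool app_pool \
    load-balancing-mode least-connections-member \
    members add { 10.1.1.10:80 10.1.1.11:80 10.1.1.12:80 }
```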