cpu

Troubleshooting high CPU utilisation on BIG-IP systems

Introduction

This is not really a step-by-step troubleshooting guide. What I'm sharing here is the result of reverse engineering the kind of knowledge that helped me succeed at troubleshooting CPU issues while I worked for the Engineering Services department at F5.

Here's what I'll cover, in order, with a mix of what we should know and where to find the problem:

1. Know what HyperThreading (HT) is
2. Know how HT is used within F5
3. Find out if your F5 box supports HyperThreading (HT)
4. Know the difference between Forwarding plane (TMM) vs Control plane (Linux) CPU consumption
   - Confirm if the problem is TMM or another daemon
   - Where to look further when TMM CPU is high
   - What if it's a control plane daemon?
5. Learn how to interpret graphs
   - High CPU in non-HT boxes
   - High CPU in HT+ boxes
6. Use scripts when necessary to collect real time data

1. Know what HyperThreading (HT) is

- A physical core, as the name implies, is a physical CPU core connected to the motherboard's socket.
- A physical CPU core has several execution units (modules) capable of performing different tasks, e.g. one for basic integer maths, another for more advanced maths, others for loading and storing data from/to memory, etc.
- HT exposes 2 or more logical CPU cores per physical core, so that execution units not being utilised by process A can be used by process B if needed.
- When 2 programs want to use the same part of the physical core, it's inevitable that one of them will have to wait.
- The Operating System (OS) scheduler decides which process gets execution priority in this case.
- This is when 2 (or more) actual physical cores would perform better, as this limitation is not present, i.e. 2 physical cores would be able to concurrently perform tasks using their own execution units.

2. Know how HT is used within F5

Before BIG-IP v11.5.0, on systems with HyperThreading (HT) Technology, we would have:

- 1 TMM per logical core
- Each logical core processing both data plane (TMM) and control plane (Linux) tasks

v11.5.0+ (affects only processors with HT Technology):

- Data plane (TMM) tasks reside on even-numbered cores (0, 2, 4, etc.)
- Control plane (Linux) tasks reside on odd-numbered cores (1, 3, 5, etc.)
- When TMM reaches 80% of actual CPU utilisation, odd-numbered cores limit control plane tasks so they can only use up to 20% of CPU capacity, allowing the remainder to be used by the overloaded forwarding plane (TMM).
- The vCMP host must also be running v11.5.0 or newer in order for guests to use HTSplit technology.

We can disable it manually by issuing the following command:

3. Find out if your box supports HyperThreading (HT)

The hardware boxes listed with HT+ in K14358 all support HyperThreading technology.

Here's how to check the number of cores in a given BIG-IP box (this is a VIPRION C2200 chassis with a 2250 blade installed):

The above box is able to run 2 threads per physical core (Thread(s) per core), with a total of 10 physical cores (Core(s) per socket) and a total of 20 logical cores (CPU(s)).

Here's the same output from a 3900 series box that does not support HT:

The above box is able to run 1 thread per physical core (Thread(s) per core), with a total of 4 physical cores (Core(s) per socket) and a total of 4 cores (CPU(s)).

4. Know the difference between Forwarding plane (TMM) vs Control plane (Linux) CPU consumption

4.1 Confirming if it's TMM or Linux

BIG-IP's forwarding plane is TMM. TMM is a daemon/process within Linux space. If tmm CPU usage is high, then we know high CPU utilisation is a forwarding plane issue. The other daemons are part of BIG-IP's control plane (e.g. bigd, the monitoring daemon).

In this example, both tmm (102.3%) and bigd (51.8%) are high:

If TMM CPU utilisation is high, we will need to troubleshoot the CPU usage of internal TMM components.
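As a rough illustration of the check described in 4.1, the sketch below splits process rows from top into forwarding plane (tmm) and control plane (everything else). It assumes top's default 12-column process layout; the function name split_planes, the threshold, and the sample rows are hypothetical, with only the tmm (102.3%) and bigd (51.8%) figures taken from the example above.

```python
def split_planes(top_output, threshold=50.0):
    """Split busy processes into (forwarding_plane, control_plane) lists.

    Assumes top's default process layout, where column 9 is %CPU and
    column 12 is COMMAND. The threshold is an arbitrary illustration value.
    """
    forwarding, control = [], []
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID and have at least 12 columns.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        if cpu < threshold:
            continue
        command = fields[11]
        # tmm processes (tmm, tmm.0, ...) are the forwarding plane.
        if command.startswith("tmm"):
            forwarding.append((command, cpu))
        else:
            control.append((command, cpu))
    return forwarding, control


# Hypothetical sample rows; only the two %CPU values come from the article.
SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 1001 root       1 -19 8549m  82m  61m R 102.3  0.5 100:00.00 tmm.0
 1002 root      20   0 48936  29m  15m S  51.8  0.2  50:00.00 bigd
 1003 root      20   0  177m 117m  25m S   3.6  0.7  10:00.00 mcpd
"""

forwarding, control = split_planes(SAMPLE)
print("forwarding plane:", forwarding)  # [('tmm.0', 102.3)]
print("control plane:", control)        # [('bigd', 51.8)]
```

On a real system the input would come from a batch-mode top run saved to a file; the parsing would need adjusting if top is configured with a non-default column set.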
For other daemons, there are different places to look. For example, for bigd (monitoring daemon), we need to check BIG-IP's monitors. AskF5 has a nice how-to guide here. Here's a list of BIG-IP daemons.

4.2 TMM CPU utilisation or forwarding plane CPU utilisation

- Check tmsh show ltm virtual <virtual server name> to confirm if there is a particular virtual server eating up TMM CPU cycles:
- Check iRules
- Check tmsh show sys tmm-info to see the breakdown of TMM CPU utilisation per TMM:

4.3 Linux CPU utilisation or control plane CPU utilisation

For anything else apart from TMM, top output is your best friend for confirming which daemon is the culprit.

tmsh show sys proc-info is another command we can use to gather process-specific CPU information. Here I'm checking the bigd monitoring daemon's information:

5. Learn how to interpret graphs

5.1 High CPU in non-HT boxes

- The graph below is just an example taken from a 3900 box, which doesn't have HT split.
- Because graphs are generated from average CPU utilisation, we can assume that CPU utilisation is very high at times.
- Because there is no HT split, the CPU utilisation below can be due either to TMM or to some other Linux daemon.
- We can confirm using the top command.
- In the graph below it was due to both tmm and bigd.
- To confirm normal usage, we always try to match it with other numbers in the graphs (e.g. active connections, etc.).

Note: this is a graph as seen in a qkview (generated by clicking on System > Support), which takes a snapshot of the system. It can then be uploaded to iHealth and is mostly used to share snapshots of BIG-IP systems with F5 support. However, the graph here is used for illustrative purposes, to understand CPU utilisation as seen in graphs.

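To make the point in 5.1 about averaged graphs concrete, here is a minimal sketch with made-up numbers showing how a per-minute average can hide a short period of full CPU saturation. The sample values and the function name summarise are hypothetical and are not taken from any BIG-IP graph.

```python
def summarise(samples):
    """Return the mean and the peak of a list of CPU utilisation samples (%)."""
    return {"mean": sum(samples) / len(samples), "peak": max(samples)}


# Hypothetical per-second readings for one minute:
# 50 seconds at 20% CPU followed by a 10-second burst at 100%.
samples = [20] * 50 + [100] * 10

stats = summarise(samples)
print(f"mean: {stats['mean']:.1f}%")  # mean: 33.3%
print(f"peak: {stats['peak']}%")      # peak: 100%
```

A graph plotting only the 33.3% average looks harmless, even though the CPU was fully busy for 10 seconds of that minute, which is why a moderately high average suggests utilisation is very high at times.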
5.2 High CPU in HT+ boxes

- This other graph was taken from a 4200 series box, which has HT split enabled.
- Notice that CPU cores 0, 2, 4 and 6 (tmm/data plane) show CPU at about 60%.
- Cores 1, 3, 5 and 7 show very minimal CPU utilisation, with some spikes.
- Spikes can be due to the AVR/ASM daemons described in K16469 and K15606,
- or because TMM has reached 80% of CPU utilisation and is now using the control plane's cores.
- This is an example of mostly normal/regular CPU utilisation.
- When HT is enabled and TMM cores use less than 80% of CPU, the control plane cores remain mostly 'quiet'.

6. Use scripts when necessary to collect real time data

Sometimes just looking at the graphs and commands is not enough to determine why CPU is high.

Here's an example of a script that collects real-time TMM/Linux CPU stats on BIG-IP every 60 seconds and copies the output to /var/log/cpu-average.log. The top command output is also copied to /var/log/top-output.log:

Output should be similar to this:

The number after "Counter64" is the percentage value representing how busy each CPU core is. For example, TMM0.0 and TMM0.1 are both at 1% of capacity.

We can add H to the top command (e.g. top -Hcbn 1) in the script above to show the individual threads of a process, including TMM threads.

When opening a support case with F5, it may be useful to include the full tmctl table, as it contains nearly all the raw data we can possibly find on a BIG-IP system. Below is an example of a script that collects all tmctl information every 5 seconds:

Apart from knowing where to look, understanding the CPU usage pattern of our own organisation's production traffic is really important. It enables us to compare, for example, the number of active connections with a spike in CPU in the graphs, to understand whether the spike is related to a sudden and sharp increase in traffic.

high cpu usage independent from Traffic

Hello,

For a few weeks now we have noticed very high CPU usage on the control plane and analysis plane every day for about 4 hours, from 9:00 to 13:00.

Overall concurrent client-side connections are between 1200 and 1800.

The same happens on the standby machine, so it's independent of traffic (this F5 handles web traffic and terminates SSL).

The hardware is an i4800, but we see the same on our virtual test machine.
Version: 16.1.3.3, on test: 16.1.3.4

Any hint where to look for the cause?

Thank you
Karl

Unexpected CPU utilization on LTM

Hello,

I am testing 2 different LTMs (VE) and would like to compare their CPU performance in order to select CPUs for our customer's environment.

BIG-IP 12.1.1 Build 0.0.184 Final, 8 cores
BIG-IP 13.1.1 Build 0.0.4 Final, 12 cores

During load testing (1000 TPS for SSL), I noticed that CPU usage on the lower-version LTM (8 cores) is at 40%, but on the higher-version LTM (12 cores) it is at 50%. I expected the 12-core LTM to have lower CPU consumption than the 8-core one; the results were the complete opposite.

Could someone tell me why it turned out like this?

Thanks in advance for the help.

Which CPU is the Analysis plane on F5 chassis ?

Based on this article https://support.f5.com/csp/article/K15468, all even CPUs are TMM and all odd CPUs run other processes, except the last one (analysis plane). Is the article valid for F5 chassis? I would like to know which blade's CPU on a chassis handles the analysis plane.

Example: my F5 CPUs

F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".1 = Gauge32: 6
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".2 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".3 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".4 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".5 = Gauge32: 11
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".6 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".7 = Gauge32: 15
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".8 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".9 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".10 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".11 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".12 = Gauge32: 5
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".13 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".14 = Gauge32: 2
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".15 = Gauge32: 14
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".16 = Gauge32: 4
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".17 = Gauge32: 13
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".18 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".19 = Gauge32: 13
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".20 = Gauge32: 0
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".21 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".22 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".23 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."1".24 = Gauge32: 0
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".1 = Gauge32: 4
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".2 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".3 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".4 = Gauge32: 7
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".5 = Gauge32: 14
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".6 = Gauge32: 2
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".7 = Gauge32: 11
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".8 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".9 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".10 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".11 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".12 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".13 = Gauge32: 15
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".14 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".15 = Gauge32: 14
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".16 = Gauge32: 2
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".17 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".18 = Gauge32: 3
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".19 = Gauge32: 11
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".20 = Gauge32: 8
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".21 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".22 = Gauge32: 1
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".23 = Gauge32: 10
F5-BIGIP-SYSTEM-MIB::sysMultiHostCpuUser5s."3".24 = Gauge32: 0

Thanks,

High CPU utilization : tmm.0 118.0%

Hi All,

Is there any way to find out what is using the most CPU? As we can see, tmm.0 shows 118%. Can we find out what is causing tmm.0 to go up?

Swap: 5242876k total, 0k used, 5242876k free, 2956952k cached

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM      TIME+ COMMAND
18298 root   1 -19 8549m  82m  61m R 118.8  0.5   79279:13 tmm.0
21723 root  39  19  195m 128m 8720 S  74.3  0.8    1:30.87 java
19329 root  20   0 2342m 277m  24m S  53.9  1.7    3389:31 bd
13582 root  20   0 48936  29m  15m S  36.5  0.2   67574:52 bigd
19346 root  20   0  115m  46m  31m S  29.7  0.3  421:18.84 pabnagd
 6989 root  20   0  177m 117m  25m S   3.6  0.7    6823:17 mcpd
17797 root  20   0  175m 151m  20m S   3.6  0.9    4175:09 asm_start
 5672 root  25   5  113m  27m  22m S   1.6  0.2    3715:39 merged

cpu_id five_sec_avg.ratio
------ ------------------
0      72
1      96
2      71
3      84

--------------------------
Sys::TMM: 0.0
--------------------------
Global
  TMM Process Id      18298
  Running TMM Id      0
  TMM Count           1
  CPU Id              0
Memory (bytes)
  Total               6.5G
  Used                3.3G
CPU Usage Ratio (%)
  Last 5 Seconds      61
  Last 1 Minute       59
  Last 5 Minutes      57

--------------------------
Sys::TMM: 0.2
--------------------------
Global
  TMM Process Id      18298
  Running TMM Id      2
  TMM Count           1
  CPU Id              2
Memory (bytes)
  Total               0
  Used                0
CPU Usage Ratio (%)
  Last 5 Seconds      53
  Last 1 Minute       51
  Last 5 Minutes      50

High CPU utilisation for process 'kwolker' of even cores

Top output

top - 10:47:36 up 16 days, 14:56,  1 user,  load average: 6.61, 6.60, 6.55
Tasks: 348 total,   1 running, 344 sleeping,   0 stopped,   3 zombie
%Cpu0  : 98.3 us,  1.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  4.6 us,  2.6 sy,  0.0 ni, 78.4 id,  0.3 wa,  0.0 hi, 14.1 si,  0.0 st
%Cpu2  : 98.3 us,  1.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  2.7 us,  2.7 sy,  0.0 ni, 94.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 21515168 total,   136644 free, 19177572 used,  2200952 buff/cache
KiB Swap:  1048572 total,   636212 free,   412360 used.   279056 avail Mem

  PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM      TIME+ COMMAND
12795 root  10 -10 2965172   2.0g  1112 S  63.5  9.9   10305:47 kwolker
29071 root  10 -10 2965172   2.0g  1184 S  63.1  9.9   16356:45 kwolker
20101 root  10 -10  827056 264176  1308 S  62.8  1.2    8902:54 kwolker
21192 root   1 -19   13.0g   1.6g  1.6g S  10.3  7.9  795:25.12 tmm.0

Virtual BIG-IQ uses a lot of cpu

Hi,

Is it normal for a virtual BIG-IQ to use almost 7 GHz from an ESXi host? Also, the CPU is always at around 80-85% when I check the CPU under Monitor on the server itself.

It's only used to manage licenses, not BIG-IP systems, so it should be idle most of the time.

Best regards
Daniel

CPU 100% System Crash

Hi,

My BIG-IP system crashes any time the CPU reaches 100%. Is there any way to debug this and find out what causes the high utilisation?

My system is using iRules, HTTP Compression (with CPU Saver), a stream profile and web acceleration.

It always happens when the connection count and throughput are high.

My system is running 11.3.