Knowledge sharing: High CPU/Memory/Swap investigation/troubleshooting

Hello to all,

I will share some basic knowledge about troubleshooting and resolving high data plane or control plane CPU usage. There is already a great article on this topic, so check it first:

 

https://devcentral.f5.com/s/articles/Troubleshooting-High-CPU-Utilisation-on-F5-boxes

 

 

 

 

 

1. On hyper-threaded platforms, if CPUs 0, 2, 4 (the even-numbered ones) are high, it is a data plane (TMM) issue, and if CPUs 1, 3, 5 (the odd-numbered ones) are high, it is a control plane CPU issue. Please read:

 

https://support.f5.com/csp/article/K15468

 

 https://support.f5.com/csp/article/K16739
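
The even/odd rule from point 1 can be sketched in a few lines of Python. This is only an illustration, assuming a hyper-threaded platform where the even-numbered logical CPUs run TMM. The function name, the 80% threshold and the sample numbers are my own assumptions and do not come from any F5 tool.

```python
def classify_high_cpus(cpu_usage, threshold=80.0):
    """Split busy logical CPUs into data plane (even) and control plane (odd).

    cpu_usage: dict mapping logical CPU id -> utilization in percent.
    threshold: hypothetical cut-off for "high"; tune it for your environment.
    """
    busy = {cpu: pct for cpu, pct in cpu_usage.items() if pct >= threshold}
    return {
        "data_plane": sorted(cpu for cpu in busy if cpu % 2 == 0),
        "control_plane": sorted(cpu for cpu in busy if cpu % 2 == 1),
    }


# Made-up sample values, e.g. read off the per-CPU view in top (press "1").
sample = {0: 95.0, 1: 12.0, 2: 91.5, 3: 15.0, 4: 88.0, 5: 9.0}
print(classify_high_cpus(sample))
```

With the sample above only the even CPUs cross the threshold, which points to TMM rather than to a control plane process.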

 

 

 

 

2. If the control plane CPU is constantly high, run the Linux "top" command to see which process is causing the issue, then check for known bugs for that process on Google, AskF5, the bug tracker, the release notes and iHealth. Restarting the process often helps, but be careful, as restarting a critical process may have an impact. If the process is not critical, like bigd, just restart it; I have seen bugs where bigd or snmpd leak memory and need a restart.

 

 

https://support.f5.com/csp/article/K67197865

 

 https://support.f5.com/csp/article/K20060182

 

 

 

 

3. If the control plane CPU jumps from time to time, the issue is harder to catch, but if you see a pattern, check the logs, cron jobs and any REST API scripts that run at the times when the CPU jumps. For example, the F5 ASM datasyncd may cause periodic jumps, as mentioned in https://support.f5.com/csp/article/K02827102, or the ASM policy builder may be enabled and learning too many things, as mentioned in https://support.f5.com/csp/article/K58571155. Also, if top shows many "tmsh" processes, there is probably a REST API script that does not close its connections correctly, which leaves many tmsh sessions hanging and causes high CPU and memory usage (in this case configure the tmsh timeout, as it is not configured by default: https://support.f5.com/csp/article/K9908). To catch an issue that occurs at random times, you may need to follow the article below or run top in batch mode with arguments like "top -b -n 10 -d 10 >> /var/tmp/top.txt", which runs top 10 times at 10-second intervals and appends the output to a file (the -b option is needed when the output is redirected to a file):

 

 

https://support.f5.com/csp/article/K40472403
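
Once you have a file collected with top in batch mode, a small script can count how many rows each command has, which makes a pile of hanging tmsh sessions easy to spot. Below is a minimal sketch, assuming the default top column layout where COMMAND is the 12th column. The function name and the sample output are made up for illustration.

```python
from collections import Counter


def count_commands(top_output):
    """Count process rows per command name across all iterations in the text."""
    counts = Counter()
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the process table
            continue
        if fields[0] == "PID":
            in_table = True  # header row of the process table
            continue
        if in_table and len(fields) >= 12:
            counts[fields[11]] += 1  # COMMAND is the 12th column by default
    return counts


# Made-up sample that only imitates the shape of "top -b" output.
SAMPLE_TOP = """\
top - 10:00:00 up 10 days,  1:00,  1 user,  load average: 2.10, 1.80, 1.50
Tasks: 200 total,   2 running, 198 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4321 root      20   0  600m 120m  10m S 35.0  1.5   0:40.00 tmsh
 4322 root      20   0  600m 118m  10m S 30.0  1.5   0:38.00 tmsh
 4323 root      20   0  600m 119m  10m S 28.0  1.5   0:37.00 tmsh
 1234 root      20   0  300m  50m   5m S  2.0  0.6   1:00.00 snmpd
"""

print(count_commands(SAMPLE_TOP).most_common(3))
```

For a real capture you would pass in the contents of /var/tmp/top.txt. Because the file holds several iterations, divide the counts by the number of iterations to get the average per snapshot.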

 

 

 

 

 

4. If the CPU is high for the TMM process during peak working hours, your system may be overutilized, and you may need to increase the number of cores for virtual editions (if the license allows it) or vCMP guests (if there are free cores), or buy another device. Log messages from the idle enforcer in /var/log/kern.log or "clock advanced" messages in /var/log/ltm may also indicate TMM CPU issues:

 

https://support.f5.com/csp/article/K10337613

 

https://support.f5.com/csp/article/K10095

 

https://support.f5.com/csp/article/K24427880
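
To get a quick feel for how often these two indicators show up, you can count the matching log lines. Here is a minimal sketch that only does a case-insensitive substring search for the two phrases mentioned in point 4. The function name is my own, and the sample lines are made up and are not verbatim BIG-IP log messages.

```python
def count_tmm_busy_hints(lines):
    """Count log lines that mention the idle enforcer or "clock advanced"."""
    hints = {"idle enforcer": 0, "clock advanced": 0}
    for line in lines:
        lowered = line.lower()
        for phrase in hints:
            if phrase in lowered:
                hints[phrase] += 1
    return hints


# Made-up lines that only imitate the general shape of log entries.
SAMPLE_LINES = [
    "tmm[1234]: Clock advanced by 150 ticks",
    "kernel: Idle enforcer starting",
    "mcpd[5678]: some unrelated message",
]

print(count_tmm_busy_hints(SAMPLE_LINES))

# On a real box you could feed it a log file instead, for example:
#   with open("/var/log/ltm") as log:
#       print(count_tmm_busy_hints(log))
```

A handful of hits during a known traffic peak is less worrying than a steady stream of them, so compare the counts with the time of day.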

 

 

You may try some small optimizations, like upgrading to the latest version, checking /var/log/ltm and turning off any iRule logging and forgotten TCP RST logging variables, resolving SSL handshake failures, removing orphaned configuration objects, and stopping any other debugs or modified system logging variables (it is better to have the F5 send its logs to an external log server with HSL, which can also be done in an iRule with the "HSL::" commands), etc.

 

https://support.f5.com/csp/article/K11058264

 

https://support.f5.com/csp/article/K15292

 

https://support.f5.com/csp/article/K13223

 

https://support.f5.com/csp/article/K55131641

 

https://devcentral.f5.com/s/articles/the101-irules-101-logging-amp-comments

 

 https://support.f5.com/csp/article/K15335

 

 

 

5. For memory issues, don't forget that the "top" command shows memory per Linux process (mainly useful for the control plane), while "show sys memory" in tmsh shows the memory of the F5 TMM subsystems. For example, a bad iRule may cause the "tcl" subsystem memory to go high. Log messages from the memory sweeper in /var/log/ltm are also a good indication: https://support.f5.com/csp/article/K13302777 and https://support.f5.com/csp/article/K15740. A DDoS attack may also cause high memory usage, so be careful. For control plane memory, if you see many tmsh sessions open in top, check your REST API scripts and automations and configure the tmsh timeout; I have seen this too many times to count. For vCMP guests, memory is increased by adding more cores if needed, and for virtual editions it is much easier.

 

Note: High "Other Used" memory on the BIG-IP system Dashboard may not indicate an issue, as the Linux kernel allocates memory to buffers and disk caching that can be released as needed.

 

https://support.f5.com/csp/article/K16419

https://support.f5.com/csp/article/K16562
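
To see how much of the "used" Linux memory is really just buffers and cache, you can do the arithmetic on /proc/meminfo yourself. Below is a minimal sketch using the standard MemTotal, MemFree, Buffers and Cached fields. The function names and sample numbers are my own, and I am not claiming this is exactly how the Dashboard computes its "Other Used" value.

```python
def parse_meminfo(text):
    """Return /proc/meminfo values in kB as a dict, e.g. {"MemTotal": 16000000}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_without_cache_kb(meminfo):
    """Memory in use once free memory, buffers and page cache are excluded."""
    return (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo["Buffers"] - meminfo["Cached"])


# Made-up numbers in the /proc/meminfo format.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          6500000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(used_without_cache_kb(info), "kB used excluding buffers and cache")
```

In this sample only 1000000 kB is free, but 7000000 kB of the rest is buffers and cache that the kernel can release, so the processes themselves hold about half of the memory.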

 

 

 

Example of high control plane memory:

 

https://support.f5.com/csp/article/K93325541

 

 

Examples of high TMM data plane memory:

 

K02620345

 

K13889

 

K09336400

 

K15245

 

ID633402.html

 

K44385170

 

 

 

6. For swap issues, you can now configure top to show the process that is using the swap, or just upload a qkview to iHealth and check from there:

 

https://support.f5.com/csp/article/K40027012

 

https://support.f5.com/csp/article/K55227819
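
As an alternative to the swap column in top, the per-process swap usage can also be read from the VmSwap field in /proc/<pid>/status. Here is a minimal sketch, assuming a kernel new enough to expose VmSwap (older kernels do not have it, and kernel threads never do). The function names and the sample text are my own.

```python
import os


def parse_swap_kb(status_text):
    """Return (process name, swapped-out kB) from /proc/<pid>/status text."""
    name, swap_kb = None, 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmSwap":
            swap_kb = int(value.split()[0])
    return name, swap_kb


def swap_by_process(proc_dir="/proc"):
    """List (swap kB, pid, name) for every process using swap, biggest first."""
    usage = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "status")) as status:
                name, swap_kb = parse_swap_kb(status.read())
        except OSError:
            continue  # the process exited while we were reading
        if swap_kb:
            usage.append((swap_kb, int(entry), name))
    return sorted(usage, reverse=True)


# Made-up excerpt in the /proc/<pid>/status format.
SAMPLE_STATUS = "Name:\tsnmpd\nVmRSS:\t   51200 kB\nVmSwap:\t   20480 kB\n"
print(parse_swap_kb(SAMPLE_STATUS))

# On a Linux box: print(swap_by_process()[:5]) shows the five biggest users.
```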

 

 

 

Also, don't forget to check the hard disk, as a full or faulty drive can cause high CPU when the logs can't be written:

 

https://support.f5.com/csp/article/K93344414

 

https://support.f5.com/csp/article/K14403
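
A quick way to check for a full filesystem from a script is Python's shutil.disk_usage. Below is a minimal sketch. The function name and the choice of paths are my own assumptions, and the percentage can differ slightly from what "df" reports because df accounts for reserved blocks.

```python
import os
import shutil


def percent_used(path):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# /var/log and /shared are just example paths; paths that do not exist are skipped.
for path in ("/var/log", "/shared"):
    if os.path.exists(path):
        print(path, round(percent_used(path), 1), "% used")
```

This only tells you whether the disk is full. A faulty drive needs the hardware checks from the articles above.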
