Knowledge sharing: High CPU/Memory/Swap investigation/troubleshooting

I will share some basic knowledge about troubleshooting and resolving high data plane or control plane CPU.
There is already a great article on this, so check it first:

https://devcentral.f5.com/s/articles/Troubleshooting-High-CPU-Utilisation-on-F5-boxes

If CPUs 0, 2, 4 are high, it is a data plane (TMM) issue, and if CPUs 1, 3, 5 are high, it is a control plane CPU issue.
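As a rough illustration of the even/odd rule, here is a minimal Python sketch. It assumes you already have a list of per-core busy percentages indexed by core number (for example copied by hand from top or the dashboard), and that the even/odd split applies to your platform, which as far as I know is the case when hyper-threading (HT-Split) is in use. The function names and the 80% threshold are my own, not anything from F5.

```python
def _average(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def split_planes(per_core_busy):
    """Average busy % of even cores (data plane) and odd cores (control plane)."""
    even = [v for i, v in enumerate(per_core_busy) if i % 2 == 0]
    odd = [v for i, v in enumerate(per_core_busy) if i % 2 == 1]
    return {"data_plane": _average(even), "control_plane": _average(odd)}


def diagnose(per_core_busy, threshold=80.0):
    """Names of the planes whose average busy % is at or above the threshold."""
    planes = split_planes(per_core_busy)
    return [name for name, busy in planes.items() if busy >= threshold]
```

For example, diagnose([95, 10, 92, 12, 97, 8]) flags only the data plane, which points at TMM rather than at a Linux process.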

If the control plane CPU is constantly high, run the Linux "top" command to see which process causes the issue, then check for known bugs in Google, AskF5, the bug tracker, the release notes and iHealth. Restarting the process often helps, but be careful, as restarting critical processes may have an impact. If the process is not critical, like bigd, just restart it, as I have seen bugs where bigd or snmpd leak memory and need a restart.

If the control plane CPU jumps from time to time, the issue is harder to catch, but if you see a pattern, check the logs, cron jobs and any REST API scripts that may run at the same time as the CPU jumps. For example, the F5 ASM datasyncd may cause periodic jumps, as mentioned in https://support.f5.com/csp/article/K02827102, or the ASM policy builder may be enabled and learning too many things, as mentioned in https://support.f5.com/csp/article/K58571155.
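To check whether the jumps really follow a pattern, something like the Python sketch below can help. It assumes you have noted the spike times yourself as ISO timestamps (from graphs or saved top output). The function name and the 10% tolerance are hypothetical choices for illustration only.

```python
from datetime import datetime
from statistics import median


def spike_period(timestamps, tolerance=0.1):
    """Return the median gap in seconds if the spikes are evenly spaced, else None.

    "Evenly spaced" here means every gap is within tolerance (a fraction)
    of the median gap. At least three timestamps are needed.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    mid = median(gaps)
    if mid > 0 and all(abs(g - mid) <= tolerance * mid for g in gaps):
        return mid
    return None
```

If this returns roughly 3600, look for an hourly cron job or an hourly automation run; if it returns None, the spikes are probably not schedule driven.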

Also, if top shows many "tmsh" processes, it means there is a REST API script that does not close its connections correctly, which leaves many tmsh sessions hanging and causes high CPU and memory (in this case configure a tmsh timeout, as it is not configured by default: https://support.f5.com/csp/article/K9908). To catch an issue that occurs at random times, you may need to follow the article below or run top with some arguments like "top -b -n 10 -d 10 >> /var/tmp/top.txt", which runs top in batch mode 10 times at an interval of 10 seconds:

https://support.f5.com/csp/article/K40472403
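Once you have a top capture file, counting the tmsh processes per sample by eye gets tedious. Below is a minimal Python sketch that does it, assuming the file was captured in batch mode (top -b) with the default column layout, where COMMAND is the 12th column. The function names are my own, and the parser is deliberately simple.

```python
from collections import Counter


def parse_top_batch(text):
    """Return one Counter per top iteration, mapping command name to process count."""
    samples, current, in_table = [], None, False
    for line in text.splitlines():
        if line.startswith("top - "):        # summary line that starts each iteration
            current, in_table = Counter(), False
            samples.append(current)
        elif line.split()[:1] == ["PID"]:    # column header row of the process table
            in_table = True
        elif in_table and current is not None:
            fields = line.split(None, 11)    # 12 columns, COMMAND is the last one
            if len(fields) == 12 and fields[0].isdigit():
                current[fields[11].split()[0]] += 1
    return samples


def tmsh_counts(text):
    """Number of tmsh processes seen in each top iteration."""
    return [sample["tmsh"] for sample in parse_top_batch(text)]
```

Usage would be something like tmsh_counts(open("/var/tmp/top.txt").read()). A count that keeps growing between samples suggests sessions that are never closed.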

If the CPU is high for the TMM process during peak working hours, your system may be overutilized, and you may need to increase the number of cores for virtual systems (if the license allows it) or vCMP guests (if there are free cores), or buy another device. Log messages in /var/log/kern for the idle enforcer or "clock advanced" messages in /var/log/ltm may also indicate TMM CPU issues.
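To see whether the "clock advanced" messages line up with your peak hours, you can count them per hour. This is a minimal Python sketch that assumes the usual syslog layout, where each line starts with month, day and HH:MM:SS. It only matches the "clock advanced" text mentioned above, and the function name is hypothetical.

```python
from collections import Counter


def clock_advanced_per_hour(lines):
    """Count log lines containing "clock advanced", grouped by month, day and hour."""
    hits = Counter()
    for line in lines:
        if "clock advanced" in line.lower():
            parts = line.split()
            if len(parts) >= 3:
                # parts: month, day, HH:MM:SS (assumed syslog timestamp layout)
                hits[f"{parts[0]} {parts[1]} {parts[2][:2]}:00"] += 1
    return hits
```

Usage would be something like clock_advanced_per_hour(open("/var/log/ltm")). Many hits in the busy hours and none at night fits an overutilized TMM better than a bug.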

You may try some small optimizations like:

upgrading to the latest version,
checking /var/log/ltm and turning off any iRule logging,
turning off forgotten TCP RST logging variables,
resolving SSL handshake failures,
removing orphaned configuration objects,
stopping any other debugs or modified system logging variables (better to have the F5 send its logs to an external log server with HSL, which can also be done in an iRule with the "HSL::" commands), etc.
https://support.f5.com/csp/article/K11058264
https://support.f5.com/csp/article/K15292
https://support.f5.com/csp/article/K13223
https://support.f5.com/csp/article/K55131641
https://devcentral.f5.com/s/articles/the101-irules-101-logging-amp-comments
https://support.f5.com/csp/article/K15335

For memory issues, don't forget that the "top" command shows memory per Linux process (mostly useful for the control plane), while "show sys memory" shows the memory for the F5 TMM subsystems. For example, a bad iRule can cause the "tcl" subsystem memory to go high. Logs in /var/log/ltm for the memory sweeper are also a good indication: https://support.f5.com/csp/article/K13302777 and https://support.f5.com/csp/article/K15740. A DDoS may also cause high memory, so be careful. For control plane memory, if you see many tmsh sessions open in top, check your REST API scripts and automations and configure a tmsh timeout, as I have seen this too many times to count. For vCMP guests the memory is increased by adding more cores if needed, and for virtual editions increasing it is much easier.

Note: High "Other Used" memory on the BIG-IP system Dashboard may not indicate an issue, as the Linux kernel allocates memory to buffers and disk caching that can be released as needed.

Example high control plane memory: