
High CPU 91.5%wa on top command with 1 zombie process

RichardGamboa92
Nimbostratus

What is the problem over here?

 

[Attached screenshots: 0691T000008tQ7eQAE.jpg, 0691T000008tQ7jQAE.png]

1 REPLY

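I suspect there is a process in uninterruptible sleep (D state). Processes in that state are usually blocked waiting on I/O, which would fit the high %wa (I/O wait) value.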
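For context, the %wa value in top is the share of CPU time spent idle while waiting for I/O to complete. Below is a minimal sketch of how that share can be computed from two samples of the aggregate `cpu` line in /proc/stat. The function names `iowait_percent` and `sample_iowait` are my own (hypothetical), and the sketch assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); it is not what top does internally, just an approximation of the same number.

```python
import time


def iowait_percent(stat_before, stat_after):
    """Share of CPU time spent in iowait between two /proc/stat samples.

    Each argument is the aggregate "cpu ..." line from /proc/stat.
    """
    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    before, after = fields(stat_before), fields(stat_after)
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only user..steal (first 8 columns); guest time is already
    # accounted for inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Field order: user nice system idle iowait ... so iowait is index 4.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0):
    """Hypothetical helper: read /proc/stat twice, `interval` seconds apart."""
    with open("/proc/stat") as handle:
        first = handle.readline()
    time.sleep(interval)
    with open("/proc/stat") as handle:
        second = handle.readline()
    return iowait_percent(first, second)
```

Calling `sample_iowait(5)` on the affected box should give a figure in the same ballpark as the %wa column, averaged over five seconds.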

 

Check which process it is: ps -eo state,pid,cmd | grep "^D"
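If you would rather do that filtering from Python (for example to log it periodically), here is a minimal sketch. It runs the same `ps -eo state,pid,cmd` command and keeps only the rows whose state is D. The helper names `parse_ps`, `processes_in_state` and `list_d_processes` are hypothetical, and the sketch assumes a procps-style `ps` that prints a header row followed by one row per process.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo state,pid,cmd` output into (state, pid, cmd) tuples."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header ("S PID CMD") and blank or malformed lines.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        cmd = parts[2] if len(parts) > 2 else ""
        rows.append((parts[0], int(parts[1]), cmd))
    return rows


def processes_in_state(ps_output, state="D"):
    """Keep only the rows whose state column starts with `state`."""
    return [row for row in parse_ps(ps_output) if row[0].startswith(state)]


def list_d_processes():
    """Run ps and return the processes currently in uninterruptible sleep."""
    output = subprocess.check_output(["ps", "-eo", "state,pid,cmd"]).decode()
    return processes_in_state(output, "D")
```

A process can pass through D state briefly during normal I/O, so it is worth calling `list_d_processes()` several times and looking for PIDs that keep showing up.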

 

The zombie process is unusual too, and you should check which process it is.
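A zombie is a process that has already exited but whose parent has not yet collected its exit status, so the useful thing to find out is who the parent is. Here is a minimal sketch that parses `ps -eo state,pid,ppid,cmd` output and reports each zombie together with its parent PID. The names `find_zombies` and `list_zombies` are hypothetical, and the same procps-style output format is assumed.

```python
import subprocess


def find_zombies(ps_output):
    """Return zombies found in `ps -eo state,pid,ppid,cmd` output.

    Each entry is a dict with the zombie's pid, its parent's pid and the cmd.
    """
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        # Skip the header and any line without numeric PID/PPID columns.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        if parts[0].startswith("Z"):
            zombies.append({
                "pid": int(parts[1]),
                "ppid": int(parts[2]),
                "cmd": parts[3] if len(parts) > 3 else "",
            })
    return zombies


def list_zombies():
    """Run ps and return the current zombie processes."""
    output = subprocess.check_output(["ps", "-eo", "state,pid,ppid,cmd"]).decode()
    return find_zombies(output)
```

The zombie itself cannot be killed, since it is already dead. It goes away when the parent (the `ppid` above) reaps it or exits.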

 

The output you pasted doesn't show the whole picture.

 

If it's BIG-IP, you can try running this script:

#!/usr/bin/python
# Note: the "commands" module exists only in Python 2.
import commands
import time
import re

# 600 seconds = 10 minutes
MAX_TIME_RUNNING = 600
start = time.time()


def get_d_processes():
    '''
    Returns a list of uninterruptible (D state) processes.
    Each process is a list containing state, pid and cmd.
    '''
    output = commands.getoutput("ps -eo state,pid,cmd | grep '^D'").split('\n')
    for index, process in enumerate(output):
        output[index] = process.strip().split(' ', 2)
        # collapse repeated spaces so the fields split cleanly
        output[index] = re.sub(r' +', ' ', ' '.join(output[index])).split(' ', 2)
    return output


def get_rw_io_by_pid(pid):
    '''
    Returns the raw contents of /proc/<pid>/io,
    which include read_bytes and write_bytes.
    '''
    return commands.getoutput('cat /proc/%s/io' % pid)


def lsof(pid):
    '''
    Returns the output of lsof -p <pid>.
    '''
    return commands.getoutput('lsof -p %s' % pid)


while True:
    # append a full top snapshot on every iteration
    top = open('/shared/tmp/top.txt', 'ab')
    top.write('--------------------------\n')
    top.write(commands.getoutput('date') + '\n')
    top.write('--------------------------\n')
    top.write(commands.getoutput('top -Hcbn 1') + '\n')

    file = open('/shared/tmp/processes.txt', 'ab')
    all_d_processes = get_d_processes()
    # if there is no process to check, we skip below code
    if '' != all_d_processes[0][0]:
        file.write('--------------------------\n')
        file.write(commands.getoutput('date') + '\n')
        file.write('--------------------------\n')
        file.write('All D processes: \n')
        file.write(commands.getoutput("ps -eo state,pid,cmd | grep '^D'") + '\n')
        file.write('--------------------------\n')
        file.write('current top IO: \n')
        file.write('--------------------------\n')
        file.write(commands.getoutput('top -Hcbn 1 | head -6') + '\n')
        for process in all_d_processes:
            file.write('--------------------------\n')
            file.write('Process: \n')
            file.write('--------------------------\n')
            file.write(' '.join(process) + '\n')
            file.write('--------------------------\n')
            file.write('cat /proc/%s/io output: \n' % process[1])
            file.write('--------------------------\n')
            file.write(get_rw_io_by_pid(process[1]) + '\n')
            file.write('--------------------------\n')
            file.write('lsof -p %s output: \n' % process[1] + '\n')
            file.write('--------------------------\n')
            file.write(lsof(process[1]) + '\n')

    # sample every 10 seconds until MAX_TIME_RUNNING has elapsed
    time.sleep(10)
    if time.time() > start + MAX_TIME_RUNNING:
        file.close()
        top.close()
        break

Once it has finished, have a look at /shared/tmp/processes.txt. The output should look something like this:

$ cat processes.txt
--------------------------
Mon Oct 22 23:19:39 CEST 2018
--------------------------
All D processes:
D 1547 [kjournald]
D 9134 [kjournald]
D 20838 asm_config_server_rpc_handler.pl
--------------------------
current top IO:
--------------------------
top - 23:19:40 up 16 days, 21:35, 2 users, load average: 4.01, 4.60, 3.71
Tasks: 755 total, 7 running, 746 sleeping, 0 stopped, 2 zombie
Cpu(s): 27.1%us, 5.3%sy, 3.1%ni, 61.9%id, 2.0%wa, 0.2%hi, 0.4%si, 0.0%st
Mem: 16528432k total, 15194320k used, 1334112k free, 78960k buffers
Swap: 1048572k total, 808448k used, 240124k free, 3821212k cached
--------------------------
Process:
--------------------------
D 1547 [kjournald]
--------------------------
cat /proc/1547/io output:
--------------------------
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
--------------------------
lsof -p 1547 output:
--------------------------
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kjournald 1547 root cwd DIR 253,17 1024 2 /
kjournald 1547 root rtd DIR 253,17 1024 2 /
kjournald 1547 root txt unknown
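The /proc/<pid>/io part of that output is the most useful for spotting which D-state process is actually generating disk traffic. Here is a minimal sketch for turning that text into numbers you can compare between samples. The function name `parse_proc_io` is hypothetical, and it assumes the plain "name: value" layout shown in the sample.

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Keep only well-formed "name: <integer>" lines.
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


# Counters taken from the sample output for PID 1547 ([kjournald]).
sample_io = """rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
"""

io = parse_proc_io(sample_io)
written_gib = io["write_bytes"] / float(2 ** 30)
read_mib = io["read_bytes"] / float(2 ** 20)
```

For the sample above this works out to a bit over 3 GiB written and about 6 MiB read since the process started. The counters are cumulative, so what matters is how fast they grow between two consecutive samples in processes.txt.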