Technical Forum

High CPU: 91.5% wa in top, with 1 zombie process

RichardGamboa92
Nimbostratus

What is the problem here?

 

[Screenshots of the top output were attached to the original post.]

1 Reply

I suspect there is a process in uninterruptible sleep (D state), i.e. blocked while waiting for I/O to complete.
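For context, top's %wa (iowait) is the share of CPU time during which a CPU was idle while I/O requests were still outstanding. A high value says the system is waiting on storage, not that the CPUs are busy computing. top derives it from the aggregate `cpu` line of /proc/stat. The following is a minimal sketch of that calculation, assuming the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical and top's exact arithmetic may differ slightly.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints.

    Assumed field order (proc(5)): user nice system idle iowait irq softirq steal ...
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == 'cpu':
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Only the first eight fields (user..steal) are summed; guest time is
    already included in user/nice, so it is not added a second time.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval=1.0, path='/proc/stat'):
    """Read /proc/stat twice, `interval` seconds apart, and return %wa."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)
```

On a Linux host, `sample_iowait(1.0)` should give a figure in the same range as the wa column that top shows for the same interval.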

 

Check which process it is: ps -eo state,pid,cmd | grep "^D"
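The same information is also available without parsing ps output: the third field of /proc/<pid>/stat is the one-letter process state. Below is a minimal sketch, assuming the Linux /proc layout described in proc(5); the helper names are hypothetical. The command name is wrapped in parentheses and may itself contain spaces or ')', so the parser splits on the last ')'.

```python
import os


def parse_stat(text):
    """Split /proc/<pid>/stat contents into (pid, comm, state, ppid)."""
    open_paren = text.index('(')
    close_paren = text.rindex(')')  # comm may contain ')' itself
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    rest = text[close_paren + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def processes_in_state(wanted, proc='/proc'):
    """Yield (pid, comm, ppid) for each process whose state letter is in `wanted`."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, 'stat')) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        if state in wanted:
            yield pid, comm, ppid
```

For example, `list(processes_in_state('DZ'))` returns every process that is currently in D state or is a zombie, together with its parent PID.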

 

The zombie process is also odd, so check which process that is and what its parent is: ps -eo state,pid,ppid,cmd | grep "^Z"
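A zombie has already exited. It uses no CPU and holds only its process-table entry, which stays until the parent collects the exit status with wait()/waitpid(). A single zombie is therefore unlikely to explain the high %wa on its own; the useful question is which parent is not reaping it. This short sketch reproduces the mechanism on Linux. The helper names are hypothetical, and `proc_state` reads /proc, so it only works on Linux.

```python
import os
import time


def make_zombie():
    """Fork a child that exits at once; the parent deliberately does not reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit immediately with status 3
    return pid  # parent: the child stays in state Z until waited for


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open('/proc/%d/stat' % pid) as f:
        text = f.read()
    return text[text.rindex(')') + 1:].split()[0]


def reap(pid):
    """Collect the child's exit status; this removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

In practice, the fix for a lingering zombie is to find its parent (the ppid column above) and get that process to reap its children, or restart it. Zombies whose parent exits are re-parented to init, which reaps them.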

 

The output you pasted doesn't show the whole picture.

 

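If this is a BIG-IP, you can try running this script. It appends a top snapshot to /shared/tmp/top.txt every 10 seconds for 10 minutes. Whenever a process is in D state, it also logs that process's /proc/<pid>/io counters and open files (lsof) to /shared/tmp/processes.txt: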

#!/usr/bin/python
# Logs top output every INTERVAL seconds and, when any process is in
# uninterruptible sleep (D state), its /proc/<pid>/io counters and open files.
# subprocess is used instead of the Python 2-only "commands" module.

import subprocess
import time

# 600 seconds = 10 minutes
MAX_TIME_RUNNING = 600
INTERVAL = 10  # seconds between samples
TOP_LOG = '/shared/tmp/top.txt'
PROCESS_LOG = '/shared/tmp/processes.txt'
SEPARATOR = '--------------------------\n'


def run(cmd):
    '''
    Runs a shell command and returns stdout+stderr without the trailing
    newline, like commands.getoutput().
    '''
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.communicate()[0].rstrip('\n')


def get_d_processes():
    '''
    Returns a list of processes in uninterruptible sleep (D state).
    Each process is a list [state, pid, cmd]; the list is empty if there are none.
    '''
    output = run("ps -eo state,pid,cmd | grep '^D'")
    # split(None, 2) collapses repeated spaces but keeps cmd arguments together
    return [line.split(None, 2) for line in output.splitlines() if line.strip()]


def get_rw_io_by_pid(pid):
    '''
    Returns the raw contents of /proc/<pid>/io (rchar, wchar, read_bytes, write_bytes, ...).
    '''
    return run('cat /proc/%s/io' % pid)


def lsof(pid):
    '''
    Returns the files the process has open (lsof -p output).
    '''
    return run('lsof -p %s' % pid)


start = time.time()
while True:
    # Opening with "with" closes the files every iteration instead of leaking handles
    with open(TOP_LOG, 'a') as top:
        top.write(SEPARATOR)
        top.write(run('date') + '\n')
        top.write(SEPARATOR)
        top.write(run('top -Hcbn 1') + '\n')

    all_d_processes = get_d_processes()
    # Skip the detailed log when no process is in D state
    if all_d_processes:
        with open(PROCESS_LOG, 'a') as log:
            log.write(SEPARATOR)
            log.write(run('date') + '\n')
            log.write(SEPARATOR)
            log.write('All D processes: \n')
            log.write(run("ps -eo state,pid,cmd | grep '^D'") + '\n')
            log.write(SEPARATOR)
            log.write('current top IO: \n')
            log.write(SEPARATOR)
            log.write(run('top -Hcbn 1 | head -6') + '\n')
            for process in all_d_processes:
                pid = process[1]
                log.write(SEPARATOR)
                log.write('Process: \n')
                log.write(SEPARATOR)
                log.write(' '.join(process) + '\n')
                log.write(SEPARATOR)
                log.write('cat /proc/%s/io output: \n' % pid)
                log.write(SEPARATOR)
                log.write(get_rw_io_by_pid(pid) + '\n')
                log.write(SEPARATOR)
                log.write('lsof -p %s output: \n' % pid)
                log.write(SEPARATOR)
                log.write(lsof(pid) + '\n')

    if time.time() > start + MAX_TIME_RUNNING:
        break
    time.sleep(INTERVAL)
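The script stores /proc/<pid>/io exactly as the kernel reports it. The read_bytes and write_bytes values there are cumulative since the process started, so a single reading shows a total, not a rate. To tell whether a D-state process is actively driving the disk, compare two readings taken a few seconds apart. A minimal sketch follows; the function names are hypothetical, and reading another user's /proc/<pid>/io typically requires root.

```python
import time


def parse_proc_io(text):
    """Turn /proc/<pid>/io text ('key: value' per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_rates(before, after, seconds):
    """Per-second change of every counter present in both samples."""
    return dict((k, (after[k] - before[k]) / float(seconds))
                for k in after if k in before)


def sample_io(pid, interval=5.0):
    """Read /proc/<pid>/io twice, `interval` seconds apart, and return rates."""
    path = '/proc/%s/io' % pid
    with open(path) as f:
        before = parse_proc_io(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_io(f.read())
    return io_rates(before, after, interval)
```

A process that stays in D state while its write_bytes rate stays near zero is more likely blocked behind someone else's I/O than causing it.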

Then look at /shared/tmp/processes.txt. The output should look something like this:

$ cat processes.txt
--------------------------
Mon Oct 22 23:19:39 CEST 2018
--------------------------
All D processes:
D  1547 [kjournald]
D  9134 [kjournald]
D 20838 asm_config_server_rpc_handler.pl
--------------------------
current top IO:
--------------------------
top - 23:19:40 up 16 days, 21:35,  2 users,  load average: 4.01, 4.60, 3.71
Tasks: 755 total,   7 running, 746 sleeping,   0 stopped,   2 zombie
Cpu(s): 27.1%us,  5.3%sy,  3.1%ni, 61.9%id,  2.0%wa,  0.2%hi,  0.4%si,  0.0%st
Mem:  16528432k total, 15194320k used,  1334112k free,    78960k buffers
Swap:  1048572k total,   808448k used,   240124k free,  3821212k cached
 
--------------------------
Process:
--------------------------
D 1547 [kjournald]
--------------------------
cat /proc/1547/io output:
--------------------------
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
--------------------------
lsof -p 1547 output:
 
--------------------------
COMMAND    PID USER   FD      TYPE DEVICE SIZE/OFF NODE NAME
kjournald 1547 root  cwd       DIR 253,17     1024    2 /
kjournald 1547 root  rtd       DIR 253,17     1024    2 /
kjournald 1547 root  txt   unknown
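Over ten minutes, processes.txt can collect dozens of samples. A process that shows up in D state in nearly every sample is a stronger suspect than one seen once. The following is a minimal sketch that tallies the "All D processes:" sections. It assumes the exact layout shown above (a header line, one "state pid cmd" line per process, then a dashed separator); the function names are hypothetical, and it avoids collections.Counter so it also runs on older Python 2 releases.

```python
def count_d_processes(log_text):
    """Count how many samples listed each (pid, cmd) under 'All D processes:'."""
    counts = {}
    in_section = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('All D processes:'):
            in_section = True
            continue
        if stripped.startswith('---'):
            in_section = False  # a separator ends the list
            continue
        if in_section and stripped.startswith('D'):
            parts = stripped.split(None, 2)  # state, pid, cmd
            if len(parts) == 3:
                key = (int(parts[1]), parts[2])
                counts[key] = counts.get(key, 0) + 1
    return counts


def most_common(counts, n=10):
    """Return the n (pid, cmd) pairs seen most often, highest count first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `most_common(count_d_processes(open('/shared/tmp/processes.txt').read()))` lists the processes that were stuck in D state most often during the run.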