30-May-2020 12:09
01-Jun-2020 00:02 - last edited on 21-Nov-2022 16:22 by JimmyPackets
I suspect there is a process in uninterruptible sleep (D state).
Check which process it is: ps -eo state,pid,cmd | grep "^D"
The zombie process is also odd, and you should check which process it is.
The output you pasted doesn't show the whole picture.
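To see the D-state and zombie processes side by side, the ps output can be grouped by state. The following is only a minimal sketch, assuming Python 3 and the three-column `ps -eo state,pid,cmd` layout; `group_by_state` and `SAMPLE` are hypothetical names, and the zombie line in the sample is made up for illustration.

```python
def group_by_state(ps_output, states=("D", "Z")):
    """Group `ps -eo state,pid,cmd` lines by process state.

    Returns a dict mapping each requested state to a list of (pid, cmd) tuples.
    """
    grouped = {state: [] for state in states}
    for line in ps_output.splitlines():
        # Split into state, pid and the rest of the line (the command).
        fields = line.split(None, 2)
        if len(fields) < 2 or fields[0] not in grouped:
            continue
        if not fields[1].isdigit():
            continue  # not a process line (for example the ps header)
        cmd = fields[2].strip() if len(fields) == 3 else ""
        grouped[fields[0]].append((int(fields[1]), cmd))
    return grouped


# Illustrative input only; PID 4242 is a made-up zombie entry.
SAMPLE = """\
S   PID CMD
S     1 /sbin/init
D  1547 [kjournald]
Z  4242 [example] <defunct>
D 20838 asm_config_server_rpc_handler.pl
"""

if __name__ == "__main__":
    for state, procs in sorted(group_by_state(SAMPLE).items()):
        for pid, cmd in procs:
            print(state, pid, cmd)
```

On a live system the text could come from `subprocess.check_output(["ps", "-eo", "state,pid,cmd"], text=True)` instead of `SAMPLE`.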
If it's BIG-IP, you can try running this script:
#!/usr/bin/python
# Python 2 script: it relies on the 'commands' module, which Python 3 removed.
import commands
import time
import re

# 600 seconds = 10 minutes
MAX_TIME_RUNNING = 600
start = time.time()


def get_d_processes():
    '''
    Returns a list of processes in uninterruptible sleep (D state).
    Each entry is a list containing state, pid and cmd.
    If there is no such process, the result is [['']].
    '''
    output = commands.getoutput("ps -eo state,pid,cmd | grep '^D'").split('\n')
    for index, process in enumerate(output):
        # collapse repeated spaces, then split into state, pid and cmd
        output[index] = re.sub(r' +', ' ', process.strip()).split(' ', 2)
    return output


def get_rw_io_by_pid(pid):
    '''
    Returns the raw text of /proc/<pid>/io (includes read_bytes and write_bytes).
    '''
    return commands.getoutput('cat /proc/%s/io' % pid)


def lsof(pid):
    '''
    Returns the output of lsof for the given pid.
    '''
    return commands.getoutput('lsof -p %s' % pid)


# open both logs once, in append mode, instead of reopening them on every pass
top = open('/shared/tmp/top.txt', 'a')
report = open('/shared/tmp/processes.txt', 'a')

while True:
    top.write('--------------------------\n')
    top.write(commands.getoutput('date') + '\n')
    top.write('--------------------------\n')
    top.write(commands.getoutput('top -Hcbn 1') + '\n')

    all_d_processes = get_d_processes()
    # if there is no process to check, we skip the report below
    if '' != all_d_processes[0][0]:
        report.write('--------------------------\n')
        report.write(commands.getoutput('date') + '\n')
        report.write('--------------------------\n')
        report.write('All D processes: \n')
        report.write(commands.getoutput("ps -eo state,pid,cmd | grep '^D'") + '\n')
        report.write('--------------------------\n')
        report.write('current top IO: \n')
        report.write('--------------------------\n')
        report.write(commands.getoutput('top -Hcbn 1 | head -6') + '\n')
        for process in all_d_processes:
            report.write('--------------------------\n')
            report.write('Process: \n')
            report.write('--------------------------\n')
            report.write(' '.join(process) + '\n')
            report.write('--------------------------\n')
            report.write('cat /proc/%s/io output: \n' % process[1])
            report.write('--------------------------\n')
            report.write(get_rw_io_by_pid(process[1]) + '\n')
            report.write('--------------------------\n')
            report.write('lsof -p %s output: \n' % process[1])
            report.write('--------------------------\n')
            report.write(lsof(process[1]) + '\n')

    # flush so both files can be inspected while the script is still running
    top.flush()
    report.flush()

    time.sleep(10)
    if time.time() > start + MAX_TIME_RUNNING:
        break

report.close()
top.close()
Have a look at /shared/tmp/processes.txt; the output should look something like this:
$ cat processes.txt
--------------------------
Mon Oct 22 23:19:39 CEST 2018
--------------------------
All D processes:
D 1547 [kjournald]
D 9134 [kjournald]
D 20838 asm_config_server_rpc_handler.pl
--------------------------
current top IO:
--------------------------
top - 23:19:40 up 16 days, 21:35, 2 users, load average: 4.01, 4.60, 3.71
Tasks: 755 total, 7 running, 746 sleeping, 0 stopped, 2 zombie
Cpu(s): 27.1%us, 5.3%sy, 3.1%ni, 61.9%id, 2.0%wa, 0.2%hi, 0.4%si, 0.0%st
Mem: 16528432k total, 15194320k used, 1334112k free, 78960k buffers
Swap: 1048572k total, 808448k used, 240124k free, 3821212k cached
--------------------------
Process:
--------------------------
D 1547 [kjournald]
--------------------------
cat /proc/1547/io output:
--------------------------
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
--------------------------
lsof -p 1547 output:
--------------------------
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kjournald 1547 root cwd DIR 253,17 1024 2 /
kjournald 1547 root rtd DIR 253,17 1024 2 /
kjournald 1547 root txt unknown
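Because the script appends a /proc/<pid>/io dump every 10 seconds, comparing two consecutive dumps shows whether a D-state process is still moving data or is stuck. The following is a rough sketch of that comparison, assuming Python 3; `parse_proc_io` and `io_delta` are hypothetical helper names, `FIRST` is copied from the sample above, and `SECOND` is an invented later sample used only for illustration.

```python
def parse_proc_io(text):
    """Parse the text of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only well-formed "name: number" lines.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value.strip())
    return counters


def io_delta(before, after):
    """Return (read_bytes, write_bytes) growth between two parsed samples."""
    return (after["read_bytes"] - before["read_bytes"],
            after["write_bytes"] - before["write_bytes"])


# Values taken from the kjournald sample output above.
FIRST = """\
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407814656
cancelled_write_bytes: 0
"""

# Invented follow-up sample (not real data): 4096 more bytes written.
SECOND = """\
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 6504448
write_bytes: 3407818752
cancelled_write_bytes: 0
"""

if __name__ == "__main__":
    read_growth, write_growth = io_delta(parse_proc_io(FIRST),
                                         parse_proc_io(SECOND))
    print("read_bytes grew by", read_growth)
    print("write_bytes grew by", write_growth)
```

If both values stay at zero across several samples while the process remains in D state, it is probably blocked rather than just busy with I/O.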