BIGIP LTM Automated Pool Monitor Flap Troubleshooting Script in Bash

Problem this snippet solves:

This bash script collects data when an F5 BIG-IP LTM pool member monitor flaps over a period of time, to help determine the root cause of the BIG-IP health check failure. The script watches the LTM log; when a new pool member down message appears, it does the following:
1. Turns on LTM bigd debug.
2. Starts a tcpdump capture of the relevant traffic.
3. Turns off bigd debug and terminates the tcpdump process when the timer elapses (the timer is configurable).
4. Generates a qkview (optional).
5. Creates a tarball of all log files under the /var/log/ directory (optional).

Script has been tested on v11.x

Code :

#!/bin/bash
##########log file that the script monitors
filename="/var/log/ltm"
##########number of seconds that bigd debug and tcpdump keep running; change it according to your needs
timer=60
##########IP address of the pool member that flaps
poolMemberIP="10.10.10.229"
##########self IP address that the LTM uses to send health monitor traffic
ltmSelfip="10.10.10.248"
##########pool member service port number
poolMemberPort="443"
##########TMOS command to turn on bigd debug 
turnonBigdDebug="tmsh modify sys db bigd.debug value enable"
##########TMOS command to turn off bigd debug 
turnoffBigdDebug="tmsh modify sys db bigd.debug value disable"
##########BASH command to tar BIGIP log files 
tarLogs="tar -czpf /var/tmp/logfiles.tar.gz /var/log/*"



####### file check: verify that /var/log/ltm exists on the system;
####### if it exists, the script keeps running and performs the subsequent functions
if [ -f "$filename" ]
then
      echo "/var/log/ltm exists and program is running to collect data when BIG-IP pool member flaps"
else
####### if it does not exist, the program terminates and logs the following message
      echo "no /var/log/ltm file found and program is terminated"
      exit 1
fi
####### file check ends

###### write a timestamp to /var/log/ltm for tracking purposes
echo "$(date) monitoring the log" >> "$filename"

###### start to monitor the /var/log/ltm for new events 
tail -f -n 0 $filename | while read -r line
do

###### count whether the pool member down message appears in this log line (0 or 1)
hit=$(echo "$line" | grep -c "$poolMemberIP:$poolMemberPort monitor status down")

#echo $hit
###### 
if [ "$hit" == "1" ];
   then
###### display the pool down log event from /var/log/ltm
   echo "$line"
###### show the timestamp when debug is turned on
   echo "$(date) Turning on system bigd debug"
###### turn on bigd debug
   echo $($turnonBigdDebug)
###### start the tcpdump capture in the background
   echo $(tcpdump -ni 0.0:nnn -s0 -w /var/tmp/Monitor.pcap port $poolMemberPort and \(host $poolMemberIP and host $ltmSelfip\)) &
###### wait for the timer to elapse
   sleep "$timer"
###### show the timestamp when debug is turned off
   echo "$(date) Turning off system bigd debug"
###### turn off bigd debug
   echo $($turnoffBigdDebug)
###### terminate the tcpdump process (note: killall stops every tcpdump running on the system)
   echo $(killall tcpdump)
###### generate a qkview; this is optional, enable it by removing the "#" sign
   #echo $(qkview)
###### tar the log files; this is optional, enable it by removing the "#" sign
   #echo $($tarLogs)
   break

#else
    #echo "Monitor in progress"
fi
done
###### show a message that the program has ended
echo "$(date) exiting from program"

###### exit from the program 
exit 0

Tested this on version:

11.6
Published Jun 18, 2015
Version 1.0
  • davidfisher, the value passed to the bash sleep command is in seconds.

    Vasim, the script would need to be called from an iCall action tied to an event trigger. There are examples of this; just search for iCall. I also have an article that walks through an example, though you won't need iRules for this.

    • The full logs will be in /var/tmp and should include the window under duress for analysis.
    • It will run whenever triggered, but in the iCall script that executes it, you can select windows to avoid.
    • Any time you run tcpdump it impacts the system. The impact could be small or large, depending on your filters and how much data you write to disk while it is running.
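One way to keep the capture window bounded is to stop only the process the script started, by PID, instead of using killall. The following is a minimal sketch, not part of the original script: the helper name run_for is hypothetical, and the tcpdump line in the trailing comment (including the -c packet limit) is an illustrative assumption to adapt to your own filters.

```shell
#!/bin/bash
# run_for SECONDS CMD [ARGS...]
# Hypothetical helper: run CMD in the background, let it run for SECONDS,
# then stop that one process by PID (other instances are left alone).
run_for() {
    local seconds=$1
    shift
    "$@" &                       # start the command in the background
    local pid=$!                 # remember the PID of exactly this process
    sleep "$seconds"
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"              # still running: send SIGTERM to this PID only
    fi
    wait "$pid" 2>/dev/null || true   # reap it; ignore the signal exit status
    return 0
}

# Illustrative use with the script's variables (assumed, adjust as needed):
# run_for "$timer" tcpdump -ni 0.0:nnn -s0 -c 50000 -w /var/tmp/Monitor.pcap \
#     port "$poolMemberPort" and host "$poolMemberIP" and host "$ltmSelfip"
```

A tight filter and a packet limit reduce how much data is written to disk during the window, which is the main driver of impact mentioned above.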
  • How do we run this script in production? Where do we collect the report? Is there any impact on the working environment? Is it allowed to run during business hours?

  • There's a catch in this script. By default, LTM rotates the LTM logs, and the command "tail -f -n 0 $filename | while read -r line" tracks the file by its file descriptor (inode), so once the file is rotated, the script keeps tracking the old file. Changing the command to "tail -F -n 0 $filename | while read -r line" should overcome this issue, as "-F" tracks the file by name. Cheers, Best Regards, Kiozs
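The rotation behavior described above can be tried out away from the BIG-IP with a small sketch like the one below. It assumes GNU tail; the temporary paths and the sleep intervals are illustrative choices, and the mv stands in for what logrotate does to /var/log/ltm.

```shell
#!/bin/bash
# Sketch: tail -F keeps following a log by name after it is rotated.
workdir=$(mktemp -d)
log="$workdir/ltm"               # stand-in for /var/log/ltm
out="$workdir/seen"
: > "$log"

# Follow the file by name, starting from the current end of file.
tail -F -n 0 "$log" > "$out" 2>/dev/null &
tail_pid=$!
sleep 1

echo "before rotation" >> "$log"
sleep 1

mv "$log" "$log.1"               # simulate log rotation
: > "$log"                       # a new file appears under the old name
sleep 2                          # give tail time to reopen the name
echo "after rotation" >> "$log"
sleep 2

kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

seen=$(cat "$out")               # lines that tail delivered
rm -rf "$workdir"
echo "$seen"
```

With "tail -f" in place of "tail -F", the line written after the rotation would not be expected to show up, because tail would still be reading the renamed file.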