BIGIP LTM Automated Pool Monitor Flap Troubleshooting Script in Bash
Problem this snippet solves:
This bash script collects data when an F5 BIG-IP LTM pool member's monitor flaps over a period of time, to help determine the root cause of the health check failure. The script watches the LTM log; when a new pool-member-down event appears, it performs the following functions:
1. Turns on LTM bigd debug.
2. Starts a tcpdump capture of the relevant traffic.
3. Turns off bigd debug and terminates the tcpdump process when the timer elapses (the timer is configurable).
4. Generates a qkview (optional).
5. Creates a tarball of the log files under the /var/log/ directory (optional).
Script has been tested on v11.x
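The trigger described above is a plain text match on each new line of the LTM log. Here is a minimal sketch of that check, which can be used to confirm the pattern against a sample line before deploying. The function name `is_member_down` and both sample log lines are illustrative assumptions, not taken from a real system. The sketch also uses `grep -F` (fixed-string matching) so the dots in the IP address are matched literally; that is a swap-in, as the script below uses a plain `grep -c`.

```shell
#!/bin/bash
# Sketch only: function name and sample lines are hypothetical.
poolMemberIP="10.10.10.229"
poolMemberPort="443"

# Return 0 when the line reports the configured pool member going down.
# -F treats the pattern as a fixed string, so "." matches only a literal dot.
is_member_down() {
    printf '%s\n' "$1" | grep -qF "$poolMemberIP:$poolMemberPort monitor status down"
}

# Made-up sample lines for illustration; compare against your own /var/log/ltm.
sample_down="Pool /Common/web_pool member /Common/10.10.10.229:443 monitor status down."
sample_up="Pool /Common/web_pool member /Common/10.10.10.229:443 monitor status up."

for line in "$sample_down" "$sample_up"; do
    if is_member_down "$line"; then
        echo "match: $line"
    else
        echo "no match: $line"
    fi
done
```

If the first sample does not match on your system, compare the pattern with an actual down event from your own log before relying on the script.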
Code :
#!/bin/bash
##########log file that the script monitors
filename="/var/log/ltm"
##########how long (in seconds) bigd debug and tcpdump keep running; change it according to your needs
timer=60
##########IP address of the flapping pool member
poolMemberIP="10.10.10.229"
##########self IP address that the LTM uses to send health monitor traffic
ltmSelfip="10.10.10.248"
##########pool member service port number
poolMemberPort="443"
##########TMOS command to turn on bigd debug
turnonBigdDebug="tmsh modify sys db bigd.debug value enable"
##########TMOS command to turn off bigd debug
turnoffBigdDebug="tmsh modify sys db bigd.debug value disable"
##########BASH command to tar BIGIP log files
tarLogs="tar -czpf /var/tmp/logfiles.tar.gz /var/log/*"
####### file check: the following code checks whether /var/log/ltm exists on the system;
####### if it exists, the script keeps running and performs the subsequent functions
if [ -f "$filename" ]
then
echo "/var/log/ltm exists and program is running to collect data when BIG-IP pool member flaps"
else
####### if it does not exist, the program terminates and logs the following message
echo "no /var/log/ltm file found and program is terminated"
exit 1
fi
####### file check ends
###### write timestamp to /var/log/ltm for tracking purposes
echo "$(date) monitoring the log" >> "$filename"
###### start to monitor the /var/log/ltm for new events
tail -f -n 0 $filename | while read -r line
do
###### count whether the pool member down message appears in this line
hit=$(echo "$line" | grep -c "$poolMemberIP:$poolMemberPort monitor status down")
#echo $hit
######
if [ "$hit" == "1" ];
then
###### display the pool down log event from /var/log/ltm
echo "$line"
###### show timestamp when debug is turned on
echo "$(date) Turning on system bigd debug"
###### turn on bigd debug
echo $($turnonBigdDebug)
###### turn on tcpdump capture
echo $(tcpdump -ni 0.0:nnn -s0 -w /var/tmp/Monitor.pcap port $poolMemberPort and \(host $poolMemberIP and host $ltmSelfip\)) &
###### running timer
sleep $timer
###### show timestamp when debug is turned off
echo "$(date) Turning off system bigd debug"
###### turn off bigd debug
echo $($turnoffBigdDebug)
###### terminate tcpdump process
echo $(killall tcpdump)
###### generate qkview; this is optional, enable it by removing the "#" sign
#echo $(qkview)
###### tar log files; this is optional, enable it by removing the "#" sign
#echo $($tarLogs)
break
#else
#echo "Monitor in progress"
fi
done
###### show message that the program has ended
echo "$(date) exiting from program"
###### exit from the program
exit 0
Tested this on version:
11.6
6 Comments
- linjing
Employee
Thanks for sharing, it's useful.
- Kiozs_131042
Altocumulus
There's a flaw in this script. By default, the LTM rotates its logs, and the command "tail -f -n 0 $filename | while read -r line" follows the file by its file descriptor (inode), so once the file is rotated the script keeps tracking the old file. Changing the command to "tail -F -n 0 $filename | while read -r line" should overcome this issue, as "-F" follows the file by name only. Cheers, Best Regards, Kiozs
- davidfisher
Cirrus
Is the timer in minutes or seconds?
- Vasim
Altocumulus
How do I run this script in production?
Where is the report collected? Is there any impact on the working environment? Is it allowed to run during business hours?
Vasim & davidfisher - this codeshare has been around a while and it looks like Kiozs_131042 may not be registered on DevCentral anymore. I'll see if I can find someone to address your questions. Thanks!
- JRahm
Admin
davidfisher the bash sleep command value is in seconds.
Vasim the script would need to be called from an iCall action tied to an event trigger. There are examples of this, just search icall. I also have an article that walks through an example, though you won't need iRules for this.
- the full logs will be in /var/tmp, should include the window under duress for analysis
- it will run whenever triggered, but in your iCall script that executes it, you can select windows to avoid
- anytime you run tcpdump it impacts the system, could be small could be large, depending on your filters and how much data you are writing to disk in the window it's running.
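A minimal sketch of the last two points, under stated assumptions: it skips the capture inside a window to avoid and caps the capture at a fixed number of packets with tcpdump's `-c` option, so the pcap cannot grow without bound. The function names, the 09:00 to 17:00 window, and the 20000-packet cap are all hypothetical values chosen for illustration. The window check is done here in bash rather than in the iCall script, which is only one possible place for it.

```shell
#!/bin/bash
# Sketch only: the window, packet cap, and function names are assumptions.
poolMemberIP="10.10.10.229"
ltmSelfip="10.10.10.248"
poolMemberPort="443"
maxPackets=20000     # assumed cap; tcpdump -c exits after this many packets
blackoutStart=9      # assumed window to avoid: 09:00 up to, not including, 17:00
blackoutEnd=17

# Return 0 (true) when the given hour (00-23) falls inside the window to avoid.
in_blackout_window() {
    local hour=$((10#$1))   # force base 10 so "08" and "09" are not read as octal
    [ "$hour" -ge "$blackoutStart" ] && [ "$hour" -lt "$blackoutEnd" ]
}

# Print the capture command instead of running it, so it can be reviewed first.
build_capture_cmd() {
    local filter="port $poolMemberPort and host $poolMemberIP and host $ltmSelfip"
    echo "tcpdump -ni 0.0:nnn -s0 -c $maxPackets -w /var/tmp/Monitor.pcap $filter"
}

if in_blackout_window "$(date +%H)"; then
    echo "inside the window to avoid, skipping capture"
else
    build_capture_cmd
fi
```

The tight filter (one port, two hosts) and the packet cap together limit how much data is written to disk while the capture runs.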