Forum Discussion
Script to run TCPDUMP when monitor goes down
Hi
We have the script here, and I have created this alert to trigger it in /config/user_alert.conf, but it seems that's not enough and something is amiss.
Any ideas please?
https://devcentral.f5.com/s/articles/run-tcpdump-on-event
alert endb_mon_down "01070638:5: Pool /Common/pool_one member /Common/10.1.62.61:0 monitor status down." {
exec command="/config/var/tmp/Pool-tshoot-script.sh";
}
Seems about right. Just tested with these settings, and it works.
[root@nielsvs-bigip:Active:Standalone] tmp # cat /config/user_alert.conf
alert TEST2 "Non-existent pool member for pool /Common/demo.app/demo_adfs_pool_443" {
    exec command="/shared/bin/test.sh";
}
[root@nielsvs-bigip:Active:Standalone] tmp #
And the script. Make sure it is executable (chmod +x <filename>). You might also want to consider putting your scripts somewhere under the /shared/ directory; data on that partition is preserved across upgrades.
[root@nielsvs-bigip:Active:Standalone] tmp # cat /shared/bin/test.sh
#!/bin/bash
TCPDUMP="/sbin/tcpdump"
${TCPDUMP} -nni 0.0:nnn -s0 -w /var/tmp/test-$$.pcap -c1
[root@nielsvs-bigip:Active:Standalone] tmp #
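As a quick sanity check on the "make it executable" step, here is a minimal sketch; it uses a throwaway /tmp directory as a stand-in for /shared/bin, so the paths are placeholders rather than the real BIG-IP locations:

```shell
#!/bin/bash
# Sketch only: a /tmp stand-in for /shared/bin/test.sh.
DEST=$(mktemp -d /tmp/demo_shared_bin_XXXXXX)
cat > "$DEST/test.sh" <<'EOF'
#!/bin/bash
/sbin/tcpdump -nni 0.0:nnn -s0 -w /var/tmp/test-$$.pcap -c1
EOF
chmod +x "$DEST/test.sh"            # the step that is easy to forget
test -x "$DEST/test.sh" && echo "executable"
```

If the script lacks the execute bit, alertd's exec command will fail silently, so checking with `test -x` (or `ls -l`) before wiring up the alert saves some head-scratching.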
And to show that it works:
[root@nielsvs-bigip:Active:Standalone] tmp # ls -ltra /var/tmp/test*
-rwxr-xr-x. 1 root root  22 Jul 19  2017 /var/tmp/test.sh
-rw-r--r--. 1 root root 427 Jun 13 15:27 /var/tmp/test-30157.pcap
-rw-r--r--. 1 root root 436 Jun 13 15:27 /var/tmp/test-31298.pcap
-rw-r--r--. 1 root root 432 Jun 13 15:27 /var/tmp/test-31293.pcap
-rw-r--r--. 1 root root 425 Jun 13 15:28 /var/tmp/test-3310.pcap
-rw-r--r--. 1 root root 425 Jun 13 15:28 /var/tmp/test-3314.pcap
-rw-r--r--. 1 root root 431 Jun 13 15:29 /var/tmp/test-7948.pcap
-rw-r--r--. 1 root root 431 Jun 13 15:29 /var/tmp/test-7943.pcap
-rw-r--r--. 1 root root 426 Jun 13 15:30 /var/tmp/test-12551.pcap
-rw-r--r--. 1 root root 436 Jun 13 15:30 /var/tmp/test-12555.pcap
-rw-r--r--. 1 root root 432 Jun 13 15:31 /var/tmp/test-17454.pcap
-rw-r--r--. 1 root root 432 Jun 13 15:31 /var/tmp/test-17458.pcap
-rw-r--r--. 1 root root 424 Jun 13 15:32 /var/tmp/test-21889.pcap
-rw-r--r--. 1 root root 426 Jun 13 15:32 /var/tmp/test-21884.pcap
-rw-r--r--. 1 root root 426 Jun 13 15:33 /var/tmp/test-26257.pcap
-rw-r--r--. 1 root root 426 Jun 13 15:33 /var/tmp/test-26253.pcap
-rw-r--r--. 1 root root 432 Jun 13 15:34 /var/tmp/test-30619.pcap
-rw-r--r--. 1 root root 432 Jun 13 15:34 /var/tmp/test-30614.pcap
[root@nielsvs-bigip:Active:Standalone] tmp #
- davidfisher
Cirrus
Is the timer setting in seconds or minutes? What do you think?
And I am trying to trigger it with this:
logger -p local0.notice "Pool /Common/pool_one member /Common/10.1.62.61:0 monitor status down."
Should I see an SNMP TRAP for this test logger command as well?
Normally I see this when a pool mon fails:
Jun 13 14:41:24 bigip2 notice mcpd[8183]: 01070638:5: Pool /Common/gateway-failsafe member /Common/10.1.62.61:0 monitor status down. [ /Common/gateway_icmp: down; last error: /Common/gateway_icmp: No successful responses received before deadline. @2019/06/13 14:41:24. ] [ was up for 0hr:0min:55sec ]
Jun 13 14:41:25 bigip2 notice mcpd[8183]: 01070638:5: Pool /Common/auction-php-pool member /Common/10.1.62.61:80 monitor status down. [ /Common/http: down; last error: /Common/http: Unable to connect; No successful responses received before deadline. @2019/06/13 14:41:25. ] [ was up for 0hr:0min:56sec ]
Jun 13 14:41:25 bigip2 notice mcpd[8183]: 01071682:5: SNMP_TRAP: Virtual /Common/auction-http-vs has become unavailable
Jun 13 14:41:25 bigip2 notice mcpd[8183]: 01071682:5: SNMP_TRAP: Virtual /Common/auction-https has become unavailable
How did you choose the message "Non-existent pool member for pool /Common/demo.app/demo_adfs_pool_443"?
I think my msg is not matching the trap I configured, but why..?
I've just selected a message from /var/log/ltm. The error message I've used is shown every minute in my log file. So that was an easy message to test with.
- Simon_Blakely
Employee
After some frustrating experiences, I found that you cannot run tcpdump out of the alertd execution context - SELinux gets in the way and prevents access to the network devices.
And yes - it does work in some circumstances, but not reliably for all releases/platforms/situations.
I had to build out a hardware/version compatible repro to demonstrate and solve this problem when I first ran into it.
I solved it like this:
Have a startup script that creates a named pipe, and waits on the named pipe to run the tcpdump
This is running in the root context and has permission to run tcpdump.
/config/startup/monitor_down_dump.sh
#!/bin/bash
NP=/var/run/monitor_down_tcpdump.pipe
if [ -e $NP ]; then
    echo "$NP already exists; is this script already running?"
    exit 1
fi
mkfifo $NP
read x < $NP
/bin/rm $NP
logger -p local0.info "$x"
# start a tcpdump
# THIS count VALUE (-c) MAY NEED TESTING AND TUNING
tcpdump -nni 0.0:nnn -s0 -c 1000 -w /var/tmp/`uname -n`_`date +%F_%H:%M`.pcap
You also need a trigger script run from your user_alert that pushes data into the named pipe.
This runs in the alertd context and does not have permission to run tcpdump, but can push a message down the named pipe.
/shared/monitor_down_trigger.sh
#!/bin/bash
NP=/var/run/monitor_down_tcpdump.pipe
echo "debug_triggered" > $NP
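For anyone unfamiliar with the pattern, the handoff between the two scripts can be sketched in isolation. This toy demo uses throwaway /tmp paths, not the BIG-IP ones; the point is that `read` blocks on the FIFO until the trigger side writes into it:

```shell
#!/bin/bash
# Toy demo of the named-pipe handoff (placeholder /tmp paths).
PIPE=$(mktemp -u /tmp/demo_XXXXXX.pipe)
OUT=$(mktemp /tmp/demo_out_XXXXXX)
mkfifo "$PIPE"
# Listener side (stands in for monitor_down_dump.sh):
# read blocks here until something is written into the FIFO.
( read msg < "$PIPE"; echo "listener got: $msg" | tee "$OUT" ) &
# Trigger side (stands in for monitor_down_trigger.sh): unblocks the read.
echo "debug_triggered" > "$PIPE"
wait
rm -f "$PIPE"
```

Opening a FIFO for writing also blocks until a reader has it open, so the two sides synchronize on their own; the tcpdump then runs in the listener's (root) context, not alertd's.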
and your user_alert.conf snippet
alert endb_mon_down "01070638:5: Pool /Common/pool_one member /Common/10.1.62.61:0 monitor status down." {
    exec command="/shared/monitor_down_trigger.sh";
}
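One gotcha worth mentioning: on at least some BIG-IP versions, alertd only reads user_alert.conf at startup, so if a freshly added alert never fires, restarting alertd is worth trying before debugging the script itself:

```shell
# Pick up user_alert.conf changes by restarting the alert daemon
bigstart restart alertd
```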
For my implementation, the customer also had a cron task that checked to see if the script was still running every 10 minutes, and restarted it if it had triggered or stopped. This may or may not be required.
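The cron piece isn't shown above; a hedged sketch of what such an entry might look like follows. The watchdog one-liner and file names here are my assumptions, not the original implementation:

```shell
# Hypothetical /etc/cron.d fragment: every 10 minutes, relaunch the pipe
# listener if it is no longer running (e.g. after it triggered or died).
*/10 * * * * root pgrep -f monitor_down_dump.sh > /dev/null || nohup /config/startup/monitor_down_dump.sh > /dev/null 2>&1 &
```

Since the listener script exits after each trigger (it removes the pipe and runs one capture), some restart mechanism like this is what keeps the alert re-armed.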
- davidfisher
Cirrus
I was trying the script on v12.1.
Is this workaround required for all versions? Which version are you running?
- Simon_Blakely
Employee
I developed that solution on 12.1.2, and I expect it to be required for all later versions.
It's complex, but it is reliable. Just trying to run tcpdump out of user_alert.conf may work (for example, it initially worked on my development 12.1.2 VE), but not in all cases (it didn't work on a physical 12.1.2 vCMP guest in the lab).
The solution I documented above does provide results.
However - it isn't instant, but using alertd introduces a delay anyhow.