Better UP/DOWN Poolmember alerts with node name + pool name
I am kind of amazed that the F5 user community is OK with receiving only the IP:PORT information when a pool member goes down. Our support teams would not have accepted that, so I ended up building a tool that re-logs and re-alerts using the name defined for the node and the pool name, along with the default IP:PORT.
I used a combination of tools that lets me send more informative SNMP traps and syslog messages. Below is an example:
Oct 21 22:49:37 dal-lan-ltm1 warning logger: POOLMEMBER DOWN: serverA-webapp-m (172.25.1.100:80)
In the message above, which also gets sent to our SNMP trap host, serverA is the node name defined for the node at 172.25.1.100, and webapp-m is the pool name.
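To make the message layout concrete, here is a minimal Python sketch of how such a line could be assembled. It is only an illustration, not the Perl code I actually run; the function name format_alert and its parameters are made up for this example.

```python
def format_alert(state, node_name, pool_name, ip, port):
    """Build an enriched alert: POOLMEMBER <STATE>: <node>-<pool> (<ip>:<port>)."""
    # format_alert is a hypothetical helper name used only for this illustration.
    return f"POOLMEMBER {state.upper()}: {node_name}-{pool_name} ({ip}:{port})"


# Values taken from the example message above.
message = format_alert("down", "serverA", "webapp-m", "172.25.1.100", 80)
print(message)
```

The node name and pool name come first because that is what the support team reads; the IP:PORT stays in parentheses so nothing from the default alert is lost.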
To accomplish this I had to do a few things:
1. Enable monitor status changes in the syslog
2. Add an automatic startup of action_on_log.pl (from DevCentral)
3. Schedule a daily cron job that runs a Perl script (create_nodemap.pl) to parse bigip.conf and create a mapping table of node IPs to node names.
4. Set up action_on_log.pl to call another script (relog.pl) that uses the monitor status change messages to determine the pool name and node name, and then re-syslogs and re-traps the more informative message.
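For anyone wondering what the mapping step in item 3 might look like, here is a rough Python sketch of the idea. It is not create_nodemap.pl itself. The node stanza format in SAMPLE_CONFIG is a simplified assumption (real bigip.conf syntax differs between TMOS versions), and the names build_nodemap and node_name_for are hypothetical.

```python
import re

# ASSUMPTION: a simplified node stanza of the form
#   node <ip> { screen <name> }
# Check your own bigip.conf; the real syntax depends on the TMOS version.
NODE_STANZA = re.compile(
    r"node\s+(\d{1,3}(?:\.\d{1,3}){3})\s*\{\s*screen\s+(\S+)\s*\}"
)


def build_nodemap(config_text):
    """Return a dict mapping node IP -> node name for every stanza found."""
    return {ip: name for ip, name in NODE_STANZA.findall(config_text)}


def node_name_for(nodemap, ip):
    """Look up the node name, falling back to the bare IP if it is unknown."""
    return nodemap.get(ip, ip)


# Made-up sample data for illustration only.
SAMPLE_CONFIG = """
node 172.25.1.100 {
   screen serverA
}
node 172.25.1.101 {
   screen serverB
}
"""

nodemap = build_nodemap(SAMPLE_CONFIG)
print(node_name_for(nodemap, "172.25.1.100"))
```

Falling back to the IP when a node has no name means the re-logged alert is never worse than the default one, even if the daily map is a little stale.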
Anyhow, I was wondering how other people handle this, and whether others out there have had a similar challenge. I was very surprised that this capability isn't native to F5, and that the user community didn't already have a solution for it.
If anyone is interested in the code I can certainly post it here, but since I am not a Perl expert I can't say it is perfect. That said, it has been supporting our 20 F5 3900s for about a month now.
Oh, one more thing: I have to admit that my awesome F5 support engineer helped me a lot in getting moving in the right direction with our solution.