You Want Action on a Threshold Violation? Use iCall!
iCall has been around since the 11.4 release, yet there seems to be a prevailing gap in awareness of this amazing functionality in BIG-IP. A blog I wrote last year covers the overview of the iCall system, but in brief, it provides event-based automation. The events can be periodic (like cron functionality,) perpetual (watching for something like a file to appear in a directory,) or triggered by an alert (like a pool member failure.)
Late last week I was at the mother ship (F5 Corporate in Seattle) and found this question in Q&A (paraphrased):
What is a good method for toggling interface 1.1 if active pool members in a pool falls below 70%?
My mind went immediately to iCall, as this is a perfect use case. It binds an event (a pool's active members falling below a threshold) to a task (disable an interface.) I didn't have time to flesh out the solution last week, but I dropped some (errant) code in the thread to point the original poster (Lee) down the right path. Flash forward to this week, and I was intrigued enough about the solution I thought I'd take a crack at making it work.
Building Out the Solution
Given that Lee set a threshold of 70% of active pool members, I figured a test pool of four members would be a good candidate since failing one member would be just over the threshold at 75% whereas failing a second member would take me to 50%. I suppose a pool of three members would have been equally fine, but I like to see that some failure doesn't force an accidental event. So I fired up my test BIG-IP device and a linux vm with several interface aliases and built a pool with four members.
ltm pool pool4 { members { 192.168.101.10:80 { address 192.168.101.10 session monitor-enabled state up } 192.168.101.20:80 { address 192.168.101.20 session monitor-enabled state up } 192.168.101.21:80 { address 192.168.101.21 session monitor-enabled state up } 192.168.101.22:80 { address 192.168.101.22 session monitor-enabled state up } } monitor http }
Next, I needed to build the iCall script. An iCall script is just a tmsh script stored in a specific section of the configuration. It's tcl just like tmsh. But what does the script need to do? Well, a few things:
- Define the pool of interest
- Set the total number of pool members
- Set the number of available members
- Do math
- Enable/Disable the interface based on the result of that math
Steps 1, 4, & 5 are pretty self explanatory. In tmsh scripting, setting an interface (and most other tmsh-based commands) look nearly identical to the shell command.
#tmsh tmsh modify /net interface 1.1 disabled #tmsh script tmsh::modify /net interface 1.1 disabled
Where it gets tricky is figuring out how to get pool member data. This is where the tmsh::get_status and tmsh::get_field_value commands come into play. Everything is object based in tmsh, and it can be a little overwhelming to figure out how to address the objects. If you were to just run the commands below in a script, the resulting output (in /var/tmp/scriptd.out) shows you the nomenclature of the addressable objects in that data.
set pn "/Common/pool4" set pooldata [tmsh::get_status /ltm pool $pn detail] puts $data #data set ltm pool pool4 { active-member-cnt 4 connq-all.age-edm 0 connq-all.age-ema 0 connq-all.age-head 0 connq-all.age-max 0 connq-all.depth 0 connq-all.serviced 0 connq.age-edm 0 connq.age-ema 0 connq.age-head 0 connq.age-max 0 connq.depth 0 connq.serviced 0 cur-sessions 0 members { 192.168.101.10:80 { addr 192.168.101.10 connq.age-edm 0 connq.age-ema 0 connq.age-head 0 connq.age-max 0 connq.depth 0 connq.serviced 0 cur-sessions 0 monitor-rule http (pool monitor) monitor-status up node-name 192.168.101.10 nodes { 192.168.101.10 { addr 192.168.101.10 cur-sessions 0 monitor-rule none monitor-status unchecked ...continued...
So I get to the pool member data by first getting the pool data. And the data needed for pool member availability is the availability-state and the enabled-state from the pool member data (incomplete view of data shown below, but the necessary information is there.)
members 192.168.101.22:80 { addr 192.168.101.22 monitor-rule http (pool monitor) monitor-status up node-name 192.168.101.22 nodes { 192.168.101.22 { addr 192.168.101.22 cur-sessions 0 monitor-rule none monitor-status unchecked name 192.168.101.22 session-status enabled status.availability-state unknown status.enabled-state enabled status.status-reason tot-requests 0 } } pool-name pool4 port 80 session-status enabled status.availability-state available status.enabled-state enabled status.status-reason Pool member is available }
Now that the data set is known, the script can be completed. Note that to get to particular state information bolded above, I just set those attributes against the member in the tmsh::get_field_value commands bolded below. The math part is simple, though to get floating point, the .0 is added to the $usable count variable in the expression. Logging statements and puts commands (sending data to /var/tmp/scriptd.out for debugging) added to the script for demonstration purposes.
sys icall script poolCheck.v1.0.0 { app-service none definition { set pn "/Common/pool4" set total 0 set usable 0 foreach obj [tmsh::get_status /ltm pool $pn detail] { puts $obj foreach member [tmsh::get_field_value $obj members] { puts $member incr total if { [tmsh::get_field_value $member status.availability-state] == "available" && \ [tmsh::get_field_value $member status.enabled-state] == "enabled" } { incr usable } } } if { [expr $usable.0 / $total] < 0.7 } { tmsh::log "Not enough pool members in pool $pn, interface 1.3 disabled" tmsh::modify /net interface 1.3 disabled } else { tmsh::log "Enough pool members in pool $pn, interface 1.3 enabled" tmsh::modify /net interface 1.3 enabled } } description none events none }
Now that the script is complete, I just need to create the handler. A triggered handler could be created to run the script every time a pool member alert happens (as configured in /config/user_alert.conf,) but for demo purposes I used a periodic handler with a 60 second interval.
sys icall handler periodic poolCheck.v1.0.0 { first-occurrence 2014-09-16:11:00:00 interval 60 script poolCheck.v1.0.0 }
Configuration complete, moving on to test!
Testing the Solution
To test, I activated the vm instance in my lab and validated that my BIG-IP interfaces and pool members were up. Then, I shut down one apache virtual ahead of the first period at 11:26, and since I had 75% availability the interface remained enabled. Next, I shut down the second apache virtual, dropping availability to 50%. At 11:27, the BIG-IP interface was deactivated. Finally, I re-enabled the apache virtuals and at the next period the BIG-IP interface was reactivated. Log files and ping test to that interface shown below.
# Log Files Sep 16 11:25:43 Pool /Common/pool4 member /Common/192.168.101.21:80 monitor status down. Sep 16 11:26:00 Enough pool members in pool /Common/pool4, interface 1.3 enabled Sep 16 11:26:26 Pool /Common/pool4 member /Common/192.168.101.22:80 monitor status down. Sep 16 11:27:00 Not enough pool members in pool /Common/pool4, interface 1.3 disabled Sep 16 11:27:32 Pool /Common/pool4 member /Common/192.168.101.21:80 monitor status up. Sep 16 11:27:36 Pool /Common/pool4 member /Common/192.168.101.22:80 monitor status up. Sep 16 11:28:01 Enough pool members in pool /Common/pool4, interface 1.3 enabled # Ping Test to Interface 1.3 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Request timed out. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.205: Destination host unreachable. Reply from 10.10.10.5: bytes=32 time=1000ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255 Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
One note from this solution, don't rely on the GUI or CLI status of the interface (known tested versions in 11.5.x+. Bug 471860 catalogs the reporting issue on BIG-IP for the interface status. At boot time, if the interface is up it reports as ENABLED, but if you disable and then re-enable, it reports as DISABLED even though it will be up and passing traffic.
Dig into iCall!
iCall (and tmsh more generally) is tremendously powerful, take a look at several other use cases already in the iCall codeshare! This solution has been added to the codeshare as well.