You Want Action on a Threshold Violation? Use iCall!
iCall has been around since the 11.4 release, yet there seems to be a prevailing gap in awareness of this amazing functionality in BIG-IP. A blog I wrote last year covers the overview of the iCall system, but in brief, it provides event-based automation. The events can be periodic (like cron functionality), perpetual (watching for something like a file to appear in a directory), or triggered by an alert (like a pool member failure).
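For orientation, each of those event types maps to its own handler object in the tmsh configuration. The skeletons below are only a rough sketch to show the shape of each type; the handler names, the myScript they reference, and the myEvent subscription are placeholders for illustration, not objects used in the solution later in this article.

# periodic: run the script on a fixed interval (seconds)
sys icall handler periodic myHandler {
    interval 60
    script myScript
}

# perpetual: the script runs continuously and waits on its own condition
sys icall handler perpetual myHandler {
    script myScript
}

# triggered: run the script when a subscribed iCall event fires
sys icall handler triggered myHandler {
    script myScript
    subscriptions {
        myEvent {
            event-name myEvent
        }
    }
}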
Late last week I was at the mother ship (F5 Corporate in Seattle) and found this question in Q&A (paraphrased):
What is a good method for toggling interface 1.1 if the active pool members in a pool fall below 70%?
My mind went immediately to iCall, as this is a perfect use case. It binds an event (a pool's active members falling below a threshold) to a task (disabling an interface). I didn't have time to flesh out the solution last week, but I dropped some (errant) code in the thread to point the original poster (Lee) down the right path. Flash forward to this week, and I was intrigued enough by the solution that I thought I'd take a crack at making it work.
Building Out the Solution
Given that Lee set a threshold of 70% of active pool members, I figured a test pool of four members would be a good candidate, since failing one member would leave me just over the threshold at 75%, whereas failing a second member would take me to 50%. I suppose a pool of three members would have been equally fine, but I like having enough members that a single failure doesn't accidentally trigger the event. So I fired up my test BIG-IP device and a Linux VM with several interface aliases and built a pool with four members.
ltm pool pool4 {
    members {
        192.168.101.10:80 {
            address 192.168.101.10
            session monitor-enabled
            state up
        }
        192.168.101.20:80 {
            address 192.168.101.20
            session monitor-enabled
            state up
        }
        192.168.101.21:80 {
            address 192.168.101.21
            session monitor-enabled
            state up
        }
        192.168.101.22:80 {
            address 192.168.101.22
            session monitor-enabled
            state up
        }
    }
    monitor http
}
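Before layering any automation on top, it's worth a quick sanity check that the monitor actually sees all four members as available. A command along these lines from the BIG-IP shell shows the per-member status:

tmsh show ltm pool pool4 members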
Next, I needed to build the iCall script. An iCall script is just a tmsh script stored in a specific section of the configuration. It's Tcl, just like tmsh. But what does the script need to do? Well, a few things:
- Define the pool of interest
- Set the total number of pool members
- Set the number of available members
- Do math
- Enable/Disable the interface based on the result of that math
Steps 1, 4, and 5 are pretty self-explanatory. In tmsh scripting, setting an interface (like most other tmsh-based commands) looks nearly identical to the shell command.
# tmsh
tmsh modify /net interface 1.1 disabled

# tmsh script
tmsh::modify /net interface 1.1 disabled
Where it gets tricky is figuring out how to get pool member data. This is where the tmsh::get_status and tmsh::get_field_value commands come into play. Everything is object based in tmsh, and it can be a little overwhelming to figure out how to address the objects. If you were to just run the commands below in a script, the resulting output (in /var/tmp/scriptd.out) shows you the nomenclature of the addressable objects in that data.
set pn "/Common/pool4" set pooldata [tmsh::get_status /ltm pool $pn detail] puts $data #data set ltm pool pool4 { active-member-cnt 4 connq-all.age-edm 0 connq-all.age-ema 0 connq-all.age-head 0 connq-all.age-max 0 connq-all.depth 0 connq-all.serviced 0 connq.age-edm 0 connq.age-ema 0 connq.age-head 0 connq.age-max 0 connq.depth 0 connq.serviced 0 cur-sessions 0 members { 192.168.101.10:80 { addr 192.168.101.10 connq.age-edm 0 connq.age-ema 0 connq.age-head 0 connq.age-max 0 connq.depth 0 connq.serviced 0 cur-sessions 0 monitor-rule http (pool monitor) monitor-status up node-name 192.168.101.10 nodes { 192.168.101.10 { addr 192.168.101.10 cur-sessions 0 monitor-rule none monitor-status unchecked ...continued...
So I get to the pool member data by first getting the pool data, and the fields needed for pool member availability are the availability-state and the enabled-state from the pool member data (an incomplete view of the data is shown below, but the necessary information is there).
members 192.168.101.22:80 {
    addr 192.168.101.22
    monitor-rule http (pool monitor)
    monitor-status up
    node-name 192.168.101.22
    nodes {
        192.168.101.22 {
            addr 192.168.101.22
            cur-sessions 0
            monitor-rule none
            monitor-status unchecked
            name 192.168.101.22
            session-status enabled
            status.availability-state unknown
            status.enabled-state enabled
            status.status-reason
            tot-requests 0
        }
    }
    pool-name pool4
    port 80
    session-status enabled
    status.availability-state available
    status.enabled-state enabled
    status.status-reason Pool member is available
}
Now that the data set is known, the script can be completed. Note that to get to the particular state information shown above, I just pass those attributes for the member to the tmsh::get_field_value commands in the script below. The math part is simple, though to get floating point division, a .0 is appended to the $usable count variable in the expression. Logging statements and puts commands (sending data to /var/tmp/scriptd.out for debugging) are added to the script for demonstration purposes.
sys icall script poolCheck.v1.0.0 {
    app-service none
    definition {
        set pn "/Common/pool4"
        set total 0
        set usable 0
        foreach obj [tmsh::get_status /ltm pool $pn detail] {
            puts $obj
            foreach member [tmsh::get_field_value $obj members] {
                puts $member
                incr total
                if { [tmsh::get_field_value $member status.availability-state] == "available" && \
                     [tmsh::get_field_value $member status.enabled-state] == "enabled" } {
                    incr usable
                }
            }
        }
        if { [expr $usable.0 / $total] < 0.7 } {
            tmsh::log "Not enough pool members in pool $pn, interface 1.3 disabled"
            tmsh::modify /net interface 1.3 disabled
        } else {
            tmsh::log "Enough pool members in pool $pn, interface 1.3 enabled"
            tmsh::modify /net interface 1.3 enabled
        }
    }
    description none
    events none
}
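As an aside on that .0 trick: Tcl's expr performs integer division when both operands are integers, so without it a ratio like 3/4 evaluates to 0, which is below 0.7, and the interface would be disabled on any single member failure. A quick illustration in plain Tcl (outside of any iCall context):

expr 3 / 4           ;# returns 0    -- integer division truncates
expr 3.0 / 4         ;# returns 0.75 -- one float operand forces floating point
expr double(3) / 4   ;# returns 0.75 -- equivalent, using the double() function

Using double($usable) / $total in the script's expression would work just as well if the appended-.0 string trick feels too clever.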
Now that the script is complete, I just need to create the handler. A triggered handler could be created to run the script every time a pool member alert happens (as configured in /config/user_alert.conf); a sketch of that approach follows the periodic handler below. For demo purposes, though, I used a periodic handler with a 60-second interval.
sys icall handler periodic poolCheck.v1.0.0 {
    first-occurrence 2014-09-16:11:00:00
    interval 60
    script poolCheck.v1.0.0
}
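For completeness, the triggered approach mentioned above would look roughly like the sketch below: a /config/user_alert.conf entry that generates an iCall event when a pool member log message appears, and a triggered handler subscribed to that event. Treat this as a hedged outline only; the alert name, match string, and event name (poolMemberChange) are invented for illustration and were not tested as part of this solution.

# /config/user_alert.conf
alert pool4-member-change "Pool /Common/pool4 member .* monitor status" {
    exec command="tmsh generate sys icall event poolMemberChange"
}

# tmsh configuration
sys icall handler triggered poolCheck.v1.0.0 {
    script poolCheck.v1.0.0
    subscriptions {
        poolMemberChange {
            event-name poolMemberChange
        }
    }
}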
Configuration complete, moving on to test!
Testing the Solution
To test, I activated the VM instance in my lab and validated that my BIG-IP interfaces and pool members were up. Then I shut down one Apache virtual ahead of the first period at 11:26, and since I still had 75% availability, the interface remained enabled. Next, I shut down the second Apache virtual, dropping availability to 50%. At 11:27, the BIG-IP interface was deactivated. Finally, I re-enabled the Apache virtuals, and at the next period the BIG-IP interface was reactivated. The log files and a ping test to that interface are shown below.
# Log Files
Sep 16 11:25:43 Pool /Common/pool4 member /Common/192.168.101.21:80 monitor status down.
Sep 16 11:26:00 Enough pool members in pool /Common/pool4, interface 1.3 enabled
Sep 16 11:26:26 Pool /Common/pool4 member /Common/192.168.101.22:80 monitor status down.
Sep 16 11:27:00 Not enough pool members in pool /Common/pool4, interface 1.3 disabled
Sep 16 11:27:32 Pool /Common/pool4 member /Common/192.168.101.21:80 monitor status up.
Sep 16 11:27:36 Pool /Common/pool4 member /Common/192.168.101.22:80 monitor status up.
Sep 16 11:28:01 Enough pool members in pool /Common/pool4, interface 1.3 enabled

# Ping Test to Interface 1.3
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Request timed out.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.205: Destination host unreachable.
Reply from 10.10.10.5: bytes=32 time=1000ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
Reply from 10.10.10.5: bytes=32 time=1ms TTL=255
One note on this solution: don't rely on the GUI or CLI status of the interface (observed on tested versions 11.5.x and later). Bug 471860 catalogs the reporting issue on BIG-IP for the interface status: at boot time, if the interface is up it reports as ENABLED, but if you disable and then re-enable it, it reports as DISABLED even though it is up and passing traffic.
Dig into iCall!
iCall (and tmsh more generally) is tremendously powerful; take a look at the several other use cases already in the iCall codeshare! This solution has been added to the codeshare as well.
- Kohlaa (Nimbostratus): For those people who stumbled here looking for which folder/file iCall scripts are stored in, as you can't browse to /Common: the file is located in /config/bigip_script.conf.
- JRahm (Admin): That was a fun one, Lee, keep the ideas coming!
- LEON_LI_38034 (Nimbostratus): Cool! I admire you very much @Jason.