Ryan_34424
Jul 15, 2015

Multiple Monitor Timers

I have a situation where I would like to implement multiple monitor timers on a particular monitor/pool.

 

Situation: Pool of X web servers hosting a fussy application.

 

Problem: When the config group makes a change on the back-end, the application pools on the web servers reset all at once, and stop responding to requests for several seconds while they reload. This generally takes longer than the timeout value of the monitor.

 

Increasing the timeout value on the pool's monitor is an option; however, I'd rather not, because if a system really goes down we want fail-over to happen faster than the longer timeout would allow. And yes, changing the way the back-end changes roll from web server to web server would be ideal... but it's not in my control. What I can control is the F5...

 

My thought is to have multiple monitors (or timers): if the monitor receives reply X, use timer A; if it receives reply Y, use timer B.

 

Is that something that is possible?

 

  • Vernon_97235
    Historic F5 Account

    Firstly, why would you not want the pool members to be marked as "down" after the restart? Since the targets are not accepting connections, presumably you don't want to forward to those pools. If the BIG-IP receives a RST from the targets, it will essentially proxy that RST to the client. I assume you have a good reason for this choice, I'm just trying to better understand the situation, and this part is a bit unclear to me.

     

    You distinguish between two different responses ("X" and "Y") based on the conditions for the pool members. How do those responses differ?

     

    I'm guessing that your only recourse here is to use an external monitor, which can really do just about anything you want. You would, however, need to write it yourself (or engage F5 Professional Services to assist you in this endeavor). The following links provide additional information about external monitors:

     

    • Ryan77777
      I don't want them marked as down because they're not "down". They're just really slow because the config guys made an app change that recycled all of the members at once (which neither I nor they have any control over). However, I do have control over the F5... so if I could switch timers depending on what the F5 receives back from the monitoring page (text on the page, whatever), then I could say "hey, they're making a config change right now according to the page I'm monitoring, wait 30 seconds for a response instead of 10" or the like. How would an external monitor work? As far as I can tell, the timeout value isn't controlled there, is it?
  • Vernon_97235
    Historic F5 Account

    Strictly speaking, the external monitor could change the timeout value on itself by executing tmsh -- or an iCall periodic script could run the monitor page check and modify the timeout value -- but I strongly recommend against doing that, or ever having an external monitor change configuration, for that matter. The reason is, if the BIG-IP is part of an HA cluster, it will make the member out-of-sync (and if autosync is turned on, it may push the config when somebody is making changes on another unit, wiping out whatever changes were made).

     

    An external monitor could solve this in a number of ways, but here is logic for one way to do it:

     

    1. Run the test that was being performed by the monitor (e.g., if the monitor is http, use curl to retrieve the page and parse the result);
    2. If the test in 1 succeeds, return an "up" value (that is, print something to stdout), remove the file cited in 5 (if it exists), then exit;
    3. If the test in 1 fails, run the test against the monitoring page;
    4. If the test in 3 fails or if the results do not indicate that maintenance is happening, remove the file cited in 5 (if it exists), then exit without writing anything to stdout (this will trigger a "down" status);
    5. If the test in 3 succeeds and indicates that maintenance is happening, check to see if there is a file named /var/tmp/mymonitor.$ip:$port.tmp ($ip and $port are always the first two command-line arguments provided to the monitor);
    6. If the file does not exist, grab the current unix epoch time in seconds and write it to the file, return an "up" value, then exit;
    7. If the file does exist, read the value, grab the current unix epoch time in seconds and compare them;
    8. If the delta in 7 is less than your second threshold (30 seconds), return an "up" value and exit;
    9. If the delta in 7 is greater than or equal to your second threshold, exit without writing anything to stdout (which triggers a "down" status).

    Here it is in pseudo-code(ish) ($1 and $2 are the command-line arguments supplied to the script):

     

    #!/bin/bash
    # All external monitors are automatically passed the target IP and port
    # as the first two positional parameters.  If the IP is IPv4, it will
    # start with ::ffff:, followed by the IPv4 address as dotted-quad
    # (e.g., ::ffff:10.10.10.1).
    
    ip=$(echo "$1" | sed 's/::ffff://')
    port=$2
    # $0 may include a directory path, so use only its base name in the file name
    file="/var/tmp/$(basename "$0").$ip:$port.tmp"
    monitor_page="https://my.monitor.example.com/status"
    
    # step 1: the normal health check against the pool member
    first_test=$(curl -Lis "http://$ip:$port/" | grep -E '^HTTP/1\.1 200')
    
    if [ -n "$first_test" ]; then
        rm -f "$file"
        echo "UP"
        exit 0
    fi
    
    # assuming that I'm looking for the string "In Maintenance Mode"
    second_test=$(curl -Ls "$monitor_page" | grep 'In Maintenance Mode')
    
    # no maintenance in progress (or the status page failed): report "down"
    if [ -z "$second_test" ]; then
        rm -f "$file"
        exit 1
    fi
    
    if [ -f "$file" ]; then
        last_time=$(cat "$file")
        curr_time=$(date +%s)
    else
        # first failed check during maintenance: record when the window started
        date +%s > "$file"
        echo "UP"
        exit 0
    fi
    
    if [ $(( curr_time - last_time )) -lt 30 ]; then
        echo "UP"
        exit 0
    else
        exit 1
    fi
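
The state-file part of the logic (steps 5 through 9) can also be pulled out into a function so it can be exercised without a live pool member or status page. This is only a minimal sketch: the function name grace_status and its three arguments (state file, threshold in seconds, current epoch time) are hypothetical and are not part of the script above.

```shell
#!/bin/bash
# grace_status STATE_FILE THRESHOLD_SECONDS NOW_EPOCH
# Hypothetical helper: prints "UP" while the maintenance window recorded in
# STATE_FILE is younger than the threshold, and prints nothing afterwards.
grace_status() {
    local file=$1 threshold=$2 now=$3 start

    if [ ! -f "$file" ]; then
        # First failed check during maintenance: record when the window began.
        echo "$now" > "$file"
        echo "UP"
        return 0
    fi

    start=$(cat "$file")
    if [ $(( now - start )) -lt "$threshold" ]; then
        echo "UP"
        return 0
    fi

    # Window has expired: stay silent so the monitor reports "down".
    return 1
}
```

Inside the monitor it would be called as something like grace_status "$file" 30 "$(date +%s)". Because the comparison uses the wall-clock time stored in the file, the function stops printing "UP" on the first run at or after the threshold, whatever interval the monitor is fired at.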
    
    • Ryan77777
      Dang! Thanks for all the information, VERY helpful :o) I didn't even think about writing files to track its (for lack of a better term) state. So basically the external monitor's timeout/interval on the F5 really just tracks hearing back from the external monitor script the F5 points to... which houses all of the REAL timeout logic. So as long as that interval/timeout correlates with the external monitor's check, it should be good. Meaning, if my external monitor script is performing the check every 30 seconds, the F5 needs to fire that monitor at some divisor of 30 (e.g., 5, 10, 15). And that also needs to line up with my "as-is" (non-config-change state) interval (say 5, 10, 15). Love the idea, I'm going to config something up and test. I'll post the results. Thanks again!