Forum Discussion

smp_86112
Sep 16, 2009

Problem with RADIUS monitor

I have two LTMs running 9.3.1 HF6 in an Active/Standby configuration, with 5 pools of two members each, all checked by a RADIUS monitor. On the Active unit, the checks are performed and the members are marked Up. On the Standby unit, however, tcpdump confirms the checks are not being performed, and the members are marked Down. We failed over to help diagnose it, and when the current Standby unit was Active, it showed the same behavior. So the problem seems to follow that specific unit, but I'm not sure what to check next.

I tried removing all references to the VIPs, pools, nodes and monitor and re-syncing from the Active unit, but that didn't help. Applying a simple UDP monitor to the pools on the Standby does trigger checks of the pool members (confirmed with tcpdump), and they are marked Up.

Your thoughts would be appreciated. Thanks.

3 Replies

  • By enabling monitor debugging, I was able to see the command the monitor executes. That command crashes on the problem unit:

    [root@:Active] builtins /usr/bin/monitors/builtins/wrap_pinger /usr/bin/monitors/builtins/RADIUS_monitor ::ffff: 1812
    [root@:Active] builtins

    [root@:Standby] builtins /usr/bin/monitors/builtins/wrap_pinger /usr/bin/monitors/builtins/RADIUS_monitor ::ffff: 1812
    Segmentation fault (core dumped)
    [root@:Standby] builtins

    And I have isolated it to the RADIUS_monitor binary:

    [root@:Active] builtins /usr/bin/monitors/builtins/RADIUS_monitor
    sendto() problem: Invalid argument
    [root@:Active] builtins

    [root@:Standby] builtins /usr/bin/monitors/builtins/RADIUS_monitor
    Segmentation fault (core dumped)
    [root@:Standby] builtins

  • spark_86682
    Historic F5 Account
    That definitely sounds like you should open up a case with F5 support.
  • I opened one today. I compared the binary's library dependencies (ldd -v) between the two units, and they are identical. I also compared each shared object file between the two units by modified date, size, and md5sum, and everything matches. I also tried copying the binary from the good unit to the bad one: no dice. I'm out of my league when it gets this deep...

    I will update this thread when I have more info.
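The two checks described in this thread (running RADIUS_monitor by hand, then comparing the binary and its shared objects by size, modified date and md5sum) could be scripted so that each unit produces output that diff can compare line by line. This is a minimal sketch, assuming Python 3.7+ and ldd are available on each unit, which may not be true on a 9.3.1 system. The script and its function names are hypothetical and not part of any F5 tooling.

```python
#!/usr/bin/env python3
# Hypothetical helper (not an F5 tool): run a monitor binary, report how it
# exited, and fingerprint it plus the shared objects ldd lists, so the output
# from the Active and Standby units can be compared with diff.
import hashlib
import os
import re
import signal
import subprocess
import sys
import time

# Matches the resolved path in ldd lines such as
# "libc.so.6 => /lib/libc.so.6 (0x00a5e000)" or "/lib/ld-linux.so.2 (0x00a3f000)".
LDD_PATH = re.compile(r"(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def classify_run(argv, timeout=10):
    """Run argv and say whether it exited normally or was killed by a signal."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timed out after %d s" % timeout
    if proc.returncode < 0:
        # subprocess reports death by signal as a negative return code,
        # e.g. -11 for the "Segmentation fault (core dumped)" case.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit status %d" % proc.returncode


def parse_ldd(text):
    """Extract resolved library paths from ldd output ('not found' lines are skipped)."""
    paths = []
    for line in text.splitlines():
        match = LDD_PATH.search(line)
        if match:
            paths.append(match.group(1))
    return paths


def fingerprint(path):
    """Return (size, mtime, md5) for a file, following symlinks."""
    st = os.stat(path)
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return st.st_size, mtime, md5.hexdigest()


def report(binary):
    """Build report lines: exit classification, then one line per file."""
    lines = ["%s: %s" % (binary, classify_run([binary]))]
    ldd_out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    for path in [binary] + parse_ldd(ldd_out):
        size, mtime, digest = fingerprint(path)
        lines.append("%s %10d %s %s" % (digest, size, mtime, path))
    return lines


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: %s /path/to/monitor_binary" % sys.argv[0])
    else:
        for line in report(sys.argv[1]):
            print(line)
```

Saving the output on each unit (for example, running a copy saved under the hypothetical name compare_monitor.py against /usr/bin/monitors/builtins/RADIUS_monitor and redirecting to a file) and diffing the two files would put the crash ("killed by SIGSEGV" versus a normal exit status) next to any library that differs. Modified dates will differ for a file copied between units even when the md5 matches, so the md5 column is the one to trust.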