Forum Discussion

Dazzla_20011
Feb 09, 2011

Monitor instance bigip *.*.*.*:443 UP --> DOWN from gtmd (no reply from big3d: timed out)

Hi,

 

 

Both our GTMs are filling their logs with the following entries.

 

 

Monitor instance bigip *.*.*.*:443 UP --> DOWN from gtmd (no reply from big3d: timed out)

 

 

SNMP_TRAP: VS test.co.uk-test-nat (ip:port=*.*.*.*:443) (Server BIG-IP_Pair) state change green --> red (VS test.co.uk-test-nat : Monitor bigip from gtmd : no reply from big3d: timed out)

 

 

Monitor instance bigip *.*.*.*:443 DOWN --> UP from 10.224.200.8 (UP)

 

 

SNMP_TRAP: VS test.co.uk-test-nat (ip:port=*.*.*.*:443) (Server BIG-IP_Pair) state change red --> green

 

 

This is happening every few minutes.

 

 

We have two data centres with a GTM at each site, each of which connects to an LTM pair in an Active-Standby setup.

I'm only seeing these logs at one data centre.

 

 

Any ideas why this would be happening?

 

 

 

Thanks
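To put a number on "every few minutes", the UP --> DOWN transitions can be tallied per monitored address. Below is a minimal Python sketch, assuming the log lines look like the ones quoted above; the regular expression, the function name `count_flaps`, and the sample address 192.0.2.10 are illustrative and not taken from any F5 tooling.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this thread; the exact
# format on other software versions is an assumption.
TRANSITION = re.compile(
    r"Monitor instance (?P<monitor>\S+) (?P<target>\S+) "
    r"(?P<old>UP|DOWN) --> (?P<new>UP|DOWN)"
)


def count_flaps(lines):
    """Count UP --> DOWN transitions per monitored address:port."""
    flaps = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match and match["old"] == "UP" and match["new"] == "DOWN":
            flaps[match["target"]] += 1
    return flaps


# Hypothetical sample lines (documentation address instead of the masked IP).
sample = [
    "Monitor instance bigip 192.0.2.10:443 UP --> DOWN from gtmd (no reply from big3d: timed out)",
    "Monitor instance bigip 192.0.2.10:443 DOWN --> UP from 10.224.200.8 (UP)",
    "Monitor instance bigip 192.0.2.10:443 UP --> DOWN from gtmd (no reply from big3d: timed out)",
]
print(count_flaps(sample))
```

Running the same function over a saved copy of each GTM's log would show whether the flapping really is confined to one data centre.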

7 Replies

  • Sounds like GTM is having issues talking to the big3d daemon on the LTM in one DC.

     

     

    On that LTM, check whether big3d is running properly with "bigstart status big3d" and, if necessary, restart the service with "bigstart restart big3d".

     

     

  • Turned out that someone had enabled nat-control on the firewall, so the second GTM couldn't talk to the LTMs.

     

  • Hamish
    Mmm...

     

     

    I have the same issue, but it's not a problem with comms, because GTM manages to get a list of virtual servers OK... They're just always marked down. Comms are fine, I can see the traffic running... But for some reason GTM thinks big3d is timing out. Is there a way to get more debugging out of gtmd or big3d to see what's actually going wrong?

     

     

    H
  • Hi Hamish,

     

     

     

    Could it be possible that the monitor uses the default gateway on the mgmt interface to get its results? I have seen this behaviour before.

     

     

    If you have configured an item to be monitored (whichever item: a virtual server, a bigip, etc.) and your bigip has no direct route to it, the monitor traffic falls through to the default gateway of the mgmt interface.

    This can be particularly nasty if the mgmt gateway actually lets the traffic pass.

     

     

     

    Kind regards,

     

     

    Thomas
  • Hamish
    The GTM is co-hosted with the LTM. So you'd think not... The iQuery comms show as fine. Just no bigip monitor updates by the looks. Very strange.

     

     

    Setting the log level to debug (on the LTM you have to set the db variable by hand; on the GTM host you can do it from the GUI) doesn't help a lot so far... It all looks normal (with the exception of not being able to find any indication of where the updates should be going :)

     

     

    H
  • Hamish
    Yeah. It's a known bug with v11 when you have multiple traffic-groups. Since "multiple" ALSO includes using both the default floating and default non-floating traffic groups for some addresses, it hits more often than you'd think.

     

     

    The iqdump output is pretty clear when this is happening, if you have a working unit and a non-working unit to compare. When the unit isn't working, you're missing VIP and MONITOR messages in the stream. Hence gtmd times the VS out, because it's not seeing the messages for the VS coming from the LTM. The log message actually makes sense; it's just a little terse (it's not the comms timing out, it's the missing messages in the stream that cause it).

     

     

    I was told it'll be fixed in 11.3... But I did manage to convince support to get me an engineering hotfix for 11.2.0 HF2. I just hope it'll get rolled into 11.2.0 HF3 (because the fix isn't in 11.2.1).

     

     

    FWIW the fix appears to be somewhere other than the big3d binary itself... The version of big3d didn't change. And replacing the EHF big3d with EM's big3d binary still works correctly too.

     

     

     

    H
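The working-versus-broken iqdump comparison described above could be roughed out as below. This is only a sketch: the thread doesn't show what iqdump output looks like, so it simply counts lines that mention the two message types named in the post (VIP and MONITOR) in two saved captures. The function names and the placeholder capture strings are made up for illustration and are not real iqdump output.

```python
def count_keywords(text, keywords=("VIP", "MONITOR")):
    """Return how many lines mention each keyword (case-insensitive)."""
    counts = {key: 0 for key in keywords}
    for line in text.splitlines():
        upper = line.upper()
        for key in keywords:
            if key.upper() in upper:
                counts[key] += 1
    return counts


def missing_in_broken(working_text, broken_text, keywords=("VIP", "MONITOR")):
    """List keywords seen in the working capture but never in the broken one."""
    good = count_keywords(working_text, keywords)
    bad = count_keywords(broken_text, keywords)
    return [key for key in keywords if good[key] > 0 and bad[key] == 0]


# Placeholder strings standing in for saved captures; NOT real iqdump output.
working = "server entry\nvip entry\nmonitor entry\n"
broken = "server entry\n"
print(missing_in_broken(working, broken))
```

If the real output uses different tags for those messages, pass those as `keywords` instead; plain substring matching is crude and will also count unrelated lines that happen to contain the same text.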