Forum Discussion

wbbigdave_97776
Nimbostratus
Apr 12, 2010

Postfix / SNMP forwarding flood

Well hi,

Bit of a problem here. We have set up a BIG-IP load balancing on a stick, with a 4-gig trunk to the back/front end. On this BIG-IP there are about 40-50 virtual servers, over 130 nodes, and about 50 pools with about 4 members in each. We use a variety of HTTP GET requests and SOAP requests as monitors, which are applied to the pools; the default node monitor is ICMP and there is no member-specific monitoring.

The problem is that we are receiving hundreds and thousands of SNMP-forwarded traps or Postfix-forwarded alerts saying that many pool members are going up and down. In one week we received 25,000. Yup, twenty-five thousand!

I have run a tcpdump and picked through it with Wireshark, and all I can see is that the BIG-IP is sending a ton of SYN-ACKs every second between the virtual server address and the server that the node resides on. (Each application has its own IP on a physical server, which also has a node. For example, 10.10.1.27 is a physical box, Server07, and 10.10.1.127 is an application running on that box, app1_Server07_node. Many physical boxes have 6-7 applications on them... don't shoot me, I didn't design the system, I implemented it under someone else's orders.) I believe the problem might have to do with the fact that we have these physical boxes defined as nodes but not being used, or something similar; I am unsure, though.

If anyone has had similar issues or can offer a solution, please help!

Thanks in advance

Luke

  • hoolio
    Cirrostratus
    Hi Luke,

    Are there any particular pools/members/nodes that are being marked up and down consistently, or is it distributed across all monitored objects? If the former, you could focus on tuning those monitors. What kind of interval/timeout settings are you using? Is it timeout = (3 x interval) + 1?

    Aaron
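
To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic. The function name `recommended_timeout` and its `multiplier` parameter are made up for illustration; the formula itself is just the "3 x interval + 1" relationship mentioned in the reply.

```python
def recommended_timeout(interval_s: int, multiplier: int = 3) -> int:
    """Return a monitor timeout using the (multiplier x interval) + 1 rule of thumb.

    With the default multiplier, the timeout spans three probe intervals
    plus one extra second.
    """
    if interval_s <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return multiplier * interval_s + 1


# Print the timeout the rule gives for a couple of example intervals.
for interval in (5, 10):
    print(f"interval={interval}s -> timeout={recommended_timeout(interval)}s")
```

An interval of 5 seconds gives a timeout of 16 seconds, and an interval of 10 gives 31.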
  • Yeah, 5 for the interval and 16 for the timeout. As for the services going up and down, it would seem to be anything on the box. We thought it was restricted, but I am unsure now. We did not configure the more complex SOAP request monitors or the backend applications, so inefficiencies in either could cause this problem, I suppose. Just wondered if there was anything anyone had come across before.

    Like I said, it looks like the responses from the applications are gibberish, based on the tcpdump I reviewed.

  • hoolio
    Cirrostratus
    Are you only using built-in monitors (no external scripted monitors)? If it's just a matter of load, you might try extending the interval and timeout to 10/31. It would probably be helpful to open a case with F5 Support and ask them to review your exact config, logs and tcpdumps to give you a more exact recommendation.

    Aaron
  • Yeah, the monitors are ones designed by our architects. It seems they are causing our issue: when the monitor pushes out its send string, it also requests a very long return string, which is part of an even longer returned entry. Basically there are a lot of datagrams going back and forth, and I think it's timing out briefly and then coming back just in time.

  • hoolio
    Cirrostratus
    Can you trim the receive string to something more specific (and shorter)? Ideally, you'd be making a request to a custom web app page which just returns the status of the pool member and any dependent services beneath it (like connectivity to an app and/or db server).

    Aaron
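
As a rough illustration of the kind of status page suggested above, here is a minimal sketch using Python's standard `http.server`. Everything specific in it is an assumption for the example: the `/health` path, port 8080, the `STATUS=UP` body, and the placeholder `check_app` / `check_db` functions, which would need to be replaced with real dependency checks. It is not how the applications in this thread are built.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def check_app() -> bool:
    """Hypothetical placeholder: replace with a real application check."""
    return True


def check_db() -> bool:
    """Hypothetical placeholder: replace with a real database connectivity check."""
    return True


CHECKS: Dict[str, Callable[[], bool]] = {"app": check_app, "db": check_db}


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check and return (HTTP status, short body).

    A check that returns a falsy value or raises counts as a failure.
    """
    healthy = True
    for check in checks.values():
        try:
            if not check():
                healthy = False
        except Exception:
            healthy = False
    return (200, "STATUS=UP\n") if healthy else (503, "STATUS=DOWN\n")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only the assumed /health path is served; anything else is a 404.
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(CHECKS)
        payload = body.encode("ascii")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A monitor could then send a GET for `/health` and match on the short string `STATUS=UP` instead of a long application response.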
  • Howdy

    Long time no speaky. Looking in depth at the issues we had with the monitors, I found that the SOAP POST requests were mighty and the receive was very small. It only appears to be on a few servers, so I can only conclude that the app is taking a long time to respond with the correct parameters. I know SOAP can be inherently slow, so it might be something here... I am unsure, but it is out with the architects atm.
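
One way to test the "app is slow to respond" theory would be to time the same request the monitor sends, several times in a row, and see how often it runs past the monitor interval or timeout. The sketch below only shows the timing and counting part; `time_probe` and `count_slow` are hypothetical helper names, and the `probe` callable is a stand-in for whatever actually sends the SOAP POST to one pool member.

```python
import time
from typing import Callable, List


def time_probe(probe: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `probe` `runs` times and return each call's duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        probe()
        durations.append(time.perf_counter() - start)
    return durations


def count_slow(durations: List[float], threshold_s: float) -> int:
    """Count how many durations exceeded `threshold_s` seconds."""
    return sum(1 for d in durations if d > threshold_s)


# Stand-in probe for illustration only: it just sleeps briefly.
# Replace it with a function that sends the real monitor request.
samples = time_probe(lambda: time.sleep(0.01), runs=3)
print("slowest response (s):", max(samples))
print("responses slower than a 5 s interval:", count_slow(samples, 5.0))
```

Running this against a member that flaps and one that does not should show whether response time is really the difference between them.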