Forum Discussion

Sagar_Shah_9610
Nimbostratus
Sep 01, 2008

OSPF on active - standby BigIPs

Hello All,

 

 

I am running BigIP 3400s in the wireless data network of a telecom carrier (using CDMA 1xRTT for access). The BigIPs are installed in an active-standby configuration and sit inline between Cisco 6509s (subscriber side) and Cisco 7304s (Internet gateways). For details, please see the attached architecture diagram. Traffic from the Cisco 6509s (subscriber side) to the Cisco 7304s (Internet gateways) is routed statically; traffic from the Cisco 7304s to the Cisco 6509s is routed using OSPF.

 

 

Background:

 

We started with TMOS 9.3.0 on the active-standby BigIPs. We observed that when OSPF was running on both the active and the standby BigIP, the standby BigIP also advertised its topology database to its OSPF peers (the 6509s and 7304s). As a result, the peers sometimes sent traffic to the standby BigIP, which caused a traffic outage. To solve this, we were given a script that ensured the ZebOS daemon did not run while a unit was in standby. Over time we saw that the script sometimes failed, and manual intervention was needed to turn off the ZebOS daemon on the standby unit.

 

 

Afterwards it was recommended that we upgrade TMOS to version 9.4.x (x > 3). We upgraded to TMOS 9.4.4 last week. In this version OSPF runs on both the active and standby BigIPs, but the 'tmrouted' daemon ensures that routes advertised by OSPF on the standby carry a higher metric (65535), so peers should never prefer the standby BigIP when sending traffic. This addressed our original concern. However, since implementing it we have seen sudden traffic outages in which the throughput graph on the active BigIP shows a sharp trough (traffic drops from 400 Mbps to 0 Mbps). The outage lasts for a few minutes and then traffic recovers. The last time, it lasted 40 minutes, and because of mounting complaints we decided to stop the 'tmrouted' and 'ZebOS' daemons on the standby BigIP. This resolved our problem.
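
To illustrate why the high standby metric is supposed to keep peers away from the standby unit, here is a minimal Python sketch of lowest-metric selection. It is a simplification and not how any router implements OSPF: real route selection also weighs the path cost to the advertiser and the external route type. The unit names, the prefix and the active unit's metric of 20 are assumptions made up for the example; only the value 65535 comes from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    advertiser: str  # hypothetical label for the advertising unit
    prefix: str
    metric: int


STANDBY_METRIC = 65535  # value quoted in the post for the standby unit


def preferred(ads):
    """Return the lowest-metric advertisement for each prefix."""
    best = {}
    for ad in ads:
        current = best.get(ad.prefix)
        if current is None or ad.metric < current.metric:
            best[ad.prefix] = ad
    return best


# Hypothetical prefix and active-unit metric, for illustration only.
ads = [
    Advertisement("bigip-active", "10.0.0.0/8", 20),
    Advertisement("bigip-standby", "10.0.0.0/8", STANDBY_METRIC),
]

print(preferred(ads)["10.0.0.0/8"].advertiser)       # bigip-active
# If the active unit's advertisement disappears, the standby's is all that is left.
print(preferred(ads[1:])["10.0.0.0/8"].advertiser)   # bigip-standby
```

The second call shows the caveat in this simplified model: a high metric only de-prioritizes the standby route. If the active unit's advertisement is lost for any reason, the standby route would be the one selected.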

 

 

Has anyone come across a similar situation? Any thoughts on how this can be resolved?

 

 

Below is the running ZebOS config on the active and standby units. The only difference is the OSPF router-id; the rest of the configuration is identical:

 

 

sh run
!
no service password-encryption
!
no banner motd
!
!
interface lo
!
interface tmm0
!
interface inside
!
interface venturi
!
interface outside
!
router ospf 10
 ospf router-id 172.29.254.150
 redistribute kernel
 passive-interface venturi
 network 172.29.254.128 0.0.0.15 area 172.23.119.0
 network 172.29.254.144 0.0.0.15 area 172.23.119.0
!
route-map internal-out permit 10
!
line con 0
 login
line vty 0 4
 login
!
end
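
As a quick sanity check on the two network statements in the config above, the following Python sketch converts an address plus mask into the address range it covers. It assumes the second argument (0.0.0.15) is a Cisco-style wildcard mask; the helper name wildcard_to_network is made up for this example and is not part of ZebOS or any F5 tool.

```python
import ipaddress


def wildcard_to_network(address, wildcard):
    """Convert an address and a contiguous wildcard mask to an IPv4Network."""
    wild = int(ipaddress.IPv4Address(wildcard))
    # A wildcard mask is the bitwise inverse of the netmask.
    netmask = ipaddress.IPv4Address(wild ^ 0xFFFFFFFF)
    # ip_network raises ValueError if the resulting netmask is not contiguous.
    return ipaddress.ip_network(f"{address}/{netmask}", strict=False)


statements = [
    ("172.29.254.128", "0.0.0.15"),
    ("172.29.254.144", "0.0.0.15"),
]
router_id = ipaddress.ip_address("172.29.254.150")

for address, wildcard in statements:
    net = wildcard_to_network(address, wildcard)
    print(net, net.network_address, "-", net.broadcast_address,
          "contains router-id:", router_id in net)
```

Under that assumption the statements cover 172.29.254.128 to 172.29.254.143 and 172.29.254.144 to 172.29.254.159, and the router-id 172.29.254.150 falls inside the second range.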

 

 

Best regards,

 

Sagar Shah

 

Email: sshah@venturiwireless.com; sagar.brit@gmail.com

 

+91 98208 95074

3 Replies

  • Hi Sagar,

     

    I have been looking at the diagram and configuration you provided and I have one question: why do you need to run OSPF there?

     

     

    From my point of view it should work pretty well with static routes (assuming you are not exporting a lot of routes). If you need to reach VIPs, which will probably be anywhere except the server VLAN, they are still directly connected on the Internet side and/or the subscriber side. Management addresses of servers behind the LTM can be either NATed or statically routed as a subnet on the Ciscos.

     

     

    Regarding OSPF, I would suspect a lost adjacency or some mismatch between the F5 and Cisco configurations (something like graceful vs. signaled restarts, router priorities, etc.). I would check the Cisco configuration against the ZebOS one and run a tcpdump of the OSPF protocol to see what happens during the traffic cut:

     

    - Do you have active routes to the F5 when the cut starts? If yes, where do they point? => If they are OK, the problem is probably not with routing.

    - What is the state of the neighboring routers? Check that they are not stuck in EXSTART, 2WAY or something similar. Check that on every node you have.

    - Who is DR/BDR? Try to force the Ciscos to be DR and BDR by setting the priority.

    - Check the dumps for malformed LSAs or something like that.

     

     

    If you find nothing, I would recommend opening a case with F5 and/or Cisco.

     

     

    Update: do not forget to keep capturing until traffic is restored. It may be very interesting to see what causes things to return to normal.
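
The neighbor-state check suggested in this reply could be scripted roughly as follows. This is only a sketch: the column layout of the sample output is an assumption modeled on typical Cisco-style `show ip ospf neighbor` output, the addresses are documentation placeholders, and the function name non_full_neighbors is made up for the example. Note that a 2-Way state between two non-DR routers on a broadcast segment can be legitimate, so a flagged line is a prompt to look closer, not proof of a fault.

```python
# Hypothetical sample output; real output may use different columns.
SAMPLE = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
192.0.2.1         1   Full/DR         00:00:35    192.0.2.1       inside
192.0.2.2         1   Full/Backup     00:00:33    192.0.2.2       inside
192.0.2.3         1   ExStart/DROther 00:00:31    192.0.2.3       outside
"""


def non_full_neighbors(output):
    """Return (neighbor_id, state) pairs for neighbors not in the Full state."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "Neighbor":
            continue
        neighbor_id, state = fields[0], fields[2]
        if not state.lower().startswith("full"):
            flagged.append((neighbor_id, state))
    return flagged


for neighbor_id, state in non_full_neighbors(SAMPLE):
    print(f"{neighbor_id} is in state {state}")
```

Running the same check against output captured before, during and after a traffic cut would show whether any adjacency leaves the Full state while the throughput is down.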
  • Hi Paja,

     

     

    Thanks for the reply.

     

     

    Please see my responses to your questions below:

     

     

    - Do you have active routes to the F5 when the cut starts? If yes, where do they point? => If they are OK, the problem is probably not with routing.

     

     

    [SAGAR] >> No, I do not have that yet. I've asked my field engineers to collect it on the BigIP when the trough occurs, specifically the show ip ospf route and show ip ospf database outputs.

     

     

    - What is the state of the neighboring routers? Check that they are not stuck in EXSTART, 2WAY or something similar. Check that on every node you have.

     

     

    [SAGAR] >> The state of the neighbors on every router is Full/DR or Full/Backup, which indicates that none of them is stuck in an intermediate state.

     

     

    - Who is DR/BDR? Try to force the Ciscos to be DR and BDR by setting the priority.

     

     

    [SAGAR] >> Generally the DR/BDR are the Ciscos, but I've not set any priority on the BigIP. It would be a good idea to set ip ospf priority to zero on the BigIP so that it is never a candidate in the DR/BDR election.

     

     

    - Check the dumps for malformed LSAs or something like that.

     

     

    [SAGAR] >> No dumps have been collected during a cut so far. I am waiting for the next cut and will add this step to my field manual.

     

     

    The reason for running OSPF is that during deployment we ran into issues when we used the auto last hop pool feature and pointed all traffic from the Internet routers to the Cisco 6509s at a pool containing the Cisco 6509s as members. The main issue was that subscribers intermittently lost Internet access. Running OSPF on the BigIPs resolved this.

     

     

    I am also in touch with F5 support, but so far I have not received a concrete solution to this problem.

     

     

    /Sagar