Dazzla_20011 (Nimbostratus)
Nov 26, 2010
Help: Major network outage involving F5 LTM
Hi,
I'm really hoping someone can help me. Last Friday we had a major problem which affected access to all our core systems. The initial problem was caused by a bug in Cisco Nexus NX-OS, which caused loop guard to block the VLANs on a port-channel and then unblock them.
The three VLANs used by the F5 between our two LTMs (real servers, virtual servers and heartbeat) became blocked for a few microseconds.
2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_BLOCK: Loop guard blocking port port-channel1 on VLAN0205.
2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_UNBLOCK: Loop guard unblocking port port-channel1 on VLAN00205
We have two LTMs in an active/standby pair: active in data centre 1, standby in data centre 2.
When we investigated why users couldn't access the systems, we found that the servers couldn't reach their default gateway, which is a floating IP on the F5 LTM. To solve the problem, I clicked Update on the F5 self IP used as the default gateway. The servers could immediately reach their gateway again and access to the systems was restored. I'm interested to know what this would have done. I suspect it sent out a gratuitous ARP?
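For readers unfamiliar with gratuitous ARP (GARP): it is an ARP packet in which a device announces its own IP-to-MAC mapping, with the target IP set to the sender's own IP, broadcast so every host on the VLAN refreshes its ARP cache. Whether clicking Update on a self IP actually triggers one is the poster's guess and is not confirmed in this thread. The sketch below only builds the bytes of a request-style GARP (the "ARP Announcement" form described in RFC 5227) using Python's `struct` module; the MAC and IP addresses are made-up examples and nothing is sent on the wire.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp(sender_mac: str, sender_ip: str) -> bytes:
    """Build an Ethernet frame carrying a request-style gratuitous ARP.

    Sender IP and target IP are the same address, which is what makes the
    ARP 'gratuitous'. The frame is not padded to the 60-byte Ethernet minimum.
    """
    mac = mac_bytes(sender_mac)
    ip = ipaddress.IPv4Address(sender_ip).packed
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP
    eth = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: HTYPE=1 (Ethernet), PTYPE=0x0800 (IPv4), HLEN=6, PLEN=4, OPER=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC, sender IP, target MAC (all zeros), target IP (same as sender IP)
    arp += mac + ip + b"\x00" * 6 + ip
    return eth + arp


# Example with a locally administered (made-up) MAC and a made-up gateway IP
frame = gratuitous_arp("02:00:00:aa:bb:cc", "10.0.205.1")
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```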
Checking the logs, I found that the standby LTM had become active. The LTM also reported address conflicts for some of the IPs used by the virtual servers.
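An "address conflict" generally means the same IP address was seen being claimed by more than one MAC address. If both units briefly believed they were active (a split brain), each would answer for the virtual server addresses, and without MAC masquerade each unit would typically answer with its own interface MAC. The sketch below only illustrates that kind of check over a list of hypothetical (IP, MAC) observations, such as sender fields taken from ARP replies in a packet capture; it is not how the LTM itself detects conflicts.

```python
from collections import defaultdict


def find_conflicts(observations):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC.

    observations: iterable of (ip, mac) pairs, e.g. sender IP/MAC pairs
    taken from ARP replies seen on a VLAN (hypothetical input format).
    """
    claims = defaultdict(set)
    for ip, mac in observations:
        claims[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


# Made-up capture: both units answering for the same virtual server address
seen = [
    ("10.0.205.10", "02:00:00:00:00:01"),  # unit in data centre 1
    ("10.0.205.10", "02:00:00:00:00:02"),  # unit in data centre 2
    ("10.0.205.11", "02:00:00:00:00:01"),
]
print(find_conflicts(seen))
# {'10.0.205.10': ['02:00:00:00:00:01', '02:00:00:00:00:02']}
```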
Any help determining the cause would be much appreciated. We are new to the F5 world, so troubleshooting is difficult; we are used to Cisco products, and our support company isn't being very helpful.
One thing I have noticed is that we are not using MAC masquerade.
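One possible reason MAC masquerade matters here, shown as a deliberately simplified model (not confirmed as the cause in this thread): a server caches the MAC address for its default gateway IP. Without MAC masquerade, the floating gateway IP moves to the new active unit's own MAC on failover, so servers keep sending to the old MAC until their cache is refreshed, typically by a gratuitous ARP. If that GARP is lost, for example while STP is blocking the VLAN, traffic goes to the wrong unit until the entry is refreshed some other way. With MAC masquerade, a shared MAC moves with the floating address, so the server's cached entry stays valid (switches still have to relearn which port that MAC is on). The class, function and addresses below are hypothetical, and real ARP caches have timeouts this model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class ServerArpCache:
    """Toy model of one server's ARP cache (no timeouts; hypothetical)."""
    entries: dict = field(default_factory=dict)

    def learn(self, ip: str, mac: str) -> None:
        self.entries[ip] = mac


def failover(cache: ServerArpCache, gateway_ip: str, new_active_mac: str,
             garp_delivered: bool) -> bool:
    """Move the floating gateway IP to the new active unit.

    The server only updates its cache if the gratuitous ARP reaches it.
    Returns True if the server would now send to the correct MAC.
    """
    if garp_delivered:
        cache.learn(gateway_ip, new_active_mac)
    return cache.entries.get(gateway_ip) == new_active_mac


GW = "10.0.205.1"
UNIT_A, UNIT_B = "02:00:00:00:00:01", "02:00:00:00:00:02"
SHARED = "02:00:00:00:ff:01"  # hypothetical masquerade MAC shared by both units

# Without MAC masquerade: the gateway MAC changes and the GARP is lost.
cache = ServerArpCache()
cache.learn(GW, UNIT_A)
print(failover(cache, GW, UNIT_B, garp_delivered=False))  # False: stale entry

# With MAC masquerade: the gateway MAC does not change, so the lost GARP
# does not leave the server's ARP entry pointing at the wrong MAC.
cache = ServerArpCache()
cache.learn(GW, SHARED)
print(failover(cache, GW, SHARED, garp_delivered=False))  # True
```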
Many Thanks
Darren
- Dazzla_20011 (Nimbostratus): We use a dedicated VLAN for network failover. During the night, when the backup runs, I'm seeing the inter-site links between the data centres hit 90% at times. I would have thought this would impact all VLANs. I need to look into policing this backup traffic.
- L4L7_53191 (Nimbostratus): One of the worst outages I've ever seen involved a network heartbeat VLAN between two DCs that was severed. The BIG-IPs were split, as were 10 or so other devices that rely on GARP for failover. They all did what they were configured to do: they went active and all started sending GARPs out to every VLAN. What's worse is that only the failover VLAN was taken down and all of the other VLANs were still connected. The ARP storm was quite a spectacle to behold!
- Hamish (Cirrocumulus): All very true... When configuring network failover, I make it mandatory to have a NON-core network path between the two boxes. Also avoid 9.4, because it will only use ONE path for network failover traffic; with 10.x you can configure multiple connections.
- Dazzla_20011 (Nimbostratus): We use dark fibre to interconnect our data centres. We have some spare, so I will use a pair just for failover.
- Yes: dedicate ports, fiber, etc., whatever you can to avoid any issues.
- Paul_Szabo_9016 (Historic F5 Account), quoting L4L7_53191: "What's worse is that only the failover VLAN was taken down and all of the other VLANs were still connected. The ARP storm was quite a spectacle to behold!"
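The split brain described in this thread follows from a simple rule: a standby unit that stops hearing its peer's heartbeat assumes the peer is dead and goes active. The sketch below models that rule with one or more heartbeat paths. With a single path, losing only that path makes both units active even though the data VLANs are still connected, which is the scenario behind the ARP storm described above. With several independent paths (Hamish notes that 10.x allows more than one), the peer is treated as down only if every path is silent. The timeout value, path names and function names are hypothetical, not BIG-IP settings.

```python
from typing import Dict, Optional

HEARTBEAT_TIMEOUT = 3.0  # seconds; hypothetical value


def peer_alive(last_heard: Dict[str, Optional[float]], now: float,
               timeout: float = HEARTBEAT_TIMEOUT) -> bool:
    """The peer counts as alive if ANY heartbeat path was heard within `timeout`.

    last_heard maps a path name to the time the last heartbeat was received
    on it, or None if nothing has ever been received on that path.
    """
    return any(t is not None and now - t <= timeout for t in last_heard.values())


def should_go_active(currently_active: bool,
                     last_heard: Dict[str, Optional[float]], now: float) -> bool:
    """An active unit stays active; a standby goes active when it loses its peer."""
    return currently_active or not peer_alive(last_heard, now)


now = 100.0

# Single failover path over the core network, which has just failed:
single = {"core-vlan": 90.0}
print(should_go_active(False, single, now))  # True: standby goes active, split brain

# Two paths: the core VLAN failed but a dedicated dark-fibre pair is still fine.
dual = {"core-vlan": 90.0, "dark-fibre": 99.5}
print(should_go_active(False, dual, now))  # False: standby stays standby
```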