Dazzla_20011 (Nimbostratus)
Nov 26, 2010
Help: Major network outage involving F5 LTM
Hi,
I'm really hoping someone can help me. Last Friday we had a major problem which affected access to all our Core Systems. The initial problem was caused by a bug within the Cisco Nexus IOS which caused loopguard to block the VLANs on a port-channel and then unblock them.
The 3 VLANs used by the F5 (real servers, virtual servers and heartbeat) between our two LTMs became blocked for a few microseconds.
2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_BLOCK: Loop guard blocking port port-channel1 on VLAN0205.
2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_UNBLOCK: Loop guard unblocking port port-channel1 on VLAN00205
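As a rough way to check how long the block lasted, the two messages above can be parsed and their timestamps compared. This is only a sketch: the regular expression is written for these two lines and is not a general NX-OS syslog parser, and it assumes that VLAN0205 and VLAN00205 both mean VLAN 205.

```python
import re
from datetime import datetime

# The two syslog messages quoted above, copied verbatim.
LOG_LINES = [
    "2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_BLOCK: "
    "Loop guard blocking port port-channel1 on VLAN0205.",
    "2010 Nov 19 14:31:43 GR_Core2 %STP-2-LOOPGUARD_UNBLOCK: "
    "Loop guard unblocking port port-channel1 on VLAN00205",
]

# Pattern written for these two messages only (an assumption, not a spec).
PATTERN = re.compile(
    r"^(?P<ts>\d{4} \w{3} \d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"%STP-2-LOOPGUARD_(?P<event>BLOCK|UNBLOCK): "
    r".* port (?P<port>\S+) on VLAN0*(?P<vlan>\d+)"
)


def parse(line):
    """Return the fields of one loop guard message, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S"),
        "host": m["host"],
        "event": m["event"],
        "port": m["port"],
        "vlan": int(m["vlan"]),  # leading zeros dropped: assumes both lines mean VLAN 205
    }


events = [parse(line) for line in LOG_LINES]
block, unblock = events

# Difference between the BLOCK and UNBLOCK timestamps, in seconds.
gap = (unblock["time"] - block["time"]).total_seconds()
print(block["port"], block["vlan"], gap)
```

Both messages carry the same one-second timestamp, so the computed gap is zero. The log therefore only shows that the block and unblock happened within the same second; it cannot confirm a duration as short as microseconds.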
We have two LTMs in an active (data centre 1) / standby (data centre 2) pair.
When we came to investigate why users couldn't access the systems, it was because the servers couldn't reach their default gateway, which is a floating IP on the F5 LTM. To solve the problem I pressed Update on the F5 self IP used as the DG. Suddenly the servers could reach their DG and access to systems was restored. I'm interested to know what this would have done. I suspect it sent out a gratuitous ARP?
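For context, a gratuitous ARP is an unsolicited ARP message in which the sender and target IP addresses are the same, broadcast so that neighbours refresh their ARP entry for that IP. The sketch below only builds the bytes of such a frame to show the layout. The MAC and IP are hypothetical placeholders, and it makes no claim about what the LTM actually sends when a self IP is updated.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        hw,           # sender MAC
        addr,         # sender IP
        b"\x00" * 6,  # target MAC: unset in a gratuitous request
        addr,         # target IP: same as sender IP, which makes it gratuitous
    )
    return eth_header + arp_payload


# Hypothetical values for illustration only (not taken from the thread).
frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.1")
print(len(frame), frame.hex())
```

A host that receives this frame and already has an ARP entry for the IP would normally update it to the new MAC. That is consistent with the servers regaining their gateway, though it does not prove it.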
Having checked the logs, I can see that the Standby LTM became Active. The LTM also reported address conflicts for some of the IPs which are used for the Virtual Servers.
Any help to determine the cause will be very much appreciated. We are new to the F5 world and used to Cisco products, so troubleshooting is difficult. Our support company isn't being very helpful.
One thing I have noticed is that we are not using MAC masquerade.
Many Thanks
Darren
- nitass (Employee): I suggest opening a support case. Users can open a case with F5 support directly; there is no need to go through a partner.
- Hamish (Cirrocumulus): Yeah. That sounds like the gratuitous ARP was missed. Not sure you'll be able to gather a lot of data after the fact. If it happens again, you could try a few things, like flushing the ARP table on one of the servers on the directly attached VLANs, and see if that brings a few things back to life.
- Chris_Miller (Altostratus): I used to run into this fairly often.
- Hamish (Cirrocumulus): FWIW I'm not a great fan of MAC masquerading... Mainly because of issues we used to have with Cisco switches getting a bit confused when MAC addresses suddenly moved from one part of the network to another. (It was always easier to fix up ARP entries than to try to force updates of where a MAC had moved to.)
- Hamish (Cirrocumulus): Oh... The Cisco issues were when running with multiple switches and quite a large network... Single switches worked fine...
- Dazzla_20011 (Nimbostratus): Thanks for the replies. The High Availability - Redundancy - Link Down Time on Failover setting is 0.1 seconds. Is that basically saying that if the Standby LTM receives nothing from the Active LTM within 0.1 seconds, it will become Active? If so, this seems very low.
- That option is disabled by default... but yes, that's what it's saying... In my experience network failover was more trouble than it was worth... but you are in a unique setup where it seems your backup unit is in another DC... I do not believe that is an ideal HA pair scenario... It seems F5's goal is to back up a unit apples to apples, usually within the same DC... but I don't see why they couldn't be in different DCs and use network failover...
- Dazzla_20011 (Nimbostratus): We have two pairs of LTMs, and each pair has different functions. The pair I'm referring to was bought to replace a pair of Cisco CSS load balancers which ran active - active. These sit on our internal LAN and are used to load balance a specific set of core applications. The core applications reside on servers at both data centres. The VLAN the servers and F5 LTMs sit in spans both data centres, so yes, the units are in the same subnet. The F5 consultant who designed and installed them recommended we didn't use an active - active setup.
- Hamish (Cirrocumulus): My turn.