Forum Discussion

Mozzie
Nimbostratus
Jul 16, 2025

Issue with 2 parallel F5 clusters

Hello everybody and first of all thank you for taking the time to read my issue!

 

The issue I have concerns a migration.

We have a production F5 BIG-IP cluster (Active/Standby), let's call this "Old F5", which has a lot of Virtual Servers in partitions, with specific pools and monitors for each application/service.

This device also has 2 Vlans, internal (vlan11) and external (vlan10), and 2 interfaces in an LACP trunk that is tagged on both Vlans. It is connected over that same single leg to a Cisco APIC.

It has 2 Self IP addresses (one for each Vlan):

   10.10.10.1-Vlan "external"

   10.20.20.1-Vlan "internal" (numbers are just for example)

It also has 4 Floating IP addresses (2 for each Vlan) with 2 traffic groups:

  10.10.10.2-Vlan external traffic group 1

  10.10.10.3-Vlan external traffic group 2

  10.20.20.2-Vlan internal traffic group 1

  10.20.20.3-Vlan internal traffic group 2

 

This device (cluster) has to be replaced by another F5 BIG-IP cluster (let's call this "New F5"). The new device is an identical copy of the old F5 (the config was taken from the old one and imported to the new one), meaning the same Vlans, monitors, pools, Virtual Server IP addresses, etc.

At the moment the new F5 has its 2 interfaces disabled and a blackhole default reject route set up so that it does not interfere with the old F5, which is the production one.

 

The idea is to configure the new F5 device with IP addresses from the same subnet (for example 10.10.10.5) and disable all the Virtual Servers so it doesn't handle traffic (the nodes, monitors and pools stay up on both devices). The 2 F5 devices, old and new, would then run in parallel, and we would move the Virtual Servers one by one by disabling the VS on the old F5 and enabling it on the new F5.

At this point we also remove the blackhole route, configure the correct default static route (the same one that is on the old F5), and enable the interfaces.

This sounded and looked good: on the new F5 the nodes and pools are green and the Virtual Servers are disabled as expected.

 

On the old production F5 everything is up and green, BUT if I try to reach one of the Virtual Servers, either by the virtual IP address or by hostname, the attempt just times out without any response. (If I telnet to the VS on port 443 it connects, meaning that the old F5 accepts the traffic.)

 

I also tried disabling the nodes on the new F5, but the behaviour is still the same. The only way to get it working again is to disable the interfaces on the new F5 and add the default reject blackhole route back.

 

This is not how I imagined it would work. I was expecting that the old F5 would work as normal, and that the new F5 would see the nodes and pools up (confirming good communication) but would not handle any traffic for the Virtual Servers because they are disabled.

 

Does anyone have any idea what is causing this issue? Why, when both F5 devices are up in parallel, does the connection to a Virtual Server through the old production F5 time out, while that F5 sees both the pools and the Virtual Servers as up and running?

Thank you in advance!

3 Replies

  • As a first step you could run a tcpdump on the new cluster to check whether traffic ends up there instead of on the old one.

  • VGF5
    Cumulonimbus

    Hi Mozzie​ 

     

    As per my understanding, you're running into ARP/IP conflicts. What's happening is that the new F5 is responding to ARP requests for production IPs. As a result, the network is mistakenly sending live traffic to it, even though your virtual servers (VSs) are still disabled.

    Don't assign production IPs to the new F5 until you're ready to go live. If you need to preconfigure them, make sure ARP is disabled on those IPs to avoid interference. Always ensure that only one F5 cluster at a time has those IPs active; having them active on both can lead to traffic blackholing and outages.

  • That is a lot to visualize. I do have one question: what device is doing the DNS to access these LTMs? Are you using BIG-IP GTM/DNS or another platform like Infoblox? Side note on the APIC: those things are so over-designed. We have a Cisco CCIE in our department and he still has to contact Cisco TAC to get APIC to work as required for our needs. Also, even with the Virtual Server disabled, there is still its entry in the Virtual Address List, which has ARP enabled by default.
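
The ARP overlap described in the replies above can be checked on paper before the new cluster's interfaces are enabled. Below is a minimal Python sketch of that check, not an F5 tool: the function name `arp_conflicts`, the dict layout, and the virtual address 10.10.10.100 are hypothetical and used only for illustration, while the other addresses are the example ones from the original post. Virtual Server state is deliberately left out of the model because, as noted above, disabling a Virtual Server does not by itself stop the virtual address from answering ARP.

```python
from ipaddress import ip_address


def arp_conflicts(old_cluster, new_cluster):
    """Return the addresses that both clusters would answer ARP for.

    Each cluster is a dict mapping an IP string to a bool that says
    whether ARP is enabled for that address (hypothetical layout).
    """
    old_active = {ip_address(ip) for ip, arp in old_cluster.items() if arp}
    new_active = {ip_address(ip) for ip, arp in new_cluster.items() if arp}
    # Any address in both sets has two owners on the same VLAN.
    return [str(ip) for ip in sorted(old_active & new_active)]


# Example addresses from the post, plus one hypothetical virtual address.
old_f5 = {"10.10.10.1": True, "10.10.10.2": True, "10.10.10.100": True}
new_f5 = {"10.10.10.5": True, "10.10.10.100": True}

print(arp_conflicts(old_f5, new_f5))  # ['10.10.10.100']

# With ARP disabled for that address on the new cluster, the overlap goes away.
new_f5["10.10.10.100"] = False
print(arp_conflicts(old_f5, new_f5))  # []
```

In practice the two dicts would be filled from each cluster's self IP, floating IP and virtual address lists. Any address the function returns is one that should stay ARP-disabled on the new cluster until its Virtual Server is actually moved.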