Pool members not stable after failover
Hi,
Our setup:
- two vCMP guests in HA (VIPRION with two blades)
- ~10 partitions
- simple configuration with LTM and AFM; nodes are directly connected to the F5 device (the F5 device is the default gateway for the nodes)
- software 16.1.3.3, upgraded to 16.1.4
^^ the same setup is deployed in two data centers.
We are hitting interesting behaviour in the first data center only:
- when the second F5 guest is active: pool member monitors (HTTP and HTTPS) respond without problems and everything is stable.
- after failover (first F5 guest active): pool member responses are not stable. The HTTPS monitor flaps while HTTP remains stable. Sometimes all pool members are marked down, and then the virtual server goes down.
^^ it looks like a problem on the node side, but it isn't, because everything is stable when the second F5 device is active.
This issue hits almost all partitions. We have checked:
- physical interfaces: everything stable, no errors on ports or ether-channels (trunks)
- ARP records: everything looks correct, no MAC flapping
- spanning tree: stable in the environment
- routing: correct; default gateway on the node side: correct; subnet masks: correct on the nodes and on both F5 devices. Floating addresses work correctly (including ARP in the network)
- logs on the F5 devices: nothing related to this behaviour
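In case it helps narrow this down, here is roughly what we can still capture on the active (problematic) unit. The pool name, member IP, and capture filename below are placeholders for our setup, not the real objects:

```shell
# Show monitor state and the "reason" field for the affected pool
# (pool name is a placeholder)
tmsh show ltm pool example_pool members

# Capture the HTTPS monitor probes leaving the active unit toward one member,
# to see whether the TLS handshake completes or times out
# (member IP is a placeholder; 0.0:nnn captures on all VLANs with tmm info)
tcpdump -ni 0.0:nnn -s0 -w /var/tmp/https_monitor.pcap host 10.0.0.10 and port 443

# Reproduce the monitor probe manually from the BIG-IP shell
curl -vk https://10.0.0.10/
```

Comparing the same capture on the stable unit against the unstable one should show whether the probes are leaving the box at all, and at which step of the TLS handshake they fail.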
I don't know what else related to this issue we can check.
The configuration of all F5 devices (2x DC1, 2x DC2, two independent HA pairs) is the same (managed with automation), and the software version is the same (we upgraded to 16.1.4 two days ago). It looks like something is "blocked" on the first F5 device in DC1 (neither a reboot nor the upgrade solved the issue).
Do you have any idea what else to check?