05-Jul-2023 08:12
We have 4 LTMs in a device group and moved one of our traffic groups to a different LTM a few weeks ago. All 4 devices show padf5-core1.csiweb.com (the LTM it was moved to) as the active device, but there are no stats or connections present on that LTM. The original LTM it was moved from still has all the traffic, and I have traced MAC addresses to that device. I have failed over and moved traffic groups between devices many, many times without issues. I opened a case with support, which has not gotten very far. From the QKViews they say that both devices are "active" and that I need to disable the VSs on the original for it to work properly. I don't understand this, as I have never had to do it in the past, and if I were to disable the VSs on any inactive LTM, how would failover work in the event of an outage? I have requested a support session with F5 on this issue but haven't received a response at this time.
FYI, both LTMs in question have other traffic groups running properly on them.
Thanks,
Joe
05-Jul-2023 09:31
You may have a split-brain situation where you don't actually have active sync and network failover happening between all 4 devices. Do you have all 4 devices within the same device group/traffic group? When you perform an iqdump on each unit, do you see the other 3 units communicating properly? Can you provide a screenshot of the sync page, where the F5 units that are part of the device group are listed with their current status, i.e., grey or green balls?
05-Jul-2023 09:45
My ultimate goal is to have only 2 devices in the traffic group for failover, as we will be decommissioning 2 of the LTMs later this year.
From the image below, I would ultimately like to have only padf5-core1 and padf5-core2 in the traffic group, but currently all the traffic is on Pad-F5-2.
Can you please provide more information on the iqdump?
Thanks,
05-Jul-2023 09:58
Do you have more than one traffic group configured? That is another instance where you would have more than one F5 unit active. If the traffic groups are not on the same unit, then multiple units may be active. If you need to move connections ASAP, just force offline the problematic F5 unit.
Looks like the devices should all see each other and the configs are synced, so there shouldn't be any issue with network failover, the port lockdown settings, or iQuery comms in this case. (If you log into each unit and the standby/active devices all look the same in the GUI, i.e., connected, same ball colors, etc., then connectivity-wise you should be OK.) I would check the traffic groups.
05-Jul-2023 10:19
Yes, we have separate traffic groups for each partition. My goal is to migrate each partition/traffic group to the new LTM in phases. I know I have moved traffic groups between all of the devices in the past, but I am not sure I have done so since we performed an upgrade a few months ago. I think I am going to migrate traffic off the "problem one" this weekend and then perform a reboot. I have held off on this, but I am pretty sure I am hitting a known issue where the versions don't show up correctly from one device to another. I have had this in the past and had to perform the following:
K13030: Forcing the mcpd process to reload the BIG-IP configuration
https://support.f5.com/csp/article/K13030
Thanks,
Joe
05-Jul-2023 10:37
Well, that is why you have a mish mash of active devices. It all depends where that traffic group is located. The shared objects, such as Virtual Servers, associated with that traffic group will still process traffic on the unit where that traffic group is active.
Not sure what you mean by versions, btw. Are you suggesting the units are running different versions of BIG-IP software (that shouldn't be the case) or that the configs are out of sync?
05-Jul-2023 10:58
Version of padf5-core1
Device listing from PadF5-2 shows the wrong version for 2 devices. All devices are running the same version, and this has been verified multiple times. I had this issue when upgrading to 13.5.X as well, and the "workaround" was to do a force load on all 4 devices.
CORESERVICES VIP on PADF5-2 (it is active, but not for this traffic group): all the traffic is still being processed by this LTM.
Even PADF5-2 reflects that padf5-core1 should be the active device.
No stats or data present on padf5-core1
padf5-core1 should be the primary for that traffic group and it would allow the option to Force to standby.
We have split traffic in the past between our different traffic groups without issue.
Thanks,
Joe
20-Aug-2023 09:20
I am still having the issue. It appears that one specific traffic group will go "Active" on another LTM, but the traffic still seems to be "stuck" to the original active device. I failed a different traffic group over to another LTM without issue this morning, but when I tried to move the CORESERVICES one last night, the MAC/ARP entries were still present on the original F5.
I had a case open with F5, and they stated that I needed to disable the VIPs on the existing LTM after I migrate and force it to standby. But since the devices are in a config sync group, that would shut down the VIPs on all LTMs. I am lost at this point. I am going to work on moving all traffic off the existing LTM and then rebooting it to see if that clears the issue.
Any ideas would be appreciated..
Thanks,
Joe
23-Aug-2023 06:30
I think I found the issue, but I am not completely sure how it got this way or exactly how to resolve it. It appears the only way this traffic group will fail over to another LTM is when "traffic-group-1" is moved as well. I looked at Virtual Servers : Virtual Address List, and all of the addresses are associated with "traffic-group-1" instead of the proper traffic group.
I need to figure out how we got into this state and whether the fix is as simple as changing each one to the proper traffic group. I would also like to determine how to prevent this issue going forward.
Thanks,
Joe
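For anyone auditing the same symptom, here is a minimal sketch of how the check described above could be scripted. It assumes you have saved the output of `tmsh list ltm virtual-address` (adding `all-properties` may help if the `traffic-group` line is not shown) and that the stanzas look like the sample below. The partition name `CORESERVICES`, the traffic group name `/Common/tg-coreservices`, and the addresses are hypothetical placeholders, not values from this thread. The script only prints candidate `tmsh modify` commands for review; it does not touch a device.

```python
import re

# Hypothetical sample shaped like `tmsh list ltm virtual-address` output.
# Names, addresses, and traffic groups are placeholders for illustration.
SAMPLE = """\
ltm virtual-address /CORESERVICES/10.10.1.10 {
    address 10.10.1.10
    mask 255.255.255.255
    traffic-group /Common/traffic-group-1
}
ltm virtual-address /CORESERVICES/10.10.1.11 {
    address 10.10.1.11
    mask 255.255.255.255
    traffic-group /Common/tg-coreservices
}
"""

# One stanza: header line, lazily matched body, closing brace at line start.
BLOCK_RE = re.compile(
    r"^ltm virtual-address (\S+) \{\n(.*?)^\}", re.MULTILINE | re.DOTALL
)
TG_RE = re.compile(r"^\s*traffic-group (\S+)", re.MULTILINE)


def parse_virtual_addresses(text):
    """Map each virtual address name to its traffic group (None if absent)."""
    result = {}
    for name, body in BLOCK_RE.findall(text):
        match = TG_RE.search(body)
        result[name] = match.group(1) if match else None
    return result


def misassigned(addresses, partition, expected_tg):
    """Return virtual addresses in `partition` not on `expected_tg`."""
    prefix = f"/{partition}/"
    return sorted(
        name
        for name, tg in addresses.items()
        if name.startswith(prefix) and tg != expected_tg
    )


def fix_commands(names, expected_tg):
    """Build tmsh commands that would reassign each address; review before use."""
    return [
        f"tmsh modify ltm virtual-address {name} traffic-group {expected_tg}"
        for name in names
    ]


if __name__ == "__main__":
    addrs = parse_virtual_addresses(SAMPLE)
    wrong = misassigned(addrs, "CORESERVICES", "/Common/tg-coreservices")
    for cmd in fix_commands(wrong, "/Common/tg-coreservices"):
        print(cmd)
```

Since the devices are in a config sync group, a change like this would normally be made on one unit and then synced, ideally in a maintenance window, because moving a virtual address to a traffic group that is active on a different device will move its traffic.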