Forum Discussion
Sync-failover group doesn't sync properly
- Sep 02, 2023
Thank you for the hints! I followed some of the steps described in ID882609, though it wasn't exactly my situation. Specifically, one of the devices failed to restart tmm correctly after bigstart restart tmm, and it started printing the following message every two seconds: Re-starting mcpd
I restarted that second device and ran tail -f /var/log/tmm on both hosts.
First device
Sep 2 13:55:11 bigip2.xx.yyyy notice mcpd[6967]: 01b00004:5: There is an unfinished full sync already being sent for device group /Common/Sync-Failover-Group on connection 0xea1726c8, delaying new sync until current one finishes.
Second device (the one with sync issues), which logged end_transaction message timeouts
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
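The repeating cycle in this log (timeout, connection closed and removed, connection re-established, then the same again five minutes later) is easier to spot when only the timeout events are listed. A minimal sketch, assuming the log lines keep the format shown above; the function name cmi_timeouts is hypothetical and it reads whatever log text is piped into it:

```shell
#!/usr/bin/env bash
# Print "<time> <peer address>" for every end_transaction timeout
# (message ID 01070430) found on standard input.
# Assumes the syslog-style line format shown in the log excerpt above.
cmi_timeouts() {
    awk '/01070430:5: end_transaction message timeout/ {
        peer = "unknown"
        # The peer address follows the "cmi-mcpd-peer-" prefix (14 characters).
        if (match($0, /cmi-mcpd-peer-[0-9.]+/)) {
            peer = substr($0, RSTART + 14, RLENGTH - 14)
        }
        # Field 3 is the time of day in the syslog prefix (month, day, time).
        print $3, peer
    }'
}
```

Piping the log file being tailed into cmi_timeouts gives one line per timeout, which makes a fixed interval between events stand out.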
That error message led me to K25064172 and K10142141. Although I'm not running in AWS, my VMware Workstation VMs use the vmxnet3 driver, so I tried switching to the sock driver as suggested in that KB.
[root@bigip1:Standby:Not All Devices Synced] config # lspci -nn | grep -i eth
03:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
13:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
1b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
[root@bigip1:Standby:Not All Devices Synced] config # tmctl -d blade tmm/device_probed
pci_bdf      pseudo_name type      available_drivers     driver_in_use
------------ ----------- --------- --------------------- -------------
0000:03:00.0             F5DEV_PCI xnet, vmxnet3, sock,
0000:13:00.0 1.2         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
0000:0b:00.0 1.1         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
0000:1b:00.0 1.3         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
The fix for VMware is:
echo "device driver vendor_dev 15ad:07b0 sock" >> /config/tmm_init.tcl
After I restarted both nodes, I saw the desired "In Sync" status.
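One caveat with the echo command above: >> appends unconditionally, so running it a second time leaves a duplicate line in /config/tmm_init.tcl. A minimal sketch of a guarded append, assuming bash; the helper name ensure_line is hypothetical and not part of any F5 tooling:

```shell
#!/usr/bin/env bash
# Append a line to a file only if an identical line is not already present.
ensure_line() {
    local file=$1 line=$2
    # -x: match whole lines only, -F: treat the text literally, -q: no output.
    # If the file does not exist yet, grep fails and the line is written.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example call for the driver override from this thread:
#   ensure_line /config/tmm_init.tcl 'device driver vendor_dev 15ad:07b0 sock'
```

Running the helper repeatedly leaves the file with a single copy of the line, which is convenient when the same change has to be applied to both nodes.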
What is interesting is that I hit this issue on two separate computers running the same VMware Workstation version. I also reinstalled three different versions of BIG-IP and always got the same result. Another odd thing: when I created a Sync-Only group instead of a Sync-Failover group, there were no issues at all. I think it must be some compatibility issue.
Check the connectivity between the BIG-IPs via the HA interface IPs
10.13.13.131 and 10.13.13.132.
Also check the port lockdown settings for the HA self IP.
Check whether the HA interface is tagged or untagged.
Run telnet on port 4353 between the BIG-IPs on the HA self IPs.
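If telnet is not available, the same reachability check can be done from bash alone. A minimal sketch, assuming a bash build with /dev/tcp redirection support and the coreutils timeout command; the function name tcp_port_open is hypothetical:

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port can be opened within the timeout
# (default 3 seconds), non-zero otherwise.
tcp_port_open() {
    local host=$1 port=$2 secs=${3:-3}
    # Inside the child shell, $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example call for the HA self IPs in this thread:
#   tcp_port_open 10.13.13.132 4353 && echo "4353 reachable"
```

A successful connect only shows that the port accepts connections; it says nothing about whether large sync transfers complete, which is what failed in this thread.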
Thank you for the suggestion.
However, I haven't found the issue:
- Port lockdown for the HA self IPs is set to "Allow All" on both devices
- Both HA interfaces are tagged with the same VLAN 13
- The 4353 connection is working fine; I can see packets travelling both ways on both hosts. Checked with: tcpdump -nn -i HA tcp port 4353
First host
09:39:39.272348 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [P.], seq 71446:72894, ack 0, win 9018, options [nop,nop,TS val 1419664648 ecr 1419664639], length 1448 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.272436 IP 10.13.13.131.57460 > 10.13.13.132.4353: Flags [.], ack 72894, win 65535, options [nop,nop,TS val 1419664647 ecr 1419664648], length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.283026 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [.], seq 72894:74342, ack 0, win 9018, options [nop,nop,TS val 1419664651 ecr 1419664647], length 1448 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.283110 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [P.], seq 74342:74400, ack 0, win 9018, options [nop,nop,TS val 1419664651 ecr 1419664647], length 58 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.793529 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [P.], seq 1:203, ack 1, win 12316, length 202 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.793643 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 203, win 16189, length 0 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.811879 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [P.], seq 1:76, ack 203, win 16189, length 75 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.813850 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 76, win 12391, length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.824753 IP 10.13.13.131.57460 > 10.13.13.132.4353: Flags [P.], seq 0:202, ack 72894, win 65535, options [nop,nop,TS val 1419665200 ecr 1419664648], length 202 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
Second host
09:41:24.654511 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 39154:40551, ack 1, win 6565, options [nop,nop,TS val 1419770029 ecr 1419770026], length 1397 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:24.658487 IP 10.13.13.131.51678 > 10.13.13.132.4353: Flags [.], ack 40551, win 65535, options [nop,nop,TS val 1419770030 ecr 1419770029], length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:24.658558 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 40551:42079, ack 1, win 6565, options [nop,nop,TS val 1419770033 ecr 1419770030], length 1528 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.189243 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 3575478456, win 13042, length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.190545 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 1, win 18138, length 0 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.190633 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 1, win 13042, length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.191423 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 1, win 18138, length 0 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.658648 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [.], seq 40551:41999, ack 1, win 6565, options [nop,nop,TS val 1419771033 ecr 1419770030], length 1448 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764044 IP 10.13.13.131.51678 > 10.13.13.132.4353: Flags [.], ack 41999, win 65535, options [nop,nop,TS val 1419771136 ecr 1419771033], length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764175 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 41999:42079, ack 1, win 6565, options [nop,nop,TS val 1419771139 ecr 1419771136], length 80 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764206 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 42079:43527, ack 1, win 6565, options [nop,nop,TS val 1419771139 ecr 1419771136], length 1448 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
- ragunath154, Sep 02, 2023, Cirrostratus
Looks like a connectivity issue.
Have you checked the link below?
https://cdn.f5.com/product/bugtracker/ID882609.html