Sync-failover group doesn't sync properly
- Sep 02, 2023
Thank you for the hints! I followed some of the steps described in ID882609, although it wasn't exactly the situation I had. Specifically, one of the devices failed to restart tmm correctly after bigstart restart tmm, and it started logging the following message every two seconds: Re-starting mcpd
I restarted that second device and ran tail -f /var/log/tmm on both hosts.
First device
Sep 2 13:55:11 bigip2.xx.yyyy notice mcpd[6967]: 01b00004:5: There is an unfinished full sync already being sent for device group /Common/Sync-Failover-Group on connection 0xea1726c8, delaying new sync until current one finishes.
Second device (the one with sync issues) logged end_transaction message timeouts
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
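To see how often the timeout recurs and for which peer, the log can be summarised with standard tools. This is only a rough sketch: count_cmi_timeouts is a hypothetical helper name, and it assumes the messages keep the "(user %cmi-mcpd-peer-<address>)" suffix shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "end_transaction message timeout" events per CMI peer.
# Reads log text on stdin; the peer address is taken from the
# "(user %cmi-mcpd-peer-<address>)" suffix seen in the log excerpt.
count_cmi_timeouts() {
    grep 'end_transaction message timeout' \
        | sed -n 's/.*%cmi-mcpd-peer-\([0-9.]*\)).*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be fed whichever log file holds the mcpd messages on the device, for example count_cmi_timeouts < "$logfile", and prints one "peer count" pair per line.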
That error message led me to K25064172 and K10142141. Although I'm not running in AWS, my VMware Workstation VMs use the vmxnet3 driver, so I tried switching to the sock driver as suggested in that KB.
[root@bigip1:Standby:Not All Devices Synced] config # lspci -nn | grep -i eth
03:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
13:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
1b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
[root@bigip1:Standby:Not All Devices Synced] config # tmctl -d blade tmm/device_probed
pci_bdf      pseudo_name type      available_drivers     driver_in_use
------------ ----------- --------- --------------------- -------------
0000:03:00.0             F5DEV_PCI xnet, vmxnet3, sock,
0000:13:00.0 1.2         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
0000:0b:00.0 1.1         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
0000:1b:00.0 1.3         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
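The bracketed pair at the end of each lspci -nn line ([15ad:07b0] here) is the PCI vendor:device ID of the NIC. A small sketch for pulling the distinct IDs out of that output is below; eth_vendor_dev_ids is a hypothetical helper name, and it assumes the "Ethernet controller ... [vvvv:dddd]" line shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct PCI vendor:device IDs of Ethernet
# controllers found in "lspci -nn" output supplied on stdin.
eth_vendor_dev_ids() {
    grep -i 'ethernet controller' \
        | sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p' \
        | sort -u
}
```

Used as lspci -nn | eth_vendor_dev_ids, it should print a single ID when all interfaces are the same virtual NIC model, as they are in the output above.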
The fix for VMware is:
echo "device driver vendor_dev 15ad:07b0 sock" >> /config/tmm_init.tcl
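One caveat with echo ... >> is that running it twice leaves a duplicate line in the file. A minimal sketch of an idempotent variant is below; append_once is a hypothetical helper, not an F5 tool, and it relies only on standard grep and printf.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if an identical
# line is not already present.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage on the BIG-IP, with the same line and path as above:
# append_once "device driver vendor_dev 15ad:07b0 sock" /config/tmm_init.tcl
```

In my case the change only showed up after the nodes were restarted.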
After restarting both nodes, I saw the desired "In Sync" status.
Interestingly, I hit this issue on two separate computers running the same VMware Workstation version. I also reinstalled three different versions of BIG-IP and always got the same result. Another odd thing: if I created a Sync-Only group instead of a Sync-Failover group, there were no issues at all. I think it must be some compatibility issue.
Thank you for the suggestion.
However, I haven't found the issue:
- Port lockdown for the HA self IPs is set to "Allow All" on both devices
- Both HA interfaces are tagged with the same VLAN 13
- The connection on port 4353 is working fine; I can see packets travelling both ways on both hosts. Checked with: tcpdump -nn -i HA tcp port 4353
First host
09:39:39.272348 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [P.], seq 71446:72894, ack 0, win 9018, options [nop,nop,TS val 1419664648 ecr 1419664639], length 1448 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.272436 IP 10.13.13.131.57460 > 10.13.13.132.4353: Flags [.], ack 72894, win 65535, options [nop,nop,TS val 1419664647 ecr 1419664648], length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.283026 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [.], seq 72894:74342, ack 0, win 9018, options [nop,nop,TS val 1419664651 ecr 1419664647], length 1448 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.283110 IP 10.13.13.132.4353 > 10.13.13.131.57460: Flags [P.], seq 74342:74400, ack 0, win 9018, options [nop,nop,TS val 1419664651 ecr 1419664647], length 58 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
09:39:39.793529 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [P.], seq 1:203, ack 1, win 12316, length 202 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.793643 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 203, win 16189, length 0 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.811879 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [P.], seq 1:76, ack 203, win 16189, length 75 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.813850 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 76, win 12391, length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip1.xx.yyyy port=1.3 trunk=
09:39:39.824753 IP 10.13.13.131.57460 > 10.13.13.132.4353: Flags [P.], seq 0:202, ack 72894, win 65535, options [nop,nop,TS val 1419665200 ecr 1419664648], length 202 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip2.xx.yyyy_6699 port=1.3 trunk=
Second host
09:41:24.654511 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 39154:40551, ack 1, win 6565, options [nop,nop,TS val 1419770029 ecr 1419770026], length 1397 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:24.658487 IP 10.13.13.131.51678 > 10.13.13.132.4353: Flags [.], ack 40551, win 65535, options [nop,nop,TS val 1419770030 ecr 1419770029], length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:24.658558 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 40551:42079, ack 1, win 6565, options [nop,nop,TS val 1419770033 ecr 1419770030], length 1528 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.189243 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 3575478456, win 13042, length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.190545 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 1, win 18138, length 0 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.190633 IP 10.13.13.132.25677 > 10.13.13.131.4353: Flags [.], ack 1, win 13042, length 0 out slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.191423 IP 10.13.13.131.4353 > 10.13.13.132.25677: Flags [.], ack 1, win 18138, length 0 in slot1/tmm1 lis=_cgc_outbound_/Common/bigip1.xx.yyyy_6699 port=1.3 trunk=
09:41:25.658648 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [.], seq 40551:41999, ack 1, win 6565, options [nop,nop,TS val 1419771033 ecr 1419770030], length 1448 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764044 IP 10.13.13.131.51678 > 10.13.13.132.4353: Flags [.], ack 41999, win 65535, options [nop,nop,TS val 1419771136 ecr 1419771033], length 0 in slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764175 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 41999:42079, ack 1, win 6565, options [nop,nop,TS val 1419771139 ecr 1419771136], length 80 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
09:41:25.764206 IP 10.13.13.132.4353 > 10.13.13.131.51678: Flags [P.], seq 42079:43527, ack 1, win 6565, options [nop,nop,TS val 1419771139 ecr 1419771136], length 1448 out slot1/tmm1 lis=_cgc_inbound_/Common/bigip2.xx.yyyy port=1.3 trunk=
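To confirm "both ways" from a saved capture rather than by eye, the lines can be tallied per direction. This is a rough sketch: count_directions is a hypothetical helper, and it assumes IPv4 tcpdump -nn text lines shaped like the ones above ("<time> IP <src>.<port> > <dst>.<port>: ...").

```shell
#!/bin/sh
# Hypothetical helper: summarise tcpdump -nn text output read from stdin
# as packet counts per "src_ip > dst_ip" pair.
count_directions() {
    awk '$2 == "IP" && $4 == ">" {
        src = $3; dst = $5
        sub(/:$/, "", dst)          # drop the trailing colon after the destination
        sub(/\.[0-9]+$/, "", src)   # drop the source port
        sub(/\.[0-9]+$/, "", dst)   # drop the destination port
        n[src " > " dst]++
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

If one of the two directions is missing from the result, traffic is only flowing one way on that host.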
Looks like a connectivity issue.
Have you checked the link below?
https://cdn.f5.com/product/bugtracker/ID882609.html