Ovov
Sep 01, 2023
Solved

Sync-failover group doesn't sync properly

Hello, I need some help with a basic Active/Standby setup where I can't get the two nodes to sync data. This is the problem I end up with: "did not receive last sync successfully" VLANs are co...
  • Ovov
    Sep 02, 2023

    Thank you for the hints! I followed some of the steps described in ID882609, though it wasn't exactly the situation I had. Specifically, one of the devices failed to restart tmm correctly with bigstart restart tmm. Instead, it started printing the following message every two seconds: Re-starting mcpd

    I restarted that second device and ran tail -f /var/log/tmm on both hosts.

    First device:

    Sep 2 13:55:11 bigip2.xx.yyyy notice mcpd[6967]: 01b00004:5: There is an unfinished full sync already being sent for device group /Common/Sync-Failover-Group on connection 0xea1726c8, delaying new sync until current one finishes.

    The second device (the one with sync issues) logged end_transaction message timeouts:

    Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
    Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
    Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
    Sep 2 13:45:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
    Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070430:5: end_transaction message timeout on connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132)
    Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01070418:5: connection 0xe685c948 (user %cmi-mcpd-peer-10.13.13.132) was closed with active requests
    Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 0107143c:5: Connection to CMI peer 10.13.13.132 has been removed
    Sep 2 13:50:10 bigip1.xx.yyyy notice mcpd[7158]: 01071432:5: CMI peer connection established to 10.13.13.132 port 6699 after 0 retries
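    To check whether these timeouts keep recurring for the same peer, a small helper can count the 01070430 messages per CMI peer address. This is a minimal sketch that assumes the mcpd log line format shown above; the function name count_cmi_timeouts is hypothetical, and you pass whichever log file holds these mcpd lines on your system (the post tails /var/log/tmm; /var/log/ltm is another common place for mcpd messages).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count mcpd "end_transaction message timeout" events
# (message ID 01070430) per CMI peer address.
# Assumes log lines shaped like the ones in this post, e.g.
#   ... 01070430:5: end_transaction message timeout on connection 0x... (user %cmi-mcpd-peer-10.13.13.132)
# Reads the file given as $1, or stdin when no argument is passed.
count_cmi_timeouts() {
    grep -F '01070430:5: end_transaction message timeout' "${1:--}" \
        | grep -oE 'cmi-mcpd-peer-[0-9.]+' \
        | sed 's/^cmi-mcpd-peer-//' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'   # output: "<peer address> <count>"
}
```

    For example, if the log contained only the two timeouts shown above, count_cmi_timeouts /var/log/ltm would print 10.13.13.132 2. A count that grows by one every few minutes matches the repeating pattern in the log above.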

    That error message led me to K25064172 and K10142141. Although I'm not running in AWS, my VMware Workstation VMs used the vmxnet3 driver, so I tried switching to the sock driver as suggested in that KB.

    [root@bigip1:Standby:Not All Devices Synced] config # lspci -nn | grep -i eth
    03:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
    0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
    13:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
    1b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
    
    [root@bigip1:Standby:Not All Devices Synced] config # tmctl -d blade tmm/device_probed
    pci_bdf      pseudo_name type      available_drivers     driver_in_use
    ------------ ----------- --------- --------------------- -------------
    0000:03:00.0             F5DEV_PCI xnet, vmxnet3, sock,
    0000:13:00.0 1.2         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
    0000:0b:00.0 1.1         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
    0000:1b:00.0 1.3         F5DEV_PCI xnet, vmxnet3, sock,  vmxnet3
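    The two checks above (finding the NIC vendor:device ID and seeing which TMM driver is in use) can also be scripted. This is a minimal sketch that assumes the output formats shown in this post, in particular that every entry in the available_drivers column ends with a comma; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checks above; both read command output on stdin.

# Print the unique vendor:device IDs of Ethernet controllers from "lspci -nn",
# e.g. 15ad:07b0 for the VMware VMXNET3 NICs in this post.
# The PCI class "[0200]" has no colon, so only the "[vendor:device]" part matches.
ethernet_vendor_dev_ids() {
    grep -i 'ethernet controller' \
        | grep -oE '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' \
        | sed -E 's/^\[(.*)\]$/\1/' \
        | sort -u
}

# Print "pci_bdf pseudo_name" for each device whose driver_in_use is vmxnet3,
# from "tmctl -d blade tmm/device_probed". The first two lines (header and
# dashes) are skipped. Entries in available_drivers end with a comma, so a
# bare "vmxnet3" in the last field can only be the driver_in_use column.
vmxnet3_in_use() {
    awk 'NR > 2 && $NF == "vmxnet3" { print $1, $2 }'
}
```

    On the BIG-IP these would be used as lspci -nn | ethernet_vendor_dev_ids and tmctl -d blade tmm/device_probed | vmxnet3_in_use. After the sock override and a restart, you would expect the second command to print nothing.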

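    The fix for VMware is to make TMM use the sock driver for the VMXNET3 vendor/device ID 15ad:07b0 (from the lspci output above):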

    echo "device driver vendor_dev 15ad:07b0 sock" >> /config/tmm_init.tcl
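    Because echo ... >> appends every time it runs, repeating the command adds duplicate lines to the file. A minimal sketch of an idempotent version follows; the function name add_sock_override and its optional arguments are hypothetical, and as in the post, the nodes still need to be restarted for the change to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the TMM driver override from the post only once.
#   $1 = vendor:device ID (default 15ad:07b0, VMware VMXNET3)
#   $2 = target file (default /config/tmm_init.tcl), so it can be tried on a scratch file
add_sock_override() {
    local id="${1:-15ad:07b0}"
    local file="${2:-/config/tmm_init.tcl}"
    local line="device driver vendor_dev ${id} sock"
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
}
```

    Running add_sock_override twice leaves a single override line in the file, whereas running the original echo twice would leave two.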

    After restarting both nodes, I saw the desired "In Sync" status.

    Interestingly, I hit this issue on two separate computers running the same VMware Workstation version. I also reinstalled three different BIG-IP versions and always got the same result. Another odd thing: if I created a Sync-Only group instead of Sync-Failover, there were no issues at all. I suspect some kind of compatibility issue.