Perry_71428
Feb 06, 2012

Config Sync Problem - LTM v11.1.0 Hotfix 1953.0

Hi

I have raised an F5 support case for this, which is still being worked on, but has anyone else experienced issues with config sync in LTM v11.1.0 Hotfix 1953.0 after adding an SSL certificate to the active unit of an active/passive pair?

I have 20 SSL certificates loaded, plus all the required VIPs, pools, nodes, iRules, etc., and config sync works fine after changes to anything except adding another SSL certificate.

However, adding that extra certificate (I have tried certificates from different CAs, and the same error occurs) causes the config sync to fail and blows away the config sync interface:

> Feb 3 12:31:07 LB01 notice logger: /usr/bin/tmipsecd --tmmcount 2 ==> /usr/bin/bigstart restart racoon
> Feb 3 12:34:54 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 12.
> Feb 3 12:41:35 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 12.
> Feb 3 12:45:55 LB01 info bcm56xxd[29185]: 012c0015:6: Link: 1.4 is DOWN
>
> Feb 3 12:49:24 LB01 notice mcpd[29254]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected
> Feb 3 12:51:29 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 23.
> Feb 3 12:54:34 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 23.
> Feb 3 12:55:26 LB01 info bcm56xxd[29185]: 012c0015:6: Link: 1.4 is DOWN
>
> Feb 3 13:00:41 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 23.
> Feb 3 13:08:11 LB01 err mcpd[29254]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io /var/named/ rsync://192.168.0.2/var_name' failed. The command exited with status 23.
> Feb 3 13:08:35 LB01 info bcm56xxd[29185]: 012c0015:6: Link: 1.4 is DOWN
> Feb 3 13:08:38 tmm1 err tmm1[31736]: 01340002:3: HA Connection with peer 192.168.0.2:1028 lost.
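
The two rsync exit statuses in these logs have documented meanings in the EXIT VALUES section of the rsync(1) man page: 12 is "Error in rsync protocol data stream" and 23 is "Partial transfer due to error". As a rough aid when reading a longer log, here is a minimal shell sketch that counts each status in the mcpd "Background command ... failed" lines and prints its meaning. It assumes the log format shown above and defaults to /var/log/ltm; the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: count the rsync exit statuses that mcpd logs for failed config sync
# file transfers and print what each one means.
# Assumptions: log lines look like the mcpd lines above, and the log file
# defaults to /var/log/ltm. Function names are hypothetical.

# Meanings taken from the EXIT VALUES section of rsync(1); only the two codes
# seen in this thread are spelled out.
rsync_meaning() {
    case "$1" in
        12) echo "error in rsync protocol data stream" ;;
        23) echo "partial transfer due to error" ;;
        *)  echo "see EXIT VALUES in rsync(1)" ;;
    esac
}

summarise_rsync_failures() {
    log=${1:-/var/log/ltm}
    # Keep only the "exited with status N" fragment, take N (4th field),
    # then count how often each status occurs.
    grep -o 'exited with status [0-9]*' "$log" |
        awk '{ print $4 }' | sort -n | uniq -c |
        while read -r count status; do
            echo "status $status, seen $count time(s): $(rsync_meaning "$status")"
        done
}
```

Fed only the excerpt above, summarise_rsync_failures would report status 12 twice and status 23 four times.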

Changing the config sync interface to another self IP makes no difference (in fact, it makes the problem harder to recover from when the HA connection between the units is not used).

Another clue: under some circumstances, after the failed config sync, the config fails to load at all, giving a syntax error when it finds the word "ALL" in a data group string value.

After removing the additional certificate, and some more messing around with reboots, the units sync properly again. The "ALL" in the data group string value is then no longer an issue (so I think it is a symptom of the problem rather than the cause).

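Because the load failure pointed at the word "ALL" in a data group value, it can help to see every place such a word appears in the saved config. Here is a minimal shell sketch built on grep -nw (whole-word match with line numbers). The default file /config/bigip.conf and the word ALL are assumptions for illustration; the script knows nothing about the real list of tmsh reserved words.

```shell
#!/bin/sh
# Sketch: list every line of a config file on which a given word appears as
# a whole word (so ALL matches, ALLOWED does not), with line numbers.
# Assumptions: /config/bigip.conf as the default file and ALL as the default
# word; neither comes from an official list of reserved words.

find_reserved_word() {
    word=${1:-ALL}
    conf=${2:-/config/bigip.conf}
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    # -n: prefix each match with its line number; -w: whole-word matches only.
    # Exit status is grep's: 0 if found, 1 if not.
    grep -nw -- "$word" "$conf"
}
```

Running find_reserved_word ALL on a unit prints each matching line with its line number, which can then be compared with the line reported in the load error.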

It seems to me that either some kind of config corruption is happening when the SSL certificate is imported, or I am hitting a bug in the config parser/loader.

Has anyone else seen this, or anything similar, that might help with a fix or workaround?

Thanks

Perry

  • I understand you are talking about c1040720. I did a little testing, and yes, I agree with you that it could be a bug.

    These are my logs:

    [root@LB01:Standby] config # cat /var/log/ltm
    Feb  6 14:15:26 LB01 err mcpd[4679]: 01071392:3: Background command '/usr/bin/rsync --rsync-path=/usr/bin/rsync -at --blocking-io  /var/named/  rsync://192.168.0.2/var_name' failed. The command exited with status 23.
    
    [root@LB02:Active] log # cat /var/log/ltm
    Feb  6 14:15:05 LB02 err mcpd[4811]: 0107134b:3: (Child rsync being terminated due to timeout. Total size in Kb: 20 timeout in secs: 10 start-time: Mon Feb  6 14:14:53 2012  max-end-time: Mon Feb  6 14:14:53 2012  time now: Mon Feb  6 14:14:53 2012 ) errno(0) errstr().
    Feb  6 14:15:06 LB02 notice mcpd[4811]: 01071038:5: Unit key read from the hardware.
    Feb  6 14:15:06 LB02 err mcpd[4811]: 01070712:3: Caught configuration exception (0), Failed to sync files. - sys/validation/FileObject.cpp, line 5565.
    Feb  6 14:15:06 LB02 err mcpd[4811]: 01071488:3: Remote transaction for device group /Common/XXXXX-DeviceGroup to commit id 6 5706025887953349833 /Common/LB01.xxxxx.xxx failed with error Failed to sync files. - sys/validation/FileObject.cpp, line 5565.
    
  • Hi nitass

    Thanks for the feedback. My support agent is raising this to Severity 3, although I am asking whether it can be moved to Severity 2. I am keen to get a resolution, as this is preventing me from deploying new business-critical VIPs to our F5s.

    If it is an LTM bug, what are the typical turnaround times for a hotfix release (I understand that it would have to go through your own testing process)?

    Regards

    Perry
  • I tried converting PKCS12 to PEM, but it did not help. Anyway, after deleting the problem certificate/key, config sync worked just fine.

    I do not have any other ideas yet. I guess the log messages on LB02 may be more useful than the one on LB01.

    The support engineer will work with Engineering Service to confirm whether or not it is a bug and what the workaround/hotfix is.

    By the way, Jaspreet is one of the best engineers we have.

    Cheers!
  • This has been confirmed as a bug in LTM 11.1 (bug ID 378505), but fortunately there is a workaround.

    Add the two /etc/hosts entries on the Standby unit:

    tmsh modify sys global-settings remote-host add { yourhostname1 {addr yourprivateselfipforhost1 hostname yourhostname1}}

    tmsh modify sys global-settings remote-host add { yourhostname2 {addr yourprivateselfipforhost2 hostname yourhostname2}}

    Push this minor config update to the Active unit:

    tmsh run cm config-sync to-group yourdevicegroup

    Save the config on both the Active and Standby units:

    tmsh save sys config partitions all

    I can confirm that this workaround worked for me.
  • I faced a similar issue with "exception (0), Failed to sync files. - sys/validation/FileObject.cpp, line 5565."

    The problem was that the time was not in sync on the two devices. Once the time was synchronized, I was able to sync the devices.
  • @Bakir_abdel, glad it worked for you.

    I'm now on LTM v11.1.0 Hotfix 2027.0 (HF2), as there were some further memory management issues in 1953 that were affecting us.

    From F5 support I understand that LTM v11.2 is imminent, which I am again keen to load, as it contains a fix for https://support.f5.com/kb/en-us/solutions/public/13000/400/sol13493.html. That issue is also causing us problems, and it is why I have temporarily disabled web acceleration (RAM cache) on our F5s.

    I would really recommend that you take a qkview and upload it to iHealth, as it scans your config for other errors; that is how the above issue was spotted for me.

    Perry
  • Hi Daniel

    Glad that worked the second time around. I was going to suggest that you check your config on the line that was failing for use of reserved words: we have had cases where using a reserved word, even as a data group value, was enough under certain circumstances to cause the config to fail to load.

    Regards

    Perry
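
The two fixes reported in this thread (the remote-host workaround for bug ID 378505, and making sure both units agree on the time) can be wrapped in a small shell script. This is only a sketch: the host names, self IPs and device group name are hypothetical placeholders, the 5-second skew limit is an arbitrary choice rather than an F5 figure, and with the default DRY_RUN=1 it just prints the tmsh commands from the workaround instead of running them.

```shell
#!/bin/sh
# Sketch of the bug 378505 workaround steps from this thread plus a clock
# check. All host names, addresses and the device group are HYPOTHETICAL.
# Intended to be run on the Standby unit. DRY_RUN=1 (default) only prints.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Succeed if two epoch timestamps differ by at most $3 seconds (default 5,
# an arbitrary limit chosen for this sketch).
clock_skew_ok() {
    skew=$(( $1 - $2 ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( -skew ))
    fi
    [ "$skew" -le "${3:-5}" ]
}

# Usage: apply_workaround host1 selfip1 host2 selfip2 device-group
apply_workaround() {
    host1=$1; ip1=$2; host2=$3; ip2=$4; group=$5
    # 1. Add a remote-host (/etc/hosts) entry for each unit.
    run tmsh modify sys global-settings remote-host add { "$host1" { addr "$ip1" hostname "$host1" } }
    run tmsh modify sys global-settings remote-host add { "$host2" { addr "$ip2" hostname "$host2" } }
    # 2. Push the change to the rest of the device group (the Active unit).
    run tmsh run cm config-sync to-group "$group"
    # 3. Save the config; this also needs doing on the Active unit.
    run tmsh save sys config partitions all
}
```

For example, apply_workaround lb01 10.0.0.1 lb02 10.0.0.2 my-device-group prints the four commands for review; set DRY_RUN=0 to run them. Assuming SSH access to the peer, clock_skew_ok "$(date +%s)" "$(ssh root@192.168.0.2 date +%s)" succeeds when the two clocks are within the limit (192.168.0.2 being the peer address from the logs above).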