Forum Discussion

Sathish_Kumar_S
Jul 02, 2015

NTP Sync problem after BIG IP LTM system restarted

Hi,

 

We upgraded our LTM system to 11.5 in Feb 2015 and restarted it. We did not notice at the time that NTP sync had got stuck. During our recent preventive maintenance check for the "Leap Second" we noticed that NTP sync is not working and that there is a 4.35 sec delay as well.

 

[root@slb101:Active:In Sync] log date
Thu Jul 2 16:57:57 MMT 2015

 

[root@slb101:Active:In Sync] config ntpq -p

 

     remote           refid      st t when poll reach   delay   offset  jitter
*172.16.2.237    0.0.0.0          1 u 145d   64    0    2.885    6.386   0.000
+172.16.2.238    0.0.0.0          1 u 145d   64    0    2.633    6.069   0.000

 

The "when" column is the time since the last response to a poll was received (in seconds unless a unit suffix is shown). Here it reads 145d, i.e. 145 days, so the system has effectively been running without NTP since the restart.
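
As a rough cross-check of the 145d value against the Feb 2015 restart, here is a minimal Python sketch. The helper name `when_to_timedelta` is made up for this illustration, the m/h/d suffix handling reflects how ntpq usually abbreviates the "when" column, and the time zone of the `date` output is ignored.

```python
from datetime import datetime, timedelta

# Unit suffixes ntpq uses in the "when" column; a bare number means seconds.
UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def when_to_timedelta(when: str) -> timedelta:
    """Convert an ntpq 'when' value such as '51' or '145d' to a timedelta."""
    if when[-1] in UNITS:
        return timedelta(**{UNITS[when[-1]]: int(when[:-1])})
    return timedelta(seconds=int(when))

# Local timestamp taken from the `date` output in the post (time zone ignored).
checked_at = datetime(2015, 7, 2, 16, 57, 57)
last_response = checked_at - when_to_timedelta("145d")
print(last_response.date())  # 2015-02-07
```

That date falls in Feb 2015, which fits the assumption that NTP stopped answering right after the upgrade and restart.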

 

Ping and traceroute towards the NTP servers are also failing.

 

[root@slb101:Active:In Sync] config ping 172.16.2.237
PING 172.16.2.237 (172.16.2.237) 56(84) bytes of data.
From 10.32.22.172 icmp_seq=1 Destination Net Unreachable
From 10.32.22.172 icmp_seq=2 Destination Net Unreachable
From 10.32.22.172 icmp_seq=3 Destination Net Unreachable
--- 172.16.2.237 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2001ms

 

[root@slb101:Active:In Sync] config ping 172.16.3.238
PING 172.16.3.238 (172.16.3.238) 56(84) bytes of data.
From 10.32.22.172 icmp_seq=1 Destination Net Unreachable
From 10.32.22.172 icmp_seq=2 Destination Net Unreachable
From 10.32.22.172 icmp_seq=3 Destination Net Unreachable
--- 172.16.3.238 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2001ms
[root@slb101:Active:In Sync] config

 

[root@slb101:Active:In Sync] config traceroute 172.16.2.237
traceroute to 172.16.2.237 (172.16.2.237), 30 hops max, 40 byte packets
 1  (10.32.22.172)  1.613 ms !N  1.839 ms !N  2.170 ms !N
[root@slb101:Active:In Sync] config

 

[root@slb101:Active:In Sync] log tmsh list /sys management-route
sys management-route default {
    gateway 172.20.129.174
    network default
}
TCP socket open failed: Connection refused
Socket error connecting to 127.0.0.1:6889
[root@slb101:Active:In Sync] log

 

[root@slb101:Active:In Sync] log tmsh list /sys management-ip
sys management-ip 172.20.129.161/28 { }
TCP socket open failed: Connection refused
Socket error connecting to 127.0.0.1:6889
[root@slb101:Active:In Sync] log

 

I went through SOL7017 for this issue. It mentions restarting the ntpd process as a workaround, so I did.

 

[root@slb101:Active:In Sync] log bigstart restart ntpd
Shutting down ntpd: [ OK ]
Starting ntpd: [ OK ]

 

[root@slb101:Active:In Sync] log
[root@slb101:Active:In Sync] log ntpq -p

 

     remote           refid      st t when poll reach   delay   offset  jitter
 172.16.2.237    .INIT.          16 u    -   64    0    0.000    0.000   0.000
 172.16.3.238    .INIT.          16 u    -   64    0    0.000    0.000   0.000
[root@slb101:Active:In Sync] log

 

After the restart, the NTP status shows no prefix such as * or + in front of either IP, which indicates that neither the primary nor the secondary NTP server is reachable.

 

The "when" field shows "-", which means the system has not received any response from the NTP servers.
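
The reading of the prefix and the "when" field described above can be sketched in Python as follows. This is only an illustration: `peer_status` is a hypothetical helper, it looks at just the `*`/`+` prefixes and the "when" column, and it assumes the ten-column layout shown in the listings in this post. Note that in the first listing the prefixes were still displayed although "when" had grown to 145d, so the columns are best read together.

```python
def peer_status(line: str) -> str:
    """Classify one peer line from `ntpq -p` using the prefix and 'when' only."""
    fields = line.split()
    remote, when = fields[0], fields[4]  # columns: remote refid st t when ...
    if remote[0] in "*+":
        role = "sync source" if remote[0] == "*" else "candidate"
        return f"{remote[1:]}: {role}, when={when}"
    if when == "-":
        return f"{remote}: no response received yet"
    return f"{remote}: responding but not selected"

# Peer lines copied from the output after `bigstart restart ntpd`.
for line in (
    "172.16.2.237 .INIT. 16 u - 64 0 0.000 0.000 0.000",
    "172.16.3.238 .INIT. 16 u - 64 0 0.000 0.000 0.000",
):
    print(peer_status(line))
```

Both lines come back as "no response received yet", which matches the conclusion that neither server is answering.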

 

What should I do to fix this issue now?

 

Thanks in advance,
Sathish

 

  • It seems that after the restart the system uses another VLAN (my internal LAN) rather than mgmt (eth0).

     

    [root@slb101:Active:In Sync] log netstat -rn | tail -12

     

    0.0.0.0         10.32.22.174    0.0.0.0         UG        0 0          0 internal
    0.0.0.0         172.20.129.174  0.0.0.0         UG        0 0          0 eth0
    172.20.129.160  0.0.0.0         255.255.255.240 U         0 0          0 eth0
    172.20.129.160  0.0.0.0         255.255.255.240 U         0 0          0 eth0
    [root@slb101:Active:In Sync] log

     

    As you can see, after the reboot the system is trying to reach the NTP servers with the gateway "10.32.22.172", but it is supposed to use 172.20.129.174. So we have temporarily added static routes as a fix for now.

     

    e.g.:
    route add -net 172.16.2.237 netmask 255.255.255.255 gw 172.20.129.174
    route add -net 172.16.3.238 netmask 255.255.255.255 gw 172.20.129.174
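
Why the host routes help can be illustrated with a simplified longest-prefix lookup in Python. This is a sketch under assumptions, not how the BIG-IP or the Linux kernel actually decides: `pick_route` is a hypothetical helper, ties between equal prefixes are broken by table order, and the `eth0` interface on the host routes is inferred from the gateway sitting in the connected 172.20.129.160/28 network.

```python
import ipaddress

def pick_route(routes, destination):
    """Return the (network, gateway, interface) entry with the longest matching prefix.

    Ties between equal prefix lengths go to the earlier entry (an assumption).
    """
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

# The two default routes listed by `netstat -rn` in the post.
routes = [
    ("0.0.0.0/0", "10.32.22.174", "internal"),
    ("0.0.0.0/0", "172.20.129.174", "eth0"),
]
print(pick_route(routes, "172.16.2.237"))
# ('0.0.0.0/0', '10.32.22.174', 'internal')

# Host routes equivalent to the two `route add` commands above.
routes += [
    ("172.16.2.237/32", "172.20.129.174", "eth0"),
    ("172.16.3.238/32", "172.20.129.174", "eth0"),
]
print(pick_route(routes, "172.16.2.237"))
# ('172.16.2.237/32', '172.20.129.174', 'eth0')
```

A /32 entry is more specific than either default route, so NTP traffic follows it regardless of which default route would otherwise be picked.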

     

    And now it is syncing properly with the NTP servers.

     

    [root@slb101:Active:In Sync] log ntpq -p

     

         remote           refid      st t when poll reach   delay   offset  jitter
    *172.16.2.237    0.0.0.0          1 u   51   64  377    2.887    6.614   2.086
    +172.16.3.238    0.0.0.0          1 u   39   64  377   10.444    5.905   2.102
    [root@slb101:Active:In Sync] log
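
The "reach" column is what changed from 0 to 377 between the listings. It is an 8-bit shift register printed in octal, one bit per poll, with the most recent poll in the lowest bit. A minimal Python sketch to decode it (`decode_reach` is a name made up for this example):

```python
def decode_reach(reach: str) -> list[bool]:
    """Expand ntpq's octal 'reach' value into 8 flags, most recent poll last."""
    bits = int(reach, 8)
    return [bool((bits >> i) & 1) for i in range(7, -1, -1)]

print(sum(decode_reach("377")))  # 8 -> all of the last eight polls were answered
print(sum(decode_reach("0")))    # 0 -> none of them were answered
```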

     

    Still, I know the system was working fine without these static routes initially. I don't know why it picked a different gateway after the reboot.

    Anyway, the issue is fixed.

     

    Thanks Sathish

     

  • You said, "Seems like after restart it uses other VLAN (say my internal LAN) rather than mgmt (eth0)." In our shop we have always used a management route for NTP. If there is none, it will use the TMM default gateway on your ingress/egress interface, which may or may not have port lockdown enabled.

     

  • So in this case, why does the system use the internal LAN rather than the management LAN after the reboot (before the reboot it used the management LAN properly)?

    Any idea?