F5 LTM and ASM Sentinel integration - works on one cluster, doesn't on another
Hi!
I have a fairly complex problem with Sentinel integration.
I have 2 F5 clusters running as IaaS in Azure, Prod and PreProd, with LTM logging set up following this manual: https://my.f5.com/manage/s/article/K85539421 and ASM integration set up following this manual: https://community.f5.com/t5/technical-articles/integrating-the-f5-bigip-with-azure-sentinel/ta-p/282868 (only the ASM part of it).
The thing is PreProd F5 Cluster sends the logs correctly while Prod does not.
The configuration is very similar on both clusters (MGT, external and internal interfaces, with HA running over the internal interface).
The integration has been reimplemented multiple times on the Prod cluster, including a four-eyes check, taking care to keep the config the same as on the working PreProd cluster.
I have checked and rechecked the FW rules and NSGs again and again; everything should work.
PreProd is working, Prod is not...
Recently I started looking through the logs and found thousands of entries like this one on the Prod F5:
Fri, 23 Jun 2023 13:34:35 GMT - warning: [telemetry] Skipped Data - Category: "LTM" | Consumers: ["My_Consumer"] | Addtl Info: "event_timestamp": "2023-06-23T13:34:35.000Z"
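To get a feel for how many of these warnings there are, and whether it is only the LTM category being skipped, the log can be tallied with a few lines of Python. This is only a rough sketch: the regex is modelled on the single line quoted above, and the function name count_skipped is made up for illustration. Feed it lines from wherever the telemetry log lives on the box.

```python
import re
from collections import Counter

# Pattern modelled on the one warning line quoted in this post (assumption:
# other Telemetry Streaming versions may format the message differently).
SKIPPED = re.compile(
    r'Skipped Data - Category: "(?P<category>[^"]+)" \| '
    r'Consumers: \[(?P<consumers>[^\]]*)\]'
)

def count_skipped(lines):
    """Count 'Skipped Data' warnings per (category, consumer) pair."""
    counts = Counter()
    for line in lines:
        match = SKIPPED.search(line)
        if not match:
            continue  # not a Skipped Data warning
        # The consumer list is a JSON-like array of quoted names.
        for consumer in re.findall(r'"([^"]+)"', match.group("consumers")):
            counts[(match.group("category"), consumer)] += 1
    return counts

sample = [
    'Fri, 23 Jun 2023 13:34:35 GMT - warning: [telemetry] Skipped Data - '
    'Category: "LTM" | Consumers: ["My_Consumer"] | '
    'Addtl Info: "event_timestamp": "2023-06-23T13:34:35.000Z"',
]
print(count_skipped(sample))
```

Running the same count against the PreProd log would show whether the working cluster skips anything at all.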
At that point I remembered that we have had a problem with NTP: it was not working, and after some troubleshooting we moved it to the backlog (probably for too long).
So, NTP can't sync to time.windows.com, either by hostname or by IP (other time servers do not work either).
I started troubleshooting this thread.
The NTP service seems to be running correctly (I restarted it anyway, with no change):
# tmsh show /sys service ntpd
* ntpd.service - start and stop ntpd
Loaded: loaded (/etc/rc.d/init.d/ntpd; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-06-26 16:53:06 CEST; 17s ago
Process: 25697 ExecStop=/etc/rc.d/init.d/ntpd stop (code=exited, status=0/SUCCESS)
Process: 25762 ExecStart=/etc/rc.d/init.d/ntpd start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/ntpd.service
`-25766 ntpd -g
What is a bit strange: NTP seems to listen only on IPv6 addresses (?)
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 18 mgmt fe80::222:48ff:fe80:cdf4 UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 19 eth0 fe80::222:48ff:fe80:cdf4 UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 20 tmm fc00:f5::1 UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 21 eth4 fe80::6245:bdff:fe8e:24ab UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 22 eth1 fe80::222:48ff:fe80:abc4 UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 23 external fe80::222:48ff:fe80:abc4 UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listen normally on 24 dev_internal fe80::6245:bdff:fe8e:24ab UDP 123
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: Listening on routing socket on fd #41 for interface updates
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: 0.0.0.0 c016 06 restart
Jun 26 16:53:06 bigip-f5-bigip1.local ntpd[25766]: 0.0.0.0 c012 02 freq_set kernel -10.616 PPM
Every NTP server it tries to sync with stays in INIT status:
# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 20.101.57.9     .INIT.          16 u    -   64    0    0.000    0.000   0.000
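For reference, the two columns that matter here are st and reach. Stratum 16 means the peer is treated as unsynchronized, and reach is an octal shift register of the last eight polls, so 0 means not a single reply came back. A small sketch that flags such peers from pasted ntpq -np output (the function name unreachable_peers is made up, and it assumes the standard ten-column layout shown above):

```python
def unreachable_peers(ntpq_output):
    """Return (remote, refid, stratum) for peers whose reach register is 0."""
    peers = []
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Skip the header, the ===== separator and blank lines.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        remote = fields[0]
        # With -n the address is numeric, so a leading tally character
        # (*, +, -, #, x, o, .) can be dropped safely.
        if remote[0] in "*+-#xo.":
            remote = remote[1:]
        reach = int(fields[6], 8)  # reach is printed in octal
        if reach == 0:
            peers.append((remote, fields[1], int(fields[2])))
    return peers

sample = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
20.101.57.9 .INIT. 16 u - 64 0 0.000 0.000 0.000
"""
print(unreachable_peers(sample))
```

A peer that stays at reach 0 never got an answer, which fits with the requests leaving through the wrong interface rather than with a problem in ntpd itself.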
I set the time manually on all F5s, but that didn't solve the problem with Sentinel: PreProd works, Prod doesn't.
tcpdump shows that my F5s try to reach the NTP server via the TMM external interface, which is wrong:
# tcpdump -i any host 20.101.57.9 and port 123 -vv
18:04:16.677266 IP (tos 0xc0, ttl 64, id 54613, offset 0, flags [DF], proto UDP (17), length 76)
10.10.1.4.123 > 20.101.57.9.123: [bad udp cksum 0x64fd -> 0x380e!] NTPv4, length 48
Client, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 10 (1024s), precision 32
Root Delay: 0.000000, Root dispersion: 0.054641, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 965835954.533013659 (2066/09/16 00:14:10)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 965835954.533013659 (2066/09/16 00:14:10) out slot1/tmm0 lis= port=1.1 trunk=
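A side note on the 2066 date in the transmit timestamp, since it looks alarming next to a syslog clock that says June 2023. NTP timestamps count seconds from 1900-01-01 and wrap every 2^32 seconds (one "era"). The sketch below only shows how the printed date follows from the raw number 965835954, assuming tcpdump reads it as an era 1 value and prints it in local time (CEST here). That is my reading of the output, not something confirmed anywhere in this thread, and it does not explain why ntpd sends this value.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
ERA_SECONDS = 2 ** 32  # length of one NTP era in seconds

def ntp_seconds_to_utc(seconds, era=0):
    """Convert the integer seconds part of an NTP timestamp to UTC."""
    return NTP_EPOCH + timedelta(seconds=seconds + era * ERA_SECONDS)

transmit = 965835954  # seconds field from the tcpdump output above

# Era 0 (1900 to 2036) puts the value in 1930.
print(ntp_seconds_to_utc(transmit, era=0))
# Era 1 (assumption: what tcpdump uses here) gives 2066-09-15 22:14:10 UTC,
# which is 2066-09-16 00:14:10 in CEST.
print(ntp_seconds_to_utc(transmit, era=1))
```

So the odd date is at least internally consistent with the raw value; whether it says anything about the actual system clock is a separate question.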
According to https://my.f5.com/manage/s/article/K92145845 it should use the MGT interface, and the cause given there is a missing MGT route. That is not my case, though, as I do have the MGT routes set correctly (I guess).
Below, one route goes towards the Azure service endpoint and the second is the default route for MGT:
# tmsh list /sys management-route
sys management-route azure-metadata {
gateway 10.0.0.1
network 169.254.169.254/32
}
sys management-route default {
gateway 10.0.0.1
network default
}
At this point I'm confused about what the problem might be and whether it is really NTP related.
Any ideas?