Forum Discussion
Pool member down due to no Ping response
Hello,
Below is the monitor log for a pool member that is marked down. The pool member is another Vserver, which is active and responds successfully when I ping it directly from the command line.
The log shows the ping failing. I'm trying to understand why the monitor log reports a failed ping when a direct ping from the command line succeeds.
[0][23691] 2018-09-24 17:20:49.032242: ID 2278 :(_do_ping): time to ping, now=[1537827649.032066], status=DOWN [ addr=::ffff:ip:port mon=/Common/abcd-https-monitor fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827649.028548][2018-09-24 17:20:49] last_ping=[1537827644.062996][2018-09-24 17:20:44] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=6 rcv_cnt=0 ]
- rluhrman_127985
Historic F5 Account
The log is misleading. "Ping" here really means the monitor's request. Even for an HTTP monitor, the log doesn't say "Request Sent"; it says (_do_ping), etc.
Can you post the output of "tmsh list ltm monitor https (monitor name)" for the monitor that is failing?
Also try using telnet or socat to connect to the IP:PORT combination instead of ICMP. ICMP only tests up to layer 3, while your monitor tests all the way up to layer 7.
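To go one step past telnet, here is a minimal Python sketch of what an HTTPS monitor does at layer 7: open TCP, complete a TLS handshake, send one request, and read the reply. The function name probe_https is hypothetical, the default request mirrors the send string in the monitor configuration posted later in this thread, and skipping server certificate validation is an assumption made for troubleshooting. It is not how bigd itself is implemented.

```python
import socket
import ssl
from typing import Optional


def probe_https(host: str, port: int,
                send: bytes = b"HEAD / HTTP/1.0\r\n\r\n",
                timeout: float = 16.0,
                certfile: Optional[str] = None,
                keyfile: Optional[str] = None) -> Optional[str]:
    """Open TCP, complete a TLS handshake, send one request, and return the
    first line of the reply (e.g. the HTTP status line), or None on failure."""
    ctx = ssl.create_default_context()
    # Assumption: do not validate the server certificate; this is a
    # reachability test, not a security check.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if certfile:
        # Optionally present a client certificate, like the monitor's cert/key settings.
        ctx.load_cert_chain(certfile, keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(send)
                reply = tls.recv(4096)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        print(f"probe to {host}:{port} failed: {exc!r}")
        return None
    if not reply:
        return None  # peer closed the connection without sending data
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Call it as probe_https("<pool member IP>", <port>). If it only succeeds when certfile and keyfile point at the same client certificate and key the monitor uses, that would suggest the server expects a client certificate. Keep in mind that a test from a workstation uses a different source address than bigd does.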
- abhy201
Nimbostratus
Thank you for checking. Below is the output of the list ltm monitor command.
ltm monitor https monitor_name-https-monitor {
    adaptive disabled
    cert /Common/client_cert-co.crt
    cipherlist DEFAULT:+SHA:+3DES:+kEDH
    compatibility enabled
    defaults-from https
    destination :
    interval 5
    ip-dscp 0
    key /Common/client_cert-co.key
    recv "HTTP/1.(0|1) (200|301|302|404)"
    recv-disable none
    send "HEAD / HTTP/1.0\r\n\r\n"
    time-until-up 0
    timeout 16
}
I'm also adding a more detailed excerpt of the monitor log.
[1][23692] 2018-09-24 17:20:47.467516: ID 2278 :(inst_to_service) Logging enabled. [ addr=::ffff:IP_abcd:Port srcaddr=none ]
[0][23691] 2018-09-24 17:20:47.467517: ID 2278 :(inst_to_service) Logging enabled. [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.032073: ID 2278 :(_ssl_shutdown_service): shutting down, return ssl true [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=14 ]
[0][23691] 2018-09-24 17:20:49.032242: ID 2278 :(_do_ping): time to ping, now=[1537827649.032066][2018-09-24 17:20:49], status=DOWN [ addr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827649.028548][2018-09-24 17:20:49] last_ping=[1537827644.062996][2018-09-24 17:20:44] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=6 rcv_cnt=0 ]
[0][23691] 2018-09-24 17:20:49.032267: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=none ]
[0][23691] 2018-09-24 17:20:49.032273: ID 2278 :(_connect_to_service): creating new socket (rd0) [ addr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.032298: ID 2278 :(_connect_to_service): connect: Operation now in progress [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.032326: ID 2278 :(_do_ping): post ping, status=DOWN [ addr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor fd=15 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1537827654.028548][2018-09-24 17:20:54] last_ping=[1537827649.032066][2018-09-24 17:20:49] deadline=[1537827650.078785][2018-09-24 17:20:50] on_service_list=True snd_cnt=7 rcv_cnt=0 ]
[0][23691] 2018-09-24 17:20:49.036759: ID 2278 :(_main_loop): Activity on pending service, now=[1537827649.036433][2018-09-24 17:20:49] [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port fd=15 pend=1 conn=1 ]
[0][23691] 2018-09-24 17:20:49.036772: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.036787: ID 2278 :(_send_active_service_ping): writing [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ] send=HEAD / HTTP/1.0\x0d\x0a\x0d\x0a
[0][23691] 2018-09-24 17:20:49.036796: ID 2278 :(do_ssl_write): incoming state: 0 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.036817: ID 2278 :(do_ssl_write) state: INIT [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.036825: ID 2278 :(initialize_ssl) legacy: false, cipher: 'DEFAULT:+SHA:+3DES:+kEDH', compat: true [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.037040: ID 2278 :(do_ssl_write) state: CONNECTING, legacy mode: false [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.037076: ID 2278 :(do_ssl_write): state: 4 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.040714: ID 2278 :(_main_loop): Service ready for read, now=[1537827649.040704][2018-09-24 17:20:49] [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port fd=15 pend=0 conn=0 ]
[0][23691] 2018-09-24 17:20:49.040727: ID 2278 :(_recv_active_service_ping): reading [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.040734: ID 2278 :(do_ssl_read) legacy mode: false [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.040741: ID 2278 :(do_ssl_read): state: 4 [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.040748: ID 2278 :(_send_active_service_ping): pinging [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ]
[0][23691] 2018-09-24 17:20:49.040756: ID 2278 :(_send_active_service_ping): writing [ addr=::ffff:IP_abcd:Port srcaddr=::ffff:IP_abcd:Port ] send=HEAD / HTTP/1.0\x0d\x0a\x0d\x0a
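To pull the useful fields out of bigd debug entries like the ones above, here is a minimal Python sketch. The regular expressions are assumptions inferred only from the entries in this thread, not from any documented bigd log format, and they deliberately skip bracketed values such as next_ping=[...]. The names parse_entry, FUNC_RE and FIELD_RE are hypothetical. Reading snd_cnt and rcv_cnt as "probes sent" versus "responses received" is likewise an interpretation.

```python
import re
from typing import Dict

# Assumed layout, inferred only from the entries in this thread:
#   [n][pid] <date> <time>: ID <n> :(<function>)<message> [ key=value ... ]
FUNC_RE = re.compile(r":\((?P<func>\w+)\)")
# Simple key=value pairs; values that start with "[" (timestamps) are skipped.
FIELD_RE = re.compile(r"\b(?P<key>[a-z_]+)=(?P<value>[^\s\[\],]+)")


def parse_entry(line: str) -> Dict[str, str]:
    """Return the bigd function name and the simple key=value fields of one entry."""
    fields: Dict[str, str] = {}
    func = FUNC_RE.search(line)
    if func:
        fields["func"] = func.group("func")
    for match in FIELD_RE.finditer(line):
        # Keep the first occurrence if a key repeats within the entry.
        fields.setdefault(match.group("key"), match.group("value"))
    return fields


# Abridged copy of the "post ping" entry from the log above.
sample = (
    "[0][23691] 2018-09-24 17:20:49.032326: ID 2278 :(_do_ping): post ping, "
    "status=DOWN [ addr=::ffff:IP_abcd:Port mon=/Common/monitor_name-https-monitor "
    "fd=15 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 snd_cnt=7 rcv_cnt=0 ]"
)
entry = parse_entry(sample)
print(entry["func"], entry["status"], entry["snd_cnt"], entry["rcv_cnt"])
```

For the sample entry this prints `_do_ping DOWN 7 0`. Running parse_entry over each (_do_ping) line makes it easy to see whether rcv_cnt ever moves while snd_cnt keeps climbing.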
- abhy201
Nimbostratus
Telnet to the pool member also succeeds.
- rluhrman_127985
Historic F5 Account
A tcpdump of the interactions between bigd (the daemon that performs the health-check communication) and the node would help, because the communication can fail at multiple layers of the OSI model.
I noticed that your receive string accepts a 404 as a valid response. A 404 usually indicates that the requested resource is not available, yet the monitor would still mark the member up.
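A quick way to see what the posted recv string accepts is to evaluate it as a regular expression against a few status lines. This Python sketch assumes, as re.search does, that the pattern may match anywhere in the response; bigd's own regex engine and how much of the reply it inspects may differ in details. STRICT_RECV is a hypothetical alternative shown for comparison, not a recommendation from this thread.

```python
import re

# recv string from the monitor configuration posted in this thread.
RECV = r"HTTP/1.(0|1) (200|301|302|404)"
# Hypothetical stricter variant: escaped dot and no 404.
STRICT_RECV = r"HTTP/1\.(0|1) (200|301|302)"


def recv_matches(response: str, pattern: str) -> bool:
    """True if pattern matches anywhere in the response (unanchored search)."""
    return re.search(pattern, response) is not None


for status_line in ("HTTP/1.1 200 OK",
                    "HTTP/1.1 404 Not Found",
                    "HTTP/1.0 503 Service Unavailable"):
    current = "UP" if recv_matches(status_line, RECV) else "DOWN"
    strict = "UP" if recv_matches(status_line, STRICT_RECV) else "DOWN"
    print(f"{status_line:<35} current={current} strict={strict}")
```

With the posted recv string, the 404 line comes out UP, which is the behavior described above. Note also that the unescaped "." in "HTTP/1." matches any character, which is harmless here but worth knowing when writing recv strings.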
For the tcpdumps, capturing both the traffic between bigd and TMM and the traffic between TMM and the node may help determine what is happening.
For the traffic between tmm and bigd, use "tcpdump -ni :nnnh -s0 -w /var/tmp/$(hostname)_<vlan_name>.pcap host <ip address of node> and port <pool member port>"
For the traffic between tmm and the node, use "tcpdump -ni 0.0:nnn -s0 -w /var/tmp/$(hostname)_tmm.pcap host <ip address of node> and port <pool member port>"
Run those concurrently.
You should open a support ticket so that a Network Support Engineer can analyze the tcpdumps. Be sure to generate a qkview after running the tcpdumps and provide that as well.