application health monitoring
38 Topics

SNMP CPU Usage
I'm curious what the difference between these two numbers is:

    sysGlobalHostCpuUsageRatio (uses a 5-second polling interval)
    sysGlobalHostCpuUsageRatio5s (also uses a 5-second polling interval)

I know what I'm getting with the latter, but I'm not sure what the first one is actually reporting. Thanks.

Solved, 1.3K Views, 0 likes, 4 Comments

Receive String Not Recognized By Custom HTTP Health Check Monitor
Hi,

We're using a custom HTTP monitor for our application that hits a specific URL and checks that the response is correct. The F5 is not recognizing the Receive string and is therefore marking the node as down.

Here is the send string:

    GET /syshc/health HTTP/1.1\r\nUser-Agent:BigIP Prober\r\nHost: \r\nConnection: close\r\n\r\n

Here is the receive string:

    SystemMonitortrue

The URL above returns SystemMonitortrue if the application is working and SystemMonitorfalse if it is not. I have tried HTTP version 1.0, and I have set the host to either an existing host or a dummy host, but no luck. The monitor log file states "Response did not match recv regex yet".

From the F5, I have run these tests to figure out what is wrong, and I get different responses depending on whether I use curl, telnet or netcat:

    F5-LTM>echo -e "GET /syshc/health HTTP/1.1\r\nUser-Agent:BigIP Prober\r\nHost: \r\nConnection: close\r\n\r\n" | nc 8080
    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    X-OneAgent-JS-Injection: true
    Set-Cookie: dtCookie=2$215E59177B4BD85717C0665144C1D564; Path=/
    Content-Type: text/plain;charset=UTF-8
    Content-Length: 40
    Date: Mon, 10 Dec 2018 17:51:59 GMT
    Connection: close

    SystemMonitorprotocol = http host = null

(Note: no "true" or "false".)

If I telnet, I get this:

    F5-LTM>telnet 8080
    Trying ...
    Connected to .
    Escape character is '^]'.
    GET /syshc/health HTTP/1.1
    User-Agent:BigIP Prober
    HOST:
    Connection: Close

    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    X-OneAgent-JS-Injection: true
    Set-Cookie: dtCookie=2$D90B430DD2CA1434D3DD98267CAF0952; Path=/
    Content-Type: text/plain;charset=UTF-8
    Content-Length: 40
    Date: Mon, 10 Dec 2018 18:52:43 GMT
    Connection: close

    SystemMonitorprotocol = http host = null

(Again, no "true" or "false".)

However, if I use curl, I get the correct result:

    F5-LTM>curl :8080/syshc/health
    SystemMonitortrue    <-- correct response!

In several browsers (Chrome, Firefox, Safari and IE) the response is "SystemMonitortrue", which is the correct response.
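One visible difference between the tests above is that the nc and telnet requests, as posted, carry an empty Host header, while curl fills in a Host header on its own. The following is a minimal sketch for replaying the monitor's request with an explicit Host value and applying the receive string locally. The helper names and any host name passed in are hypothetical, and this only approximates what the monitor does; it is not how the BIG-IP implements its check.

```python
import re
import socket


def build_request(path: str, host: str) -> bytes:
    """Build an HTTP/1.1 request shaped like the monitor's send string."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "User-Agent: BigIP Prober\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def matches_recv(response: bytes, recv_pattern: str) -> bool:
    """Return True if the receive pattern occurs anywhere in the raw response."""
    text = response.decode("utf-8", errors="replace")
    return re.search(recv_pattern, text) is not None


def probe(address: str, port: int, path: str, host: str,
          recv_pattern: str, timeout: float = 5.0) -> bool:
    """Send the request over a plain TCP socket and test the full response."""
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection (Connection: close)
                break
            chunks.append(data)
    return matches_recv(b"".join(chunks), recv_pattern)
```

Calling `probe()` with the pool member's address, port 8080, the path `/syshc/health`, and different Host values would show whether the response body changes with the Host header.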
I have scoured the following articles:

    https://support.f5.com/csp/article/K2167
    https://support.f5.com/csp/article/K13397
    https://support.f5.com/csp/article/K5917
    https://support.f5.com/csp/article/K3224

What am I missing here?

999 Views, 0 likes, 3 Comments

Interval and Timeout set to two seconds on HTTP/HTTPS health monitor
Hi all,

Upper management wants us to shorten the time a user has to wait when they're connected to our site and Tomcat goes down on the backend server they're connected to. To get a wait time of no longer than 10 seconds before their web page loads from another server in the pool, we set the interval and timeout values on the health monitor to 2 seconds each. Originally I had them set to 5 and 16, then 3 and 10, but that wasn't giving us a low enough wait time before a webpage comes back.

Testing with 2 and 2 gave us the results we wanted, but I want to make sure there are no "gotchas" on the F5 side. There's no reason why one of our servers wouldn't be able to respond within 2 seconds, and sending that check every 2 seconds isn't a concern as far as network load. The only thing I can think of that might be a concern is the ability of the server to respond to the string below. As far as I know, that string only makes sure HTTP and HTTPS are responsive; it's not asking for a webpage or something that would take more time/resources.

Have any of you had success or issues with setting a monitor interval and timeout to such a low value, and what effects did you see? For reference, we're running version 12.1.0 and the monitor string is below:

    GET / HTTPS/1.1\r\nHost: \r\nConnection: Close\r\n\r\n HTTP/1.1\r\nHost: \r\nConnection: Close\r\n\r\n

798 Views, 0 likes, 3 Comments

monitor timeouts vs actual behaviour
Hi guys,

The default http health monitor (v10.2.4) polls on a 5 second interval with a timeout of 16 seconds. To me, this says that a monitor will fire every 5 seconds, and if no monitor is successful for 16 seconds then the pool member is down. Yet this really REALLY doesn't match what happens on the network:

    pool blah_pool {
       monitor all http
       members 1.2.3.4:1234 {}
    }

A tcpdump shows:

    11:01:36.761159 IP 10.101.131.4.35514 > 1.2.3.4.1234: S
    11:03:13.742647 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:16.742445 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:22.742838 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:34.741285 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:58.740435 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:04:46.736725 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:06:23.738147 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:26.737763 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:32.737102 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:44.735753 IP 10.101.131.4.48428 > 1.2.3.4.1234: S

So there is only ever a single TCP attempt in flight at one time, not one every 5 seconds, and while the monitor will still mark a node down after 16 seconds, the TCP connection keeps trying until the TCP/IP stack times it out. Once the member is down after 16 seconds, there is still a huge wait before the monitor tries again: no new connection is attempted until the single current one finishes. So if, for some (presumably pretty stupid) reason, the specific connection is not being replied to (maybe a weird FW rule or IPS action), LTM won't be able to check status on a new connection for three minutes and 10 seconds. I've also seen equivalent behaviour with an HTTP GET simply not being replied to: the monitor won't fire again until the TCP connection is reset or the webserver finally responds, well after the "timeout" period has expired.
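The gaps between the SYNs on a single source port in the capture above look like ordinary exponential retransmission backoff. Here is a minimal sketch of that arithmetic, assuming an initial retransmission timeout of 3 seconds that doubles on each of 5 retries. Those values are inferred from the capture timestamps rather than taken from LTM documentation, and the function name is made up for illustration.

```python
def syn_retransmit_schedule(initial_rto: int = 3, retries: int = 5):
    """Return (offsets, give_up): seconds after the first SYN at which each
    SYN is sent, and the time at which the connection attempt is abandoned.

    Assumes the timeout doubles after every unanswered SYN.
    """
    offsets = [0]          # the original SYN
    rto = initial_rto
    for _ in range(retries):
        offsets.append(offsets[-1] + rto)  # next retransmission
        rto *= 2                           # exponential backoff
    give_up = offsets[-1] + rto            # last timeout expires, attempt fails
    return offsets, give_up


offsets, give_up = syn_retransmit_schedule()
print(offsets)   # [0, 3, 9, 21, 45, 93]
print(give_up)   # 189
```

Under these assumptions the connection attempt is abandoned about 189 seconds after the first SYN, which is in line with the roughly 3 minutes 10 seconds between the first SYN at 11:03:13 and the next connection at 11:06:23.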
Testing just now, I see the HTTP monitor just crudely stuffing additional GETs down the same connection that's still waiting for a response. What's that all about??

I can't make any sense of this and, TBH, it goes right against all the things I've designed for, such as sticking to the 3n+1 rule. What merit does 3n+1 have in this sort of situation? I see no logic in it at all if additional monitors can't run in parallel. Who would want to be forwarding to a web server that is routinely taking, say, 15 seconds to reply (3n+1 - 1s) when all the other members in a pool take 0.01s to serve the same gif file? Shouldn't a timeout actually always be something like 4 seconds (to at least give time for 2 SYNs to hit the back end)? Even in that case, though, I'm still stuffed until the next connection is allowed to be attempted.

Any thoughts on this would be appreciated!

599 Views, 0 likes, 4 Comments

Issue with external monitor using curl on NTLM site
I need to create a monitor for our SharePoint environment. I first tried the built-in HTTPS monitor, but it gave a 401 error. After some investigation it seems there is an issue if the service is using NTLM, and I was recommended to use an external monitor. With the information I found here I created the following script:

    #!/bin/sh
    # This script expects the following Name/Value pairs:
    #  URI      = the URI to check
    #  USER     = username
    #  PASSWORD = password
    #  RECV     = the expected response (case sensitive)

    # remove IPv6/IPv4 compatibility prefix (LTM passes addresses in IPv6 format)
    IP=`echo ${1} | sed 's/::ffff://'`
    PORT=${2}
    PIDFILE="/var/run/`basename ${0}`.${IP}_${PORT}.pid"

    # kill off the last instance of this monitor if hung and log current pid
    if [ -f $PIDFILE ]
    then
       echo "EAV exceeded runtime needed to kill ${IP}:${PORT} $PIDFILE" | logger -p local0.error
       kill -9 `cat $PIDFILE` > /dev/null 2>&1
    fi
    echo "$$" > $PIDFILE

    # send request and check for expected response
    if [ $PORT -eq 443 ]
    then
       curl -kfNS --ntlm --user ${USER}:${PASSWORD} https://${IP}${URI} | grep "${RECV}" 2>&1 > /dev/null
    else
       curl -kfNS --ntlm --user ${USER}:${PASSWORD} http://${IP}:${PORT}${URI} | grep "${RECV}" 2>&1 > /dev/null
    fi

    # mark node UP if expected response was received
    if [ $? -eq 0 ]
    then
       # Remove the PID file
       rm -f $PIDFILE
       echo "UP"
    else
       # Remove the PID file
       rm -f $PIDFILE
    fi
    exit

We're currently only using 443, so it will only use the first curl command, but I wanted it to be able to handle both, and I had some issues when I tried to use the second command.

The curl command works fine from the F5 CLI, and if I use "run /util test-monitor intranet_sharepoint_monitor_ext address 10.xxx.xxx.xxx port 443" from tmsh it correctly marks the nodes as up or down. If I do the test from the LTM Monitor, I get "No successful responses received before deadline" when I try it on the SharePoint site that uses NTLM; if I try it on another SharePoint site that doesn't use NTLM, it works fine.
Why would it work with the test-monitor command but not with the actual monitor in the GUI? Shouldn't they be the same?! Does anyone have any suggestions on what I could do to solve this? I have checked that the script doesn't contain any Windows characters, and I have checked that the file located in /config/filestore/files_d/Common_d/external_monitor_d/ has the correct permissions.

531 Views, 0 likes, 1 Comment

Multiple port monitoring on LTM
I need help writing an iRule to monitor three different service ports running on a member server. I have a web server that accepts connections on port 5555, but internally this service depends on service port 8024 or 8026. The F5 should monitor all three ports but bring the node down only when both 8024 and 8026 stop responding. The member web server should remain UP as long as either service port 8024 or 8026 is responding. At the same time I want active monitoring on port 5555, which means that if port 5555 stops responding, the member server status should be Down irrespective of the status of ports 8024 and 8026. Port 5555 supports HTTPS and the other two ports HTTP.

Thanks,
Mihir

499 Views, 0 likes, 2 Comments

Forwarded health check
Hi All,

I have an application configured this way:

    Users -> f5 -> server[1,2]_apache -> server[1,2]_app

On the load balancer there is a pool with these nodes:

    server1_apache:port
    server2_apache:port

A standard http monitor is assigned to the pool. server_apache only forwards requests. The nodes cannot be changed to server_app because of some firewall settings. This causes a problem: when the application is down, Apache still works fine and requests are sent to the non-working application server.

Is it possible to set up a monitor for the pool that checks server_app through server_apache? For example, the check could monitor the HTTP response from server1_apache:port/app_page, which returns data from server_app.

Thanks

Solved, 459 Views, 0 likes, 5 Comments

https_443 marking a host as down after 302 response code
Does the https_443 monitor look for error responses (4xx, 5xx) or does it look for a 200 response code? I'm seeing a node marked as down that's returning a 302 redirect, and I'm not sure if the monitor is flagging the 302 as a problem or if there's something else wrong. Thanks!

455 Views, 0 likes, 3 Comments