application health monitoring

38 Topics

External/Scripted monitor fails
Hi, I need a monitor that runs curl against a URL which returns a specific value. I could have done this with the regular HTTP monitor, but I need to insert a random number into the URI to avoid caching in the internal proxy. I tried importing the script and creating an external monitor, but without any result: I'm checking the access log on the Apache server and I cannot see any requests. I have also created a "script" monitor, with the script placed in the /config/eav/ directory. In the script monitor I can enable debug mode, and I get the following in the log:

    ********** Debugging session beginning at: Tue Sep 16 21:57:08 2014
    Arguments 1-2: ::ffff:192.168.20.1 80
    Environment variables:
    DEBUG=yes
    FILENAME=test
    MON_TMPL_NAME=/Common/test
    NODE_IP=::ffff:192.168.20.1
    NODE_PORT=80
    --
    !/bin/sh
    invalid line from file /config/eav/test: '!/bin/sh'

If I run the script manually it works as expected, but not when the monitor executes it. The file permissions are 755:

    -rwxr-xr-x 1 root root 2267 Sep 16 21:51 test

Regards,
Andréas
forsan_102218
Jun 05, 2023 Place Technical Forum
226 Views
0 likes
2 Comments
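For the cache-busting requirement in the post above, a minimal sketch of an external monitor script might look like the following. It assumes the monitor passes the node address and port as arguments 1 and 2 (as the debug output shows) and that any output on stdout marks the member up. The variables URI and RECV and the query parameter name nocache are hypothetical names chosen for illustration, not taken from the original script.

```shell
#!/bin/sh
# Sketch of an external monitor that adds a changing number to the URI.
# URI and RECV are hypothetical monitor variables; "nocache" is an arbitrary name.

# Strip the IPv4-mapped IPv6 prefix seen in the debug log (::ffff:192.168.20.1).
strip_prefix() {
    echo "$1" | sed 's/^::ffff://'
}

# Build a URL whose query string changes on every run, so a proxy
# cannot answer from its cache.
build_url() {
    ip=$1; port=$2; uri=$3
    nonce="$(date +%s)$$"          # epoch seconds plus this shell's PID
    case "$uri" in
        *\?*) sep='&' ;;           # URI already has a query string
        *)    sep='?' ;;
    esac
    echo "http://${ip}:${port}${uri}${sep}nocache=${nonce}"
}

main() {
    ip=$(strip_prefix "$1")
    url=$(build_url "$ip" "$2" "${URI:-/}")
    # Print something only when the expected value is found.
    if curl -fs --max-time 4 "$url" | grep -q "${RECV:-OK}"; then
        echo "UP"
    fi
}

# Run the check only when called with an address and a port.
if [ "$#" -ge 2 ]; then
    main "$@"
fi
```

The nonce is built from the clock and the process ID rather than from a shell-specific random variable, so the sketch does not depend on bash.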
monitor timeouts vs actual behaviour
Hi guys, the default http health monitor (v10.2.4) polls on a 5 second interval with a timeout of 16 seconds. To me, this says that a monitor fires every 5 seconds, and if no monitor succeeds for 16 seconds then the pool member is marked down. Yet this really, REALLY doesn't match what happens on the network:

    pool blah_pool {
       monitor all http
       members 1.2.3.4:1234 {}
    }

A tcpdump shows:

    11:01:36.761159 IP 10.101.131.4.35514 > 1.2.3.4.1234: S
    11:03:13.742647 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:16.742445 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:22.742838 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:34.741285 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:03:58.740435 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:04:46.736725 IP 10.101.131.4.46160 > 1.2.3.4.1234: S
    11:06:23.738147 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:26.737763 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:32.737102 IP 10.101.131.4.48428 > 1.2.3.4.1234: S
    11:06:44.735753 IP 10.101.131.4.48428 > 1.2.3.4.1234: S

So we have only a single TCP attempt at any one time, not one every 5 seconds, and whilst the monitor will still mark a node down after 16 seconds, the TCP connection carries on trying until the TCP/IP stack times it out. So once the node is down after 16 seconds there is still a huge wait before it tries again: no new connection will be attempted until the single current one finishes. So if, for some (presumably pretty stupid) reason, the specific connection is not being replied to (maybe a weird FW rule or IPS action), LTM won't be able to check status on a new connection for three minutes and 10 seconds. I've also seen equivalent behaviour with an HTTP GET just not being replied to, again having to wait until the TCP connection is reset, or the webserver finally responds well, well after the "timeout" period has expired, before the monitor will fire again.

Testing just now, I see the HTTP monitor just crudely stuffing additional GETs down the same connection that's still waiting for a response. What's that all about?? I can't make any sense of this and, TBH, it goes right against all the things I've designed for, sticking to the 3n+1 rule etc. What merit does 3n+1 have in this sort of situation? I see no logic in it at all if additional monitors can't run in parallel. Who would want to be forwarding to a web server that is routinely taking, say, 15 seconds to reply (3n+1 - 1s) when all the other members in a pool take 0.01s to serve the same gif file? Shouldn't a timeout actually always be something like 4 seconds (to at least give time for 2 SYNs to hit the back end)? Even in that case though, I'm still stuffed until the next connection is allowed to be attempted. Any thoughts on this would be appreciated!
Chris_Phillips
Jun 02, 2023 Place Technical Forum
600 Views
0 likes
4 Comments
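The arithmetic behind the post above can be sketched in a few lines of shell. The 3n+1 rule gives the 16 second timeout for a 5 second interval, and the gaps between the SYNs in the capture (3, 6, 12, 24 and 48 seconds) look like a retransmission timer that starts at 3 seconds and doubles. The starting value and the count of five retransmissions are read off the capture, not taken from any F5 documentation, so treat them as assumptions.

```shell
#!/bin/sh
# Timeout suggested by the 3n+1 rule for a given polling interval (seconds).
monitor_timeout() {
    echo $(( 3 * $1 + 1 ))
}

# Time until a connection attempt is abandoned, assuming the retransmission
# timer starts at $1 seconds, doubles after each SYN, and $2 retransmissions
# follow the first SYN. Both values are inferred from the capture.
syn_giveup() {
    rto=$1; retries=$2; total=0; i=0
    while [ "$i" -le "$retries" ]; do
        total=$(( total + rto ))
        rto=$(( rto * 2 ))
        i=$(( i + 1 ))
    done
    echo "$total"
}

echo "3n+1 timeout for a 5s interval: $(monitor_timeout 5)s"
echo "SYN give-up time (3s initial, 5 retransmissions): $(syn_giveup 3 5)s"
```

Under those assumptions the second figure comes out at 189 seconds, which is close to the roughly 190 seconds between the first SYN of one connection and the first SYN of the next in the capture, and an order of magnitude longer than the 16 second monitor timeout.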
Issue with external monitor using curl on ntlm site
I need to create a monitor for our SharePoint environment. I first tried the built-in HTTPS monitor, but it gave a 401 error. After some investigation it seems there is an issue if the service is using NTLM, and I was recommended to use an external monitor. With the information I found here I created the following script:

    #!/bin/sh
    #
    # This script expects the following Name/Value pairs:
    #  URI      = the URI to check
    #  USER     = username
    #  PASSWORD = password
    #  RECV     = the expected response (case sensitive)

    # remove IPv6/IPv4 compatibility prefix (LTM passes addresses in IPv6 format)
    IP=`echo ${1} | sed 's/::ffff://'`
    PORT=${2}

    PIDFILE="/var/run/`basename ${0}`.${IP}_${PORT}.pid"
    # kill off the last instance of this monitor if hung and log current pid
    if [ -f $PIDFILE ]
    then
       echo "EAV exceeded runtime needed to kill ${IP}:${PORT} $PIDFILE" | logger -p local0.error
       kill -9 `cat $PIDFILE` > /dev/null 2>&1
    fi
    echo "$$" > $PIDFILE

    # send request and check for expected response
    if [ $PORT -eq 443 ]
    then
       curl -kfNS --ntlm --user ${USER}:${PASSWORD} https://${IP}${URI} | grep "${RECV}" > /dev/null 2>&1
    else
       curl -kfNS --ntlm --user ${USER}:${PASSWORD} http://${IP}:${PORT}${URI} | grep "${RECV}" > /dev/null 2>&1
    fi

    # mark node UP if expected response was received
    if [ $? -eq 0 ]
    then
       # Remove the PID file
       rm -f $PIDFILE
       echo "UP"
    else
       # Remove the PID file
       rm -f $PIDFILE
    fi
    exit

I'm currently only using 443, so it will only use the first curl command, but I wanted it to be able to handle both, and I had some issues when I tried the second command. The curl command works fine from the F5 CLI, and if I use "run /util test-monitor intranet_sharepoint_monitor_ext address 10.xxx.xxx.xxx port 443" from tmsh it correctly marks the nodes as up or down. If I do the test from the LTM monitor I get "No successful responses received before deadline" when I try it on the SharePoint site that uses NTLM; if I try it on another SharePoint site that doesn't use NTLM it works fine.

Why would it work with the test-monitor command but not with the actual monitor in the GUI? Shouldn't they be the same?! Does anyone have any suggestions on what I could do to solve this? I have checked that the script doesn't contain any Windows characters, and that the file located in /config/filestore/files_d/Common_d/external_monitor_d/ has the correct permissions.
FredrikP
Jun 02, 2023 Place Technical Forum
564 Views
0 likes
1 Comment
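Since the error in the post above mentions a deadline, one thing that may be worth ruling out is whether the NTLM exchange simply takes longer than the monitor allows. The sketch below caps curl's runtime with --max-time and quotes the credentials. It is not a confirmed fix: MAX_TIME is a hypothetical variable added for illustration, and URI, USER, PASSWORD and RECV are the same monitor variables the original script expects.

```shell
#!/bin/sh
# Pick the scheme from the port, as the original script does.
build_url() {
    ip=$1; port=$2; uri=$3
    if [ "$port" -eq 443 ]; then
        echo "https://${ip}${uri}"
    else
        echo "http://${ip}:${port}${uri}"
    fi
}

main() {
    ip=$(echo "$1" | sed 's/^::ffff://')
    url=$(build_url "$ip" "$2" "$URI")
    # --max-time bounds the whole transfer so the script returns before the
    # monitor's own deadline; MAX_TIME is a hypothetical variable (default 5s).
    if curl -kfsS --ntlm --user "${USER}:${PASSWORD}" \
            --max-time "${MAX_TIME:-5}" "$url" 2>/dev/null \
            | grep -q "${RECV}"; then
        echo "UP"
    fi
}

# Run the check only when called with an address and a port.
if [ "$#" -ge 2 ]; then
    main "$@"
fi
```

If the member is still marked down with a generous MAX_TIME, timing is probably not the cause and the difference between the test-monitor run and the live monitor lies elsewhere.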
Receive String Not Recognized By Custom HTTP Health Check Monitor
Hi, we're using a custom HTTP monitor for our application that hits a specific URL and checks that the response is correct. The F5 is not recognizing the receive string and is therefore marking the node as down.

Here is the send string:

    GET /syshc/health HTTP/1.1\r\nUser-Agent:BigIP Prober\r\nHost: \r\nConnection: close\r\n\r\n

Here is the receive string:

    SystemMonitortrue

The URL above returns SystemMonitortrue if the application is working, or SystemMonitorfalse if it is not. I have tried HTTP version 1.0, and I have set the host to either an existing host or a dummy host, but no luck. The monitor log file states "Response did not match recv regex yet".

From the F5, I have run these tests in an attempt to figure out what is wrong, and I get different responses depending on whether I use curl, telnet or netcat:

    F5-LTM>echo -e "GET /syshc/health HTTP/1.1\r\nUser-Agent:BigIP Prober\r\nHost: \r\nConnection: close\r\n\r\n" | nc 8080
    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    X-OneAgent-JS-Injection: true
    Set-Cookie: dtCookie=2$215E59177B4BD85717C0665144C1D564; Path=/
    Content-Type: text/plain;charset=UTF-8
    Content-Length: 40
    Date: Mon, 10 Dec 2018 17:51:59 GMT
    Connection: close

    SystemMonitorprotocol = http host = null

(NOTE: no true or false.) If I telnet I get this:

    F5-LTM>telnet 8080
    Trying ...
    Connected to .
    Escape character is '^]'.
    GET /syshc/health HTTP/1.1
    User-Agent:BigIP Prober
    HOST:
    Connection: Close

    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    X-OneAgent-JS-Injection: true
    Set-Cookie: dtCookie=2$D90B430DD2CA1434D3DD98267CAF0952; Path=/
    Content-Type: text/plain;charset=UTF-8
    Content-Length: 40
    Date: Mon, 10 Dec 2018 18:52:43 GMT
    Connection: close

    SystemMonitorprotocol = http host = null

(Again no "true" or "false".) However, if I use curl, I get the correct result:

    F5-LTM>curl :8080/syshc/health
    SystemMonitortrue

In several browsers (Chrome, Firefox, Safari and IE) the response is "SystemMonitortrue", which is the correct response.

I have scoured the following articles:

    https://support.f5.com/csp/article/K2167
    https://support.f5.com/csp/article/K13397
    https://support.f5.com/csp/article/K5917
    https://support.f5.com/csp/article/K3224

What am I missing here?
Ken_04_163875
Jun 01, 2023 Place Technical Forum
1K Views
0 likes
3 Comments
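One difference between the tests in the post above is that curl fills in a Host header on its own, while the nc and telnet requests as shown carry an empty one and the body that comes back reads "host = null". A small sketch for building the raw request with an explicit Host value, so the nc test can be repeated on equal terms, could look like this. The host name app.example.test is a placeholder, and whether the Host header is actually what changes the response is an assumption to verify.

```shell
#!/bin/sh
# Emit a raw HTTP/1.1 request with CRLF line endings and a non-empty Host header.
# printf is used instead of echo -e so the escapes behave the same in every shell.
build_request() {
    host=$1; path=$2
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nUser-Agent: BigIP Prober\r\nConnection: close\r\n\r\n' \
        "$path" "$host"
}

# Example (placeholders, not values from the post):
#   build_request app.example.test /syshc/health | nc <node-address> 8080
```

If the response body changes once a Host value is present, the same value can be placed in the monitor's send string.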
LTM Monitor Receive String
Hello, my LTM monitor receive string doesn't work. The problem is that I am only interested in one line. Here is the whole response that the receive string is matched against:

    {
      "returnCode": 0,
      "version": "9.0.0.37265",
      "product": "match",
      "command": "list project",
      "errorMsg": "OK",
      "projectList": [
        {
          "projectId": "4711",
          "author": "Sam Sample",
          "description": "Test",
          "title": "Example",
          "configurationTime": "Date:Time",
          "state": "RUNNING"
        }
      ]
    }

For me only the line with "state": "RUNNING" is interesting. When this changes, the pool member should go offline and no traffic should be forwarded to it. I have tested with different wildcards, e.g. .\"state\":\"RUNNING\". but it doesn't work. Thankful for any help. Cheers, Nikolas
nikzin_341815
Apr 03, 2019 Place Technical Forum
416 Views
0 likes
2 Comments
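The sample response in the post above has a space between the colon and the value, which the pattern that was tried does not allow for. A pattern can be checked offline with grep before it goes into the monitor, as in the sketch below. It assumes the receive string is handled as an extended regular expression; whether the [[:space:]] class is accepted by a given BIG-IP version is an assumption to verify, and the sample variable is a shortened copy of the posted JSON.

```shell
#!/bin/sh
# Succeeds when stdin contains "state" : "RUNNING" with any amount of
# whitespace (or none) around the colon.
state_is_running() {
    grep -Eq '"state"[[:space:]]*:[[:space:]]*"RUNNING"'
}

# Shortened copy of the response from the post.
sample='{ "projectList": [ { "projectId": "4711", "state": "RUNNING" } ] }'

if printf '%s' "$sample" | state_is_running; then
    echo "match"
else
    echo "no match"
fi
```

If character classes turn out not to be supported, the literal text "state": "RUNNING" with a single space is the simplest alternative to try.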
iRule or monitor to test website login?
I want to create an iRule or a monitor (whichever is simplest) to test the login of a web page and then look for an element on the page after login. I have very limited experience with either iRules or monitors; my monitor experience is limited to the GUI. Can anybody point me in the right direction? I've read a lot today on DevCentral, but nothing I've found has been much help. Thanks in advance for any guidance!
bhobson2000_114
Mar 25, 2019 Place Technical Forum
341 Views
0 likes
1 Comment
Health monitor vs. forced offline on a pool node.
What happens to the existing connections to a pool node when a health monitor marks a pool node as unavailable? Do the connections get immediately dropped or does the LTM have the same behavior as if you forced the node offline?
Mike_Galehouse
Mar 08, 2019 Place Technical Forum
428 Views
0 likes
2 Comments
host value in HTML monitor
Hi all, we have a production environment which needs maintenance once in a while, so the apps team created two clusters that are mirrored. What they want to do is use the F5 to switch traffic from one cluster to the other. I don't want to be bothered by them, so I thought I could write a custom http monitor that checks a simple html page, and based on the value received from that page the node gets enabled or disabled. This web page is on each node. I checked the http monitor and there is no way to add a host variable? Does that mean I have to write 4 individual monitors? Other suggestions are welcome :) Thanks!

Setup: 1 virtual server with 4 nodes; 2 are enabled and 2 are disabled, based on the content of the web page.
sjaakie_85264
Nov 26, 2018 Place Technical Forum
298 Views
0 likes
1 Comment
All health monitors UNKNOWN
This weekend one of our LTMs had all health monitors go to UNKNOWN and then reset. I'm not sure why, or whether it caused issues. Has anyone seen this before?

    F5 Health Monitor /Common/api_monitor on ssweb-3 changed its status to Unknown on ALPGSDLB01
cshannahan_3137
Sep 24, 2018 Place Technical Forum
303 Views
0 likes
0 Comments
Help with POST Content Length HTTP Monitor
Hello, this is what I have for a POST. When the content length is 0 I get a 200 back. When it's something else I get 415 Unsupported Media Type. I'm not sure if there's something wrong with the way it's written or if I just have the wrong content length in there. The application guys are stating the 0 is causing some weird errors in the logs, so I'm trying to get a 200 back with the proper length.

    POST /api/ChunkingService/CacheChunk HTTP/1.1\r\nHost: obcs.uat.alc.ca\r\nConnection: /\r\nContent-Length: 0\r\nContent-Type: application\json\r\n\r\n{\"TerminalId\":\"LBTest\",\"Value\":\"_\",\"RetailerId\":99999999}
cshannahan_3137
Sep 11, 2018 Place Technical Forum
369 Views
0 likes
2 Comments
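For the Content-Length question in the post above, the value is the number of bytes in the body as it goes over the wire. The sketch below counts the bytes of the posted JSON, assuming the backslashes before the quotes are configuration escaping and are not sent. Separately, the JSON media type is conventionally written application/json with a forward slash, while the send string uses a backslash; whether that is related to the 415 is a guess to check, not a confirmed cause.

```shell
#!/bin/sh
# Count the bytes of a request body; printf '%s' avoids adding a newline.
content_length() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Body from the post with the escaping backslashes removed (assumption:
# they are not part of what is sent).
body='{"TerminalId":"LBTest","Value":"_","RetailerId":99999999}'

printf 'Content-Length: %s\n' "$(content_length "$body")"
```

For this body the count is 57, so the header would read Content-Length: 57 if the body is sent exactly as shown.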