Forum Discussion
Roman_80473 (Nimbostratus)
Nov 17, 2011
Detect if node is half-dead with an iRule?
Hi folks,
I was tasked to monitor app servers in the pool with an iRule (LTM 10.2). I wrote a simple rule which does the following:
- If I get into LB_FAILED, I take the node out and resend the request.
- If I get into HTTP_RESPONSE and the HTTP status is >= 500, I take the node out and resend the request.
It only seems to work when the nodes are either fine or completely dead. Otherwise (for example, when a server has run out of memory), the request gets into LB_SELECTED and sits there forever. I get a "The connection to the server was reset while the page was loading" error in the browser after a minute or two, but my iRule never kicks in.
Is there a way to detect that a node is "half-dead" with an iRule? Or is there some external configuration on the VIP, pool, etc. that would do it?
Any help is greatly appreciated
Thanks, Roman
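For readers following along, the rule Roman describes might look roughly like the sketch below. It is not his actual iRule, which was not posted. The retry cap (`max_retries`) and the variable names are assumptions added for illustration, and the counter is kept per connection so a request is not retried forever.

```tcl
when CLIENT_ACCEPTED {
    # Hypothetical cap on retries for this connection
    set max_retries 2
    set retries 0
}
when HTTP_REQUEST {
    # Save the request so it can be replayed after a 5xx response.
    # HTTP::request returns the headers only, so a request body is not replayed.
    if { $retries == 0 } {
        set saved_request [HTTP::request]
    }
}
when LB_FAILED {
    if { $retries < $max_retries } {
        incr retries
        # Mark the selected member down only if one was actually selected
        if { [LB::server addr] ne "" } {
            LB::down
        }
        LB::reselect
    }
}
when HTTP_RESPONSE {
    if { [HTTP::status] >= 500 && $retries < $max_retries } {
        incr retries
        LB::down
        HTTP::retry $saved_request
    }
}
```

Neither event fires while a member has accepted the TCP connection but never answers, which matches the "half-dead" behavior described above.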
- hooleylist (Cirrostratus): Hi Roman,
- Roman_80473 (Nimbostratus): Hi Aaron,
- hooleylist (Cirrostratus): You set the timer in HTTP_REQUEST and then cancel it in HTTP_RESPONSE if a response is received within the timeout. You should be able to just modify the TCP::respond command to do whatever action you want (disable the pool member, select a new one, etc.), but leave the events as they are.
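The timer pattern described here can be sketched as follows. This is a minimal outline, not the rule hooleylist referred to: the variable `static::response_timeout_ms` and its 5000 ms value are assumptions, and the timeout action is left as a log line because which commands are usable inside the `after` script is exactly what the thread goes on to test. Note that `after` takes its delay in milliseconds.

```tcl
when RULE_INIT {
    # Hypothetical setting: how long to wait for a server response, in milliseconds
    set static::response_timeout_ms 5000
}
when HTTP_REQUEST {
    # Start the timer; the script runs only if it is not cancelled in time
    set monitor_id [after $static::response_timeout_ms {
        log local0. "No server response within $static::response_timeout_ms ms"
        # Placeholder: put the timeout action here (assumption, untested)
    }]
}
when HTTP_RESPONSE {
    # A response arrived, so cancel the pending timer
    if { [info exists monitor_id] } {
        after cancel $monitor_id
        unset monitor_id
    }
}
```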
- nitass (Employee): In my test, pool foo completes the three-way handshake but does not respond to any HTTP request. I tried to reselect pool foo2 instead, but I got an "Address in use" Tcl error (and, of course, BIG-IP sent a reset to both sides).
[root@ve1023:Active] config b virtual bar list
virtual bar {
   snat automap
   pool foo
   destination 172.28.19.79:80
   ip protocol 6
   rules myrule
   profiles {
      http {}
      tcp {}
   }
}

[root@ve1023:Active] config b pool foo list
pool foo {
   members 200.200.200.101:88 {}
}

[root@ve1023:Active] config b pool foo2 list
pool foo2 {
   members 200.200.200.101:80 {}
}

[root@ve1023:Active] config b rule myrule list
rule myrule {
   when RULE_INIT {
      set static::response_timeout 5
   }
   when HTTP_REQUEST {
      log local0. "Received request, beginning response monitor interval. [clock seconds]"
   }
   when LB_SELECTED {
      set monitor_id [\
         after $static::response_timeout {
            LB::reselect pool foo2
            log local0. "Timeout $static::response_timeout milliseconds elapsed without server response. [clock seconds]"
         }\
      ]
   }
   when HTTP_RESPONSE {
      log local0. "Received server response."
      if {[info exists monitor_id]} {
         log local0. "Canceling after script with id $monitor_id"
         after cancel $monitor_id
      }
   }
}

curl -i http://172.28.19.79
curl: (52) Empty reply from server

Nov 18 00:16:02 local/tmm info tmm[4766]: Rule myrule : Received request, beginning response monitor interval. 1321604162
Nov 18 00:16:02 local/tmm err tmm[4766]: 01220001:3: TCL error: myrule - Address in use (line 1) invoked from within "LB::reselect pool foo2"

00:16:02.240879 IP 172.28.19.253.34199 > 172.28.19.79.80: S 3035055119:3035055119(0) win 5840
00:16:02.240917 IP 172.28.19.79.80 > 172.28.19.253.34199: S 2804861295:2804861295(0) ack 3035055120 win 4380
00:16:02.244080 IP 172.28.19.253.34199 > 172.28.19.79.80: . ack 1 win 46
00:16:02.244104 IP 172.28.19.253.34199 > 172.28.19.79.80: P 1:155(154) ack 1 win 46
00:16:02.244203 IP 200.200.200.10.34199 > 200.200.200.101.88: S 3752179749:3752179749(0) win 4380
00:16:02.244864 IP 200.200.200.101.88 > 200.200.200.10.34199: S 195117275:195117275(0) ack 3752179750 win 5792
00:16:02.244875 IP 200.200.200.10.34199 > 200.200.200.101.88: . ack 1 win 4380
00:16:02.244886 IP 200.200.200.10.34199 > 200.200.200.101.88: P 1:155(154) ack 1 win 4380
00:16:02.245899 IP 200.200.200.101.88 > 200.200.200.10.34199: . ack 155 win 54
00:16:02.249309 IP 200.200.200.10.34199 > 200.200.200.101.88: R 155:155(0) ack 1 win 4380
00:16:02.249333 IP 172.28.19.79.80 > 172.28.19.253.34199: R 1:1(0) ack 155 win 4534
- nitass (Employee): If I put LB::detach before LB::reselect, there is no Tcl error, but after sending the FIN, BIG-IP does not establish a connection to pool foo2. I think this is because BIG-IP has already established a connection to pool foo.
[root@ve1023:Active] config b rule myrule list
rule myrule {
   when RULE_INIT {
      set static::response_timeout 5
   }
   when HTTP_REQUEST {
      log local0. "Received request, beginning response monitor interval. [clock seconds]"
   }
   when LB_SELECTED {
      set monitor_id [\
         after $static::response_timeout {
            LB::detach
            LB::reselect pool foo2
            log local0. "Timeout $static::response_timeout milliseconds elapsed without server response. [clock seconds]"
         }\
      ]
   }
   when HTTP_RESPONSE {
      log local0. "Received server response."
      if {[info exists monitor_id]} {
         log local0. "Canceling after script with id $monitor_id"
         after cancel $monitor_id
      }
   }
}

curl -i http://172.28.19.79
...no response...

Nov 18 00:27:41 local/tmm info tmm[4766]: Rule myrule : Received request, beginning response monitor interval. 1321604861
Nov 18 00:27:41 local/tmm info tmm[4766]: Rule myrule : Timeout 5 milliseconds elapsed without server response. 1321604861

00:27:41.513841 IP 172.28.19.253.47126 > 172.28.19.79.80: S 2431649126:2431649126(0) win 5840
00:27:41.513884 IP 172.28.19.79.80 > 172.28.19.253.47126: S 3818007430:3818007430(0) ack 2431649127 win 4380
00:27:41.517039 IP 172.28.19.253.47126 > 172.28.19.79.80: . ack 1 win 46
00:27:41.517091 IP 172.28.19.253.47126 > 172.28.19.79.80: P 1:155(154) ack 1 win 46
00:27:41.517268 IP 200.200.200.10.47126 > 200.200.200.101.88: S 1242901181:1242901181(0) win 4380
00:27:41.517934 IP 200.200.200.101.88 > 200.200.200.10.47126: S 3205360631:3205360631(0) ack 1242901182 win 5792
00:27:41.517946 IP 200.200.200.10.47126 > 200.200.200.101.88: . ack 1 win 4380
00:27:41.517957 IP 200.200.200.10.47126 > 200.200.200.101.88: P 1:155(154) ack 1 win 4380
00:27:41.518888 IP 200.200.200.101.88 > 200.200.200.10.47126: . ack 155 win 54
00:27:41.522163 IP 200.200.200.10.47126 > 200.200.200.101.88: F 155:155(0) ack 1 win 4380
00:27:41.522884 IP 200.200.200.101.88 > 200.200.200.10.47126: F 1:1(0) ack 156 win 54
00:27:41.522899 IP 200.200.200.10.47126 > 200.200.200.101.88: . ack 2 win 4380
00:27:41.617423 IP 172.28.19.79.80 > 172.28.19.253.47126: . ack 155 win 4534
- hooleylist (Cirrostratus): Maybe we could use HTTP::retry instead of LB::reselect?
- nitass (Employee): HTTP::retry is not valid in the LB_SELECTED event.
- Roman_80473 (Nimbostratus): Aaron and nitass,
- nitass (Employee): "I set the monitor_id with "after" inside LB_SELECTED, then cancel it in both http_request and http_response." I do not think we should cancel monitor_id in the HTTP_REQUEST event, since the timer is only scheduled in the LB_SELECTED event.
- hooleylist (Cirrostratus): Nitass, can you try putting the after statement in HTTP_REQUEST and using (LB::detach + LB::reselect) or HTTP::retry? Otherwise, I can try testing this later today.