Forum Discussion
CraigM_17826
Altocumulus
Jan 19, 2010
Health monitor issue
Hi all,
we have a rather pressing and odd issue. Like most users, I have set up health monitors to monitor the individual nodes within a pool. These health monitors used to work, but at some point they stopped. I can't say when, because the monitors are reporting nodes as up when they are not. At the moment I am having to force nodes offline manually when they are down for maintenance. I am totally confused and alarmed at this behaviour. The health monitors in question are very basic: they simply check what is returned from a
get /wps/portal
and check for the string
WebSphere Application Server
This used to work. I am totally at a loss as to why it has suddenly stopped working. I can ping the physical servers from the BigIP units. Is there any command line tool on the BigIP to test monitors, or any monitor debugging tool?
Sorry, forgot to mention, our units are running BIG-IP 9.4.7 Build 320.1 Final
We are planning to upgrade to 10.x next week.
Regards
Craig
13 Replies
- hoolio
Cirrostratus
Hi Craig,
Is the server returning a response which includes 'WebSphere Application Server' even when down for maintenance?
You can enable debug on the monitoring daemon, bigd, by running 'b db bigd.debug enable' at the command line. The output is written to /var/log/bigdlog. The logging is very verbose so disable the logging once you're done collecting logs (using 'b db bigd.debug disable').
Aaron
- CraigM_17826
Altocumulus
Hi Aaron,
thanks for that piece of information. I will do this shortly. I've just spent the last hour with our F5 support contractor and he was quite confused as well. We created a small shell script on the BigIP itself, based on an example from the wiki. When run from the command line it works: it connects to the URL, greps for the search string, and returns true if grep finds it. We then created a new external monitor and specified this script, but the pool status still stays as a blue square. As a last resort we changed the health monitor for the pool to tcp, and even that results in the same behaviour.
I'll enable debug tonight and do some tests and post the results.
Many thanks for your help on this and all the other issues I've posted here that you have replied to.
Regards
Craig
- hoolio
Cirrostratus
Hi Craig,
Is there a node or pool member-specific configuration for the member(s) you're seeing this on? It sounds like the member might have monitoring set to none either on the pool member or maybe the node definition.
Can you configure the simple TCP monitor on the pool? Then please post an anonymized copy of your pool definition using 'b pool POOL_NAME list' and the status using 'b pool POOL_NAME show'.
Also, I added a wiki page on some tips for troubleshooting monitors. You might find something useful there. Or if you figure out another method that works for you, feel free to add it.
Troubleshooting LTM monitors
http://devcentral.f5.com/Wiki/default.aspx/AdvDesignConfig/TroubleshootingLtmMonitors.html
Thanks,
Aaron
- CraigM_17826
Altocumulus
Posted By hoolio on 01/21/2010 4:51 PM
Is there a node or pool member-specific configuration for the member(s) you're seeing this on? It sounds like the member might have monitoring set to none either on the pool member or maybe the node definition.
No, both pool members inherit the monitor from the pool. I did try removing the monitor from the pool and then adding the monitor to both nodes, but the behaviour was the same: a status of the blue box.
Posted By hoolio on 01/21/2010 4:51 PM
Can you configure the simple TCP monitor on the pool? Then please post an anonymized copy of your pool definition using 'b pool POOL_NAME list' and the status using 'b pool POOL_NAME show'.
Will do, but not until Monday.
Posted By hoolio on 01/21/2010 4:51 PM
Also, I added a wiki page on some tips for troubleshooting monitors. You might find something useful there. Or if you figure out another method that works for you, feel free to add it.
Wonderful. I'll have a look. Once again thanks for all of your help.
Regards
Craig
- CraigM_17826
Altocumulus
Hi Aaron,
just to let you know we have just completed upgrading to 10.0.1 with HF3 and, lo and behold, the health monitors started working again with no changes to them. I'll be keeping my eye on them to see if they start to fail again, but fingers crossed they won't.
Craig
- CraigM_17826
Altocumulus
Now they have stopped working again. Arrgh. Raised a support case with F5. I enabled debugging as you suggested but the file bigdlog in /var/log wasn't created. Perhaps this has changed in v10?
I've now tried an external monitor which does a curl to the specified host, looks for the expected text in the curl output via grep, and returns UP if it is found. It seems to work fine from the command line, but not when assigned as a monitor in the GUI.
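For readers trying the same approach, the curl-and-grep check described above might look roughly like the sketch below. It is not Craig's script. It assumes the usual external-monitor conventions, which are not stated in this thread: the member address arrives as the first argument (possibly with a `::ffff:` prefix), the port as the second, and anything written to standard output marks the member up. The `URI` and `RECV` variable names, the HTTPS scheme, and the 5-second timeout are also assumptions for illustration.

```shell
#!/bin/bash
# Sketch of an external monitor: fetch a URI with curl, grep for expected text.
# Assumptions (not from the thread): $1 is the member address, $2 is the port,
# URI and RECV are optional environment variables, and any stdout output
# is treated as "up" while no output is treated as "down".

# Strip the IPv4-mapped IPv6 prefix ("::ffff:") if it is present.
strip_mapped_prefix() {
    printf '%s\n' "${1#::ffff:}"
}

# Read a response on stdin; print UP only if the expected text occurs in it.
report_status() {
    if grep -qF -- "$1"; then
        echo "UP"
    fi
}

main() {
    local ip port uri recv
    ip=$(strip_mapped_prefix "$1")
    port="$2"
    uri="${URI:-/wps/portal}"
    recv="${RECV:-WebSphere Application Server}"
    # -k accepts a self-signed certificate, -s silences progress output,
    # --max-time stops a hung server from stalling the check.
    curl -k -s --max-time 5 "https://${ip}:${port}${uri}" 2>/dev/null \
        | report_status "$recv"
}

# Run the check only when called with an address and a port.
if [ "$#" -ge 2 ]; then
    main "$@"
fi
```

Saved under a hypothetical name such as `ws_check.sh`, it could be run by hand as `./ws_check.sh ::ffff:10.0.0.5 443` (a made-up address). A script that works from an interactive shell can still behave differently when the monitoring daemon runs it, so it may be worth checking that it does not rely on the login shell's PATH or environment.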
Craig
- JoeBlogs1759_10
Nimbostratus
It sounds like your issue is on the F5 but you can run a wget http://host/path and examine the downloaded file for your string.
- CraigM_17826
Altocumulus
Hi JoeBlogs,
I'll have to source a version of wget for the BigIP as it doesn't seem to have wget as part of its distro. The monitor also needs to connect over SSL. Just to reiterate, all of the monitors are misbehaving. I'm also a little confused as to what I should be using in the GET string now. This is the situation: I need to check for some specific text being returned when a URI is accessed. For example,
The URI is /new-services
The expected text is "xyzzy"
At the moment I just have as the SEND string GET /new-services and the RECEIVE STRING as xyzzy
On some other examples in the forums and wiki I see some variations, some specify the use of \r\n and HTTP\1.1 which I have tried adding to the SEND string
GET /new-servies \r\n HTTP\1.1
but it still doesn't work. What is confusing is that the monitor status says unknown, and it just stays as a BLUE box.
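As a point of comparison for the SEND string variations above, here is a small sketch of what a well-formed HTTP/1.1 request looks like on the wire. In the request line the version follows the path on the same line, is written with a forward slash (`HTTP/1.1`), and comes before the CRLF. HTTP/1.1 also requires a `Host` header. The function name `build_request` and the host `www.example.com` are placeholders, not values from this thread.

```shell
#!/bin/bash
# Print a minimal, well-formed HTTP/1.1 GET request for a path and host.
# Layout: request line, Host header, Connection header, then a blank line,
# with every line terminated by CRLF (\r\n).
build_request() {
    local path="$1" host="$2"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}
```

Piping the output through `od -c`, for example `build_request /new-services www.example.com | od -c`, makes the CRLF placement visible. The matching monitor SEND string would be along the lines of `GET /new-services HTTP/1.1\r\nHost: www.example.com\r\nConnection: close`. Whether the trailing blank line has to be typed explicitly appears to depend on the BIG-IP version, so that detail is best checked against the documentation for the release in use.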
I have even tried removing the SSL cert details from the settings (the remote server uses its own self-signed SSL certificate) as Aaron suggested the monitor might not require it, but still no go. In desperation I even created a bog standard TCP monitor and even that fails, yet I can ping the remote server fine from the BigIP units.
Even enabling debugging doesn't seem to work: the logfile it is meant to create, /var/log/bigdlog, isn't created. I'm just beginning to wonder if there is something fundamentally wrong with the pair of units, perhaps something went wrong during the upgrade from 9.4 to 9.6 or 9.6 to 10. It more or less worked as expected under 9.4.
I've also just noticed that when I add new monitors they are not appearing in the list of available monitors to add to a pool. There isn't a licensing restriction on how many health monitors you can have running, is there?
Craig
- chanwood_14867
Nimbostratus
I've also just noticed that when I add new monitors they are not appearing in the list of available monitors to add to a pool.
I had the same problem and am waiting for an answer.
- hoolio
Cirrostratus
Hi chanwood,
What kind of monitor did you create? Can you run 'b monitor MONITOR_NAME list' and reply with the monitor definition?
Aaron