LTM External Monitors: The Basics
My new finding is that script execution STOPS as soon as anything is written to standard output! How can this be? Well, that is how it behaves, so do your cleanup BEFORE anything is written!
I'm going to expand on my comment. Is it documented anywhere that this is how external monitor scripts behave? If so, can someone point it out to the rest of us? Please!
I have seen a number of scripts on DevCentral, and almost ALL of them try to run cleanup code AFTER success is indicated by writing to standard output (via echo or otherwise).
Almost ALL of them have this process/PID management business in them, with cleanup at the end. As many have observed (probably after spending hours trying to figure out what was wrong), the cleanup never runs, so the next run tries to kill a process that doesn't exist...
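To make the ordering concrete, here is a minimal sketch of doing the cleanup before the write. It assumes the behavior described above (the script may be stopped at the first write to standard output). `report_up` and its pidfile argument are hypothetical names of mine, not anything defined by F5.

```shell
#!/bin/bash
# Sketch only: report_up is a hypothetical helper, and the pidfile path
# passed to it is an assumption, not a documented F5 convention.

report_up() {
    local pidfile="$1"
    # Clean up FIRST: per the behavior described above, nothing after the
    # first write to standard output can be relied on to run.
    rm -f "$pidfile"
    # The write to standard output is the very last action.
    echo "UP"
}
```

A monitor would call something like `report_up "$pidfile"` as its final step, where `$pidfile` is whatever path the script recorded its PID in.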
One hole in this approach is that it might kill some other process: from the examples I have seen, nothing really checks that the process about to be killed actually belongs to this same monitor.
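For what it's worth, a check like that does not have to be heavy. The sketch below kills the recorded PID only if its command line contains an expected string. It assumes a Linux-style `/proc` filesystem. `kill_stale_monitor`, the pidfile, and the expected-name argument are hypothetical names for illustration. A substring match is still not a guarantee, just a lot better than a blind kill.

```shell
#!/bin/bash
# Sketch only: kill_stale_monitor is a hypothetical helper. It assumes a
# Linux-style /proc filesystem and a pidfile containing a single PID.

kill_stale_monitor() {
    local pidfile="$1" expected="$2" pid cmd=""
    # No pidfile means no previous run to clean up.
    [ -f "$pidfile" ] || return 0
    pid="$(cat "$pidfile")"
    # Reject anything that is not a plain number.
    case "$pid" in
        ''|*[!0-9]*) rm -f "$pidfile"; return 0 ;;
    esac
    # Read the command line of that PID; cmd stays empty if it is gone.
    if [ -r "/proc/$pid/cmdline" ]; then
        cmd="$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    fi
    # Kill only when the command line contains the expected string.
    case "$cmd" in
        *"$expected"*) kill -9 "$pid" 2>/dev/null ;;
    esac
    rm -f "$pidfile"
    return 0
}
```

Note that nothing in the helper writes to standard output, so it can run at the top of a monitor without being mistaken for a result.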
Then I wonder why this is even necessary. How often, if ever, would a process still be lingering? These should be quick, lightweight monitors. A couple of the examples are very extensive, and some are structured very nicely, but they end up introducing overhead with all the elegance (not that I am opposed to elegance and well-written code).
So finally I have to comment on how the external monitor reports success or failure. This is just full of problems IMHO: it can indicate success when in reality some error or other issue occurred that wrote something to standard output, resulting in two things: 1. The script STOPS at that point. 2. It indicates success, so the health monitor marks the server as good for use.
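One defensive pattern against that kind of accidental "success" is to throw away everything the probe prints and write to standard output only on a deliberate, checked success. This is a minimal sketch under the behavior described above. `run_monitor`, `noisy_ok`, and `noisy_fail` are hypothetical stand-ins, not real probes.

```shell
#!/bin/bash
# Sketch only: run_monitor, noisy_ok, and noisy_fail are hypothetical names.

# Stand-in for a probe that prints diagnostics and succeeds.
noisy_ok() {
    echo "connected, got banner"
    return 0
}

# Stand-in for a probe that prints an error to stdout and fails.
noisy_fail() {
    echo "ERROR: connection refused"
    return 1
}

run_monitor() {
    # Discard everything the probe writes to stdout or stderr, so stray
    # output can never be taken as the success signal.
    if "$@" >/dev/null 2>&1; then
        # The only deliberate write to standard output.
        echo "UP"
    fi
}
```

With this shape, `run_monitor noisy_fail` prints nothing even though the probe itself wrote an error message to standard output, and the decision rests on the probe's exit status alone.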
Oh my gosh, really? This is a very sloppy hand off. Can the developers look at this and realize how much they have passed off to the writer of the external monitor to know and be aware of? (Again, is this outlined anywhere?)
A hand off should be clean and clear: yes, it succeeded, or no, it did not. Other systems have return codes: zero is success, and anything else is a failure code or other indicator. Or they have an agreed-upon return value, which could be a string to match for success/failure.
Why does the script stop when something is written (indicating success)? Perhaps to reduce the 'overhead' of the monitor. But almost everyone writing these seems to be unaware of this behavior, and why should anyone writing a script need to think, 'Oh, something got written out, so the script is over'?
Off the soapbox, but the design and the behavior really make it confusing for those who would like to write an external monitor.