Forum Discussion
Too many email alerts triggered by health monitor
Hi guys,
I have successfully set up email alerts on my F5 BIG-IP in my production environment, and they work, but there is an issue. For every second that the monitor sees the node as down, an email alert is sent to the configured addresses. This floods our inboxes with alerts for negligible node downtimes. What can I do to correct this?
Hi Oreoluwa,
You can change the monitor's interval/timeout values to 8/25 or 10/31.
You must verify that the monitor settings are properly defined for your environment. F5 recommends that in most cases the timeout value should be equal to three times the interval value, plus one. For example, the default interval/timeout setting is 5/16 (three times 5, plus one, equals 16). This setting prevents the monitor from marking the node as down before sending the last check.
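To make the arithmetic concrete, here is a minimal Python sketch of the "three times the interval, plus one" rule. The function name `recommended_timeout` is only an illustration and is not part of any F5 tool; the values it prints are the interval/timeout pairs mentioned above.

```python
def recommended_timeout(interval: int) -> int:
    """Return the timeout (seconds) given by the 3n+1 rule for an interval of n seconds.

    Hypothetical helper for illustration only; it just does the arithmetic.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return 3 * interval + 1


# Print interval/timeout pairs for the values discussed in this thread.
for interval in (5, 8, 10):
    print(f"{interval}/{recommended_timeout(interval)}")
```

This prints 5/16, 8/25 and 10/31, so the two suggested pairs follow the same rule as the default.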
- SWJO
Cirrostratus
It seems that your server settings are not right.
If you are using an HTTP monitor, add a close-session directive to the monitor's send string.
Otherwise, in most cases, the root cause is a TCP-related kernel setting on your server.
- Yoann_Le_Corvi1
Cumulonimbus
Hi
As @eaa mentioned, the key is probably your monitor settings. What are the current interval/timeout values?
Yoann
- Oreoluwa
Altocumulus
Hi, my current interval/timeout values are 5/16.
I am planning to change them to 5/300, or should I make it 90/300? The client wants 5 minutes of repeated failed checks before the server is considered down. We believe the server is currently marked down after as little as 4 seconds of downtime, so we think 5/300 or 90/300 will prevent that. What do you think?
- Yoann_Le_Corvi1
Cumulonimbus
Hi,
This needs to be tried in your environment, but yes, on paper that should do the trick.
If you set 5/300, you will send at most 60 requests over 5 minutes if the server is not responding.
If you set 90/300, you will send at most 3 requests over 5 minutes if the server is not responding.
So it really depends on how insistent you want to be with your backend.
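As a rough check of those numbers, here is a small Python sketch. It assumes a simplified model in which one check is sent every interval and only intervals that fit completely inside the timeout window are counted; the real monitor's timing may differ slightly. The name `max_probes` is just an illustration.

```python
def max_probes(interval: int, timeout: int) -> int:
    """Count the whole intervals that fit inside the timeout window.

    Simplified model (an assumption, not F5's exact behaviour): one check
    per interval, counting only intervals that fit fully in the window.
    """
    if interval <= 0 or timeout <= 0:
        raise ValueError("interval and timeout must be positive")
    return timeout // interval


# Compare the current setting with the two proposed ones.
for interval, timeout in ((5, 16), (5, 300), (90, 300)):
    print(f"{interval}/{timeout}: up to {max_probes(interval, timeout)} checks")
```

Under that assumption, 5/16 allows 3 checks, 5/300 allows 60, and 90/300 allows 3, which matches the figures above.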
Yoann
- jaikumar_f5
Noctilucent
The best practice is 3n+1: the timeout should be three times the interval, plus one.
You really should not adjust the timeout and interval just to reduce the number of flaps or to suppress alerts. If you do, you won't have a stable infrastructure.
If the flapping is continuous, you should identify the cause and resolve it, or configure a more appropriate monitor.
Simply increasing the timeout and interval is not the right approach, I'd say.
Keep us posted if you need more help.