Hi guys, I have successfully set up email alerts on my F5 BIG-IP in my production environment, and they work, but there is an issue. Every time the monitor sees the node as down, even for a second, an email alert is sent to our configured addresses. This causes a flood of emails for negligible node downtimes. What can I do to correct this?

Is this config for the health monitor?

TOO MANY TRIGGERED EMAIL ALERTS BY HEALTH MONITOR

Enes_Afsin_Al
MVP
Dec 24, 2019
Hi Oreoluwa,

You can change the monitor's interval/timeout values to 8/25 or 10/31.

You must verify that the monitor settings are properly defined for your environment. F5 recommends that, in most cases, the timeout value should equal three times the interval value, plus one. For example, the default interval/timeout values are 5/16 (three times 5, plus one, equals 16). This setting prevents the monitor from marking the node down before the last check has been sent.
REF: https://support.f5.com/csp/article/K12531
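The 3n+1 rule above is simple arithmetic, so here is a minimal Python sketch that computes the recommended timeout for an interval and checks the values suggested in this reply. The function names are hypothetical helpers for illustration, not part of any F5 tool or API.

```python
def recommended_timeout(interval: int) -> int:
    """Timeout recommended by F5 (K12531) for a given interval: 3 * interval + 1."""
    return 3 * interval + 1


def follows_3n_plus_1(interval: int, timeout: int) -> bool:
    """True if an interval/timeout pair matches the 3n+1 recommendation exactly."""
    return timeout == recommended_timeout(interval)


if __name__ == "__main__":
    # The default (5/16) and the two suggested pairs (8/25, 10/31).
    for interval, timeout in [(5, 16), (8, 25), (10, 31)]:
        print(f"{interval}/{timeout}: follows 3n+1 = {follows_3n_plus_1(interval, timeout)}")
```

All three pairs satisfy the rule; each one simply gives the monitor more time (and more probes) before it marks the node down.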
SWJO
Cirrostratus
Dec 26, 2019
It seems that your server settings aren't right.
If you are using an HTTP monitor, add a close-connection header (Connection: Close) to the send string.
Otherwise, in most cases, the root cause is your server's TCP-related kernel settings.
- Oreoluwa
  Altocumulus
  Jan 02, 2020
  I don't understand this, SWJO. Could you explain further, please? I am interested in this.
  - SWJO
    Cirrostratus
    Jan 02, 2020
    Like this:
    GET /test.html HTTP/1.1\r\nUser-Agent: \r\nHost: 127.0.0.1\r\nConnection: Close\r\n\r\n
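To show how that send string is put together, here is a small Python sketch that assembles it from its parts. The path `/test.html` and host `127.0.0.1` come from the reply above; `build_send_string` and `to_wire` are hypothetical helpers. It assumes, as in the example, that the monitor's send string is typed with literal `\r\n` escape sequences which the monitor expands to CR LF before sending.

```python
def build_send_string(path: str, host: str) -> str:
    """Build an HTTP/1.1 GET send string in escaped form (literal backslash-r backslash-n)."""
    lines = [
        f"GET {path} HTTP/1.1",
        "User-Agent: ",
        f"Host: {host}",
        "Connection: Close",  # ask the server to close the TCP connection after responding
    ]
    # Each request line/header ends with \r\n; one extra \r\n ends the header block.
    return "".join(line + "\\r\\n" for line in lines) + "\\r\\n"


def to_wire(send_string: str) -> bytes:
    """Expand the escaped \\r\\n sequences into real CR LF bytes (assumed monitor behaviour)."""
    return send_string.replace("\\r\\n", "\r\n").encode("ascii")


if __name__ == "__main__":
    s = build_send_string("/test.html", "127.0.0.1")
    print(s)
    print(to_wire(s))
```

The `Connection: Close` header is what lets the server tear down each monitor connection promptly instead of leaving it open.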
Yoann_Le_Corvi1
Cumulonimbus
Jan 03, 2020
Hi

As @eaa mentioned, the key is probably your monitor settings. What are the current interval/timeout values?

Yoann
- Oreoluwa
  Altocumulus
  Jan 03, 2020
  Hi, my current interval/timeout values are 5/16.
  I am planning to change them to 5/300, or should I make it 90/300? The client wants 5 minutes of repeated failed checks before the server is considered down. We believe the server is currently being marked down for as little as 4 seconds of downtime, so we think 5/300 or 90/300 will prevent that. What do you think?
Yoann_Le_Corvi1
Cumulonimbus
Jan 03, 2020
Hi,

Test it in your environment, but yes, on paper that should do the trick.

If you set 5/300, then you will send 60 requests MAX over 5 minutes if the server is not responding.
If you set 90/300, then you will send 3 requests MAX over 5 minutes if the server is not responding.

So it really depends on how insistent you want to be with your backend.
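The probe counts in this reply come from dividing the timeout by the interval. A minimal Python sketch of that arithmetic follows; `max_probes` is a hypothetical helper. It uses floor division, which reproduces the figures above; treat the result as approximate, since depending on when the first failed probe lands relative to the last successful one, one more probe can fit in the window (for example probes at 0, 90, 180 and 270 seconds within 300).

```python
def max_probes(interval: int, timeout: int) -> int:
    """Approximate number of probes sent during one timeout window while the server is down."""
    return timeout // interval


if __name__ == "__main__":
    for interval, timeout in [(5, 16), (5, 300), (90, 300)]:
        print(f"{interval}/{timeout}: about {max_probes(interval, timeout)} probes before mark-down")
```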

Yoann
jaikumar_f5
Noctilucent
Jan 03, 2020

The best practice is 3n+1.
You really should not be adjusting the timeout and interval just to reduce the number of flaps or to suppress alerts. If you do, you won't have a stable infrastructure.

If the flapping is continuous, you should identify the cause and resolve it, or configure a more appropriate monitor.
Simply increasing the timeout and interval is not the right approach, I'd say.

Keep us posted if you need more help.