Forum Discussion

sgogean (Nimbostratus)
Sep 29, 2022

F5 GTM DNS - high number of requests

Hello community,

 

For quite a while now we have been facing what look like DDoS attempts (I wouldn't call them an actual attack) against our public F5 DNS.

Initially we were seeing ~700K to 1M connections (IP connections), and after some analysis we decided to lower the UDP port 53 timeout from the default 40 sec to 10 sec (on the firewall that sits in front of the F5). That change had a positive impact on peak connections (when the DDoS was seen), lowering them to under 50% of the earlier peak, approximately 300-400K connections nowadays.
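As a rough sanity check on those numbers: in steady state, a stateful firewall's UDP session table holds about (new flows per second) x (idle timeout) entries. The Python sketch below applies that approximation. It assumes every query opens a distinct flow that is never refreshed, which is a simplification, and the function names are illustrative only.

```python
def implied_new_flows_per_sec(table_entries: float, idle_timeout_sec: float) -> float:
    """New flows per second implied by a steady-state table size (assumed model)."""
    return table_entries / idle_timeout_sec


def estimated_table_entries(new_flows_per_sec: float, idle_timeout_sec: float) -> float:
    """Steady-state session-table size: arrival rate times entry lifetime."""
    return new_flows_per_sec * idle_timeout_sec


if __name__ == "__main__":
    # Peak of ~1M entries observed with the 40 sec default timeout.
    rate = implied_new_flows_per_sec(1_000_000, 40)
    print(f"implied new flows/sec: {rate:.0f}")
    for timeout in (40, 10, 1):
        entries = estimated_table_entries(rate, timeout)
        print(f"timeout {timeout:>2} sec -> ~{entries:.0f} entries")
```

Under these assumptions, a ~1M peak at 40 sec implies about 25K new flows per second and predicts about 250K entries at 10 sec, which is the same order of magnitude as the 300-400K observed after the change.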

At this point we started collecting data, and we have seen that the DDoS follows a pattern (at least once a day, at roughly the same time). We also got the IPs performing those queries (the top 10 IPs by number of connections at that moment).

The missing piece was the actual queries performed during the high-connection moments. For that we enabled F5 DNS logging, but with so much traffic we could only see 10-15 sec in the F5 logs 😔.

Still, the good part was that on a couple of occasions we were watching at the right moment and were able to get a snapshot of the queries.

 

Yesterday we managed to get the F5 DNS logs forwarded to Graylog, and overnight we captured a peak of ~1M log lines (so I would say around 500K queries, considering there are 2 log lines per query). Exporting the logs for the 5 minutes of this event, we got a list of the top 20 queries, and they are legitimate DNS queries, nothing abnormal.
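For anyone wanting to repeat the top-20 extraction on an exported log file, here is a minimal Python sketch. The log layout is an assumption: it expects each line to carry the queried name in a field like `qname=www.example.com`, which is a made-up format, so the regular expression needs adapting to the real F5/Graylog export.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical field layout; adjust the pattern to the real log export.
QNAME_RE = re.compile(r"qname=(\S+)")


def top_queries(lines: Iterable[str], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent query names found in the log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = QNAME_RE.search(line)
        if match:
            # Normalize case and the trailing root dot so variants count together.
            counts[match.group(1).lower().rstrip(".")] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "src=192.0.2.1 qname=www.example.com. type=A",
        "src=192.0.2.2 qname=WWW.example.com type=AAAA",
        "src=192.0.2.1 qname=mail.example.com type=MX",
        "line without a query name",
    ]
    for name, count in top_queries(sample, n=2):
        print(count, name)
```

If both the request and the response line contain the name, every count doubles, but the ranking stays the same.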

 

So after this whole monologue, my question to you is: has anyone faced a similarly high DNS query volume, and what did you do to prevent it?

While looking for ways to prevent it, we found article K11005751, which might apply to us: with an iRule we can drop requests for FQDNs that don't exist in our environment.

Would that add extra load on resources, considering that the iRule will be executed each time a DNS query arrives at our F5?

 

Thank you,

 

4 Replies

  • Having spent 22 years architecting backbone DNS solutions for global service providers, I would urge you to consider a few things immediately:

    1. Turn off tcp DNS if on. 
    2. UDP timeout can be immediate. When flooded, UDP acts almost like streaming video.. it will blow up each connection as wide as it can and then take every connection possible. You end up with port exhaustion and eventual collapse. Make a new UDP profile and apply it to your DNS VIP(s).
    3. AFM. Immediately. For the price, what AFM brings to the table is immeasurable. If you use an iRule, you will need to open all of your packets at layer 7.. and how many connections are you getting? Your processors will scream. AFM allows you to do the majority of the heavy lifting earlier in the HUD chain.. before a packet ever even hits a VIP. You can filter DNS at the global context... or route domain.
    4. IP Intelligence.. but make sure you're getting your updates on a VLAN that does not feel inbound DOS pain.. EVER. Why IPI? How many of those clients are people? We update IPI every 5 minutes.
    5. Fancier.. Flowspec from your GTM to your perimeter routers. If a DOS event is triggered, blackhole that IP.
    6. Consider calling in F5 SIRT, if it's bad. They will put an easy stop to it for a consulting fee. As Jason mentioned, platforms are fast and easy to implement, if needed.
    • sgogean (Nimbostratus)

      Thank you JRahm and AubreyKingF5 for your responses.

      To answer AubreyKingF5's questions and recommendations (my reply follows each quoted item):

      1. Turn off tcp DNS if on. - our GTM is set to answer UDP only.
      2. UDP timeout can be immediate. When flooded, UDP acts almost like streaming video.. it will blow up each connection as wide as it can and then take every connection possible. You end up with port exhaustion and eventual collapse. Make a new UDP profile and apply it to your DNS VIP(s). - That was the first thing we did back in June/July when we noticed the events. By lowering the UDP 53 timeout from the 40 sec default to 10 sec, we now see ~350K connections (initially we had ~1.2M connections). The UDP 53 timeout was set on the firewall that is in-path between the Internet and the F5 GTM.


      3. AFM. Immediately. For the price, what AFM brings to the table is immeasurable. If you use an iRule, you will need to open all of your packets at layer 7.. and how many connections are you getting? Your processors will scream. AFM allows you to do the majority of the heavy lifting earlier in the HUD chain.. before a packet ever even hits a VIP. You can filter DNS at the global context... or route domain. - We have AFM enabled, but only with a DDoS profile for DNS, and that didn't help much, as the DNS queries are valid ones (A, AAAA or CNAME queries), not bad ones.
        Is it possible to get some documents/recommendations for AFM setup for DNS? (They don't need to be specific to DNS, as we can get an idea from them and build it ourselves.)

      4. IP Intelligence.. but make sure you're getting your updates on a VLAN that does not feel inbound DOS pain.. EVER. Why IPI? How many of those clients are people? We update IPI every 5 minutes. - We can certainly start looking into this too and enable certain filters based on IP Intelligence. As with AFM, we would appreciate some documents/recommendations for IP Intelligence setup in general.
        The thing is, the IPs that reach us are not bad. For example, I searched for the ones from this morning (194.226.75.82, 194.226.75.83, 212.12.0.2, 173.212.200.42, 213.136.95.11, 193.232.160.48, 194.226.75.81, 193.232.160.51, 5.148.43.44, 193.232.231.81) on different lists, and they are OK IPs.

        [or from an "attack" a week ago 194.226.75.81, 194.226.75.83, 193.232.230.82, 193.232.230.81, 194.226.75.82, 193.232.160.48, 193.232.160.50, 193.232.160.49, 193.232.160.51, 193.232.231.81] 
        This is why we were not looking into getting those IPs blocked on our firewall, as others would be used next. We wanted to identify the exact attack and drop that specifically. Since all the queries are normal DNS queries, we thought that an iRule that drops queries for records not existing in our GTM would be a better approach 🙂.

      5. Fancier.. Flowspec from your GTM to your perimeter routers. If a DOS event is triggered, blackhole that IP. - Right. That is why we worked to get the DNS logs out of the F5 so that we can filter through them: with some intelligence, we would extract all the IPs that get a REFUSED response back within 1 min or so, and if that happens more than 50 or 100 times a minute, we would block that IP for a certain period. It's still a work in progress, but we're close 🙂.

      6. Consider calling in F5 SIRT, if it's bad. They will put an easy stop to it for a consulting fee. As Jason mentioned, platforms are fast and easy to implement, if needed. - Thank you for the recommendation. We're not in a critical position now, as we have it contained, if we can say so 🙂.
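The REFUSED-rate idea from item 5 could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the input is taken to be already-parsed `(timestamp_seconds, source_ip, rcode)` tuples (a hypothetical shape, not the actual F5 log format), the minute windows are fixed rather than sliding, and the blocking step itself is left out.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed event shape: (unix timestamp in seconds, source IP, DNS response code).
Event = Tuple[float, str, str]


def refused_offenders(
    events: Iterable[Event], threshold: int = 50, window_sec: int = 60
) -> Set[str]:
    """Return source IPs that got more than `threshold` REFUSED answers in any window."""
    per_window: Counter = Counter()
    for ts, src, rcode in events:
        if rcode.upper() == "REFUSED":
            # Bucket each event into a fixed window of `window_sec` seconds.
            per_window[(src, int(ts // window_sec))] += 1
    return {src for (src, _), count in per_window.items() if count > threshold}


if __name__ == "__main__":
    events = (
        [(i, "198.51.100.7", "REFUSED") for i in range(51)]    # over the limit
        + [(i, "198.51.100.8", "REFUSED") for i in range(50)]  # exactly at the limit
        + [(i, "203.0.113.9", "NOERROR") for i in range(60)]   # valid answers only
    )
    print(refused_offenders(events, threshold=50))
```

The resulting set could then feed whatever does the temporary blocking, such as a firewall blocklist or the Flowspec announcement mentioned above.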

      So, our takeaway from all this is to look into AFM and IP Intelligence, and in the end (if that doesn't give the expected results) we could go with the iRule we talked about.

      If there are any other ideas, please share.

      Thank you and have a great weekend,

      • AubreyKingF5 (Admin)

        Awesome! To clarify #2, I would recommend setting your UDP timeout to "Immediate", not 10. 1 UDP packet takes less than a ms to complete.  Like much less. I've personally watched a single UDP profile on F5 DNS handle 3M PPS.. ON A VE !! (High Performance VE, but still.)

         

        Regarding the AFM, there's not a ton besides the manual, but this lab guide may help:

        https://clouddocs.f5.com/training/community/firewall/html/class2/class2.html

         

        Also, here is a snippet of tmsh from a service provider grade DNS AFM DoS policy that I worked on for an example:

        tmsh modify security dos device-config dos-device-config dos-device-vector { sweep { detection-threshold-percent 500 detection-threshold-pps 250 per-source-ip-limit-pps 500 packet-types replace-all-with { dns-a-query dns-aaaa-query dns-any-query dns-axfr-query dns-cname-query dns-ixfr-query dns-mx-query dns-ns-query dns-other-query dns-oversize dns-ptr-query dns-response-flood dns-soa-query dns-srv-query dns-txt-query udp } } }

         

        The idea is to build a policy that disallows all undesired query types, LIMITS all desired query types (flooding is dropped) and then defends against all UDP flooding.  You may want to carve that up, too.. so UDP flooding cut off globally (like.. why not?) and the dns-specific dos profile attached to the VIP(s).

  • iRules are processed first in line, ahead of the other services, so while there is always some impact, it will be negligible per connection compared to the other services processing those requests. It is a well-known and helpful approach to managing the issues you are seeing.

    That said, the local system is still going to get all those requests, even if you're dropping them. A service like F5 Distributed Cloud DDoS Mitigation could scrub these before the requests hit your local services if the volume becomes too great for your equipment or pipes to manage.