How to make DNS resolutions in an HTTP request event work FOR you, not against you
A couple of weeks ago, a colleague messaged me about an interesting challenge. The customer's existing iRule had a two-fold problem:
- The rule used the RESOLV::lookup command, which still works but has been deprecated since version 15.1.
- The rule pulled the FQDN to resolve from an HTTP header, which meant that every request from a matching client address triggered a DNS resolution.
I wrote an article a while back introducing the replacement for RESOLV::lookup: the RESOLVER::name_lookup command. Using it requires a DNS resolver configuration object, which is covered in the linked article. That change had to be made, but it is not the interesting part. The second challenge is where the wheels start turning. This customer did not have any of the advanced DNS modules that would have made this trivial (and more robust). Ultimately, we needed a cache that accomplishes two things: first, it reduces the number of queries sent to the resolver, and second, it limits how long an answer is kept so we don't act on stale data. Thankfully, the table command, which stores session data in system memory, is the perfect tool in the toolbox for this job.
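To make those two goals concrete, here is a minimal sketch of a cache-aside lookup with a fixed lifetime. It is plain Tcl (runnable in tclsh 8.5 or later), not an iRule: the resolve proc is a hypothetical stub standing in for a real DNS query, and the dict-based cache only approximates what the table command provides on BIG-IP. The current time is passed in explicitly so the behavior is easy to trace.

```tcl
# Plain-Tcl sketch of cache-aside DNS lookups with a fixed lifetime.
# NOT an iRule: resolve is a hypothetical stub for a real DNS query, and
# the dict below only approximates what the BIG-IP table command provides.

set cache [dict create]   ;# fqdn -> {records expiresAt}
set queries 0             ;# number of times the stub resolver was called

# Hypothetical resolver stub: counts calls and returns a fixed answer.
proc resolve {fqdn} {
    global queries
    incr queries
    return [list "$fqdn. 300 IN A 192.0.2.10"]
}

# Return records for fqdn, calling the resolver only when there is no
# cached entry or the cached entry is older than its lifetime (seconds).
proc cached_lookup {fqdn now {lifetime 30}} {
    global cache
    if {[dict exists $cache $fqdn]} {
        lassign [dict get $cache $fqdn] records expiresAt
        if {$now < $expiresAt} {
            return $records        ;# cache hit: no query sent
        }
    }
    set records [resolve $fqdn]    ;# cache miss or expired entry
    dict set cache $fqdn [list $records [expr {$now + $lifetime}]]
    return $records
}

# 100 requests for the same FQDN, spread evenly over 60 seconds.
for {set i 0} {$i < 100} {incr i} {
    cached_lookup app.example.com [expr {$i * 60 / 100}]
}
puts "resolver queries: $queries"   ;# prints "resolver queries: 2"
```

Without the cache, those 100 requests would mean 100 queries; with a 30-second lifetime, this single-threaded sketch sends one query at t=0 and one more at t=30, when the first entry expires.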
First, let's take a look at the original rule and see what it was doing.
Original Rule
when HTTP_REQUEST priority 500 {
    if { [class match -- [IP::client_addr] equals targetIPs] } {
        set lb_server [HTTP::header "be_server"]
        set lb_port [HTTP::header "be_server_port"]
        HTTP::header replace "Host" "$lb_server"
        set dest [lindex [RESOLV::lookup @10.10.10.10 -a "$lb_server"] 0]
        if { $static::debug == 1 } {
            # log local0. "be_server--> $lb_server"
            # log local0. "be_server_port--> $lb_port"
            # log local0. "Resolved Destination IP--> $dest"
        }
        if { [scan $dest {%d.%d.%d.%d} a b c d] == 4 } {
            node $dest $lb_port
        }
    } else {
        HTTP::redirect "https://www.google.com"
    }
}
The only iRules event in play here is HTTP_REQUEST (save for RULE_INIT to set the debug variable, but that isn't relevant to our discussion). When I first looked at the iRule, the fact that the FQDN to resolve was buried in an HTTP header was lost on me, and I thought I could tackle this problem in CLIENT_ACCEPTED. But alas, no can do: CLIENT_ACCEPTED fires when the client connection is established, before any HTTP request has been parsed, so the header isn't available there.
The iRule starts with a conditional that matches clients of interest against the targetIPs data group; all other clients are redirected to Google. If there's a match, the FQDN and the port are pulled out of the be_server and be_server_port headers, and the HTTP Host header is replaced, switching it from the virtual server's FQDN to the one in the be_server header.
The next line is where the big action in this iRule takes place. A DNS query is sent to server 10.10.10.10 for the A record matching the FQDN from the be_server header, and the first of the returned addresses is stored in the dest variable. Because this happens on every matching request, each of those requests waits on a DNS round trip.
Finally, if scan successfully parses four integers out of the dest variable (a quick check that it looks like an IPv4 address), the request is sent to that IP address on the port specified in the be_server_port header.
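To see what that condition does and does not catch, here is a plain-Tcl illustration (runnable in tclsh, outside BIG-IP). The helper name looks_like_ipv4 is hypothetical; the scan call is the same one the iRule uses.

```tcl
# Plain-Tcl illustration of the iRule's scan check:
# scan returns how many %d conversions succeeded (or -1 on empty input).
proc looks_like_ipv4 {value} {
    # a b c d receive the four octets; only the conversion count matters here
    return [expr {[scan $value {%d.%d.%d.%d} a b c d] == 4}]
}

foreach candidate {192.0.2.10 app.example.com 999.1.1.1 10.1.2.3junk} {
    puts "$candidate -> [looks_like_ipv4 $candidate]"
}
# 192.0.2.10 -> 1
# app.example.com -> 0
# 999.1.1.1 -> 1
# 10.1.2.3junk -> 1
```

The check rejects hostnames and empty results, but it does not enforce octet ranges or reject trailing text, so it is a coarse filter rather than full address validation.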
This is a super useful iRule, but we're here to improve it! The good news is that only one line in this iRule (the RESOLV::lookup one) needs to change, even though its replacement takes more lines. Let's take a look at the changes.
Final Rule
when HTTP_REQUEST priority 500 {
    if { [class match -- [IP::client_addr] equals targetIPs] } {
        set lb_server [HTTP::header "be_server"]
        set lb_port [HTTP::header "be_server_port"]
        HTTP::header replace "Host" "$lb_server"
        # COMMENTING OUT this line
        # set dest [lindex [RESOLV::lookup @10.10.10.10 -a "$lb_server"] 0]
        # START NEW CODE
        # Check the local cache first; -notouch keeps this read from resetting the entry's timeout
        set local_cache_lookup [table lookup -notouch -subtable dns_cache -- $lb_server]
        if { $local_cache_lookup == "" } {
            # Cache miss: query the resolver, then cache the summarized records
            # with lb_server as the key and a 30-second timeout
            set resolver_lookup [RESOLVER::name_lookup "/Common/resolver1" $lb_server a]
            set records [RESOLVER::summarize $resolver_lookup]
            table set -subtable dns_cache -- $lb_server $records 30
            # Randomly pick one of the freshly resolved records
            set dest [lindex $records [expr {int(rand()*[llength $records])}]]
        } else {
            # Cache hit: randomly pick one of the cached records
            set dest [lindex $local_cache_lookup [expr {int(rand()*[llength $local_cache_lookup])}]]
        }
        # Keep only the address field (index 4) of the selected record
        set dest [lindex $dest 4]
        # END NEW CODE
        if { $static::debug == 1 } {
            # log local0. "be_server--> $lb_server"
            # log local0. "be_server_port--> $lb_port"
            # log local0. "Resolved Destination IP--> $dest"
        }
        if { [scan $dest {%d.%d.%d.%d} a b c d] == 4 } {
            node $dest $lb_port
        }
    } else {
        HTTP::redirect "https://www.google.com"
    }
}
We start by commenting out the line we're going to replace. Next, the local cache is checked for an entry matching the FQDN provided in the be_server header. The -notouch flag keeps this read from resetting the entry's timeout, so even a frequently requested FQDN is re-resolved once its 30 seconds are up.
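Why -notouch matters depends on how table timers work. My understanding (worth confirming against the table command reference) is that the number passed to table set is an idle timeout, that an ordinary table lookup refreshes it, and that table lookup -notouch does not. The plain-Tcl model below is not the real table implementation; the proc name simulate and the 5-second read interval are hypothetical. It shows what happens to an entry with a 30-second idle timeout that is read every 5 seconds.

```tcl
# Plain-Tcl model (NOT the real table implementation) of a cache entry with
# an idle timeout, showing why reads that "touch" the entry keep it alive.
proc simulate {touchOnRead} {
    set timeout 30
    set lastTouched 0                  ;# entry written at t=0
    # One read every 5 seconds for 2 minutes.
    for {set t 5} {$t <= 120} {incr t 5} {
        if {$t - $lastTouched >= $timeout} {
            return "expired at t=$t"
        }
        if {$touchOnRead} {
            set lastTouched $t         ;# a touching read resets the idle timer
        }
    }
    return "still cached at t=120"
}

puts "lookup (touch):  [simulate 1]"   ;# still cached at t=120
puts "lookup -notouch: [simulate 0]"   ;# expired at t=30
```

With touching reads, a busy FQDN would never be re-resolved; with -notouch, the entry ages out 30 seconds after it was written, which is what bounds the staleness.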
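If there is no current value, a lookup is performed with the newer RESOLVER::name_lookup command. Its result, a raw DNS response that may contain any number of A records, is converted to a human-readable list with the RESOLVER::summarize command, and that list is stored in the local cache with a 30-second timeout. Strictly speaking, the number passed to table set is an idle timeout rather than a lifetime, but because every read uses -notouch, the entry expires 30 seconds after it is written.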
The next line (and its counterpart in the else branch, which works on the already-summarized cached list) looks complicated, but all it does is pick a random member of the record list: a Tcl expression generates a random index from zero to the list length minus one, and lindex returns the record at that index. This spreads requests across all returned addresses instead of always using the first one.
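Here is the same random pick in isolation, as plain Tcl runnable in tclsh. The proc name pick_random and the sample records are hypothetical; the expression is the one the iRule uses, with an explicit guard for an empty list added for clarity.

```tcl
# Plain-Tcl version of the iRule's random pick: rand() returns a value in
# (0, 1), so int(rand() * n) is an index from 0 to n-1.
proc pick_random {items} {
    set n [llength $items]
    if {$n == 0} { return "" }   ;# guard: nothing to choose from
    return [lindex $items [expr {int(rand() * $n)}]]
}

set records {
    {app.example.com. 30 IN A 192.0.2.10}
    {app.example.com. 30 IN A 192.0.2.11}
    {app.example.com. 30 IN A 192.0.2.12}
}
puts [pick_random $records]      ;# one of the three records, chosen at random
```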
Finally, the last line of the changes extracts the IP address from the selected DNS resource record. Each summarized record carries the name, TTL, class, and type ahead of the address, so lindex $dest 4 keeps only the address field.
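Here is a plain-Tcl sketch of that extraction, assuming each summarized record has the layout {name TTL class type data}, which is the layout the lindex $dest 4 line relies on. The proc name a_record_addresses and the sample records are hypothetical. It also filters on the type field, since the data field of any non-A record (a CNAME, for example, if one ever appears in the summarized answer) would be a hostname rather than an IP address.

```tcl
# Plain-Tcl sketch: pull the address (field 4) out of summarized records.
# Assumed record layout: {name TTL class type data}.
proc a_record_addresses {records} {
    set addrs {}
    foreach rr $records {
        # Keep only A records; a CNAME's data field is a hostname, not an IP
        if {[lindex $rr 3] eq "A"} {
            lappend addrs [lindex $rr 4]
        }
    }
    return $addrs
}

set records {
    {www.example.com. 30 IN CNAME app.example.com.}
    {app.example.com. 30 IN A 192.0.2.10}
    {app.example.com. 30 IN A 192.0.2.11}
}
puts [a_record_addresses $records]   ;# 192.0.2.10 192.0.2.11
```

In the iRule itself, the scan check already keeps a non-address value from reaching the node command, so filtering like this is an optional refinement rather than a required fix.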
The Result
You can probably imagine that the waits for DNS resolution were noticeable to users, and the query load was noticeable to the teams managing the BIG-IP systems. When I talked with my colleague afterward, he noted that these simple changes cut DNS resolutions from the iRule by 95% and improved front-end answer times to mere microseconds. Not bad at all!
Conclusion
Sometimes optimization is a fool's errand, with countless hours spent trying to eke out minimal cycle reductions. But other times, there are opportunities to significantly reduce system load and user response times by taking a step back to see where a different approach might help. Hopefully this exercise will help you evaluate where the return on investment might be in your own iRules codebase! Happy coding, community!