Forum Discussion

Jimmy_87630
Jul 13, 2009

Possible Design Issue?

Hello,

I appreciate any ridicule and/or assistance.

I have an LTM 6800 operating in a carrier-class ISP environment, with six UNIX-based SMTP servers sitting behind it. These mail servers simply act as relays for our customers' mail. The servers are numbered out of private IP space, while the virtual server on the LTM is numbered out of public space. The virtual server listens on TCP 25 and balances the traffic across the six SMTP relay servers.

Each SMTP relay server also runs a local DNS resolver, which the relay uses for all DNS queries required to send email.

I have a forwarding virtual server configured for all protocols to handle outbound traffic, and a SNAT configured for EACH relay server so that each relay is uniquely identifiable to destination SMTP servers.

Recently I've run into two situations where the local DNS resolvers on the relays were unable to communicate properly with CERTAIN external DNS servers. This, of course, left our customers unable to send email to domains hosted by those providers. Initially I thought I was dealing with a broken remote DNS server. However, I have now seen a couple of these issues, and the most recent involves a rather large DNS provider (worldwidedns.net).

Upon investigation, I discovered that the SMTP relay servers were discarding DNS responses from CERTAIN external DNS servers because of a bad UDP checksum. However, the F5 itself, and all other devices NOT behind the 6800, had no problem querying that same server.
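
For anyone wondering why the relays drop these replies silently: the UDP checksum (RFC 768) covers an IPv4 pseudo-header containing the source and destination addresses, not just the UDP header and payload. If a device in the path rewrites an address (as the SNAT does for replies coming back to a relay) without correcting the checksum, the receiving host recomputes it, gets a mismatch, and discards the datagram before the resolver ever sees it. Here's a minimal Python sketch of that receiver-side check. The remote server address 192.0.2.53, the ports, and the payload are made-up examples; the relay and SNAT addresses are just the example ones from the iRule in this thread.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src addr, dst addr, zero, protocol (17), UDP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """Build a UDP segment (header + payload) carrying a correct checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)  # checksum zeroed
    summed = ones_complement_sum(pseudo_header(src, dst, length) + header + payload)
    csum = (~summed & 0xFFFF) or 0xFFFF  # a computed 0 is sent as 0xFFFF
    return header[:6] + struct.pack("!H", csum) + payload


def checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver-side check, using the addresses the packet arrives with."""
    if segment[6:8] == b"\x00\x00":
        return True  # checksum not used (permitted for UDP over IPv4)
    total = ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment)
    return total == 0xFFFF


# Reply from a (hypothetical) remote DNS server to the relay's SNAT address.
reply = build_udp("192.0.2.53", "24.24.24.11", 53, 33000, b"example dns answer")
print(checksum_ok("192.0.2.53", "24.24.24.11", reply))  # True
# Destination rewritten to the relay's private address, checksum NOT adjusted:
print(checksum_ok("192.0.2.53", "172.28.6.11", reply))  # False
```

The second check is what the relay's stack would see if the translation changed the destination address but left the checksum stale.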

I have all the tcpdump and packet trace data showing the problem, but before I start blinding the group with verbose tcpdump/snoop output, I'd like to know whether the design itself might be limiting interoperability with other DNS servers. I have been unable to get some of the remote admins to take a serious look and help with troubleshooting, since the problem is clearly on my end. I have verified that the problem exists on Solaris 10, 9, and 8, as well as on Red Hat Linux.

I suspect certain remote DNS providers are using load balancers of some sort, and the UDP packet is getting broken at the F5 on my end during the final address translation.
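
For context on how a translation can break the packet: a NAT/SNAT device usually doesn't recompute the whole checksum; it adjusts the existing one incrementally for each 16-bit word it rewrites, per RFC 1624 (HC' = ~(~HC + ~m + m')). If that adjustment is skipped or done wrong in some forwarding path, the packet leaves with a checksum that still matches the old addresses. The Python sketch below shows only the arithmetic, using hypothetical addresses (the server 192.0.2.53 is made up; the others are the example relay/SNAT addresses from this thread). It is not a description of how the LTM or its PVA hardware actually performs translation.

```python
import socket
import struct


def fold(x: int) -> int:
    """Fold carries back into 16 bits (one's-complement end-around carry)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def internet_checksum(data: bytes) -> int:
    """Full RFC 1071 checksum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    return ~fold(sum(w for (w,) in struct.iter_unpack("!H", data))) & 0xFFFF


def incremental_update(old_csum: int, old_words, new_words) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied per rewritten word."""
    total = ~old_csum & 0xFFFF
    for m, m_new in zip(old_words, new_words):
        total += (~m & 0xFFFF) + m_new
    return ~fold(total) & 0xFFFF


def addr_words(ip: str):
    """The two 16-bit words of an IPv4 address as they appear in the pseudo-header."""
    return list(struct.unpack("!HH", socket.inet_aton(ip)))


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header used in the UDP checksum."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))


# Hypothetical DNS reply: UDP header (checksum field zeroed) + payload.
payload = b"example dns answer"
segment = struct.pack("!HHHH", 53, 33000, 8 + len(payload), 0) + payload
server, public_dst, private_dst = "192.0.2.53", "24.24.24.11", "172.28.6.11"

before = internet_checksum(pseudo_header(server, public_dst, len(segment)) + segment)
# What a correct translation to the private address should produce...
after_full = internet_checksum(pseudo_header(server, private_dst, len(segment)) + segment)
# ...and the cheap incremental adjustment a translating device would use.
after_incr = incremental_update(before, addr_words(public_dst), addr_words(private_dst))

print(after_incr == after_full)  # True: adjusted checksum matches a full recompute
print(before == after_full)      # False: address rewritten, checksum left stale
```

The incremental form touches only the words that changed, which is why it's attractive in a fast path, and also why a path that skips it produces exactly the "bad UDP checksum" symptom described above.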

Is there something else I should have configured so that the DNS resolvers on the SMTP relay servers can properly query ALL Internet DNS servers?

If anyone wants to look at the actual configuration and/or the results of my exhaustive troubleshooting, I'll be happy to provide them.

I appreciate the assistance, and I'm looking forward to giving back to the group.

-jimmy..
  • I'm posting relevant bits of the LTM configuration.

         
     virtual SMTP {
        pool smtp
        destination 24.24.24.254:smtp
        ip protocol tcp
        rules SMTP_RATE_LIMIT
     }
     virtual VS_OUT_226 {
        ip forward
        destination any:any
        mask none
        vlans Relay_Services
        external-relay enable
        rules RELAY_SERVICES_OUT
     }
     

    Here's the iRule I use to set up the SNATs that give each SMTP server its public IP. I was planning to use the Hosts/Svcs data groups to bypass the SNAT for DNS traffic; currently those data groups are empty.

         
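     rule RELAY_SERVICES_OUT {
        when CLIENT_ACCEPTED {
           # Hosts/Svcs data groups (currently empty) are meant to let listed
           # relays send DNS (UDP) traffic out without a SNAT.
           if { [matchclass [IP::client_addr] equals $::Hosts] &&
                [IP::protocol] == 17 &&
                [matchclass [UDP::local_port] equals $::Svcs] } {
              # Bypass the SNAT and just forward to the upstream gateway.
              node 24.24.24.1
           } else {
              # Give each relay its own public address so remote SMTP
              # servers can tell the relays apart.
              switch [IP::client_addr] {
                 172.28.6.11 { snat 24.24.24.11 }
                 172.28.6.12 { snat 24.24.24.12 }
              }
              node 24.24.24.1
           }
        }
     }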
     
  • SOLVED.

    If you find yourself losing UDP packets from certain hosts, try disabling PVA acceleration in the FastL4 profile.

    That worked for me. I've opened a case and am actively submitting test data.

  • Be sure to ask whether there is a CR open for this behavior; I recall there may be one for PVA-enabled UDP traffic like this. If one exists, request that your case be attached to it so the issue is more likely to get resolved.

    -MC
  • This issue bit us yesterday, and we'll be opening a ticket. Which CR should we ask for it to be attached to?

    Thanks for posting the workaround!