Forum Discussion

Dazzla_20011
Nimbostratus
May 24, 2011

F5 GTM DNS persistence

Hi,

 

 

Has anyone any experience implementing DNS persistence on the GTMs? We identified an issue with our current configuration and were recommended to split our LTMs from an active-standby pair into two independent LTMs. Since doing this we've encountered problems with DNS flipping during a session and redirecting a user to a different data centre, and therefore a different server. Before the change this didn't matter, because we were using one pool and therefore one source IP persistence table, so a user was directed to the same server no matter which data centre they connected in from. We have layer 2 links between our DCs, so it is possible for us to have servers located in different data centres in the same F5 pool.

 

 

To get around this problem we've been advised to use DNS persistence, so that a user will be directed back to the same data centre when the DNS TTL expires. Has anyone any experience with this, and what potential problems could we encounter? I'm conscious that we have no control over which DNS servers a user hits, so in my mind there's still a chance a user could be flipped from one data centre to another, which we cannot afford.
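
For reference, here's roughly how I understand the persistence we'd be relying on (a simplified Python sketch of LDNS-keyed persistence with a TTL, not F5's actual implementation; the data centre names and TTL value are made up):

    import time
    import hashlib

    DATA_CENTRES = ["dc1", "dc2"]   # hypothetical data centre identifiers
    PERSIST_TTL = 3600              # assumed persistence record lifetime, in seconds
    persist_table = {}              # LDNS address -> (data centre, expiry time)

    def resolve(ldns_ip):
        """Return the data centre answer for a resolver, reusing a live persistence record."""
        now = time.time()
        record = persist_table.get(ldns_ip)
        if record and record[1] > now:
            return record[0]        # persistence hit: same data centre as last time
        # No record (or it expired): pick a data centre, here by hashing the LDNS address
        digest = hashlib.md5(ldns_ip.encode()).digest()
        dc = DATA_CENTRES[digest[0] % len(DATA_CENTRES)]
        persist_table[ldns_ip] = (dc, now + PERSIST_TTL)
        return dc

    # The weakness I'm worried about: if the client's queries arrive via a
    # different LDNS (e.g. load-balanced resolvers), the key changes and the
    # user can land in the other data centre mid-session.
    print(resolve("203.0.113.10"))    # first resolver
    print(resolve("198.51.100.20"))   # different resolver -> possibly a different DC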

 

 

To get around my problem I'm thinking of reverting to an active-standby LTM setup. I can fool the GTMs into thinking we have two active LTMs by NATing a public IP address at each data centre back to the private real address of the LTMs. I would also need to NAT the source address of each GTM for iQuery purposes. Would anyone know if NATing the source IP of the GTM could cause any problems with iQuery?

 

 

Any advice would be very much appreciated, as we seem to be implementing different fixes which in turn cause additional problems.

 

 

 

Many Thanks

 

Darren
  • We ran for about 8 years with a dual-site load-balancing architecture for our Internet-accessible web sites. We started with 3-DNS engines and BIG-IP at version 3.x, went through 4.5, and are now at version 9.4.8 GTM and LTM. We used persistence at the DNS level and simple source IP persistence at the LTM (or BIG-IP) level. That worked fine for the first few years, until load balancing of local DNS servers started happening and it became more common for a user's LDNS to change in the middle of their web session. That could be alleviated by increasing the TTL, but the trade-off is more downtime for a user if you have a problem at one datacenter and they keep coming back to it until the TTL expires and they ask again which datacenter to go to.

     

     

    As a stopgap we applied topology records that basically split the Internet in half and went with a quasi active/standby arrangement for each half, making each datacenter primary for one of the halves. That seemed to work fairly well. It was still technically possible for someone to be using an LDNS from the lower half of the IP range and then come from one in the upper range, but it was much less likely.
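
    Conceptually, the split worked something like this (an illustrative Python sketch, not our actual topology records; the boundary address and datacenter names are made up):

        import ipaddress

        # Hypothetical boundary: LDNS addresses below 128.0.0.0 prefer one DC,
        # everything else prefers the other (each DC still backs up its peer).
        BOUNDARY = ipaddress.ip_address("128.0.0.0")

        def preferred_datacenter(ldns_ip):
            """Pick the primary datacenter for a resolver based on which half
            of the IPv4 space its address falls in."""
            if ipaddress.ip_address(ldns_ip) < BOUNDARY:
                return "east"
            return "west"

        print(preferred_datacenter("24.30.1.5"))     # lower half -> east
        print(preferred_datacenter("203.0.113.9"))   # upper half -> west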

     

     

    We then (about two years ago) closed our second in-region datacenter, moved a lot of the equipment to our one remaining datacenter in the region, and opened an out-of-region datacenter about 1,500 miles away. The latency involved with that connection made a lot of the apps unusable when the app servers were in the out-of-region datacenter and the database was in the primary, so we are in a true active/standby mode now.

     

     

    I have read some of the material F5 is coming out with on global topology to try to address the issue of LDNS persistence, but I haven't had the time to pursue it yet. We are starting to get more pressure to make use of the valuable resources in our out-of-region datacenter for more than just insurance against a disaster, though, so I'm going to need to make the time...

     

     

    Regards,

     

     

    Mark
  • Hey guys, I ran into this very problem a couple of years back. Persistence worked just great at the WideIP level for a while, then it seemed we were getting more and more complaints about bouncing from one DC to the next. After some investigation, yep, you guessed it: load balancing of LDNSes. And the problem with persistence at the WideIP level is that it's based on the full 32-bit address of the LDNS.

     

     

    I actually put in a feature request a while back so you could specify the CIDR block, i.e. persist on the first 8 bits, 16 bits, etc. This was back in version 9.3.x, I believe, and I don't believe it has been resolved. Has it? If you see this as useful, could you please +1 the feature request? I'll try to dig up the ID, or maybe someone internal will see this post and dig it up for us ;)

     

     

    So your solution is to use Static Persist at the pool level, where you CAN specify the CIDR. You specify it under System > Configuration > Global Traffic > General > Static Persist CIDR (IPv4) & (IPv6). I do wish this were not a global setting and could instead be specified at the pool level.

     

     

    Now that will absolutely work for you, but you need to understand how it works and the trade-offs...

     

     

    Static Persist does an MD5 hash over the pool member makeup and the LDNS address (masked to the configured CIDR), and then returns the SAME answer every time for that LDNS, unless the pool makeup changes or that member is unavailable. There is no persistence table with this logic.
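
    Roughly speaking, the selection works like the sketch below (a Python approximation for illustration only, not F5's actual hashing; the prefix length and member names are made up):

        import hashlib
        import ipaddress

        STATIC_PERSIST_CIDR = 16   # assumed value of the global Static Persist CIDR (IPv4) setting

        def pick_member(ldns_ip, available_members):
            """Stateless selection: the same LDNS prefix plus the same pool makeup
            always maps to the same member, with no persistence table to maintain."""
            # Mask the LDNS address down to the configured prefix length
            net = ipaddress.ip_network(f"{ldns_ip}/{STATIC_PERSIST_CIDR}", strict=False)
            key = str(net.network_address)
            # Hash the masked LDNS together with the (sorted) available pool members
            data = key + "|" + "|".join(sorted(available_members))
            digest = hashlib.md5(data.encode()).digest()
            return available_members[int.from_bytes(digest[:4], "big") % len(available_members)]

        members = ["vs-dc1", "vs-dc2"]                 # hypothetical members, one per data centre
        print(pick_member("203.0.113.10", members))    # resolvers in the same /16 ...
        print(pick_member("203.0.77.200", members))    # ... get the same answer

    Because nothing is stored, the answer only changes when the pool makeup or member availability changes.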

     

     

    With all of that in mind, you may not see the most even split of traffic. In my experience it was never too bad, around 60/40, and the business was very happy to have a fix, but it has the potential to be worse, so you need to take that into account and share the info. Still, if persistence is what you need, it seems to be the best option without using topology record logic.

     

     

     

  • Hello,

     

    We are running into the same issue. Let us know if F5 has a recommended solution for LDNS persistence.

     

     

    Regards,

     

    Karthik