Forum Discussion

Mahi
Nov 17, 2019

Flapping iQuery, have to clear ARP to make it work

Hello Everyone,

 

Scenario:

It's a very simple scenario to explain: three data centers, each with a pair of standalone GTMs and a pair of LTMs in HA. Full-mesh iQuery exists from the GTMs in each data center to all the LTMs in every DC. The GTMs in a DC use the LTMs' floating IP as the default gateway, and the required forwarders exist on all the LTMs to allow full-mesh iQuery among all the LTMs and GTMs.

 

Issue:

On the GTM of DC1, we see iQuery flapping towards all the GTMs and LTMs of DC2 and DC3. The same thing happens on the GTMs at DC2 and DC3 at a later time. GSLB servers are marked up for a few minutes and then intermittently marked down.

 

Workaround:

When we see the servers marked down, deleting the current ARP entry on the GTM for the default gateway brings the servers back up. The ARP entry resolves to the valid MAC address belonging to the floating IP on the LTM. Deleting the ARP entry does not change the MAC address.

 

Please let me know if you can suggest a permanent solution or a better workaround. Thanks in advance.

 

 

 

  • If you can see that the ARP table entry for the default gateway on the GTM is correct, but deleting and re-ARPing for that address resolves the problem, it sounds to me like a switch issue (corrupt CAM table entry).

     

    Does it happen as the result of a failover between the HA LTMs?

    Are you using MAC Masquerade on the HA LTMs?

  • It is not due to HA failover; we tried a manual failover, but it didn't affect the GTMs. The iQuery connection goes down, the GSLB servers are marked red, and then they come back after some time.

     

    I tried configuring a static ARP entry, but that didn't resolve the issue.

     

    I configured MAC Masquerade, but the issue persists. I am clearing the ARP entry once in a while until TAC finds a resolution.

     

    Can you suggest a procedure to delete the ARP entry via a cron job that runs hourly? Can we spin up something like that and run it every 10 minutes or so?

     

    Code we are running is 11.5.1 HF1, it seems to be buggy.

  • > Code we are running is 11.5.1 HF1, it seems to be buggy.

     

    It's a really old version; you really should update it. But the issue you describe does not seem to be a BIG-IP issue.

     

    > Can you suggest a procedure to delete the ARP entry via a cron job that runs hourly? Can we spin up something like that and run it every 10 minutes or so?

     

    Create a bash script in /etc/cron.hourly to delete the ARP entry.

     

    • Mahi

      Thank you. Can you suggest a few lines of code that I can put in a file under the cron.hourly directory? I added the following line, but it does not seem to work:

      Tmsh delete net arp 192.168.1.1

  • /usr/bin/tmsh -q -c "delete /net arp 192.168.1.1"

    Make sure that you set your bash script to be an executable file:

    chmod a+x /etc/cron.hourly/del_arp.sh
    • Mahi

      I see. How would I check whether the cron job executed successfully?
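
One possible way to make the hourly job verifiable is to have the script record its own result on every run. The following is a minimal sketch, not a tested BIG-IP procedure: it wraps the tmsh command suggested in the thread and appends one line per run to a log file. The function name `delete_gateway_arp`, the variables `GATEWAY_IP`, `TMSH`, and `LOG_FILE`, and the log path `/var/log/del_arp.log` are all assumptions made for illustration; 192.168.1.1 is the example address used in the thread.

```shell
#!/bin/bash
# Hypothetical /etc/cron.hourly/del_arp.sh: delete the gateway ARP entry
# and log the outcome. All names and paths below are illustrative defaults.

GATEWAY_IP="${GATEWAY_IP:-192.168.1.1}"        # example address from the thread
TMSH="${TMSH:-/usr/bin/tmsh}"                  # tmsh path as given in the thread
LOG_FILE="${LOG_FILE:-/var/log/del_arp.log}"   # assumed log location

delete_gateway_arp() {
    local output status
    # Run the delete command and capture both its output and its exit status.
    output="$("$TMSH" -q -c "delete /net arp $GATEWAY_IP" 2>&1)"
    status=$?
    # Append one line per run: timestamp, target, exit status, tmsh output.
    echo "$(date '+%Y-%m-%d %H:%M:%S') delete arp $GATEWAY_IP exit=$status $output" >> "$LOG_FILE"
    return "$status"
}

# Anything written to stderr by a cron job is normally mailed to the job owner.
if ! delete_gateway_arp; then
    echo "del_arp: tmsh reported a failure, see $LOG_FILE" >&2
fi
```

After the next scheduled run, `tail /var/log/del_arp.log` should show a new timestamped line; `exit=0` means tmsh reported success, and any other value is followed by the error text tmsh printed. Note that files in /etc/cron.hourly run once an hour. For a 10-minute interval, the usual cron approach is a crontab entry with the schedule `*/10 * * * *` pointing at the script, wherever you choose to store it.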