LTM VE virtual server intermittently unreachable
Hello,
We run an HA cluster of two LTMs on version 13.1.1.5. Until recently the LTMs were deployed as vCMP guests on a VIPRION 2400 platform.
In early 2022 we migrated the LTMs from the VIPRION to LTM VE units running on VMware ESX. We did not encounter any issues during the migration (we basically used the RMA process to replace the units one by one with LTM VEs).
Basic setups (VIP + TCP port) are in place for some applications on this cluster: a virtual server with SNAT towards the pool members, plus MAC masquerading, i.e. a "fake" MAC address shared by the cluster. To make this work we set "Promiscuous Mode", "Forged Transmits" and "MAC Address Changes" to "Accept" on the VMware port groups.
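For reference, the setup roughly boils down to something like the sketch below (shown here via iControl REST with Python requests; the management address, credentials, object names, VIP and masquerade MAC are placeholders, not our real config, and the traffic-group "mac" property is how I understand MAC masquerade is set, so correct me if that is off):

```python
# Rough sketch of the kind of setup described above, via iControl REST.
# Host, credentials, addresses and object names are placeholders.
import requests

BIGIP = "https://192.0.2.10"        # placeholder management address
s = requests.Session()
s.auth = ("admin", "admin")          # placeholder credentials
s.verify = False                     # lab only

# Pool with the real servers as members, simple TCP monitor
s.post(f"{BIGIP}/mgmt/tm/ltm/pool", json={
    "name": "app1_pool",
    "monitor": "tcp",
    "members": [{"name": "10.10.10.11:443"}, {"name": "10.10.10.12:443"}],
}).raise_for_status()

# Virtual server = VIP + TCP port, SNAT automap towards the pool members
s.post(f"{BIGIP}/mgmt/tm/ltm/virtual", json={
    "name": "app1_vs",
    "destination": "10.10.20.50:443",
    "ipProtocol": "tcp",
    "pool": "app1_pool",
    "sourceAddressTranslation": {"type": "automap"},
}).raise_for_status()

# MAC masquerade is configured on the floating traffic group (property name
# "mac" as far as I know; verify with "tmsh list cm traffic-group")
s.patch(f"{BIGIP}/mgmt/tm/cm/traffic-group/traffic-group-1", json={
    "mac": "02:01:23:45:67:89",      # placeholder masquerade MAC
}).raise_for_status()
```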
VMware runs on an HP blade enclosure, where normally 1 blade = 1 VMware host. During setup we asked the VMware team to always keep the two units of the HA cluster on different VMware hosts (for redundancy, we thought).
Since the replacement we have been getting complaints, for some setups, that during the night (a period of low traffic) access to the virtual server (VIP + TCP port) is lost for a short period. When we check the LTM logs, however, the pool members stay reachable (no up/down events) and there are no failover messages either. So the HA cluster remains stable and pool member monitoring keeps working.
Further investigation at the network level showed that during the nights when the issue is seen, the MAC addresses of the active unit as well as those of the standby unit were being learned via the same uplink, even though we thought the units were on different VMware hosts. Long story short: after a while the VMware team confirmed that even when servers are on different VMware hosts (blades), they can still use the same uplinks to the network. Each blade provides 4 uplinks to the virtual switch, and the virtual switch in turn has 4 uplinks towards the Virtual Connect modules. VMware picks the uplink in a round-robin fashion. As a consequence, even when the two VMs are on different blades, there is a 25% (1 in 4) chance that they end up using the same uplink.
When that happens, we lose connectivity during periods with little traffic. The problem does not occur when the units are using different uplinks.
We suspect this has to do with the aging timer of the virtual server's MAC address, which is the mac-masquerade address. Our theory is that at the VMware level the mac-masquerade address ages out and is no longer known, while at the network level (Cisco switches/routers) the mac-masquerade address is still present. So you have to wait until the router ARPs again before the MAC address is learned along the path and traffic flows once more.
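To pin this down we are thinking of running a small probe like the sketch below from a machine in the same VLAN when an outage is reported: it just ARPs for the VIP and reports which MAC answers, so we can see whether the mac-masquerade address stops being resolved along the affected path (the VIP, expected MAC and interface name are placeholders):

```python
# Quick diagnostic sketch using scapy (run as root from a host in the same
# VLAN as the VIP); VIP, expected MAC and interface are placeholders.
from scapy.all import Ether, ARP, srp

VIP = "10.10.20.50"                  # placeholder virtual server address
EXPECTED_MAC = "02:01:23:45:67:89"   # placeholder mac-masquerade address
IFACE = "eth0"                       # interface towards the LTM VLAN

# Broadcast an ARP request for the VIP and see which MAC (if any) replies
answered, _ = srp(
    Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=VIP),
    iface=IFACE, timeout=2, verbose=False,
)

if not answered:
    print(f"no ARP reply for {VIP} - looks like the MAC is not reachable on this path")
for _, reply in answered:
    mac = reply[ARP].hwsrc
    ok = "matches masquerade MAC" if mac.lower() == EXPECTED_MAC.lower() else "unexpected MAC"
    print(f"{VIP} is at {mac} ({ok})")
```

Of course the probe itself refreshes the MAC tables along its own path, so we would only run it at the moment an outage is actually reported, not continuously.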
Does anybody have similar experiences? Does anybody have more info on how mac-masquerade addresses are learned at the virtual switch level, and how long they remain cached there?