Forum Discussion

Stefan_Klotz
May 29, 2019

NAT not working after software update

Hi there,

It's been a long time since I last had a question here.

I'm currently working on a F5 project, where we have to perform a hardware refresh including a software update.

The old environment consists of two LTM-pairs, one without Route Domains and the second with two Route Domains. The old platform is running on 10.2.4.

The new environment consists only of one LTM-pair now with three Route Domains (all consolidated into one cluster). The new platform is running on 14.1.1.

All VSs, NATs and floating IPs are identical; we first isolated the old environment (by disabling the TMM interfaces) and then enabled the new environment.

During migration our pain point was that the configured NATs were not working correctly, and I'm now wondering if there was a change in behavior or if we did something wrong. As an example, we saw the following result with tcpdump.

Configured NAT xxx.189.21.151 origin 192.168.35.51

 

Old working environment:

03:22:10.884644 IP xxx.189.5.152 > 192.168.35.51: ICMP echo request, id 15796, seq 980, length 64

03:22:10.884920 IP xxx.189.21.151 > xxx.189.5.152: ICMP echo reply, id 15796, seq 980, length 64

 

New, non-working environment:

03:21:14.934663 IP xxx.189.5.152 > xxx.189.21.151: ICMP echo request, id 15796, seq 125, length 64 in slot1/tmm0 lis=

03:21:15.940626 IP xxx.189.5.152 > xxx.189.21.151: ICMP echo request, id 15796, seq 126, length 64 in slot1/tmm0 lis=

 

So from this output it looks like the ICMP echo request is not being correctly translated to the origin address.

Is that correct, or how should I interpret this? How can I verify that NAT is working correctly? Or do I have to check any additional (new) settings?

Or is this maybe some kind of ARP issue (how can I force the new LTMs to propagate their new MAC addresses)?

In case you require more details about the setup, please let me know.

Thank you!

 

Ciao Stefan :)

  • SWJO

    If the packet's outbound interface and inbound interface are on different VLANs, BIG-IP will drop the packet.

    Then you have to change the VLAN-keyed connections setting from enable to disable.
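    A sketch of how this could be checked and changed from the BIG-IP shell, assuming the relevant db variable is connection.vlankeyed (please verify the exact variable name on your software version):

    ```shell
    # Show the current value of the VLAN-keyed connections setting
    tmsh list sys db connection.vlankeyed

    # Disable it so connections are matched regardless of the ingress VLAN
    tmsh modify sys db connection.vlankeyed value disable
    ```

    Note that relaxing this matching affects all connections on the box, not just the NAT in question.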

  • Hi Nils,

    thank you for this hint, but we had already found this setting ourselves last night.

    We changed it to "All Traffic" like in your screenshot, but the behavior is still the same. Or does this setting not take effect immediately? We also compared it with the settings of the old cluster running 10.2.4, where the default "TCP & UDP only" was also configured. Or is this some kind of change in behavior between these versions?

    As visible in the provided tcpdump, we verified that ICMP is accepted and not denied. But it looks like the internal communication between the BIG-IP and the internal server is faulty. Either the packets are not sent out at all, or a wrong context/interface is used. How can this be analyzed further?

    Thank you!

     

    Ciao Stefan :)

  • Hi again,

    as the upgrade was not successful last time, we started another attempt last night.

    It looks much better, but ping still does not seem to be forwarded correctly through the BIG-IP.

    We identified the following strange behavior.

    When we ping from an outside device to an internal server, we see the icmp request packets via tcpdump, but no icmp replies.

    But when we ping the other way round, we see the following in tcpdump:

    [logaric@lbz01:Active:In Sync] ~ # tcpdump -ni 0.0 host xxx.189.16.92 and icmp

    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

    listening on 0.0, link-type EN10MB (Ethernet), capture size 65535 bytes

    01:59:13.542536 IP xxx.189.16.92 > xxx.189.5.152: ICMP echo request, id 25352, seq 20, length 64 out slot1/tmm0 lis=/zone3/nat_192.168.34.92_xxx.189.16.92,SRC_NAT

    01:59:13.542736 IP xxx.189.5.152 > xxx.189.16.92: ICMP echo reply, id 25352, seq 20, length 64 in slot1/tmm0 lis=

    01:59:14.550496 IP xxx.189.16.92 > xxx.189.5.152: ICMP echo request, id 25352, seq 21, length 64 out slot1/tmm0 lis=/zone3/nat_192.168.34.92_xxx.189.16.92,SRC_NAT

    01:59:14.551599 IP xxx.189.5.152 > xxx.189.16.92: ICMP echo reply, id 25352, seq 21, length 64 in slot1/tmm0 lis=

    01:59:15.558573 IP xxx.189.16.92 > xxx.189.5.152: ICMP echo request, id 25352, seq 22, length 64 out slot1/tmm0 lis=/zone3/nat_192.168.34.92_xxx.189.16.92,SRC_NAT

    01:59:15.558908 IP xxx.189.5.152 > xxx.189.16.92: ICMP echo reply, id 25352, seq 22, length 64 in slot1/tmm0 lis=

    It looks successful, but the ICMP reply never reaches the originating server. So in both use cases it looks like traffic gets lost between the BIG-IP and the internal server.
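    One way to narrow this down could be a capture that includes the link-layer headers on the server-side VLAN, to see which MAC addresses the packets are actually addressed to (the VLAN name zone3_internal is a placeholder for the real one):

    ```shell
    # -e prints the source and destination MAC address of every packet,
    # so a frame sent to a stale or wrong next-hop MAC becomes visible.
    tcpdump -eni zone3_internal host xxx.189.5.152 and icmp
    ```

    Comparing the destination MACs against the ARP tables of the server and its gateway should show whether the frames are leaving on the right segment at all.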

    Just out of interest, is it normal that the NAT listener is shown for the ICMP request but is empty for the ICMP reply? Could this indicate that the packets are being sent out the wrong interface/context?

    Do you have any further ideas based on these new findings? Or any other settings or troubleshooting steps we should try or verify?

    Thank you!

     

    Ciao Stefan :)

    I would think this is related to a stale ARP entry on a networking device. Usually when this is the case, I fail over the F5 devices, which sends gratuitous ARPs and fixes the issue. Also make sure all interfaces on the old BIG-IP were really shut down; I have seen scenarios where the customer shut down switch interfaces that were supposedly connected to the BIG-IP but were in fact the wrong ones, creating chaos in the middle of the cut-over.
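    A minimal sketch of triggering those gratuitous ARPs from tmsh, assuming a standard HA pair (the virtual-address name below is taken from the NAT in this thread and used only as an example):

    ```shell
    # On the active unit: failing over makes the peer announce all
    # floating objects (floating self IPs, virtual addresses, NATs,
    # SNATs) via gratuitous ARP.
    tmsh run sys failover standby

    # Alternatively, toggling ARP on a single virtual address may
    # re-announce just that address without a full failover:
    tmsh modify ltm virtual-address xxx.189.21.151 arp disable
    tmsh modify ltm virtual-address xxx.189.21.151 arp enable
    ```

    Clearing the ARP cache on the adjacent router/switch for the affected addresses would achieve the same result from the other side.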

     

    Best Regards,

    Oscar Pucheta

    https://www.australtech.net

    https://www.linkedin.com/in/npucheta/