Forum Discussion

tihi_341714
Nimbostratus
Mar 14, 2018

During IKE rekey in an s2s IPsec config, some tunnels won't re-establish

Hi,

 

I would like some help with an IPsec problem we are experiencing in our DC. We have a few different route domains on our F5, and two different RDs are configured for IPsec to two different remote sites.

 

The only thing the two connections have in common is that both remote devices are Cisco ASAs. One is an ASA5520 on 7.2(4) and the other is an ASA5585 on 9.2(4)14.

 

Here are the details of the IPsec configuration:

 

PHASE 1

Version: IKEv1
Authentication algorithm: SHA-1
Encryption algorithm: AES256
Perfect forward secrecy/DH group: MODP1536
Lifetime: 1440 minutes
Authentication method: PSK
Mode: Main
NAT Traversal: ON
DPD Delay: 30 sec
Replay window size: 64 packets

PHASE 2

IPsec protocol: ESP
Mode: Tunnel
Authentication algorithm: SHA-1
Encryption algorithm: AES256
Perfect forward secrecy: MODP1536
Lifetime: 1440 minutes
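
For reference, this is roughly how the BIG-IP side of that looks in tmsh. It is only a sketch: the object names, addresses and key are placeholders, and exact attribute names can differ slightly between TMOS versions.

  # Hypothetical names/addresses; phase 1 settings live on the ike-peer object
  tmsh create net ipsec ike-peer asa_site_a \
      remote-address 203.0.113.10 version add { v1 } \
      phase1-auth-method pre-shared-key preshared-key <psk> \
      phase1-encrypt-algorithm aes256 phase1-hash-algorithm sha1 \
      phase1-perfect-forward-secrecy modp1536 \
      mode main nat-traversal on dpd-delay 30 lifetime 1440

  # Phase 2 settings live on the ipsec-policy, matched by a traffic-selector
  tmsh create net ipsec ipsec-policy asa_site_a_policy \
      protocol esp mode tunnel \
      tunnel-local-address 198.51.100.2 tunnel-remote-address 203.0.113.10 \
      ike-phase2-auth-algorithm sha1 ike-phase2-encrypt-algorithm aes256 \
      ike-phase2-perfect-forward-secrecy modp1536 ike-phase2-lifetime 1440

  tmsh create net ipsec traffic-selector asa_site_a_ts1 \
      source-address 10.0.1.0/24 destination-address 172.16.1.0/24 \
      ipsec-policy asa_site_a_policy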

 

It has been verified by both sides multiple times that the configuration is exactly the same. Also, we are the ones behind NAT, hence the NAT-T: there is an external router in front of the F5 that NATs the public IP address to it.

 

The problem is that during IKE rekeying some tunnels won't re-establish. Some will, but not all. For example, one IPsec connection has 3 traffic selectors, and when everything is fine traffic flows through all 3 of them. After the rekeying only one still works, and we have to clear the whole IPsec to make it work again (roughly as sketched below).
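
For reference, this is roughly how the SAs can be inspected and cleared; the BIG-IP lines assume current tmsh syntax and the ASA lines are the standard IKEv1 commands, so adjust to your versions:

  # On the BIG-IP: list the IPsec SAs, then clear them
  tmsh show net ipsec ipsec-sa
  tmsh delete net ipsec ipsec-sa

  # On the ASA: list and clear the SAs for the peer
  show crypto ipsec sa peer 1.2.3.4
  clear crypto ipsec sa peer 1.2.3.4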

 

What we have found so far is that the ASAs start rekeying at 75% of the lifetime (so in our case around 18 hours):
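
(As a quick sanity check against the 1440-minute lifetime above: 0.75 × 1440 min = 1080 min = 18 h, which matches the 18h:00m:29s session duration in the ASA log further down.)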

 

https://www.cisco.com/c/en/us/support/docs/security/asa-5500-x-series-next-generation-firewalls/81824-common-ipsec-trouble.html#vpndisc

 

According to this document that is not a problem in itself. However, the tunnels almost never come up afterwards. (There have been a few occasions when they came up for some magical reason, but it's pretty rare.)

 

Here is the log from the ASA when rekeying starts at 18 hours:

 

Mar 7 02:50:51 asa %ASA-4-113019: Group = 1.2.3.4, Username = 1.2.3.4, IP = 1.2.3.4, Session disconnected. Session Type: IPSecLAN2LANOverNatT, Duration: 18h:00m:29s, Bytes xmt: 4133553397, Bytes rcv: 2396963220, Reason: IKE Delete

 

Here are the logs from racoonctl; they are too long to paste here:

 

https://pastebin.com/H39ZbYLS
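
(For anyone following along: on the BIG-IP side the SAs can be listed roughly as below. racoonctl is the control tool of the racoon IKEv1 daemon in use here, so the exact invocation and output may vary by version.)

  racoonctl show-sa isakmp      # phase 1 (IKE) SAs
  racoonctl show-sa esp         # phase 2 (IPsec) SAs
  tmsh show net ipsec ipsec-sa  # TMOS view of the IPsec SAs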

 

So the conclusion so far: there is traffic between the peer IPs even when the problem occurs, and traffic in the IPsec SAs goes back and forth continuously. When the IKE rekey happens, the old IKE SA closes, a new one is created, and the IPsec SAs are renewed; the traffic in the IPsec SAs breaks for a second but then continues to flow once again. But when the error happens, not every IPsec SA re-establishes and we can only see timeouts in the logs.

 

I hope you can help. The clients are a "bit" mad about this issue.

 

Thanks.

 


  • zeiss_63263
    Historic F5 Account

    The first thing to note is that I have no idea how well the ASA 7.2(4) software version works; there could be bugs there, but let's assume that this isn't peer specific.

     

    Re-key problems like this can occur when NAT detection did not actually happen during the phase 1 SA setup, so ensure that NAT detection is enabled on all peers. For NAT-D to work, both peers have to agree on one of the NAT-T RFCs or drafts. You can tell that NAT-D worked if your ESP traffic is encapsulated in UDP on port 4500 (see the capture sketch below).
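
    A quick way to confirm that on the BIG-IP is a capture towards the peer (a sketch only; 0.0 means all interfaces and the peer address is a placeholder):

      tcpdump -ni 0.0 host 1.2.3.4 and udp port 500   # IKE before the NAT-T float
      tcpdump -ni 0.0 host 1.2.3.4 and udp port 4500  # IKE and UDP-encapsulated ESP after the float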

     

    By the way, the soft lifetime on an SA is 80% of the configured lifetime; when it is reached the BIG-IP will attempt to create a new SA, but the old SA will live until the hard lifetime expires.
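
    With the 1440-minute lifetime here that works out to 0.80 × 1440 min = 1152 min ≈ 19.2 h on the BIG-IP versus 0.75 × 1440 min = 1080 min = 18 h on the ASA, so the ASA always reaches its rekey threshold first, consistent with the 18-hour disconnects in the log above.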

     

    The ASA, at least on version 9.x, has a tendency to rip down all the tunnels during an SA delete but not find the time to notify the peer about all those prematurely deceased SAs. Most vendors that see a delete (not an expire) on a phase 1 SA will tear down all the phase 2 SAs as well. The BIG-IP does not do that; it assumes (correctly) that the phase 2 SA is still up. The upshot is that if the tunnels are generally initiated on the BIG-IP side, the SA will remain in use by the BIG-IP until it decides to renew that SA.
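
    One way to see that mismatch after a rekey (a rough sketch, assuming standard CLI on both boxes) is to compare the SA lists and look for IPsec SAs that the BIG-IP still holds but the ASA no longer knows about:

      On the BIG-IP:  tmsh show net ipsec ipsec-sa
      On the ASA:     show crypto isakmp sa
                      show crypto ipsec sa peer 1.2.3.4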

     

    If traffic were frequently initiated on either side of each (phase 2) tunnel/selector, you probably wouldn't notice the issue. Setting up a permanent ping on both sides for every tunnel/selector may actually work around the issue if this is what you're seeing; a minimal sketch follows.
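
    A minimal keepalive sketch (the address is a placeholder; run something like this, e.g. from cron, on a host inside each selector's local subnet towards a host in the matching remote subnet):

      # one keepalive per traffic selector, in both directions
      while true; do
          ping -c 1 -W 2 172.16.1.10 >/dev/null 2>&1   # a host in the remote subnet
          sleep 10
      done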

     

    It's worth noting that the problem you've described has consistently been reported by customers using an ASA peer; it doesn't seem to happen with other vendors. There are some bug fixes in place that take a more aggressive approach to deleting SAs so as to avoid the described scenario, but I do not believe all the required code is in a mainstream release (current is 13.1) yet. F5 Support could help you there.