Forum Discussion

tux143
Apr 13, 2020

SNAT pool Inet port exhaustion

We are running an XMPP application with 500k users, which in short means 500k persistent TCP connections spread across 4 XMPP servers. Based on that math I created a SNAT pool with 15 IPs (assuming 15 IP addresses are enough for 500k TCP connections), but I am still getting "Inet port exhaustion" errors in /var/log/ltm:

 

Apr 13 17:57:23 lb-a warning tmm1[12866]: 01010281:4: Inet port exhaustion threshold reached on 10.0.0.61 to 10.10.10.149:5222 (proto 6)

Apr 13 17:57:24 lb-a crit tmm[12866]: 01010201:2: Inet port exhaustion on 10.0.0.68 to 10.10.10.224:5222 (proto 6)
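
To see which SNAT IP and pool member pairs are hitting the limit, the messages can be tallied with a small script. This is only a sketch: the regular expression is modeled on the two log lines quoted above, and the function name count_exhaustion is a hypothetical helper, not part of any F5 tooling.

```python
import re
from collections import Counter

# Pattern modeled on the two /var/log/ltm lines quoted in the post.
# Assumption: other exhaustion messages follow the same layout.
EXHAUSTION_RE = re.compile(
    r"Inet port exhaustion(?: threshold reached)? on "
    r"(?P<snat_ip>[\d.]+) to (?P<dest_ip>[\d.]+):(?P<dest_port>\d+) "
    r"\(proto (?P<proto>\d+)\)"
)


def count_exhaustion(lines):
    """Count exhaustion messages per (SNAT IP, destination IP:port) pair."""
    counts = Counter()
    for line in lines:
        match = EXHAUSTION_RE.search(line)
        if match:
            dest = f"{match['dest_ip']}:{match['dest_port']}"
            counts[(match["snat_ip"], dest)] += 1
    return counts


sample = [
    "Apr 13 17:57:23 lb-a warning tmm1[12866]: 01010281:4: Inet port exhaustion "
    "threshold reached on 10.0.0.61 to 10.10.10.149:5222 (proto 6)",
    "Apr 13 17:57:24 lb-a crit tmm[12866]: 01010201:2: Inet port exhaustion "
    "on 10.0.0.68 to 10.10.10.224:5222 (proto 6)",
]

for (snat_ip, dest), n in count_exhaustion(sample).items():
    print(snat_ip, "->", dest, n)
```

Feeding it the whole of /var/log/ltm (for example via open("/var/log/ltm")) would show whether the errors cluster on a few SNAT IPs or pool members.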

 

The question is: how many connections can a single SNAT IP make to the servers?

  • A single IP address can hold roughly 65K connections to a given destination IP and port. If you are hitting limits and seeing log messages about port exhaustion, I would suggest checking the persistence settings on the VIP as well as the load balancing method. Technically, you have 4 backend servers (65K connections per SNAT IP to each backend server) and 15 SNAT IPs (15 x 4 x 65K, which should be a lot). Assuming good load balancing and persistence settings, you should see even traffic distribution. Also check the idle timeout settings and look for long-lived connections to make sure you're dropping clients that are inactive.

     

    Also, you mention a SNAT pool. Make sure the VIP is actually using the SNAT pool so that all 15 IPs are in use.
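
The arithmetic in this reply can be written out as a quick sanity check. This is a rough sketch using the figures from the thread (500k clients, 4 servers, 15 SNAT IPs). The value of 64,000 usable source ports per pair is an assumption for illustration, since the usable range on a real system may be smaller, and the function names are hypothetical.

```python
# Figures taken from the thread.
CLIENTS = 500_000
SERVERS = 4
SNAT_IPS = 15

# Assumption: about 64k usable source ports per (SNAT IP, server IP:port)
# pair. The real usable range may be smaller on a given system.
PORTS_PER_PAIR = 64_000


def theoretical_capacity(snat_ips, servers, ports_per_pair):
    """Upper bound on concurrent connections across all SNAT IP/server pairs."""
    return snat_ips * servers * ports_per_pair


def expected_per_pair(clients, snat_ips, servers):
    """Connections per (SNAT IP, server) pair if the load is spread evenly."""
    return clients / (snat_ips * servers)


capacity = theoretical_capacity(SNAT_IPS, SERVERS, PORTS_PER_PAIR)
per_pair = expected_per_pair(CLIENTS, SNAT_IPS, SERVERS)

print(f"theoretical capacity: {capacity:,}")
print(f"expected load per pair: {per_pair:,.0f}")
print(f"share of per-pair ports used: {per_pair / PORTS_PER_PAIR:.1%}")
```

Under these assumptions the expected load per pair sits well below the per-pair ceiling, so an even spread of 500k connections should not exhaust the ports on paper.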

  • Currently the persistence setting is "None" (what does "None" mean?)

     

    We use Round Robin across the 4 XMPP servers, and I can see in the statistics that traffic is equally distributed across the 4 servers.

     

    Yes, we do have the SNAT pool enabled at the VIP level. Without the SNAT pool I can't reach 65k users :)

  • Persistence "None" means you are not sticking clients to any particular server.

     

    Regarding your logs, the two IPs shown are the following:

    10.0.0.61

    10.0.0.68

     

    Are those IPs used by other virtual servers? In other words, are other VIPs/apps also using those SNAT IPs and contributing to the SNAT port exhaustion?

     

    I would also run show ltm pool <name-of-pool> to check the pool member stats for your XMPP servers. What is the maximum number of concurrent connections ever held by each XMPP server? There will be current, max, and total. Total is all connections over time, which is not needed right now. Max shows the highest number of concurrent connections ever seen by that pool member. I'm curious how high each server got in terms of connections.

     

    Also, if everything is set up correctly, as it sounds, then you might be running into something else. There may be long-lived connections holding those IPs as active flows, but I'm not sure. I would recommend opening an F5 support case and providing a fresh qkview the next time you see SNAT port exhaustion.

  • Hey, I believe you have to apply a 900-second timeout value to solve this issue. I was facing a similar issue.

    The following list includes the default SNAT timeout values:

    • IP address SNAT translation object: the default idle timeout is Indefinite.
    • SNAT pool translation object: the default idle timeout is Indefinite.
    • SNAT automap: the idle timeout is Indefinite.

    Hope this helps.

  • Thanks Samir,

     

    I will look into it, but I found something very strange. I have two F5 pairs:

     

    10200 - software version 12.0.0 - The same SNAT pool configuration works fine without filling up ephemeral ports

    10350 - software version 13.1.0.8 - The same SNAT pool configuration is **not** working and is filling up ephemeral ports

     

    I will try your suggestion of a 900s timeout, but note that I am using persistent XMPP TCP connections. There is no connection reuse: once a client connects, it stays connected for its lifetime (with keep-alives etc.).

     

      Samir

      Raise a case with F5 support. They will be able to help here.

  • I am comparing the TCP profiles of 12.0.0 and 13.1.0.8 and there are a lot of different options. I am wondering whether something is going on there.

     

    If I set an idle timeout on the VS, do I also need to match that idle timeout on the SNAT pool? Which one wins?

  • Update:

    I dumped the full connection table and I can see that port utilization per SNAT IP is around 20k when the warnings start appearing in the logs. I even reduced the SNAT TCP idle timeout to 300 sec. There are lots of free source ports available, but I am still getting errors. Any thoughts?

    tmsh show sys connection max-result-limit 30000 ss-client-addr 10.22.0.61