Exchange 2010 SNAT pool persistence

Problem this snippet solves:

Exchange 2010 CAS Array

Exchange 2010 introduces a major change in topology with its CAS (Client Access Server) Array mechanism. In earlier versions of Exchange, a client would ask a CAS server for a referral to a Mailbox server and then connect to that server directly. Now, a client connects to its mailbox store via the CAS server. This was done to achieve a level of Mailbox server high availability, but it now requires the CAS tier to be a highly available system in its own right.

F5 has published an excellent guide to integration of LTM and Exchange 2010 (http://www.f5.com/pdf/deployment-guides/f5-exchange-2010-dg.pdf). However, there are a few things about it that need to be taken into account for larger deployments.

CAS Server Role

CAS servers in Exchange 2010 wear many hats. They host ActiveSync services for mobile devices, serve Outlook Web Access instances, provide RPC over HTTPS (Outlook Anywhere) for remote Outlook clients, and support direct MAPI connections via RPC from those same Outlook clients. Outlook prefers direct RPC when it is available, and you need only compare the experience of downloading a global address book over RPC over HTTPS versus direct RPC to know how important RPC support can be.

The CAS Array is simply a group of CAS servers that, through configuration, share a common namespace. They're designed to be placed behind a load balancer, even when speaking RPC.

Load Balancing RPC

The instructions related to configuring for RPC clients within the F5 Deployment Guide are unfortunately incorrect when dealing with larger CAS arrays. Two items must be considered: the persistence profile and the SNAT pool configuration.

The guide indicates that a source address persistence profile (exch-rpc-persist) with a timeout of 3600 seconds should be created. However, Exchange RPC connections are guided by a portmapper service with an internal timeout of 7200 (!) seconds, so the profile timeout should be at least that long. You must also enable the "Match Across Services" option: RPC uses port 135 to map between 4 and 10 additional TCP connections on other destination ports, and these connections must all persist to the same CAS array member or the connection will fail.
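As a sketch, the adjusted profile might be created from tmsh roughly as follows (v11-style syntax; the profile name comes from the deployment guide, so adjust it to your own conventions):

create ltm persistence source-addr exch-rpc-persist {
    defaults-from source_addr
    timeout 7200
    match-across-services enabled
}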

There is a mechanism, ostensibly for security, that causes negotiated RPC connections to fail with a context fault if the source IP address changes during the connection. A well-behaved RPC client that changes IP addresses is expected to shut down and re-establish its RPC connections in an orderly manner. Insert a device that makes NAT-type IP changes between the client and the server, and you must ensure that the translated address does not change on the fly.

The deployment guide states that a SNAT pool is needed to support a large Exchange infrastructure. However, SNAT pool translation addresses are typically assigned in a round-robin fashion. Applying a SNAT pool to the RPC virtual server will cause the RPC context to abort whenever the SNAT IP changes; with larger SNAT pools this can happen very often, and it manifests as the Outlook client falling back to RPC over HTTPS over a period of time. With other MAPI-integrated products, it may cause an inability to connect at all.

The solution is to make the SNAT pool IP assignment as persistent as possible for a given incoming client address. The following iRule does this:

Code:

when RULE_INIT {

   # Set your SNAT pool members.
   #   
   # The list should contain all the same IP addresses as your SNAT pool configuration does.
   # Note that this does not exempt you from applying a SNAT pool at the VS level; you must
   # still do that. This must be done in RULE_INIT until such time as there is a way to get
   # a list of SNAT pool members via an iRule function.   
   
   set static::snpool { 10.x.x.1 10.x.x.2 10.x.x.3 10.x.x.4 }

   # Seed and prime for the 32-bit FNV-1a hash used by OPTION 2 (enabled by
   # default below). These are the standard FNV offset basis and prime values.
   set static::fnv_hash 0x811c9dc5
   set static::fnv_prime 0x01000193

}

when CLIENT_ACCEPTED {

    #-----------
    # OPTION 1
    # Convert the incoming IP to Hex and set the snat pool member based on the modulo of the full IP.
    #
    # This option is fine and quite quick for most purposes, but depending on how your
    # organization assigns IP addresses via DHCP, it may not produce a high degree of
    # randomness, which could cause more incoming connections to prefer a particular
    # SNAT pool address. Uncomment the following lines to enable it (and comment out
    # OPTION 2 below, which would otherwise overwrite the result).

    #set octets [split [IP::remote_addr] .]
    #if { [llength $octets] != 4 } {
    #    set octets [lrange [concat $octets 0 0 0] 0 3]
    #} 
    #binary scan [binary format c4 $octets] H8 packed_address
    #set packed_address [format 0x%x [expr 0x$packed_address & 0xffffffff]]

    #----------
    # OPTION 2
    # Generate a 32-bit FNV-1a hash of the string form of the remote client address.
    #
    # This option costs more CPU time on the F5 but injects a good degree of randomness
    # into SNAT selection. It is enabled by default.

    # Initialize the running hash from the static seed, then fold each character
    # of the address string into it (XOR, then multiply). The hash must accumulate
    # across iterations of the loop.
    set fnv_hash $static::fnv_hash
    for { set fnv_i 0 } { $fnv_i < [string length [IP::remote_addr]] } { incr fnv_i } {
        binary scan [IP::remote_addr] @${fnv_i}H2 fnv_str_i
        set fnv_hash [expr {$fnv_hash ^ "0x$fnv_str_i"}]
        set fnv_hash [expr {($fnv_hash * $static::fnv_prime) & 0xffffffff}]
    }
    set packed_address [format 0x%x [expr {$fnv_hash & 0xffffffff}]]

    # Select a SNAT address using the modulo of the FNV hash (or the hex-converted
    # IP address from OPTION 1) against the size of the SNAT pool configured above.

    set snat_idx [expr {$packed_address % [llength $static::snpool]}]
    snat [lindex $static::snpool $snat_idx]

    #log local0. "Selected [lindex $static::snpool $snat_idx] from pool of size [llength $static::snpool] (index $snat_idx) for [IP::remote_addr]"
     
}
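To tie everything together, the iRule and the SNAT pool are both applied at the virtual server. A rough tmsh sketch follows; exchange_rpc_snat, exchange_rpc_vs, and snat_persist_rule are placeholder names for illustration, and the SNAT pool member list must match the static::snpool list in the iRule:

create ltm snatpool exchange_rpc_snat members add { 10.x.x.1 10.x.x.2 10.x.x.3 10.x.x.4 }
modify ltm virtual exchange_rpc_vs {
    source-address-translation { type snat pool exchange_rpc_snat }
    persist replace-all-with { exch-rpc-persist }
    rules { snat_persist_rule }
}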
Published Mar 17, 2015
Version 1.0