The (hopefully) definitive guide to load balancing Lync Edge Servers with a Hardware Load Balancer

 

Having worked on a few large Lync deployments recently, I have realized that there is still a lot of confusion around properly architecting the network for load balancing Lync Edge Servers. Guidance on this subject has changed from OCS 2007 to OCS 2007 R2 and now to Lync Server 2010, and it's important that care is taken while planning the design. It's also important to know that although a certain architecture may seem to work, it could be very far from best practice. I'll explain what I mean by that below.

 

The main purpose of Edge Services is to allow remote users (whether corporate, anonymous, federated, etc.) to communicate with other external and internal users, and vice versa. If you're looking to extend your Lync deployment to support communication with federated partners, public IM services, remote users and such, you'll want to make sure you deploy your Edge Servers properly.

 

This post will discuss some requirements and best practices for deploying Edge Servers, and then we'll go into some suggested architectures. For this discussion, let's assume that there are 3 device types within your DMZ: your firewall, your BIG-IP LTM, and your Lync Edge Server farm.

 

Requirement 1: Your Edge Servers need at least 2 network interfaces: one or more dedicated to the external network, and one dedicated to the internal network. The external and internal interfaces need to be on separate IP networks.
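Requirement 1 is easy to sanity-check before anything is cabled: compare the networks the planned interface addresses belong to, not just the addresses themselves. A minimal Python sketch, assuming the planned addresses are available in CIDR form; the function name and the addresses are hypothetical.

```python
import ipaddress


def edge_nics_on_separate_networks(external_ifs, internal_if):
    """Return True if every external interface sits on a different IP network
    from the internal interface (all addresses in CIDR form, e.g. "10.2.2.10/24")."""
    if not external_ifs:
        raise ValueError("an Edge Server needs at least one external interface")
    internal_net = ipaddress.ip_interface(internal_if).network
    for ext in external_ifs:
        # Compare whole networks, not host addresses: overlapping subnets count as shared.
        if ipaddress.ip_interface(ext).network.overlaps(internal_net):
            return False
    return True


# Hypothetical addresses: external NIC in a DMZ subnet, internal NIC on a private subnet.
print(edge_nics_on_separate_networks(["203.0.113.10/27"], "10.2.2.10/24"))  # True
print(edge_nics_on_separate_networks(["10.2.2.5/24"], "10.2.2.10/24"))      # False
```

Comparing whole networks also catches the subtle case where an external NIC is given a wider mask that swallows the internal subnet.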

 

The Edge Server hosts 3 separate external services: Access, Web Conferencing, and Audio/Visual (A/V). If you plan on exposing all 3 services to remote users, you can either use one IP for all 3 services on each server and differentiate them by TCP/UDP port, or use a separate IP for each service and keep the standard ports.

  

Best Practice: This is more preference than best practice, but I like to use 3 separate IPs for these services. With alternate ports/port mapping you can consolidate to a single IP, but unless you have a very specific reason for doing so, it's best to stick with 3 separate IPs. You do burn more IPs this way, but with a single IP you'll have to use non-standard ports for some services, and that can cause problems with network devices that expect certain traffic types on certain ports. Plus, troubleshooting, traffic statistics, and logging are all cleaner with 3 separate IPs.
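To see why the single-IP option forces non-standard ports, it helps to list the sockets each service wants. The sketch below assumes the commonly documented standard external ports (Access: TCP 443, plus TCP 5061 for federation; Web Conferencing: TCP 443; A/V: TCP 443 and UDP 3478) and ignores the A/V media port range; the table and function names are hypothetical.

```python
from collections import defaultdict

# Assumed standard external listeners per Edge service (A/V media port range omitted).
STANDARD_PORTS = {
    "access": [("tcp", 443), ("tcp", 5061)],
    "webconf": [("tcp", 443)],
    "av": [("tcp", 443), ("udp", 3478)],
}


def socket_collisions(service_ips, ports=STANDARD_PORTS):
    """Map each (ip, protocol, port) claimed by more than one service to those services."""
    claims = defaultdict(list)
    for service, ip in service_ips.items():
        for proto, port in ports[service]:
            claims[(ip, proto, port)].append(service)
    return {sock: svcs for sock, svcs in claims.items() if len(svcs) > 1}


# Hypothetical addresses for the two layouts.
three_ips = {"access": "203.0.113.10", "webconf": "203.0.113.11", "av": "203.0.113.12"}
one_ip = {svc: "203.0.113.10" for svc in ("access", "webconf", "av")}

print(socket_collisions(three_ips))  # {}
print(socket_collisions(one_ip))     # {('203.0.113.10', 'tcp', 443): ['access', 'webconf', 'av']}
```

With one IP, all three services want TCP 443, so two of them have to move to alternate ports; with three IPs nothing collides and every service keeps its standard port.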

  

Requirement 2: Traffic that is load balanced to the Lync Edge servers needs to return through the load balancer. In other words, if the hardware load balancer sends traffic to an Edge Server, the return traffic from that Edge Server needs to flow back through the load balancer. There are 2 common ways to ensure that return traffic flows through the load balancer. You can…

  1. Use routing, and have the Edge Servers point to the load balancer as their default gateway.
  2. Enable SNAT on the load balancer, which rewrites the source IP of the connection to a local network address as the traffic passes through the load balancer. In this case, the Edge Servers will believe that a local client generated the connection and send the responses back to that local address.

 

So those are your two options, which I will refer to as Routing and SNATting. With Routing, the Edge Server relies on its routing table to send return traffic back out through the load balancer. The load balancer does not obscure the source IP address, but you have to make sure your default gateway and routing tables are correct. With SNATting, return traffic is guaranteed to go back through the load balancer without relying on the routing table. The drawback is that the load balancer obscures the client's source IP as the packet passes through it.
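The difference between the two options comes down to which address the Edge Server replies to and how it picks a next hop for that address. The Python sketch below models this under simplifying assumptions (one connected subnet, a default route, no static routes); all names and addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeExternalNic:
    address: str          # Edge external interface in CIDR form
    default_gateway: str  # next hop for anything off the local subnet


def reply_next_hop(nic, reply_dst):
    """Next hop the Edge uses for a reply to reply_dst.

    Simplified model: one connected subnet plus a default route, no static routes."""
    if ipaddress.ip_address(reply_dst) in ipaddress.ip_interface(nic.address).network:
        return reply_dst          # on-link: sent straight to that address
    return nic.default_gateway    # off-link: handed to the default gateway


# Hypothetical DMZ: BIG-IP floating self IP .1, router .254, BIG-IP SNAT address .50.
BIGIP = {"203.0.113.1", "203.0.113.50"}
client = "198.51.100.7"

routing_ok = EdgeExternalNic("203.0.113.10/24", "203.0.113.1")
routing_bad = EdgeExternalNic("203.0.113.10/24", "203.0.113.254")

# Routing: the reply goes to the real client IP, so the default gateway decides the path.
print(reply_next_hop(routing_ok, client) in BIGIP)   # True
print(reply_next_hop(routing_bad, client) in BIGIP)  # False: reply bypasses the BIG-IP
# SNAT: the Edge replies to the on-link SNAT address, whatever its gateway is.
print(reply_next_hop(routing_bad, "203.0.113.50") in BIGIP)  # True
```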

 

I will explain below why the SNAT idea is less than ideal, primarily for A/V traffic.

 

Best Practice: You can SNAT traffic to the Web Conferencing and Access services on the Edge Server, but do not SNAT traffic to the A/V Edge Service. SNAT hides the client's IP address, which limits the A/V service's ability to connect clients directly to each other, and that matters whenever clients try to set up peer-to-peer communication, such as a phone call. When you use SNAT, the A/V service never sees the client's true IP, so the chance of the Edge Server getting the 2 clients to communicate directly with each other drops to nil. You force the A/V service into its fallback method, in which the P2P traffic has to use the A/V Edge Server as a proxy between the 2 clients. This 'proxy' fallback will still happen from time to time even when you're not SNATting at the BIG-IP (for example, multiparty calls always use 'proxy'), but it's best to minimize how often users have to rely on it. So even though SNATting connections to the A/V Edge Service will seem to work, it is far from desirable from a network perspective!
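A highly simplified model of why this happens: the A/V Edge reports back to a client the address it saw that client's traffic arrive from (a STUN-style "reflexive" address), and that is the address peers can try for a direct path. With SNAT, the address it sees is the BIG-IP's SNAT address, which no peer can use to reach the client. This Python sketch is an illustration only, not Lync's actual candidate logic; all names and addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def snat(pkt, snat_addr):
    """Load balancer SNAT: rewrite the source address, leave the destination alone."""
    return replace(pkt, src=snat_addr)


def reflexive_address(pkt_seen_by_edge):
    """A STUN-style server reports back the source address it observed (simplified)."""
    return pkt_seen_by_edge.src


def media_path(candidate, client_addrs):
    """'direct' if a peer can reach the client at candidate, else fall back to 'relay'."""
    return "direct" if candidate in client_addrs else "relay"


# Hypothetical: the client's real public IP, the A/V Edge VIP, and the BIG-IP SNAT address.
client_public = "198.51.100.7"
from_client = Packet(src=client_public, dst="203.0.113.12")

no_snat = reflexive_address(from_client)
with_snat = reflexive_address(snat(from_client, "203.0.113.50"))

print(media_path(no_snat, {client_public}))    # direct
print(media_path(with_snat, {client_public}))  # relay
```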

 

FYI: every load balanced service in a Lync environment (including Lync Front End Servers, Directors, etc.) can be SNAT'ed except the A/V Edge Service.

 

Requirement 3: Certain connections will need to be load balanced to the Edge Services, while certain connections will need to be made directly to those Edge Services.

 

Best Practice: Make sure clients can connect to the virtual IP(s) that load balance the Edge Services, and also make sure they can connect directly to the Edge Servers themselves. Typically, users hit the load balancer on their first incoming connection and get load balanced, but if a user is invited to a media session that has started on a particular Edge Server, the invite they receive points them directly to that server. NAT awareness was built into Lync 2010 to help in environments where Edge Servers are deployed behind NATs. With NAT awareness enabled, each Edge Server refers clients to its NAT address so that users are routed in correctly.

 

Do I need to use routable IPs on the external interfaces of my Edge Servers? Microsoft says you do, and I would recommend doing so if you can. That said, I have worked on deployments that use non-routable IPs (leveraging NATs to allow direct access) without running into any issues. Just be sure that the Edge Servers are aware of their NAT addresses.
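The direct-access path and NAT awareness can be pictured as a lookup: the invite names the specific Edge Server hosting the session (not the VIP), and with NAT awareness that server hands out its public NAT address instead of its own non-routable one. This is a hypothetical Python sketch; in Lync the NAT address is set in the Edge topology configuration, not in code like this.

```python
def invite_address(session_edge, nat_map, nat_aware=True):
    """Address handed to a client invited to a media session hosted on session_edge.

    nat_map: Edge external (non-routable) address -> public NAT address on the BIG-IP.
    """
    if nat_aware and session_edge in nat_map:
        return nat_map[session_edge]  # the public address the client can actually reach
    return session_edge               # without NAT awareness the private address leaks out


# Hypothetical two-Edge farm behind 1:1 NATs; the VIP is only used for first contact.
NAT_MAP = {"10.1.1.11": "203.0.113.21", "10.1.1.12": "203.0.113.22"}

print(invite_address("10.1.1.12", NAT_MAP))                   # 203.0.113.22
print(invite_address("10.1.1.12", NAT_MAP, nat_aware=False))  # 10.1.1.12
```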

 

Best Practice: Suggested Deployment "DNAT in, SNAT out" on the Load Balancer
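"DNAT in, SNAT out" came out of discussions with a certain MSFT engineer who helped me build this guidance. I'd love to give him credit (he knows Lync networking better than anyone I have ever talked to!), but if I named him, his phone would never stop ringing with requests for architecture guidance! In short: inbound traffic to the Edge Servers gets its destination address translated (DNAT) from the VIP or public NAT address to an Edge Server, leaving the client's source IP untouched, and outbound traffic from the Edge Servers gets its source address translated (SNAT) to the public address. If you stick to "DNAT in, SNAT out" for external-side Lync Edge traffic, your deployment will work. It sums things up very well!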
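The two translations can be modeled as a 1:1 NAT entry that rewrites only the destination on the way in and only the source on the way out. This Python sketch uses hypothetical addresses and is an illustration of the idea, not BIG-IP configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class EdgeNat:
    """1:1 NAT between public addresses and Edge external addresses (hypothetical)."""

    def __init__(self, public_to_edge):
        self.inbound = dict(public_to_edge)
        self.outbound = {edge: pub for pub, edge in public_to_edge.items()}

    def dnat_in(self, pkt):
        # Inbound: rewrite only the destination, so the client's source IP survives.
        return replace(pkt, dst=self.inbound.get(pkt.dst, pkt.dst))

    def snat_out(self, pkt):
        # Outbound: rewrite only the source, so remote peers see the public address.
        return replace(pkt, src=self.outbound.get(pkt.src, pkt.src))


nat = EdgeNat({"203.0.113.21": "10.1.1.11"})
print(nat.dnat_in(Packet("198.51.100.7", "203.0.113.21")))
# Packet(src='198.51.100.7', dst='10.1.1.11')
print(nat.snat_out(Packet("10.1.1.11", "198.51.100.7")))
# Packet(src='203.0.113.21', dst='198.51.100.7')
```

A load balancing VIP follows the same pattern on the inbound side, except that the destination is rewritten to whichever pool member the BIG-IP selects.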

 

So you're ready to architect your Edge Server deployment. Let's take all of the information above and build a deployment. Keep these things in mind:

 

External Side of the Edge Servers
-Plan for VIPs on your BIG-IP to load balance the 3 external services that your Edge Servers provide (Access, Web Conferencing, A/V)
-Plan for direct (non-load balanced) access to your Edge Servers by external clients
-Plan a method to allow Edge Servers to make outbound connections (forwarding VIP or SNAT on BIG-IP)
-Point the Edge Servers' default gateway to the Self IP of the BIG-IP
-Point the BIG-IP's default gateway to the router
-Do not SNAT traffic to the A/V services on the Edge Servers
-If you use non-routable IPs on the external interfaces of the Edge Servers, create a NAT on the BIG-IP for each Edge Server. Make sure the Edge Servers are aware of these NAT addresses so they can hand them out to clients that need to connect directly to an Edge Server.
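The external-side checklist can also be written down as a plan and checked mechanically before anything is configured. The Python sketch below uses a hypothetical data model (Vip, ExternalPlan and check_external_plan are illustrations, not a BIG-IP or Lync API), assumes all 3 external services are exposed, and covers only the checklist items that reduce to simple comparisons.

```python
from dataclasses import dataclass, field


@dataclass
class Vip:
    name: str
    service: str  # "access", "webconf" or "av"
    snat: bool


@dataclass
class ExternalPlan:
    vips: list
    edge_gateway: str
    bigip_self_ip: str
    bigip_gateway: str
    router: str
    edge_external_ips: list
    edges_routable: bool = True
    nat_for: dict = field(default_factory=dict)  # Edge external IP -> public NAT address


def check_external_plan(plan):
    """Return a list of checklist violations (an empty list means the plan passes)."""
    problems = []
    exposed = {v.service for v in plan.vips}
    for svc in ("access", "webconf", "av"):
        if svc not in exposed:
            problems.append(f"no VIP for {svc}")
    for v in plan.vips:
        if v.service == "av" and v.snat:
            problems.append(f"{v.name}: SNAT enabled on the A/V VIP")
    if plan.edge_gateway != plan.bigip_self_ip:
        problems.append("Edge default gateway is not the BIG-IP self IP")
    if plan.bigip_gateway != plan.router:
        problems.append("BIG-IP default gateway is not the router")
    if not plan.edges_routable:
        for ip in plan.edge_external_ips:
            if ip not in plan.nat_for:
                problems.append(f"{ip}: no NAT address defined")
    return problems


# Hypothetical plan with three mistakes: SNAT on A/V, wrong Edge gateway, missing NAT.
vips = [Vip("edge_access", "access", snat=True),
        Vip("edge_webconf", "webconf", snat=True),
        Vip("edge_av", "av", snat=True)]
plan = ExternalPlan(vips, edge_gateway="10.1.1.254", bigip_self_ip="10.1.1.1",
                    bigip_gateway="203.0.113.254", router="203.0.113.254",
                    edge_external_ips=["10.1.1.11"], edges_routable=False)
print(check_external_plan(plan))
# ['edge_av: SNAT enabled on the A/V VIP',
#  'Edge default gateway is not the BIG-IP self IP',
#  '10.1.1.11: no NAT address defined']
```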

 

Internal Side of the Edge Servers
-Plan for VIPs on your BIG-IP to load balance ports 443, 3478, 5061, and 5062 on the internal interfaces of your Edge Servers
-Plan for direct (non-load balanced) access to your Edge Servers
-Make sure your Edge Servers have routes to the internal network(s)
-You can SNAT traffic to the internal interface of the Edge Servers
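Because the Edge Servers' default gateway points out the external side, reaching internal subnets depends on the internal NIC's own subnet plus static routes via the internal interface. A minimal Python check, assuming you can list the internal subnets your clients and servers live on; the function name and addresses are hypothetical.

```python
import ipaddress


def unreachable_internal_networks(internal_nic, static_routes, internal_networks):
    """Return the internal networks not covered by the internal NIC's own subnet
    or by a static route; traffic to them would follow the external default gateway."""
    covered = [ipaddress.ip_interface(internal_nic).network]
    covered += [ipaddress.ip_network(route) for route in static_routes]
    return [
        net for net in internal_networks
        if not any(ipaddress.ip_network(net).subnet_of(c) for c in covered)
    ]


print(unreachable_internal_networks(
    "10.2.2.10/24",                                   # Edge internal NIC (hypothetical)
    ["10.10.0.0/16"],                                 # static route via the internal-side router
    ["10.2.2.0/24", "10.10.5.0/24", "172.16.0.0/16"],
))  # ['172.16.0.0/16']
```

Any network the function returns would be sent toward the external default gateway instead of out the internal interface.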

I'll leave you with an example of a fully supported configuration (i.e., one using routable IP addresses all around). Keep in mind that this is not the only way to architect this, but if you have the available public IP address space, this will work.

 

 

 

Wow… so much for a short post. I welcome any and all feedback, and I promise to update this post with new information as it comes in. I'll also augment this post with more details & deployments as I find time to write them up, so check back for updates. This may even end up as a guide some day!

Version 1.0 date 7/14/2011

Version 1.1 date 2/15/2011 - Fixed a few typos. Fixed some heinous formatting

Published Jul 14, 2011
  • We have implemented more or less successfully, but see a couple of things that are not in line with this guide:

     

    • We have implemented internet-routable IP addresses on the Edges. Above it says the F5 VIP will be used on first connection or when using multiparty calls. We have now tested a couple of scenarios with external clients, but we NEVER see any connection being made to the internet-routable IP addresses of the Edges. All A/V traffic flows via the VIP. The Edges route the traffic back to the F5, and SNAT is disabled.
    • What is best practice for load balancing the Front Ends? This article focuses on the Edges. We have now set up Front End load balancing via round robin, but also want to load balance with F5. I guess the same goes for the Front Ends: have routes to internal clients back to the F5 and don't do SNAT. Correct?
  • Hello. Months later we are back at this. Perchance, is there a list of ports that need to be allowed for clients to connect directly back to the Edge Servers?
  • We are just getting into this, and the deployment guide indicates the Edge external interface needs a publicly addressable IP. What security approved was to have the Edge Server behind the F5 on a DMZ VLAN. From the above, it sounds like it is possible to do this by enabling A/V NAT in Lync and providing the public address that the F5 will route through to the Edge Server. Can someone confirm this type of setup and what the downside of this approach is? Otherwise my Edge will need to go on the public DMZ, which they want to avoid (yeah, there would be a firewall between it and the public, but no F5). Thanks!!
  • Hello,

     

     

    Thanks for the info...Great stuff!!! Can you elaborate more on what you mean by, "Best Practice: Suggested Deployment "DNAT in, SNAT out" on the Load Balancer." Thanks

     

     

  • Great article, thanks. We deployed and couldn't get A/V working at all from external until, as above, we put public IPs on the Edge Server external interfaces and on the F5 DMZ interfaces connecting to the Edge external interfaces. The bit that confused me was how outbound routing from the Edge could work via the F5. We ended up logging a call and set up an outbound wildcard VS on the DMZ (VLAN 9), enabled on that VLAN only, as per SOL7595. VS settings: Type: Forwarding IP, Source IP: x.x.x.x%9/29 (the mask covers all our possible Edge IPs, as we found the Edge initiates traffic from each IP assigned).

     

    Dest Network 0.0.0.0%9, mask 0.0.0.0, service: all ports, all protocols, enabled on DMZ_VLAN only, Source addr translation None, Protocol Profile FastL4_Loosinit_LooseClose

     

    The FastL4_Looseinit_Loosclose profile above has these settings: Parent: fastL4, Reset on Timeout: Enabled, Idle Timeout: Immediate, Loose Init: Enabled, Loose Close: Enabled

     

    We then set the default gateway on the Edge server to the floating Self IP assigned to the F5 DMZ, and it all worked nicely.

     

    A couple of gotchas we hit along the way:

     

    Error 1: Incorrectly setting the iApp to say the Edge internal route is via the BIG-IP, when in fact the Edge routes directly to the internal VLAN via its internal interface, is a mistake. We could make external-to-external A/V calls and internal-to-internal A/V calls, but internal-to-external calls (and vice versa) would not connect at all. We were getting the message “Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote”.

     

    Error 2: We found that A/V calls to internal users would work for 5 seconds, and then the session would drop after 35 seconds. Inbound audio worked for the full 35 seconds, while outbound audio worked for only 5 seconds before the call terminated with Network Failure. The error in the logs was “Call terminated on a mid-call media failure where one endpoint is internal and the other is remote”. This was due to our initial setup of the wildcard VS for outbound routing, which only accepted traffic from the primary IPs on the Edge Servers. Once we expanded the Source IP to include the entire Edge Server external IP range, traffic flowed with no problem.

     

    Hope this helps someone else. Cheers

     

  • Amazing article!!! Solved a complex Lync implementation for a client using this article. Thanks a ton!
  • Hi, great article; even though it is old, it is still relevant.

     

    We have Lync 2010 Edge Servers load balanced using the F5 LTM and have poor audio quality for external users. I have been stepping through the iApp for Lync to see if there is anything missing, with no luck. Your guide has been useful; however, I am not sure what a DNAT is.

     

     

    Can you confirm whether using the Lync iApp follows best practice, or is there something I should be adjusting (e.g., the load balancing method)?