The (hopefully) definitive guide to load balancing Lync Edge Servers with a Hardware Load Balancer
Having worked on a few large Lync deployments recently, I have realized that there is still a lot of confusion around properly architecting the network for load balancing Lync Edge Servers. Guidance on this subject has changed from OCS 2007 to OCS 2007 R2 and now to Lync Server 2010, and it's important that care is taken while planning the design. It's also important to know that although a certain architecture may seem to work, it could be very far from best practice. I'll explain what I mean by that below.
The main purpose of Edge Services is to allow remote users (corporate, anonymous, federated, etc.) to communicate with other external and internal users, and vice versa. If you're looking to extend your Lync deployment to support communication with federated partners, public IM services, remote users and such, then you'll want to make sure you deploy your Edge Servers properly.
This post will discuss some requirements and best practices for deploying Edge Servers, and then we'll go into some suggested architectures. For this discussion, let's assume that there are 3 device types within your DMZ: your firewall, your BIG-IP LTM, and your Lync Edge Server farm.
Requirement 1: Your Edge Servers need at least 2 network interfaces: one or more dedicated to the external network, and one dedicated to the internal network. The external and internal interfaces need to be on separate IP networks.
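As a quick sanity check of Requirement 1, here is a minimal sketch that confirms no external interface shares an IP network with the internal one. The addresses are hypothetical (taken from documentation and private ranges), and the function name is my own for illustration, not something Lync or BIG-IP provides.

```python
import ipaddress

def interfaces_on_separate_networks(external_cidrs, internal_cidr):
    """Return True if no external interface shares an IP network with the internal one."""
    internal_net = ipaddress.ip_interface(internal_cidr).network
    for cidr in external_cidrs:
        external_net = ipaddress.ip_interface(cidr).network
        if external_net.overlaps(internal_net):
            return False
    return True

# Hypothetical addressing for one Edge Server (three external IPs, one internal IP).
EXTERNAL = ["203.0.113.11/24", "203.0.113.12/24", "203.0.113.13/24"]

print(interfaces_on_separate_networks(EXTERNAL, "10.10.20.11/24"))   # internal on its own network
print(interfaces_on_separate_networks(EXTERNAL, "203.0.113.50/24"))  # internal shares the external network
```

Running something like this against your planned addressing before you build the servers is cheaper than finding the overlap during testing.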
The Edge Server will host 3 separate external services: Access, Web Conferencing, and Audio/Visual (A/V). If you plan on exposing all 3 services for remote users, you have a choice of using one IP for all 3 services on each server and differentiating them by TCP/UDP port value, or going with a separate IP for each service and using standard ports.
Best Practice: This is more preference than best practice, but I like to use 3 separate IPs for these services. With alternative ports/port mapping, you can consolidate to a single IP, but unless you have a very specific reason for doing so, it's best to stick with 3 separate IPs. You do burn more IPs by doing this, but you'll have to use non-standard ports for certain services if you use a single IP, and this could lead to issues with certain network devices that like certain traffic types on certain ports. Plus, troubleshooting, traffic statistics, and logging are all cleaner if you are using 3 separate IPs.
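To make the trade-off concrete, here is a rough sketch that models both addressing schemes as plain data and reports which services end up on a non-standard port. The IP addresses are hypothetical, and the port numbers (443 as the "standard" client-facing port, 5061 and 444 as the alternates in the single-IP case) are assumptions for illustration; verify them against your own topology before relying on them.

```python
from collections import Counter

# Hypothetical external listeners for one Edge Server: service -> (ip, port).
# Addresses and ports are illustrative assumptions, not values from this post.
SEPARATE_IPS = {
    "access": ("203.0.113.11", 443),
    "webconf": ("203.0.113.12", 443),
    "av": ("203.0.113.13", 443),
}
SINGLE_IP = {
    "access": ("203.0.113.11", 5061),
    "webconf": ("203.0.113.11", 444),
    "av": ("203.0.113.11", 443),
}

def listener_conflicts(plan):
    """Return (ip, port) pairs claimed by more than one service."""
    counts = Counter(plan.values())
    return [pair for pair, n in counts.items() if n > 1]

def nonstandard_services(plan, standard_port=443):
    """Return the services that had to move off the standard port."""
    return sorted(name for name, (_, port) in plan.items() if port != standard_port)

for label, plan in (("3 IPs", SEPARATE_IPS), ("1 IP", SINGLE_IP)):
    print(label, "conflicts:", listener_conflicts(plan),
          "non-standard:", nonstandard_services(plan))
```

Both plans avoid listener conflicts, but only the 3-IP plan keeps every service on the same well-known port, which is the point of the preference above.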
Requirement 2: Traffic that is load balanced to the Lync Edge servers needs to return through the load balancer. In other words, if the hardware load balancer sends traffic to an Edge Server, the return traffic from that Edge Server needs to flow back through the load balancer. There are 2 common ways to ensure that return traffic flows through the load balancer. You can…
- Use routing, and have the Edge Servers point to the load balancer as their default gateway.
- Enable SNAT on the load balancer, which rewrites the source IP of the connection to a local network address as the traffic passes through the load balancer. In this case, the Edge Servers will believe that a local client generated the connection and send the responses back to that local address.
So there are your two options, which I will refer to as Routing and SNATting. With Routing, your Edge Server will rely on its routing table to route the return traffic out through the load balancer. No obscuring of the source IP address will happen on the load balancer, but you will have to make sure your default gateway & routing tables are correct. With SNATting, you can ensure return traffic goes back through the load balancer and not have to worry about the routing table to take care of this. The drawback to SNATting is that the load balancer will obscure the source IP of the packet as it passes through the load balancer.
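The difference between the two options can be sketched with a toy packet model. This is not how a BIG-IP is configured; it only illustrates which source address the Edge Server sees in each case. All addresses are hypothetical, and the function names are my own.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str

def load_balance(packet, pool_member, snat_address=None):
    """Forward a packet to a pool member, optionally rewriting its source (SNAT)."""
    forwarded = replace(packet, dst=pool_member)
    if snat_address is not None:
        forwarded = replace(forwarded, src=snat_address)
    return forwarded

def reply_destination(received):
    """A server replies to whatever source address it saw on the request."""
    return received.src

# Hypothetical addresses.
CLIENT = "198.51.100.25"
VIP = "203.0.113.10"
EDGE = "203.0.113.21"
LB_SELF_IP = "203.0.113.2"

routed = load_balance(Packet(CLIENT, VIP), EDGE)
snatted = load_balance(Packet(CLIENT, VIP), EDGE, snat_address=LB_SELF_IP)

# Routing: the reply is addressed to the real client, so the Edge Server's
# default gateway must be the load balancer for the reply to pass back through it.
print("routing reply goes to:", reply_destination(routed))
# SNAT: the reply is addressed to the load balancer itself, but the Edge Server
# never learns the client's real address.
print("SNAT reply goes to:", reply_destination(snatted))
```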
I will explain below why the SNAT idea is less than ideal, primarily for A/V traffic.
Best Practice: You can SNAT traffic to the Web Conferencing and Access services on the Edge Server, but do not SNAT traffic to the A/V Edge Services. By obscuring the client's IP address with SNAT, you limit the ability of the A/V services to connect clients directly to each other, and this is important when clients try to set up peer-to-peer communication, such as a phone call. When using SNAT, the A/V services will not see the client's true IP, so the likelihood of the Edge Server being able to orchestrate the 2 clients to communicate directly with each other is reduced to nil. You'll force the A/V services to use their fallback method, in which the P2P traffic has to use the A/V server as a proxy between the 2 clients. Now this 'proxy' fallback mode will still happen from time to time even when you're not SNATting at the BIG-IP (for example, multiparty calls will always use 'proxy'), but when you can, it's best to minimize the times that users have to rely on this fallback method. So even though SNATting connections to the A/V Edge Service will seem to work, it is far from desirable from a network perspective!
FYI - Every load balanced service in a Lync Environment (including Lync FE's, Directors, etc) can be SNAT'ed except for the A/V Edge Service.
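If you keep your virtual server plan in a spreadsheet or script, the SNAT rule is easy to check automatically. Below is a minimal sketch; the plan format, the virtual server names, and the service labels are all hypothetical and are not BIG-IP objects or commands.

```python
# Hypothetical virtual server plan; names and flags are illustrative only.
PLAN = [
    {"name": "edge_access_vs", "service": "access", "snat": True},
    {"name": "edge_webconf_vs", "service": "webconf", "snat": True},
    {"name": "edge_av_vs", "service": "av", "snat": True},        # breaks the rule
    {"name": "fe_pool_vs", "service": "front_end", "snat": True},
]

def snat_violations(virtual_servers):
    """Return the names of virtual servers that SNAT traffic to the A/V Edge service."""
    return [vs["name"] for vs in virtual_servers
            if vs["service"] == "av" and vs["snat"]]

print(snat_violations(PLAN))
```

Everything else in the plan is allowed to SNAT; only the A/V Edge entry is flagged.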
Requirement 3: Certain connections will need to be load balanced to the Edge Services, while certain connections will need to be made directly to those Edge Services.
Best Practice: Make sure clients can connect to the Virtual IP(s) that load balance the Edge Services, and also directly to the Edge Servers themselves. Typically users will hit the load balancer on their first incoming connection and get load balanced, but if a user gets invited to a media session that has started on an Edge Server, the invite they receive will point them directly to that server. NAT awareness was built into Lync 2010 to help in environments in which Edge Servers are deployed behind NATs. With NAT awareness enabled, Edge Servers refer clients to their respective NAT addresses so that users are routed in correctly.
Do I need to use routable IPs on the external interface of my Edge Servers? Microsoft says you do, and I would recommend doing so if you can. I have worked on deployments where non-routable IPs are being used (leveraging NATs to allow direct access) and not run into any issues. Just be sure that the Edge Servers are aware of their NAT address.
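The NAT-awareness idea boils down to a lookup: when a client is referred directly to an Edge Server, it must be handed an address it can actually reach. Here is a toy sketch of that lookup, assuming a hypothetical private-to-public mapping; it does not represent how Lync stores or publishes these addresses.

```python
# Hypothetical mapping of each Edge Server's private A/V address to its public NAT address.
NAT_TABLE = {
    "10.10.10.21": "203.0.113.21",
    "10.10.10.22": "203.0.113.22",
}

def referral_address(edge_ip, nat_table):
    """Address to hand a client that needs to connect directly to an Edge Server.

    If the server sits behind a NAT, hand out the NAT address; if it already has
    a routable address (no entry in the table), hand out the address itself.
    """
    return nat_table.get(edge_ip, edge_ip)

print(referral_address("10.10.10.21", NAT_TABLE))   # server behind a NAT
print(referral_address("203.0.113.30", NAT_TABLE))  # server with a routable IP
```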
Best Practice: Suggested Deployment "DNAT in, SNAT out" on the Load Balancer
"DNAT in, SNAT out" was derived from discussions with a certain MSFT engineer who helped me build this guidance. I'd love to give him credit (he knows Lync networking better than anyone I have ever talked to!!), but if I named him, his phone would never stop ringing for architecture guidance!! Back to the subject: if you keep to "DNAT in, SNAT out" for external-side Lync Edge traffic, your deployment will work! The phrase sums up the guidance very well!
So you're ready to architect your Edge Server deployment. Let's take all the information from above and build a deployment. Keep these things in mind:
External Side of the Edge Servers
- Plan for VIPs on your BIG-IP to load balance the 3 external services that your Edge Server provides (Access, Web Conferencing, A/V)
- Plan for direct (non-load balanced) access to your Edge Servers by external clients
- Plan a method to allow Edge Servers to make outbound connections (forwarding VIP or SNAT on BIG-IP)
- Point the Edge Server's Default Gateway to the Self IP of the BIG-IP
- Point the BIG-IP's Default Gateway to the Router
- Do not SNAT traffic to the A/V Services on the Edge Servers
If you use non-routable IPs on the external interfaces of the Edge Servers, create a NAT on the BIG-IP for each Edge Server. Make sure the Edge Servers are aware of these NAT addresses so they can hand them out to clients who need to connect directly to an Edge Server.
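The gateway items in the external-side checklist can be verified with a few lines of code before anything is cabled. This is only a sketch under assumed addressing: every IP below is hypothetical, and the function is an illustration of the checks, not a BIG-IP or Lync tool.

```python
import ipaddress

def gateway_problems(edge_external, edge_gateway, bigip_self, bigip_gateway, router):
    """Check the external-side gateway chain: Edge Server -> BIG-IP Self IP -> router."""
    problems = []
    edge_if = ipaddress.ip_interface(edge_external)   # Edge external IP with prefix
    self_if = ipaddress.ip_interface(bigip_self)      # BIG-IP Self IP on the Edge-facing VLAN
    if ipaddress.ip_address(edge_gateway) != self_if.ip:
        problems.append("Edge default gateway is not the BIG-IP Self IP")
    if self_if.ip not in edge_if.network:
        problems.append("BIG-IP Self IP is not on the Edge Server's external network")
    if ipaddress.ip_address(bigip_gateway) != ipaddress.ip_address(router):
        problems.append("BIG-IP default gateway is not the router")
    return problems

# Hypothetical plan that follows the checklist.
print(gateway_problems("203.0.113.21/24", "203.0.113.2",
                       "203.0.113.2/24", "198.51.100.1", "198.51.100.1"))
# Same plan, but the Edge Server points at some other gateway.
print(gateway_problems("203.0.113.21/24", "203.0.113.1",
                       "203.0.113.2/24", "198.51.100.1", "198.51.100.1"))
```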
Internal Side of the Edge Servers
- Plan for VIPs on your BIG-IP to load balance ports 443, 3478, 5061, and 5062 on the internal interfaces of your Edge Servers
- Plan for direct (non-load balanced) access to your Edge Servers
- Make sure your Edge Servers have routes to the internal network(s)
- You can SNAT traffic to the internal interface of the Edge Servers
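For the internal side, the plan is one load-balanced listener per port, each pointing at the internal interfaces of the Edge Servers. A minimal sketch that expands the port list above into a plan is shown here; the VIP and member addresses are hypothetical, and the dictionary layout is my own rather than any BIG-IP format.

```python
INTERNAL_PORTS = (443, 3478, 5061, 5062)  # ports from the internal-side checklist

def internal_virtual_servers(vip, edge_internal_ips, ports=INTERNAL_PORTS):
    """Build one virtual server entry per internal port, with SNAT allowed."""
    return [
        {
            "vip": f"{vip}:{port}",
            "members": [f"{ip}:{port}" for ip in edge_internal_ips],
            "snat": True,  # SNAT is acceptable on the internal interface
        }
        for port in ports
    ]

# Hypothetical internal VIP and Edge Server internal addresses.
for vs in internal_virtual_servers("10.10.20.10", ["10.10.20.11", "10.10.20.12"]):
    print(vs["vip"], "->", ", ".join(vs["members"]))
```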
I'll leave you with an example of a fully supported configuration (i.e. using routable IP Addresses all around). Keep in mind, this is not the only way to architect this, but if you have the available public IP address space, this will work.
Wow… so much for a short post. I welcome any and all feedback, and I promise to update this post with new information as it comes in. I'll also augment this post with more details & deployments as I find time to write them up, so check back for updates. This may even end up as a guide some day!
Version 1.0 date 7/14/2011
Version 1.1 date 2/15/2011 - Fixed a few typos. Fixed some heinous formatting