Scaling things seems like such a simple task, doesn’t it? Open a new checkout line. Call a new teller to the front. Hire another person.
But under the covers in technology land, where networking standards rule the roost, it really isn’t as simple as just adding another “X”. Oh, we try to make it look that simple, but it’s not. Over the years we (as in the industry ‘we’) have come up with all sorts of interesting ways to scale systems and applications within the constraints that IP networking places upon us.
One of those constraints is that physical (L2) and logical (L3) addresses have a one-to-one mapping. Oh, I know it looks like packets are delivered based on IP address but on the local network, really, they aren’t. They’re forwarded based on physical (MAC) addresses. And every router keeps track of that IP-to-MAC mapping (while every switch tracks which port each MAC lives on). 10.1.1.1? That goes to physical device A. 10.1.1.2? That goes to physical device B.
That makes scalability a lot harder than you might think because horizontal scaling uses clusters of things, all with their own IP-MAC combination. After all, if a user makes a request to 10.0.0.1, it’s going to be mapped somewhere to a single, specific device. Distributing traffic across multiple devices (to scale) means somehow figuring out how to remap those associations. And while you could do that, it would totally destroy network performance and throughput.
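To make the constraint concrete, here is a toy sketch of an ARP-style table in Python. The `learn` helper and the MAC addresses are made up for illustration (the MACs come from the range reserved for documentation), and a real table also ages out entries, which this ignores.

```python
# Toy ARP-style table: each logical (IP) address maps to exactly one
# physical (MAC) address. All names and addresses here are illustrative.
neighbor_table = {}

def learn(ip, mac):
    """Record the MAC for an IP; a later entry replaces the earlier one."""
    neighbor_table[ip] = mac

learn("10.1.1.1", "00:00:5e:00:53:0a")  # physical device A
learn("10.1.1.2", "00:00:5e:00:53:0b")  # physical device B

# A second device claiming 10.1.1.1 does not share the address;
# it simply displaces device A.
learn("10.1.1.1", "00:00:5e:00:53:0c")
```

There is nowhere in that table to say "10.1.1.1 is both A and C", which is exactly why simply adding another device doesn’t scale anything.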
Think of the router like a building. When it’s a house it probably has one name (logical) and one address (physical). It’s easy to route to it because we know where it is.
But if we’re trying to scale up housing, we might replace the house with an apartment building. Now we have one logical (name) going to multiple physical (addresses). They can’t all share the same logical address. The postman can get to the right address, but can’t figure out how to find Bob to deliver the mail.
So we add a bellman (proxy, load balancer, ADC) to the equation and give him the logical and physical address. All the apartments get their own physical and logical address (like “Apt 3”) and we ask the bellman to deliver the mail to not only Bob but Alice and Mary and Frank, too.
We give the bellman a Virtual IP (VIP) address to represent the cluster of “things” (apartments). That’s how we mask the complexity of actually distributing load across multiple devices (servers, apps, instances). That user doesn’t talk directly to any app instance or server; it talks to the VIP, which in turn determines to which instance to forward the request. Because we’re sitting in the middle, we can exploit the one-to-one mapping between the physical and the logical by presenting ourselves as the endpoint and then distribute requests and traffic across a cluster to achieve really high scale of all sorts of things. The IP-MAC association is preserved and there’s no need for tricks to get around it.
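As a rough sketch of what the bellman is doing, the Python below puts one VIP in front of three instances and hands out requests round-robin. The `VirtualServer` class, the addresses and the round-robin choice are all assumptions made for illustration; real proxies offer many distribution algorithms plus health checks, none of which appear here.

```python
from itertools import cycle

class VirtualServer:
    """Hypothetical stand-in for the bellman: one VIP, many instances."""

    def __init__(self, vip, members):
        self.vip = vip                       # the only address clients see
        self._next_member = cycle(members)   # simple round-robin rotation

    def forward(self, request):
        """Pick the next instance and return (instance, request)."""
        member = next(self._next_member)
        return (member, request)

bellman = VirtualServer("10.0.0.1", ["10.0.1.11", "10.0.1.12", "10.0.1.13"])
deliveries = [bellman.forward(f"GET /mail/{n}") for n in range(4)]
```

The client only ever addresses 10.0.0.1; which of the three instances actually answers is the bellman’s business.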
But (and you knew there was a but coming, didn’t you?) there are times when that one-to-one mapping thing gets in the way. After all, a VIP is still an IP, and it is constrained by the same one-to-one mapping requirement. So when you need to scale the VIP by adding more devices (’cause your apartment building is really flourishing and one bellman just isn’t enough anymore), you run into the same problem we just ran into when we tried to scale the app. To which node in the VIP cluster should traffic be routed? Who gets to hold the VIP address so the upstream router or switch knows where to send those packets? Turns out the solution is similar in nature: we have to go upstream to figure out how to fix it.
Except upstream is a standard switch or router that expects – nay, it demands – a one-to-one mapping between physical (L2) and logical (L3) addresses.
This is where SDN comes in handy and provides some secret (network) scalability sauce.
One of the neat things about OpenFlow-based SDN is its match/action programmability. Rather than hard-coding routes in the configuration, OpenFlow-enabled devices consult a programmable flow table that allows them to match attributes – like source IP address – with an action, like forwarding to a specific port or changing the destination MAC (physical) address.
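Here is the match/action idea modelled in plain Python. To be clear, this is not the OpenFlow wire protocol or any controller’s API; `FlowEntry`, `lookup` and the addresses are hypothetical, and a real flow table can match on many more fields than source IP.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class FlowEntry:
    match_src: str     # match: source prefix, e.g. "0.0.0.0/1"
    set_dst_mac: str   # action: rewrite the destination MAC
    out_port: int      # action: forward out of this port
    priority: int = 0

def lookup(flow_table, src_ip):
    """Return the highest-priority entry whose prefix contains src_ip, or None."""
    hits = [e for e in flow_table
            if ip_address(src_ip) in ip_network(e.match_src)]
    return max(hits, key=lambda e: e.priority, default=None)

# Two entries that split the source address space in half.
flow_table = [
    FlowEntry("0.0.0.0/1", "00:00:5e:00:53:01", out_port=1),
    FlowEntry("128.0.0.0/1", "00:00:5e:00:53:02", out_port=2),
]

entry = lookup(flow_table, "203.0.113.7")  # falls in the upper half
```

Same destination IP on the way in, different destination MAC on the way out, depending on who is asking.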
You see where I’m going with this, right?
By taking advantage of this capability we can use an upstream, OpenFlow-enabled SDN switch to change the destination MAC (physical) address based on matching attributes. That means that we can work around that pesky one-to-one mapping problem that might prevent us from scaling out the bellman (proxy). But that’s not all we need. Scalability today is one part technical and another part operational. We can’t maintain operational efficiency if we have to manually adjust flow tables whenever we add (or remove) a proxy based on demand. So we need some magic, and that magic is programmability.
Whenever the cluster changes (a proxy is added or removed) we need to be able to update the upstream (SDN) switch so it knows how to distribute traffic to the proxies. To do that, we take advantage of the control-plane separation inherent in SDN and programmatically update the flow tables, keeping them accurate so traffic is directed appropriately automagically. So not only can we increase the size of the apartment building, but we can scale out the number of bellmen we need to make sure mail is delivered to the right apartment at the right time.
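The automation half might be sketched like this in Python: every time a proxy joins or leaves, the flow entries are recomputed and pushed upstream. Everything here is an assumption for illustration. `build_flows`, `Cluster` and the `push` callback are hypothetical names, `push` merely stands in for whatever call a real SDN controller exposes, and carving the source address space into equal prefixes is just one simple way to spread the load.

```python
from ipaddress import ip_network

def build_flows(proxy_macs, bucket_bits=3):
    """Split the IPv4 source space into 2**bucket_bits prefixes and
    assign each prefix to a proxy MAC in round-robin order."""
    buckets = ip_network("0.0.0.0/0").subnets(prefixlen_diff=bucket_bits)
    return {str(prefix): proxy_macs[i % len(proxy_macs)]
            for i, prefix in enumerate(buckets)}

class Cluster:
    """Hypothetical proxy cluster that republishes flows on every change."""

    def __init__(self, push):
        self.proxy_macs = []
        self.push = push  # callback standing in for a controller call

    def add(self, mac):
        self.proxy_macs.append(mac)
        self.push(build_flows(self.proxy_macs))

    def remove(self, mac):
        self.proxy_macs.remove(mac)
        # With no proxies left there is nothing to forward to.
        self.push(build_flows(self.proxy_macs) if self.proxy_macs else {})

installed = {}  # stands in for the upstream switch's flow table

def push(flows):
    # Replace the whole table so stale entries never outlive a proxy.
    installed.clear()
    installed.update(flows)

cluster = Cluster(push)
cluster.add("00:00:5e:00:53:01")
cluster.add("00:00:5e:00:53:02")
```

Nobody edits a flow table by hand; calling `cluster.add` or `cluster.remove` is the only operational step. A production version would also have to worry about connections already in flight when the mapping changes, which this sketch cheerfully ignores.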
It turns out that in large implementations – like those you might find in a service provider’s network – this is a really handy thing to be able to do, especially as Network Functions Virtualization (NFV) takes hold and app services are increasingly delivered via virtual appliances. Scalability becomes critical by necessity; virtual appliances are increasing capacity at breakneck speed but still aren’t able to match their purpose-built hardware counterparts. Scale of “the network”, therefore, is key to keeping up with demand and keeping costs down. But to do that requires some magic, and that magic turns out to be SDN.
You can dig even deeper into just such a solution by taking a gander at my colleague Christian Koenning’s blog right here on DevCentral, or go straight to the source and check out the iApp/iRule solution we’ve developed to automagically handle this scenario.