Measuring Management and Orchestration

Early (very early, in fact) in the rise of SDN there were many discussions around scalability. Not of the data plane, but of the control (management) plane. Key to this discussion was the rate at which SDN-enabled network devices, via OpenFlow, could perform “inserts”. That is, how many times per second/minute could the management plane make the changes necessary to adjust to the environment.

It was this measurement that turned out to be problematic, with many well-respected networking pundits (respected because they are also professionals) noting that the 1000 inserts per second threshold was far too low for the increasingly dynamic environment in which SDN was purported to be appropriate.

That was years ago, when virtualization was still the norm. Certainly the scalability of SDN solutions has likely increased, but so too has the environment. We’re now moving into environments where microservices and containers are beginning to dominate the application architecture landscape, characterized by more dynamism than even highly virtualized data centers.

In some ways, that’s due to the differences between virtualization and containers that make it possible to launch a container in seconds where virtual machines took perhaps minutes. Thus the rate of change when microservices – and containers – are in play is going to have a significant impact on the network.

Which brings us back to software-defined strategies like SDN. In an environment where containers are popping up and shutting down at what could be considered an alarming rate, there’s simply no way to manually keep up with the changes required to the network. The network must be enabled through the implementation of some kind of software-defined strategy. Which brings us to the need to measure management and orchestration capabilities.

DevOps has, at its core, a pillar based on measurement. Measurement of KPI (Key Performance Indicators) relative to operations like MTTR (Mean Time to Recover) and Lead Time to Change.  But it does not, yet, bring with it the requirement that we measure the rate at which we’re able to make changes – like those necessary to launch a container and add it to a load balancing pool to enable real-time scalability. Software-defined strategies tell us these tasks – making sure the network connections and routes are properly added or adjusted and adding a new container or VM to the appropriate load balancing pool – should be automated. The process of “scaling” should be orchestrated.

And both should be measured. Operationally these are key metrics – and critical to determining how much “lead time” or “notice” is necessary to scale without disruption – whether that disruption is measured as performance degradations or failures to connect due to lack of capacity. This isn’t measuring for metrics’ sake – this is hard operational data that’s required in order to properly automate and orchestrate the processes used to scale dynamically.

We need to measure management and orchestration; we need to include a “mean time to launch” as one of our operational metrics. If we don’t, we can’t figure out when we need to launch a new instance to dynamically scale to meet demand. And if we can’t dynamically scale, we have to do it manually.

No one wants to do that.

Which means, ultimately, that we must bring the network in to the DevOps fold. Whether as NetOps or NetDevOps or SDN, we need to start bringing in the management and orchestration capabilities of the network, particularly when it applies to  those application-affine services that are impacted by changes to the application environment (like load balancing and app security).

Published Dec 17, 2015
Version 1.0
No CommentsBe the first to comment