It’s called a feedback loop, not a feedback black hole.
One of the key components of a successful architecture designed to mitigate operational risk is the ability to measure, monitor and make decisions based on collected “management” data. Whether it’s simple load balancing decisions based on availability of an application or more complex global application delivery traffic steering that factors in location, performance, availability and business requirements, neither can be successful unless the components making decisions have the right information upon which to take action.
Monitoring and management is likely one of the least sought after tasks in the data center. It’s not all that exciting and it often involves (please don’t be frightened by this) integration. Agent-based, agentless, standards-based. Monitoring of the health and performance of resources is critical to understanding how well an “appl... on a daily basis. It’s the foundational data used for capacity planning, to determine whether an application is under attack and to enable the dynamism required of a dynamic, intelligent infrastructure supportive of today’s operational goals.
We talk a lot about standards and commoditization and how both can enable utility-style computing as well as the integration necessary at the infrastructure layers to improve the overall responsiveness of IT. But we don’t talk a lot about what that means in terms of monitoring and management of resource “health” – performance, capacity and availability.
The ability of any load-balancing service depends upon the ability to determine the status of an application. In an operationally mature architecture that includes the status of all components related to the delivery of that application, including other application services such as middle-ware and databases and external application services. When IT has control over all components, then traditional agent-based approaches work well to provide that information. When IT does not have control over all components, as is increasingly the case, then it cannot collect that data nor access it in real-time. If the infrastructure components upon which successful application delivery relies cannot “see” how any given resource is performing let alone whether it’s available or not, there is a failure to communicate that ultimately leads to poor decision making on the part of the infrastructure.
We know that in a highly virtualized or cloud-computing model of application deployment that it’s important to monitor the health of the resource, not the “server”, because the “server” has become little more than a container, a platform upon which a resource is deployed and made available. With the possibility of a resource “moving” it is even more imperative that operations monitor resources. Consider how IT organizations that may desire to leverage more PaaS (Platform as a Service) to drive application development efforts forward faster. Monitoring and management of those resources must occur at the resource layer; IT has no control or visibility into the underlying platforms – which is kind of the point in the first place.
The feedback from the resource must come from somewhere. Whether that’s an agent (doesn’t play well with a PaaS model) or some other mechanism (which is where we’re headed in this discussion) is not as important as getting there in the first place. If we’re going to architect highly responsive and dynamic data centers, we must share all the relevant information in a way that enables decision-making components (strategic points of control) to make the right decisions. To do that resources, specifically applications and application-related resources, must provide feedback.
This is a job for devops if ever there was one. Not the ops who apply development principles like Agile to their operational tasks, but developers who integrate operational requirements and needs into the resources they design, develop and ultimately deploy. We already see efforts to standardize APIs designed to promote security awareness and information through efforts like CloudAudit. We see efforts to standardize and commoditize APIs that drive operational concerns like provisioning with OpenStack. But what we don’t see is an effort to standardize and commoditize even the simplest of health monitoring methods. No simple API, no suggestion of what data might be common across all layers of the application architecture that could provide the basic information necessary for infrastructure services to take actions appropriately.
The feedback regarding the operational status of an application resource is critical in ensuring that infrastructure is able to make the right decisions at the right time regarding each and every request. It’s about promoting dynamic equilibrium in the architecture; an equilibrium that leads to efficient resource utilization across the data center while simultaneously providing for the best possible performance and availability of services.
It is critical that developers not only understand but take action regarding the operational needs of the service delivery chain. It is critical because in many situations the developer will be the only ones with the means to enable the collection of the very data upon which the successful delivery of services relies. While infrastructure and specifically application delivery services are capable of collaborating with applications to retrieve health-related data and subsequently parse the information into actionable data, the key is that the data be available in the first place. That means querying the application service – whether application or middle-ware and beyond – directly for the data needed to make the right decisions. This type of data is not standard, it’s not out of the box, and it’s not built into the platforms upon which developers build and deploy applications. It must be enabled, and that means code.
That means developers must provide the implementation of the means by which the data is collected; ultimately one hopes this results in a standardized health-monitoring collection API jointly specified by ops and dev. Together.