Cloud/Automated Systems need an Architecture

Introduction

Architecture. The physical world we inhabit is built on it, literally. One of the great parts about working in technology is that it has allowed me to travel the world and appreciate many things, architecture being one of them. Up until recently I’ve always seen these two worlds, architecture and technology, as being very separate. That changed recently... through many different events, professionally and personally, my brain finally got out of its own way and started grasping for a link between these two forces in my mind. One of the many ah-hah! moments I’ve had over the past year resulted in what I’m writing about today. So, if you’ll indulge me… let’s tell a story!

For years I’ve been trying to pin down what ‘Cloud’ is. It’s not an uncommon question. In fact, the term Cloud itself has become one that induces panic attacks and eye twitching the world around. Working with customers every day, I saw two scenarios play out over and over again:

I meet with a customer, we agree that something needs to be done related to ‘Cloud’… we try and try for months, even years, and never, jointly, reach a state of production readiness. Pressure builds and builds until something or someone breaks and then a new strategy is proposed and we rinse and repeat.
I meet with a customer, learn the organization has made a decision regarding the ‘Cloud’ vendor/solution/technology of choice. A number of requirements are laid out, we jointly try to meet these requirements, and eventually end up in the spiral of desperation that is ‘descope, deprioritize’. What does get implemented is a shell of the original vision and rarely meets the original business objectives.

For years we’ve been faced with project after project like those described above. Sure, there have been successes. But if I’m being honest with myself, not nearly enough. This got me thinking about architecture: the process of designing something functional and aesthetically pleasing within the constraints of how we can manipulate the physical world. How does an architect see their creation come to life? Do they just start digging holes and welding beams? No! They build models of their vision. Those models are then sent to other groups that specialize in things like HVAC, Electrical and Plumbing. They are refined time and time again to adhere to the constraints of the real world without losing sight of the original vision. Only once these models are finalized is construction started. While Architecture is rooted in a creative process, its ultimate expression into the real world is bound by the rules of physics, budgets and timelines.

Can we apply this same methodology to the design of Automated Systems? Yes!

Does an Architecture that properly addresses Layer 4-7 services exist? No!

My colleagues and I stepped back from trying to just do ‘something’. We stopped. We started to build our models. We refined and tested. What resulted is a generalized Architecture for Automated Systems. In this series of articles we will explore this Architecture and how F5 and our customers can use it to ‘build a model’ that can then be expressed into real-world implementations. To get started, this article will lay out the foundational concepts for Automated Systems and then use that knowledge to build out the foundational models for our Architecture.

Automation Concepts

One of the great successes over the past year has been a free ‘Intro to Automation & Orchestration’ class that F5 has developed and delivered to customers worldwide (if you are interested in taking the class, contact your account team). The story of this course and the DevOps methodology behind it will be detailed in a separate article; however, the concepts we teach in the class form the foundation for our Architecture. Those concepts are:

Appropriate Abstraction
Domain Specific Knowledge
Source of Truth
Imperative Processes
Declarative Interfaces
Orchestration & DevOps

Appropriate Abstraction

In order to successfully automate complex systems you must understand the concept of Abstraction and, more importantly, how to abstract in an appropriate manner. To explain this concept let’s use the following slide:

On the left you have a pile of lumber. In the middle you have a mostly built house. On the right you have a fully completed house. Now, imagining that you are in the market for a house, let’s examine each.

The pile of lumber represents the fully custom house. If you want to turn that pile of lumber into a house you would have to learn lots of skills and invest a large amount of time and effort into building that house.

The mostly built house represents the semi-custom, new construction home that allows you to pick a floorplan that you like and customize some of the finishes in the house. The reason this type of home is so prevalent is that the builder can leverage economies of scale. The home owner also benefits because they essentially pick what is important to them and the builder takes care of the rest.

The completed home represents the pre-existing home that you would purchase from an existing home owner. If you’ve ever purchased an existing home you’ll know that most of the time the purchase of the home is just the first step. Some refresh and renovation of the home is usually required to suit your individual needs. If you’ve ever done this you'll know that changing an existing home can open a Pandora’s box of issues that become very costly.

How does this link back to technology? Let’s map these concepts over:

Pile of Lumber: What most systems look like today. Fully customizable but NOT repeatable. Large requirement for expert level knowledge of the system(s). Long lead times for deployment.

Mostly Built House: What we should actually work towards. Customizable within reason, but repeatable. Lowered requirement for expert level knowledge. Predictable lead times for deployment.

Pre-Existing Home: What everyone tries to sell you. The proverbial ‘easy-button’. The ‘cloud-in-a-box’. Sure, you get something that works. However, changing anything inside to suit the needs of your business usually opens a Pandora’s box of issues.

So, back to Appropriate Abstraction. The key idea here is to make sure that as you abstract services in an Automated System, it’s done in a manner that leads to the ‘Mostly Built House’. Doing this requires us to understand that not every system or service can be automated. There will always be a need for custom-built services. The decision whether to abstract a service should be based on achieving economies of scale rather than just ‘automate everything’. Conversely, providing an ‘easy-button’ does one thing: it forces a vendor’s expression of a use case onto your environment. That may be OK with simple services and systems; however, these do not represent the majority of systems and applications in real-world environments.

Appropriate Abstraction allows you to ‘assemble the button’ for others to push.
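To make the ‘Mostly Built House’ a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the names ServiceRequest, build_service, FIXED_SETTINGS and ALLOWED_SIZES, as well as the values). The idea is simply that the expert decides most settings once, and the consumer picks from a small, bounded set of options.

```python
from dataclasses import dataclass

# Hypothetical values decided once by the domain expert (the "builder").
FIXED_SETTINGS = {"health_check": "http", "timeout_seconds": 30}

# The only sizes a consumer may choose: size name -> instance count.
ALLOWED_SIZES = {"small": 2, "medium": 4, "large": 8}


@dataclass(frozen=True)
class ServiceRequest:
    """The few choices the consumer is allowed to make."""
    name: str
    size: str = "small"


def build_service(request: ServiceRequest) -> dict:
    """Expand a small request into a full, repeatable configuration."""
    if request.size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    return {
        "name": request.name,
        "instances": ALLOWED_SIZES[request.size],
        **FIXED_SETTINGS,
    }


if __name__ == "__main__":
    print(build_service(ServiceRequest(name="web", size="medium")))
```

Anything outside the allowed options is rejected rather than silently accepted, which is what keeps the result repeatable. A truly custom service would be built outside this template.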

Domain Specific Knowledge

Now that we’ve explained Appropriate Abstraction, let’s take a look at Domain Specific Knowledge. Domain Specific Knowledge is the specific knowledge that an individual (or system) must have before they can complete a process. Using the example above, constructing a new home from a pile of lumber would require a very high level of Domain Specific Knowledge in many trades (concrete, framing, electrical, HVAC, painting, tile, etc.). Conversely, purchasing a fully built house (with the assumption that nothing needs to be renovated) requires very low Domain Specific Knowledge as it relates to home construction.

Why talk about this? Well, you’ll see in the following sections that the level of Domain Specific Knowledge has a direct impact on how various automated systems work together (or don’t) on the path to production deployments.
Furthermore, systems are built by people. It is well known that people, while very capable, cannot keep up with the rate of change of all the underlying systems. Therefore the only solution is to limit the NUMBER of systems that have to be learned rather than limiting the DEPTH of knowledge in those systems. In other words, narrow but deep instead of wide and shallow.

Source of Truth

A Source of Truth (SOT) is defined as a system or object that contains the authoritative representation of a service and its components. For example, in traditional environments the SOT is the running configuration on a particular device. When automating systems it is critical to understand that the SOT may NOT reside on the device itself. While each device will have a version of the running configuration, we make a distinction that the authoritative source for that data may be somewhere else (off-device).

This distinction has some implications:

Changes for a service should be pushed or pulled from the SOT to subordinate devices
Out-of-band changes must be handled very carefully (preferably, totally avoided)
The SOT must provide security for service metadata.

Implementing a single SOT for a single technology vendor is complicated. When multiple systems are joined together via Orchestration the problem becomes much harder. In this instance it is important to make the distinction between a Truth and the Source of that Truth. A Truth is simply a piece of data. That Truth can be distributed and manipulated by multiple systems as long as the authoritative source of that Truth is well defined and consistent.

In complex systems there are often multiple sources of truth. The top-level Source of Truth only knows the information contained in the Abstracted representation of a service. Vendor-specific automation tools may apply more data during automated operations as you move to less abstracted interfaces of the service. As long as each Truth is tied to one and only one Source of Truth things work fine.
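The ‘one and only one Source of Truth’ rule, and the need to catch out-of-band changes, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: TruthRegistry and find_drift are made-up names, and the truth and source names are examples, not part of any product.

```python
class TruthRegistry:
    """Maps each Truth (a piece of data) to its single authoritative source."""

    def __init__(self) -> None:
        self._sources: dict[str, str] = {}

    def claim(self, truth: str, source: str) -> None:
        """Register `source` as the owner of `truth`; reject a second owner."""
        owner = self._sources.setdefault(truth, source)
        if owner != source:
            raise ValueError(f"{truth!r} is already owned by {owner!r}")

    def source_of(self, truth: str) -> str:
        return self._sources[truth]


def find_drift(authoritative: dict, running: dict) -> dict:
    """Return the settings where a device differs from the Source of Truth."""
    return {
        key: value
        for key, value in authoritative.items()
        if running.get(key) != value
    }


if __name__ == "__main__":
    registry = TruthRegistry()
    registry.claim("vip_address", "orchestrator")   # top-level, abstracted data
    registry.claim("tls_profile", "vendor_tool")    # added at a less abstracted layer
    print(registry.source_of("tls_profile"))
    # An out-of-band change on the device shows up as drift from the SOT.
    print(find_drift({"port": 443, "pool": "a"}, {"port": 80, "pool": "a"}))
```

The values returned by find_drift are what the SOT says the device should hold, so they can be pushed back to the device to correct the out-of-band change.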

Imperative Processes

An Imperative Process is simple. You execute thousands of imperative processes every day. An imperative process is the step-by-step set of actions required to achieve an outcome. A simple example is making a jam sandwich.

This process can be separated into a sequence of ordered steps:

Gather Ingredients
- Bread
- Butter
- Strawberry Jam
Butter 2 slices of bread
Spread some strawberry jam on one of the slices of bread
Place the second slice on top of the first slice
Cut the sandwich in half and enjoy!

Now, let’s say you have a friend over for lunch one day who doesn’t share your specific sandwich preferences. The complexity around Imperative Processes arises when you have to apply customizations, or ‘branches’, to the process.
At every step of the process above you have the potential for options. For example, let’s say your friend has the following requirements:

Can’t have butter due to cholesterol issues
Is allergic to strawberries
Has their arm in a cast due to a boating accident
Prefers their sandwich cut through the center, not diagonally (seriously!)

Could the process above be used to create this ‘custom’ sandwich? No.

Instead we branch the process at each step based on the requirements. The resulting process starts to get very complicated. If you imagine building this process as a tree, each ‘option’ results in another branch. If you try to enumerate all those branches to their outcome, you can see how we quickly reach a scenario where the set of problems is unsolvable.

The main takeaway from this concept should be that we must ‘prune the tree’. While Imperative Processes will always be required, it’s the job of the expert for a particular technology, or solution, to understand which use cases can be appropriately abstracted. From there you must minimize the number of branches to the lowest set that delivers the service as intended.
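As a rough illustration of how quickly the tree grows, here is a small Python sketch of the sandwich process. The function and the option counts are assumptions I made up for the example; the point is only that every option multiplies the number of paths through the process, and pruning divides it again.

```python
from math import prod


def make_sandwich(use_butter: bool = True, jam: str = "strawberry",
                  cut: str = "diagonally") -> list[str]:
    """An imperative process: ordered steps, with a branch for each option."""
    steps = ["gather ingredients"]
    if use_butter:  # branch: some people can't have butter
        steps.append("butter 2 slices of bread")
    steps.append(f"spread {jam} jam on one slice")  # branch: which jam
    steps.append("place the second slice on top")
    steps.append(f"cut the sandwich {cut}")  # branch: how to cut
    return steps


def count_paths(options_per_step: dict[str, int]) -> int:
    """Each independent option multiplies the number of distinct paths."""
    return prod(options_per_step.values())


# Hypothetical option counts before and after 'pruning the tree'.
ALL_OPTIONS = {"bread": 3, "spread": 2, "jam": 4, "cut": 3}
PRUNED_OPTIONS = {"bread": 2, "spread": 2, "jam": 2, "cut": 1}

if __name__ == "__main__":
    print(make_sandwich(use_butter=False, jam="apricot", cut="through the center"))
    print(count_paths(ALL_OPTIONS), "paths before pruning")
    print(count_paths(PRUNED_OPTIONS), "paths after pruning")
```

With these made-up numbers, four steps already produce 72 distinct paths, and pruning brings that down to 8. Real services have far more steps and options than a sandwich.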

Declarative Interfaces

So let’s take the sandwich analogy one step further. People have gotten wind of the superior quality of your jam sandwiches. You’ve decided to start a Reggae-themed Jam Sandwich restaurant called Jammin’. How can you deliver your jam sandwiches to the masses?

The answer is something almost everyone is familiar with… the ubiquitous Drive-Thru. The Drive-Thru concept is a perfect illustration of a Declarative Interface. Consumers simply declare the sandwich they want from a pre-defined menu. Some options for customization are present; however, these options are limited because the intent of the Drive-Thru is to deliver jam sandwiches as fast as possible for low, low prices. The process behind making the sandwich (and all the logistics of running a restaurant) is totally abstracted away from the consumer.

When looking at Automated Systems it’s important to understand that when you properly combine Appropriate Abstraction with Imperative Processes the result is a Declarative Interface that should require a low level of Domain Specific Knowledge to consume. The underlying Imperative Processes could be simple or complex, however, that complexity does not have to be exposed to the consumer of the service.
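Here is one way the Drive-Thru could look as a minimal Python sketch. The menu items, the order function and the private _make helper are all hypothetical names for illustration. The consumer declares what they want from the menu; the imperative steps stay hidden behind the interface.

```python
# The pre-defined menu: the only things a consumer can declare.
MENU = {
    "classic": {"butter": True, "jam": "strawberry"},
    "no-butter": {"butter": False, "jam": "strawberry"},
    "apricot": {"butter": True, "jam": "apricot"},
}
CUTS = ("diagonal", "center")  # the one customization on offer


def _make(butter: bool, jam: str, cut: str) -> list[str]:
    """The imperative process, hidden from the consumer."""
    steps = ["gather ingredients"]
    if butter:
        steps.append("butter 2 slices of bread")
    steps.append(f"spread {jam} jam on one slice")
    steps.append("place the second slice on top")
    steps.append(f"cut: {cut}")
    return steps


def order(item: str, cut: str = "diagonal") -> list[str]:
    """The declarative interface: say WHAT you want, not HOW to make it."""
    if item not in MENU:
        raise ValueError(f"{item!r} is not on the menu")
    if cut not in CUTS:
        raise ValueError(f"cut must be one of {CUTS}")
    return _make(cut=cut, **MENU[item])


if __name__ == "__main__":
    for step in order("no-butter", cut="center"):
        print(step)
```

Notice that the consumer needs to know only the menu and the allowed cuts. That is the low level of Domain Specific Knowledge we are after.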

Orchestration & DevOps

For years the belief has been that a top-level orchestrator should implement all the Imperative Processes required for EVERY technology component in an Automated System. This assumption has huge implications. You are exponentially increasing the requirement for Domain Specific Knowledge across an organization. Going forward, orchestration needs to be done differently.

Orchestration should consume abstracted, declarative interfaces ONLY. This allows the Domain Specific Knowledge required for one system (e.g. F5 BIG-IP) to be de-coupled from the Domain Specific Knowledge required by the Orchestration system (Ansible, vRO, etc.). By focusing on Abstraction and Declarative Interfaces, Orchestration in a large system is possible without a requirement for Domain Specific Knowledge in every technology component.
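A minimal Python sketch of that de-coupling follows. None of this is a real vendor or orchestrator API: DeclarativeService, LoadBalancerService, DnsService and orchestrate are hypothetical names. The orchestrator knows one thing, how to hand a declaration to each service, and each service keeps its own imperative steps to itself.

```python
from typing import Protocol


class DeclarativeService(Protocol):
    """The only contract the orchestrator knows about."""

    def apply(self, declaration: dict) -> str: ...


class LoadBalancerService:
    def apply(self, declaration: dict) -> str:
        # Vendor-specific imperative steps live here, owned by the domain expert.
        steps = ["create pool", "create virtual server"]
        return f"load balancer ready for {declaration['app']} ({len(steps)} steps)"


class DnsService:
    def apply(self, declaration: dict) -> str:
        # A different domain, with its own hidden steps.
        return f"dns record ready for {declaration['app']}"


def orchestrate(declaration: dict, services: list[DeclarativeService]) -> list[str]:
    """Pass the same declaration to every service; no per-vendor logic here."""
    return [service.apply(declaration) for service in services]


if __name__ == "__main__":
    for line in orchestrate({"app": "shop"}, [LoadBalancerService(), DnsService()]):
        print(line)
```

Adding another technology component means adding another class with an apply method. The orchestrate function does not change, and the person maintaining it does not need to learn the new component.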

If these rules are followed the resulting interfaces allow integration of the Automated System with Agile and/or DevOps methodologies. Adopting DevOps methodologies requires organizational (people) change, however, once that change is in progress the underlying systems must provide interfaces that seamlessly integrate with the DevOps methodology and toolchain.

The Architecture

The Fire Triangle

The picture below is something that may seem familiar. It’s a depiction of the ‘fire triangle’. This picture is used to convey the concept that combustion requires three components for a sustained chain reaction:

Oxygen
Heat
Fuel

The premise is simple. If you want to create fire, you need all three. On the other hand if you have to extinguish a fire you only need to remove one component from the triangle. The age old ‘stop, drop and roll’ technique actually extinguishes fire by eliminating the Oxygen component (the rolling motion essentially chokes the fire of oxygen).

What does this have to do with our Architecture? Well, much like fire needs three components to burn, Automated Systems require three separate models, working together, to be successful. If any one of these models falls apart, our chances of success are extinguished.

The Cloud/Automation Triangle

Throughout the Architecture we will discuss a set of ‘Truths’ and ‘Attributes’ that apply to the Architecture and its component Models. The Truths are assumptions and rules that must be established and cannot be broken. Attributes are less strictly defined; however, they must adhere to the Truths in their parent Model and Architecture.

Experience has guided us in creating three discrete models that must work together seamlessly to deliver on the promise of Automated Systems:

Service Model
Deployment Model
Operational Model

Each of the components must be well defined and serve to form a stable foundation for building Automated Systems over time. We will cover each of these models in detail throughout this series of articles.

Evolution of a Model

At the beginning of this article I explained how Architects iterate over their vision until they have enough in place to start construction. This iteration is key to how we actually meet business and production objectives over time.
Rather than trying to define, in detail, how every objective is met, we adopt the DevOps concepts of Continuous Improvement (CI) and Continuous Deployment (CD). The idea is to implement each of the models discussed above in phases that form a feedback loop:

To support this (and more fundamentally, DevOps methodologies), the Architecture must leverage CI/CD as a base Truth. As we iterate over deployments, the insights, challenges and shortcomings of the current Production Phase deployment should be prioritized and ordered, then fed back into a Design Phase. A new iteration of the Models that addresses those challenges is then deployed to Production.

The overall goal is to leverage the DevOps CI/CD methodologies and toolchain to enable constant iteration of the underlying models in the Architecture until a steady state is achieved (if that ever really happens). In short, don’t try to do everything all at once. Instead, define what the end state is and then break it down into iterations of each model to achieve that end state.
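The ‘prioritize, order, feed back’ step of that loop can be sketched very simply in Python. The function name, the priority numbers and the findings are all invented for illustration; a real team would pull these from whatever backlog tool they already use.

```python
def plan_next_iteration(findings: list[tuple[int, str]],
                        capacity: int) -> tuple[list[tuple[int, str]],
                                                list[tuple[int, str]]]:
    """Split production findings into this iteration's work and the backlog.

    Each finding is (priority, description); a lower number is more urgent.
    """
    ordered = sorted(findings)
    return ordered[:capacity], ordered[capacity:]


if __name__ == "__main__":
    # Hypothetical findings gathered from the current Production Phase.
    production_findings = [
        (3, "add a second service size"),
        (1, "deployment takes too long"),
        (2, "no drift detection"),
    ]
    next_design, backlog = plan_next_iteration(production_findings, capacity=2)
    print("design phase:", next_design)
    print("later iterations:", backlog)
```

Whatever does not fit in the current iteration is not lost. It stays in the ordered backlog and competes with new findings the next time around the loop.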

Architectural Truths

As explained in the previous section, a set of Truths is required to bind ourselves within production realities. From the Architectural level these truths are:

Enable a DevOps Methodology and Toolchain
Lower or eliminate Domain Specific Knowledge
Leverage Infrastructure-as-Code
Don’t sacrifice Functionality to Automate
Provide a Predictable Cost Model
Enable delivery of YOUR innovation to the market

Some of these points have already been discussed throughout this article. Rather than repeating them, we will focus on the items that have not been discussed:

Leverage Infrastructure-as-Code

One of the key concepts we discussed earlier was Source of Truth. In order to adhere to the guidelines around SOT, it’s important to treat service metadata as code. This means that all metadata should be contained within a Source of Truth that naturally maintains access control, revision histories and the ability to compare metadata at different points in time. Of those already adopting an Infrastructure-as-Code model, the majority of deployments leverage Source Code Management tools such as Git for these functions, however, many other solutions exist. The common thread between all of these tools is that configuration truths and metadata are handled with the same lifecycle process as developer's source code.
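The ability to ‘compare metadata at different points in time’ is exactly what source code tools give you for free once metadata is stored as text. As a small sketch of the idea, using only Python’s standard json and difflib modules (the metadata fields and the names render and diff_metadata are made up for this example):

```python
import difflib
import json


def render(metadata: dict) -> list[str]:
    """Serialize metadata deterministically so that diffs are meaningful."""
    return json.dumps(metadata, indent=2, sort_keys=True).splitlines()


def diff_metadata(old: dict, new: dict) -> list[str]:
    """Compare two revisions of service metadata, as a code review would."""
    return list(
        difflib.unified_diff(
            render(old), render(new), fromfile="v1", tofile="v2", lineterm=""
        )
    )


if __name__ == "__main__":
    v1 = {"name": "web", "port": 80}
    v2 = {"name": "web", "port": 443}
    print("\n".join(diff_metadata(v1, v2)))
```

In practice you would commit the rendered text to a tool such as Git and let it keep the revision history and access control. The sketch only shows why a stable, sorted text form matters: the diff contains the one line that changed and nothing else.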

Don’t sacrifice functionality to automate

This truth speaks to two different points:

The decision to automate a system or service. If critical functionality is given up for the sake of automation then a different decision has to be made. Rather than sacrificing functionality it is important that vendors and customers work to define how advanced functionality can be automated as much as possible and work to that goal.
An understanding that if functionality is being sacrificed then maybe the system or service was not abstracted properly to begin with. Or, maybe it can’t be abstracted. Either way, the decision to automate that service should be re-visited and abstraction should be applied properly. Or, the service should not be automated at this time (it could always be covered in subsequent iterations).

Provide a predictable cost model

It’s simple. Provide a model that can convey the cost of a service when appropriate scale data is provided. This means that Automated Systems should account for and prevent runaway situations that result in cost overruns.
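As a minimal sketch of what ‘predictable’ could mean in practice, here is a linear cost model in Python with a budget guard against runaway scaling. The linear pricing, the function names and the numbers are all assumptions for illustration; real pricing models are usually more involved. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def monthly_cost_cents(instances: int, base_cents: int,
                       per_instance_cents: int) -> int:
    """Cost of a service at a given scale: fixed base plus a per-instance fee."""
    if instances < 0:
        raise ValueError("instances cannot be negative")
    return base_cents + instances * per_instance_cents


def max_instances_within(budget_cents: int, base_cents: int,
                         per_instance_cents: int) -> int:
    """Largest scale that still fits inside the budget."""
    if budget_cents < base_cents:
        return 0
    return (budget_cents - base_cents) // per_instance_cents


def scale_to(requested: int, budget_cents: int, base_cents: int,
             per_instance_cents: int) -> int:
    """Grant the requested scale, but never more than the budget allows."""
    limit = max_instances_within(budget_cents, base_cents, per_instance_cents)
    return min(requested, limit)


if __name__ == "__main__":
    # Hypothetical prices: 100.00 base plus 25.00 per instance, 200.00 budget.
    print(monthly_cost_cents(4, base_cents=10_000, per_instance_cents=2_500))
    print(scale_to(10, budget_cents=20_000, base_cents=10_000,
                   per_instance_cents=2_500))
```

An automated scale-out that asks for 10 instances is held at the 4 the budget covers, so the cost of the service can be stated up front instead of discovered on the invoice.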

Enable delivery of YOUR innovation to the market

Throughout this article we’ve talked about a number of technical topics. But this truth is firmly rooted in the business space. When implemented correctly Automated Systems can serve as a competitive advantage by enabling delivery of innovation to market as fast as possible.

Till next time

Phew; we covered a lot today. This is a good start but there's more! Continue on to the following articles in this series as we dive into how Service, Deployment and Operational Models should be built.

Next article: The Service Model for Cloud/Automated Systems Architectures

Published Jun 06, 2017

Version 1.0