Performance in the Cloud: Business Jitter is Bad

#fasterapp #ccevent While web applications aren’t sensitive to jitter, business processes are.

One of the benefits of web applications is that they are generally transported via TCP, which is a connection-oriented protocol designed to assure delivery. TCP has a variety of native mechanisms through which delivery issues can be addressed – from window sizes to selective acks to idle time specification to ramp up parameters. All these technical knobs and buttons serve as a way for operators and administrators to tweak the protocol, often at run time, to ensure the exchange of requests and responses upon which web applications rely. This is unlike UDP, which is more of a “fire and forget” protocol in which the server doesn’t really care if you receive the data or not.

Now, voice and streaming video and audio over the web has always leveraged UDP and thus it has always been highly sensitive to jitter. Jitter is, without getting into layer one (physical) jargon, an undesirable delay in the otherwise consistent delivery of packets. It causes the delay of and sometimes outright loss of packets that are experienced by users as pauses, skips, or jumps in multi-media content.

While the same root causes of delay – network congestion, routing changes, time out intervals – have an impact on TCP, it generally only delays the communication and other than an uncomfortable wait for the user, does not negatively impact the content itself. The content is eventually delivered because TCP guarantees that, UDP does not.

However, this does not mean that there are no negative impacts (other than trying the patience of users) from the performance issues that may plague web applications and particularly those that are more and more often out there, in the nebulous “cloud”. Delays are effectively business jitter and have a real impact on the ability of the business to perform its critical functions – and that includes generating revenue.

BUSINESS JITTER and the CLOUD

David Linthicum summed up the issue with performance of cloud-based applications well and actually used the terminology “jitter” to describe the unpredictable pattern of delay:

Are cloud services slow? Or fast? Both, it turns out -- and that reality could cause unexpected problems if you rely on public clouds for part of your IT services and infrastructure.

When I log performance on cloud-based processes -- some that are I/O intensive, some that are not -- I get results that vary randomly throughout the day. In fact, they appear to have the pattern of a very jittery process. Clearly, the program or system is struggling to obtain virtual resources that, in turn, struggle to obtain physical resources. Also, I suspect this "jitter" is not at all random, but based on the number of other processes or users sharing the same resources at that time.

-- David Linthicum, “Face the facts: Cloud performance isn't always stable”

But what the multitude of articles coming out over the past year or so with respect to performance of cloud services has largely ignored is the very real and often measurable impact on business processes. That jitter that occurs at the protocol and application layers trickles up to become jitter in the business process; a process that may be critical to servicing customers (and thus impacts satisfaction and brand) as well as on the bottom line. Unhappy customers forced to wait for “slow computers”, as it is so often called by the technically less adept customer service representatives employed by many organizations, may take to the social media airwaves to express displeasure, or cancel an order, or simply refuse to do business in the future with the organization based on delays experienced because of unpredictable cloud performance.

Business jitter can also manifest as decreased business productivity measures, which it turns out can be measured mathematically if you put your mind to it.

Understanding the variability of cloud performance is important for two reasons:

You need to understand the impact on the business and quantify it before embarking on any cloud initiative so it can be factored in to the overall cost-benefit analysis. It may be that the cost savings from public cloud are much greater than the potential loss of revenue and/or productivity, and thus the benefits of a cloud-based solution outweigh the risks.
Understanding the variability and from where it comes will have an impact and help guide you to choosing not only the right provider, but the right solutions that may be able to normalize or mitigate the variability. If the primary source of business jitter is your WAN, for example, then it may be that choosing a provider that supports your ability to deploy WAN optimization solutions would be an appropriate strategy. Similarly, if the variability in performance stems from capacity issues, then choosing a provider that allows greater latitude in load balancing algorithms or the deployment of a virtual (soft) ADC would likely be the best strategy.

It seems clear from testing and empirical (as well as anecdotal) evidence that cloud performance is highly variable and, as David puts it, unstable. This should not necessarily be seen as a deterrent to adopting cloud services – unless your business is so highly sensitive to latency that even milliseconds can be financially damaging – but rather it should be a reality that factors into your decision making process with respect to your choice of provider and the architecture of the solution you’ll be deploying (or subscribing to, in the case of SaaS) in the cloud.

Knowing is half the battle to leveraging cloud successfully. The other half is strategy and architecture.