QoS without Context: Good for the Network, Not So Good for the End user

#fasterapp #webperf #ado One of the most often mentioned uses of #OpenFlow revolves around QoS . Which is good for network performance metrics, but not necessarily good for application performance.

In addition to issues related to virtual machine provisioning and the dynamism inherent in cloud computing environments, QoS is likely the most often mentioned “use” of OpenFlow within a production network. Certainly quality of service issues are cropping up more and more as end-user performance climbs the priority stack for CIOs faced with increasing challenges to meet expectations.

Remembering, however, the history of QoS, one has to ask why anyone thinks it would solve performance issues any better today than it has in the past? Even if it becomes “flow” based instead of “packet-based”, is the end result really going to be a better, i.e. faster, end-user experience?

Probably not. And here’s why…

QoS techniques and technologies revolve around the ability to control the ingress and egress of packets. Whether it’s applied at a flow or packet level is irrelevant, the underlying implementation comes down to prioritization of packets. This control over how packets flow through the network is only available via devices or systems which IT controls, i.e. within the confines of the data center.

Furthermore, QoS is designed to address network-related issues such as congestion and packet-loss that ultimately degrade the performance of a given application. Congestion and packet-loss happen for a number of reasons, such as oversubscription of bandwidth, high utilization on systems (making them incapable of processing incoming packets efficiently), and misconfiguration of network devices.

Let’s assume one (or more) of said conditions exist. Let’s further stipulate that using OpenFlow, the network has been dynamically adjusted to compensate for said conditions.

Can one assume from this that end-user performance has improved? Hint: the answer is no.

The reason the answer is no is because network conditions are only one piece of the performance equation. QoS cannot address the other causes of poor performance: high load, poor external (Internet) network conditions, external bandwidth constraints, slow clients, mobile network issues, and inefficiencies in protocol processing*.

Ah, but what about using OpenFlow to dynamically adjust perimeter devices providing QoS on the egress-facing side of the network?

Certain types of QoS – rate shaping in particular – can assist in improving protocol-related impediments. Too, bandwidth management (which has migrated from its own niche market to simply being attached to QoS) may mitigate issues related to bandwidth constrained connections – but only from the perspective of the pipe between the data center and Internet. If the bandwidth constraint is on the other end of the pipe (the “last mile”), this technique will not improve performance, because the OpenFlow controller has no awareness of that constraint. In fact, an OpenFlow controller is going to largely be applying policies blind with respect to anything outside the data center.

ROOT CAUSES

When we start looking at the causes of poor end-user performance we see that many of them are outside the data center. Type of client, client network, type of content being delivered, status of the server serving the content. All these factors make up context and without visibility into that, it is impossible to redress those factors impeding performance. If you know the end-user device is a tablet, and it’s connecting over a mobile network, you know performance is most likely going to be improved by reducing the size of the content being delivered. Techniques like image optimization and minification as well as caching will improve performance. QoS techniques? Not so much, if at all.

The problem is that QoS, like many attempts at improving performance, focus on one small piece of the puzzle rather than on the whole. There is almost never a mention of the client-side factors, and not even so much as a head nod in the direction of the application, even though it’s been shown that various applications have widely varying performance characteristics when delivered over different kinds of networks.

Without context, QoS rarely achieves noticeable improvements in performance from the perspective of the end-user. Context is necessary in order to apply the right techniques and policies at the right time to ensure optimal performance for the user given the application being served. Context provides the insight, the visibility,

QoS, on its own, is not “the answer” to the problem of poorly performing applications (and it’s even less useful in cloud computing environments where a lack of control over the infrastructure required to implement is problematic). It may be part of the solution, depending on what the problem may be, but it’s just part of the solution. There are myriad techniques and technology that can be applied to improve performance; success always depends primarily on applying the right solution at the right time.

To do that, requires contextual-awareness – as well as the ability to execute “any of the above” to redress the problems.

But would QoS improve, overall, the performance of applications? Sure – though perhaps only minimally and to the end-user, imperceptibly. The question becomes is IT more concerned with metrics proving they are meeting SLAs for network delivery or on improving the actual experience of the end-user?

*Rate shaping (a QoS technique) does attempt to mitigate issues with TCP by manipulating window-sizes and timing of exchanges, which partially addresses this issue.