The Art of Scale: Microservices, The Scale Cube and Load Balancing

Microservice architectures are the new hotness, even though they aren't really all that different (in principle) from the paradigm described by SOA (which is dead, or not dead, depending on whom you ask).

One of the things this decompositional approach to application architecture does is encourage developers and operations (some might even say DevOps) to re-evaluate scaling strategies. In particular, the notion is forwarded that an application should be built to scale and then infrastructure should assist where necessary.

It was just this notion that led me to a discussion on a particularly useful explanation of scaling strategies called "the scale cube" which is introduced and explained further in The Art of Scalability. Go ahead and open it up and bookmark it; it's a good read and I highly recommend it.

The aforementioned discussion provides an overview of the three axes perspective on scale: x, y and z. Reading the descriptions, it became fairly apparent to me (who lives with one foot in the network and the other in the app) that the use of layer 7 load balancing was a way to implement in some cases and augment in others.

X-axis scaling

X-axis scaling is essentially a typical horizontal (scale out) scaling pattern implemented using a load balancer. Simple and effective for many types of applications, this pattern has been the age old "go to" for quickly scaling out apps that were not perhaps built to scale in the first place. Monolithic applications are almost always scaled out (x-axis) because they were not developed with other scalability models in mind and their reliance on state (via cookies or server-side sessions stored in memory, not databases) makes other scalability models nearly impossible to deploy successfully.

X-axis scalability can easily be implemented using layer 4 (TCP) load balancing if state is not important, but more often than not requires layer 7 (HTTP) load balancing due to the need to examine headers or other variables to ensure persistence to the right application instance (think sticky sessions).

How does this apply to microservices? Consider that instead of apps, each microservice is scaled out (along the x-axis) using an app proxy or application delivery controller (ADC). This allows operations to tune each app proxy or ADC based on the specific purpose of the microservice, improving performance by applying image optimization, compression or even caching where appropriate to the specific service. In a monolithic application, an ADC will be a better choice for this scalability model because of its ability to interpret requests and optimize responses with the benefit of context. In cloud-scale microservice architectures, an app proxy may be the better option when considering cost per service and the relatively simple delivery needs of a given service. 

Y-axis scaling

Y-axis scaling is essentially a layer 7-based sharding pattern when applied to infrastructure. Y-axis scaling relies on the decomposition of applications into services. It is highly appropriate for SOA or RESTful APIs that group like functionality into a service. For example, verb-based decomposition focused on "login" or "checkout" or noun-based decomposition with an emphasis on "customer" or "partner."  The key is that there is some mechanism within each request - either in the URI or in the HTTP headers - that enable the app proxy or ADC to determine to which service the request needs to be forwarded.

Sharding can be implemented in the app, itself, using a routing object to dissect the URI or that functionality can be offloaded to the network and implementing using the data path programmability associated with an app proxy or ADC. This programmability allows operators or developers to implement targeted logic that dissects the URI and determines to which service the request should be directed. This pattern can be (and often is) implemented along with an X-axis scaling strategy for the specific service.

The combination of both Y and X axis scaling is increasingly a good choice for bifurcated networks which split "core" networking from "app" networking. The core network usually provides a significantly capable load balancing service managed by the network team while the app network includes app proxies or virtual ADCs that are managed by operations or developers.

While this pattern can be implemented on monolithic applications, particularly monolithic web applications that rely on URI-based interactions, care must be taken with respect to state. That is, one cannot simply route to service B for "checkout" when it depends on session-level data that may be stored already in service A or C. Shared nothing application architectures do not  lend themselves well to sharding strategies based on application function or content type. Rather, such applications should be scaled using a more traditional approach. Shared session application architectures, however, are very well suited to this type of scalability strategy because the application state is shared across instances, and all services will have access to the necessary data.

Z-axis scaling

Z-axis scaling is a cross between X and Y scaling strategies, using a data sharding-based approach. This strategy is commonly used to shard databases, but can also be used to scale monolithic applications based on some user characteristic.

Z-axis scaling is like X-axis scaling in that it relies on cloning of application instances. The difference is that some other component - like an app proxy or ADC - is responsible for distributing requests based on some other information, like the data being requested or the user identity. As long as the data is accessible to the app proxy or the ADC (increasingly iintermediaries have the ability to reach out and query databases or directories to obtain additional information) 



When using Z-axis scaling each server runs an identical copy of the code. In this respect, it’s similar to X-axis scaling. The big difference is that each server is responsible for only a subset of the data. Some component of the system is responsible for routing each request to the appropriate server. One commonly used routing criteria is an attribute of the request such as the primary key of the entity being accessed. Another common routing criteria is the customer type. For example, an application might provide paying customers with a higher SLA than free customers by routing their requests to a different set of servers with more capacity.

This pattern is also useful for premium access policies, where certain users or customers are afforded higher performance guarantees.These instances may be further augmented with additional services or scaled out faster to improve performance. Only certain customers are allowed to access these "gold" instances, and such determinations might be made based on API key, cookie value, or membership in a specific group as determined by a database or directory lookup.


The point is that the scaling strategies associated with application architecture can be duplicated and/or augmented by the use of a app proxy or ADC. It is almost always the case that such an intermediary will be necessary to scale an application. That's because reality is that it's just as bad to let network logic (routing) seep into business logic as it is business logic to seep into the presentation (GUI) layer.

Keep your logics separate, and use the tools available to act on the scaling strategy best suited for your application or service.

Published Nov 17, 2014
Version 1.0

Was this article helpful?


  • Would I be right in thinking that you've got "shared Nothing" and "Shared Services" the wrong way round in the last paragraph of the section on Y Scaling. I thought "Shared Nothing" was better for Sharding as nothing is shared?
  • Top of the page says: Updated 21-Oct-2014 • Originally posted on 17-Nov-2014 How is this possible :-)