on 21-Sep-2011 06:14
It’s how much load that really generates and how it scales to meet the challenge.
There’s some amount of debate whether Facebook really crossed over the one trillion page view per month threshold. While one report says it did, another respected firm says it did not; that its monthly page views are a mere 467 billion per month.
In the big scheme of things, the discrepancy is somewhat irrelevant, as neither show the true load on Facebook’s infrastructure – which is far more impressive a set of numbers than its externally measured “page view” metric. Mashable reported in “Facebook Surpasses 1 Trillion Pageviews per Month” that the social networking giant saw “approximately 870 million unique visitors in June and 860 million in July” and followed up with some per visitor statistics, indicating “each visitor averaged approximately 1,160 page views in July and 40 per visit — enormous by any standard. Time spent on the site was around 25 minutes per user.”
From an architectural standpoint it’s not just about the page views. It’s about requests and responses, many of which occur under the radar from metrics and measurements typically gathered by external services like Google. Much of Facebook’s interactive features are powered by AJAX, which is hidden “in” the page and thus obscured from external view and a “page view” doesn’t necessarily include a count of all the external objects (scripts, images, etc…) that comprises a “page”. So while 1 trillion (or 467 billion, whichever you prefer) is impressive, consider that this is likely only a fraction of the actual requests and responses handled by Facebook’s massive infrastructure on any given day.
Let’s examine what the actual requests and responses might mean in terms of load on Facebook’s infrastructure, shall we?
Loading up Facebook yields 125 requests to load various scripts, images, and content. That’s a “page view”. Sitting on the page for a few minutes and watching Firebug’s console, you’ll note a request to update content occurs approximately every minute you are on a page. If we do the math – based on approximate page views per visitor, each of which incurs 125 GET requests – we can math that up to an approximation of 19,468 RPS (Requests per Second).
That’s only an approximation, mind you, and doesn’t take into consideration the time factor, which also incurs AJAX-based requests to update content occurring on a fairly regular basis. These also add to the overall load on Facebook’s massive infrastructure. And that’s before we start considering the impact from “unseen” integrated traffic via Facebook’s API which, according to the most recently available data (2009) was adding 5 billion requests a day to that load. If you’re wondering, that’s an additional 57,870 requests per second, which gives us a more complete number of 77,338 requests per second.
SOURCE: 2009 Interop F5 Keynote
Let’s take a moment to digest that, because that’s a lot of load on a site – and I’m sure it still isn’t taking into consideration everything. We also have to remember that the load at any given time could be higher – or lower – based on usage patterns. Averaging totals over a month and distilling down to a per second average is just that – a mathematical average. It doesn’t take into consideration that peaks and valleys occur in usage throughout the day and that Facebook may be averaging only a fraction of that load with spikes two and three times as high throughout the day.
That realization should be a bit sobering, as we’ve seen recent DDoS attacks that have crippled and even toppled sites with less traffic than Facebook handles in any given minute of the day.
The question is, how do they do it? How do they manage to keep the service up and available despite the overwhelming load and certainty of traffic spikes?
Facebook itself does a great job of discussing exactly how it manages to sustain such load over time while simultaneously managing growth, and its secret generally revolves around architectural choices. Not just the “Facebook” application architecture, but its use of infrastructure architecture as well. That may not always be apparent from Facebook’s engineering blog, which generally focuses on application and software architecture topics, but it is inherent in those architectural decisions.
Take, for example, an engineer’s discussion on Facebook’s secrets to scaling to over 500 million users and beyond. The very first point made is to “scale horizontally”.
This isn't at all novel but it's really important. If something is increasing exponentially, the only sensible way to deal with it is to get it spread across arbitrarily many machines. Remember, there are only three numbers in computer science: 0, 1, and n. (Scaling Facebook to 500 Million Users and Beyond (Facebook Engineering Blog))
Horizontal scalability is, of course, enabled via load balancing which generally (but not always) implies infrastructure components that are critical to an overall growth and scalability strategy. The abstraction afforded by the use of load balancing services also has the added benefit of enabling agile operations as it becomes cost and time effective to add and remove (provision and decommission) compute resources as a means to meet scaling challenges on-demand, which is a key component of cloud computing models.
In other words, in addition to Facebook’s attention to application architecture as a means to enable scalability, it also takes advantage of infrastructure components providing load balancing services to ensure that its massive load is distributed not just geographically but efficiently across its various clusters of application functionality. It’s a collaborative architecture that spans infrastructure and application tiers, taking advantage of the speed and scalability benefits afforded by both approaches simultaneously.
Yet Facebook is not shy about revealing its use of infrastructure as a means to scale and implement its architecture; you just have to dig around to find it. Consider as an example of a collaborative architecture the solution to some of the challenges Facebook has faced trying to scale out its database, particularly in the area of synchronization across data centers. This is a typical enterprise challenge made even more difficult by Facebook’s decision to separate “write” databases from “read” to enhance the scalability of its application architecture. The solution is found in something Facebook engineers call “Page Routing” but most of us in the industry call “Layer 7 Switching” or “Application Switching”:
The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a Load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location. (Scaling Out (Facebook Engineering Blog))
That’s the hallmark of the modern, agile data center and the core of cloud computing models: collaborative, dynamic infrastructure and applications leveraging technology to enable a cost-efficient, scalable architectures able to maintain growth along with the business.
Today’s architectures – both application and infrastructure – are growing necessarily complex to meet the explosive growth of a variety of media and consumers. Applications alone cannot scale themselves out – there simply aren’t physical machines large enough to support the massive number of users and load on applications created by the nearly insatiable demand consumers have for online games, shopping, interaction, and news. Modern applications must be deployed and delivered collaboratively with infrastructure if they are to scale and support growth in an operationally and financially efficient manner.
Facebook’s ability to grow and scale along with demand is enabled by its holistic, architectural approach that leverages both modern application scalability patterns as well as infrastructure scalability patterns. Together, infrastructure and applications are enabling the social networking giant to continue to grow steadily with very few hiccups along the way. Its approach is one that is well-suited for any organization wishing to scale efficiently over time with the least amount of disruption and with the speed of deployment required of today’s demanding business environments.