But when folks talk about “big data” they’re focused primarily on application data, on user-generated data, on business data. They are not generally concerned with the other “big data” that threatens to overwhelm data center operations on a daily basis: operational data.
Every day, in data centers across the world, gigabyte upon gigabyte of log data is generated. Some of it is mundane bandwidth and throughput data. Some of it is routine web application data reporting the number of requests received in a given period of time. Other data contains gnarlier information, such as which users and devices were attempting to inject malicious code into a web application. It’s all important data, and when you combine the gigabytes of log files from just about every device in the data center, well, that’s BIG data.
Without the means to aggregate, search, and analyze all that data as a view of “the data center” (as opposed to individual components), however, it’s just bits and bytes and wasted disk. Administrators and operators need a way to aggregate and correlate events across the entire data center so they can more easily find and understand any given event or problem, as well as gain a holistic view of data center performance. And by performance I mean not just “how fast does my application go” but “how well is my web application firewall performing its responsibilities.” After all, one of the ways in which IT justifies the acquisition of a solution is by demonstrating a Return on Investment (ROI) based on the solution performing its intended task.
The problem is that while individual solutions may report on how well they are performing, they are unlikely to integrate and correlate data from other systems to provide a holistic view of “the other kind of performance.” That’s where those standards and management solutions come in handy. Solutions exist that leverage standards and integration methods to aggregate data from across data center components, and even across data centers (including cloud computing providers), providing the visibility needed to understand not only how each component is performing but also the “big picture” across the entire data center.
Now, the way in which you paint that big picture differs. You can, of course, go spelunking through the data center yourself to find the data you need and manually aggregate it. Such manual processes do not scale well, however, and as the data grows so does the time and effort required. The big operational data in today’s data centers makes that a Herculean task, one much better suited to an automated solution. A good option is a solution like Splunk, whose name sounds a whole lot like “spelunk,” and that’s no coincidence. Splunk does exactly what you might think it does: it explores the entire data center, indexing, aggregating, and correlating data from just about every kind of system, platform, and device. Not only does it provide a single point of entry into the “big data” of enterprise infrastructure, but it also allows analysis of that data through simple to complex queries, enabling operators and admins to explore the depths of big data in the enterprise from the comfort of their console.
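To make the “aggregate and correlate” idea concrete, here is a minimal Python sketch of what such a tool does under the hood: normalize events from different components into one stream, then group them so related events surface together. The log formats, hostnames, and paths are invented for illustration; this is not how Splunk itself is implemented, just the shape of the problem.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample log lines from two different data center components.
# Real formats vary widely; these are illustrative only.
lb_logs = [
    "2023-04-01T10:00:01 lb1 request /app status=200",
    "2023-04-01T10:00:02 lb1 request /app status=500",
]
waf_logs = [
    "2023-04-01T10:00:02 waf1 blocked /app rule=sql_injection",
]

def parse(line, source):
    """Turn a raw log line into a normalized event dict."""
    ts, host, action, path, detail = line.split()
    return {
        "time": datetime.fromisoformat(ts),
        "source": source,
        "host": host,
        "action": action,
        "path": path,
        "detail": detail,
    }

# Aggregate: merge events from every source into one time-ordered stream.
events = sorted(
    [parse(l, "load_balancer") for l in lb_logs]
    + [parse(l, "waf") for l in waf_logs],
    key=lambda e: e["time"],
)

# Correlate: group events by path so a WAF block and an application
# error on the same URL show up side by side.
by_path = defaultdict(list)
for e in events:
    by_path[e["path"]].append(e)

for path, evs in by_path.items():
    sources = {e["source"] for e in evs}
    print(path, "touched by", sorted(sources))
```

Grouping by path is just one correlation key; a real system would also correlate by time window, client IP, or session, and index every field so any of them can be queried after the fact.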
Now available (for free, as in gratis) is Splunk for F5 (Version 2.0). Not only does this version support APM, but it also includes integrated data from F5 BIG-IP Application Security Manager (ASM) and FirePass.