Programmability in the Network: Because #BigData is Often Unstructured

#devops #node #proxy #sdn Adaptability of network-hosted services requires programmability because data doesn't always follow the rules

Evans Data recently released its Data & Advanced Analytics Survey 2013, which focuses on "tools, methodologies, and concerns related to efficiently storing, handling, and analyzing large datasets and databases from a wide range of sources." Glancing through the sample pages reveals this nugget on why developers move from traditional databases to more modern systems, like Hadoop:

The initial motivating factors to move away from traditional database solutions were the total size of the data being processed – it being big data – and the data’s complexity or unstructured nature.

 

It's that second reason (and the data from Evans says it's only second by a partial percentage point) that caught my eye. Because yes, we all know big data is BIG. REALLY BIG. HUGE. Otherwise we'd call it something else. It's the nature of the data, the composition, that's just as important - not only to developers and how that data is represented in a data store, but to the network and how it interacts with that data.

See, the "network" and network-hosted services (firewalls, load balancers, caches, etc.) are generally used to seeing structured data that is very clearly defined, usually by an RFC. Switches and routers are fast because the data they derive decisions from is always in the same, fixed schema.
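To make the fixed-schema point concrete, here is a minimal sketch of reading an IPv4 header in Python. It assumes the 20-byte layout from RFC 791 with no options; the function name and the sample packet are made up for illustration, and real forwarding hardware obviously does not run Python.

```python
import socket
import struct

# IPv4 header without options (RFC 791): 20 bytes, every field at a fixed offset.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> dict:
    """Read the fields a forwarding decision needs straight from known offsets."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {
        "version": ver_ihl >> 4,              # high nibble of the first byte
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL is counted in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical sample header, built here only so the sketch is runnable.
sample = IPV4_HEADER.pack(
    0x45, 0, 20, 0, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"),
    socket.inet_aton("192.0.2.10"),
)
fields = parse_ipv4_header(sample)
```

There is no searching or guessing here: the destination address is always bytes 16 through 19, which is what makes the decision cheap.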

In an increasingly application-driven data center, however, it is the application - and often its data - that drives network decisions. This is particularly true for higher-order network services (L4-7), which specifically act on application data to improve performance and security and, increasingly, to support devops-oriented architectural topologies such as A/B testing, canary deployments and blue/green architectures.

That data is more often than not unstructured. The network is the mechanism by which the unstructured big data referenced by Evans is transferred to the application that ultimately deposits it in a database somewhere. Any data, structured or not, traverses a network of services before it reaches its final destination.

Programmability is required for devops and networking teams to implement the architectures and services necessary to support those applications and systems which exchange unstructured data. Certainly there is value in programmability when applied to structured data, particularly in cases where more complex logic is required to make decisions, but it is not necessarily required to enable that value. That's because capabilities that act on structured (fixed) data can be integrated into a network-hosted service and exposed as a configurable but well-understood feature.

But when data is truly unstructured and there is no standard - de facto or otherwise - then programmability in the network is necessary to unlock architectural capabilities. The reason intermediaries can be configured to "extract and act" on data that appears unstructured, like HTTP headers, is that those headers are well-defined key-value pairs. Consider "Cookie" and "Cache-Control" and "X-Forwarded-For" (not officially part of the standard, hence the "X", but accepted as an industry, de facto standard) as good examples. While not fixed, there is a structure to HTTP headers that lends itself well to both programmability and "extract and act" systems.
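As a rough sketch of what "extract and act" looks like when it is a configurable feature rather than code, consider a small rule table evaluated against request headers. Everything here is hypothetical: the rule format, the pool names and the `choose_pool` function are assumptions for illustration, not any product's configuration syntax.

```python
# Hypothetical rule table: which header to extract, what to look for,
# and which server pool to act with. An operator edits data, not logic.
RULES = [
    {"header": "Cookie", "contains": "beta=1", "pool": "beta_pool"},
    {"header": "X-Forwarded-For", "startswith": "10.", "pool": "internal_pool"},
]
DEFAULT_POOL = "default_pool"


def choose_pool(headers: dict) -> str:
    """Return the pool named by the first matching rule, else the default."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    for rule in RULES:
        value = normalized.get(rule["header"].lower())
        if value is None:
            continue
        if "contains" in rule and rule["contains"] in value:
            return rule["pool"]
        if "startswith" in rule and value.startswith(rule["startswith"]):
            return rule["pool"]
    return DEFAULT_POOL


beta = choose_pool({"cookie": "session=abc; beta=1"})
internal = choose_pool({"X-Forwarded-For": "10.1.2.3"})
fallback = choose_pool({"Accept": "text/html"})
```

This only works because the key-value structure of headers is agreed on in advance; the rules never need to know how to find a header, only which one to ask for.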

Interacting with non-standard headers, however, or getting at unstructured data in a payload, requires programmability on the level of executable logic rather than simple, configurable options. A variety of devops-related architectures and API proxy capabilities require programmability due to the extreme variability in implementation. There's simply no way for an intermediary or proxy to support such a wide-open set of possibilities "out of the box" because the very definition of the data is not common. Even though it may be structured in the eyes of the developer, it's still unstructured because there is no schema to describe it (think JSON as opposed to XML) and it follows no accepted, published standard.
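By contrast, here is a minimal sketch of the kind of executable logic a programmable intermediary might run against a payload. It assumes a JSON body with an application-specific `account.id` field; that field, the pool names and `route_request` itself are invented for illustration, which is rather the point: no vendor could ship this as a checkbox because only the application's developer knows the shape of the data.

```python
import hashlib
import json


def route_request(body: bytes, canary_percent: int = 10) -> str:
    """Pick a pool for a canary deployment based on a field inside the payload."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (or not decodable text): fall back to the stable pool.
        return "stable_pool"
    if not isinstance(payload, dict):
        return "stable_pool"

    # Application-specific knowledge: where the account id lives (an assumption).
    account = payload.get("account")
    account_id = account.get("id") if isinstance(account, dict) else None
    if account_id is None:
        return "stable_pool"

    # Hash the id so the same account always lands in the same bucket (0-99).
    digest = hashlib.sha256(str(account_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary_pool" if bucket < canary_percent else "stable_pool"


request_body = b'{"account": {"id": 4711}, "items": ["a", "b"]}'
first = route_request(request_body)
second = route_request(request_body)  # same account, same answer
```

Hashing the identifier rather than picking at random keeps a given account on one side of the canary split for as long as the percentage stays the same.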

The more unstructured data we see traversing the network, the more we're going to need programmability in the network to enable the modern architectures required to support it.

Published Oct 14, 2013
Version 1.0
  • This is why we need support for both custom TCL procedures and built-in JSON parsing functions in iRules.