SPDY versus HTML5 WebSockets

#HTML5 #fasterapp #webperf #SPDY So much alike, yet so vastly different in their impact on the data center …

A recent post on the HTTP 2.0 War beginning garnered a very relevant question regarding WebSockets and where it fits in (what might shape up to be) an epic battle.

The question, “Why not consider WebSockets here?” could easily be answered with two words: HTTP headers. It could also be answered with two other words: infrastructure impact.

But I’m guessing Nagesh (and others) would like a bit more detail on that, so here comes the (computer) science.

Different Solutions Have Different Impacts

Due to a simple (and yet profound) difference between the two implementations, WebSockets is less likely to make an impact on the web (and yet more likely to make an impact inside data centers, but more on that another time). Nagesh is correct in that in almost all the important aspects, WebSockets and SPDY are identical (if not in implementation, in effect). Both are asynchronous, which eliminates the overhead of the “polling” generally used to simulate “real time” updates a la Web 2.0 applications. Both use only a single TCP connection, which also reduces overhead on servers (and infrastructure) and can translate into better performance for the end-user. Both can make use of compression (although only via extensions in the case of WebSockets) to reduce the size of the data transferred, resulting, one hopes, in better performance, particularly over more constrained mobile networks.

Both protocols operate “outside” HTTP and use an upgrade mechanism to initiate. While WebSockets uses the HTTP Upgrade and Connection headers to request an upgrade, SPDY uses Next Protocol Negotiation (NPN, a proposed enhancement to the TLS specification). Negotiating the upgrade in this way engenders better backwards-compatibility across the web, allowing sites to support both next-generation web applications and traditional HTTP.
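To make the WebSocket side of that upgrade concrete, here is a minimal sketch of the client request headers and the value a server returns to accept the upgrade. It assumes the handshake as standardized in RFC 6455, which may differ from the drafts in circulation when the protocols were first compared; the function names are made up for illustration, and the SPDY/NPN side is not shown because it happens inside the TLS handshake.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server sends back."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Build the plain HTTP request that asks to switch to WebSockets."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",        # names the protocol to switch to
        "Connection: Upgrade",       # marks this request as an upgrade
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",                          # blank line ends the header block
    ]
    return "\r\n".join(lines)
```

Everything up to this point is ordinary HTTP that any intermediary can read. It is only after the server answers with a 101 response that the connection stops looking like HTTP.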

Both specifications are designed, as pointed out, to solve the same problems. And both do, in theory and in practice. The difference lies in the HTTP headers – or lack thereof in the case of WebSockets.

Once established, WebSocket data frames can be sent back and forth between the client and the server in full-duplex mode. Both text and binary frames can be sent full-duplex, in either direction at the same time. The data is minimally framed with just two bytes. In the case of text frames, each frame starts with a 0x00 byte, ends with a 0xFF byte, and contains UTF-8 data in between. WebSocket text frames use a terminator, while binary frames use a length prefix.

-- HTML5 Web Sockets: A Quantum Leap in Scalability for the Web
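The text framing described in that quote can be sketched in a few lines. This follows only what the quote says (a 0x00 byte, UTF-8 data, a 0xFF byte); it reflects the early draft framing, and the framing later standardized in RFC 6455 is different (length-prefixed, with masking), so treat it as an illustration rather than a working WebSocket codec. The function names are hypothetical.

```python
from typing import List


def encode_text_frame(message: str) -> bytes:
    """Wrap UTF-8 text in the two framing bytes described in the quote."""
    return b"\x00" + message.encode("utf-8") + b"\xff"


def decode_text_frames(stream: bytes) -> List[str]:
    """Split a byte stream of back-to-back text frames into messages."""
    frames = []
    pos = 0
    while pos < len(stream):
        if stream[pos] != 0x00:
            raise ValueError("text frame must start with 0x00")
        # 0xFF never occurs in valid UTF-8, so it is safe as a terminator.
        end = stream.find(b"\xff", pos + 1)
        if end == -1:
            raise ValueError("unterminated text frame")
        frames.append(stream[pos + 1:end].decode("utf-8"))
        pos = end + 1
    return frames
```

Note what is missing: nothing in the frame says what the payload is, where it came from, or what it refers to. The two bytes mark only where a message starts and stops.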

Once the connection is upgraded, WebSockets does not use HTTP headers; SPDY does. This seemingly simple difference has an outsized impact on supporting infrastructure.

The Impact on Infrastructure

The impact on infrastructure is why WebSockets may be more trouble than it’s worth – at least when it comes to public-facing web applications. While both specifications will require gateway translation services until (if) they are fully adopted, WebSockets has a much harsher impact on the intervening infrastructure than does SPDY.

WebSockets effectively blinds infrastructure. IDS, IPS, ADC, firewalls, anti-virus scanners – any service which relies upon HTTP headers to determine the specific content type or location (URI) of the object being requested – is unable to inspect or validate requests because WebSocket traffic carries no HTTP headers. Now, SPDY doesn’t make it easy – HTTP request headers are compressed – but it doesn’t make it nearly as hard, because gzip is pretty well understood and even intermediate infrastructure can decompress and recompress with relative ease (and without needing special data, such as is the case with SSL/TLS and certificates).
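As a rough sketch of why compressed headers are a speed bump rather than a wall, the following shows an intermediary decompressing a header block, checking the content type, and making a decision. It is deliberately simplified: it uses plain zlib over text headers, whereas real SPDY header blocks use a binary name/value layout and a compression context shared across the connection. The function names and the blocked type are assumptions for illustration.

```python
import zlib
from typing import Dict, Tuple


def compress_headers(headers: Dict[str, str]) -> bytes:
    """Stand-in for the sender: serialize headers and compress them."""
    block = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return zlib.compress(block.encode("ascii"))


def inspect_headers(compressed: bytes) -> Dict[str, str]:
    """What an intermediary does: decompress, then parse as usual."""
    text = zlib.decompress(compressed).decode("ascii")
    parsed = {}
    for line in text.split("\r\n"):
        if line:
            name, _, value = line.partition(": ")
            parsed[name.lower()] = value
    return parsed


def is_allowed(
    compressed: bytes,
    blocked_types: Tuple[str, ...] = ("application/x-msdownload",),
) -> bool:
    """Hypothetical policy check keyed on the Content-Type header."""
    headers = inspect_headers(compressed)
    return headers.get("content-type") not in blocked_types
```

No keys or certificates are involved. Anything on the path that understands the compression scheme can recover the headers and apply the same policies it applies to plain HTTP.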

Let me stop for a moment and shamelessly quote myself from a blog on this very subject, “Oops! HTML5 Does it Again”:

One of the things WebSockets does to dramatically improve performance is eliminate all those pesky HTTP headers. You know, things like CONTENT-TYPE. You know, the header that tells the endpoint what kind of content is being transferred, such as text/html and video/avi. One of the things anti-virus and malware scanning solutions are very good at is detecting anomalies in specific types of content. The problem is that without a MIME type, the ability to correctly identify a given object gets a bit iffy. Bits and bytes are bits and bytes, and while you could certainly infer the type based on format “tells” within the actual data, how would you really know? Sure, the HTTP headers could be lying, but generally speaking the application serving the object doesn’t lie about the type of data and it is a rare vulnerability that attempts to manipulate that value. After all, you want a malicious payload delivered via a specific medium, because that’s the cornerstone upon which many exploits are based – execution of a specific operation against a specific manipulated payload. That means you really need the endpoint to believe the content is of the type it thinks it is.

But couldn’t you just use the URL? Nope – there is no URL associated with objects via a WebSocket. There is also no standard application information that next-generation firewalls can use to differentiate the content; developers are free to innovate and create their own formats and micro-formats, and undoubtedly will. And trying to prevent its use is nigh-unto impossible because of the way in which the upgrade handshake is performed – it’s all over HTTP, and stays HTTP. One minute the session is talking understandable HTTP, the next they’re whispering in Lakota, a traditionally oral-only language which neatly illustrates the overarching point of this post thus far: there’s no way to confidently know what is being passed over a WebSocket unless you “speak” the language used, which you may or may not have access to.

The result of all this confusion is that security software designed to scan for specific signatures or anomalies within specific types of content can’t. They can’t extract the object flowing through a WebSocket because there’s no indication of where it begins or ends, or even what it is. The loss of HTTP headers that indicate not only type but length is problematic for any software – or hardware for that matter – that uses the information contained within to extract and process the data.
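To illustrate what inferring the type from format “tells” looks like in practice, here is a small sketch that guesses a MIME type from the leading bytes of a payload. The signature table is a tiny, illustrative subset of well-known file signatures, and the function name is hypothetical; real scanners use far larger tables and still face the same limits.

```python
from typing import Optional

# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF8", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def guess_type(payload: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes; None means 'no idea'."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None
```

A guess like this works only when the payload is a whole object in a known format that starts at the first byte. An application-defined message such as `{"op":"tick"}`, or a known format split across several frames, comes back as `None`, which is exactly the position a scanner is in without a Content-Type header.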

SPDY, however, does not eliminate these Very-Important-to-Infrastructure-Services HTTP headers; it merely compresses them, which makes SPDY a much more compelling option than WebSockets. SPDY can be enabled for an entire data center via the use of a single component: a SPDY gateway. WebSockets ostensibly requires the upgrade or replacement of many more infrastructure services and introduces risks that may be unacceptable to many organizations.

And thus my answer to the question “Why not consider WebSockets here?” is simply that while the end result (better performance) of implementing the two may be the same, WebSockets is unlikely to gain widespread acceptance as the protocol du jour for public-facing web applications due to the operational burden it imposes on the rest of the infrastructure.

That doesn’t mean it won’t gain widespread acceptance inside the enterprise. But that’s a topic for another day…