True DDoS Stories: SSL Connection Flood

#adcfw

I have a particular fascination with DDoS, and I collect what I call True DDoS Stories. This isn’t to say that I ignore traditional smash-and-grab penetrations or SQL injection incidents, but there’s something about an actor (or cast of actors) harnessing thousands of machines to attack a target that reminds me of a wizard conjuring a thunderstorm and directing it at an encamped enemy.

SSL Termination at the Server

I was on the road and was approached by a firm that had suffered a particularly severe DDoS attack that kept their high-profile site down for multiple weeks. Here’s a quick sketch of the general layout of their network.

Unlike most of their competitors, they pass SSL traffic all the way through the data center to be terminated at the application servers. Lori MacVittie has posted multiple blog entries about why it’s better to terminate it at the Application Delivery Controller (ADC), but let’s overlook that for now. This particular site had a service level agreement (SLA) whereby any SSL connection initiated by a client had to stay active for a particular interval before timing out.

The Attack

The identity of the attackers remains unknown, but the firm suspected Anonymous. Whoever the attackers were, they began opening thousands of legitimate SSL connections. The connections were passed all the way through the DDoS prevention system, to the load balancers, through the firewall and IPS, and to the application servers that then established sessions and began the long time-outs. The SSL sessions contained no payload and were never closed by the client side. It was a classic connection flood, except this time within established SSL sessions.
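To see why long time-outs make this kind of flood so effective, here is a back-of-the-envelope sketch in Python. The function names and every number used with them are my own illustrative assumptions, not figures from the incident. The idea is simply that idle connections pile up at the rate they are opened until the oldest ones begin to time out.

```python
from typing import Optional


def concurrent_connections(opens_per_second: float,
                           hold_seconds: float,
                           elapsed_seconds: float) -> float:
    """Idle connections held open after `elapsed_seconds` of a steady flood.

    Connections accumulate until the oldest ones start timing out, after
    which the count levels off at opens_per_second * hold_seconds.
    """
    return opens_per_second * min(elapsed_seconds, hold_seconds)


def seconds_to_fill(table_limit: int,
                    opens_per_second: float,
                    hold_seconds: float) -> Optional[float]:
    """Seconds until a table of `table_limit` entries is full, or None if never."""
    if opens_per_second * hold_seconds < table_limit:
        # Time-outs drain the table faster than the flood can fill it.
        return None
    return table_limit / opens_per_second
```

With a hypothetical 2,000 new connections per second and a 30-minute time-out, `concurrent_connections(2000, 1800, 7200)` levels off at 3.6 million idle sessions, and a device limited to one million concurrent connections would fill in 500 seconds. Shorten the time-out to five minutes and the same flood never fills that table.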

The application server stacks were provisioned well enough to handle the load, and the number of empty SSL sessions climbed into the millions. With SSL terminated at the application server, the front-side device with the smallest capacity for concurrent connections fails first. In this case it was the load balancer (from a competitor of ours that I won’t name). Like many devices, when it reached its concurrent connection limit, it failed hard and stopped processing traffic.
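The weakest-link effect can be sketched in a few lines of Python. The device names and connection limits below are invented for illustration; I don’t know the real capacities of the devices at this firm. Since every connection passes through every device in the path, the path as a whole can hold only as many connections as its smallest device.

```python
from typing import Dict, Tuple


def weakest_link(limits: Dict[str, int]) -> Tuple[str, int]:
    """Return the (device, limit) pair with the smallest concurrent-connection limit."""
    return min(limits.items(), key=lambda item: item[1])


# Hypothetical limits for each device in the path, in concurrent connections.
chain = {
    "ddos_appliance": 8_000_000,
    "load_balancer": 2_000_000,
    "firewall": 4_000_000,
    "ips": 6_000_000,
    "app_servers": 10_000_000,
}

device, limit = weakest_link(chain)
print(f"{device} fails first, at {limit:,} concurrent connections")
```

Adding another in-line device can never raise that minimum; it can only leave it unchanged or lower it.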

Normally load balancers can act as DDoS mitigation devices in that they divide the attack load by the number of active servers. This can mitigate smaller attacks, but here the load balancer itself became the weak link in the chain. Usually we see the firewall fail first.

The attack continued for weeks and service was not fully restored until the DDoS attack ended.

Incorrect Mitigation Strategy #1 – Point Solutions

There are several vendors out there making a name for themselves in DDoS mitigation: Arbor, Prolexic, and the old Cisco Guard product (now discontinued). I won’t specify which solution the firm was using, but it didn’t help in this case. None of those solutions terminate SSL traffic, so all are blind to SSL connection floods. If you insist on an architecture that terminates SSL at the application servers, you can pay your ISP $6,000 / hour for cloud-based scrubbing and it won’t help. Even if cloud-based services did terminate SSL, financial firms couldn’t use them, as it would mean sending their unencrypted traffic into someone else’s cloud. Most financial firms have policies that prohibit that.

Incorrect Mitigation Strategy #2 – More Weak Links

When I talk with customers about a unified security solution, one common rebuttal that I hear is “I don’t want to put all my eggs in one basket, and I don’t want to trust a single vendor.” This attack is an excellent example of the danger of that strategy. The problem isn’t the eggs-in-a-basket. The problem is which-is-the-weakest-link? Breaking one egg of many doesn’t matter that much, but when a link in a chain breaks, the whole chain becomes useless. So if you want to keep device sprawl as an architectural benefit then you have to ensure that all devices in the chain can handle an attack. More devices = more weak links.

Correct Mitigation Strategy – Full Proxy

A full-proxy architecture with dynamic reaping would have kept this attack from taking the firm’s site down. An intelligent, full-proxy ADC with SSL termination will wait for application payload (usually HTTP) before it establishes a connection to the back-end servers. Often it does this so that it can insert load-balancing cookies or other HTTP headers. In such an architecture, all the empty SSL sessions would have piled up at the ADC. Should the ADC connection table become full, dynamic reaping closes inactive connections to free up room for authentic SSL connections.
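The reaping idea can be modeled with a toy connection table in Python. This is a minimal sketch under my own assumptions, not how any particular ADC implements it: the class name, the `min_idle_seconds` threshold, and the policy of reaping the single longest-idle connection are all hypothetical simplifications.

```python
import time
from collections import OrderedDict


class ConnectionTable:
    """Toy proxy connection table that reaps the longest-idle entry when full."""

    def __init__(self, limit, min_idle_seconds, clock=time.monotonic):
        self.limit = limit
        self.min_idle_seconds = min_idle_seconds
        self.clock = clock
        # conn_id -> time of last activity, ordered least recently active first.
        self._last_seen = OrderedDict()

    def __contains__(self, conn_id):
        return conn_id in self._last_seen

    def accept(self, conn_id):
        """Admit a new connection, reaping an idle one first if the table is full."""
        if len(self._last_seen) >= self.limit and not self._reap_oldest():
            return False  # table is full and nothing has been idle long enough
        self._last_seen[conn_id] = self.clock()
        return True

    def touch(self, conn_id):
        """Record payload activity, moving the connection to the 'recent' end."""
        self._last_seen[conn_id] = self.clock()
        self._last_seen.move_to_end(conn_id)

    def _reap_oldest(self):
        """Close the least recently active connection if it has idled long enough."""
        if not self._last_seen:
            return False
        conn_id, last_seen = next(iter(self._last_seen.items()))
        if self.clock() - last_seen < self.min_idle_seconds:
            return False
        del self._last_seen[conn_id]
        return True
```

Empty SSL sessions never call `touch`, so they drift to the front of the table and are the first to be closed when a new client needs a slot. Connections that carry real payload keep moving to the back and survive.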

There are other benefits to terminating SSL at a full-proxy application delivery controller. Some firms terminate SSL at a full-proxy ADC and then invoke DDoS scrubbing services behind it (because those services can now see the decrypted payload). However, financial firms are often required to re-encrypt the traffic as it leaves the ADC, so for them the ADC is the only device that can mitigate an SSL attack. Lastly, the ADC can be an ideal place to locate hardware-protected FIPS 140 Level 3 key services. These services can be expensive, and consolidating them from dozens of servers into a pair of ADCs makes a lot of sense.

In my travels I hear about firewalls failing under attack quite often. It’s ironic that you buy firewalls to protect you, but lately they are becoming the weak link in an attack. When an SSL infrastructure fails due to an SSL attack, it feels like the same thing. SSL is supposed to be a technology that protects good data, but when deployed incorrectly, it can become a vector for mischief.