Being the incredibly horrible planner I am, I started to order invitations early last week for a party I’m throwing for my wife’s graduation, and it turns out they wanted double the cost of the invitations for overnight shipping! So…I sent evites. It took a day, however, to actually get them out. I started the process but was interrupted by the EC2 outage. I only know that for sure because the evite site I used was very quick to tell me in its error message that the problem was with the “Amazon EC2 Datacenter.” Was Amazon down? Yes. Is it Amazon’s fault the evite site couldn’t deliver? Absolutely not. The only really noteworthy failure on Amazon’s side is that the issues cascaded beyond a single availability zone and impacted others. That shouldn’t happen, and Amazon has some explaining to do on that front.
Infrastructure as a service is a platform, not a design. To set it and forget it in EC2 is just begging for problems, as hundreds of app owners found out last week. “The Cloud” is hot, trendy, sexy, whatever you want to call it, but it’s not a panacea. It’s difficult enough to find all the hard and soft points of failure in your own datacenter, and the problem gets worse when most of the systems your application runs on are abstracted away and inaccessible, so you can’t get at them to isolate problems.
Everything fails, all the time
--Werner Vogels, CTO Amazon.com
So for a better experience in deploying applications to the cloud, you must assume that everything will break at every point. That means that relying on multiple availability zones in a single region is probably not a smart move. If your application is mission critical, perhaps even relying on multiple regions with a single vendor is not a smart move. It’s time to stop looking to the cloud as the “easy button” and face reality: you still need people with solid network and systems design skills to get you from an application in the cloud to a cloud application.
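To make "assume everything will break" a little more concrete, here is a minimal sketch of client-side failover across deployments in different regions or vendors. It is only an illustration under stated assumptions: the endpoint URLs, the `/health` path, and the function names are all hypothetical, and a real design would also need data replication, DNS or load-balancer failover, and monitoring.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request

# Hypothetical deployments of the same app: two regions with one vendor,
# plus a copy with a second vendor. None of these URLs are real.
ENDPOINTS = [
    "https://app.us-east.example.com/health",
    "https://app.us-west.example.com/health",
    "https://app.other-vendor.example.net/health",
]


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, bad URL:
        # all of them count as "this deployment is down".
        return False


def first_healthy(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first endpoint that passes the probe, or None if all fail."""
    for url in endpoints:
        if probe(url):
            return url
    return None
```

Calling `first_healthy(ENDPOINTS)` walks the list in priority order and returns the first deployment that responds. A result of `None` means every region and vendor is down at once, which is the case the application still has to handle gracefully (for example, with a clear error page of its own).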