Lessons From Recent Power Outages

Robert Seastrom rs at seastrom.com
Wed Jul 18 19:29:48 CDT 2012

If you've already read Louie's reply, then you'll find a lot of what I'm about to write repetitive.

EC2 and S3 are commercial versions of the technology that underpins amazon.com.

In case you didn't notice, amazon.com didn't go down, and assuming you had Internet service at your cooling center, it was possible to order Twinkies, CDs, and books throughout.

Why was that?  Maybe someone at Amazon read the manual and was careful with the implementation because revenue lost when the site is down is hundreds of thousands of dollars per minute.  Management tends to get antsy about that sort of thing when the stakes are high and put sufficient resources behind the effort to make sure that stuff doesn't roll over and die.

Since Amazon's storefront architecture is resilient across multiple datacenters, they're able to build the individual datacenters on the cheap.  And they do.  Amazon's business model does not support building their datacenters to the same spec as Equinix, Terremark, or RagingWire.  They build their colo and their infrastructure on the cheap because, well, if they lose one datacenter it's not a big deal.  Then they commercialized it and published the APIs and specs so that folks could make their stuff resilient on the same platform if they wanted to as well.  That's a big if.

The folks who had massive service impacts due to the loss of one availability zone and a load-balancer-in-the-sky from Amazon made a conscious decision that they were OK with only a couple of 9s of availability in order to save some bucks.  Most of the time that's a gamble that pays off; some of the time it doesn't.  Guess what, this was one of those times.
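To put rough numbers on "a couple of 9s", here is a back-of-the-envelope sketch.  The 99% per-zone figure is an illustrative assumption, not a number published by Amazon, and the function names are made up for this example.  It also assumes zones fail independently, which a shared dependency like that load balancer can break.

```python
# Back-of-the-envelope availability arithmetic.
# All figures are illustrative assumptions, not measured or published values.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year at a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(per_zone, zones):
    """Availability of a service that stays up as long as at least one zone
    is up, assuming zone failures are independent (a strong assumption)."""
    return 1.0 - (1.0 - per_zone) ** zones


if __name__ == "__main__":
    per_zone = 0.99  # hypothetical "two 9s" zone built on the cheap
    for zones in (1, 2, 3):
        a = combined_availability(per_zone, zones)
        print(f"{zones} zone(s): availability {a:.6f}, "
              f"about {downtime_minutes_per_year(a):.1f} min/year down")
```

The point of the arithmetic: a cheap zone is fine if you spread across several of them, and a single cheap zone on its own is a few days of downtime a year.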

I daresay that a company like Netflix that is in the business of offering you "unlimited movies for $8/month" probably is making some conscious decisions to sacrifice reliability in the interest of cheapness too.

You know what?  Amazon made the right choice.  Know how I can tell?  Look here and see if you can see the derecho:


Maybe Netflix took it on the chin and the street is gonna punish it?  Nope:


We've been conditioned by our cell phones to accept crap for toll quality.  We're being conditioned by "the cloud" to accept crap for data availability.  In both cases the choice is made in the interest of saving money vs. doing it "right".  But here in the future, do the economics support a purely engineering driven solution?  No.

It's not the end of the world.

