Sunday, May 1, 2011

Amazon's lengthy cloud outage shows the danger of complexity

"Reddit, Foursquare, and Quora were among the many sites that went down recently due to a prolonged outage of Amazon's cloud services. On Thursday April 21, Amazon Elastic Block Store (EBS) went offline, leaving the many Web and database servers depending on that storage broken. Not until Easter Sunday (April 24) was service restored to all users.

Amazon has now published a lengthy account of what went wrong, and why the failure was both so catastrophic and so prolonged.

Amazon has cloud computing data centers in five regions around the world: Virginia, Northern California, Ireland, Singapore, and Tokyo. Within each region, services are divided into what the company calls Availability Zones: physically and logically separate groups of computers.

This design allows customers to pick a level of redundancy that they feel is most appropriate; hosting in multiple regions provides the most robustness, but at the highest cost. Hosting in multiple Availability Zones within the same region is cheaper, and guards against problems affecting any one zone."
