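A collection of posts about the Amazon cloud outage
Posted by 陈皓 (Chen Hao) on April 27, 2011 at 22:49 | Hits: 3641
The biggest news on the Internet lately is probably the AWS outage at Amazon, which went several days without a full recovery. The whole Internet is talking about it, the Internet is not happy, and the consequences could be serious. Perhaps because the incident did not affect China, there are few Chinese-language articles about it; for one, see this piece on Hexun (和讯网): 《伤不起!亚马逊史前最大宕机事件的启示》 (roughly, "Can't Take the Hit! Lessons from Amazon's Biggest Outage Ever").
Someone overseas has collected all the posts related to the incident. They are quite good posts and articles, especially the ones on lessons learned, which I found very instructive, so I am passing them along. The source of this list is here.
Individual companies' experiences, good and bad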
- How Heroku Survived the Amazon Outage on the Heroku status page
- How SimpleGeo Stayed Up During the AWS Downtime by Mike Malone
- How SmugMug survived the Amazonpocalypse by Don MacAskill (Hacker News discussion)
- How Bizo survived the Great AWS Outage of 2011 relatively unscathed… by Someone at Bizo
- Joe Stump’s explanation of how SimpleGeo survived
- How Netflix Survived the Outage
- Why Twilio Wasn’t Affected by Today’s AWS Issues on Twilio Engineering’s Blog (Hacker News thread)
- On reddit’s outage
- What caused the Quora problems/outage in April 2011?
- Recovering from Amazon cloud outage by Drew Engelson of PBS.
- PBS was affected for a while primarily because we do use EBS-backed RDS databases. Despite being spread across multiple availability-zones, we weren’t easily able to launch new resources ANYWHERE in the East region since everyone else was trying to do the same. I ended up pushing the RDS stuff out West for the time being. From Comment
Amazon Web Services discussion forum
A number of experienced people shared valuable first-hand accounts of the outage.
- Amazon Web Services Discussion Forum
- Cost-effective backup plan from now on?
- Life of our patients is at stake – I am desperately asking you to contact
- Why did the EBS, RDS, Cloudformation, Cloudwatch and Beanstalk all fail?
- Moved all resources off of AWS
- Any success stories?
- Is the mass exodus from East going to cause demand problems in the West?
- Finally back online after about 71 hours
- Amazon EC2 features vs windows azure
- Aren’t Availability Zones supposed to be “insulated from failures”?
- What a lot of people aren’t realizing about the downtime:
- ELB CNAME
- Availability Zones were used in a misleading manner
- Tip: How to recover your instance
- Crying in Forum Gets Results, Silver-level AWS Premium Support Doesn’t
- Well-worth reading: “design for failure” cloud deployment strategy
- New best practice
- Don’t bother with Premium Support
- Best practices for multi-region redundancy
- “Postmortum”
- Learning from this case
- Amazon, still no instructions what to do?
- Anyone else prepared for an all-nighter?
- Is Jeff Bezos going to give a public statement?
- Rackspace, GoGrid, StormonDemand and Others
- Jeff Barr, Werner Vogels and other AWS persons – where have you been???
- After you guys fix EBS do I have do anything on my side?
- Need Help!!! Lives of people and billions in revenue are at risk now!!!
- I’ve Got A Suspicion
- Farewell EC2, Farewell
There were also many many instances of support and help in the log.
Summaries
- Amazon EC2 outage: summary and lessons learned by RightScale
- AWS outage timeline & downtimes by recovery strategy by Eric Kidd
- The Aftermath of Amazon’s Cloud Outage by Rich Miller
Position: it's the customers' fault
- So Your AWS-based Application is Down? Don’t Blame Amazon by The Storage Architect
- The Cloud is not a Silver Bullet by Joe Stump (Hacker News thread)
- The AWS Outage: The Cloud’s Shining Moment by George Reese (Hacker News discussion)
- Failing to Plan is Planning to Fail by Ted Theodoropoulos
- Get a life and build redundancy/resiliency in your apps on the Cloud Computing group
Position: it's Amazon's fault
- Stop Blaming the Customers – the Fault is on Amazon Web Services by Klint Finley
- AWS is down: Why the sky is falling by Justin Santa Barbara (Hacker News thread)
- Amazon Web Services are down – Huge Hacker News thread
Lessons and takeaways
- People Using Amazon Cloud: Get Some Cheap Insurance At Least by Bob Warfield
- Basic scalability principles to avert downtime by Ronald Bradford
- Amazon crash reveals ‘cloud’ computing actually based on data centers by Kevin Fogarty
- Seven lessons to learn from Amazon’s outage by Phil Wainewright
- The Cloud and Outages : Five Key Lessons by Patrick Baillie (Cloud Computing Group discussion)
- Some thoughts on outages by Till Klampaeckel
- Amazon.com’s real problem isn’t the outage, it’s the communication by Keith Smith
- How to work around Amazon EC2 outages by James Cohen (Hacker News thread)
- Today’s EC2 / EBS Outage: Lessons learned on Agile Sysadmin
- Amazon EC2 has gone down - what would a prefered hosting platform be? on Focus
- Single Points of Failure by Mat
- Coping with Cloud Downtime with Puppet
- Amazon Outage Concerns Are Overblown by Tim Crawford
- Where There Are Clouds, It Sometimes Rains by Clay Loveless
- Availability, redundancy, failover and data backups at LearnBoost by Guillermo Rauch
- Cloud hosting vs colocation by Chris Chandler (Hacker News thread)
- Amazon’s EC2 & EBS outage by Arnon Rotem-Gal-Oz
Vendors are angry
- Amazon Outage Proves Value of Riak’s Vision by Basho
- Magical Block Store: When Abstractions Fail Us by Mark Joyent (Hacker News discussion)
- On Cascading Failures and Amazon’s Elastic Block Store by Jason
- An unofficial EC2 outage postmortem – the sky is not falling from CloudHarmony
(End of article)