Amazon Down Archive
From what I can tell, Amazon Web Services' S3 (Simple Storage Service) hasn’t been functioning correctly for the last 15 or so minutes. I can’t get images to load across a variety of sites and S3 servers.
Just seconds ago the Amazon service health dashboard added an error note, “2:40 PM PDT We are currently investigating an increase in error rates affecting S3.”
We will update this post as we learn more. Please report in if S3 storage is not functioning correctly for you.
Update from Amazon: 3:17 PM PDT The increased error rates lasted from 2:27 to 2:54 PM PDT. The service has fully recovered and is operating normally.
(now get back to work!)
The Amazon S3 storage service is currently showing elevated error rates. We’ve noticed several images not loading correctly and we’ve heard from multiple CN readers with the same issue on their sites. The issues are apparently only hitting the U.S. Standard centers — other S3 centers including Northern California, Europe and Asia are functioning correctly.
The Amazon service health dashboard shows an update as of 5:01 Pacific Time noting, “We are investigating elevated error rates.”
As of November 2009, Amazon S3 stored over 82 billion objects.
Leave a comment if you use the S3 service and are experiencing issues.
Amazon has posted an announcement regarding what happened last weekend with their S3 storage service and the downtime of a little over 8 hours. We covered the outage extensively here on CenterNetworks. Overall the downtime ran from 8:40am Pacific Time to 5:00pm Pacific Time. It’s cute how they call it an "availability event" – I need to add this to my list of synonyms for the words dead, down, outage and not working.
Here’s their final conclusion:
We’ve now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers’ objects. However, we didn’t have the same protection in place to detect whether this particular internal state information had been corrupted. As a result, when the corruption occurred, we didn’t detect it and it spread throughout the system causing the symptoms described above. We hadn’t encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it.
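To make the checksum idea concrete, here is a minimal sketch of how an MD5 digest attached to a message lets the receiver catch a single flipped bit. This is not Amazon's implementation: the message format, the `wrap`/`unwrap`/`flip_bit` helpers and the sample state string are all made up for illustration.

```python
import hashlib


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its 16-byte MD5 digest (hypothetical format)."""
    return hashlib.md5(payload).digest() + payload


def unwrap(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the digest does not match."""
    digest, payload = message[:16], message[16:]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("checksum mismatch: message corrupted")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit corruption at the given bit position."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


state = b"server=42;status=healthy"  # made-up state message
message = wrap(state)
assert unwrap(message) == state  # an intact message passes the check

damaged = flip_bit(message, 16 * 8 + 3)  # flip one bit inside the payload
try:
    unwrap(damaged)
    detected = False
except ValueError:
    detected = True  # the receiver rejects the message instead of acting on it
```

Without the digest, the damaged message would still parse and the bad state would be accepted, which is the failure mode the quote describes.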
Overall I would say that Amazon did a good job of keeping everyone informed via their health status page. They have made some changes that will let their servers become more "chatty" in the future and hopefully prevent this type of outage and severe downtime from happening again.
Maybe it’s a new trend – servers going down! Today it appears that Amazon has decided to join the crowd. Amazon is currently down as of 2:15pm Eastern and has been down for at least a bit, several CN readers report.
When I attempt to load amazon.com, I get:
Http/1.1 Service Unavailable
That’s right, big arse Amazon seems to wanna be hip and trendy so they took the service down, or should I say unavailable. We’ve all seen what Twitter can do with some downtime and other large Internet services are feeling left out. YouTube for example had an outage recently to raise their hip level.
We are considering taking CN down later today because heck, we are the coolest tech blog out there today!
On a serious note, where’s Amazon’s status reporting service? How much income is being lost each minute they are down? It does look like Amazon’s S3 is up based on files loading here on CN.
As usual, please report in if Amazon is down where you are.
Update: 4:05pm Eastern – Amazon now reports the following “pretty” outage message:
Update: 2:22pm Eastern – Twitter couldn’t let Amazon have the fun alone – they decided to go down as well!
Update: 2:25pm Eastern – Greg Sandoval has done the math – “Based on last quarter’s revenue of $4.13 billion, a full-scale global outage would cost Amazon more than $31,000 per minute on average.”
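For anyone who wants to check that math, here is a quick back-of-the-envelope sketch. The revenue figure is the one quoted above; the 91-day quarter is my own assumption, and the function name is just for illustration.

```python
QUARTERLY_REVENUE_USD = 4.13e9  # last quarter's revenue, as quoted above
DAYS_IN_QUARTER = 91            # assumption: roughly a quarter of a year


def revenue_per_minute(quarterly_revenue: float, days: int = DAYS_IN_QUARTER) -> float:
    """Average revenue per minute, spreading the quarter evenly over every minute."""
    minutes = days * 24 * 60
    return quarterly_revenue / minutes


per_minute = revenue_per_minute(QUARTERLY_REVENUE_USD)
print(f"${per_minute:,.0f} per minute")  # prints "$31,517 per minute"
```

Using a 90- or 92-day quarter moves the number by a few hundred dollars either way, but it stays above $31,000 per minute.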