As some of you probably noticed and others read about, users and applications relying on Amazon S3’s Cloud Computing infrastructure experienced service interruptions. SecoBackup users have been largely unaffected by the interruption in Amazon S3’s service. This is mainly a result of resilience built into the software, which can handle interruptions, network timeouts, temporary disconnects and the general error rates of web services. More on this later. Let’s start with the outage itself. Back in February, Alex Iskold blogged about how Amazon’s infrastructure continues to be the #1 Cloud Computing infrastructure and how Web Services are here to stay and grow. To quote him :-
We are witnessing a fundamental shift in our ability to compute and this is just the beginning. Amazon is at the forefront of making massively parallel, web scale compute services available to the world.
Today’s downtime is similar to the one in February in some ways and different in others.
Business as usual
This is the first major outage since February and probably the third major outage since the inception of Amazon S3 more than two years ago, in March 2006. It is similar in that it comes after several months of uninterrupted operation of the biggest and most popular cloud computing service of its kind. It continues to be true that Amazon’s Cloud Computing infrastructure is probably more reliable and available than a home-grown LAMP stack implementation (with the exception of those run by the few experts in this field). To quote Alex again from February :-
The truth is that we cannot do it better than Amazon. They spent a massive amount of money, talent and most importantly time, trying to solve this problem.
As Werner Vogels showed in his blog, Amazon S3 is growing at a tremendous pace and now stores more than 14 billion objects.
Cloud Computing on the move
Cloud computing has been evolving all this time. Many things have changed. One of the big changes is greater transparency. Amazon Web Services now has a status page that posted up-to-the-minute updates on the status of the issues and the progress in resolving them throughout the day. Kudos to AWS for the transparency! Here is a screenshot of what it says at the end of the day (now that the service has been restored).
Amazon has SLAs and will credit you back based on the downtime.
How SecoBackup Community Edition Complements cloud computing
SecoBackup Community Edition provides free online backup for PCs and servers to Amazon S3, with compression, de-duplication and encryption built in. Many users use SecoBackup as a more reliable and less expensive way to perform server backups to Amazon S3 with no limitations. For example, it can back up 10GB files to S3 without running into Amazon S3’s size limit of 5GB.
A few users reported that they had difficulty accessing Amazon S3, but otherwise the backup application was functioning normally. During the outage, SecoBackup automatically detected that Amazon S3 was not reachable and deferred the backups to S3 until later. When the Amazon S3 servers became operational again, SecoBackup automatically detected this and uploaded the files that had been created or updated during the outage. Here’s a screenshot of one of our laptops that had a presentation that was saved during the outage. After the S3 service was restored, SecoBackup automatically detected connectivity to Amazon S3 and backed the presentation up. Here’s the screenshot -
SecoBackup is specifically designed to run seamlessly and recover automatically after service interruptions. Service interruptions can occur for many reasons :-
- Network connectivity problems in your local LAN, at your ISP or at Amazon S3
- Planned and Unplanned Downtime of Amazon S3 servers
- Laptop roaming. Laptops may get disconnected from a network, or may change network adapters. A laptop may be offline for several hours or several weeks, during which new files may get created or existing files may be updated
Cloud Computing, and WANs in general, are affected by these well-known issues. For software like SecoBackup that relies on Cloud Computing infrastructure, designing for them becomes critical. Since SecoBackup was built specifically for Cloud Computing, it really shines in this area.
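To make "designing for this" a little more concrete, here is a minimal sketch of one common technique: retrying a failed web service call with exponential backoff and jitter. This is not SecoBackup's actual code. The names `TransientError` and `call_with_retries`, and the default delays, are assumptions made up for this illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, disconnects)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt, is capped at max_delay,
    and is scaled by a random factor (jitter) so that many clients do not
    all retry at the same instant once the service comes back.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller defer the work
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A caller would wrap each request to the storage service in `call_with_retries` and, if the error is still raised after the last attempt, put the task back on a queue to try again later instead of dropping it.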
SecoBackup tracks changes to your local files even while Amazon S3 Service is unavailable. SecoBackup automatically reconnects the application to S3 when the service becomes accessible or available again. Many of the capabilities built into SecoBackup nicely complement Cloud Computing. Here are a few :-
- Network Awareness. SecoBackup automatically tracks the status of the network and connectivity to Amazon S3. This allows SecoBackup to perform operations in the context of the status of the network.
- Tracking Changed Files while offline. Even when the network is unavailable, SecoBackup is tracking the changed files. This architecture guarantees that no files are missed.
- Queueing of backup tasks. Backup tasks are queued up for transmission to Amazon S3. The queue is persistent and recoverable: after a machine reboot or a network reconnect, the status of the queue is fully recovered.
- Checkpointing. Backup tasks are checkpointed at a fine level of granularity, even for large files of, say, 100GB. For example, if there is a network disconnect after 25.555GB of data is uploaded, then after the network reconnects, backup resumes at the 25.555GB mark! No bandwidth is wasted on re-sending data.
- Design for Failure. On the wide area internet, failures such as timeouts, short bursts of disconnects and DNS problems are common. SecoBackup is designed to deal with failure down to the minute detail.
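The checkpointing idea from the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not SecoBackup's implementation: `send_chunk` is a hypothetical callback standing in for whatever transfers one chunk to the storage service, `ConnectionLost` is a made-up exception for a dropped connection, and the checkpoint is simply a small JSON file holding the number of bytes already sent.

```python
import json
import os


class ConnectionLost(Exception):
    """Hypothetical error raised by send_chunk when the network drops."""


def load_checkpoint(path):
    """Return the saved byte offset, or 0 if there is no usable checkpoint."""
    try:
        with open(path) as f:
            return json.load(f)["offset"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return 0


def save_checkpoint(path, offset):
    """Write the offset to a temp file, then rename it into place.

    The rename is atomic, so a crash mid-write leaves the old checkpoint intact.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def resumable_upload(source_path, checkpoint_path, send_chunk,
                     chunk_size=1024 * 1024):
    """Upload a file chunk by chunk, resuming from the last checkpoint.

    send_chunk(offset, data) may raise ConnectionLost. The checkpoint is
    advanced only after a chunk has been sent, so a later call resends at
    most the one chunk that was in flight when the connection dropped.
    """
    offset = load_checkpoint(checkpoint_path)
    with open(source_path, "rb") as src:
        src.seek(offset)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            send_chunk(offset, chunk)
            offset += len(chunk)
            save_checkpoint(checkpoint_path, offset)
    # Upload finished: the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return offset
```

Because the checkpoint lives on disk, the same resume logic works after a reboot as well as after a network reconnect, which is the property the persistent queue described above depends on. A smaller `chunk_size` means less data to resend after a disconnect, at the cost of more checkpoint writes.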
With Cloud Computing infrastructure, failures are a given. Amazon S3’s outages have been few and far between, and we continue to think very highly of the robustness of Amazon S3’s service and of Werner Vogels’s vision for cloud computing. Software like SecoBackup that is built specially for cloud computing can nicely complement Amazon S3’s Web Services.
Please feel free to comment on this post to share your experiences with Amazon S3, especially with SecoBackup.