SecoBackup’s resilience and today’s Amazon downtime
As some of you probably noticed and others read about, users and applications relying on Amazon S3’s Cloud Computing infrastructure experienced service interruptions today. SecoBackup users have been largely unaffected by the interruption in Amazon S3’s service. This is mainly a result of resilience built into the software, which handles interruptions, network timeouts, temporary disconnects and the general error rates of web services. More on this later. Let’s start with the outage itself. Back in February, Alex Iskold blogged about how Amazon’s infrastructure continues to be the #1 Cloud Computing infrastructure and how Web Services are here to stay and grow. To quote him :-
We are witnessing a fundamental shift in our ability to compute and this is just the beginning. Amazon is at the forefront of making massively parallel, web scale compute services available to the world.
Today’s downtime is similar to the one in February in some ways and different in others.
Business as usual
This is the first major outage since February and probably the third major outage since the inception of Amazon S3 more than two years ago, in March 2006. It is similar in that it comes after several months of uninterrupted operation of the cloud computing service, the biggest and most popular of its kind. It continues to be true that Amazon’s Cloud Computing infrastructure is probably more reliable and available than a home-grown LAMP stack implementation (with the exception of those built by the few experts in this field). To quote Alex again from February :-
The truth is that we cannot do it better than Amazon. They spent a massive amount of money, talent and most importantly time, trying to solve this problem.
As Werner Vogels showed in his blog, Amazon S3 is growing at a tremendous pace and now stores more than 14 billion objects.

Cloud Computing on the move
Cloud computing has been evolving all this time. Many things have changed. One of the big changes is greater transparency. Amazon Web Services now has a status page, which carried up-to-the-minute updates on the status of the issues and the progress in resolving them throughout the day. Kudos to AWS for the transparency! Here is a screenshot of what it says at the end of the day (now that the service has been restored).

Amazon has SLAs and will credit you back based on the downtime.
How SecoBackup Community Edition Complements cloud computing
http://secobackup.com/products-secobackup.html
SecoBackup Community Edition provides free online backup for PCs and Servers to Amazon S3 with compression, de-duplication and encryption built into it. Many users use SecoBackup as a more reliable and less expensive alternative to perform server backups to Amazon S3 with no limitations – for example, it can back up 10GB files to S3 without running into Amazon S3’s size limit of 5GB.
A few users reported that they were experiencing difficulty accessing Amazon S3, but otherwise the backup application was functioning normally. During the outage, SecoBackup automatically detected that Amazon S3 was not reachable and deferred the backups to S3 until later. When the Amazon S3 servers became operational again, SecoBackup automatically detected this and uploaded the files created or updated during the outage. One of our laptops had a presentation that was saved during the outage. After the S3 service was restored, SecoBackup automatically detected connectivity to Amazon S3 and backed the presentation up. Here’s the screenshot -

SecoBackup is specifically designed to run seamlessly and recover automatically after service interruptions. Service interruptions can occur for many reasons :-
- Network connectivity problems in your local LAN, at your ISP or at Amazon S3
- Planned and Unplanned Downtime of Amazon S3 servers
- Laptop roaming. Laptops may get disconnected from a network, or may change network adapters. A laptop may be offline for several hours or several weeks, during which new files may get created or existing files may be updated
Cloud Computing and WANs in general are affected by these well-known issues. For software like SecoBackup that utilizes Cloud Computing infrastructure, designing for them becomes critical. Since SecoBackup was built specifically for Cloud Computing, it really shines in this area.
SecoBackup tracks changes to your local files even while Amazon S3 Service is unavailable. SecoBackup automatically reconnects the application to S3 when the service becomes accessible or available again. Many of the capabilities built into SecoBackup nicely complement Cloud Computing. Here are a few :-
- Network Awareness. SecoBackup automatically tracks the status of the network and connectivity to Amazon S3. This allows SecoBackup to perform operations in the context of the status of the network.
- Tracking Changed Files while offline. Even when the network is unavailable, SecoBackup is tracking the changed files. This architecture guarantees that no files are missed.
- Queueing of backup tasks. Backup tasks are queued up for transmission to Amazon S3; the queue is persistent and recoverable. After a machine reboot or a network reconnect, the status of the queue is fully recovered.
- Checkpointing. Backup tasks are checkpointed at a fine level of granularity, even for large files of, say, 100GB. For example, if there is a network disconnect after 25.555GB of data is uploaded, then after the network reconnects, backup resumes at the 25.555GB mark! No bandwidth is wasted on retries.
- Design for Failure. On the wide area internet, failures such as timeouts, short bursts of disconnects and DNS problems are common. SecoBackup is designed to deal with failure down to minute detail.
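To make the queueing idea in the list above more concrete, here is a minimal sketch in Python of a queue that survives a restart and defers work while the service is unreachable. This is not SecoBackup's actual code: the class name PersistentQueue, the JSON file format and the upload callback are all assumptions made for illustration.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Hypothetical FIFO of pending backup tasks, rewritten to disk on every change."""

    def __init__(self, path):
        self.path = path
        self.tasks = []
        # Recover whatever was pending before the last reboot or crash.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.tasks = json.load(f)

    def _save(self):
        # Write to a temporary file and rename it, so a crash mid-write
        # never leaves a half-written queue file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.tasks, f)
        os.replace(tmp, self.path)

    def put(self, file_path):
        # A file that changes twice while offline is queued only once.
        if file_path not in self.tasks:
            self.tasks.append(file_path)
            self._save()

    def drain(self, upload):
        """Upload tasks in order; stop at the first failure and keep the rest."""
        while self.tasks:
            try:
                upload(self.tasks[0])
            except OSError:  # includes ConnectionError and timeouts
                return False  # service unreachable: defer the remaining tasks
            self.tasks.pop(0)
            self._save()
        return True
```

Files changed during an outage would be added with put(), and drain() would be called again once connectivity returns. A task is removed from the file only after its upload succeeds, so a reboot in the middle loses nothing.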
With Cloud Computing infrastructure, failures are a given. Amazon S3’s outages have been few and far between, and we continue to think very highly of the robustness of the service and of Werner Vogels’s vision for cloud computing. Software like SecoBackup, which is built specially for cloud computing, can nicely complement Amazon S3’s Web Services.
Please feel free to comment on this post to share your experiences with Amazon S3, especially with SecoBackup.
Thanks!
Scott said,
July 21, 2008 @ 4:49 am
We noticed an alert on the SecoBackup application during the morning time. But it went away in the evening.
How much time before SecoBackup will detect that Amazon S3 connectivity is restored?
Scott
SecoBackup said,
July 21, 2008 @ 4:53 am
SecoBackup will detect connectivity to S3 within a few seconds of it appearing again.
A simple way to test this is to disconnect your network cable for a few minutes, and then plug it back in.
SecoBackup will deal with disconnects in a clean and robust manner.
Thanks,
The SecoBackup Team
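For readers curious how detection "within a few seconds" might work, here is a minimal polling sketch in Python. It is an illustration, not a description of SecoBackup internals: the function names, the two-second interval and the choice of a plain TCP connection to s3.amazonaws.com as the probe are all assumptions.

```python
import socket
import time


def s3_reachable(host="s3.amazonaws.com", port=443, timeout=3.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, timeout, no route
        return False


def wait_until_reachable(probe=s3_reachable, interval=2.0, sleep=time.sleep):
    """Poll probe() until it succeeds; return how many probes failed first."""
    failures = 0
    while not probe():
        failures += 1
        sleep(interval)
    return failures
```

With a short fixed interval, a reconnect is noticed at most one interval plus one probe timeout after it happens. The probe and sleep functions are parameters so the loop can be exercised without touching the network.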
Ravi R. said,
July 21, 2008 @ 4:57 am
We set up several new directories to back up during the day and noticed that uploads were not done. It appears that we were able to connect back to S3 sometime in the afternoon, and SecoBackup has since uploaded all the new files we added. We love the disconnected capabilities of SecoBackup. Essentially we don’t have to worry about the network or S3 when it comes to backup.
Backup is the perfect application for cloud computing right now. We care about backups eventually happening. 6 hours of delay in backup is just fine for us.
If you are familiar with other backup services, you might know that restores and backups can take up to days. Amazon S3 and SecoBackup are a godsend to the backup world.
–RR
Scott said,
July 21, 2008 @ 5:01 am
Thanks SecoBackup,
It’s great to know that you can detect a network reconnect in seconds and transmit the remaining changes to Amazon S3.
I have always wondered how partially uploaded files work. How does SecoBackup know that the file is already partially uploaded and start from where it left off? How about the 5GB limit?
Scott
Scott said,
July 21, 2008 @ 5:03 am
another good note on this subject.
http://www.readwriteweb.com/archives/more_amazon_s3_downtime.php
linking here for reference of readers…
Administrator said,
July 21, 2008 @ 3:42 pm
Hi Scott,
SecoBackup is designed with Small Businesses and Enterprises in mind. SecoBackup lets you back up files of any size. Customers commonly use it for large backup archives or Exchange backups with files of several tens of GB.
SecoBackup particularly shines at dealing with large volumes of data and large files. If there is a network error in the middle of a large file backup (like yesterday’s outage), SecoBackup will have checkpointed the upload, and when network connectivity is restored, it picks up from the point where it left off. This happens in a reliable and robust manner, working through multiple and prolonged network failures, machine reboots, logins/logouts or any other event.
A design philosophy of SecoBackup is failure resilience. Through any service interruption, SecoBackup does the right thing without your intervention!
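As a rough illustration of the checkpoint-and-resume behaviour described above, here is a minimal sketch in Python. It is not SecoBackup's implementation: the function name, the JSON checkpoint file, the send_chunk callback and the 8MB chunk size are all hypothetical. Storing each chunk as its own S3 object would be one possible way to stay under a per-object size limit, but that is an assumption, not something stated here.

```python
import json
import os


def upload_with_checkpoint(path, send_chunk, checkpoint_path, chunk_size=8 * 1024 * 1024):
    """Upload path in chunks, recording the confirmed byte offset after each one.

    send_chunk(offset, data) is a caller-supplied function (hypothetical) that
    transmits one chunk and raises OSError if the network fails.
    """
    offset = 0
    # Resume from the last confirmed offset if an earlier attempt was interrupted.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            offset = json.load(f)["offset"]

    with open(path, "rb") as src:
        src.seek(offset)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            send_chunk(offset, chunk)  # an exception here leaves the checkpoint intact
            offset += len(chunk)
            with open(checkpoint_path, "w", encoding="utf-8") as f:
                json.dump({"offset": offset}, f)

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # upload complete; nothing left to resume
    return offset
```

Calling the function again after a failure re-sends only the chunk that was in flight, so the bandwidth already spent is not wasted. A real tool would also need to check that the file has not changed between attempts, which this sketch skips.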
Seth Caldwell said,
July 21, 2008 @ 10:10 pm
When I first started using S3 I thought it was the most amazing thing ever. Within 24 hours, I had my entire web imaging system (heavy user contribution) using s3. However, I wish amazon’s .NET methods had included some setup in web.config that would have specified a local cache folder should their service ever go down. I’m programming this myself now, and using a getFileLink() method around any s3 hosted file. It checks s3 status every 5 minutes. If s3 is down, it uses the local cache for the next 5 minutes.
Essentially, it adds our own local servers to the ‘cloud’. I’m surprised s3 didn’t provide this feature in the first place.
Seth – http://www.collarfree.com
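The approach Seth describes could look roughly like the following. This is a Python sketch rather than his .NET code, and everything except the getFileLink idea and the 5-minute interval is assumed: the class name, the URL layout and the check_s3 callback are hypothetical.

```python
import time


class FileLinker:
    """Return S3 links while S3 is up, local-cache links while it is down."""

    def __init__(self, s3_base, cache_base, check_s3, ttl=300.0, clock=time.monotonic):
        self.s3_base = s3_base.rstrip("/")
        self.cache_base = cache_base.rstrip("/")
        self.check_s3 = check_s3  # hypothetical callable returning True if S3 is up
        self.ttl = ttl            # re-check the status every 5 minutes by default
        self.clock = clock
        self._checked_at = None
        self._s3_up = True

    def get_file_link(self, key):
        now = self.clock()
        # Only probe S3 when the previous answer is older than ttl seconds.
        if self._checked_at is None or now - self._checked_at >= self.ttl:
            self._s3_up = self.check_s3()
            self._checked_at = now
        base = self.s3_base if self._s3_up else self.cache_base
        return f"{base}/{key}"
```

The trade-off is the one the comment implies: links can point at a dead S3 for up to five minutes after an outage starts, and keep pointing at the local cache for up to five minutes after S3 recovers.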
Egbert said,
August 22, 2008 @ 4:00 pm
I guess that for a backup service, the interruption of the backup process is not too much of an immediate business issue. It is a disaster remediation process, and not core business. HOWEVER, in times of need (the disaster occurred) where the RESTORE is required immediately, that’s when you are actually hit by the S3 downtime.