Last week's AWS S3 outage wasn't the first for Amazon Web Services -- and probably not the last -- but the four-hour disruption was enough to cause customers and their channel partner advisers to reflect on cloud vulnerability.
On the customer side, the outage is likely to influence CIOs to weigh their options, noted SearchCIO features writer Jason Sparapani in his column, "Amazon cloud outage: A CIO survivor's guide." "Should they spread their applications among different cloud providers, invest more in hybrid cloud -- which combines use of the public cloud with an internal private cloud -- or find a way for apps to function even when a cloud provider doesn't?"
Industry analysts quoted in the article suggested CIOs evaluate their IT architectures, examine their incident response plans and investigate their dependencies on any given AWS region. The Feb. 28 incident originated in a data center in AWS' US-East-1 (Northern Virginia) region.
Redundancy is one approach for mitigating the risk of a cloud outage. Strategies include using multiple regions within a single cloud provider. The AWS cloud operates within 16 regions around the world, six of which are in North America.
In his article, "Amazon S3 outage spotlights disaster recovery tradeoffs," Trevor Jones, news writer for TechTarget's Data Center and Virtualization Media Group, covers other redundancy safeguards: "IT shops also can rely on a range of disaster-recovery-as-a-service tools on the market. There also are techniques to spread workloads across regions and to back them up in other public clouds or on premises."
Partners react to AWS S3 outage
From a partner perspective, Greg Pierce, chief cloud officer at Concerto Cloud Services, a managed cloud provider based in Tampa, Fla., suggested the AWS Simple Storage Service (S3) outage argues in favor of a hybrid cloud model.
"Though the public cloud has many fantastic benefits, the recent AWS outage shows that for true mission-critical applications that companies rely on for revenue, a hybrid approach with private cloud is an absolute necessity," he said. "Companies cannot put all of the proverbial eggs in one basket -- with public cloud being the basket."
Pierce said organizations are increasingly moving to a hybrid cloud model, even pulling back from 100% public cloud deployments. He said the hybrid approach lets enterprises place "Mode 1" applications -- or business-critical apps -- in a private cloud, while putting "Mode 2" applications and disaster recovery footprints in the public cloud.
But given the relative infrequency of hourslong cloud blackouts, are customers willing to pony up for greater redundancy? According to Jones' article, the answer is often no.
"Many customers are willing to work without a net, to a certain degree, after weighing the cost and complexity that comes with such high levels of redundancy," he wrote.
Yet the cost of hedging one's cloud bets may be worth it to a business particularly vulnerable to cloud outages, such as an e-tailer. Pierce said the higher uptime offered via a private cloud generates tangible savings once the potential for lost sales is taken into account.
"Forty-seven percent of customers who go to an e-commerce website and find it's down never return," Pierce said. "How many collective customers were lost on the last day of the month in February? It's massive. The right mix of hybrid can easily and cost-effectively prevent that from happening."
A lack of awareness?
As it turns out, cost may not be the only factor keeping customers away from redundancy measures, some of which are available from the cloud providers themselves. In the case of S3, customers can avail themselves of AWS' Cross-Region Replication (CRR) feature, said Dan Robinson, senior engagement manager at TriCore Solutions LLC, a consulting and managed cloud services provider based in Norwell, Mass.
Businesses using CRR will incur extra storage costs in the additional AWS regions, Robinson noted, but he said those costs pale in comparison to the cost of an outage.
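For readers who have not seen the feature, the sketch below shows roughly what turning on CRR involves: a replication rule attached to the source bucket that names a destination bucket in another region. It is a minimal illustration, not a recommended setup. The bucket names and the IAM role ARN are hypothetical placeholders, versioning must already be enabled on both buckets, and the rule uses the original prefix-based rule format with an empty prefix to cover every object.

```python
def build_crr_config(role_arn: str, destination_bucket: str,
                     rule_id: str = "replicate-everything") -> dict:
    """Build a replication configuration that copies all objects to one bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects (hypothetical ARN)
        "Rules": [
            {
                "ID": rule_id,
                "Prefix": "",  # empty prefix: the rule applies to every object
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_crr(source_bucket: str, role_arn: str, destination_bucket: str) -> None:
    """Apply the rule to the source bucket. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_crr_config works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_crr_config(role_arn, destination_bucket),
    )
```

A call such as `enable_crr("example-primary", "arn:aws:iam::123456789012:role/example-crr-role", "example-replica")` would apply the rule. Every name in that call is made up for illustration.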
But if CRR is there for the asking, why were so many companies evidently operating without it? Robinson, who said he was taken aback at the number of companies affected by the AWS S3 outage, thinks many organizations were simply unaware of the Cross-Region Replication capability.
"It does come down to a lot of the companies in the cloud don't fully understand what is out there and what is available," Robinson said. "If people were just aware of what they could do with S3, [the outage] would have been a nonissue for the majority of people."
S3 CRR, however, isn't without its nuances. Applications using S3 as the back end would require some work in the event of an outage, Robinson said. Buckets are S3's logical storage units, where customers upload their data, and bucket names share a single global namespace, so a replica bucket in another region must carry a different name from its source. As a consequence, applications using S3 would need to be repointed to the target S3 bucket, Robinson said. The replication scenario is simpler if customers are using S3 just as storage, he noted.
The bottom line: How customers use S3 CRR to work through an outage will depend on how they employ S3 to begin with, Robinson said.
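The repointing Robinson describes could be handled inside the application rather than by hand during an incident. The sketch below is one possible approach under stated assumptions, not a description of any vendor's tooling: the application keeps an ordered list of bucket names (primary first, replica second) and tries each in turn. The `fetch` callable, the exception class and the bucket names are all hypothetical; in practice `fetch` might wrap an S3 client call.

```python
from typing import Callable, Sequence


class AllBucketsFailed(Exception):
    """Raised when none of the configured buckets could serve the request."""


def read_with_failover(key: str,
                       buckets: Sequence[str],
                       fetch: Callable[[str, str], bytes]) -> bytes:
    """Return the object from the first bucket that answers, in list order."""
    errors = []
    for bucket in buckets:
        try:
            return fetch(bucket, key)
        except Exception as exc:  # any failure moves on to the next bucket
            errors.append(f"{bucket}: {exc}")
    raise AllBucketsFailed("; ".join(errors))
```

Because replication is asynchronous, a replica may briefly lag the primary, so a read served from the second bucket is not guaranteed to be the latest version of an object.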
Weighing costs and benefits
Channel partners, in their role as IT advisers, could find themselves in a position to help customers think through the implications of a cloud blackout and what steps they need to take to protect themselves.
Robinson said organizations can conduct a cost-benefit analysis to determine what level of redundancy and what type of disaster recovery plan will best serve their needs. Recovery time objectives, recovery point objectives and service-level agreements all play a role in determining whether an organization needs to build a hybrid cloud for protection, work with multiple cloud providers, opt for intra-cloud redundancy or pursue some other alternative.
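A back-of-the-envelope version of that cost-benefit analysis might look like the sketch below. It is deliberately simplistic: it compares an expected yearly outage loss with the yearly cost of redundancy and ignores recovery objectives, SLA credits and reputational damage. The four-hour duration and the "every couple of years" frequency echo figures mentioned in this article, while the revenue and redundancy cost figures are invented purely for illustration.

```python
def expected_annual_outage_cost(outages_per_year: float,
                                hours_per_outage: float,
                                revenue_loss_per_hour: float) -> float:
    """Expected yearly loss: frequency x duration x hourly revenue at risk."""
    return outages_per_year * hours_per_outage * revenue_loss_per_hour


def redundancy_pays_off(annual_redundancy_cost: float,
                        outages_per_year: float,
                        hours_per_outage: float,
                        revenue_loss_per_hour: float) -> bool:
    """True when the expected outage loss exceeds the cost of redundancy."""
    loss = expected_annual_outage_cost(
        outages_per_year, hours_per_outage, revenue_loss_per_hour
    )
    return loss > annual_redundancy_cost


# One four-hour outage every two years; the dollar figures are hypothetical.
busy_etailer = redundancy_pays_off(30_000, 0.5, 4, 50_000)
small_internal_app = redundancy_pays_off(30_000, 0.5, 4, 2_000)
```

Under these made-up numbers the e-tailer's expected loss of $100,000 a year exceeds the $30,000 redundancy bill, while the small internal application's $4,000 does not, which is the kind of split Jones and Leong describe.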
In some cases, the customer's preferred alternative may be to remain content with the cloud's out-of-the-box reliability. Lydia Leong, a vice president and distinguished analyst with the IT Leaders group at Gartner, said many customers looking at S3's reliability, which she said has been very high, "make the decision that they don't want to pay for replication. [They'll] deal with acceptable risk."
Leong said some customers will probably re-evaluate their cloud risk in light of the AWS S3 outage, along the following lines: "Can [they] live with an outage like this every couple of years, or do they feel like they need more redundancy? And, if so, what's the best way to get that?"
Robinson likened a cloud protection plan to purchasing insurance.
"It's an insurance policy, and the insurance policy can be very high or very low in terms of costs," he said.
Additional reporting by Jason Sparapani.