

5 Lessons the Delta Outage Should Teach Us About Datacenter Security and Disaster Prevention

The Delta outage: 650 cancelled flights, more than 1,200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages, plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastrophic, cascading technical failure that apparently started with a “small fire” in Delta’s datacenter.
Multiple news outlets have relayed this story about the fire, but I can’t speak to how Delta’s IT network is designed and deployed. What I can say are three things for sure.
First, our hearts go out to Delta for having to go through the mother of all business disruptions. It’s a tribute to the organization’s leadership, tenacity and resourcefulness that just a few days later, they were back online and operating normally again.
Second, if what I’m reading is true, this entire mess may have been avoidable, or at least more easily contained.
Third, I was one of the Delta travellers who were inconvenienced by the outage last week. It wasn’t fun.
Since our inception in 2011, we’ve been promoting cloud services as a means to decrease an organization’s risk. Much of the current cloud conversation is around cybersecurity and how, in our datacenters, we deploy state-of-the-art security measures by employing world-class security experts who have a command of best practices, the digital threat landscape and compliance standards.
But what we, and I dare say other cloud service providers, do not talk about nearly as often is disaster prevention. Disaster prevention goes beyond disaster recovery (DR) and data backups, and yet most companies aren’t prepared for the unexpected. We consider high availability across multiple datacenters to be “table stakes” in the modern cloud/infrastructure world. This outage is proof that, in practice, it is not.
According to the Disaster Recovery Preparedness Benchmark, more than 60% of those who took the survey do not have a fully documented DR plan. In addition, 40% admitted that the DR plan they currently have did not prove very useful when it was called on to respond to their worst disaster recovery scenario.
Unfortunately, floods, tornadoes, storms, earthquakes, blackouts, and, yes, fires happen. Theft and sabotage are security concerns, too. When a datacenter is physically compromised, very expensive hardware (not to mention the sensitive data that resides on it) has a way of walking out the door. And in in-house (on-premises) datacenters, entire servers have been wiped by disgruntled IT staff.
And so, the lessons to learn from the Delta outage are:
1) Ensure physical security, safety and personnel security measures are in place - including having appropriate background checks and security clearance for employees, partners and vendors.
2) Ensure there are rigorously tested, proven failover protocols in place. If you are working with a cloud provider, make sure you clearly understand their failover offerings. For Concerto environments, automatic failover to another datacenter is included for mission-critical applications. Many providers sell this as an add-on service.
3) Compare your own organization’s SLA with that of a proven cloud provider. Too many companies that manage their own datacenters do so without a defined internal SLA. Determine what is appropriate for your computing workloads, and how much risk you can accept should disaster strike.
4) IT leaders must balance uptime requirements and risk across a myriad of applications, and I respectfully suggest you treat Delta’s story as a call to action. It may be time to conduct a comprehensive audit of your datacenter security and disaster protocols, just to be sure. And if or when your organization wants to reduce its risk with an uncompromising “four nines” SLA and disaster prevention services, we’ll be here to help you find higher ground.
5) Have a solid communication plan in place for after something bad happens. Hey, things will go wrong. If everyone knows what to do (and what to say) and how to make it up to customers, it will help minimize the impact.
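When comparing SLAs (lessons 3 and 4), it helps to translate an availability percentage into the downtime it actually permits. The sketch below does that arithmetic. It assumes a 365-day year and an SLA measured over the whole period with no exclusions; real provider SLAs often measure monthly and exclude scheduled maintenance, so check the fine print. The function name `allowed_downtime_minutes` is purely illustrative.

```python
# Convert an availability percentage into the maximum downtime it allows.
# Assumption: a 365-day year and no excluded maintenance windows; real SLAs
# may use monthly measurement periods and carve out scheduled maintenance.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, that an availability target allows."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # Print the yearly downtime budget for common "nines" targets.
    for label, pct in [("two nines", 99.0), ("three nines", 99.9),
                       ("four nines", 99.99), ("five nines", 99.999)]:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label:>11} ({pct}%): {minutes:8.1f} min/year "
              f"= {minutes / 60:5.2f} h/year")
```

By this arithmetic, a “four nines” target allows roughly 52.6 minutes of downtime per year, or about 4.3 minutes in a 30-day month (pass `period_minutes=30 * 24 * 60`). An outage that takes days to recover from would exhaust that budget many times over.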
Related Information:
Comparing Cloud Providers: 10 Questions to Ask about Uptime
Microsoft SLA Uptime Service Credits: Decoding the Fine Print
Ten Vital Facts Every Exec Must Know About Cloud


Concerto provides fully managed cloud services and the expertise to provide an easy and reliable route to the cloud. Our best-in-class solutions help you address the toughest IT challenges, find new efficiencies and deliver the best application expe…
Delivering innovative fully-managed cloud services for mission-critical applications requires expertise in multiple areas plus vision and commitment. Meet a few of the people behind the quality services of Concerto.
