The following blog is by J. Peter Bruzzese, a Microsoft MVP (Exchange/Office 365) and a technical author, journalist, and speaker for Microsoft and others. For nearly a decade, he wrote the Enterprise Windows column for InfoWorld. J. Peter is the co-founder of both ClipTraining and Conversational Geek. He’s a strategic technical consultant for Mimecast. You can find him on Twitter at: @JPBruzzese.
A major outage in the US takes down a key Microsoft datacenter and a host of cloud services in the process. What to do when the “cloud” goes down?
Every vendor offering a cloud-based solution pours ungodly amounts of money into redundancy to ensure a single failure or even multiple failures go unnoticed by customers connected to their services. For months, it appears as if nothing can go wrong. And then…it does.
This week, Microsoft experienced Azure and Office 365 outages after severe weather (lightning) took out cooling systems in data centers located in San Antonio, Texas, which forced servers and services to shut down. The outage was focused on the South-Central U.S., but it affected customers around the globe. More specifically, it affected Exchange, SharePoint, Teams and a variety of other solutions. Azure AD was a problem as well, since it handles identity management and ties back to Office 365.
After most services were restored, customers received error messages in Outlook and Skype saying they were being throttled because of a change to Azure AD authentication for Office 365.
Without belaboring the situation, the real question is: “What did we learn from this outage?”
Cloud “haters” will tell you to avoid the cloud. That’s ridiculous at this stage of the game. When an airline has an incident, do we stay out of the air? No, we learn from the failure. When it comes to cloud-based solutions, it’s important to understand that there is no perfect world where services never go down. Azure and Office 365 have gone down and will continue to go down. Microsoft will learn and improve, and we appreciate their efforts. But what does that reality mean for you when an outage hits?
You may have a recovery plan for your on-prem environment, but what happens when you experience a cloud outage? Do you have a plan to recover?
J. Peter continues his IT Admin's Guide to O365 Continuity, and recovery strategies for Mimecast customers, over at the Mimecast blog.