Office 365: Is 99.9% = reliable enough?

Per the table below, 99.9% uptime = 43.8 minutes of downtime per month.

For anyone who is presently using Office 365, is that 43.8 minutes typically in the form of planned Sunday maintenance, or sporadic down-time throughout the business day?

Thanks for any feedback or experiences. We are seriously considering a migration from our on-premises Exchange to Office 365 and want to hear from existing customers.

Thanks,
Mike

Here are the nines:

Availability %             Downtime/yr      Downtime/mo      Downtime/wk
90% ("one nine")           36.5 days        72 hours         16.8 hours
95%                        18.25 days       36 hours         8.4 hours
97%                        10.96 days       21.6 hours       5.04 hours
98%                        7.30 days        14.4 hours       3.36 hours
99% ("two nines")          3.65 days        7.20 hours       1.68 hours
99.5%                      1.83 days        3.60 hours       50.4 minutes
99.8%                      17.52 hours      86.23 minutes    20.16 minutes
99.9% ("three nines")      8.76 hours       43.8 minutes     10.1 minutes
99.95%                     4.38 hours       21.56 minutes    5.04 minutes
99.99% ("four nines")      52.56 minutes    4.32 minutes     1.01 minutes
99.999% ("five nines")     5.26 minutes     25.9 seconds     6.05 seconds
99.9999% ("six nines")     31.5 seconds     2.59 seconds     0.605 seconds
99.99999% ("seven nines")  3.15 seconds     0.259 seconds    0.0605 seconds
[MDT: 100% Perfection      0                0                0]
mike2401 Asked:
 
tigermatt Commented:
The precise details are something you would need to flesh out with professional legal advice dependent upon the terms in the Office 365 contract you establish between Microsoft and yourselves. All of the following must be taken with the caveat that I am not a lawyer.

However, in a nutshell, no, the intention is not that the service is down routinely on Sunday afternoons. Neither Microsoft nor any other cloud services provider would have a business case if, at their scale, they were unable to conduct routine (or even non-routine) maintenance "online" while actively serving traffic.

The reliability figure instead normally relates to the Service Level Agreement (SLA): the guaranteed level of availability below which you are entitled to compensation for downtime. Again, the specifics are in the terms; I haven't checked the O365 terms for a while, but if memory serves, credits were awarded based on the level of SLA breach, with a full subscription fee reimbursement for any given month in which the SLA is substantially breached. This includes partial SLA breaches, i.e. the service is operational but aspects (e.g. inbound mail delivery) are temporarily unavailable.

The small print will typically further restrict liability to the subscription fee; if the service were down for a substantial period, vendors typically offer no recourse to sue them over the $100,000 job you failed to quote for because it happened to turn up while the email platform was down, or for loss of staff productivity. You just get back the monthly fee, a bitter taste in your mouth, and that's that.

The only 100% reliable system is the system which is perpetually switched off, and hence reliably unreliable. A 100% reliable, working system is simply not a realistic proposition, so one simply has to take steps to mitigate risk and recover quickly from failure. The track record of Office 365 has been good, in the high nines, and one assumes that dropping below this does not compute with the Microsoft board due to the loss of face in the competitive cloud market it would cause.

The risks you have to balance are largely independent of technical decisions; they are wider business decisions:

Do the provided SLA and compensation arrangement offer sufficient recourse for your organization, in the unlikely event you have to claim (and are successful)?
Is there a substantial risk that heads would roll if the unthinkable were to happen, mail were to fail for a day and the company were to lose money?
What are the data backup arrangements? Are you expected to make backups, or are offline backups to tape made and archived on your behalf?
What are the implications for vendor lock-in? Can you move to a competing solution (which may not even exist yet) by taking a platform-independent dump of your data, or does this involve a fee?

I should also point out that arguments along the lines of "Microsoft build the system and they are pros so it will be fine" gloss over important details, typically to persuade the reader to side with the "grass is greener in the cloud" viewpoint. In any event, this is not the guarantee backed up by the legalese, so the argument is moot.
In particular, while I agree Microsoft should know how to run a mail environment, their environment is substantially more complex than a small handful of servers in the company machine room. Failures in the cloud might be infrequent, but the increased complexity of the service means failures -- if they do occur -- are often more substantial. With such a large corporation, gaining access to people "in the know" during such circumstances could also be challenging. The famous 2-day Amazon AWS outage in 2011 is an instructive example.

I am not anti-Office 365 or anti-cloud services. I think they are great, when used appropriately. I am, however, anti the sensationalist reporting of cloud services, which is typically marketing hype rather than anything technically constructive. I encourage my customers -- and indeed, encourage you -- to consider the business interest in shifting to the cloud, rather than the technical benefit and to carefully consider the aspects of service delivery on which you cannot place a price. I also take extra special care not to jump simply because the pushy salesperson for <INSERT CLOUD SALES VENDOR HERE> needs to meet their end-of-month bonus.

--

tl;dr: for most organizations, O365 is more cost effective: it replaces on-site infrastructure with a consistent monthly payment and offloads the operational concerns of hardware, power, licensing, cooling, patching, securing, etc. to somebody else. The interface with that "somebody else" needs to be carefully defined; if dealing with a large corporation, or a partner who resells the products of a large corporation, their SLA, liability in the event of outage, and ability to provide personal customer service should be carefully considered beyond the purported monthly cost savings.
 
Ricky Martire (IT Design Architect) Commented:
Hi,

Yes, it is worth it; it hasn't gone down yet for us! Your internet connection is usually more likely to go down than Office 365. That is a bigger risk than the entire service going offline.
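
To put rough numbers on that point, here is a minimal sketch of how the availability of a chain of dependencies combines when every link must be up. It assumes failures are independent, and the 99.5% figure for the office internet link is purely hypothetical; substitute your own measurement.

```python
def combined_availability_pct(*component_pcts):
    """Availability of a chain in which every component must be up.

    Assumes the components fail independently of each other.
    """
    combined = 1.0
    for pct in component_pcts:
        combined *= pct / 100
    return combined * 100


service_pct = 99.9  # the Office 365 figure discussed in this thread
link_pct = 99.5     # hypothetical office internet link, not a measured value

effective_pct = combined_availability_pct(service_pct, link_pct)
print(f"effective availability: {effective_pct:.2f}%")
```

Under these assumed figures, what the users actually experience is worse than either number alone, and the weaker link (the internet connection) dominates the result.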
 
Amit (IT Architect) Commented:
First watch this MS video.
http://channel9.msdn.com/Series/Migrating-to-Exchange-Online/01

This should answer a lot of your queries, including whether or not your organization is a fit for Office 365. Also, note that Office 365 servers are under the vendor's control, with no user or admin intervention, so it is easy to achieve 99.9% uptime.
 
rgorman Commented:
In a given month you won't necessarily have 43 minutes of outage. Outages happen pretty infrequently, but when they do, I have seen the issue last for hours before we could reconnect to whichever service was affected. Typically it isn't all the services, though. Also, what we have seen is that if there is an email issue, only some of the users are affected, since mailboxes are distributed across many servers and only some servers are typically affected at a time.

For the most part it has been a pretty reliable service for all those I have set up on it, and for myself.
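
As a rough illustration of why one infrequent but long outage reads differently from a steady 43 minutes a month, the sketch below compares per-month and per-year availability for an invented outage log. The log values and the 30-day month are assumptions made for illustration only, and this is not how Microsoft measures its SLA.

```python
SLA_TARGET_PCT = 99.9
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month for simplicity

# Hypothetical log: minutes of downtime in each of 12 months (one 3-hour outage).
outage_minutes = [0, 0, 180, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def availability_pct(down_minutes, total_minutes):
    """Percentage of the period during which the service was up."""
    return 100 * (1 - down_minutes / total_minutes)


monthly_pcts = [availability_pct(m, MINUTES_PER_MONTH) for m in outage_minutes]
yearly_pct = availability_pct(sum(outage_minutes), 12 * MINUTES_PER_MONTH)

# Months (1-based) whose availability falls below the target.
months_missed = [i + 1 for i, p in enumerate(monthly_pcts) if p < SLA_TARGET_PCT]

print(f"months below {SLA_TARGET_PCT}%: {months_missed}")
print(f"availability over the year: {yearly_pct:.3f}%")
```

With this invented log, the single 3-hour outage pushes one month below 99.9%, even though the figure over the whole year stays above it.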
 
Rob Henson (Finance Analyst) Commented:
I suspect the numbers quoted are purely mathematical.

For the "three nines" line I suspect the 8.76 hours is the downtime in the last 12 months. If you divide that by 12, you get the 43.8 minutes, i.e. a purely mathematical monthly average.
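
The arithmetic can be checked with a short sketch. The function name and constants below are made up for illustration; the only inputs are the 99.9% figure and the number of hours in a period.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_minutes(availability_pct, period_hours):
    """Minutes of downtime allowed in a period at a given availability percentage."""
    return (1 - availability_pct / 100) * period_hours * 60


yearly = downtime_budget_minutes(99.9, HOURS_PER_YEAR)
monthly_avg = yearly / 12                                # one twelfth of a year
monthly_30d = downtime_budget_minutes(99.9, 30 * 24)     # a 30-day month

print(f"per year:            {yearly / 60:.2f} hours")     # 8.76 hours
print(f"per month (year/12): {monthly_avg:.1f} minutes")   # 43.8 minutes
print(f"per 30-day month:    {monthly_30d:.1f} minutes")   # 43.2 minutes
```

The 43.8-minute figure in the table matches the year-divided-by-12 basis; a 30-day month gives 43.2 minutes instead. Either way, it is an allowance derived from the percentage, not a measurement of actual outages.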
 
mike2401 (Author) Commented:
Great video @Amit
 
mike2401 (Author) Commented:
That's a really huge point I didn't think about @rgorman: if the mailboxes are distributed across many different servers, yes, it is likely that a problem might not affect ALL users.
 
mike2401 (Author) Commented:
Wow @tigermatt: Great points, especially about the nines relating to SLA and cash back. We don't want money back if the service is down; we simply want it not to be down :-)
 
mike2401 (Author) Commented:
Yes @Rob Henson: just math. It was not meant to suggest that every customer will have 43.8 minutes of downtime.
 
mike2401 (Author) Commented:
Thank you everyone, great comments!

-Mike