Solved

Service availability percentage?

Posted on 2014-11-28
Last Modified: 2014-12-19
What are the standard values for service or system availability that can realistically be defined?

I want to express all my IT services in terms of a percentage, for example, AD service at 99% uptime. Is there an industry standard for the values to define? Saying 100% or 60% will not work in reality. Please advise me on how to define my availability based on a standard.

Please help me justify the above.
Question by:cur
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 215 total points
ID: 40470980
You've entered into the realm of Service Level Agreements (SLA) and definitions of System Availability, and there is a fair amount you could read on the topic.

Yes, you can absolutely define your IT services in terms of a percentage, but be crystal clear about what you are measuring.  You would normally define a service ("Directory Services", "Name Resolution", or "Print Services", for example), then define what constitutes available: for example, user logins completing in under five seconds, or print spoolers accepting output within ten seconds.  Any time your systems cannot meet the minimum service requirements, they are considered unavailable.  Add up all the time they are unavailable and divide that by the period of time you have agreed to be available; that gives the unavailable fraction, and the remainder is your availability percentage.
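
The calculation described above can be sketched as follows. This is a minimal illustration, not a standard formula from any SLA framework; the function name, the 30-day agreed period, and the outage durations are all hypothetical.

```python
def availability_percent(downtime_seconds: float, agreed_seconds: float) -> float:
    """Share of the agreed service period during which the service met its
    definition of 'available', expressed as a percentage."""
    if agreed_seconds <= 0:
        raise ValueError("agreed_seconds must be positive")
    if not 0 <= downtime_seconds <= agreed_seconds:
        raise ValueError("downtime must lie between 0 and the agreed period")
    return 100.0 * (1.0 - downtime_seconds / agreed_seconds)


# Hypothetical month: the service is agreed to be available for 30 full days,
# and three periods failed the agreed definition of 'available'.
outages_seconds = [420, 95, 1800]
agreed_period = 30 * 24 * 3600

monthly_availability = availability_percent(sum(outages_seconds), agreed_period)
print(f"{monthly_availability:.3f}%")
```

Only time that falls inside the agreed period and fails the agreed definition counts as downtime, which is why both definitions have to be settled first.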

I've seen some worse ways of defining IT services as a percentage.  In one system, which was actually used at a former employer of mine, they added up the total downtime of each server during the week, and divided by the number of seconds in a week.  This would give a percentage.  (When I failed to get across to the powers that be why this wasn't a useful number, I turned off a test server and left it off for a few weeks.  My 'IT Service number' was negative from then on, but everything meaningful was available.)  The first measure I gave above provides an incentive to build fault-tolerant systems.  The second measure had a tendency to provide a disincentive for failover and load-balanced clusters.
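
To make the contrast concrete, here is a small sketch of the two measures side by side. It assumes the per-server score was computed as 100% minus the summed downtime fraction, which the post implies but does not state; the server names and durations are hypothetical.

```python
from typing import Mapping

WEEK_SECONDS = 7 * 24 * 3600


def summed_server_score(server_downtime: Mapping[str, int]) -> float:
    """Per-server measure: every second any server is off counts against the
    score, whether or not a user-facing service was affected."""
    return 100.0 * (1.0 - sum(server_downtime.values()) / WEEK_SECONDS)


def service_availability(service_outage_seconds: int) -> float:
    """Service measure: only time the service itself failed its agreed
    definition of 'available' counts."""
    return 100.0 * (1.0 - service_outage_seconds / WEEK_SECONDS)


# Hypothetical week: a test server is switched off the whole time, and one
# cluster node is patched for an hour while its partner carries the load.
downtime = {
    "test-server": WEEK_SECONDS,
    "cluster-node-a": 3600,
    "cluster-node-b": 0,
}

print(summed_server_score(downtime))  # below zero
print(service_availability(0))        # the service never went down
```

Under the per-server measure, adding a redundant node can only add more downtime to the sum, which is the disincentive for clusters mentioned above.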

To get more information about how this should be measured, and the meanings of the measure, I'd suggest digging into Service Level Agreements (SLA), especially as they pertain to ITIL.  I think the best advice I could possibly give you on how to define availability and how to define the percentages is to come to an agreement with your customers.  They may be (and frequently are) internal customers, and it may take a few iterations to find out what is important to the customers... but carefully define both (1) what constitutes the IT Service, and (2) what constitutes available, from the orientation of the customer.
 
LVL 23

Assisted Solution

by:David
David earned 214 total points
ID: 40471078
Continuing in the above vein, your negotiations should distinguish between scheduled and unscheduled outages -- the latter being the "unavailable" time Razmus mentions. Another factor is whether or not the database is required in off hours.  It may be feasible and cost effective to move downtime to weekends if the business requirement is Monday to Friday.

It's one thing to do critical patching off-hours, but pretty abusive if you require your day-shift DBA(s) to routinely come on-site at 02.00 on a Sunday morning, just because the customer didn't want to cooperate.

As a business moves toward around-the-clock operation, the risk of an outage, and the cost to mitigate that outage, will both rise quickly.  Physical databases can be protected with real application clusters, and a site can be protected with a redundant hot database (DataGuard, Golden Gate, etc.) that's in a different geographical area.

"Industry standards" are meaningless, as the whole thing comes down to the risk tolerance of those who are approving the budget.  The age of your physical equipment, the versions of your software, and the reliability of your electrical system make each shop unique.  Perhaps you would benefit from thinking in terms of how much an outage would cost versus the cost of preventing one.
 
LVL 25

Assisted Solution

by:madunix
madunix earned 71 total points
ID: 40471340
These are the points you should look for in an SLA (between you and your service provider); a bad SLA misses some of them:
- classification of issues: what is considered critical priority and what is considered low priority. If you have an issue, you do not want to spend time on classification; this should already be clear. In the end, make sure that you, not the service provider, decide the priority in case of disagreement.
- response times: if you log an incident, how long before you get someone on the line who can help you? These response times are typically separated per priority.
- workaround times: how quickly does your service provider guarantee you will be up and running again? Note that a workaround means that some other minor functionality may no longer work.
- final solution times: how long before the problem is fully resolved?
- uptimes: can your service provider guarantee uptimes? Typically this is expressed as a percentage: 99%, 99.9%, all the way up to 99.999%, aka five nines. Five nines is typically used for telco-grade, fully redundant systems, allowing only about 5 minutes of downtime per year. What is offered here should also be reflected in your purchase agreement.
- penalty clauses: what happens if the service provider cannot meet the SLA? Do they offer a discount or give you money back?
- performance reporting: does your service provider give you monthly metrics on their service performance?
- escalation path: who can you call if you are not happy about the service? You may want to discuss multiple levels of escalation up to the highest official of the service provider's organization. Typically escalation levels are also agreed for when response, workaround and/or final solution times are exceeded by an agreed margin.
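
As a rough illustration of the uptime figures in the list above, the sketch below converts an availability percentage into the downtime it allows over a year. It assumes a 365-day year of round-the-clock service; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1.0 - availability_percent / 100.0)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours) per year")
```

Each extra nine cuts the budget by a factor of ten: 99% allows 5,256 minutes (about 3.65 days) a year, while five nines allows roughly 5.26 minutes.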

http://en.wikipedia.org/wiki/High_availability
http://www.techrepublic.com/article/build-your-sla-with-these-five-points-in-mind/
http://www.cisco.com/c/en/us/support/docs/availability/high-availability/15117-sla.html

 

Author Comment

by:cur
ID: 40475208
Thank you for your valuable information. What happens if the downtime is continuous? Is 1% of downtime taken in one continuous stretch normal, or do we need to define that too?

Because a whole year of working without any issue, followed by a failure in only the last 3 days of the year, would still create bad customer satisfaction.
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 215 total points
ID: 40475759
The definition of how much downtime is 'acceptable', and the repercussions of that downtime, are all things which need to be defined with your customer, and spelled out in a Service Level Agreement (SLA).  Normally getting from 99% to 99.9% to 99.99% uptime costs resources, and the customer/consumer of IT services is in the best position to determine how much the uptime is worth to them.
 
LVL 23

Assisted Solution

by:David
David earned 214 total points
ID: 40477113
Agreed, this is supposed to be a bilateral (two-way) agreement:  who gets what and when if terms are met, and if terms aren't met then.......  If it's not in the written document, it's not a requirement.

A customer who demands 99.999% uptime should pay (a lot) more than someone who is okay with weekend maintenance windows.  Again to the above points, how much risk is involved and how much extra cost is involved to mitigate that risk?  So it's not an "industry" thing, but rather a unique arrangement between a service provider and a consumer.

Two side comments:  it really, really helps for the service provider to practice and test their ability to deliver, prior to pricing things out.  It's great to have well-documented plans and standby systems, until you remember no one has bothered to read the fine manual.....
 

Author Comment

by:cur
ID: 40477789
My question is: if I say 40 hours of downtime, can it all be taken in one go? A single 40-hour outage would not be acceptable to the business. I think we need to define the downtime on a weekly or monthly basis, don't we? Otherwise the first 11 months could run without any issue and the last month could have all the downtime.
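
One way to picture this concern is to check outages against both a yearly budget and a per-month cap. The sketch below is only an illustration: the 40-hour annual budget comes from the question, while the 4-hour monthly cap, the function name, and the outage record are hypothetical.

```python
from collections import defaultdict


def check_downtime(outages, annual_budget_hours, monthly_cap_hours):
    """Check outages against a yearly budget and a per-month cap.

    outages: iterable of (month, hours) pairs, with month in 1..12.
    Returns (annual_ok, breached_months).
    """
    per_month = defaultdict(float)
    for month, hours in outages:
        per_month[month] += hours
    total_hours = sum(per_month.values())
    breached = sorted(m for m, h in per_month.items() if h > monthly_cap_hours)
    return total_hours <= annual_budget_hours, breached


# Eleven clean months, then one 40-hour outage in December.
outages = [(12, 40.0)]
annual_ok, breached_months = check_downtime(
    outages, annual_budget_hours=40.0, monthly_cap_hours=4.0
)
print(annual_ok, breached_months)
```

The yearly budget is met, yet December breaches the monthly cap, so a shorter measurement window catches exactly the case described in the question.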
 
LVL 23

Assisted Solution

by:David
David earned 214 total points
ID: 40478896
In my opinion, yes, if you want, but I don't feel that's realistic.  I may be missing your point, sorry if so. Also, it's very significant whether your "forty hours" is measured against 52 40-hour weeks or against 52 168-hour weeks.  How long does it take for you to recover your hardware, software, and customer data in a total catastrophe?  What if your systems, networks, and backup are kept in a building suddenly condemned due to fire on another floor?  How willing are you to "insure" events completely outside of your control?

As your customer, I don't particularly care whether you offer 40 hours down per year.  I care about your promised / contracted response to a service call within x number of minutes. I care about knowing the estimated recovery time, and updates to it.  I care about unscheduled outages daily because of a software bug.
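
The difference between the two baselines mentioned above is easy to work out. This short sketch uses the figures from the post (40 hours of downtime, 52 weeks of either 40 or 168 service hours); the function name is made up for the example.

```python
def availability_over(downtime_hours: float, service_hours: float) -> float:
    """Availability percentage for a given downtime within the agreed service hours."""
    return 100.0 * (1.0 - downtime_hours / service_hours)


business_hours_year = 52 * 40     # 2,080 agreed hours (40-hour weeks)
round_the_clock_year = 52 * 168   # 8,736 agreed hours (168-hour weeks)

print(f"{availability_over(40, business_hours_year):.2f}%")   # about 98.08%
print(f"{availability_over(40, round_the_clock_year):.2f}%")  # about 99.54%
```

The same 40 hours is close to 2% of a business-hours year but under 0.5% of a round-the-clock year, so the agreed service hours have to be written down along with the percentage.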

I care about my service provider assuring me that s/he will "own" the problem, and not point to the LAN people then shrug, "not my problem".  HTH.
 
LVL 30

Accepted Solution

by:Rich Weissler
Rich Weissler earned 215 total points
ID: 40478908
Absolutely valid concerns.  (And we used to joke about shutting down in December and just going home, because we'd already reached our target service level, and the service levels were defined without an eye on this.)  Bring this up, and determine what you and your customer consider reasonable and doable.  (Again, it's perfectly acceptable to have another department or group internal to a single company as your 'customer'.)