Pau Lo

asked on

Service availability management basics

Three questions:

1) If you were looking independently at an IT department's adherence to good-practice service "availability management" (in this question the services are file & print services and database services such as SQL Server and Oracle RDBMS), what specifically should you be looking for at ground level? What metrics indicate good availability management, and what evidence indicates it? What exactly should admins be doing for good availability management?

2) Second part: what does poor availability management look like? What sort of lack of procedures leads to poor availability management, i.e. can you provide some practical examples? What metrics indicate bad availability management?

3) Can you give a non-IT guide to what you are technically doing differently between availability management and performance management?
Keith Alabaster

This is a homework question, and I won't do those. You'll just get a redirect to something like Microsoft's MOF and SMF, which you can read for yourself.

If you'd like to rephrase it in a way that expresses the issue you have to address, then I will.
Pau Lo

ASKER

Your suspicions are wrong.
Pau Lo

ASKER

You may as well delete this question. You've wrongly labelled it a homework question (again). It is actually a risk team asking IT staff responsible for service delivery for some input based on their experience in that field. It is always useful to hear tales from people in the field, but seemingly that's not permitted anymore. There are no points to be awarded for the above post, and that post will probably stop any other responses, so all in all the question may as well be deleted.
ASKER CERTIFIED SOLUTION
Keith Alabaster

Pau Lo

ASKER

'How do I judge the quality or effectiveness of Availability Management?' is probably much better wording, on reflection.
Firstly, you will need to define several things. Personally, I put these into my Service Catalogue, which is included in our Enterprise Architecture documentation set and made available to our users.

1) 'What' is it that you are making available?
2) 'What' is the acceptable definition of available - from both your perspective and your users?
3) 'What' (if any) are the performance metrics that are tied to 'Available'?
4) 'What' exceptions/planned downtime/maintenance windows are allowed in a calendar month?

To better describe what I mean, I'll give an example for each, using a couple of variations. This MAY seem like waffle, but unless it is written down and communicated, everyone will have a different interpretation, especially when something becomes 'not available'.

1) This could be a single internal application such as a CRM, a complete end-to-end web service such as Office 365 which uses multiple elements including Internet, firewalls plus a hosted solution, or it could be a hybrid of all of them.

2) 'Available' can have different connotations, depending upon where you sit. A server can be up and running perfectly, with 100% up-time, but if the network connections have failed it is not available to users. Similarly, if one component of a service has failed but all the other components are running as expected (or advertised), does this meet or fail expectations? Take Microsoft Exchange and all of its constituent parts: Exchange is running perfectly but your ISP goes down temporarily, so remote access to Outlook Web Access is inoperative. Would you (or your users) say the Exchange service is now not available, or just a constituent component?

Pedantic? Maybe, but it can make a real difference financially if it is not set out in expectations.

3) Performance - we have all been caught out here when no definition is made. The 1Gb Internet connection fails and you fail over to the 5Mb ADSL broadband connection. Access to web sites now takes 30 seconds. The system works, so it is available, but users say the speed makes it unworkable and therefore not available.

4) Self-evident and needs no real explanation, except to confirm the obvious: any downtime declared in advance can be taken off the Availability schedule without penalty.
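
Points 3 and 4 together determine how an availability figure would be worked out. The following is a minimal sketch only, not a calculation taken from the answer above: the function name and the example figures are hypothetical. It assumes declared maintenance is removed from the agreed service time, and that any time the service ran below its agreed performance level would be counted as unplanned downtime.

```python
def availability_percent(period_minutes, planned_minutes, unplanned_minutes):
    """Percentage of agreed service time during which the service was usable.

    Hypothetical helper: planned (declared) downtime is excluded from the
    agreed service time, so it carries no penalty.
    """
    agreed = period_minutes - planned_minutes
    if agreed <= 0:
        raise ValueError("no agreed service time in this period")
    return 100.0 * (agreed - unplanned_minutes) / agreed


# Illustrative figures: a 30-day month, 4 hours of declared maintenance,
# and 90 minutes of unplanned outage or unusably slow service.
month_minutes = 30 * 24 * 60
print(round(availability_percent(month_minutes, 240, 90), 3))
```

Whether slow-but-working time belongs in `unplanned_minutes` is exactly the kind of decision that has to be written down under point 3.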

We have several hundred entries in our Service Catalogue with each covering off the four points above. Actually we have a few additional entries which also include our own criteria (based upon the four at the beginning) which then tell us how much forward capacity (storage, network & Internet bandwidth, memory, CPU load, spare virtualisation hosts) we need to keep for each service we offer. In turn, this allows us to work out what the cost is for each service also.
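
To make the shape of a catalogue entry concrete, here is a rough sketch of one record. The class name, field names and example values are all assumptions made for illustration; the real Service Catalogue described above is a documentation set, not code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceEntry:
    """One hypothetical Service Catalogue record covering the four points."""
    name: str                              # 1) what is being made available
    available_means: str                   # 2) agreed definition of 'available'
    max_response_seconds: Optional[float]  # 3) performance tied to 'available', if any
    planned_downtime_hours: float          # 4) allowed maintenance per calendar month
    # Additional criteria: forward capacity to keep in reserve, as a fraction
    headroom: Dict[str, float] = field(default_factory=dict)


# Illustrative entry only; the values are invented.
crm = ServiceEntry(
    name="Internal CRM",
    available_means="Users on the office network can log in and open a record",
    max_response_seconds=5.0,
    planned_downtime_hours=4.0,
    headroom={"storage": 0.25, "cpu": 0.30},
)
print(crm.name, crm.headroom["storage"])
```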

We use system tools such as SCOM, SolarWinds and vCentral, configured with those parameters, to trigger alerts when they are breached.
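
The underlying idea is a comparison of current readings against the agreed limits. The sketch below shows only that logic in plain Python; it does not use or represent the API of any of the monitoring products named above, and the metric names and limits are hypothetical.

```python
def breached(readings, limits):
    """Return the names of metrics whose reading exceeds its agreed limit.

    Hypothetical helper: `readings` and `limits` are plain dictionaries
    keyed by metric name. Metrics with no agreed limit are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in limits and value > limits[name]
    )


# Illustrative limits and readings, in percent utilisation.
limits = {"cpu": 85.0, "disk": 80.0, "memory": 90.0}
readings = {"cpu": 91.0, "disk": 40.0, "memory": 95.5, "fan_rpm": 3000}
for metric in breached(readings, limits):
    print("ALERT:", metric, "is over its agreed limit")
```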

The big one was resilience, and we arrived at three tiers:

a) Must be available at all times
b) Up within four hours
c) Whatever.........
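
As a small illustration of how the three tiers could be checked against an actual outage, here is a sketch under stated assumptions: the names are hypothetical, tier (a) is read as tolerating no outage at all, and tier (c) as having no restore target.

```python
from enum import Enum


class Tier(Enum):
    ALWAYS = "a"        # must be available at all times
    FOUR_HOURS = "b"    # up within four hours
    BEST_EFFORT = "c"   # no agreed restore target


# Assumed restore targets in hours; None means no target was agreed.
RESTORE_TARGET_HOURS = {
    Tier.ALWAYS: 0.0,
    Tier.FOUR_HOURS: 4.0,
    Tier.BEST_EFFORT: None,
}


def met_target(tier, outage_hours):
    """True if an outage of the given length stayed within the tier's target."""
    target = RESTORE_TARGET_HOURS[tier]
    return target is None or outage_hours <= target


print(met_target(Tier.FOUR_HOURS, 2.5))
print(met_target(Tier.ALWAYS, 0.5))
```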

For each service, this allowed us to talk to the business and demonstrate the cost of each service based upon THEIR view of availability, and a compromise was struck in each case. Bottom line: we agreed with the business users what 'Availability' actually meant, based upon business value, and what each category cost.

We judge the effectiveness of Availability Management on two counts: we do not over-provision costly IT services, equipment and resources for business services that do not warrant them, and the services that do warrant them are buffered well enough to cover continuity until we can react to an issue.
Thanks :)