Availability auditing

pma111
I have been searching for some time for a checklist covering the availability of file servers. When an auditor wants to check for security issues, there are tools and checklists galore available to them, but for an availability/performance audit there is nothing really out there. If there were, say, a top 10 list of checks for the availability/performance (or continuity options if the server fails or struggles) of a standard 2003/2008 file server, are there any tools or auditors' checklists you could point us to? I'd also love to know the top 5 reasons why such servers' performance degrades or why they go down, as that may tie in with what to check.

Slightly off topic, but in terms of a server audit, aside from security and performance/availability, what other areas should, in your opinion, be included in a review?


Author

Commented:
Thanks, but that's more about the audit process than the technology/technical specifics of how to audit the performance/availability of a 2003 server.

Author

Commented:
I can't see what security logging has got to do with availability controls.

Auditing and an audit/review aren't the same thing.
I may be misunderstanding what you're asking, but at any rate, here are my thoughts.

Typically, an SLA will describe the services provided, along with who is responsible for supporting and maintaining each of them.  It will include quantifiable goals for assessing performance and resolving issues, any incentives/penalties, and costs.  From our point of view, we see this as the level of systems/servers, applications, and data availability, plus the recovery point objective (RPO) and recovery time objective (RTO).  These are mostly relative to each customer/company and their goals and business requirements.  This should in turn provide a way to objectively measure IT resources and their value.

Things we should consider:
Percentage of failures to deliver per month/day/hour, depending on the SLA requirements.
Quantifiable way to measure internal standards compliance.  Did we meet our availability goals or not?  Why or why not?  Was it because we didn’t meet compliance, or does ‘compliance’ need to be adjusted?
Availability measures.  What are our hours of operations, or between what hours do we promise xx% uptime?
Satisfaction.  Are we satisfied?  Can we do better?  How?
Responsiveness to an outage.  Did we meet the required SLA?
Cost.  Does the business need justify the cost of efforts to meet this SLA?
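To make the "did we meet our availability goals or not" question concrete, here is a minimal sketch of the arithmetic in Python. The function names and the example figures (a 30-day month of 24x7 service, a 99.9% target, two outages) are assumptions for illustration only, not anything from a particular SLA or tool.

```python
def allowed_downtime_minutes(service_minutes: float, target_percent: float) -> float:
    """Downtime budget implied by an uptime target over the promised service window."""
    return service_minutes * (1 - target_percent / 100.0)


def availability_percent(service_minutes: float, outage_minutes: list[float]) -> float:
    """Measured availability: share of the promised service window with no outage."""
    if service_minutes <= 0:
        raise ValueError("service window must be positive")
    # Downtime cannot exceed the window it is measured against.
    downtime = min(sum(outage_minutes), service_minutes)
    return 100.0 * (service_minutes - downtime) / service_minutes


def meets_sla(measured_percent: float, target_percent: float) -> bool:
    """True when measured availability is at or above the agreed target."""
    return measured_percent >= target_percent


if __name__ == "__main__":
    window = 30 * 24 * 60      # hypothetical: 30 days of 24x7 service, in minutes
    target = 99.9              # hypothetical SLA target
    outages = [20.0, 30.0]     # hypothetical outages inside service hours, in minutes

    measured = availability_percent(window, outages)
    print(f"budget:   {allowed_downtime_minutes(window, target):.1f} min")
    print(f"measured: {measured:.3f}%")
    print(f"SLA met:  {meets_sla(measured, target)}")
```

Only outages that fall inside the promised hours of operation should be counted, which is why the service window is an explicit input rather than calendar time.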

From here, the metrics used to determine availability in your environment will depend upon internal SLAs, costs, needs, available team skillsets, etc.  We use various tools in our environment, from PRTG to scripts, to determine uptime.  Perfmon is a great tool for gathering baseline performance metrics.  We also track errors in the event log, and we use an internal ticketing system to keep track of slowdowns and other availability/performance issues.
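As a rough idea of what an uptime script can look like, here is a minimal Python sketch, not the scripts we actually run. It assumes the file server exposes SMB on TCP port 445 and treats a successful TCP connect as "up"; the host name `fileserver01.example.local` and the sampling interval are hypothetical. A successful connect only shows the port is listening, not that shares are usable, so treat it as a coarse signal alongside Perfmon baselines and event log review.

```python
import socket
import time


def tcp_reachable(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def uptime_percent(results: list[bool]) -> float:
    """Percentage of probes that succeeded."""
    if not results:
        raise ValueError("no probe results recorded")
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    host = "fileserver01.example.local"   # hypothetical server name
    samples = []
    for _ in range(5):                    # hypothetical: 5 probes, 60 s apart
        samples.append(tcp_reachable(host))
        time.sleep(60)
    print(f"{host}: {uptime_percent(samples):.1f}% of probes succeeded")
```

In practice the probe results would be written somewhere durable with timestamps, so outages can be lined up against tickets and the SLA's hours of operation.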

As for a specific checklist, it really comes down to what your company requires and what you deem appropriate.  As previously stated, security auditing is really important.  Certain logs/events bear on the CIA triad (confidentiality, integrity, availability), which directly affects SLAs, availability, and performance.
