Solved

How do other admins monitor random RAID controllers?

Posted on 2011-02-15
Last Modified: 2012-05-11
I inherited one serious hodge-podge network with a dozen different brands, models and ages of servers. Many of these have built-in RAID controllers, but there is no central monitoring utility. Most don't even have email notification should a drive fail (kind of defeats the purpose, no?). I was wondering what other people do in this scenario. Is there some sort of central SMART monitoring software that could give me a clue? Is it common for a RAID controller to put anything in the SMART data to indicate a drive has failed? Ultimately I'd like to configure Opsview (Nagios/NSClient++-based monitoring software) to report on it, but for now I'd simply settle for the warm fuzzy of knowing an email would be sent to me should I lose a disk somewhere. Constructive thoughts? Comments?
Question by:sifugreg
6 Comments
 

Expert Comment

by:dlethe
RAID controllers all have vendor-unique APIs. Nothing on the planet monitors everything. Also, generic SMART software is pointless here, because SMART applies only to physical drives. Sure, some RAID vendors use it internally, but they don't present it to the operating system without their own special software.

It can't, by design: with a RAID 5, for example, the OS sees just one disk. The controller presents the logical volume, which reports as online even after a drive has failed.

Now there are some products out there that will drill inside a few 3rd party controllers ... so what do you have (and what O/S)?

Anyway, there are some SNMP packages, but they only kick in once you have a program that knows how to drill into the RAID and can then send off an SNMP alert.

What do people do in the real world? Well, it costs less and is ultimately better to junk as many controllers as possible and standardize on a family that has a good mix of support and device types. When you standardize, you pay more up front, but you save money in the long run: you know which qualified disks to buy, and you have no device-management issues because the vendor has tools that work across the board.

That is why people standardize on HP or Dell servers ... or on LSI controllers, which run on pretty much every operating system and range from RAID 1/10-only cards under $100 to controllers that handle hundreds of drives and cost thousands of dollars.


Expert Comment

by:dax_bad
We're monitoring our HP/Dell/IBM hardware with WhatsUp Gold Premium through SNMP. All you need are the MIB packages and the hardware vendor's own monitoring software installed. You can often extract the MIB packages from the vendor monitoring software, then use a MIB walker to identify the OIDs and use WhatsUp Gold (or any other monitoring tool that supports SNMP) to check the state of the hardware. Because we have a lot of different models with different RAID controllers, we found it was not possible to create a fully generic monitor template, but you have two options:

1) Non-generic: split the templates by model, with a template for each drive in the server, one for the battery and one for the controller state.
2) Semi-generic: look at the overall state of the RAID controller (a warning/failed state indicates a drive or battery error). This can be semi-generic; we found we could cover all 15 of our different HP models with around 4 templates (the instance still sometimes varies among the same models, which I guess depends on the firmware level).
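As a rough illustration of the semi-generic option, the check boils down to mapping one overall controller status value to a monitoring state. This is a hypothetical sketch, not Daniel's template: it assumes the value has already been fetched over SNMP (for example with a MIB walker or `snmpget`) and that it uses the common Compaq/HP condition enumeration other(1), ok(2), degraded(3), failed(4), which you should verify against your vendor's MIB. The function name `check_controller_condition` is made up for illustration; the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# HYPOTHETICAL sketch: map a controller "condition" value (already read via
# SNMP) to a Nagios-style (exit_code, message) result.
# ASSUMPTION: the Compaq/HP-style enumeration other(1), ok(2), degraded(3),
# failed(4). Check the actual values in your vendor's MIB.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

CONDITION_NAMES = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}


def check_controller_condition(condition: int) -> tuple[int, str]:
    """Return (exit_code, message) for one overall controller condition."""
    name = CONDITION_NAMES.get(condition)
    if name is None:
        return UNKNOWN, f"UNKNOWN - unexpected condition value {condition}"
    if name == "ok":
        return OK, "OK - controller condition is ok"
    if name == "degraded":
        # Warning state: typically a drive or battery problem behind it.
        return WARNING, "WARNING - controller condition is degraded"
    if name == "failed":
        return CRITICAL, "CRITICAL - controller condition is failed"
    # "other": the agent could not classify the state.
    return UNKNOWN, "UNKNOWN - controller reports condition 'other'"
```

A plugin wrapper would print the message and exit with the code. Because it looks only at the overall controller state, one check per controller can flag both drive and battery faults, at the cost of not saying which component failed.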

If you need more details, let me know. I can give you some OIDs for at least the HP models, since 95% of our server farm (1000+ servers) is HP.

Cheers
Daniel

Accepted Solution

by:andyalder
Most controller drivers write something to the OS log when there is a disk problem, so you can monitor that instead of monitoring the hardware.
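To make the log-monitoring idea concrete, here is a minimal sketch that picks disk-related problems out of log text. It assumes the log has already been exported as plain lines (reading the Windows System Event Log or syslog directly is platform-specific and not shown), and the regular expressions are hypothetical placeholders: real driver messages differ by vendor, so build the patterns from what your own controllers actually log.

```python
import re
from typing import Iterable

# HYPOTHETICAL patterns: replace with the wording your RAID drivers really use.
DISK_PROBLEM_PATTERNS = [
    re.compile(
        r"\b(physical )?(disk|drive)\b.*\b(fail(ed|ure)?|removed|offline)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\b(array|logical drive|volume)\b.*\bdegraded\b", re.IGNORECASE),
]


def find_disk_problems(log_lines: Iterable[str]) -> list[str]:
    """Return the log lines that match any of the disk-problem patterns."""
    return [
        line
        for line in log_lines
        if any(pattern.search(line) for pattern in DISK_PROBLEM_PATTERNS)
    ]
```

Keyword matching like this is only as good as the pattern list, which is the trade-off versus vendor-specific SNMP checks: it works across brands, but it silently misses anything the driver never logs.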

Expert Comment

by:dax_bad
Andyalder: RAID controllers don't write disk accelerator battery failures to the event log, but yes, disk failures usually are written, or at least logging can be enabled through the vendor monitoring software. You would still need a central monitoring tool to make any use of it in larger server farms.

Cheers
Daniel

Expert Comment

by:andyalder
LSI and HP ones do; I'm not sure about the others out there.

Author Closing Comment

by:sifugreg
Not very sexy, but until I can get them all replaced, I've created a custom filter in my monitoring process that alerts me to multiple warnings or any critical messages sent to the System Event Log. Don't know why I didn't think of that.
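The filter rule described above ("multiple warnings or any critical messages") plus the email the question asked for could be sketched roughly as follows. This is an illustration, not the poster's actual filter: it assumes events have already been read from the System Event Log into simple records (how you read them is platform-specific), the level names follow Event Viewer's labels, the default threshold of two warnings is an assumption, and `Event`, `should_alert`, `build_alert_email`, `send_alert` and `mail.example.com` are hypothetical names. You may want to add "Error" to `severe_levels`, since many drivers log disk failures at that level.

```python
import smtplib
from collections import Counter
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Event:
    level: str    # e.g. "Information", "Warning", "Error", "Critical"
    source: str   # driver or service that logged the event
    message: str


def should_alert(events: list[Event], warning_threshold: int = 2,
                 severe_levels: tuple[str, ...] = ("Critical",)) -> bool:
    """Alert on any severe event, or on warning_threshold or more warnings."""
    counts = Counter(event.level for event in events)
    if any(counts[level] for level in severe_levels):
        return True
    return counts["Warning"] >= warning_threshold


def build_alert_email(events: list[Event], sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text alert listing each event on its own line."""
    msg = EmailMessage()
    msg["Subject"] = f"Event log alert: {len(events)} event(s) need attention"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"[{e.level}] {e.source}: {e.message}" for e in events))
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "mail.example.com") -> None:
    # HYPOTHETICAL relay host: point this at your own SMTP server.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping the decision (`should_alert`) separate from delivery makes the rule easy to test and leaves room to swap email for an Opsview/Nagios passive check later.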