How do other admins monitor random RAID controllers?

Posted on 2011-02-15
Last Modified: 2012-05-11
I inherited one serious hodge-podge network with a dozen different brands, models and ages of servers.  Many of these have built-in RAID controllers but there is no central monitoring utility.  Most don't even have email notification should a drive fail (kind of defeats the purpose, no?).  I was wondering what other people were doing in this scenario.  Is there some sort of central SMART monitoring software that could give me a clue?  Is it common for a RAID controller to put anything in the SMART data to indicate a drive had failed?  Ultimately I'd like to configure Ops View (nagios NSClient++ based monitoring software) to be able to report on it, but for now I'd simply settle for the warm fuzzy of knowing an email would be sent to me should I lose a disk somewhere.  Constructive Thoughts?  Comments?
Question by:sifugreg
LVL 47

Expert Comment

ID: 34903486
RAID controllers all have vendor-unique APIs.  Nothing on the planet monitors everything.  Also SMART software is pointless, as that is only good for physical devices.  Sure some RAID vendors use it, but they don't present it to the operating system w/o special software.

Because it can't. By design, the O/S sees just one disk. For example, if you have a RAID5, the controller presents the logical volume, which shows as online even if you have a drive failure.

Now there are some products out there that will drill inside a few 3rd party controllers ... so what do you have (and what O/S)?

Anyway, there are some SNMP packages, but they only kick in once you have a program that knows how to drill into the RAID and can then send off an SNMP alert.

What do people do in the real world?  Well, it costs less money and is ultimately better to junk as many controllers as possible, and standardize on a family that has a good mix of support and device types.   When you standardize, you pay more up front, but you save money in the long run because you know to buy disks that are qualified, and you have no issues with device management because the vendor has something that works across the board.

That is why people standardize on things like HP or Dell servers ... or they standardize on LSI controllers, which pretty much run on all operating systems and come in a wide range, from RAID1/10-only models under $100 to controllers that can handle hundreds of drives and cost thousands of dollars.


Expert Comment

ID: 34904330
We're monitoring our HP/Dell/IBM hardware with WhatsUp Gold Premium through SNMP. All you need are the MIB packages and the hardware vendor's own monitoring software installed.  You can often extract the MIB packages from the vendor monitoring software, then use a MIB walker to identify the OIDs and use WhatsUp Gold (or any other monitoring tool that supports SNMP) to check the state of the hardware. Because you sometimes have a lot of different models with different RAID controllers, we found that it is not possible to create a fully generic monitor template, but you have 2 options.

1) Non-generic - Split the templates by model, with one template for each drive in the server, one for the battery and one for the controller state.
2) Semi-generic - Look at the overall state of the RAID controller (a warning/failed state indicates a drive or battery error). This can be semi-generic, and we found we could cover all 15 of our HP models with around 4 templates (the instance still sometimes varies within the same model; I guess it depends on the firmware level).

If you need some more details let me know. I can give you some OIDs for at least the HP models, as 95% of our server farm is HP (1000+).
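For anyone who wants to try the semi-generic approach without WhatsUp Gold, here is a rough Python sketch of the idea: poll one "overall condition" OID per server and treat anything other than OK as a drive, battery or controller problem. It assumes the net-snmp `snmpget` command is installed. The OID and the number-to-state mapping below are placeholders I made up for illustration, not values from this thread or from any vendor MIB, so replace them with what your MIB walker shows.

```python
import subprocess

# Hypothetical placeholder, NOT a real vendor OID. Substitute the
# overall-condition OID you find in your vendor's MIB with a MIB walker.
CONTROLLER_CONDITION_OID = "1.3.6.1.4.1.99999.1.1"

# Assumed value-to-state mapping; check the vendor MIB for the real one.
CONDITION_STATES = {1: "unknown", 2: "ok", 3: "warning", 4: "failed"}


def classify_condition(raw_value: str) -> str:
    """Map the raw SNMP value (returned as text) to a state name."""
    try:
        code = int(raw_value.strip())
    except ValueError:
        return "unknown"
    return CONDITION_STATES.get(code, "unknown")


def needs_attention(state: str) -> bool:
    """Anything other than 'ok' should raise an alert in the monitoring tool."""
    return state != "ok"


def poll_controller(host: str, community: str = "public") -> str:
    """Query one host with net-snmp's snmpget and return the state name."""
    result = subprocess.run(
        # -Oqve: print only the value, with enums shown as numbers
        ["snmpget", "-v2c", "-c", community, "-Oqve", host,
         CONTROLLER_CONDITION_OID],
        capture_output=True, text=True, timeout=10, check=False,
    )
    if result.returncode != 0:
        return "unknown"
    return classify_condition(result.stdout)
```

Keeping the mapping in one dictionary is what makes this "semi-generic": one such table (and OID) per controller family plays the role of one template, rather than one check per drive.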

LVL 55

Accepted Solution

andyalder earned 125 total points
ID: 34905085
Most controllers' drivers will write something to the OS log if there is a disk problem, so you can monitor that instead of monitoring the hardware.

Expert Comment

ID: 34905106
Andyalder: RAID controllers don't write disk accelerator battery failures to the event log, but yes, disk failures usually are written, or at least the controller can be set to do this through the vendor monitoring software. You would, however, still need a central monitoring tool to make any use of it in larger server farms.

LVL 55

Expert Comment

ID: 34905223
LSI and HP ones do, not sure about the other ones out there.

Author Closing Comment

ID: 34943949
Not very sexy but until I can get them all replaced, I've created a custom filter in my monitoring process to alert me of multiple warnings or any critical messages sent to the System Event Log.  Don't know why I didn't think about that.
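
A minimal Python sketch of a filter along these lines, in case it helps someone else: it flags a host on any critical event or on repeated warnings, then builds an email about it. This is not the actual filter from my monitoring software. The `(host, level, message)` event format, the threshold of 2 for "multiple" and the function names are all assumptions for illustration, and how you collect the events from each server's System Event Log is left out.

```python
from collections import Counter
from email.message import EmailMessage

# Assumed meaning of "multiple warnings"; tune to taste.
WARNING_THRESHOLD = 2


def hosts_to_alert(events):
    """Return sorted hosts with any critical event or repeated warnings.

    `events` is an iterable of (host, level, message) tuples, a
    hypothetical format for records already pulled from the event log.
    """
    warnings = Counter()
    flagged = set()
    for host, level, _message in events:
        level = level.lower()
        if level == "critical":
            flagged.add(host)
        elif level == "warning":
            warnings[host] += 1
            if warnings[host] >= WARNING_THRESHOLD:
                flagged.add(host)
    return sorted(flagged)


def build_alert(hosts, sender, recipient):
    """Build (but do not send) an email naming the flagged hosts."""
    msg = EmailMessage()
    msg["Subject"] = "Event log alert: " + ", ".join(hosts)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Check the System Event Log on: " + ", ".join(hosts))
    return msg
```

The message can then be handed to `smtplib.SMTP(...).send_message(msg)` against whatever mail relay you have. A single warning is deliberately ignored so one-off noise does not page anyone.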


Question has a verified solution.
