How do other admins monitor random RAID controllers?

Posted on 2011-02-15
Medium Priority
Last Modified: 2012-05-11
I inherited one serious hodge-podge network with a dozen different brands, models and ages of servers.  Many of these have built-in RAID controllers but there is no central monitoring utility.  Most don't even have email notification should a drive fail (kind of defeats the purpose, no?).  I was wondering what other people were doing in this scenario.  Is there some sort of central SMART monitoring software that could give me a clue?  Is it common for a RAID controller to put anything in the SMART data to indicate a drive has failed?  Ultimately I'd like to configure Ops View (Nagios NSClient++ based monitoring software) to be able to report on it, but for now I'd simply settle for the warm fuzzy of knowing an email would be sent to me should I lose a disk somewhere.  Constructive thoughts?  Comments?
Question by:sifugreg
LVL 47

Expert Comment

ID: 34903486
RAID controllers all have vendor-unique APIs.  Nothing on the planet monitors everything.  Also SMART software is pointless, as that is only good for physical devices.  Sure some RAID vendors use it, but they don't present it to the operating system w/o special software.

Because it can't ... by design, the O/S sees just one disk, for example if you have a RAID5. The controller presents the logical volume, which shows as online even if you have a drive failure.

Now there are some products out there that will drill inside a few 3rd party controllers ... so what do you have (and what O/S)?

Anyway, there are some SNMP packages, but they only kick in once you have a program that knows how to drill into the RAID and can then send off an SNMP alert.

What do people do in the real world?  Well, it costs less money and is ultimately better to junk as many controllers as possible, and standardize on a family that has a good mix of support and device types.   When you standardize, you pay more up front, but you save money in the long run because you know to buy disks that are qualified, and you have no issues with device management because the vendor has something that works across the board.

That is why people standardize on things like HP or Dell servers ... or they standardize on LSI controllers, which pretty much run on all operating systems and cover a wide range, from RAID1/10-only controllers under $100 to controllers that can handle hundreds of drives and cost thousands of dollars.


Expert Comment

ID: 34904330
We're monitoring our HP/Dell/IBM hardware with WhatsUp Gold Premium through SNMP. All you need are the MIB packages and the HW vendor's own monitoring software installed.  You can often extract the MIB packages from the vendor monitoring software, then use a MIB walker to identify the OIDs and use WhatsUp Gold (or any other monitoring tool supporting SNMP monitoring) to check the state of the hardware. Because you often have a lot of different models with different RAID controllers, we found that it is not possible to create a fully generic monitor template, but you have 2 options:

1) Non-generic - Split the templates into model-specific ones, with a template for each drive in the server, one for the battery and one for the controller state.
2) Semi-generic - Look at the overall state of the RAID controller (a warning/failed state indicates a drive or battery error). This can be semi-generic, and we found we could cover all our 15 different HP models with around 4 templates (the instance still sometimes varies within the same model; I guess it depends on the firmware level).

If you need some more details let me know, I can give you some OIDs for at least the HP models, as 95% of our server farm is HP (1000+).
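
For anyone wanting to prototype the semi-generic option before building templates, here is a minimal Python sketch of the idea: reduce each server to the worst overall controller condition. It assumes the values have already been polled over SNMP by whatever tool you use, and the numeric codes in `CONDITION_SEVERITY` are an assumed HP-style enumeration, not taken from this thread, so check them against your vendor's MIB. The function names and host names are hypothetical.

```python
# Assumed condition codes (HP-style enumeration); verify against the vendor MIB.
CONDITION_SEVERITY = {1: "unknown", 2: "ok", 3: "warning", 4: "critical"}

# Severities ordered from least to most urgent.
SEVERITY_ORDER = ["ok", "unknown", "warning", "critical"]


def classify_controller(condition):
    """Map one polled controller condition value to a severity label."""
    return CONDITION_SEVERITY.get(condition, "unknown")


def summarize(polled):
    """Return the worst severity per host.

    polled: dict of hostname -> list of condition values, one per controller,
    as returned by your SNMP poller (the polling itself is not shown here).
    """
    result = {}
    for host, conditions in polled.items():
        severities = [classify_controller(c) for c in conditions]
        # A host with no readings is reported as "unknown" rather than "ok".
        result[host] = max(severities, key=SEVERITY_ORDER.index, default="unknown")
    return result


if __name__ == "__main__":
    sample = {"srv-a": [2, 2], "srv-b": [2, 3], "srv-c": [4, 3], "srv-d": []}
    for host, severity in summarize(sample).items():
        print(host, severity)
```

Treating an unrecognized code as "unknown" instead of "ok" is a deliberate choice here, since the instance and values can differ between models and firmware levels.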

LVL 56

Accepted Solution

andyalder earned 500 total points
ID: 34905085
Most controller drivers will write something to the OS log if there is a disk problem, so you can monitor that instead of monitoring the hardware.


Expert Comment

ID: 34905106
Andyalder: RAID controllers don't write disk accelerator battery failures to the event log, but yes, disk failures usually are written, or at least can be set to be written, through the vendor monitoring software. You would however still need a central monitoring tool to make any use of it in larger server farms.

LVL 56

Expert Comment

ID: 34905223
LSI and HP ones do, not sure about the other ones out there.

Author Closing Comment

ID: 34943949
Not very sexy but until I can get them all replaced, I've created a custom filter in my monitoring process to alert me of multiple warnings or any critical messages sent to the System Event Log.  Don't know why I didn't think about that.
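
The filter rule described above (alert on multiple warnings or on any critical message) can be sketched roughly as follows in Python. This is only an illustration of the logic, not the actual filter used: the `(host, level)` event tuples, the level names, and the default threshold of two warnings are all assumptions, and collecting the events from the System Event Log is left to the monitoring tool.

```python
from collections import Counter


def hosts_to_alert(events, warning_threshold=2):
    """Return a dict of host -> reason for hosts that should raise an alert.

    events: iterable of (host, level) tuples already collected from the
    system event logs (hypothetical format; collection is not shown).
    """
    warnings = Counter()
    alerts = {}
    for host, level in events:
        level = level.lower()
        if level == "critical":
            # Any single critical message is enough to alert.
            alerts[host] = "critical event"
        elif level == "warning":
            warnings[host] += 1
    for host, count in warnings.items():
        # Multiple warnings alert too, unless the host is already critical.
        if count >= warning_threshold and host not in alerts:
            alerts[host] = f"{count} warnings"
    return alerts


if __name__ == "__main__":
    sample = [
        ("srv1", "Warning"),
        ("srv1", "warning"),
        ("srv2", "Critical"),
        ("srv3", "warning"),
        ("srv3", "info"),
    ]
    print(hosts_to_alert(sample))
```

A single warning is ignored on purpose, which keeps one-off noise from generating email while still catching a controller that keeps complaining.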


Question has a verified solution.
