
How to monitor RAID hard disk health status on a Windows Server 2008 R2 server?

To plan the purchase of replacement hard disks, I want to monitor the health status of each individual hard disk in the RAID 5 array of an IBM x3650 M3 server.
However, I can't find any S.M.A.R.T. attribute information in the MegaRAID software.
I therefore tried installing third-party software (Acronis Drive Monitor, GSmartControl), but both failed to collect any S.M.A.R.T. data.

Do I need to enable the S.M.A.R.T. feature somewhere first?
Could someone tell me how to do this?
Thank you.

Microsoft Server 2008 R2
2 Solutions
No, it isn't that. The MegaRAID API is quite brutal to code against, and obtaining it requires a developer's NDA with LSI. Furthermore, the difficulty is compounded by the need to install some additional drivers.

Bottom line: writing S.M.A.R.T. code for this controller family is a big job, and few vendors are going to make the effort. Our company does some one-off products for the MegaRAID, but it isn't something we offer to end users, it doesn't cover just any MegaRAID controller and drive, and it isn't cheap.

The IBM Tivoli product set will do this.
Why do you want to monitor the S.M.A.R.T. data manually? The MegaRAID controller monitors it for you, and MegaRAID Storage Manager (MSM) will report a disk that is out of spec if you set up alerts.
There are many benefits to monitoring these values manually. First and foremost, it lets you see when an HDD is degrading but hasn't yet triggered the S.M.A.R.T. alert. Consider a case where you have already had one HDD failure and one of the remaining disks is right on the threshold of triggering an alert. In that case, you should prioritize backing up over rebuilding.

Or what if you simply want to see how many cumulative hours of use each HDD has, or whether any registers are trending upwards, so that you get a warning ahead of the predictive-failure alert itself?
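To illustrate the trending idea only, here is a minimal Python sketch. It assumes you have already obtained periodic readings of one raw S.M.A.R.T. register per disk by some other means (it does not talk to the MegaRAID controller at all). The slot names, sample values, and the `min_slope` threshold are made up for the example.

```python
from typing import Dict, List, Tuple

# One sample = (power-on hours, raw register value). Hypothetical data shape.
Sample = Tuple[float, float]


def slope(samples: List[Sample]) -> float:
    """Least-squares slope of raw value per power-on hour (0.0 if undetermined)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        return 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov / var_x


def trending_disks(history: Dict[str, List[Sample]], min_slope: float) -> List[str]:
    """Return disks whose register grows faster than min_slope per hour."""
    return sorted(disk for disk, s in history.items() if slope(s) > min_slope)


# Made-up readings for two drive slots (not real measurements).
history = {
    "slot0": [(1000, 5), (1100, 5), (1200, 6), (1300, 6)],
    "slot1": [(1000, 5), (1100, 40), (1200, 90), (1300, 150)],
}

print(trending_disks(history, min_slope=0.1))
```

With these invented numbers, only the second slot is flagged, even though neither disk would necessarily have crossed a vendor alert threshold yet. A sensible threshold depends on the register and the drive model, so treat `min_slope` as a placeholder.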

We once had a customer who moved their servers to the top of some cheap, wobbly racks, and performance dropped to a small fraction of what it had been. Nothing showed up in the event logs and everything passed full diagnostics, yet one system was running at maybe 25% of its performance before the move. By looking at the S.M.A.R.T. registers, we were able to determine the root cause. Again, the drives were not triggering alerts, because there were no errors, just a high number of retries.

As with anything else, reading and understanding S.M.A.R.T. data lets you make informed decisions before a device triggers an alert. Asking why you would want pre-failure data is like asking why you would want informational messages in the event logs when the system will warn you anyway if there is something to worry about.
Since none of the server manufacturers offer this facility, there is no real option but to rely on what they do provide.
Question has a verified solution.
