Solved

Disk failure notification on Dell RAID

Posted on 2011-02-19
Last Modified: 2012-05-11
I have a Dell PowerEdge 2800 with a 3-disk RAID 5. It recently failed on reboot and I discovered that 2 of the 3 disks had failed. I assume that one of the disks in the array had failed some time ago and that the array was running in degraded mode. The second drive probably failed on reboot (after a power outage) and so the machine failed to boot.

I check the logs fairly regularly (bi-monthly?) and don't recall ever seeing any type of disk failure warnings in the log. Dell OpenManage was installed but I am not that familiar with it. I had assumed that it would create hardware-related events in the logs, but perhaps not.

Can someone clarify for me how hardware failure notifications can be handled on Dell server hardware? I would like to be automatically notified of critical hardware failures, if possible without having to install any 3rd-party monitoring solutions.
Question by:pmckenna11
10 Comments
 
LVL 1

Accepted Solution

by:
michaelkovac earned 250 total points
ID: 34933897

Here's a good thread on the subject.
OpenManage will show you in detail what's going on, but its logging is dismal.
http://en.community.dell.com/support-forums/servers/f/177/t/19206983.aspx
 
LVL 6

Expert Comment

by:joe_massimino
ID: 34933919
Doesn't your server flash a warning light/LED when something goes wrong? All of my Dell servers do at least that. The event viewer shows almost all hardware failures; nothing that important is ever left for you to hunt for. So your question baffles me.
 
LVL 2

Author Comment

by:pmckenna11
ID: 34934034
Thanks for the link to the thread. I will check through it and see if any of the suggested fixes will work for me, but it is probably time to look into something more robust than OpenManage.

The server is at a remote location so I am not able to check lights on the drives. Kind of lame if you have to physically look at a flashing light on the server to know there is a problem.

I am thinking that maybe both drives failed since I last checked the event logs on this server. Seems unlikely, but otherwise how can you explain that there were no hardware failure notifications in the Windows logs? If anyone has a different explanation I would love to hear it. I work hard to catch problems proactively, and to lose a RAID 5 server from disk failure is quite embarrassing!!
 
LVL 1

Expert Comment

by:michaelkovac
ID: 34934266

To Windows there was no drive failure because the PERC controller abstracts its RAID configuration from the OS in the form of virtual disks. To the OS, a virtual disk looks just like a single physical hard drive. If any member of that disk fails, the controller 'takes care' of the issue by running degraded and, if a hot spare is available, rebuilding the array onto it (not so with RAID 0, which just fails). You should really always use a hot spare so that you're not stuck in this kind of predicament. RAID 5 arrangements are notorious for total failure during the rebuild that follows a single drive failure (search Google and you will find some scary stats on that).
http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

If you had set up a software RAID in the OS you'd get messages in the Windows system logs, but performance wouldn't be that great.
 
LVL 2

Author Comment

by:pmckenna11
ID: 34935678
Interesting article on RAID 5. It doesn't really apply in this case because we are using 146 GB SCSI drives, so the chance of a read error during a rebuild is small, but it is still interesting. I will keep it in mind on other servers.
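A rough way to put numbers on that claim is sketched below. It assumes a rebuild has to read every bit of the two surviving drives, and it tries three unrecoverable read error (URE) rates, 1 in 10^14, 10^15 and 10^16 bits. Those rates and the function name are illustrative assumptions, not figures taken from this thread or from the drives' datasheets.

```python
import math

def p_read_error(drive_bytes: float, surviving_drives: int, ure_per_bit: float) -> float:
    """Chance of at least one URE while reading every bit of the surviving drives."""
    bits_read = drive_bytes * 8 * surviving_drives
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round to zero
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Assumed setup: 146 GB drives, 2 survivors in a 3-disk RAID 5.
for rate in (1e-14, 1e-15, 1e-16):
    p = p_read_error(146e9, 2, rate)
    print(f"URE rate {rate:.0e} per bit: {p:.3%} chance of hitting a read error")
```

Under these assumptions the worst case (1 in 10^14) comes out at a couple of percent per rebuild, which is small but not zero. Calling the same function with a multi-terabyte drive size shows why the article worries about larger disks.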

But I am still left with the question: how does one know when a drive fails? Even with a hot spare (I am adding one to the rebuilt server as I type this), I still need to know that a piece of hardware has failed. I don't want to have to manually launch OpenManage to constantly check on storage health. I have been poking around in the OpenManage interface but don't see any built-in notification functionality (fixes were pointed to in an earlier message).

Also, I get that Windows thinks the virtual disk is fine, but I thought OpenManage sent a notification of hardware problems that showed up in the logs. I guess not!
 
LVL 1

Assisted Solution

by:MarkThomasLee
MarkThomasLee earned 250 total points
ID: 34936180
With Dell servers, an event would have been flagged and is visible via OpenManage. There is also usually a light on the front of the box that flashes amber when there is a hardware alert. And of course the preboot process would display an error, which is visible if the Dell preboot splash screen is disabled. Usually, though, on RAID or other disk errors, the preboot will hang at the disk controller portion of the preboot process. These notifications are outside the OS, so there isn't a thread or mechanism other than OpenManage to gain access to these alerts, apart from the flashing amber light on the front of the box. As far as Windows is concerned, even though it's a 3-disk RAID, Windows sees it as 1 physical drive. That's because the hardware handles the creation of the logical drive spread out over the 3 drives. Long story short, it's a hardware array, which Windows is blind to. The only arrays Windows is aware of are software arrays, the ones created using Disk Management (diskmgmt.msc).

Sorry buddy, that really sucks. I just went through the same thing at a new client's office: a chartless medical office with no backup. They are hurting. Data recovery time. CBL, here we go!

I hope this info helps
M
 
LVL 1

Expert Comment

by:michaelkovac
ID: 34940094
Do you have an SNMP server for system management and control? Dell OpenManage has an SNMP agent which can respond to SNMP queries. Example:
http://www.tools4ever.com/products/monitormagic/policies/dell.asp
 
LVL 16

Expert Comment

by:Shaik M. Sajid
ID: 34941841
The first symptom on the server, if you take a physical look at it: the LED on the HDD turns orange instead of green (green = healthy).

 
LVL 1

Expert Comment

by:MarkThomasLee
ID: 34945137
Yes, I do use SNMP for server management. Our organization uses HP business-class workstations and ProLiant servers, so the SNMP events are caught through HP's Insight Manager, or Lights-Out services if the box fails. The other nice thing about HPs is that in the BIOS there is a place to set up event notifications via SMS/text messaging or via email. This type of notification is in addition to SNMP and Windows event logging; it's done at the hardware level of the box. Cool stuff. It's saved offices from a complete meltdown.
 
LVL 2

Author Comment

by:pmckenna11
ID: 35044200
It appears that OpenManage needs to be used in conjunction with OpenManage Server Administrator Managed Node in order to get the reporting. From what I gather, Server Administrator uses SNMP info from local and remote servers and is capable of generating the desired alerts along with other functionality. I looked briefly at Server Administrator and may set up a box with it installed to monitor all my Dell servers.

It goes without saying that it is absolutely ridiculous that OpenManage does not have its own alert functionality and that you have to go through all this hassle just to get an alert emailed to you (or use a workaround as previously suggested).
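For anyone landing here later, a script-based workaround could look roughly like the sketch below. It assumes the OpenManage command-line tool `omreport` is installed and on the PATH, that `omreport storage pdisk controller=0` prints one `Key : Value` line per field with an `ID` line starting each disk, and that a healthy disk reports `Status : Ok`. The controller number, the mail host and the addresses are placeholders, and the helper names are made up for this example.

```python
import smtplib
import subprocess
from email.message import EmailMessage

def parse_pdisks(report: str) -> list[dict[str, str]]:
    """Turn 'Key : Value' report text into one dict per physical disk."""
    disks: list[dict[str, str]] = []
    current = None
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # banner or blank line without a colon
        key, value = key.strip(), value.strip()
        if key == "ID":  # assumed: an ID line opens each disk's block
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks

def unhealthy(disks: list[dict[str, str]]) -> list[dict[str, str]]:
    """Disks whose Status is missing or anything other than 'Ok' (assumed healthy value)."""
    return [d for d in disks if d.get("Status") != "Ok"]

def build_alert(bad: list[dict[str, str]], sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text alert listing each problem disk."""
    msg = EmailMessage()
    msg["Subject"] = f"RAID alert: {len(bad)} physical disk(s) not Ok"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(
        f"{d.get('ID')}: status={d.get('Status')}, state={d.get('State')}" for d in bad
    ))
    return msg

def main() -> None:
    # Placeholder controller number, mail host and addresses.
    report = subprocess.run(
        ["omreport", "storage", "pdisk", "controller=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = unhealthy(parse_pdisks(report))
    if bad:
        with smtplib.SMTP("mail.example.com") as smtp:
            smtp.send_message(build_alert(bad, "server@example.com", "admin@example.com"))

# main() is meant to be called from a scheduled task; it is not invoked here.
```

Run from a scheduled task every few minutes, something like this would send mail as soon as a disk drops out of the array, without anyone having to open the OpenManage interface. The parsing should be checked against the real output of your OpenManage version before relying on it.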