Solved

What happens when a RAID drive fails?

Posted on 2008-10-01
Last Modified: 2012-06-27
I've never actually had a drive in a RAID array fail in a server. So, suppose one of the drives in the RAID 1 mirror in one of my HP ProLiant ML110s failed (please don't comment about how the "decent ProLiants start at ML370" - this particular server is for a small network with limited funds). What would happen?

Secondly, if a drive did fail, I assume I'd need to use some tool to determine which drive has failed, and then replace it. Would I need to initialize the new drive once it is installed? I assume it needs to be made part of the appropriate RAID array, and that after this stage the data will be rebuilt from the remaining drive?

We are talking hardware RAID here - configured in the RAID configuration pages available when the server boots through the BIOS screens.

Thanks!
Question by:tigermatt
 
LVL 7

Expert Comment

by:aboredman
ID: 22618680
I will make the assumption that the ML110 is not hot-swap.

When a drive fails you will get a message in the Windows event log (it is important to monitor it often). On a hot-swap machine you usually have an indicator on the drive.

You then need to shut down and replace the drive (if not hot-swap).

Restart the machine and the RAID array should take care of everything else.

BTW: You will probably see a warning message from the RAID controller during POST in the boot process if one drive is in the FAILED or PREDICTIVE FAILURE status.
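
To make the "array takes care of everything else" step concrete, here is a toy in-memory model of a two-drive mirror. It is only a sketch of the general RAID 1 idea and not how the HP controller firmware actually works; the class name, the method names and the "OK"/"DEGRADED" status labels are made up for illustration.

```python
class Mirror:
    """Toy RAID 1 mirror: two in-memory 'drives' holding identical blocks."""

    def __init__(self, num_blocks):
        self.drives = [[0] * num_blocks, [0] * num_blocks]
        self.failed = [False, False]

    def status(self):
        # Hypothetical status labels, not real controller messages.
        if all(self.failed):
            return "FAILED"
        if any(self.failed):
            return "DEGRADED"
        return "OK"

    def write(self, block, value):
        # A write goes to every drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = value

    def read(self, block):
        # A read can be served by any working drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives failed; data lost")

    def fail(self, index):
        # Simulate a hard failure: the drive's contents are gone.
        self.failed[index] = True
        self.drives[index] = None

    def replace_and_rebuild(self, index):
        # Swap in a blank drive and copy every block from the survivor.
        survivor = 1 - index
        if self.failed[survivor]:
            raise IOError("no working drive to rebuild from")
        self.drives[index] = list(self.drives[survivor])
        self.failed[index] = False


m = Mirror(4)
m.write(0, 42)
m.fail(1)
print(m.status(), m.read(0))    # DEGRADED 42
m.write(1, 7)                   # the server keeps running on one drive
m.replace_and_rebuild(1)
print(m.status(), m.drives[1])  # OK [42, 7, 0, 0]
```

The point of the sketch: between the failure and the end of the rebuild there is only one copy of the data, which is why the replacement should not wait.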
 
LVL 17

Assisted Solution

by:Andres Perales
Andres Perales earned 125 total points
ID: 22618710
Tiger,
The best thing would be for you to call HP support and speak to them directly; every company has different guidance.
I use mostly Dell. When a hard drive fails, that drive's light turns orange/red, and we also get a notification in OpenManage, Dell's server monitoring system that comes with their server line. Normally we just pull the bad drive and insert the new drive, and the rebuild is automatic (on certain servers). On other servers we have been told to shut down the server, pull the bad drive, and bring the system back up, and then it will start to rebuild automatically. All the while we get light indications from the front of the servers as well as notifications and progress in OpenManage.
 
LVL 3

Expert Comment

by:omic_admin
ID: 22619304
aboredman is right, but do pay attention to which drive has failed! You don't want to take out the one that is working. Non-hot-swap machines need to be shut down before you replace the drive, but it would be best to restart first, go into the RAID BIOS, determine the HD ID, and verify the failed drive before replacing it.
 
LVL 3

Assisted Solution

by:tempter
tempter earned 125 total points
ID: 22621015
Hey mate
As the comments above say, and as I'm sure you've read about RAID, you can generally only lose one drive. It becomes very important to replace the failed drive as soon as possible so that you are not at risk of losing the whole RAID array and all your data.

In your case, any server hardware configured with RAID should have the vendor's RAID software installed on it. Yes, you are right that you can access the RAID configuration by pressing the function keys when the server starts, but you can avoid rebooting the server by having the RAID software installed. Dell has the DRAC RAID manager, IBM has ServeRAID Manager, HP has its own, etc.; every maker has one.

You should install this software for the hardware you have. It can also be set up with hardware management software that sends you SMS alerts, email or some other notification when something fails. This way you are better informed when something goes wrong, before catastrophe hits you. It's better to be safe than sorry.

When you purchase hardware from the big boys (HP, IBM, etc.) you get hardware management software such as HP Insight Manager (for HP), IBM Director (for IBM), Dell OpenManage (for Dell), etc., which can do the things mentioned above. You can remote control your server, power it off and back on again, and do other things to help you better manage your network.

Hope this helps
Morci
 
LVL 67

Accepted Solution

by:sirbounty
sirbounty earned 250 total points
ID: 22623335
Hello tigermatt,

Just to clarify - in a RAID 1 (mirror) scenario, you can potentially lose more than one drive.  More accurately, you can lose one mirror.  If you have an array consisting of 3 drives mirrored to 3 other drives, you could lose all 3 in the mirror and the data would still be intact, yet no longer redundant, obviously.
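
The rule in that paragraph can be sketched in a few lines: the data survives as long as no mirrored pair has lost both of its members. This is an illustrative model only; the function name and the drive numbering (drives 0-2 mirrored to drives 3-5) are assumptions made up for the example.

```python
def data_survives(mirror_pairs, failed_drives):
    """Return True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Hypothetical layout: three drives (0, 1, 2) mirrored to three others (3, 4, 5).
pairs = [(0, 3), (1, 4), (2, 5)]

print(data_survives(pairs, [0, 1, 2]))  # True: one whole side lost, data intact
print(data_survives(pairs, [0, 3]))     # False: both halves of one pair lost
```

So the number of failed drives matters less than which drives they are: three failures on one side are survivable, while two failures in the same pair are not.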
With Compaq/HP equipment, if a drive has truly failed, the drive light will change from green to amber, indicating a hard failure.
However, the thing to watch out for is predictive failures.  Compaq diagnostics periodically run functional tests against your drives.  If any of those tests fail, the drive would then be 'expected' to fail at some point in the near future.  That's your indication to get ready to replace it, and properly schedule the maintenance after hours.

Also, I wouldn't rely on event logs to determine if you have a drive failure - use HP tools.  If you have a copy of SmartStart, or have otherwise installed the ACU (Array Configuration Utility), you can monitor information about both the logical and physical layout of your array from there.  Even better would be to use SNMP monitoring, if you have that capability, so that an alert is generated when a drive fails and you don't have to be as proactive about manually monitoring the components...


Regards,

sirbounty
 
LVL 58

Author Comment

by:tigermatt
ID: 22626724
Thanks to all for your input. I have the "Easy Set-up CD" which came with the server when I bought it - would I find the RAID Array Tools on there?
 
LVL 67

Expert Comment

by:sirbounty
ID: 22626757
See this PAQ: http://www.experts-exchange.com/Hardware/Servers/Q_23028271.html
Info on ACU & CIM/SIM for monitoring...
 
LVL 58

Author Closing Comment

by:tigermatt
ID: 31502154
Thanks guys. This definitely answers things about the actual notification side of things. Could you take a look at http://www.experts-exchange.com/Q_23788449.html for me - I have a few questions still about HP Insight Manager, a piece of software I definitely want to use for RAID monitoring of several of these servers.
 
LVL 58

Author Comment

by:tigermatt
ID: 22644199
Any ideas on http://www.experts-exchange.com/Q_23788449.html as per grading comment? Thanks!
