RAID 5 with Three Hard Drives - Rebuild or Not Rebuild? HELP!

I have a RAID 5 array with three hard drives. This is a Dell PowerEdge 1750 with PERC 4/DI.

One of the drives in an important server is failing. In the RAID controller status I see:

Array Disk 0:0   Failed    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity
Array Disk 0:1   Online    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity
Array Disk 0:2   Online    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity

I am not sure why the disk failed, but in Windows 2000 Server I see http://prntscr.com/evbuv6

Questions:
1) Should I rebuild the one failed drive, and then put it online?  
2) Or should I buy a new hard drive and hot swap it?
3) What's the worst that can happen?
benc007 asked:

mbkitmgr commented:
If your array is configured using the Dell PERC adapter, then you only need to remove the failed drive and fit the new one. The PERC will take care of the rest.
Antzs (Infrastructure Services) commented:
If the disk has already failed, rebuilding it would not be successful anyway. It will just fail again.

As mbkitmgr says, just replace the physical disk and the array should rebuild.
benc007 (Author) commented:
mbkitmgr - RE: If your array is configured using the Dell PERC adapter, then you only need to remove the failed drive and fit the new one. The PERC will take care of the rest.
Doesn't the new drive need to have NO metadata on it?

Antzs - RE: If the disk has already failed, rebuilding it would not be successful anyway. It will just fail again.
The disk failed after I rebooted the server. How can I be sure it's a hardware failure and NOT a software corruption failure?

What's the worst that can happen if I rebuild the failed hard drive? See http://prntscr.com/evbuv6

mbkitmgr commented:
Hi Benc007

No, the array will ignore the file structure on the new drive. When the new drive is inserted, the controller checks it and then commences the rebuild based on the contents of the other two. That's how an array works.
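A minimal sketch of that idea in Python, assuming the textbook RAID 5 scheme in which each stripe's parity block is the XOR of its data blocks (the PERC's actual on-disk layout, including how parity rotates across the drives, is not modeled here). With three drives, each stripe holds two data blocks and one parity block, so whichever block is lost can be regenerated from the other two:

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-drive RAID 5: two data blocks plus parity.
d0 = b"\x10\x20\x30\x40"   # block on disk 0
d1 = b"\x0f\x0e\x0d\x0c"   # block on disk 1
parity = xor_blocks(d0, d1)  # block on disk 2

# Simulate losing disk 0: its block is recomputed from the two survivors.
rebuilt_d0 = xor_blocks(d1, parity)
assert rebuilt_d0 == d0
print(rebuilt_d0.hex())  # prints 10203040
```

This also illustrates why whatever is already on the replacement drive does not matter: every block on it is overwritten with recomputed data.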
PowerEdgeTech (IT Consultant) commented:
Rebuilding the failed drive will not necessarily be unsuccessful. There are more reasons for a drive to go offline than a bad drive. Especially on this server, if the firmware is out of date or if you are using a specific SCSI backplane cable, the drive may not be bad at all.

The best thing to do is still to replace the failed drive: swap it "hot"; do not power down to replace it if the drives are hot-swappable. If it doesn't rebuild automatically, then you will need to assign it as a hot-spare to start the rebuild. If the new drive shows as "failed", then you must choose Rebuild. Do NOT force it online under any circumstances (in this situation).
Mal Osborne (Alpha Geek) commented:
I would choose option #2, assuming you can still get drives that old. There are probably some on eBay, but I would test them extensively before implementation. At the very least, run it for a day, create a volume, do a surface scan with CHKDSK, then fill it with files to 100%, delete 20% of them and run a defrag. If it survives that, the drive is probably going to be OK.

The worst thing that could happen would be a second drive failure during the rebuild, and a complete loss of all data. This does happen: rebuilding a RAID 5 array thrashes the drives for a few hours. If a drive is close to failure, or the PSU is out of spec, or cooling is blocked, or cables are not terminated correctly, the rebuild itself can be enough to trigger a second failure.
benc007 (Author) commented:
PowerEdgeTech - the hard drives are hot-swappable.

RE: Rebuilding the failed drive will not necessarily be unsuccessful. There are more reasons for a drive to go offline than a bad drive. Especially on this server, if the firmware is out of date or if you are using a specific SCSI backplane cable, the drive may not be bad at all.

Do you mean I may be able to rebuild the failed drive? How can I tell whether the drive went offline for a reason other than a hardware failure BEFORE rebuilding?


Mal Osborne - RE: I would choose option #2, assuming you can still get drives that old. There are probably some on eBay, but I would test them extensively before implementation. At the very least, run it for a day, create a volume, do a surface scan with CHKDSK, then fill it with files to 100%, delete 20% of them and run a defrag. If it survives that, the drive is probably going to be OK.

RE: The worst thing that could happen would be a second drive failure during the rebuild, and a complete loss of all data. This does happen: rebuilding a RAID 5 array thrashes the drives for a few hours. If a drive is close to failure, or the PSU is out of spec, or cooling is blocked, or cables are not terminated correctly, the rebuild itself can be enough to trigger a second failure.

How can rebuilding the failed drive cause a second drive to fail?  What do you mean the PSU is out of spec?

Fan #1 is down as well. Could this have caused the drive to fail?
Which is option 2? Do you mean I should try to rebuild the failed drive, and after rebuilding run CHKDSK and a defrag on the entire array?
Mal Osborne (Alpha Geek) commented:
"How can rebuilding the failed drive cause a second drive to fail?  What do you mean the PSU is out of spec?"

A rebuild requires each drive to read every sector. It means a lot of activity for a while, and more activity means more chance of failure. I have seen several RAID 5 arrays fail to rebuild due to a second drive failure.
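One rough way to put a number on part of that risk, as a hedged Python sketch: a 3-drive rebuild must read both surviving disks end to end, and the chance of hitting at least one unrecoverable read error grows with the number of bits read. The error rates below are assumed, datasheet-style figures, not the specification of these particular Seagate drives, and the model ignores a second drive failing outright under load (heat, PSU, cabling), which is the scenario described above:

```python
import math


def rebuild_read_error_probability(bytes_to_read: float, bit_error_rate: float) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_to_read.

    Simplification: errors are independent, at a fixed rate per bit read.
    """
    bits = bytes_to_read * 8
    # 1 - (1 - p) ** bits, computed stably for tiny p and very large bit counts
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# The 136.63 GB shown by the controller, read here as GiB (an assumption).
drive_bytes = 136.63 * 2**30
surviving_drives = 2  # rebuilding a 3-drive RAID 5 reads the other two disks in full

for rate in (1e-14, 1e-15):  # assumed error rates, not this drive's spec
    p = rebuild_read_error_probability(surviving_drives * drive_bytes, rate)
    print(f"URE rate {rate:.0e}: {p:.2%} chance of a read error during rebuild")
```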

Sometimes electrolytic capacitors in power supplies leak and "dry out". This typically causes the power rail to gradually increase in electrical "noise" until devices randomly begin to fail. If it is the 12 V rail, hard drives will often start throwing errors. This shows up as a server with a really high rate of drive failures, with the drives working fine in other machines and the problem vanishing once the power supply is replaced. Typically, this occurs in old servers.

A failed cooling fan may mean drives and power supplies are not properly cooled, causing failures. Most servers have redundant fans, however, so generally a single failure will not be a big issue. A single cooling fan failure combined with a spectacular amount of dust might cause issues, and the server may cook if a second fan fails. I would be replacing the fan ASAP.

My suggestion of running CHKDSK, then filling the drive with files, is just a simple "burn-in" diagnostic; if the drive has problems, they would probably show themselves then. For second-hand or "recertified" drives, this is a reasonable, cautious step.
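The "fill it with files" part of that burn-in can be scripted. Below is a minimal Python sketch; `burn_in` is a hypothetical helper, not a Windows or Dell tool, and it does not replace the CHKDSK surface scan or the defrag. It writes files of random data, reads them back to verify SHA-256 checksums, then deletes 20% of them:

```python
import hashlib
import os
import random
from pathlib import Path


def burn_in(target_dir: str, file_count: int, file_size: int,
            delete_fraction: float = 0.2) -> int:
    """Write random files, verify their checksums, then delete a fraction.

    Returns the number of files whose read-back did not match (0 = all passed).
    """
    root = Path(target_dir)
    root.mkdir(parents=True, exist_ok=True)
    expected = {}

    # Fill phase: write random data and ask the OS to flush it to the device.
    for i in range(file_count):
        data = os.urandom(file_size)
        path = root / f"burnin_{i:06d}.bin"
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        expected[path] = hashlib.sha256(data).hexdigest()

    # Verify phase: read every file back and compare checksums.
    failures = sum(
        1 for path, digest in expected.items()
        if hashlib.sha256(path.read_bytes()).hexdigest() != digest
    )

    # Delete a random fraction of the files, as in the suggested burn-in.
    for path in random.sample(list(expected), int(file_count * delete_fraction)):
        path.unlink()

    return failures
```

To approximate filling the volume, pick `file_count * file_size` close to the test drive's free space. The read-back may be served from the OS file cache rather than the disk, so this check is weaker than re-reading the files after a reboot.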
PowerEdgeTech (IT Consultant) commented:
Do you mean I may be able to rebuild the failed drive? How can I tell whether the drive went offline for a reason other than a hardware failure BEFORE rebuilding?
You would need to analyze the controller log and run diagnostics on the drive. Yes, it is possible to rebuild a drive that is "failed" or showing offline, but it is best to simply replace the drive rather than research why it failed, as most often the drive is bad.
serialband commented:
136 GB SCSI? Time to replace the server. All 3 disks must be pretty old. If you're replacing one now, the second one may be ready to fail soon. Back up your data.
benc007 (Author) commented:
I bought a used drive that was taken from production and is supposed to be "fully tested and reformatted to NTFS for Windows OS".

1) If I hot-swap this drive with the failed drive on my Windows 2000 Server while the server is running, will the drive rebuild without affecting the websites hosted on this server in real time? There are live users on these websites.

2) I have another PowerEdge 1750 that also has 3 hard drives set up in RAID 5. If I remove these 3 drives, put in the SINGLE used replacement drive and power on the server, how can I test this used drive?

After testing this used drive, when I put the 3 hard drives back in, will everything run the same as before ... or will I have to do some reconfiguration?
mbkitmgr commented:
Answers to your questions:
1. Yes, you have hardware RAID that allows you to hot-swap drives. That means you can eject the failed drive and insert the new one in its place without having to power down the server. The system is designed to allow this so that you don't have downtime.

2. If you make changes to the hardware config on the second server, you run the risk of affecting the RAID config on that server. Alternatively you could:
a) power down the second server if it's not in production
b) look for a spare SAS port on the mainboard and connect the drive to it, or fit it into a spare bay on the array
c) power up the server and check what the drive reports to the hardware diagnostics.

If this is that far out of your depth, perhaps it's time to get someone in who knows how to address the issue. I don't mean to offend, but as an IT professional, if it's not my expertise I don't play with it.
PowerEdgeTech (IT Consultant) commented:
Never introduce a hot-swap drive by powering down the server. If it is hot-swap, swap it hot. THEN you can test it.
benc007 (Author) commented:
RE: 1. Yes, you have hardware RAID that allows you to hot-swap drives. That means you can eject the failed drive and insert the new one in its place without having to power down the server. The system is designed to allow this so that you don't have downtime.
-> Yes, this is true if the "used" replacement drive was really "fully tested and reformatted to NTFS for Windows OS". If not, then the Dell PERC will not rebuild automatically onto a foreign disk, right?

RE: 2. If you make changes to the hardware config on the second server, you run the risk of affecting the RAID config on that server. Alternatively you could:
a) power down the second server if it's not in production
b) look for a spare SAS port on the mainboard and connect the drive to it, or fit it into a spare bay on the array
c) power up the server and check what the drive reports to the hardware diagnostics.
-> These are SCSI drives, and the PowerEdge 1750 doesn't have any SAS ports.
PowerEdgeTech (IT Consultant) commented:
Yes, this is true if the "used" replacement drive was really "fully tested and reformatted to NTFS for Windows OS". If not, then the Dell PERC will not rebuild automatically onto a foreign disk, right?
No, existing formatting doesn't matter at all. The PERC will ignore any data or formatting on the disk and format it for use in the array. The PERC will "probably" start rebuilding the disk automatically. There are several reasons why it might not rebuild automatically, but if it doesn't, you simply assign it as a hot-spare and the rebuild will start.
benc007 (Author) commented:
RE: The PERC will "probably" start rebuilding the disk automatically. There are several reasons why it might not rebuild automatically, but if it doesn't, you simply assign it as a hot-spare and the rebuild will start.
-> To assign it as a hot-spare, do I go into the RAID setup when the server boots up and do this manually?  If I do this, and then continue booting up Windows, will the OS and websites run while the replaced drive rebuilds in the background?

Do you know how long it takes to rebuild 100+ GB (current usage) out of 147GB?
PowerEdgeTech (IT Consultant) commented:
No, this can be done "live" from the OS using OpenManage: go to Storage > Virtual Disks and choose Assign Hot-Spare from the Available Tasks dropdown menu for the RAID array.

Amount of used space is not relevant, and time to rebuild can vary based on a lot of things - maybe an hour or two, maybe less, maybe more.
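Because the controller works below the file system, it regenerates every stripe on the disk, so a rough estimate depends on the drive's full capacity and the sustained rebuild rate rather than on the 100+ GB in use. A back-of-the-envelope Python sketch; the rates are hypothetical, and live I/O from the websites would slow the real rebuild:

```python
def rebuild_hours(capacity_bytes: float, rebuild_mb_per_s: float) -> float:
    """Estimated rebuild time in hours.

    The whole disk is regenerated, so used space does not enter the formula.
    """
    return capacity_bytes / (rebuild_mb_per_s * 1_000_000) / 3600


capacity = 136.63 * 2**30  # controller-reported size, read as GiB (assumption)
for rate in (20, 40, 80):  # hypothetical sustained rebuild rates in MB/s
    print(f"{rate} MB/s -> about {rebuild_hours(capacity, rate):.1f} h")
```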

The rebuild will happen in the background while the system is live, regardless of whether you start the rebuild from the BIOS or from the OS, but starting it from the OS is how it should be done.

mbkitmgr commented:
I'm bowing out; we've covered the pertinent points a number of times.
andyalder (Saggar maker's framemaker) commented:
>I'm bowing out; we've covered the pertinent points a number of times.

We covered the same points a week before in https://www.experts-exchange.com/questions/29012346/RAID-5-with-Three-Hard-Drives.html 
Unfortunately the server is far too old for them to still have the box it came in.
PowerEdgeTech (IT Consultant) commented:
Ha ha ... that's funny ... all along I thought this was the same one :)
benc007 (Author) commented:
The previous question was about what happens to a RAID 5 set with 3 drives when using a drive from another server.

This question is about whether or not to rebuild drives in RAID 5 with the current hard drives.
benc007 (Author) commented:
Thank you so much for your help!!!