benc007

RAID 5 with Three Hard Drives - Rebuild or Not Rebuild? HELP!

I have a RAID 5 array with three hard drives.  This is a Dell PowerEdge 1750 with a PERC 4/Di controller.

One of the drives is failing on an important server.   In RAID I see:

Array Disk 0:0   Failed    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity
Array Disk 0:1   Online    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity
Array Disk 0:2   Online    SCSI    ProductID   Rev0003   Seagate   136.63 GB capacity

I am not sure why the disk failed, but in Windows 2000 Server I see http://prntscr.com/evbuv6

Questions:
1) Should I rebuild the one failed drive, and then put it online?  
2) Or should I buy a new hard drive and hot swap it?
3) What's the worst that can happen?
Dell, RAID, Storage Hardware, Storage

Last Comment
benc007

8/22/2022 - Mon
SOLUTION
mbkitmgr

SOLUTION
Antzs

benc007

ASKER
mbkitmgr - RE: If your array is configured using the Dell PERC adapter then you only need to remove the failed drive and fit the new one.  The PERC will take care of the rest
Doesn't the new drive need to have NO metadata on it?

Antzs - RE: If the disk has already failed, rebuilding it would not be successful anyway.  It would just fail again.
The disk failed after I rebooted the server.  How can I be sure it's a hardware fail and NOT a software corruption fail?

What's the worst that can happen if I rebuild the failed hard drive? See http://prntscr.com/evbuv6
SOLUTION
mbkitmgr

PowerEdgeTech

Rebuilding the failed drive will not necessarily be unsuccessful. There are more reasons for a drive to go offline than a bad drive. Especially on this server, if the firmware is out of date or if you are using a specific SCSI backplane cable, the drive may not be bad at all.

Best thing to do is replace the failed drive though ... swap it "hot" - do not power down to replace if the drives are hot-swappable. If it doesn't rebuild automatically, then you will need to assign it as a hot-spare to start the rebuild. If the new drive shows as "failed", then you must choose Rebuild. Do NOT force it online under any circumstances (in this situation).
Mal Osborne

I would choose option #2, assuming you can still get drives that old. There are probably some on eBay, but I would test them extensively before implementation. At the very least, run the drive for a day, create a volume, do a surface scan with CHKDSK, then fill it with files to 100%, delete 20% of them and run a defrag. If it survives that, the drive is probably going to be OK.
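
The fill-and-verify part of that burn-in could be scripted rather than done by hand. The following is a minimal Python sketch, not a replacement for CHKDSK or the controller's own diagnostics. The function name `write_and_verify`, the file naming, and the default sizes are all hypothetical choices for illustration. It writes random data into a directory on the test volume, reads each file back, and returns the paths whose SHA-256 digest no longer matches.

```python
import hashlib
import os


def write_and_verify(directory, file_count=4, file_size=1024 * 1024):
    """Write random files into `directory`, read them back, and return
    the list of paths whose contents no longer match what was written."""
    expected = {}
    for i in range(file_count):
        path = os.path.join(directory, f"burnin_{i:04d}.bin")
        payload = os.urandom(file_size)
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        expected[path] = hashlib.sha256(payload).hexdigest()

    mismatched = []
    for path, digest in expected.items():
        with open(path, "rb") as handle:
            if hashlib.sha256(handle.read()).hexdigest() != digest:
                mismatched.append(path)
    return mismatched
```

Calling it with a directory on the volume under test and a `file_count` large enough to fill the disk approximates the "fill it to 100%" step. One caveat: the read-back may be served from the operating system's cache rather than the platters, so total data well above the machine's RAM gives a more meaningful result.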

The worst thing that could happen would be a second drive failure during the rebuild, and a complete loss of all data. This happens sometimes: rebuilding a RAID 5 array thrashes the drives for a few hours. If a drive is close to failure, or the PSU is out of spec, or cooling is blocked, or cables are not terminated correctly, the rebuild itself can be enough to trigger a second failure.
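
To see why a second failure during the rebuild is fatal, it helps to look at how RAID 5 parity works. The sketch below is a simplified model in Python, not the PERC's actual firmware logic. It assumes a single stripe of two data blocks plus one XOR parity block on a three-disk set, and the names `xor_blocks` and `rebuild_missing` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Return the contents of the single missing block (marked None).

    With RAID 5, any one block of a stripe equals the XOR of all the
    others, so one loss is recoverable and two losses are not.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks missing")
    survivors = [block for block in stripe if block is not None]
    return xor_blocks(survivors)


# One stripe on a three-disk set: two data blocks and their parity.
data_0 = b"WEBSITE-"
data_1 = b"DATABASE"
parity = xor_blocks([data_0, data_1])

# Disk 0:0 has failed; its block is recomputed from the two survivors.
recovered = rebuild_missing([None, data_1, parity])
```

The rebuild has to read every stripe from both surviving disks to recompute the missing one, which is why it works the remaining drives so hard. If either survivor drops out part-way through, `rebuild_missing` has two unknowns and nothing to solve them with, which corresponds to the array being lost.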
benc007

ASKER
PowerEdgeTech - the hard drives are hot-swappable.

RE: Rebuilding the failed drive will not necessarily be unsuccessful. There are more reasons for a drive to go offline than a bad drive. Especially on this server, if the firmware is out of date or if you are using a specific SCSI backplane cable, the drive may not be bad at all.

Do you mean I may be able to rebuild the failed drive?    How can I see if the drive went offline for a reason other than a hardware failure BEFORE rebuilding?


Mal Osborne - RE: I would choose option #2, assuming you can still get drives that old. There are probably some on eBay, but I would test them extensively before implementation. At the very least, run the drive for a day, create a volume, do a surface scan with CHKDSK, then fill it with files to 100%, delete 20% of them and run a defrag. If it survives that, the drive is probably going to be OK.

RE: The worst thing that could happen would be a second drive failure during the rebuild, and a complete loss of all data. This happens sometimes, rebuilding a RAID5 array thrashes the drives for a few hours. If a drive is close to failure, or the PSU is out of spec, or cooling blocked, or cables not terminated correctly, the rebuild itself can be enough to invoke a second failure.

How can rebuilding the failed drive cause a second drive to fail?  What do you mean the PSU is out of spec?

Fan #1 is down as well.  Could this have caused the drive to fail?
Which is option 2?  Do you mean I should try to rebuild the failed drive, and after rebuilding run a CHKDSK and defrag on the entire array?
SOLUTION
Mal Osborne

PowerEdgeTech

>Do you mean I may be able to rebuild the failed drive?    How can I see if the drive went offline for a reason other than a hardware failure BEFORE rebuilding?

You would need to analyze the controller log and run diagnostics on the drive. Yes, it is possible to rebuild a drive that is "failed" or showing offline, but it is best to simply replace the drive (without the research to determine why it failed, as most often the drive is bad).
serialband

136 GB SCSI?  Time to replace the server.  All 3 disks should be pretty old.  If you're replacing one now, the 2nd one may be ready to fail soon.  Back up your data.
benc007

ASKER
I bought a used drive that was taken from production and is supposed to be "fully tested and reformatted to NTFS for Windows OS".

1) If I hot swap this drive with the failed drive on my Windows 2000 Server, while the server is running, will this drive rebuild without affecting the websites hosted on this server in real-time?  There are live users on these websites.

2) I have another PowerEdge 1750 that has 3 hard drives also set up in RAID 5.  If I remove these 3 drives, and put the SINGLE used replacement drive and power on the server, how can I test this used drive?  

After testing this used drive, when I put the 3 hard drives back in, will everything run the same as before ... or will I have to make some re-configurations?
mbkitmgr

Answers to your Questions
1 - Yes, you have hardware RAID that allows you to hot swap drives.  That means that you can eject the failed drive and insert the new one in its place without having to power down the server.  The system is designed to allow this so that you don't have downtime.

2. If you make changes to the hardware config on the 2nd server, you run the risk of affecting the RAID config on that server.  Alternatively you could
a) power down the 2nd server if it's not production
b) look for a spare SAS port on the main board and connect the drive to it, or fit it to a spare bay on the array
c) power up the server and check what the drive reports to the HW diagnostics.

If this is out of your depth, perhaps it's time to get someone in who knows how to address the issue.  I don't mean to offend, but as an IT professional, if it's not my expertise I don't play with it.
PowerEdgeTech

Never introduce a hot-swap drive by powering down the server. If it is hot-swap, swap it hot. THEN you can test it.
benc007

ASKER
RE: 1 - Yes, you have hardware RAID that allows you to hot swap drives.  That means that you can eject the failed drive and insert the new one in its place without having to power down the server.  The system is designed to allow this so that you don't have downtime.
-> Yes this is true if the "used" replacement drive was really "fully tested and reformatted to NTFS for Windows OS".  If not, then the Dell PERC will not rebuild automatically onto a foreign disk, right?

RE: 2. If you make changes to the hardware config on the 2nd server, you run the risk of affecting the RAID config on that server.  Alternatively you could
a) power down the 2nd server if it's not production
b) look for a spare SAS port on the main board and connect the drive to it, or fit it to a spare bay on the array
c) power up the server and check what the drive reports to the HW diagnostics.
-> These are SCSI drives and the PowerEdge 1750 doesn't have any SAS ports.
PowerEdgeTech

>Yes this is true if the "used" replacement drive was really "fully tested and reformatted to NTFS for Windows OS".  If not, then DELL PERC will not rebuild automatically onto a foreign disk, right?

No, existing formatting doesn't matter at all. The PERC will ignore any data or formatting on the disk and format it for use in the array. The PERC will "probably" start rebuilding the disk automatically. There are several reasons why it might not rebuild automatically, but if it doesn't, you simply assign it as a hot-spare and the rebuild will start.
benc007

ASKER
RE: The PERC will "probably" start rebuilding the disk automatically. There are several reasons why it might not rebuild automatically, but if it doesn't, you simply assign it as a hot-spare and the rebuild will start.
-> To assign it as a hot-spare, do I go into the RAID setup when the server boots up and do this manually?  If I do this, and then continue booting up Windows, will the OS and websites run while the replaced drive rebuilds in the background?

Do you know how long it takes to rebuild 100+ GB (current usage) out of 147GB?
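
One rough way to reason about the rebuild-time question is simple arithmetic. A hardware RAID controller generally works at the block level and has no knowledge of the filesystem, so the working assumption in this Python sketch is that the whole 136.63 GB member disk is rebuilt, not just the 100+ GB in use. The sustained rebuild rates in the loop are hypothetical placeholders, not measured PERC 4/Di figures, and the function name `rebuild_hours` is made up for illustration.

```python
def rebuild_hours(disk_capacity_gb, rebuild_rate_mb_per_s):
    """Estimate the hours needed to rebuild one member disk at a steady rate."""
    if rebuild_rate_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    total_mb = disk_capacity_gb * 1024
    return total_mb / rebuild_rate_mb_per_s / 3600


# Capacity taken from the controller listing; the rates are assumptions.
for rate in (5, 10, 20, 40):
    hours = rebuild_hours(136.63, rate)
    print(f"{rate:>3} MB/s -> about {hours:.1f} hours")
```

The real figure depends on the controller's rebuild-rate setting and on how busy the live websites keep the array, since rebuild I/O competes with normal traffic.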
ASKER CERTIFIED SOLUTION
PowerEdgeTech

mbkitmgr

I'm bowing out, we've covered the pertinent points a number of times.
andyalder

>I'm bowing out, we've covered the pertinent points a number of times.

We covered the same points a week before in https://www.experts-exchange.com/questions/29012346/RAID-5-with-Three-Hard-Drives.html 
Unfortunately the server is far too old for them to still have the box it came in.
PowerEdgeTech

Ha ha ... that's funny ... all along I thought this was the same one :)
benc007

ASKER
Previous question is about what happens to RAID 5 set up with 3 drives and using a different drive from another server.

This question is about rebuilding or not rebuilding drives in RAID 5 with the current hard drives.
benc007

ASKER
Thank you so much for your help!!!