My SATA Array is always rebuilding (RAID 5)

mob_dream
Hello,

I have two partitions in my Windows SBS 2008.

(C drive) RAID 1 plus 1 spare HDD, "Array A"

(D drive) RAID 5 (4 SATA HDDs), "Array B"

The problem is that, as you can see in the attachment, Bay 4 in Array B is always rebuilding. Sometimes it looks OK with no warnings; other times it shows this.

I tried to take an image using Acronis True Image, but it took a very long time. Also, when I restored the image to an internal 3.5" 2 TB SATA HDD, everything went OK except that the Exchange database didn't mount.

Please help as much as you can; I'm really worried about the data.


Thank you from the bottom of my heart in advance,

Author

Commented:
Here is the attachment
1.jpg
2.jpg
Most Valuable Expert 2015
Commented:
If it is always the same disk that needs to rebuild, change it. Also make sure you are using enterprise grade disks that are built for RAID, and not consumer disks.

Make the image when the OS and Exchange aren't running (boot from the image tool's CD or DVD). And naturally imaging 1.5 TB takes a long time. It'll also take even longer if the array is rebuilding, as that slows the system down.

Author

Commented:
Thanks, rindi, for your kind reply. Yes, it is always the same disk that is rebuilding. By the way, we are using ordinary 2.5" SATA disks (the kind used in laptops). I don't know why; I found it like this when I took over this IT department.

Regarding changing the drive: do I just unplug the old one while the system is down and plug in the replacement (same model and capacity, of course)? I just want to make sure nothing will be lost.

I made the image while Windows was shut down, and I don't know why it didn't work. And yes, it took a very long time (around 27 hours).

Regards,
Most Valuable Expert 2015
Commented:
Check the manual of your server and of your Smart Array controller. Usually the controllers and disks are hot-pluggable, and if that is the case with your hardware, all you should need to do is remove the bad disk while the server is up and running and then insert the new one. The array should then rebuild onto the new disk automatically. Normally you shouldn't need a shutdown for this. If your server has enough bays, I'd also consider adding an extra disk as a hot spare for the RAID 5 array. I think that's worth it.

How did you do the backup? Was the disk you backed up to attached via USB? USB 2.0 is rather slow, and things take a long time if that was the case.
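As a rough sanity check on the imaging time, the average transfer rate can be worked out from the figures already mentioned in this thread. This is only a sketch: it assumes the image really was about 1.5 TB (decimal units) and that the 27 hours were spent transferring data the whole time. The function name is made up for illustration.

```python
def average_throughput_mb_s(size_tb: float, hours: float) -> float:
    """Average transfer rate in MB/s, using decimal units (1 TB = 10**12 bytes)."""
    size_bytes = size_tb * 10**12
    seconds = hours * 3600
    return size_bytes / seconds / 10**6


# Assumed inputs: ~1.5 TB imaged in ~27 hours, as reported in this thread.
rate = average_throughput_mb_s(1.5, 27)
print(f"{rate:.1f} MB/s")  # prints "15.4 MB/s"
```

For comparison, the USB 2.0 signalling rate of 480 Mbit/s works out to 60 MB/s before protocol overhead, so an average of roughly 15 MB/s suggests something else was also holding the transfer back, such as the array rebuilding at the same time.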
DavidPresident
Top Expert 2010

Commented:
Are you using enterprise-class SATA disks or consumer/desktop drives? If it is the latter, then the problem is due to TLER (time-limited error recovery), which means that your disks simply aren't compatible with the controller firmware.

Author

Commented:
Well, rindi, I'm going to shut down the server to be on the safe side. I'm not sure whether this server supports hot-plugging, and I'm not going to risk the data; I have seen enough.

Thank you so much.

Author

Commented:
We are using normal disks instead of enterprise ones. I think you are right: the HDDs and the controller are not compatible.
Most Valuable Expert 2015

Commented:
Then get enterprise-class disks and replace your current ones, or you'll always run into issues. Normally, if you order them directly from HP for that server, you'll get the correct ones.
DavidPresident
Top Expert 2010
Commented:
That is the root cause. The desktop drives are unsuitable. Google TLER and you will see why. Replace the drives or go with RAID 1 and you "MAY" be OK. Not only that, but if they are WD Green, Blue, or Black drives, then RAID 5 voids their warranty!

Author

Commented:
You are right, but how do I change the current array from RAID 5 to RAID 1?

I don't want to take my server offline for a long time. Many people will be unable to work, and headaches will follow.
DavidPresident
Top Expert 2010
Commented:
You don't. That is the penalty for building an unsupported configuration. You are going to spend some money, move things around, and go through a painful cold metal backup/restore.

DON'T take a shortcut and replace drives one at a time. This puts the data at further risk: if the controller has to do deep recovery on any of the surviving disks, it will fail that disk in an already degraded array, and you will lose the entire array.
You can try Acronis again, but you should dismount the store before taking the image.

Which version of Acronis?

Actually, if you have SBS 2008, it has an image-based backup that you can use to restore to another disk.

That's your best bet.
Top Expert 2014
Commented:
How can you tell that they are normal laptop drives? There's not really that much difference to see from the outside. If you select the hardware view in the ACU, highlight a disk, and click the "more information" button, you can get the part number and look it up.

It's quite possible that there is a hard read fault on one of the other disks. That would cause the controller to repeatedly try to read it during the rebuild and either stall at some percentage or restart from the beginning. There's a controller firmware update that modifies the rebuild process so it skips over bad blocks and carries on with the rest, but of course that leaves a hole in your data.

If you go to the Array Diagnostic Utility (ADU), you can generate a report and upload it here; use "attach file" the same way you did with the screenshots. It's mainly readable text, but it's huge. The error log of each disk is in it, so you can see whether another one is really bad.

Also, under the System Management Homepage you can get the hardware log from the motherboard. That lists each error chronologically, so a pattern like failed, replaced, failed, replaced, failed, replaced will show up, whereas the ADU report is more of a current snapshot without history.
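To illustrate why a read fault on a second disk is so damaging during a rebuild, here is a toy model of RAID 5 XOR parity for a single stripe. It is a sketch only: a real controller rotates parity across the disks and works on much larger blocks, and the function names and sample bytes here are made up for illustration.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Reconstruct the one missing block of a stripe from all the others.

    `surviving` holds the remaining data and parity blocks; None marks a
    block that could not be read. Returns None if reconstruction is impossible.
    """
    if any(block is None for block in surviving):
        return None  # a second unreadable block leaves a hole in the data
    return xor_blocks(surviving)


# One stripe on a 4-disk array: three data blocks plus their parity.
d0, d1, d2 = b"\x11\x22", b"\x33\x44", b"\x55\x66"
parity = xor_blocks([d0, d1, d2])

# The disk holding d2 was replaced: its block is rebuilt from the rest.
print(rebuild_block([d0, d1, parity]) == d2)  # prints True

# Same rebuild, but d1 now has a hard read error: the block is lost.
print(rebuild_block([d0, None, parity]))  # prints None
```

With one disk missing, every remaining block in the stripe must be readable. That is why a bad sector on another disk can stall the rebuild, and why skipping over it leaves that stripe unrecoverable.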

Author

Commented:
Thank you so much. I really wish I could award more than 500 points; I swear to God you deserve more.

I simply replaced the bad disk and installed a new one, turned on the machine, and it started rebuilding. Now the rebuild is complete and things look fine.

From the bottom of my heart, thank you, everyone.