Steven O'Neill

asked on

Drive in RAID 5 array in PowerEdge 2800 failed

I have a disk in a RAID-5 array (1 of 4 disks) that is showing as failed on our PowerEdge 2800. We have a PERC 4e/Di RAID Controller that shows Physical Disks 0, 2 and 3 all online but 1 doesn't appear (this is the one blinking amber at me).

I assume the disk has failed and as a result I need to replace it. But I'm looking for a few instructions as to how this should be done.

Is it simply a case of powering down the server, removing the disk and then inserting the new one and all is well again, or do I need to go into the RAID via OpenManage and tell it to rebuild that disk?

Any advice appreciated as always.

Thanx
ASKER CERTIFIED SOLUTION
Darius Ghassem

This solution is only available to members.
Steven O'Neill

ASKER

Hi guys

Thanx for all the advice. We are always backing up the servers here using Acronis Backup and Recovery SBS 10 (and Server 10 for the others) so I know the backups are ok and validated.

The drive we have is a Seagate Cheetah 146.9GB 10K U320, but those are not available right now, so I've had to order a couple (yeah, I already thought of that, thanx) of Seagate Cheetah 146.8GB 15K U320 disks. So I'm slightly concerned about what jakethecatuk has said (as I didn't think it truly mattered).

So I assume there's nothing left for me to do but wait for the disk, remove the 'bad' one, insert the new one and let it rebuild (again I assume nothing needed from me).

I would also assume that the rebuild will hit the performance of the server as well? Would I simply use the OpenManage Server Administrator during the rebuild, and am I best doing this out of hours (with no users around)?

Thanx again
Glad you are getting sorted. My comment about size and speed only referred to slower drives. dlethe expanded on that by confirming that a faster and/or larger drive would not cause a problem.
It's been stated, but let me emphasize, to save you headaches: on a system with hot-swappable drives, the replacement should never be done with the server powered off, especially when the replacement drive has been used in a previous array.

That said, yes, the drive should begin to rebuild automatically and its progress can be monitored by OMSA.  If for some reason the rebuild doesn't happen automatically (within about 2 minutes), you can start it manually in OMSA.  

If the server is heavily used, you may consider rebuilding after hours, as there will be an amount of system resources dedicated to its rebuild.
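For anyone wondering what the controller is actually doing during that rebuild, here is a minimal sketch of the general RAID 5 idea: the parity block in each stripe is the XOR of the data blocks, so any one missing block can be recomputed from the survivors. This illustrates the principle only, not the PERC 4e/Di firmware; the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk RAID 5 set: three data blocks plus one parity block.
# (Hypothetical contents; real stripes are much larger.)
data0 = b"disk0 data"
data1 = b"disk1 data"  # sits on the disk that is about to fail
data2 = b"disk2 data"
parity = xor_blocks([data0, data1, data2])

# Disk 1 fails. Its block is recomputed from the three surviving blocks,
# which is what a rebuild does for every stripe on the replacement disk.
rebuilt = xor_blocks([data0, data2, parity])
print(rebuilt == data1)
```

Every stripe has to be read from all the surviving disks to do this, which is why the rebuild competes with normal I/O.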
That is why I mentioned faster; one of the nice things about storage is that every year it gets faster, cheaper and better. Yes, pop it in, but this is important: make sure you get it from Dell or an authorized distributor. The firmware on the drives is a big deal. You save money buying a vanilla disk, but it won't have the proper configurable settings that deal with cache, XOR logic and error recovery timing, so it puts your data at risk.

There is a configurable setting on most controllers that lets you prioritize rebuild vs. application I/O. I wouldn't set the rebuild priority any higher than 25%; if the system is relatively idle at night, the rebuild will use all it can anyway. If it is busy during the night, just make a judgement call.

It will likely finish overnight if you shut the system down and do the rebuild from the BIOS. Even if it has not finished, you can just boot the computer when you get in, and the rebuild continues at the lower priority.
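As a rough sanity check on "overnight", the back-of-the-envelope arithmetic looks like the sketch below. The throughput figures are assumptions picked for illustration, not measurements from a PERC 4e/Di, and the rebuild-rate setting and application load will move the real number around; `rebuild_hours` is just a helper name for this example.

```python
def rebuild_hours(capacity_gb, throughput_mb_s):
    """Hours needed to write a whole replacement disk at a sustained rate.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600


# 146.8 GB replacement disk at a few ASSUMED sustained rebuild rates.
for mb_s in (10, 20, 40):
    print(f"{mb_s} MB/s -> {rebuild_hours(146.8, mb_s):.1f} h")
```

Even at the slowest assumed rate the estimate is a handful of hours, which is consistent with the rebuild fitting into one night on an idle system.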
The default on the PERC 4 is 30% priority, but I wouldn't set it any higher (if you're thinking of playing with it :), as I've seen 50% slow the server to a nearly unusable state.
Thanx again for all your info guys. The disks arrived this morning; one has been inserted to replace the problem disk and it has begun rebuilding as mentioned.

Just monitoring it now to make sure it rebuilds fully.
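If you would rather script the monitoring than watch the OMSA GUI, a sketch along these lines may help. It assumes the OMSA command-line tools are installed and that `omreport storage pdisk controller=0` prints `Key : Value` lines with each disk starting at an `ID` line; the controller number and the sample text are illustrative assumptions, so check them against what your own server prints.

```python
import subprocess


def parse_pdisks(report):
    """Turn 'Key : Value' report text into one dict per physical disk.

    Assumes each disk's section starts with an 'ID' line.
    """
    disks = []
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings and blank lines carry no key/value pair
        key, value = key.strip(), value.strip()
        if key == "ID":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


def read_pdisks(controller=0):
    """Run omreport (must be installed and on PATH) and parse its output."""
    result = subprocess.run(
        ["omreport", "storage", "pdisk", f"controller={controller}"],
        capture_output=True, text=True, check=True,
    )
    return parse_pdisks(result.stdout)


# Hand-written sample in the assumed layout, NOT captured from a real server.
SAMPLE = """\
ID      : 0:0:0
State   : Online
ID      : 0:0:1
State   : Rebuilding
"""

for disk in parse_pdisks(SAMPLE):
    print(disk["ID"], disk["State"])
```

On the server itself you would call `read_pdisks()` every few minutes and stop once no disk reports a rebuilding state.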