Solved

Drive in RAID 5 array in PowerEdge 2800 failed

Posted on 2010-11-11
Last Modified: 2012-10-05
I have a disk in a RAID-5 array (1 of 4 disks) that is showing as failed on our PowerEdge 2800. We have a PERC 4e/Di RAID Controller that shows Physical Disks 0, 2 and 3 all online but 1 doesn't appear (this is the one blinking amber at me).

I assume the disk has failed and as a result I need to replace it. But I'm looking for a few instructions as to how this should be done.

Is it simply a case of powering down the server, removing the disk and inserting the new one, and all is well again? Or do I need to go into the RAID via OpenManage and tell it to rebuild that disk?

Any advice appreciated as always.

Thanx
Question by:Steven O'Neill
9 Comments
 
LVL 59

Accepted Solution

by:
Darius Ghassem earned 300 total points
ID: 34111484
You can keep the server online, remove the failed disk, then insert the replacement. The RAID will rebuild itself once the new drive is placed into the system.

 
LVL 23

Assisted Solution

by:jakethecatuk
jakethecatuk earned 100 total points
ID: 34111721
this may be stating the obvious - but you need to make sure that the drive you put in is the same model (speed, size, connector etc) as the one coming out. although you could put in a larger slower drive, doing that would have a severe impact on your raid array.

then as dariusg says...out with the old, in with the new and monitor the rebuild.
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 100 total points
ID: 34111831
I will just clarify something Jake said..
 - Assuming the replacement disk is QUALIFIED for this controller, and at least equal in capacity to the old drive, then there is nothing wrong with the disk being faster, as you will get an incremental performance gain (which you will likely only see on a benchmark). Conversely, if it is slower, you will have an incremental performance hit.

 - Best practice, if you do NOT have the replacement drive now, is to take a full backup. Not only do you have no protection against another drive failure while the array is degraded, but even a bad block (WHICH YOU MAY HAVE RIGHT NOW) results in partial data loss. The less I/O you do on this system while you wait, after backing up, the better.

 - Assuming all disks were bought at same time, consider they have all had the same I/O load, operating hours, environmentals, and were built in same manufacturing run.  It is not unusual for drive failures to be in groups, so buy 2 disks, and if you have slots, make one of them a hot spare.
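The capacity and speed points above can be sketched as a quick pre-purchase check. This is only an illustration, not a Dell tool: the `Disk` record, the `check_replacement` helper and the figures in the usage example are all hypothetical, and nominal label capacities are only a rough guide, since a controller compares actual usable blocks rather than the number printed on the label.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    """Hypothetical record of the drive attributes discussed in this thread."""
    model: str
    capacity_gb: float  # nominal capacity from the label (assumption: close to usable size)
    rpm: int
    interface: str


def check_replacement(old: Disk, new: Disk) -> List[str]:
    """Compare a candidate replacement against the failed member.

    Returns a list of notes. Entries starting with "BLOCKER" mean the drive
    should not be used; entries starting with "NOTE" are informational.
    """
    notes = []
    if new.interface != old.interface:
        notes.append("BLOCKER: interface differs from the failed drive")
    if new.capacity_gb < old.capacity_gb:
        notes.append("BLOCKER: replacement is smaller than the failed drive")
    if new.rpm < old.rpm:
        notes.append("NOTE: slower spindle speed, expect an incremental performance hit")
    elif new.rpm > old.rpm:
        notes.append("NOTE: faster spindle speed, incremental gain at best")
    return notes


if __name__ == "__main__":
    # Illustrative values only, not taken from any datasheet.
    failed = Disk("old-10k", 146.0, 10000, "U320")
    candidate = Disk("new-15k", 146.0, 15000, "U320")
    for note in check_replacement(failed, candidate):
        print(note)
```

A check like this says nothing about whether the drive firmware is qualified for the controller, which still has to be confirmed with the vendor.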

 
LVL 2

Author Comment

by:Steven O'Neill
ID: 34112089
Hi guys

Thanx for all the advice. We are always backing up the servers here using Acronis Backup and Recovery SBS 10 (and Server 10 for the others) so I know the backups are ok and validated.

The drive we have is a Seagate Cheetah 146.9GB 10K U320, but they are not available right now, so I've had to order a couple of Seagate Cheetah 146.8GB 15K U320 disks (yeah, I already thought of that, thanx). So I'm slightly concerned about what jakethecatuk has said (as I didn't think it truly mattered).

So I assume there's nothing left for me to do but wait for the disk, remove the 'bad' one, insert the new one and let it rebuild (again I assume nothing needed from me).

I would also assume that the rebuild will hit the performance of the server as well? Would I simply use OpenManage Server Administrator during the rebuild, and am I best doing this out of hours (with no users around)?

Thanx again
 
LVL 23

Expert Comment

by:jakethecatuk
ID: 34112109
glad you are getting sorted.  my comment about size and speed only referenced slower drives.  dlethe expanded on that by confirming that faster and/or larger would not cause a problem.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 34113289
It's been stated, but let me emphasize, to save you headaches: on a system with hot-swappable drives, the replacement should never be done with the server powered off, especially when the replacement drive has been used in a previous array.

That said, yes, the drive should begin to rebuild automatically and its progress can be monitored by OMSA.  If for some reason the rebuild doesn't happen automatically (within about 2 minutes), you can start it manually in OMSA.  

If the server is heavily used, you may consider rebuilding after hours, as there will be an amount of system resources dedicated to its rebuild.
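As a rough sketch of what monitoring the rebuild could look like outside the OMSA GUI, the snippet below parses a "key : value" style physical disk report and lists any disk in the Rebuilding state. On a live server the text would come from the OMSA command line (something like `omreport storage pdisk controller=0`), but the sample report, its field names and the helper functions here are assumptions for illustration, so check them against the real output on your system.

```python
from typing import Dict

# Hypothetical sample report. The layout and field names are assumptions
# modelled on a "key : value" listing, not captured from a real server.
SAMPLE_REPORT = """\
ID        : 0:0
Status    : Ok
State     : Online
Progress  : Not Applicable

ID        : 0:1
Status    : Non-Critical
State     : Rebuilding
Progress  : 37% complete
"""


def parse_pdisks(report: str) -> Dict[str, Dict[str, str]]:
    """Group "key : value" lines into one dict per disk, keyed by the ID field."""
    disks: Dict[str, Dict[str, str]] = {}
    current = None
    for line in report.splitlines():
        # Split on the first colon only, so IDs such as "0:1" stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key/value pair
        key, value = key.strip(), value.strip()
        if key == "ID":
            current = disks.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return disks


def rebuilding_disks(disks: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return {disk_id: progress_text} for every disk whose State is Rebuilding."""
    return {
        disk_id: fields.get("Progress", "")
        for disk_id, fields in disks.items()
        if fields.get("State") == "Rebuilding"
    }


if __name__ == "__main__":
    for disk_id, progress in rebuilding_disks(parse_pdisks(SAMPLE_REPORT)).items():
        print(f"Disk {disk_id} is rebuilding: {progress}")
```

Run periodically, a check like this makes it easy to notice if the rebuild never starts, which is the case where it would need to be started manually.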
 
LVL 47

Expert Comment

by:dlethe
ID: 34113326
That is why I mentioned faster. That is one of the nice things about storage: every year it gets faster, cheaper, better. Yes, pop it in, but this is important: make sure you get it from Dell or an authorized distributor. The firmware on the drives is a big deal. You save money buying a vanilla disk, but it won't have the proper configurable settings that deal with cache, XOR logic, and error recovery timing, so it puts your data at risk.

There is a configurable setting on most controllers that lets you prioritize rebuild vs application I/O. I wouldn't make it prioritize rebuild any higher than 25%, and if systems are relatively idle at night, then the rebuild will use all it can anyway. If it is busy during the night, then just make a judgement call.

It will likely finish overnight if you shut the system down and do the rebuild from the BIOS, and even if it has not finished, you can just boot the computer when you get in, and the rebuild continues at the lower priority.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 34113437
The default on the PERC 4 is 30% priority, but I wouldn't set it any higher (if you're thinking of playing with it :), as I've seen 50% slow the server to a nearly unusable state.
 
LVL 2

Author Closing Comment

by:Steven O'Neill
ID: 34119169
Thanx again for all your info guys. The disks arrived this morning, and one has been inserted to replace the problem disk; it has begun rebuilding as mentioned.

Just monitoring it now to make sure it rebuilds fully.