
Replacement of damaged SAS Drive, RAID 5 in Dell PowerEdge R720

Last Modified: 2017-04-07
Hi,

I have a Dell PowerEdge R720 with three 600 GB SAS hard drives in RAID 5 on a PERC card. This morning I saw the orange alert light, and the display says "there is an error in Disk 0, check drive" (I'm not sure how long the error has been there because I rarely visit the site). I have a spare 500 GB SAS drive and some questions:

- Since the message on the display says "check drive" (it doesn't literally say "replace the disk"), does that mean something can be done to fix the disk, or is replacing it inevitable?
- If nothing else can be done to fix the disk, can I use the 500 GB drive as the replacement in an array whose other two drives are 600 GB?
- If yes, can I do that while the server is running Windows, or do I have to power it off?
- If I have to power off the server, do I have to boot into the PERC software, bring the disk online, and rebuild the array?
- If yes, does the rebuild take a long time? (I ask so I can plan how long the server will be offline and warn users.) The total array space with the three 600 GB drives was 1 TB, and 450 GB are used.

Thanks.
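For anyone planning around the numbers in this question, here is a rough sketch of the arithmetic in Python. The rebuild rates are purely hypothetical placeholders (actual speed depends on the controller's rebuild-rate setting and the I/O load), and the estimate assumes the controller rebuilds the whole 600 GB member regardless of how much of the volume is in use.

```python
DRIVE_BYTES = 600 * 10**9   # one 600 GB member, decimal gigabytes as sold
N_DRIVES = 3

# RAID 5 keeps one drive's worth of parity, so usable space is (n - 1) drives.
usable_bytes = (N_DRIVES - 1) * DRIVE_BYTES


def rebuild_hours(drive_bytes, rate_mb_per_s):
    """Hours needed to rewrite one full member at a sustained rate in MB/s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return drive_bytes / (rate_mb_per_s * 10**6) / 3600


print(f"usable: {usable_bytes / 10**12:.2f} TB = {usable_bytes / 2**40:.2f} TiB")

# Hypothetical sustained rebuild rates, not measured on any PERC controller.
for rate in (30, 50, 100):
    print(f"{rate:>3} MB/s -> about {rebuild_hours(DRIVE_BYTES, rate):.1f} h")
```

The binary-unit figure (about 1.09 TiB) is probably why the array shows up as roughly "1 TB" in Windows.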

Daniel Flores Olmos, Infrastructure and Support Engineer

Author

Commented:
I forgot to mention the error code on the display: "PDR1101 Fault detected on Drive 0. Check drive." It looks like a bad connection of the hard drive. The manual's description says: "The controller detected a drive removal. If unintended, verify drive installation. Remove and reseat the indicated disk. If the problem persists, contact technical support." I'll do that, but I don't want to touch anything until the backup finishes.
Daniel Flores Olmos, Infrastructure and Support Engineer

Author

Commented:
UPDATE: For some reason the current Windows Server session closed and the backup was interrupted, so I took advantage of that to remove the hard drive and plug it back in. For a few minutes the display stopped showing the error and went blue, and the drive's blinking LED went green, but a few minutes later the error came back on the display and the drive LED went back to orange.
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
Just to add that replacing a 600GB disk with a 1TB one would be ill advised even if they are both SAS, because the 600GB ones will be 10K or 15K whereas the 1TB will be a 7.2K "nearline" disk. You could replace it with a 900GB one if you had one with the same spin speed.
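To make the two criteria concrete, here is a minimal sketch of a compatibility check in Python. It assumes a replacement must be at least as large as the existing members (a hard requirement for joining the array) and treats a spin-speed mismatch as a warning only, per the advice above. The `Drive` type, the `check_replacement` helper, and the 10K figure for the existing members are all hypothetical; the thread does not say whether the drives are 10K or 15K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    capacity_gb: int
    rpm: int
    interface: str


def check_replacement(member, candidate):
    """Return a list of problems with using `candidate` to replace `member`."""
    problems = []
    if candidate.interface != member.interface:
        problems.append("blocker: interface differs from the array members")
    if candidate.capacity_gb < member.capacity_gb:
        problems.append("blocker: smaller than the existing members")
    if candidate.rpm != member.rpm:
        problems.append("warning: spin speed differs from the array members")
    return problems


# Assumed spec of the existing members: 600 GB SAS at 10K rpm (rpm is a guess).
member = Drive(capacity_gb=600, rpm=10000, interface="SAS")

for candidate in (
    Drive(500, 10000, "SAS"),   # the spare from the question, rpm assumed
    Drive(1000, 7200, "SAS"),   # a nearline drive
    Drive(900, 10000, "SAS"),   # a larger drive at the same speed
):
    print(candidate, check_replacement(member, candidate) or "ok")
```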
David, President
CERTIFIED EXPERT
Top Expert 2010

Commented:
Do NOT replace the drive. It puts you at extreme risk of data loss, because you degrade the RAID, and just ONE unreadable block then guarantees data loss.

So here is the smart move: buy a replacement, then do an in-place upgrade from RAID 5 to RAID 6. You have redundant data all of the time, and even if the drive eventually fails you still have redundant data.

(Besides, doing RAID 5 is just nuts if your system is one where you are concerned about the inconvenience of downtime, data loss, or rebuilding.)
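As a quick sanity check on that suggestion, the sketch below compares the current layout with a four-drive RAID 6 built from the same 600 GB members. The helper names are made up for illustration, and the code only does the capacity arithmetic; it says nothing about whether a given controller supports the migration.

```python
PARITY_DRIVES = {5: 1, 6: 2}   # drives' worth of parity per RAID level


def usable_gb(level, n_drives, drive_gb):
    """Usable capacity of a RAID 5 or RAID 6 set of equal-sized drives."""
    parity = PARITY_DRIVES[level]
    if n_drives < parity + 2:          # RAID 5 needs 3 drives, RAID 6 needs 4
        raise ValueError(f"RAID {level} needs at least {parity + 2} drives")
    return (n_drives - parity) * drive_gb


def failures_tolerated(level):
    """Number of simultaneous drive failures the set survives."""
    return PARITY_DRIVES[level]


for level, n in ((5, 3), (6, 4)):
    print(f"RAID {level}, {n} x 600 GB: {usable_gb(level, n, 600)} GB usable, "
          f"survives {failures_tolerated(level)} failure(s)")
```

Under these assumptions, adding one drive and moving to RAID 6 keeps the same 1200 GB of usable space while tolerating a second failure.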
Daniel Flores Olmos, Infrastructure and Support Engineer

Author

Commented:
Thank you all,

My doubts are cleared up now, but Dell is quoting me a 3-4 week delivery time, so I'll surely be back here when it's time to rebuild the array. I hope I can keep this ticket open until the new disk arrives.
PowerEdgeTech, IT Consultant
CERTIFIED EXPERT
Top Expert 2010

Commented:
If you'd rather not wait that long, there are resellers that could get it to you in a day or two:
http://www.xbyte.com/Items.aspx?key=fr&code=457&cat=P_D_SP_HDD&grp=2&fil5=5%3a106&fil2=2%3a457&incl_m=F
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
3-4 weeks is a long time to be at risk; I'd rather fit a reconditioned one than wait that long.

I like dlethe's idea of migrating to RAID 6, although it's a bit slower, but I'd still replace the predictive-fail drive after the RAID level migration completes, which would mean buying two. At least they're 10K or 15K SAS drives, which have a lower chance of unrecoverable read errors than 7.2K disks, so RAID 5 is tolerable.
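To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The error rates are assumptions taken from typical datasheet figures (about one unrecoverable read error per 10^16 bits for 10K/15K enterprise SAS and one per 10^15 for 7.2K nearline), not values for these specific drives, and the model treats errors as independent per bit, which is a simplification.

```python
import math


def p_unreadable_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading every
    bit of the surviving drives once, as a degraded RAID 5 rebuild must."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


DRIVE_BYTES = 600 * 10**9

# Assumed datasheet-style error rates; check the actual drive specifications.
p_enterprise = p_unreadable_during_rebuild(2, DRIVE_BYTES, 1e-16)
p_nearline = p_unreadable_during_rebuild(2, DRIVE_BYTES, 1e-15)

print(f"10K/15K SAS (1 per 1e16 bits): {p_enterprise:.3%}")
print(f"7.2K nearline (1 per 1e15 bits): {p_nearline:.3%}")
```

Under these assumptions the risk per rebuild is small for a two-survivor array of 600 GB drives, and roughly ten times higher at the nearline error rate.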
