Solved

HP Proliant Server with RAID 5, possible bad drive?

Posted on 2010-08-27
2,800 Views
Last Modified: 2012-05-10
I have a server that is constantly (every few seconds) logging errors about being unable to write to disk and "Delayed Write Failed".  If you look at the array, none of the drives has an amber light; they're all green.  However, the performance of the machine also suggests a failing drive.  I ran the HP Array Diagnostics tool but am having a hard time determining which drive is failing because there is so much information in the report.  I have attached the report to this question.  If someone could look it over and tell me which drive is failing, I would greatly appreciate the help.

Thanks!!!!!!  Array-dianostics.pdf
Question by:nytekgirl
28 Comments
 
LVL 6

Accepted Solution

by:
radnbne earned 300 total points
ID: 33547611
I can see no obvious errors in the report. However, it is a hardware/RAID-level report and would not necessarily show logical errors on the disk caused by OS corruption.

1) Do you have a current backup of the server?
2) Where are the errors being reported?
3) How long has this been happening?

Regards
Steve
 
LVL 7

Author Comment

by:nytekgirl
ID: 33547643
What is the best method to find out or show "logical errors on the disk caused by OS corruption"?
 
LVL 6

Expert Comment

by:radnbne
ID: 33547648
What OS are you running?
 
LVL 7

Author Comment

by:nytekgirl
ID: 33547723
Windows 2000 Server...
 
LVL 6

Expert Comment

by:radnbne
ID: 33547799
Are the errors in the event log?

You could try a chkdsk on the drive and see if any problems are reported.
 
LVL 23

Expert Comment

by:ComputerTechie
ID: 33548027

Back up before running check disk: chkdsk /f

CT
 
LVL 11

Expert Comment

by:gmbaxter
ID: 33548162
Does the RAID controller have a battery (this being RAID 5)? If so, it may need replacing, and that may be what is causing the disk I/O issues.
 
LVL 5

Expert Comment

by:DanMar
ID: 33548395
Hi Nytekgirl,
The report does show errors for some of the drives:
Last Failure Reason: 0x0D (Drive hardware error) comes up for Drive ID1 (page 16) and ID4 (page 19).  The other drives show 0x30 (A drive insertion has been detected) and 0x00 (Drive has not failed).
Now you know which disks are problematic (please note ID1 is not the first disk).  The array doesn't look totally healthy and may fail at some point.  Solving this properly would involve updating the firmware on each drive, running full scans, and rebuilding the array.  Considering this is a Windows 2000 Server box, I suggest initiating a migration to new hardware at this point.
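
For anyone facing a similarly long diagnostics report, the failure-reason codes quoted above can be pulled out with a short script rather than read page by page. This is only a rough sketch: the "Drive ID n ... Last Failure Reason: 0xNN" line layout is an assumption (the real ADU report may be formatted differently), and the sample lines are made up for illustration, not copied from the attachment.

```python
import re

# Codes and descriptions as quoted from the report in this thread.
FAILURE_REASONS = {
    0x00: "Drive has not failed",
    0x0D: "Drive hardware error",
    0x30: "A drive insertion has been detected",
}

# Hypothetical line layout; the real ADU report may be formatted differently.
LINE_RE = re.compile(r"Drive ID\s*(\d+).*?Last Failure Reason:\s*0x([0-9A-Fa-f]+)")


def last_failure_reasons(report_text):
    """Map each drive ID found in the text to its last failure reason code."""
    reasons = {}
    for line in report_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            reasons[int(match.group(1))] = int(match.group(2), 16)
    return reasons


def hardware_error_drives(reasons):
    """Return the drive IDs whose last failure reason is 0x0D."""
    return sorted(drive for drive, code in reasons.items() if code == 0x0D)


# Made-up sample lines in the assumed layout, not copied from the attachment.
sample = """\
Drive ID 0  Last Failure Reason: 0x30 (A drive insertion has been detected)
Drive ID 1  Last Failure Reason: 0x0D (Drive hardware error)
Drive ID 4  Last Failure Reason: 0x0D (Drive hardware error)
"""

reasons = last_failure_reasons(sample)
for drive, code in sorted(reasons.items()):
    print(f"ID{drive}: 0x{code:02X} {FAILURE_REASONS.get(code, 'unknown code')}")
print("Hardware errors on:", hardware_error_drives(reasons))
```

A "last failure reason" is historical, so a flagged drive may already have been replaced; treat the output as a list of drives to look at, not a verdict.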
 
 
LVL 55

Expert Comment

by:andyalder
ID: 33548975
@gmbaxter - no battery fitted according to ADU report.

I can't see anything wrong at first glance. The disks mentioned above look like they have since been replaced, although I'd have to read through the report again when I have more time to confirm.

Can you list a few of the event log errors, please?
 
LVL 7

Author Comment

by:nytekgirl
ID: 33550153
Thanks guys. I am going to update the drivers for the controller card, run CHKDSK, and get back to you soon with the errors from the event log.  They are mostly Delayed Write Failed errors, but I can get you more information from the logs.
 
LVL 7

Author Comment

by:nytekgirl
ID: 33554422
I ran chkdsk /r and several errors were found and repaired; however, we are still receiving the same errors as before, about every minute.  I don't have access to the event logs at the moment, but I remember the error was something to the effect of:

Windows - Delayed Write Failed : Windows was unable to write to \\device\harddiskvolume0. This data has been lost.

Any other suggestions for how I can figure out what exactly is failing?

Thanks!
 
LVL 7

Expert Comment

by:D_Vante
ID: 33554729
I believe you will need the memory stick and battery backup on the RAID controller.  Is there a large chunk of data you can take off the drives?  If you do this and the errors go away, then that is your problem.
 
LVL 7

Author Comment

by:nytekgirl
ID: 33554786
These servers have been in place for a long time, and we never had this problem until this week.  The server is also performing very slowly, as if it is indeed having drive issues, so I feel the battery for the RAID controller is not the issue.
 
LVL 6

Expert Comment

by:radnbne
ID: 33554809
Delayed write failures can be caused by a number of things, including:

Failing hard drive media
Faulty Controller
Faulty RAM
Faulty drivers

I have re-read the log file you posted and can see that each of the drives has reported some errors, with drive IDs 3 and 4 reporting the highest number.  That said, none of the drives is reporting current errors, and none appears to have been rebuilt.

If you can post the details from the event logs it might help us to pinpoint the issue further.


 
LVL 7

Author Comment

by:nytekgirl
ID: 33554886
It is really only one error....


Windows - Delayed Write Failed : Windows was unable to write to \\device\harddiskvolume0. This data has been lost.

 
LVL 6

Expert Comment

by:radnbne
ID: 33554951
I am thinking it's backup and rebuild time for your Windows 2000 server.  You wanted to upgrade it anyway didn't you :-)

You may have more than one drive failing in the array; you could also have a logical fault that is not being detected by the controller.

If this were a desktop PC I would suggest running a hard disk repair tool called HDD Regenerator across the drive, but as it is a RAID 5 config, I suggest a reformat and a rebuild.

As you are running Windows 2000 Server it might be time to upgrade.

Any other thoughts from the experts?
 
LVL 55

Expert Comment

by:andyalder
ID: 33556816
Look at the Integrated Management Log under System Management Homepage; that should tell you if there is a problem with the controller. You can ignore the odd error on individual disk drives since RAID takes care of those. Lack of a battery doesn't affect data integrity: without a battery the write cache is disabled and the cache acts only as read cache, which costs speed but doesn't put data at risk.

You could replace the controller just to be sure but if it was going bad I'd expect delayed write failure on volume 1 as well as volume 0 as they're both on the same spindles.
 
LVL 1

Expert Comment

by:ted_sin
ID: 33556878
That specific error can also be caused by bad sectors. Run chkdsk /r to scan for them.
 
LVL 7

Author Comment

by:nytekgirl
ID: 33558625
See above; I did that yesterday.  The errors were found and corrected, but we are still getting the same error about every minute.
 
LVL 7

Author Comment

by:nytekgirl
ID: 33561485
Here is the exact error:
Event ID 11 - The driver detected a controller error on \Device\Harddisk0\DR0

Thanks!
 
LVL 6

Expert Comment

by:radnbne
ID: 33563086
 
LVL 7

Author Comment

by:nytekgirl
ID: 33563475
Okay, I was able to get in front of the server today and get the actual error messages from the event log.  I updated the controller and SCSI drivers today, but unfortunately that did not help.

Event id 11:  The driver detected a controller error on \Device\Harddisk0\DR0
Event id  7:  The device \Device\HardDisk0\DR0 has a bad block.

These are the only two errors and they are appearing about every 30 seconds.
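
To put numbers on "about every 30 seconds", the events can be tallied from an export of the System log. The following is a minimal sketch only: the "timestamp,event_id,message" row format is a hypothetical export layout, and the timestamps in the sample are invented for illustration (only the two messages come from this server).

```python
from collections import Counter
from datetime import datetime


def summarize(rows):
    """Count events per ID and compute the mean gap in seconds between events."""
    counts = Counter()
    stamps = []
    for row in rows:
        # Split on the first two commas only, so commas in the message survive.
        stamp, event_id, _message = row.split(",", 2)
        counts[int(event_id)] += 1
        stamps.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    stamps.sort()
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return counts, mean_gap


# Invented timestamps in a hypothetical export format.
sample_rows = [
    "2010-08-30 10:00:00,11,The driver detected a controller error on \\Device\\Harddisk0\\DR0",
    "2010-08-30 10:00:30,7,The device \\Device\\HardDisk0\\DR0 has a bad block.",
    "2010-08-30 10:01:00,11,The driver detected a controller error on \\Device\\Harddisk0\\DR0",
    "2010-08-30 10:01:30,7,The device \\Device\\HardDisk0\\DR0 has a bad block.",
]

counts, mean_gap = summarize(sample_rows)
print(dict(counts), "mean gap (s):", mean_gap)
```

Running the same tally before and after each change (driver update, chkdsk, hardware swap) gives a simple way to tell whether the change made any difference.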

Thanks!
 
LVL 6

Expert Comment

by:radnbne
ID: 33563504
These errors indicate hardware failure.

Since you are getting both a bad block error and a controller error, I expect the controller is failing.  Do you have a spare controller?
 
LVL 7

Author Comment

by:nytekgirl
ID: 33563521
I was thinking the same thing. This is for a client of mine, so I will pick one up, try it out tomorrow, and get back to you.
 
LVL 7

Author Comment

by:nytekgirl
ID: 33579779
Before we buy a new controller card and try that route, my boss suggested we swap out the C: drive, since most of the errors occur when the OS is trying to write to that disk.  If we swap out the C: drive (which is not part of the array) and reinstall the OS, are there any particular measures we need to take, since we also have a RAID 5 array for storage?
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 100 total points
ID: 33580907
I think you'll find the C and D drives share the same spindles; look in the ACU and take a screenshot of the logical view.
 
LVL 5

Assisted Solution

by:DanMar
DanMar earned 100 total points
ID: 33585948
Hi Nytekgirl,
Are you sure that C: is not part of the RAID 5?  Often arrays are set up in hardware, and all the drive letters presented sit on the array.
As mentioned, 2 disks are showing errors.  This array is destined to fail at some point, as losing 2 disks means losing all the data on the array.  If you do not want to migrate to new hardware, you should back up all the data (e.g. to an external USB drive), completely reinitialise the RAID 5 array, reinstall the OS, and then copy the data back.  Going down this path will extend the life of your system somewhat.
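
The reason a second failed disk is fatal to RAID 5 can be shown with plain XOR parity. This is a simplified sketch, not how the controller is implemented: it uses one tiny stripe with made-up block contents and ignores the fact that real RAID 5 rotates the parity block across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Lose any single block: XOR of the survivors reproduces it exactly.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)
print("single loss recovered:", rebuilt == stripe[lost])

# Lose two blocks: the survivors only give the XOR of the two missing
# blocks combined, so neither block can be recovered on its own.
survivors_after_two = stripe[2:]
combined = xor_blocks(survivors_after_two)
print("double loss yields only the XOR of both:",
      combined == xor_blocks([stripe[0], stripe[1]]))
```

With one drive already marginal, the array has no margin left if a second one drops out, which is why taking a backup before any rebuild matters.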
 
LVL 7

Author Closing Comment

by:nytekgirl
ID: 33608453
Turns out the SCSI Drives in the RAID Array were fine.  Replaced the C Drive and am now good to go.  Thanks to all.