Proliant ML330 G3 - ATA RAID Issue

I am running an HP/Compaq ProLiant ML330 G3 server with onboard ATA RAID. The system has 2x80GB HDs as one RAID1 logical drive and 2x160GB HDs as another RAID1 logical drive. The system began posting errors to the event log yesterday, and the monitoring utilities were reporting one failed drive, then another, then both would come back online. Both were the 160GB drives on the secondary RAID controller, on the same cable and set to cable select (CS).

I have updated the drivers, controller firmware, and changed the HD cable itself, per HP tech support. The first two errors are now only on system startup, which takes 10 minutes. The server is running now and seems to be stable. I suspect a problem with the onboard RAID controller. Has anyone else seen something like this?

First error:

Event Type:      Error
Event Source:      LsiCsb6
Event Category:      None
Event ID:      9
Date:            6/5/2006
Time:            12:03:35 PM
User:            N/A
Computer:      WVFP01
Description:
The device, \Device\Scsi\LsiCsb61, did not respond within the timeout period.

Second error: (there are lots of these)

Event Type:      Error
Event Source:      Disk
Event Category:      None
Event ID:      11
Date:            6/5/2006
Time:            12:03:35 PM
User:            N/A
Computer:      WVFP01
Description:
The driver detected a controller error on \Device\Harddisk1.

Failed drive error:

Event Type:      Error
Event Source:      Storage Agents
Event Category:      Events
Event ID:      1186
Date:            6/5/2006
Time:            12:58:53 PM
User:            N/A
Computer:      WVFP01
Description:
IDE ATA Disk Status Change. The ATA disk drive with model ST3160021A and serial number 5JS4667D has a new status of 4.
(ATA disk status values: 1=other, 2=ok, 3=smartError, 4=failed)
[SNMP TRAP: 14004 in CPQIDE.MIB]

Recovered drive message:

Event Type:      Information
Event Source:      Storage Agents
Event Category:      Events
Event ID:      1186
Date:            6/5/2006
Time:            1:21:37 PM
User:            N/A
Computer:      WVFP01
Description:
IDE ATA Disk Status Change. The ATA disk drive with model ST3160021A and serial number 5JS4667D has a new status of 2.
(ATA disk status values: 1=other, 2=ok, 3=smartError, 4=failed)
[SNMP TRAP: 14004 in CPQIDE.MIB]
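For anyone wanting to track how often a drive flips between failed and ok, here is a minimal Python sketch that reads Storage Agents descriptions like the two above. The status names come from the event text itself; the function names `parse_status_change` and `count_flaps` are made up for illustration, and the sketch assumes the descriptions have already been exported from the event log as plain strings in time order.

```python
import re

# Status codes exactly as listed in the event description text.
ATA_STATUS = {1: "other", 2: "ok", 3: "smartError", 4: "failed"}

# Matches the wording of the "IDE ATA Disk Status Change" description.
_PATTERN = re.compile(
    r"model (?P<model>\S+) and serial number (?P<serial>\S+) "
    r"has a new status of (?P<status>\d+)"
)


def parse_status_change(description):
    """Return model, serial and status name, or None if the text does not match."""
    match = _PATTERN.search(description)
    if match is None:
        return None
    code = int(match.group("status"))
    return {
        "model": match.group("model"),
        "serial": match.group("serial"),
        "status": ATA_STATUS.get(code, "unknown"),
    }


def count_flaps(descriptions):
    """Count failed -> ok transitions per serial number (input in time order)."""
    last_status = {}
    flaps = {}
    for text in descriptions:
        event = parse_status_change(text)
        if event is None:
            continue  # not a disk status change event
        serial = event["serial"]
        if last_status.get(serial) == "failed" and event["status"] == "ok":
            flaps[serial] = flaps.get(serial, 0) + 1
        last_status[serial] = event["status"]
    return flaps
```

A drive that repeatedly goes failed and then ok within minutes, as in the two events above, would show up with a non-zero count here.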

ASKER CERTIFIED SOLUTION

rindi


pgm554

I would run disk diags along with controller diags from the HP site.

My first guess is that your drive(s) are going bad.

Just for the heck of it I would change from cable select to master/slave.

pgm554

If you really want to do ATA RAID correctly, I would add another RAID controller, like rindi explained. Having 2 IDE drives hanging off the same channel is not a good idea.
IDE is cheap, but if you want to do RAID right, go SCSI.

wdhanson94

ASKER

Thanks to all for your help, and quick responses.

The server seems to be running fine now. I believe the problem started when one of my users accidentally moved a very large directory into an adjacent directory: about 20GB and thousands of files began moving to the other folder. The RAID controller was so busy replicating all the changes to the drives that it was slow to respond to the monitoring software. It seems that no hardware failure ever occurred.

This server will be phased out later this year. My users seem to be pushing it to its limits. The new server will definitely be SCSI.

Thanks again!