• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2433

Proliant ML330 G3 - ATA RAID Issue

I am running an HP/Compaq ProLiant ML330 G3 server with onboard ATA RAID.  The system has 2x80GB HDs as one RAID1 logical drive and 2x160GB HDs as another RAID1 logical drive.  The system began posting errors to the event log yesterday, and the monitoring utilities reported one failed drive, then another, and then both would come back online.  Both were the 160GB drives on the secondary RAID controller, on the same cable and set to cable select (CS).

I have updated the drivers and controller firmware and replaced the HD cable itself, per HP tech support.  The first two errors now appear only during system startup, which takes 10 minutes.  The server is running now and seems to be stable.  I suspect a problem with the onboard RAID controller.  Has anyone else seen something like this?

First error:

Event Type:      Error
Event Source:      LsiCsb6
Event Category:      None
Event ID:      9
Date:            6/5/2006
Time:            12:03:35 PM
User:            N/A
Computer:      WVFP01
Description:
The device, \Device\Scsi\LsiCsb61, did not respond within the timeout period.

Second error:  (there are lots of these)

Event Type:      Error
Event Source:      Disk
Event Category:      None
Event ID:      11
Date:            6/5/2006
Time:            12:03:35 PM
User:            N/A
Computer:      WVFP01
Description:
The driver detected a controller error on \Device\Harddisk1.

Failed drive error:

Event Type:      Error
Event Source:      Storage Agents
Event Category:      Events
Event ID:      1186
Date:            6/5/2006
Time:            12:58:53 PM
User:            N/A
Computer:      WVFP01
Description:
IDE ATA Disk Status Change.  The ATA disk drive with model ST3160021A and serial number 5JS4667D has a new status of 4.
(ATA disk status values: 1=other, 2=ok, 3=smartError, 4=failed)
[SNMP TRAP: 14004 in CPQIDE.MIB]

Recovered drive message:

Event Type:      Information
Event Source:      Storage Agents
Event Category:      Events
Event ID:      1186
Date:            6/5/2006
Time:            1:21:37 PM
User:            N/A
Computer:      WVFP01
Description:
IDE ATA Disk Status Change.  The ATA disk drive with model ST3160021A and serial number 5JS4667D has a new status of 2.
(ATA disk status values: 1=other, 2=ok, 3=smartError, 4=failed)
[SNMP TRAP: 14004 in CPQIDE.MIB]
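
The two Storage Agents events above differ only in the status number, so the flip from failed back to ok can be picked out mechanically. The following is a minimal sketch in Python, assuming the description text has already been exported from the event log as a plain string. The function names and the regular expression are hypothetical helpers written for this illustration and are not part of any HP tool; only the status table is taken from the event text itself.

```python
import re

# Status values exactly as listed in the event description above.
ATA_DISK_STATUS = {1: "other", 2: "ok", 3: "smartError", 4: "failed"}

# Hypothetical pattern, written against the description text quoted above.
_STATUS_RE = re.compile(
    r"model (?P<model>\S+) and serial number (?P<serial>\S+) "
    r"has a new status of (?P<status>\d+)"
)


def parse_status_change(description):
    """Return (model, serial, status_name), or None if the text does not match."""
    match = _STATUS_RE.search(description)
    if match is None:
        return None
    code = int(match.group("status"))
    name = ATA_DISK_STATUS.get(code, "unknown")
    return match.group("model"), match.group("serial"), name


def is_flapping(status_names):
    """True if a drive was reported 'failed' and later reported 'ok' again."""
    seen_failed = False
    for name in status_names:
        if name == "failed":
            seen_failed = True
        elif name == "ok" and seen_failed:
            return True
    return False
```

Feeding the status names from the 12:58:53 PM and 1:21:37 PM events into `is_flapping` in time order would flag this drive as one that failed and then recovered, which is the pattern described in the question.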

Asked by: wdhanson94
1 Solution
 
rindi commented:
First, I would not put all the disks of the same array on the same IDE channel (according to you, the 160GB disks are on the same cable, set to CS). An IDE channel can only access one connected device at a time. In RAID 1 both drives are normally accessed at the same time, so with both on the same channel the controller has to access first one drive, then the other, back and forth, which drastically reduces speed. This could also be the cause of your problem, because of the extra work the disks and controller have to do.
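
To put rough numbers on the point about a shared channel, here is a toy model in Python. It assumes only what is stated above, that a channel serves one device at a time, and it ignores command overhead, caching, and bus speed. The function name and the timings are made up for illustration and are not measurements from this server.

```python
def mirrored_write_time(per_drive_seconds, shared_channel):
    """Toy estimate of the time to write one block to every drive in a RAID 1 set.

    per_drive_seconds: seconds each drive needs for the write (made-up inputs).
    shared_channel: True if all drives sit on the same IDE cable.
    """
    if shared_channel:
        # One device at a time: the writes happen back to back.
        return sum(per_drive_seconds)
    # Separate channels: the writes overlap, so the slowest drive sets the pace.
    return max(per_drive_seconds)


# Two mirrored drives, each assumed to need half a second for the same write.
same_cable = mirrored_write_time([0.5, 0.5], shared_channel=True)
separate_cables = mirrored_write_time([0.5, 0.5], shared_channel=False)
```

Under these assumptions the mirrored write takes twice as long on a shared cable as it does with one drive per channel, which is why a large file move can keep the controller busy for so long.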

Also download the Seagate utilities and test the drives thoroughly with them. Check the server's power supply too; temporary failures are often caused by a bad power supply.
 
pgm554 commented:
I would run disk diags along with controller diags from the HP site.

My first guess is that your drive(s) are going bad.

Just for the heck of it I would change from cable select to master/slave.

 
pgm554 commented:
If you really want to do ATA RAID correctly, I would add another RAID controller. As rindi explained, having 2 IDE drives hanging off the same channel is not a good idea.
IDE is cheap, but if you want to do RAID right, go SCSI.
 
wdhanson94 (author) commented:
Thanks to all for your help, and quick responses.

The server seems to be running fine now.  I believe the problem started when one of my users accidentally moved a very large directory into an adjacent directory: about 20GB and thousands of files' worth of data began moving to the other folder.  The RAID controller was so busy replicating all the changes to the drives that it was slow to respond to the monitoring software.  It seems that no hardware failure ever occurred.

This server will be phased out later this year.  My users seem to be pushing it to its limits.  The new server will definitely be SCSI.

Thanks again!