asked on

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

Have an HP Proliant ML350 G4 that's giving me some hard drive grief. The server has 6x72.8GB 10K ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow. Drive 4 is running at Ultra 320 Wide (!). Drive 5 is running at Ultra 2 Wide.

I have updated all firmware on the system and hard drives, and pulled them all out and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage have a Critical saying U160 Disabled.

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 & 5 have a timeout as the last failed reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month, not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.

report-c7380d42-00001f20-0000000.zip

SOLUTION

SysExpert

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER CERTIFIED SOLUTION

David

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Member_2_231077

Never seen the count for "bus faults" being so high. There's no external SCSI connector on the controller so no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable, maybe it's damaged.

gozoliet

ASKER

Just some thoughts from me...
Andy - you're right about the error counts.. Lots of errors all over the place, which gives the downshifting some reason. Unfortunately it's not like 1 drive has a higher amount of errors, it's pretty consistent across disks on the same port (except 4/5?? still a mystery).

dlethe - the storage bay is internal, they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives.. not mine..

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim. The handful of threads on google have mentionned everything from mainboard to power supply so lots of part-swapping. ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.

SOLUTION

Member_2_231077

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

gozoliet

ASKER

Just an update.. or a non-update. Have reseated everything and still experiencing problem.
Don't have a spare PSU to test with at this point in time. Not quite sure how I would go about testing the hard drives individually without disturbing the raid.. If I replug a drive onto the onboard controller and boot off something to run smartmon on that drive, can I replace the hard drive in the array after and everything works as is?

gozoliet

ASKER

Never really figured it out, still a problem, but your tips were good at helping me look further into it.