Solved

Proliant ML350 G4, Ultra320 Drives downshifted to Ultra 2 Narrow

Posted on 2010-09-13
Last Modified: 2012-05-10
I have an HP ProLiant ML350 G4 that's giving me some hard drive grief. The server has six 72.8 GB 10K Ultra320 SCSI drives.

It all starts with an Event on bootup from CPQCISSE, Event ID: 24683
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller  in slot 3.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

When I look at my drive configuration, I see that drives 0, 1, 2, 3 have been downshifted to Ultra 2 Narrow.  Drive 4 is running at Ultra 320 Wide (!).  Drive 5 is running at Ultra 2 Wide.
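To put those modes in perspective, here is a minimal sketch that maps each reported mode to its nominal peak bus rate and compares it with Ultra 320 Wide. The figures are the standard theoretical maxima for parallel SCSI (40, 80, 160 and 320 MB/s), not measured throughput, and the names `NOMINAL_MBPS`, `REPORTED` and `downshift_report` are made up for this illustration.

```python
# Nominal peak bus rates (MB/s) for the parallel SCSI modes mentioned above.
# These are theoretical bus maxima, not what a single drive will sustain.
NOMINAL_MBPS = {
    "Ultra 2 Narrow": 40,
    "Ultra 2 Wide": 80,
    "Ultra 160 Wide": 160,
    "Ultra 320 Wide": 320,
}

# Negotiated mode per drive, as described in the post (hypothetical structure).
REPORTED = {
    0: "Ultra 2 Narrow",
    1: "Ultra 2 Narrow",
    2: "Ultra 2 Narrow",
    3: "Ultra 2 Narrow",
    4: "Ultra 320 Wide",
    5: "Ultra 2 Wide",
}


def downshift_report(reported, expected="Ultra 320 Wide"):
    """Return (drive, mode, nominal MB/s, fraction of expected rate) per drive."""
    full_rate = NOMINAL_MBPS[expected]
    rows = []
    for drive, mode in sorted(reported.items()):
        rate = NOMINAL_MBPS[mode]
        rows.append((drive, mode, rate, rate / full_rate))
    return rows


if __name__ == "__main__":
    for drive, mode, rate, fraction in downshift_report(REPORTED):
        print(f"drive {drive}: {mode:<15} {rate:>3} MB/s ({fraction:.1%} of U320)")
```

On these nominal numbers, drives 0 to 3 are negotiating at one eighth of the Ultra320 bus rate and drive 5 at one quarter.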

I have updated all firmware on the system and hard drives, pulled all the drives out, and reseated the cables.

Looking at the diagnostics, I see that the Storage Enclosure and Drive Cage show a Critical status saying "U160 Disabled".

On the physical drive there is a warning flag about the drives being downshifted to Ultra 2.

I do see that drives 4 and 5 have a timeout as the last failure reason. I suspect the timeout on drive 4 is from when I pulled it out while the system was on earlier this month; I'm not sure about the timeout on drive 5.

Any suggestions? Attached is the diagnostics report.


 report-c7380d42-00001f20-0000000.zip
Question by:gozoliet
7 Comments
 
LVL 63

Assisted Solution

by:SysExpert
SysExpert earned 166 total points
I would run diagnostics on the drives and the controller, and consider doing a rebuild by swapping out 1 drive at a time, or a full backup and restore to a new set of drives. If the controller is bad, replace it.

I would also swap cables, and maybe even the backplane.

If the server is under warranty, contact HP, since the errors are a little weird and it is hard to say what the exact cause is.



I hope this helps!
 
LVL 47

Accepted Solution

by:dlethe
dlethe earned 167 total points
Noisy signals and poorly shielded cables, along with connectors that are not tightened, will do this. At least give parallel SCSI credit: instead of just dying, it degrades speed so that everything can still talk. Since you have external enclosures, my advice is to make sure the cables are as short as possible. Not all cabling is suitable for U320 at full distance; cables are not all created equal. Updating firmware has nothing to do with this problem.

You *could* run some software on a JBOD controller attached to the enclosure and add disks one at a time until the reported speed drops from U320. Once you see which disk does it, move things around to see whether the problem is specific to the disk or to the bay. You have to use a JBOD controller, and this is the only way to do it: you don't want to be moving things around behind a RAID controller, as this would fool the controller into thinking you had drive failures and would kick off a rebuild.
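The disk-versus-bay reasoning above can be sketched as a small bookkeeping exercise. This is only an illustration under assumptions: the observations are entered by hand from whatever tool reports the negotiated speed, and the names `classify_fault`, `FULL_SPEED` and the `"U320"`/`"U2"` labels are hypothetical, not output from any real utility.

```python
from collections import defaultdict

FULL_SPEED = "U320"  # label used here for "negotiated at full speed" (assumption)


def classify_fault(observations):
    """Decide whether slow links follow a disk or a bay.

    observations: iterable of (disk_id, bay_id, negotiated_mode) tuples,
    one per test. A disk (or bay) is only flagged if it was tried in at
    least two positions and never reached full speed in any of them.
    Returns (suspect_disks, suspect_bays).
    """
    by_disk = defaultdict(list)
    by_bay = defaultdict(list)
    for disk, bay, mode in observations:
        ok = mode == FULL_SPEED
        by_disk[disk].append(ok)
        by_bay[bay].append(ok)

    suspect_disks = sorted(d for d, results in by_disk.items()
                           if len(results) >= 2 and not any(results))
    suspect_bays = sorted(b for b, results in by_bay.items()
                          if len(results) >= 2 and not any(results))
    return suspect_disks, suspect_bays


if __name__ == "__main__":
    # Disk "B" is slow wherever it goes, so the fault follows the disk.
    tests = [("A", 0, "U320"), ("B", 1, "U2"), ("B", 0, "U2"), ("A", 1, "U320")]
    print(classify_fault(tests))
```

If instead every disk drops speed in the same bay, the second list is the one that fills in, which points at the bay, backplane, or cabling rather than a drive.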

Anyway, that is what I do in similar situations. (See http://www.santools.com/smart/unix/manual/linkspeedreporting.htm)
The software will set you back $100, but it will absolutely identify link speed issues. If you have the cabling, swap everything, but if you want to diagnose and confirm you'll need some software. You could also have an issue where the link speed decreases only a few seconds, minutes, or longer after everything is hooked up. To diagnose that with the software, hook it all up, run a read-only benchmark of your choice (do not mount the disks, just do raw read I/O to the physical devices), and check link speed every 10 minutes. Something else may be going on.
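For the read-only load itself, a minimal sketch along these lines would do, assuming a Unix-like system where the disk shows up as a raw device node (for example something like `/dev/sdb`, which is an assumption about your setup) and that you run it with enough privileges to read the device. It only generates sequential raw reads and reports throughput; it does not query the negotiated link speed, so you still need a separate tool for that. The function name `raw_read_throughput` is made up for this example.

```python
import os
import time


def raw_read_throughput(path, duration_s=10.0, block_size=1024 * 1024):
    """Sequentially read `path` read-only for about `duration_s` seconds.

    Returns (bytes_read, megabytes_per_second). The device is opened with
    O_RDONLY only, so nothing is ever written to it.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < duration_s:
            chunk = os.read(fd, block_size)
            if not chunk:
                # Reached the end of the device: wrap around and keep reading.
                os.lseek(fd, 0, os.SEEK_SET)
                continue
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start
    return total, total / elapsed / 1e6


if __name__ == "__main__":
    import sys
    # Usage (device path is an example): python rawread.py /dev/sdb
    nbytes, mbps = raw_read_throughput(sys.argv[1], duration_s=30.0)
    print(f"read {nbytes} bytes at {mbps:.1f} MB/s")
```

Reads that hit the operating system's cache will inflate the number, so treat it as a way to keep the bus busy while you watch link speed, not as a precise benchmark.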
 
LVL 55

Expert Comment

by:andyalder
I've never seen the count for "bus faults" so high. There's no external SCSI connector on the controller, so there's no chance anything's accidentally plugged in there, just the 6 internal drive bays. I'd take the lid off and reseat the cable; it may be damaged.

 
LVL 4

Author Comment

by:gozoliet
Just some thoughts from me...
Andy - you're right about the error counts. There are lots of errors all over the place, which goes some way to explaining the downshifting. Unfortunately it's not as if one drive has a higher number of errors; it's pretty consistent across disks on the same port (except 4 and 5, which are still a mystery).

dlethe - the storage bay is internal; they're hot-swappable (or hot-plug) hard drives on the front of an HP. There was a firmware release that corrected a "timeout" issue which caused the controller to downshift, but that's for 15k drives, not mine.

I'll try to dig out another cable. Otherwise I don't *really* want to shell out some cash to try a new backplane on a whim. The handful of threads on Google have mentioned everything from mainboard to power supply, so lots of part-swapping. Ugh.

The suggestions are great.. we'll see if more come in or if I isolate better.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 167 total points
At least you can systematically eliminate the disks if you have time to spin them up one at a time as dlethe suggested; you can always boot the SmartStart CD and use the ADU there if you don't want to buy software. You can also eliminate the Smart Array controller, since there's an onboard non-RAID SCSI chip on the motherboard. Make sure you number the disks. A PSU fault sounds very possible because the non-SCSI-bus-related errors are at silly levels as well.
 
LVL 4

Author Comment

by:gozoliet
Just an update, or a non-update. I have reseated everything and am still experiencing the problem.
I don't have a spare PSU to test with at this point in time. I'm not quite sure how I would go about testing the hard drives individually without disturbing the RAID. If I plug a drive into the onboard controller and boot off something to run smartmon on that drive, can I put the drive back in the array afterwards and have everything work as before?
 
LVL 4

Author Closing Comment

by:gozoliet
Never really figured it out, still a problem, but your tips were good at helping me look further into it.