
Repeated SCSI drive failures with Windows Server 2003

Posted on 2003-12-03
Last Modified: 2012-06-27
Hey Guys,

I'm having problems with drives failing on reboots. I'm using Windows Server 2003 Enterprise Edition, IBM Ultrastar 160 drives, an Adaptec 2010S RAID controller, and a Supermicro motherboard and backplane.

In the past two weeks, we've had 3 drive failures, all on reboots. We are running a RAID 5 configuration, so we haven't lost any data, and we have caught it every time. Basically, are there any known issues or compatibility issues with this hardware?

Thanks. This is an EXTREMELY important question, so it's valued at 500 points.
Question by:johnstaggs
18 Comments
 
LVL 18

Expert Comment

by:chicagoan
ID: 9869604
Did the drives really fail, or are they failing to come ready?
I've seen 160s and 320s in hot-swap cages not come ready from a cold start, but work fine if they're inserted into the cage after the system has powered up and been warm-booted.
Sometimes you see LUN=0 BUS=0 ID=0 Bad SCSI Status – Check Condition messages.
Setting the system BIOS to do a full POST sometimes helps, especially if there is a lot of RAM.
These drives can take up to 20 seconds to spin up.
Check the auto spin-up jumper settings, though setting all the drives to auto spin-up can put a severe load on the PSU.
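A "Check Condition" status means the drive returned sense data, and that data is what separates a drive that is merely not ready yet from one with a real hardware fault. Below is a minimal sketch of decoding fixed-format SCSI sense data (response code 0x70/0x71; sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13). It assumes you can capture the raw sense bytes from a controller or driver log; how to get them out of the Adaptec 2010S is not covered here, and the lookup tables only include a few codes relevant to this thread.

```python
# Sketch: decode fixed-format SCSI sense data to tell "not ready" apart
# from a hard failure. Tables are partial; consult the SPC ASC/ASCQ list
# for codes not listed here.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
}


def decode_fixed_sense(sense: bytes):
    """Return (sense_key, asc, ascq, key_name, asc_text) for fixed-format sense."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F  # bit 7 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data (code {response_code:#04x})")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    key_name = SENSE_KEYS.get(key, "OTHER")
    asc_text = ASC_ASCQ.get((asc, ascq), "see SPC ASC/ASCQ table")
    return key, asc, ascq, key_name, asc_text


def looks_not_ready(sense: bytes) -> bool:
    """True for NOT READY with ASC 04h (logical unit not ready), e.g. still spinning up."""
    key, asc, _, _, _ = decode_fixed_sense(sense)
    return key == 0x2 and asc == 0x04


# Hypothetical 18-byte sample: NOT READY, 04h/01h (becoming ready).
sample = bytes([0x70, 0x00, 0x02, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x04, 0x01, 0, 0, 0, 0])
print(decode_fixed_sense(sample), looks_not_ready(sample))
```

A drive that reports NOT READY with ASC 04h at boot fits the "failing to come ready" theory; a HARDWARE ERROR or MEDIUM ERROR sense key would point the other way.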

 

Author Comment

by:johnstaggs
ID: 9869682
Well, when the system boots, the drive has no lights on whatsoever. Then, after Windows boots up, we go into SMOR (Adaptec's software) to look at the config, and the drive with no lights on is shown as a red drive and marked "failed".
 

Author Comment

by:johnstaggs
ID: 9869697
One thing we did notice with the Ultrastar drives is that some are different models than the others. Only one model that we have is on the Windows HCL for Windows Server 2003. Could this be a potential pitfall?
 
LVL 18

Expert Comment

by:chicagoan
ID: 9869856
Are these in a hot-swap cage? Whose?

What have you done with the failed drives?
 

Author Comment

by:johnstaggs
ID: 9870075
Yes, SuperMicro.

We've reformatted them and given them another try, and they can be used again.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9870105
Which lends credence to them being OK and just not ready.
I'd check with SuperMicro and see what they suggest for getting the drives initialized before the OS looks at the array.
 

Author Comment

by:johnstaggs
ID: 9870516
The drive is initializing when the system boots up, because you can see it during POST. And if you go into SMOR (BIOS), it shows the RAID as "degraded". But maybe I'm not quite getting what you're saying.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9871411
If the drives test out OK afterward, something's going on at boot time that makes them unavailable to the array.
I'd really be suspicious of the backplane/disk enclosure here, and I'd see what the manufacturer has to say.
 
LVL 47

Expert Comment

by:dbrunton
ID: 9871750
>> Which lends credence to them being OK and just not ready.

I'll add a couple of comments to this statement.

I'd be looking at the SCSI cable and the host adapter in this case, and possibly the power supply; it may not be capable of supplying the power required for everything.
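The power-supply concern, together with chicagoan's earlier warning about auto spin-up, comes down to simple arithmetic on the 12 V rail: drives draw far more current while spinning up than once they are running. Below is a rough sketch of that budget. The per-drive currents are hypothetical placeholders, not Ultrastar specifications; take the real 12 V spin-up and operating figures from the drive datasheet and the rail rating from the PSU label.

```python
# Rough 12 V power-on budget: all drives spinning up at once vs. staggered.
# HYPOTHETICAL figures; replace with values from the drive datasheet.
SPINUP_12V_A = 2.0  # assumed peak 12 V current per drive during spin-up
IDLE_12V_A = 0.6    # assumed 12 V current per drive once it is spinning


def peak_12v_amps(n_drives: int, staggered: bool) -> float:
    """Worst-case 12 V drive current during power-on."""
    if n_drives <= 0:
        return 0.0
    if staggered:
        # One drive starts at a time; the worst moment is the last drive
        # spinning up while all the others are already running.
        return SPINUP_12V_A + (n_drives - 1) * IDLE_12V_A
    # Every drive spins up at the same instant.
    return n_drives * SPINUP_12V_A


def fits_rail(n_drives: int, staggered: bool, rail_amps: float,
              other_load_amps: float = 0.0) -> bool:
    """True if the drive peak plus other 12 V loads stays within the rail rating."""
    return peak_12v_amps(n_drives, staggered) + other_load_amps <= rail_amps


# Hypothetical example: 6 drives, 18 A rail, 8 A of other 12 V load (CPUs, fans).
for mode in (False, True):
    print("staggered" if mode else "simultaneous",
          peak_12v_amps(6, mode), fits_rail(6, mode, 18.0, 8.0))
```

With numbers like these, simultaneous spin-up can exceed a rail that comfortably handles a staggered start, which is why delayed or remote spin-up jumpers matter in a full hot-swap cage.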
 

Author Comment

by:johnstaggs
ID: 9874340
Hey guys, I'm going to get hold of SuperMicro today, so bear with me. But the backplane is a good possibility, since I had already called Adaptec and they said it could be the problem. Another thing: I had a single drive in a dual-Xeon box (the exact same type of setup). That machine did not have a RAID controller in it (it wasn't running RAID, hence the single drive), and the drive died on me in about one week.

Maybe that will lead to something else. I'm in the process of setting up another machine, which has dual backplanes and, I'm sure, a different type of RAID controller.


But all suggestions are welcome, and I really appreciate the time you guys take to help me figure out the problem.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9874450
While a drive dying in another box, especially a non-redundant drive, is a pain, I think it's just anecdotal.
You said the drives from the degraded array tested OK outside the system.
Unless these drives are all from one lot and you suspect a manufacturing defect, I'm liking the backplane/bus as the likely suspect.
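One way to test the backplane/bus theory is to record, after every reboot that degrades the array, which bay dropped out and which drive (by serial number) was in it, and to deliberately move drives between bays. If failures keep landing on the same bay with different drives, the bay or backplane is the likely suspect; if they follow one drive from bay to bay, the drive is. The sketch below is a simple heuristic tally under that assumption; the bay numbers and serials are hypothetical.

```python
from collections import Counter

# Each event: (bay_number, drive_serial) for a drive marked "failed" after a reboot.
# HYPOTHETICAL data format; record whatever bay/serial labels your enclosure uses.


def suspect(events):
    """Heuristic: does the failure follow the bay or the drive?"""
    by_bay = Counter(bay for bay, _ in events)
    by_drive = Counter(serial for _, serial in events)
    repeat_bays = {b for b, n in by_bay.items() if n > 1}
    repeat_drives = {d for d, n in by_drive.items() if n > 1}
    if repeat_bays and not repeat_drives:
        return "bay/backplane"  # different drives failing in the same bay
    if repeat_drives and not repeat_bays:
        return "drive"          # the same drive failing in different bays
    # No repeats yet, or the same drive always sat in the same bay:
    # swap drives between bays and collect more events.
    return "inconclusive"


print(suspect([(2, "SN-A"), (2, "SN-B"), (2, "SN-C")]))
print(suspect([(1, "SN-A"), (3, "SN-A")]))
```

Three failures in two weeks is a small sample, so treat any result as a pointer for the conversation with SuperMicro rather than a verdict.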
 

Author Comment

by:johnstaggs
ID: 9874524
Indeed, I was able to format the failed drive and put it back into the array, so that shows it's not really the drives. There's a good chance all the drives are from the same lot, though there are two different models of the drive that we have (the drive specs are the same, just different models).

So I should look into the backplane/bus issue, correct? And do you guys have any suggestions on how I could go about testing it?

(By the way, that non-redundant drive I lost was just on a test box I had set up, so it wasn't too important, thank God.) Right now I'm not setting up any machine unless it's using RAID 5 and has two hot spares. You know, I had run these drives for quite a while on a different box that had an older motherboard, and never had a single problem. Then I moved to these new boxes that have a newer motherboard, and have had nothing but problems with the drives.

When I say older motherboard, I mean months, not years or anything, but they are different models.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9874724
That's one of those "zero-channel RAID" setups through a dedicated PCI slot?
 

Author Comment

by:johnstaggs
ID: 9874768
Yes, that is correct.
 
LVL 18

Accepted Solution

by:
chicagoan earned 500 total points
ID: 9874888
 

Author Comment

by:johnstaggs
ID: 9874970
I do have the latest 2003 drivers, but I'll have to check on the latest BIOS. Give me a few minutes, and I'll update this and let you know.
 

Author Comment

by:johnstaggs
ID: 9875093
The BIOS shows I2O v.001.62, but the date doesn't match the date on the link. So I'm going to update both of these on a new machine.
 

Author Comment

by:johnstaggs
ID: 9875514
I had the latest 2003 driver (I tried reinstalling it). The BIOS looked like the same version, but it was updated. So both of those are done.

We've got another very similar machine, and we're setting it up with RAID 5 (it has a split backplane) and two hot spares. We're going to run it for a while and see if we run into any more problems.

I'm going to go ahead and award the points to you, but if you can think of anything else to try down the road, please reply.

Thanks