Solved

Repeated SCSI drive failure with Windows 2003

Posted on 2003-12-03
Last Modified: 2012-06-27
Hey Guys,

I'm having problems with drives failing on reboots. I'm using Windows 2003 Enterprise Server, IBM Ultrastar 160 drives, an Adaptec 2010S RAID controller, and a Supermicro motherboard and backplane.

In the past two weeks, we've had 3 drive failures, all on reboots. We are running a RAID 5 config, so we haven't lost any data, and we have caught it every time. Basically, are there any known issues or compatibility issues with this hardware?

Thanks. This is an EXTREMELY important question, so it's valued at 500 points.
Question by:johnstaggs
18 Comments
 
LVL 18

Expert Comment

by:chicagoan
ID: 9869604
Did the drives really fail or are they failing to come ready?
I've seen 160's and 320's in hotswap cages not come ready from a cold start, but work fine if they're inserted into the cage after the system's power up and warm booted.
Sometimes you see LUN=0 BUS=0 ID=0 Bad SCSI Status – Check Condition messages.
Setting the system BIOS to do a full POST sometimes helps, especially if there is a lot of RAM.
These drives can take up to 20 seconds to spin up.
Check the auto spin up jumper settings, though setting all the drives to auto spin can put a severe load on the PSU.

 

Author Comment

by:johnstaggs
ID: 9869682
Well, when the system boots, the drive has no lights on whatsoever. Then after Windows boots up, we go into SMOR (Adaptec's software) to look at the config, and the drive with no lights on is shown as a red drive and marked "failed".
 

Author Comment

by:johnstaggs
ID: 9869697
One thing we did notice with the Ultrastar drives is that some are different models than the others. Only one of the models we have is on the Windows HCL for Windows 2003 Server. Could this be a potential pitfall?
 
LVL 18

Expert Comment

by:chicagoan
ID: 9869856
Are these in a hotswap cage? Whose?

What have you done with the failed drives?
 

Author Comment

by:johnstaggs
ID: 9870075
Yes, SuperMicro.

We've reformatted them and given them another try, and they are able to be used again.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9870105
Which lends credence to them being OK and just not ready.
I'd get with SuperMicro and see what their suggestions are about getting the drives initialized before the OS looks at the array.
 

Author Comment

by:johnstaggs
ID: 9870516
The drive is initializing when the system boots up, because you can see it during POST. Also, if you go into SMOR (BIOS), it will show the RAID as "degraded". But maybe I'm not quite getting what you're saying.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9871411
If the drives test out OK afterward, something's going on at boot time that makes them unavailable to the array.
I'd really be suspicious of the backplane/disk enclosure here, and I'd see what the manufacturer has to say.
 
LVL 48

Expert Comment

by:dbrunton
ID: 9871750
>> Which lends credence to them being OK and just not ready.

I'll add a couple of comments to this statement.

I'd be looking at the SCSI cable and the host adapter in this case. And possibly the power supply; it may not be capable of supplying the power required for everything.
 

Author Comment

by:johnstaggs
ID: 9874340
Hey guys, I'm going to get ahold of Supermicro today, so bear with me. But the backplane is a good possibility, since I had already called Adaptec and they said it could be the problem. Another thing is, I had a single drive on a dual Xeon box (same exact type of setup); the machine did not have a RAID controller in it (it wasn't running RAID, hence the single drive). And it died on me in about 1 week.

Maybe that will lead to something else. I'm in the process of setting up another machine, which has dual backplanes and, I'm sure, a different type of RAID controller.


But all suggestions are welcome, and I really appreciate the time you guys take to help me figure out the problem.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9874450
While a drive dying in another box, especially a non redundant drive, is a pain, I think it's just anecdotal.
You said the drives from the degraded array tested OK outside the system.
Unless these drives are all from one lot and you suspect a manufacturing defect, I'm liking the backplane/bus as the likely suspect.
 

Author Comment

by:johnstaggs
ID: 9874524
Indeed, I was able to format the failed drive and put it back into the array, so that shows it's not really the drives. There is a good chance all the drives are from the same lot; we have two different models of the drive (drive specs are the same, just different models).

So I should look into the backplane/bus issue, correct? And do you guys have any suggestions on how I could go about testing it?

(btw, that non redundant drive that I lost was just on a test box I had set up, so it wasn't too important...thank god). Right now I'm not setting any machine up unless it's using RAID 5 and has two hot spares. You know, I had run these drives for quite a while on a different box that had an older motherboard and never had a single problem... Then I went to these new boxes that have a newer motherboard, and have had nothing but problems with the drives.

When I say older motherboard, I mean months, not years or anything, but they are different models.
 
LVL 18

Expert Comment

by:chicagoan
ID: 9874724
That's one of those "zero-raid" setups through a dedicated PCI slot?
 

Author Comment

by:johnstaggs
ID: 9874768
Yes, that is correct.
 
LVL 18

Accepted Solution

by:
chicagoan earned 500 total points
ID: 9874888
 

Author Comment

by:johnstaggs
ID: 9874970
I do have the latest 2003 drivers, but I'm going to have to check about the latest BIOS. Give me a few min, and I'll update this and let you know.
 

Author Comment

by:johnstaggs
ID: 9875093
BIOS shows I20 v.001.62, but the date doesn't match the date on the link. So I'm going to do an update to both of these on a new machine.
 

Author Comment

by:johnstaggs
ID: 9875514
I had the latest 2003 drivers (tried reinstalling them). The BIOS looked like the same version, but it was updated. So both of those are done.

We've got another very similar machine, and we are setting it up with RAID 5 (it has a split backplane) and two hot spares. We are going to run it for a while and see if we run into any more problems.

I'm going to go ahead and award the points to you, but if you can think of anything else to try down the road, please reply.

Thanks