Repeated SCSI drive failure with Windows 2003

Hey Guys,

I'm having problems with drives failing on reboots. I'm using Windows 2003 Enterprise Server, IBM Ultrastar 160 drives, an Adaptec 2010S RAID controller, and a Supermicro motherboard and backplane.

In the past two weeks we've had three drive failures, all on reboots. We are running a RAID 5 config, so we haven't lost any data, and we have caught it every time. Are there any known issues or compatibility problems with this hardware?

Thanks. This is an EXTREMELY important question, so it's valued at 500 points.

Did the drives really fail or are they failing to come ready?
I've seen 160s and 320s in hot-swap cages fail to come ready from a cold start, yet work fine if they're inserted into the cage after the system has powered up and then been warm-booted.
Sometimes you see LUN=0 BUS=0 ID=0 Bad SCSI Status – Check Condition messages.
Setting the system BIOS to do a full POST sometimes helps, especially if there is a lot of RAM (the longer memory test gives the drives more time to spin up).
These drives can take up to 20 seconds to spin up.
Check the auto spin-up jumper settings, though setting all the drives to auto spin up can put a severe load on the PSU.
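
To put a rough number on the PSU point, here is a minimal Python sketch comparing the peak 12 V draw when every drive spins up at once against a staggered spin-up. The function name and all current figures are illustrative assumptions, not Ultrastar specifications; take the real spin-up and idle currents from the drive datasheet.

```python
def peak_spinup_current(num_drives, spinup_amps, idle_amps, drives_per_group):
    """Return the worst-case current (amps) seen while drives spin up in groups.

    Drives that have already spun up are assumed to draw idle_amps each,
    while the group currently spinning up draws spinup_amps per drive.
    """
    if num_drives < 0 or drives_per_group <= 0:
        raise ValueError("num_drives must be >= 0 and drives_per_group > 0")

    peak = 0.0
    already_spinning = 0
    while already_spinning < num_drives:
        # The last group may be smaller than drives_per_group.
        group = min(drives_per_group, num_drives - already_spinning)
        current = already_spinning * idle_amps + group * spinup_amps
        peak = max(peak, current)
        already_spinning += group
    return peak


# Hypothetical figures: 6 drives, 2.5 A each while spinning up, 0.8 A at idle.
all_at_once = peak_spinup_current(6, 2.5, 0.8, drives_per_group=6)
one_by_one = peak_spinup_current(6, 2.5, 0.8, drives_per_group=1)
print(f"all at once: {all_at_once:.1f} A, staggered: {one_by_one:.1f} A")
```

With these made-up figures the simultaneous start needs more than twice the peak current of the staggered one, which is why a marginal PSU can leave a drive not ready at boot.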

johnstaggsAuthor Commented:
Well, when the system boots, one drive has no lights on whatsoever. Then after Windows boots up, we go into SMOR (Adaptec's software) to look at the config, and the drive with no lights on is shown as a red drive and says "failed".
johnstaggsAuthor Commented:
One thing we did notice with the Ultrastars: some are different models than the others. Only one model that we have is on the Windows HCL for Windows 2003 Server. Could this be a potential pitfall?

Are these in a hotswap cage? Whose?

What have you done with the failed drives?
johnstaggsAuthor Commented:
Yes, SuperMicro.

We've reformatted them and given them another try, and they are usable again.
Which lends credence to them being OK and just not ready.
I'd get with SuperMicro and see what their suggestions are about getting the drives initialized before the OS looks at the array.
johnstaggsAuthor Commented:
The drive is initializing when the system boots up, because you can see it during POST. And if you go into SMOR (BIOS), it will show the RAID as "degraded". But maybe I'm not quite getting what you're saying.
If the drives test out OK afterward, something's going on at boot time that makes them unavailable to the array.
I'd really be suspicious of the backplane/disk enclosure here, and I'd see what the manufacturer has to say.
dbruntonQuid, Me Anxius Sum?  Illegitimi non carborundum.Commented:
>> Which lends credence to them being OK and just not ready.

I'll add a couple of comments to this statement.

I'd be looking at the SCSI cable and the host adapter in this case. And possibly the power supply; it may not be capable of supplying the power required for everything.
johnstaggsAuthor Commented:
Hey guys, I'm going to get hold of Supermicro today, so bear with me. But the backplane is a good possibility, since I had already called Adaptec and they said it could be the problem. Another thing: I had a single drive in a dual Xeon box (same exact type of setup). The machine did not have a RAID controller in it (it wasn't running RAID, hence the single drive), and the drive died on me in about a week.

Maybe that will lead to something else. I'm in the process of setting up another machine, which has dual backplanes and, I'm sure, a different type of RAID controller.

But all suggestions are welcome, and I really appreciate the time you guys take to help me figure out the problem.
While a drive dying in another box, especially a non-redundant drive, is a pain, I think it's just anecdotal.
You said the drives from the degraded array tested OK outside the system.
Unless these drives are all from one lot and you suspect a manufacturing defect, I'm liking the backplane/bus as the likely suspect.
johnstaggsAuthor Commented:
Indeed, I was able to format the failed drive and put it back into the array, so that shows it's not really the drives. There's a good chance all the drives are from the same lot, although we have two different models of the drive (the drive specs are the same, just different models).

So I should look into the backplane/bus issue, correct? And do you guys have any suggestions on how I could go about testing it?

(By the way, that non-redundant drive that I lost was just in a test box I had set up, so it wasn't too important, thank God.) Right now I'm not setting any machine up unless it's using RAID 5 and has two hot spares. You know, I had run these drives for quite a while in a different box that had an older motherboard and never had a single problem. Then I went to these new boxes that have a newer motherboard, and I have had nothing but problems with the drives.

When I say older motherboard, I mean months, not years or anything, but they are different models.
That's one of those "zero-raid" setups through a dedicated PCI slot?
johnstaggsAuthor Commented:
yes, that is correct.

johnstaggsAuthor Commented:
I do have the latest 2003 drivers, but I'm going to have to check on the latest BIOS. Give me a few minutes, and I'll update this and let you know.
johnstaggsAuthor Commented:
The BIOS shows I20 v.001.62, but the date doesn't match the date on the link. So I'm going to update both of these on a new machine.
johnstaggsAuthor Commented:
I had the latest 2003 drivers (tried reinstalling them). The BIOS looked like the same version, but it was updated. So both of those are done.

We've got another very similar machine, and we are setting it up with RAID 5 (it has a split backplane) and two hot spares. We are going to run it for a while and see if we run into any more problems.

I'm going to go ahead and award the points to you, but if you can think of anything else to try down the road, please reply.
