Solved

Failed redundancy, drive or controller?

Posted on 2013-01-13
Last Modified: 2013-12-29
Sorry for the long post; I figure too much info is better than too little.

Server 2003, software mirroring, 2 SATA drives, Sunfire x2100

History
Drive 0 gave rare, intermittent errors over a year and finally failed with a BSOD.
Its boot sector was still good, so we could boot from it and run from the mirrored drive.
------------
Position 0                    Position 1
Old HDD, BSOD                 Old good drive, failed mirror
Good boot sector              No boot sector



Bought a new drive, moved the mirror to the 1st position, and made the new drive the mirror
(still booting off a floppy, for 2 weeks)
-------------
Position 0                    Position 1
Old good drive                New drive, good mirror
No boot sector                No boot sector


Fixed the boot sector on the new drive, made it Drive 0, and ran like that for a week without a working mirror
---------------------
Position 0                    Position 1
New drive                     Old good drive, no mirror
Good boot sector              No boot sector


Finally fixed the boot sector on the old working drive, made it Drive 1, and reestablished the mirror.
------------------
Position 0                    Position 1
New drive                     Old good drive, good mirror for one day
Good boot sector              Good boot sector

Within 24 hours of syncing to 100%, the new drive started reporting errors, until it detached itself.

Now when I boot the system, I have to boot from the old working drive on Drive 1. Disk Management shows failed redundancy on both drives and an exclamation mark on Drive 0. When I right-click the dynamic volume, it says that the drive status is active and working.
----------------
Position 0                    Position 1
New drive, BSOD               Old good drive, failed mirror
Good boot sector              Good boot sector


Not sure what is failing here. Originally, when the first drive failed, we were booting from the boot sector of the failed drive but running from the mirror. That worked for a couple of months until we got the new drive, so it sounds like a working 2nd controller and a working 2nd drive.
When the old 2nd drive was running for two weeks in the 1st position with the new drive in the 2nd position, there were no errors, so it sounds like 2 good controllers and 2 good HDDs.
So now that I have a new drive that won't boot, what is suspect: the controller or the new drive?
Question by:shadowz85
7 Comments
 

Accepted Solution

by:dlethe (earned 500 total points)
There is no way to eliminate anything, because it is entirely possible that you have data corruption that crept in while you weren't running in a mirrored state after the first HDD failed.

Or you could have had corruption on the surviving disk before the first drive failed.  

But if it takes 24 hours to sync, it is clear that the source disk is having a lot of recoverable and potentially unrecoverable errors. So the odds are that the surviving mirrored disk has encountered unrecovered errors and also already had filesystem damage before the sync.

By any chance, are these cheap desktop consumer drives? If so, they are unacceptable for this job. You need enterprise-class drives because they have more ECC bits and so are rated for roughly 10X fewer unrecoverable read errors.
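To put the 10X figure in perspective, here is a minimal Python sketch of the arithmetic. It assumes datasheet-style unrecoverable read error ratings of 1 error per 10^14 bits for a consumer drive and 1 per 10^15 bits for an enterprise drive; those two ratings are illustrative assumptions, not numbers from this thread, and the 250 GB capacity is the size of the replacement drive discussed here.

```python
import math


def p_unrecoverable_read(capacity_bytes: float, errors_per_bit: float) -> float:
    """Chance of at least one unrecoverable read error in one full-disk read.

    Treats every bit read as an independent trial, which is a simplification.
    """
    bits = capacity_bytes * 8
    # 1 - (1 - rate)**bits, computed stably for very small rates
    return -math.expm1(bits * math.log1p(-errors_per_bit))


CAPACITY = 250e9  # 250 GB, the drive size mentioned in this thread

# Assumed ratings (hypothetical datasheet values, not from the thread)
consumer = p_unrecoverable_read(CAPACITY, 1e-14)
enterprise = p_unrecoverable_read(CAPACITY, 1e-15)

print(f"consumer rating:   {consumer:.2%} per full read")
print(f"enterprise rating: {enterprise:.2%} per full read")
print(f"ratio:             {consumer / enterprise:.1f}x")
```

A mirror resync reads the whole source disk, so this per-pass probability is the one that matters when rebuilding. Real drives that are already failing will do far worse than their rating.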

Where do you go from here? Get another machine with a known good motherboard, controller, RAM, etc., and run diagnostics.
 

Author Comment

by:shadowz85
I don't believe they are cheap consumer drives. The replacement drive was almost $300 for a 250 GB drive. At the time I placed the order, I wasn't sure if I was dealing with a hardware mirror, and I know those like the drives to be identical, so I used the part no. for the original drive.
I believe that once you boot off the mirror drive, Windows is no longer mirroring. Can you confirm that? The drives aren't even trying to sync, and I noticed that the very first time I booted from the 2nd drive instead of the 1st one.

Expert Comment

by:dlethe
No. Windows host-based RAID on W2k3 mirrors all writes and load-balances reads starting very early in the boot process. Then, in the few seconds after it boots, it syncs up anything that might have changed during boot before the mirroring code kicked in.

The 24 hours is a classic indication of read errors on one of the drives, but that is a separate issue from any munged-up file system you may have. Decent diagnostics will confirm the health of the drives and give you an idea of how many read errors they have had.
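As a rough sanity check on the 24-hour figure, the Python sketch below compares it with how long one full pass over a 250 GB drive (the size mentioned in this thread) should take. The 50 MB/s sustained rate is an assumed ballpark for a healthy SATA drive of this generation, not a measured value, and a resync on a busy server will run somewhat slower than a bare sequential copy.

```python
def full_pass_hours(capacity_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to read or write the whole disk once at a steady rate."""
    return capacity_bytes / bytes_per_second / 3600


CAPACITY = 250e9     # 250 GB drive from this thread
ASSUMED_RATE = 50e6  # 50 MB/s sustained: an assumption, not a measurement

healthy_hours = full_pass_hours(CAPACITY, ASSUMED_RATE)

# Average rate implied if the same pass really takes 24 hours
implied_rate = CAPACITY / (24 * 3600)

print(f"expected at assumed rate: {healthy_hours:.1f} h")
print(f"rate implied by 24 h:     {implied_rate / 1e6:.1f} MB/s")
```

Under these assumptions a full pass should be a matter of a few hours at most. A sync that takes a whole day implies an average rate far below what the drive can normally sustain, which is consistent with the drive spending its time on read retries.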

What is the make/model of the disk? Just because you paid $300 doesn't mean you got a $300 disk. They aren't making any of the disk drive models today that they were making several years ago, so you MUST have gotten an old drive that has been sitting on a shelf degrading. Disk drives don't have the shelf life one would think. They are somewhat like old car batteries.

Anyway, that "new" disk drive is not a new disk. It is an old disk, and I doubt it has any factory warranty remaining.  It could very well be one of the problems you have besides unrecoverable read errors on the other disk, and a slightly munged file system.

 

Author Comment

by:shadowz85
Do you have any particular diagnostic tool that you prefer? The drive is either Hitachi or Seagate.

Expert Comment

by:dlethe
Both Seagate and Hitachi have freebies designed specifically for their disk drives. Just go to their websites.
 

Author Comment

by:shadowz85
Disk is a Sun disk. Hitachi HDS722525VLSA80 (250GB - 7200 RPM - SATA Disk)

Expert Comment

by:dlethe
Go to hds.com and look for the disk diagnostics.