Solved

Failed redundancy, drive or controller?

Posted on 2013-01-13
Last Modified: 2013-12-29
Sorry for the long post; I figure too much info is better than too little.

Windows Server 2003, software mirroring, 2 SATA drives, Sun Fire X2100

History
Drive 0 gave rare intermittent errors over a year and finally failed with a BSOD.
Its boot sector was still good, so we could still boot the mirrored drive from it.
------------
Position 0                    Position 1
Old HDD, BSOD                 Old good drive, failed mirror
Good boot sector              No boot sector



Bought a new drive, moved the mirror to the 1st position, and made the new drive the mirror.
(Still booting off a floppy for 2 weeks.)
-------------
Position 0                    Position 1
Old good drive                New drive, good mirror
No boot sector                No boot sector


Fixed the boot sector on the new drive, made it Drive 0, and ran like that for a week without a working mirror.
---------------------
Position 0                    Position 1
New drive                     Old good drive, no mirror
Good boot sector              No boot sector


Finally fixed the boot sector on the old working drive, made it Drive 1, and reestablished the mirror.
------------------
Position 0                    Position 1
New drive                     Old good drive, good mirror for one day
Good boot sector              Good boot sector

Within 24 hours of syncing to 100%, the new drive started reporting errors until it detached itself.

Now when I boot the system, I have to boot to the old working drive on Drive 1, and Disk Manager shows failed redundancy on both drives and an exclamation mark on Drive 0. When I right-click the dynamic volume, it says that the drive status is active and working.
----------------
Position 0                    Position 1
New drive, BSOD               Old good drive, failed mirror
Good boot sector              Good boot sector


I'm not sure what is failing here. After the first drive failed, we booted from the failed drive's boot sector but ran from the mirror, and that worked for a couple of months until we got the new drive. That suggests the 2nd controller and the 2nd drive were working.
When the old good drive (the original 2nd drive) ran for two weeks with the new drive in the 2nd position, there were no errors, which suggests two good controllers and two good HDDs.
So now that I have a new drive that won't boot, which is suspect: the controller or the new drive?
Question by:shadowz85

Accepted Solution

by:
dlethe earned 500 total points
ID: 38773161
There is no way to eliminate anything yet, because it is entirely possible that data corruption crept in while you weren't running in a mirrored state after the first HDD failed.

Or you could have had corruption on the surviving disk before the first drive failed.  

But if it takes 24 hours to sync, it is clear that the source disk is having a lot of recoverable and potentially unrecoverable errors. So the odds are that the surviving mirrored disk has encountered unrecovered errors and also already had filesystem damage before the sync.
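
To put a 24-hour sync in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a 250 GB drive (the size the poster gives later in this thread) and a healthy sustained copy rate of 50 MB/s. That rate is an assumption for illustration, not a measurement from this server.

```python
def effective_rate_mb_s(capacity_gb: float, hours: float) -> float:
    """Average copy rate (MB/s) implied by syncing capacity_gb in `hours`."""
    return capacity_gb * 1000 / (hours * 3600)


def expected_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours needed to copy capacity_gb at a steady rate_mb_s."""
    return capacity_gb * 1000 / rate_mb_s / 3600


CAPACITY_GB = 250          # drive size given later in this thread
ASSUMED_HEALTHY_MB_S = 50  # assumption: plausible healthy rate, not measured

# Rate implied by a sync that takes a full 24 hours
print(round(effective_rate_mb_s(CAPACITY_GB, 24), 1))                   # 2.9
# Time the same sync would take at the assumed healthy rate
print(round(expected_hours(CAPACITY_GB, ASSUMED_HEALTHY_MB_S), 1))      # 1.4
```

Under those assumptions, a 24-hour sync averages under 3 MB/s, which would be consistent with a disk spending most of its time retrying reads. Server load during the sync would also slow it down, so treat this as a hint rather than proof.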

By any chance, are these cheap desktop consumer drives? If so, they are unacceptable. You need enterprise-class drives because they have more ECC bits, so they provide 10X more reliable data.

Where do you go from here? Get another machine with a known good motherboard, controller, RAM, etc., and run diagnostics.

Author Comment

by:shadowz85
ID: 38773365
I don't believe they are cheap consumer drives. The replacement drive was almost $300 for a 250 GB drive. At the time I placed the order, I wasn't sure whether I was dealing with a hardware mirror, and I know those like the drives to be identical, so I used the part number for the original drive.
I believe that once you boot off the mirror drive, Windows is no longer mirroring. Can you confirm that? The drives aren't even trying to sync and I noticed that the very first time I was booting from the 2nd drive instead of the 1st one.

Expert Comment

by:dlethe
ID: 38773445
No. Windows host-based RAID on W2K3 mirrors all writes and load-balances reads starting very early in the boot process. For the few seconds of boot before the mirroring code kicks in, it then syncs up anything that might have changed during that window.

The 24 hours is a classic indication of read errors on one of the drives, but that is an independent issue if you have a munged-up file system. Decent diagnostics will confirm the health of the drives and give you an idea of how many read errors they have had.
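
As one illustration of what such diagnostics report, the sketch below parses the SMART attribute table in the format printed by `smartctl -A` from smartmontools, assuming that tool is available for the system. The sample text is made up for illustration and is NOT output from the drives in this thread, and the choice of attributes to watch is an assumption, not a vendor rule.

```python
# Illustrative sample only: NOT output from the drives in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

# Assumption: attributes commonly associated with failing sectors.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def suspicious_attributes(text):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


print(suspicious_attributes(SAMPLE))
```

Nonzero reallocated or pending sector counts on either disk would point at the drive rather than the controller, though the vendor's own diagnostic remains the more thorough test.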

What is the make and model of the disk? Just because you paid $300 doesn't mean you got a $300 disk. They aren't making any models of disk drives today that they were making several years ago, so you MUST have gotten an old drive that has been sitting on the shelf degrading. Disk drives don't keep on the shelf the way one would think. They are somewhat like old car batteries.

Anyway, that "new" disk drive is not a new disk. It is an old disk, and I doubt it has any factory warranty remaining.  It could very well be one of the problems you have besides unrecoverable read errors on the other disk, and a slightly munged file system.

Author Comment

by:shadowz85
ID: 38776957
Do you have any particular diagnostic tool that you prefer? The drive is either Hitachi or Seagate.

Expert Comment

by:dlethe
ID: 38776962
Both Seagate and Hitachi have free diagnostic tools designed specifically for their disk drives. Just go to their websites.

Author Comment

by:shadowz85
ID: 38778509
The disk is a Sun-branded disk: Hitachi HDS722525VLSA80 (250 GB, 7200 RPM, SATA).

Expert Comment

by:dlethe
ID: 38778537
Go to hds.com and look for the disk diagnostics.