ActiontechKS
 asked on

Server 2008 R2 x64 failed disk in RAID-10 now blue screen/reboot during startup

This server is fairly new, about 6 months old.  It's running Windows Server 2008 R2 x64 with 8GB RAM and a RAID-10 made up of 4x250GB SATA drives.  It's also the company's Exchange server (2007).  This weekend, around 7AM Sunday, something happened and a drive failed in the RAID array.  When we tried to boot the machine Sunday evening it wanted to go into repair mode.  Any time we attempt to start Windows in normal mode the machine blue screens with "Directory Services could not start because ... a device attached to the system is not functioning" and reboots.  You only see the blue screen if you disable auto-restart on error from the F8 boot menu.  This happens almost immediately after the mouse cursor appears during the boot-up process.  The computer will not boot in Safe Mode (any option) but will start in Directory Services Restore Mode.

I have replaced the failed drive and am currently running the rebuild function of the RAID controller.  That should be done in an hour and a half, give or take.  I am also going to run the Intel PCT from EFI shell once the rebuild is complete to rule out any additional hardware problems.

I am attaching a picture of the blue screen error.

SFC /scannow was run, as was CHKDSK /F /R but did not change the blue screen/reboot situation.

Please let me know if additional details are needed or if you have any suggestions, and thank you in advance for the assistance.
IMG-2636.JPG
Windows Server 2008, Microsoft Legacy OS, Active Directory

Last Comment
Philip Elder

8/22/2022 - Mon
Justin Owens

Let us know if you can boot after your RAID rebuild finishes.  No sense in chasing rabbits...
Philip Elder

What are the following specs please:
 + server board make/model
 + RAID controller make/model
 + HDD make/model

Philip
John

Are you sure something else did not fail? The server should have been able to keep going if only one drive failed in a RAID array. It may be still rebuilding while I post this, and then be OK, so let us know what happened when the rebuild finished. ... Thinkpads_User
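
To illustrate the point about a single drive failure, here is a minimal Python sketch of which failures a 4-drive RAID-10 can tolerate. The drive numbering and the pairing of drives 0/1 and 2/3 into mirrors are assumptions for illustration only; the real layout depends on how the controller built the array.

```python
from itertools import combinations

# Assumed layout: drives 0 and 1 form one mirror pair, drives 2 and 3 the other.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def array_survives(failed):
    """Return True while every mirror pair still has at least one healthy drive."""
    failed = set(failed)
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


def surviving_combinations(num_failures):
    """Return (combinations the array survives, all combinations) for N failed drives."""
    drives = [d for pair in MIRROR_PAIRS for d in pair]
    combos = list(combinations(drives, num_failures))
    return [c for c in combos if array_survives(c)], combos


if __name__ == "__main__":
    for n in (1, 2):
        ok, total = surviving_combinations(n)
        print(f"{n} failed drive(s): survives {len(ok)} of {len(total)} combinations")
```

Under these assumptions any single failure leaves the array degraded but online, so a one-drive failure alone would not normally explain a server that cannot boot.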
Matt V

I think you will find it is fine once you rebuild the array.  You should have been able to go into the RAID controller and mark the one drive as defunct which should have made the controller present the single remaining drive as though it were the original mirror set.  
Creating RAID filesystems without hot spares is a bad idea no matter how you look at it.  Something to consider going forward.
ActiontechKS

ASKER
The RAID rebuild is complete and the same problem is still happening.  The motherboard is an Intel S5500BC; the drives are Seagate Barracuda ES.2 250GB SATA with firmware SN06.  The RAID controller is the on-board Intel controller; I'm not sure if there's a specific model to give you other than the motherboard.
I am currently running chkdsk /r /f /x now that the rebuild is complete to see if that makes a difference.
@mattvmotas: yes, I'm not exactly a RAID expert but I will definitely be installing a hot spare once this problem is resolved.
John

A drive can fail because the drive is a problem, and a drive can fail because a controller fails. What are chances there is a RAID controller issue?  You would probably need a service person for this.

... Thinkpads_User
ActiontechKS

ASKER
I am running the Intel PCT tests from the EFI shell now to determine if any additional hardware has failed, and will replace anything that is bad of course, but it seems more like an O/S issue at this point.  The RAID rebuilt fine and shows 'online' instead of 'degraded' now.  I'm really hoping someone has a way to repair Windows so it will boot.  2003 had the 'repair installation' options but 2008 seems to have done away with that.
Philip Elder

The Intel RAID Web Console 2 is able to connect to and manage the on-board RAID (software RAID) setup.

It will tell you what the status of each drive is and should have a log of events related to the RAID array(s).

More than likely the rebuild was done when the mirrored drive was not quite up to date with the primary drive and thus the hiccup/burp in the Windows OS. This will happen, as will a complete freeze, with on board software based RAID solutions. A hardware RAID setup would not have done that.

Do you have a good backup?

Rebuild the boot configuration database:
http://www.ehow.com/how_5472680_rebuild-bcd.html

Philip
ActiontechKS

ASKER
We're going to replace the motherboard and see what happens.  I'll leave this open until the problem is solved.
SOLUTION
sibisteanu

ActiontechKS

ASKER
Update:
Replacing the motherboard did not help and I have ruled out hardware as being a cause.  Microsoft support believes it to be a corrupt Active Directory (they're going to call back later when an 'expert' is available to discuss it.)
This is a member server, and there is another 2008 R2 backup domain controller.  It does not have Exchange.  I cannot run dcpromo in safe mode (or DS restore mode) so removing AD isn't possible.  Is it possible to simply copy the contents of the NTDS folder from the other (still fine) server to this one without screwing up Exchange?
ActiontechKS

ASKER
I forgot to answer the question from MPSec - it is a hardware RAID, not a software RAID so that didn't apply in this case.
ASKER CERTIFIED SOLUTION
Philip Elder

ActiontechKS

ASKER
Apologies for not knowing my RAID rules.  I thought by software you meant a Windows controlled RAID array.  As I said, I'm no expert in that arena.
For clarification - Exchange will not be affected by the removal of AD from the PDC?  I can do a forceremoval and seize the AD roles to the backup controller, then reinstall AD on the failed AD machine and gracefully transfer those roles back without impacting people's mailboxes?
Thanks
Matt V

MPSECSInc -> Every vendor and tech I have spoken to refers to hardware RAID as RAID setup using a hardware controller, and not done using just the OS.
I have never heard that a controller card or on-board controller RAID was not a hardware RAID, and I have been working with RAID for over 10 years with all the major vendors.
Philip Elder

Matt,

Then we shall have to agree to disagree.

Philip
ActiontechKS

ASKER
I had to force the removal of AD, move all the FSMO roles via ntdsutil to the backup domain controller, clean up the metadata, and then reinstall AD.  Now Windows will load in normal mode, which solves the main problem. Thanks guys.
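
For anyone following along, here is a rough Python sketch that only builds the ntdsutil command lines for moving the five FSMO roles; it does not run anything. It assumes the roles are seized rather than transferred (the failed DC could not take part), that the role names match the ntdsutil "roles" menu on 2008 R2, and that `DC2` is a hypothetical name for the surviving domain controller. Verify the exact wording against `ntdsutil roles ?` before using any of it.

```python
# Role commands as they appear in the ntdsutil "roles" menu.
# Assumption: names as used on Windows Server 2008 R2; older versions differ.
FSMO_SEIZE_COMMANDS = [
    "seize schema master",
    "seize naming master",
    "seize infrastructure master",
    "seize pdc",
    "seize rid master",
]


def build_seize_command(target_dc, seize_command):
    """Build one ntdsutil command line that seizes a single role onto target_dc."""
    return (
        f'ntdsutil roles connections "connect to server {target_dc}" quit '
        f'"{seize_command}" quit quit'
    )


def build_all(target_dc):
    """Return one command line per FSMO role, in the order listed above."""
    return [build_seize_command(target_dc, c) for c in FSMO_SEIZE_COMMANDS]


if __name__ == "__main__":
    # "DC2" is a placeholder for the surviving domain controller.
    for line in build_all("DC2"):
        print(line)
```

The printed lines are meant to be reviewed and typed on the surviving DC by hand; ntdsutil asks for confirmation on each seizure.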
Philip Elder

Good stuff.

Keep an eye on the replication logs between the two servers.

Take note of the AD replication GUID in DNS for the two servers. If you see three GUIDs listed in the AD portion of DNS, then the old DC GUID is still there. It should not be, but verify just in case.
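
As a small illustration of that GUID check, the Python sketch below compares the GUID records seen in DNS against the GUIDs of the domain controllers that should exist. The function name and the GUID values are made up for the example; in practice you would copy the real values out of the DNS console and from each DC.

```python
def find_stale_dc_guids(dns_guid_records, live_dc_guids):
    """Return GUIDs present in DNS that do not belong to any current DC."""
    live = {g.strip().lower() for g in live_dc_guids}
    seen = {g.strip().lower() for g in dns_guid_records}
    return sorted(seen - live)


if __name__ == "__main__":
    # Hypothetical GUIDs, for illustration only.
    in_dns = [
        "11111111-aaaa-4bbb-8ccc-000000000001",
        "22222222-aaaa-4bbb-8ccc-000000000002",
        "33333333-aaaa-4bbb-8ccc-000000000003",
    ]
    current_dcs = [
        "11111111-AAAA-4BBB-8CCC-000000000001",
        "22222222-aaaa-4bbb-8ccc-000000000002",
    ]
    print(find_stale_dc_guids(in_dns, current_dcs))
```

With two DCs and three GUIDs in DNS, the one left over is the record for the old DC that the metadata cleanup should have removed.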

Philip
ActiontechKS

ASKER
DNS looks okay.  However, removing AD from the Exchange server did cause a massive failure of Exchange.  All of the permissions that Exchange sets when it's installed were lost and now the Exchange services will not start.  I would caution anyone else who tries this solution to attempt everything else first.
Philip Elder

My bad ... I am working on getting the steps to get things back online for you ASAP.

Philip
Philip Elder

Okay, process 1:

If you have a good Exchange backup, then I would follow this general process:

                [a] format the server – DO NOT REMOVE THE ACCOUNT FROM AD
                [b] metadata cleanup
                [c] reinstall WITH THE SAME NAME
                [d] dcpromo
                [e] reinstall Exchange with /RecoverServer
                [f] create the databases IN THE SAME PLACE WITH THE SAME NAME as the old ones
                [g] stop the Exchange services
                [h] copy the original databases and log files in
                [i] reboot

Philip
ActiontechKS

ASKER
Believe it or not I was able to recover things okay without doing anything too drastic.  I had to install the desktop experience feature (odd I know) and add the server's domain controller account to the Exchange Server groups - Exchange Domain Servers, Exchange Enterprise Servers and Exchange Servers.  Rebooted a couple of times and voila the thing came back up like it should.  So, not as terrible as I thought at first.  Overall I think the service I got here was 100% better than what I got when I called Microsoft support.  A+.
Philip Elder

I am glad to hear that things worked out positively.

Philip
Philip Elder

Just an FYI to Matt from http://realserverhunt.intel.com which is an ongoing Intel Partner knowledge building contest.

Philip

10-09-20-Hardware-versus-Softwar.png