Solved

Server 2008 R2 x64 failed disk in RAID-10 now blue screen/reboot during startup

Posted on 2010-09-07
Last Modified: 2012-05-10
This server is fairly new, about 6 months old.  It's running Windows Server 2008 R2 x64 with 8GB RAM and a RAID-10 made up of 4x250GB SATA drives.  It's also the company's Exchange server (2007).  This weekend, around 7AM Sunday, something happened and a drive failed in the RAID array.  When we tried to boot the machine Sunday evening it wanted to go into repair mode.  Any time we attempt to start Windows in normal mode the machine blue screens with "Directory Services could not start because ... a device attached to the system is not functioning" and reboots.  You only see the blue screen if you disable auto-restart on error from the F8 boot menu.  This happens almost immediately after the mouse cursor appears during the boot-up process.  The computer will not boot in Safe Mode (any option) but will start in Active Directory Restore mode.

I have replaced the failed drive and am currently running the rebuild function of the RAID controller.  That should be done in an hour and a half, give or take.  I am also going to run the Intel PCT from EFI shell once the rebuild is complete to rule out any additional hardware problems.

I am attaching a picture of the blue screen error.

SFC /scannow was run, as was CHKDSK /F /R but did not change the blue screen/reboot situation.

Please let me know if additional details are needed or if you have any suggestions, and thank you in advance for the assistance.
IMG-2636.JPG
Question by:ActiontechKS
24 Comments
 
LVL 31

Expert Comment

by:DrUltima
ID: 33619808
Let us know if you can boot after your RAID rebuild finishes.  No sense in chasing rabbits...
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33619813
What are the following specs please:
 + server board make/model
 + RAID controller make/model
 + HDD make/model

Philip
0
 
LVL 90

Expert Comment

by:John Hurst
ID: 33619815
Are you sure something else did not fail? The server should have been able to keep going if only one drive failed in a RAID array. It may be still rebuilding while I post this, and then be OK, so let us know what happened when the rebuild finished. ... Thinkpads_User
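To illustrate the point about a single failed drive, here is a minimal sketch of RAID-10 fault tolerance. It assumes the four drives are arranged as two mirrored pairs that are striped together; the disk names and pairing are made up for illustration and are not taken from the actual controller configuration.

```python
from itertools import combinations

# Assumed layout: four drives arranged as two mirrored pairs, striped together.
MIRROR_PAIRS = [("disk0", "disk1"), ("disk2", "disk3")]
DRIVE_SIZE_GB = 250


def array_survives(failed, pairs=MIRROR_PAIRS):
    """RAID-10 stays online while every mirror pair keeps one healthy member."""
    failed = set(failed)
    return all(any(d not in failed for d in pair) for pair in pairs)


def usable_capacity_gb(pairs=MIRROR_PAIRS, drive_size_gb=DRIVE_SIZE_GB):
    """Each mirror pair contributes one drive's worth of usable space."""
    return len(pairs) * drive_size_gb


# Walk through every one-drive and two-drive failure combination.
drives = [d for pair in MIRROR_PAIRS for d in pair]
for count in (1, 2):
    for failed in combinations(drives, count):
        status = "online (degraded)" if array_survives(failed) else "array lost"
        print(f"failed={failed}: {status}")
print("usable capacity:", usable_capacity_gb(), "GB")
```

Under that assumption any single failure leaves the array degraded but online, and only losing both members of the same pair takes it down, which is why a boot failure after one dead drive suggests something else went wrong as well.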
0
 
LVL 22

Expert Comment

by:Matt V
ID: 33619822
I think you will find it is fine once you rebuild the array.  You should have been able to go into the RAID controller and mark the one drive as defunct which should have made the controller present the single remaining drive as though it were the original mirror set.  
Creating RAID filesystems without hot spares is a bad idea no matter how you look at it.  Something to consider going forward.
0
 

Author Comment

by:ActiontechKS
ID: 33620380
The RAID rebuild is complete and the same problem is still happening.  The motherboard is Intel S5500BC, drives are Seagate Barracuda ES.2 250GB SATA firmware SN06.  The RAID controller is the on-board Intel controller, not sure if there's a specific model to give you other than the motherboard.
Am currently running chkdsk /r /f /x now that the rebuild is complete to see if that makes a difference.
@mattvmotas: yes, I'm not exactly a RAID expert but I will definitely be installing a hot spare once this problem is resolved.
0
 
LVL 90

Expert Comment

by:John Hurst
ID: 33620684
A drive can fail because the drive itself is the problem, and a drive can fail because a controller fails. What are the chances there is a RAID controller issue?  You would probably need a service person for this.

... Thinkpads_User
0
 

Author Comment

by:ActiontechKS
ID: 33620708
I am running the Intel PCT tests from the EFI shell now to determine if any additional hardware has failed, and will replace anything that is bad of course, but it seems more like an O/S issue at this point.  The RAID rebuilt fine and shows 'online' instead of 'degraded' now.  I'm really hoping someone has a way to repair Windows so it will boot.  2003 had the 'repair installation' options but 2008 seems to have done away with that.
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33621482
The Intel RAIDWeb Console 2 is able to connect to and manage the on board RAID (Software RAID) setup.

It will tell you what the status of each drive is and should have a log of events related to the RAID array(s).

More than likely the rebuild was done when the mirrored drive was not quite up to date with the primary drive and thus the hiccup/burp in the Windows OS. This will happen, as will a complete freeze, with on board software based RAID solutions. A hardware RAID setup would not have done that.

Do you have a good backup?

Rebuild the boot configuration database:
http://www.ehow.com/how_5472680_rebuild-bcd.html

Philip
0
 

Author Comment

by:ActiontechKS
ID: 33621484
We're going to replace the motherboard and see what happens.  I'll leave this open until the problem is solved.
0
 
LVL 2

Assisted Solution

by:sibisteanu
sibisteanu earned 250 total points
ID: 33626357
Is it the only DC in the domain or is it a member server? If it's a member server I would just remove Active Directory from it and then join it to the domain again. It can then replicate AD from another server.

If it's a standalone server when was your last backup? If it is recent you could just selectively restore the C:\Windows\NTDS folder - all you'd lose would be any changes to AD, such as new user accounts, password changes, etc since the backup. Make a copy of the current folder first, of course, just in case it doesn't work.

These would be the quickest options to get it back up and running.
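The "make a copy of the current folder first" step can be sketched as below. This is only an illustration: the helper name `backup_folder` and the `D:\Backups` destination are hypothetical, and it assumes the copy is made while Active Directory is not running (for example from Directory Services Restore Mode), since the database files are locked otherwise.

```python
import shutil
import time
from pathlib import Path


def backup_folder(src, dest_root):
    """Copy src into dest_root under a timestamped name and return the new path."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"{src.name}-backup-{stamp}"
    # copytree creates missing parent folders and fails if target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, target)
    return target


# Hypothetical usage on the affected server:
# backup_folder(r"C:\Windows\NTDS", r"D:\Backups")
```

Keeping the copy on a different volume than the one being restored means the original files are still available if the restored folder does not work.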
0
 

Author Comment

by:ActiontechKS
ID: 33630256
Update:
Replacing the motherboard did not help and I have ruled out hardware as being a cause.  Microsoft support believes it to be a corrupt Active Directory (they're going to call back later when an 'expert' is available to discuss it.)
This is a member server, and there is another 2008 R2 backup domain controller.  It does not have Exchange.  I cannot run dcpromo in safe mode (or DS restore mode) so removing AD isn't possible.  Is it possible to simply copy the contents of the NTDS folder from the other (still fine) server to this one without screwing up Exchange?
0
 

Author Comment

by:ActiontechKS
ID: 33630281
I forgot to answer the question from MPSec - it is a hardware RAID, not a software RAID so that didn't apply in this case.
0
 
LVL 38

Accepted Solution

by:Philip Elder
Philip Elder earned 250 total points
ID: 33630367
Any motherboard or server board based RAID, that is a set of hard drives connected directly to the board itself and configured in a RAID 1, 0, 10, or whatever is not a hardware RAID solution.

It is a software based RAID solution. Please see:
http://download.intel.com/support/motherboards/server/sb/d29305014_raid_swg.pdf

Note page 1 Supported Hardware.

A hardware RAID solution is an SRCSASRB or RS2BL040 RAID controller. See:
http://blog.mpecsinc.ca/2010/09/on-board-software-raid-no-more.html

Back to the Q:
On the bad AD box:
DCPromo /forceremoval
See:
http://support.microsoft.com/kb/332199

On the good AD box:
http://technet.microsoft.com/en-us/library/cc816907%28WS.10%29.aspx
Clean up the AD.

Once done, DCPromo the second box back into the AD DS role.

Philip
0
 

Author Comment

by:ActiontechKS
ID: 33630475
Apologies for not knowing my RAID rules.  I thought by software you meant a Windows controlled RAID array.  As I said, I'm no expert in that arena.
For clarification - Exchange will not be affected by the removal of AD from the PDC?  I can do a forceremoval and seize the AD roles to the backup controller, then reinstall AD on the failed AD machine and gracefully transfer those roles back without impacting people's mailboxes?
Thanks
0
 
LVL 22

Expert Comment

by:Matt V
ID: 33630695
MPSECSInc -> Every vendor and tech I have spoken to refers to hardware RAID as RAID setup using a hardware controller, and not done using just the OS.
I have never heard that a controller card or on-board controller RAID was not a hardware RAID, and I have been working with RAID for over 10 years with all the major vendors.
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33630781
Matt,

Then we shall have to agree to disagree.

Philip
0
 

Author Closing Comment

by:ActiontechKS
ID: 33631195
I had to force the removal of AD, transfer all the FSMO roles via ntdsutil to the backup domain controller, clean up the metadata, and then reinstall AD.  Now Windows will load in normal mode which solves the main problem. Thanks guys.
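For anyone repeating the FSMO step, the sketch below builds the sequence of lines typed at the ntdsutil prompt to seize all five roles onto the surviving domain controller. The role keywords are written as I understand ntdsutil on Server 2008 R2 (older versions use "domain naming master"), so treat them as an assumption and check them against the prompt's own help; "DC2" is a hypothetical server name.

```python
# Role keywords as assumed for ntdsutil on Server 2008 / 2008 R2.
FSMO_ROLES = [
    "schema master",
    "naming master",
    "infrastructure master",
    "PDC",
    "RID master",
]


def ntdsutil_seize_script(target_dc, roles=FSMO_ROLES):
    """Return the lines typed at the ntdsutil prompt to seize roles onto target_dc."""
    lines = ["roles", "connections", f"connect to server {target_dc}", "quit"]
    lines += [f"seize {role}" for role in roles]
    lines += ["quit", "quit"]  # leave "fsmo maintenance", then ntdsutil itself
    return lines


# "DC2" is a hypothetical name for the surviving domain controller.
print("\n".join(ntdsutil_seize_script("DC2")))
```

Printing the script first gives a checklist to follow by hand; running `netdom query fsmo` afterwards should show which server now holds each role.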
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33631428
Good stuff.

Keep an eye on the replication logs between the two servers.

Take note of the AD replication GUID in DNS for the two servers. If you see three GUIDs listed in the AD portion of DNS, then the old DC GUID is still there. It should not be, but verify just in case.
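The GUID check can be sketched as a simple set comparison. This assumes you have copied the GUID aliases out of the AD portion of DNS by hand and noted the current GUID of each live domain controller (I believe `repadmin /showrepl` reports it as the DSA object GUID); the function name and the GUID values below are placeholders, not real records.

```python
def stale_guids(dns_guids, current_dsa_guids):
    """Return GUIDs registered in DNS that match no current domain controller."""
    current = {g.lower() for g in current_dsa_guids}
    return sorted({g.lower() for g in dns_guids} - current)


# Placeholder values: three GUIDs found in DNS, but only two live DCs.
dns_guids = [
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
    "33333333-3333-3333-3333-333333333333",
]
current_dsa_guids = [
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
]

for guid in stale_guids(dns_guids, current_dsa_guids):
    print("leftover record to investigate:", guid)
```

Comparing GUIDs rather than host names matters here because the rebuilt server kept its old name, so a leftover record would still point at a valid host.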

Philip
0
 

Author Comment

by:ActiontechKS
ID: 33631511
DNS looks okay.  However, removing AD from the Exchange server did cause a massive failure of Exchange.  All of the permissions that Exchange sets when it's installed were lost and now the Exchange services will not start.  I would caution anyone else who tries this solution to attempt everything else first.
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33631667
My bad ... I am working on getting the steps to get things back online for you ASAP.

Philip
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33631913
Okay, process 1:

If you have a good Exchange backup, then I would follow this general process:

                [a] format the server – DO NOT REMOVE THE ACCOUNT FROM AD
                [b] metadata cleanup
                [c] reinstall WITH THE SAME NAME
                [d] dcpromo
                [e] reinstall Exchange with /RecoverServer
                [f] create the databases IN THE SAME PLACE WITH THE SAME NAME as the old ones
                [g] stop exchange
                [h] copy the original databases and log files in
                [i] reboot

Philip
0
 

Author Comment

by:ActiontechKS
ID: 33631981
Believe it or not I was able to recover things okay without doing anything too drastic.  I had to install the desktop experience feature (odd I know) and add the server's domain controller account to the Exchange Server groups - Exchange Domain Servers, Exchange Enterprise Servers and Exchange Servers.  Rebooted a couple of times and voila the thing came back up like it should.  So, not as terrible as I thought at first.  Overall I think the service I got here was 100% better than what I got when I called Microsoft support.  A+.
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33632110
I am glad to hear that things worked out positively.

Philip
0
 
LVL 38

Expert Comment

by:Philip Elder
ID: 33717764
Just an FYI to Matt from http://realserverhunt.intel.com which is an ongoing Intel Partner knowledge building contest.

Philip

10-09-20-Hardware-versus-Softwar.png
0
