• Status: Solved

Server 2008 R2 x64 failed disk in RAID-10 now blue screen/reboot during startup

This server is fairly new, about 6 months old.  It's running Windows Server 2008 R2 x64 with 8GB RAM and a RAID-10 array made up of 4x250GB SATA drives.  It's also the company's Exchange server (2007).  This weekend, around 7AM Sunday, a drive failed in the RAID array.  When we tried to boot the machine Sunday evening it wanted to go into repair mode.  Any time we attempt to start Windows in normal mode, the machine blue screens with "Directory Services could not start because ... a device attached to the system is not functioning" and reboots.  You only see the blue screen if you disable auto-restart on error from the F8 boot menu.  This happens almost immediately after the mouse cursor appears during boot.  The computer will not boot in Safe Mode (any option) but will start in Active Directory Restore mode.

I have replaced the failed drive and am currently running the rebuild function of the RAID controller.  That should be done in an hour and a half, give or take.  I am also going to run the Intel PCT from EFI shell once the rebuild is complete to rule out any additional hardware problems.

I am attaching a picture of the blue screen error.

SFC /scannow and CHKDSK /F /R were both run, but neither changed the blue screen/reboot situation.

Please let me know if additional details are needed or if you have any suggestions, and thank you in advance for the assistance.
IMG-2636.JPG
Asked by: ActiontechKS
 
Justin Owens, ITIL Problem Manager, commented:
Let us know if you can boot after your RAID rebuild finishes.  No sense in chasing rabbits...
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
What are the following specs please:
 + server board make/model
 + RAID controller make/model
 + HDD make/model

Philip
 
John Hurst, Business Consultant (Owner), commented:
Are you sure something else did not fail? The server should have been able to keep going if only one drive failed in a RAID array. It may be still rebuilding while I post this, and then be OK, so let us know what happened when the rebuild finished. ... Thinkpads_User
 
Matt V commented:
I think you will find it is fine once you rebuild the array.  You should have been able to go into the RAID controller and mark the one drive as defunct which should have made the controller present the single remaining drive as though it were the original mirror set.  
Creating RAID filesystems without hot spares is a bad idea no matter how you look at it.  Something to consider going forward.
 
ActiontechKS (Author) commented:
The RAID rebuild is complete and the same problem is still happening.  The motherboard is an Intel S5500BC, and the drives are Seagate Barracuda ES.2 250GB SATA with firmware SN06.  The RAID controller is the on-board Intel controller; I'm not sure if there's a specific model to give you other than the motherboard.
Am currently running chkdsk /r /f /x now that the rebuild is complete to see if that makes a difference.
@mattvmotas: yes, I'm not exactly a RAID expert but I will definitely be installing a hot spare once this problem is resolved.
 
John Hurst, Business Consultant (Owner), commented:
A drive can fail because the drive is a problem, and a drive can fail because a controller fails. What are chances there is a RAID controller issue?  You would probably need a service person for this.

... Thinkpads_User
 
ActiontechKS (Author) commented:
I am running the Intel PCT tests from the EFI shell now to determine if any additional hardware has failed, and will replace anything that is bad of course, but it seems more like an O/S issue at this point.  The RAID rebuilt fine and shows 'online' instead of 'degraded' now.  I'm really hoping someone has a way to repair Windows so it will boot.  2003 had the 'repair installation' options but 2008 seems to have done away with that.
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
The Intel RAID Web Console 2 is able to connect to and manage the on-board RAID (software RAID) setup.

It will tell you what the status of each drive is and should have a log of events related to the RAID array(s).

More than likely the rebuild was done when the mirrored drive was not quite up to date with the primary drive and thus the hiccup/burp in the Windows OS. This will happen, as will a complete freeze, with on board software based RAID solutions. A hardware RAID setup would not have done that.

Do you have a good backup?

Rebuild the boot configuration database:
http://www.ehow.com/how_5472680_rebuild-bcd.html

Philip
 
ActiontechKS (Author) commented:
We're going to replace the motherboard and see what happens.  I'll leave this open until the problem is solved.
 
sibisteanu commented:
Is it the only DC in the domain or is it a member server? If it's a member server I would just remove Active Directory from it and then join it to the domain again. It can then replicate AD from another server.

If it's a standalone server, when was your last backup? If it is recent you could just selectively restore the C:\Windows\NTDS folder - all you'd lose would be any changes to AD, such as new user accounts, password changes, etc. since the backup. Make a copy of the current folder first, of course, just in case it doesn't work.

These would be the quickest options to get it back up and running.
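
Editor's note: the "make a copy of the current folder first" step could be scripted roughly as below. This is only a sketch under assumptions: `backup_folder` is a hypothetical helper name, the paths in the usage comment are made up, and it assumes the copy is taken while the directory service is not holding the files open (for example, from the restore mode mentioned in the question).

```python
import shutil
import time
from pathlib import Path


def backup_folder(src, dest_root):
    """Copy the folder src to dest_root/<name>-<timestamp> and return the new path.

    Hypothetical helper for illustration only; it is not part of any
    Microsoft tooling.
    """
    src = Path(src)
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"{src.name}-{stamp}"
    # copytree creates missing parent folders and raises if target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, target)
    return target


# Example call (hypothetical destination path):
# backup_folder(r"C:\Windows\NTDS", r"D:\ad-backup")
```

Keeping a timestamped copy means the selective restore can be undone by copying the saved folder back.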
 
ActiontechKS (Author) commented:
Update:
Replacing the motherboard did not help and I have ruled out hardware as being a cause.  Microsoft support believes it to be a corrupt Active Directory (they're going to call back later when an 'expert' is available to discuss it.)
This is a member server, and there is another 2008 R2 backup domain controller.  It does not have Exchange.  I cannot run dcpromo in safe mode (or DS restore mode) so removing AD isn't possible.  Is it possible to simply copy the contents of the NTDS folder from the other (still fine) server to this one without screwing up Exchange?
 
ActiontechKS (Author) commented:
I forgot to answer the question from MPSec - it is a hardware RAID, not a software RAID so that didn't apply in this case.
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
Any motherboard or server board based RAID, that is, a set of hard drives connected directly to the board itself and configured in RAID 1, 0, 10, or whatever, is not a hardware RAID solution.

It is a software based RAID solution. Please see:
http://download.intel.com/support/motherboards/server/sb/d29305014_raid_swg.pdf

Note page 1 Supported Hardware.

A hardware RAID solution is an SRCSASRB or RS2BL040 RAID controller. See:
http://blog.mpecsinc.ca/2010/09/on-board-software-raid-no-more.html

Back to the Q:
On the bad AD box:
DCPromo /forceremoval
See:
http://support.microsoft.com/kb/332199

On the good AD box:
http://technet.microsoft.com/en-us/library/cc816907%28WS.10%29.aspx
Clean up the AD.

Once done, DCPromo the second box back into the AD DS role.

Philip
 
ActiontechKS (Author) commented:
Apologies for not knowing my RAID rules.  I thought by software you meant a Windows controlled RAID array.  As I said, I'm no expert in that arena.
For clarification - Exchange will not be affected by the removal of AD from the PDC?  I can do a forceremoval and seize the AD roles to the backup controller, then reinstall AD on the failed AD machine and gracefully transfer those roles back without impacting people's mailboxes?
Thanks
 
Matt V commented:
MPSECSInc -> Every vendor and tech I have spoken to refers to hardware RAID as RAID setup using a hardware controller, and not done using just the OS.
I have never heard that a controller card or on-board controller RAID was not a hardware RAID, and I have been working with RAID for over 10 years with all the major vendors.
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
Matt,

Then we shall have to agree to disagree.

Philip
 
ActiontechKS (Author) commented:
I had to force the removal of AD, transfer all the FSMO roles via ntdsutil to the backup domain controller, clean up the metadata, and then reinstall AD.  Now Windows will load in normal mode, which solves the main problem. Thanks guys.
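
Editor's note: after moving the FSMO roles, one way to double-check the result is to capture the text printed by `netdom query fsmo` and confirm every role points at the surviving domain controller. The sketch below only parses text that is passed in; the role labels, the two-or-more-spaces column layout, and the host names in `SAMPLE` are assumptions for illustration, not output captured from the server in this thread.

```python
import re

# Role labels as assumed to appear in `netdom query fsmo` output.
ROLE_NAMES = (
    "Schema master",
    "Domain naming master",
    "PDC",
    "RID pool manager",
    "Infrastructure master",
)


def parse_fsmo(output):
    """Map each recognised role label to the host name listed beside it."""
    holders = {}
    for line in output.splitlines():
        # Label and host are assumed to be separated by two or more spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2 and parts[0] in ROLE_NAMES:
            holders[parts[0]] = parts[1].lower()
    return holders


def roles_not_on(output, expected_host):
    """Return the roles that are missing or held by a host other than expected_host."""
    holders = parse_fsmo(output)
    return [r for r in ROLE_NAMES if holders.get(r) != expected_host.lower()]


# Made-up sample output; dc1/dc2.example.local are hypothetical names.
SAMPLE = """\
Schema master               dc2.example.local
Domain naming master        dc2.example.local
PDC                         dc2.example.local
RID pool manager            dc1.example.local
Infrastructure master       dc2.example.local
The command completed successfully.
"""

print(roles_not_on(SAMPLE, "dc2.example.local"))  # ['RID pool manager']
```

An empty list would mean all five roles are reported on the expected server.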
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
Good stuff.

Keep an eye on the replication logs between the two servers.

Take note of the AD replication GUID in DNS for the two servers. If you see three GUIDs listed in the AD portion of DNS, then the old DC GUID is still there. It should not be, but verify just in case.

Philip
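
Editor's note: the "three GUIDs for two servers" check amounts to a set difference between the GUID labels found in DNS and the GUIDs of the domain controllers that currently exist. A minimal sketch, assuming both lists have already been collected by hand (for example from the DNS console and from replication output); `stale_dc_guids` and the GUID values are hypothetical:

```python
def stale_dc_guids(dns_guid_labels, current_dsa_guids):
    """Return GUID labels present in DNS that match no current domain controller.

    Comparison is case-insensitive; the result is lower-cased and sorted.
    """
    current = {g.lower() for g in current_dsa_guids}
    return sorted(g.lower() for g in dns_guid_labels if g.lower() not in current)


# Made-up values: three GUID records in DNS, but only two live DCs.
dns_guids = [
    "11111111-2222-3333-4444-555555555555",
    "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
    "99999999-8888-7777-6666-555555555555",
]
live_guids = [
    "11111111-2222-3333-4444-555555555555",
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
]

print(stale_dc_guids(dns_guids, live_guids))
# ['99999999-8888-7777-6666-555555555555']
```

Comparing GUIDs rather than host names matters here because a re-promoted server keeps its name but gets a new GUID.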
 
ActiontechKS (Author) commented:
DNS looks okay.  However, removing AD from the Exchange server did cause a massive failure of Exchange.  All of the permissions that Exchange sets when it's installed were lost and now the Exchange services will not start.  I would caution anyone else who tries this solution to attempt everything else first.
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
My bad ... I am working on getting the steps to get things back online for you ASAP.

Philip
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
Okay, process 1:

If you have a good Exchange backup, then I would follow this general process:

                [a] format the server – DO NOT REMOVE THE ACCOUNT FROM AD
                [b] metadata cleanup
                [c] reinstall WITH THE SAME NAME
                [d] dcpromo
                [e] reinstall Exchange with /RecoverServer
                [f] create the databases IN THE SAME PLACE WITH THE SAME NAME as the old ones
                [g] stop Exchange
                [h] copy the original databases and log files in
                [i] reboot

Philip
 
ActiontechKS (Author) commented:
Believe it or not I was able to recover things okay without doing anything too drastic.  I had to install the desktop experience feature (odd I know) and add the server's domain controller account to the Exchange Server groups - Exchange Domain Servers, Exchange Enterprise Servers and Exchange Servers.  Rebooted a couple of times and voila the thing came back up like it should.  So, not as terrible as I thought at first.  Overall I think the service I got here was 100% better than what I got when I called Microsoft support.  A+.
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
I am glad to hear that things worked out positively.

Philip
 
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
Just an FYI to Matt from http://realserverhunt.intel.com which is an ongoing Intel Partner knowledge building contest.

Philip

10-09-20-Hardware-versus-Softwar.png