Solved

Server 2008 R2 x64 failed disk in RAID-10 now blue screen/reboot during startup

Posted on 2010-09-07
Last Modified: 2012-05-10
This server is fairly new, about 6 months old.  It's running Windows Server 2008 R2 x64, 8GB RAM and a RAID-10 comprised of 4x250GB SATA drives.  It's also the company's Exchange server (2007).  This weekend, around 7AM Sunday, something happened and a drive failed in the RAID array.  When we tried to boot the machine Sunday evening it wanted to go into repair mode.  Any time we attempt to start Windows in normal mode the machine blue screens with "Directory Services could not start because ... a device attached to the system is not functioning" and reboots.  You only see the blue screen if you disable auto-restart on error from the F8 boot menu.  This happens almost immediately after the mouse cursor appears during the boot-up process.  The computer will not boot in Safe Mode (any option) but will start in Active Directory Restore mode.

I have replaced the failed drive and am currently running the rebuild function of the RAID controller.  That should be done in an hour and a half, give or take.  I am also going to run the Intel PCT from EFI shell once the rebuild is complete to rule out any additional hardware problems.

I am attaching a picture of the blue screen error.

SFC /scannow was run, as was CHKDSK /F /R but did not change the blue screen/reboot situation.

Please let me know if additional details are needed or if you have any suggestions, and thank you in advance for the assistance.
IMG-2636.JPG
Question by:ActiontechKS
24 Comments
 
LVL 31

Expert Comment

by:Justin Owens
ID: 33619808
Let us know if you can boot after your RAID rebuild finishes.  No sense in chasing rabbits...
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33619813
What are the following specs please:
 + server board make/model
 + RAID controller make/model
 + HDD make/model

Philip
 
LVL 97

Expert Comment

by:Experienced Member
ID: 33619815
Are you sure something else did not fail? The server should have been able to keep going if only one drive failed in a RAID array. It may be still rebuilding while I post this, and then be OK, so let us know what happened when the rebuild finished. ... Thinkpads_User
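
To illustrate the point about a single failed drive, here is a minimal Python sketch. It assumes the four drives are arranged as two mirrored pairs striped together (drives 0+1 and drives 2+3); the controller's actual layout is not stated in this thread, and the names below are made up for illustration.

```python
from itertools import combinations

# Assumed layout (not confirmed in the thread): two mirrored pairs, striped.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def array_survives(failed):
    """Return True if every mirror pair still has at least one working drive."""
    failed = set(failed)
    return all(any(drive not in failed for drive in pair) for pair in MIRROR_PAIRS)


# Every possible single-drive failure.
single = [array_survives([drive]) for drive in range(4)]

# Every possible two-drive failure.
double = {pair: array_survives(pair) for pair in combinations(range(4), 2)}

print("single failures survivable:", all(single))
for pair, ok in double.items():
    print("drives", pair, "failed ->", "online" if ok else "array lost")
```

Under that assumed layout, any one drive can fail without taking the array down; the array is only lost when both drives of the same mirror pair fail.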
 
LVL 22

Expert Comment

by:Matt V
ID: 33619822
I think you will find it is fine once you rebuild the array.  You should have been able to go into the RAID controller and mark the one drive as defunct which should have made the controller present the single remaining drive as though it were the original mirror set.  
Creating RAID filesystems without hot spares is a bad idea no matter how you look at it.  Something to consider going forward.
 

Author Comment

by:ActiontechKS
ID: 33620380
The RAID rebuild is complete and the same problem is still happening.  The motherboard is Intel S5500BC, drives are Seagate Barracuda ES.2 250GB SATA firmware SN06.  The RAID controller is the on-board Intel controller, not sure if there's a specific model to give you other than the motherboard.
Am currently running chkdsk /r /f /x now that the rebuild is complete to see if that makes a difference.
@mattvmotas: yes, I'm not exactly a RAID expert but I will definitely be installing a hot spare once this problem is resolved.
 
LVL 97

Expert Comment

by:Experienced Member
ID: 33620684
A drive can fail because the drive is a problem, and a drive can fail because a controller fails. What are the chances there is a RAID controller issue?  You would probably need a service person for this.

... Thinkpads_User
 

Author Comment

by:ActiontechKS
ID: 33620708
I am running the Intel PCT tests from the EFI shell now to determine if any additional hardware has failed, and will replace anything that is bad of course, but it seems more like an O/S issue at this point.  The RAID rebuilt fine and shows 'online' instead of 'degraded' now.  I'm really hoping someone has a way to repair Windows so it will boot.  2003 had the 'repair installation' options but 2008 seems to have done away with that.
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33621482
The Intel RAID Web Console 2 is able to connect to and manage the on board RAID (Software RAID) setup.

It will tell you what the status of each drive is and should have a log of events related to the RAID array(s).

More than likely the rebuild was done when the mirrored drive was not quite up to date with the primary drive and thus the hiccup/burp in the Windows OS. This will happen, as will a complete freeze, with on board software based RAID solutions. A hardware RAID setup would not have done that.

Do you have a good backup?

Rebuild the boot configuration database:
http://www.ehow.com/how_5472680_rebuild-bcd.html

Philip
 

Author Comment

by:ActiontechKS
ID: 33621484
We're going to replace the motherboard and see what happens.  I'll leave this open until the problem is solved.
 
LVL 2

Assisted Solution

by:sibisteanu
sibisteanu earned 1000 total points
ID: 33626357
Is it the only DC in the domain or is it a member server? If it's a member server I would just remove Active Directory from it and then join it to the domain again. It can then replicate AD from another server.

If it's a standalone server, when was your last backup? If it is recent you could just selectively restore the C:\Windows\NTDS folder - all you'd lose would be any changes to AD, such as new user accounts, password changes, etc. since the backup. Make a copy of the current folder first, of course, just in case it doesn't work.

These would be the quickest options to get it back up and running.
 

Author Comment

by:ActiontechKS
ID: 33630256
Update:
Replacing the motherboard did not help and I have ruled out hardware as being a cause.  Microsoft support believes it to be a corrupt Active Directory (they're going to call back later when an 'expert' is available to discuss it.)
This is a member server, and there is another 2008 R2 backup domain controller.  It does not have Exchange.  I cannot run dcpromo in safe mode (or DS restore mode) so removing AD isn't possible.  Is it possible to simply copy the contents of the NTDS folder from the other (still fine) server to this one without screwing up Exchange?
 

Author Comment

by:ActiontechKS
ID: 33630281
I forgot to answer the question from MPSec - it is a hardware RAID, not a software RAID so that didn't apply in this case.
 
LVL 39

Accepted Solution

by:Philip Elder
Philip Elder earned 1000 total points
ID: 33630367
Any motherboard or server board based RAID, that is a set of hard drives connected directly to the board itself and configured in a RAID 1, 0, 10, or whatever is not a hardware RAID solution.

It is a software based RAID solution. Please see:
http://download.intel.com/support/motherboards/server/sb/d29305014_raid_swg.pdf

Note page 1 Supported Hardware.

A hardware RAID solution is an SRCSASRB or RS2BL040 RAID controller. See:
http://blog.mpecsinc.ca/2010/09/on-board-software-raid-no-more.html

Back to the Q:
On the bad AD box:
DCPromo /forceremoval
See:
http://support.microsoft.com/kb/332199

On the good AD box:
http://technet.microsoft.com/en-us/library/cc816907%28WS.10%29.aspx
Clean up the AD.

Once done, DCPromo the second box back into the AD DS role.

Philip
 

Author Comment

by:ActiontechKS
ID: 33630475
Apologies for not knowing my RAID rules.  I thought by software you meant a Windows controlled RAID array.  As I said, I'm no expert in that arena.
For clarification - Exchange will not be affected by the removal of AD from the PDC?  I can do a forceremoval and seize the AD roles to the backup controller, then reinstall AD on the failed AD machine and gracefully transfer those roles back without impacting people's mailboxes?
Thanks
 
LVL 22

Expert Comment

by:Matt V
ID: 33630695
MPSECSInc -> Every vendor and tech I have spoken to refers to hardware RAID as RAID setup using a hardware controller, and not done using just the OS.
I have never heard that a controller card or on-board controller RAID was not a hardware RAID, and I have been working with RAID for over 10 years with all the major vendors.
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33630781
Matt,

Then we shall have to agree to disagree.

Philip
 

Author Closing Comment

by:ActiontechKS
ID: 33631195
I had to force the removal of AD, transfer all the FSMO roles via ntdsutil to the backup domain controller, clean up the metadata, and then reinstall AD.  Now Windows will load in normal mode, which solves the main problem. Thanks guys.
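
For anyone following the same path, here is a rough Python sketch of how the FSMO step could be composed as a single ntdsutil invocation. It only builds and prints the command line; it does not run anything. The host name DC2 is a placeholder for the surviving domain controller, the helper names are hypothetical, and it assumes the roles are seized rather than gracefully transferred, since the failed DC cannot take part. The role names are the ones ntdsutil uses on Server 2008 and later as far as I know; check them against the ntdsutil help before relying on this.

```python
# The five FSMO roles, using the names ntdsutil is assumed to accept
# in its "roles" menu on Server 2008 and later.
FSMO_ROLES = [
    "schema master",
    "naming master",
    "infrastructure master",
    "PDC",
    "RID master",
]


def build_seize_command(target_dc):
    """Build the ntdsutil argument list that seizes every role onto target_dc."""
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {target_dc}", "quit"]
    args += [f"seize {role}" for role in FSMO_ROLES]
    args += ["quit", "quit"]  # leave the roles menu, then ntdsutil itself
    return args


def as_command_line(args):
    """Quote any multi-word argument so each menu command stays one argument."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in args)


# "DC2" is a hypothetical name for the healthy domain controller.
command = build_seize_command("DC2")
print(as_command_line(command))
```

The printed line is meant to be reviewed and then run from an elevated prompt on the healthy DC. Metadata cleanup of the removed DC is a separate ntdsutil step and is not covered by this sketch.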
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33631428
Good stuff.

Keep an eye on the replication logs between the two servers.

Take note of the AD replication GUID in DNS for the two servers. If you see three GUIDs listed in the AD portion of DNS, then the old DC GUID is still there. It should not be, but verify just in case.

Philip
 

Author Comment

by:ActiontechKS
ID: 33631511
DNS looks okay.  However, removing AD from the Exchange server did cause a massive failure of Exchange.  All of the permissions that Exchange sets when it's installed were lost and now the Exchange services will not start.  I would caution anyone else who tries this solution to attempt everything else first.
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33631667
My bad ... I am working on getting the steps to get things back online for you ASAP.

Philip
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33631913
Okay, process 1:

If you have a good Exchange backup, then I would follow this general process:

    [a] format the server – DO NOT REMOVE THE ACCOUNT FROM AD
    [b] metadata cleanup
    [c] reinstall Windows WITH THE SAME NAME
    [d] dcpromo
    [e] reinstall Exchange with /RecoverServer
    [f] create the databases IN THE SAME PLACE WITH THE SAME NAME as the old ones
    [g] stop Exchange
    [h] copy the original databases and log files in
    [i] reboot

Philip
 

Author Comment

by:ActiontechKS
ID: 33631981
Believe it or not I was able to recover things okay without doing anything too drastic.  I had to install the desktop experience feature (odd I know) and add the server's domain controller account to the Exchange Server groups - Exchange Domain Servers, Exchange Enterprise Servers and Exchange Servers.  Rebooted a couple of times and voila the thing came back up like it should.  So, not as terrible as I thought at first.  Overall I think the service I got here was 100% better than what I got when I called Microsoft support.  A+.
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33632110
I am glad to hear that things worked out positively.

Philip
 
LVL 39

Expert Comment

by:Philip Elder
ID: 33717764
Just an FYI to Matt from http://realserverhunt.intel.com which is an ongoing Intel Partner knowledge building contest.

Philip

10-09-20-Hardware-versus-Softwar.png