Server 2008 R2 x64 failed disk in RAID-10 now blue screen/reboot during startup
This server is fairly new, about 6 months old. It's running Windows Server 2008 R2 x64 with 8GB RAM and a RAID-10 array of 4x250GB SATA drives. It's also the company's Exchange server (2007). This weekend, around 7AM Sunday, something happened and a drive failed in the RAID array. When we tried to boot the machine Sunday evening it wanted to go into repair mode. Any time we attempt to start Windows in normal mode the machine blue screens with "Directory Services could not start because ... a device attached to the system is not functioning" and reboots. You only see the blue screen if you disable auto-restart on error from the F8 boot menu. This happens almost immediately after the mouse cursor appears during the boot-up process. The computer will not boot in Safe Mode (any option) but will start in Directory Services Restore Mode.
I have replaced the failed drive and am currently running the rebuild function of the RAID controller. That should be done in an hour and a half, give or take. I am also going to run the Intel PCT from EFI shell once the rebuild is complete to rule out any additional hardware problems.
I am attaching a picture of the blue screen error.
I ran SFC /scannow and CHKDSK /F /R, but neither changed the blue screen/reboot situation.
Please let me know if additional details are needed or if you have any suggestions, and thank you in advance for the assistance. IMG-2636.JPG
Windows Server 2008, Microsoft Legacy OS, Active Directory
Last Comment
Philip Elder
8/22/2022 - Mon
Justin Owens
Let us know if you can boot after your RAID rebuild finishes. No sense in chasing rabbits...
Philip Elder
What are the following specs please:
+ server board make/model
+ RAID controller make/model
+ HDD make/model
Philip
John
Are you sure something else did not fail? The server should have been able to keep going if only one drive failed in a RAID array. It may be still rebuilding while I post this, and then be OK, so let us know what happened when the rebuild finished. ... Thinkpads_User
I think you will find it is fine once you rebuild the array. You should have been able to go into the RAID controller and mark the one drive as defunct which should have made the controller present the single remaining drive as though it were the original mirror set.
Creating RAID filesystems without hot spares is a bad idea no matter how you look at it. Something to consider going forward.
ActiontechKS
ASKER
The RAID rebuild is complete and the same problem is still happening. The motherboard is an Intel S5500BC, and the drives are Seagate Barracuda ES.2 250GB SATA with firmware SN06. The RAID controller is the on-board Intel controller; I'm not sure if there's a specific model to give you other than the motherboard.
I am currently running chkdsk /r /f /x now that the rebuild is complete to see if that makes a difference.
@mattvmotas: yes, I'm not exactly a RAID expert but I will definitely be installing a hot spare once this problem is resolved.
John
A drive can fail because the drive itself is bad, and a drive can fail because a controller fails. What are the chances there is a RAID controller issue? You would probably need a service person for this.
I am running the Intel PCT tests from the EFI shell now to determine if any additional hardware has failed, and will replace anything that is bad of course, but it seems more like an O/S issue at this point. The RAID rebuilt fine and shows 'online' instead of 'degraded' now. I'm really hoping someone has a way to repair Windows so it will boot. 2003 had the 'repair installation' options but 2008 seems to have done away with that.
Philip Elder
The Intel RAIDWeb Console 2 is able to connect to and manage the on board RAID (Software RAID) setup.
It will tell you what the status of each drive is and should have a log of events related to the RAID array(s).
More than likely the rebuild was done when the mirrored drive was not quite up to date with the primary drive and thus the hiccup/burp in the Windows OS. This will happen, as will a complete freeze, with on board software based RAID solutions. A hardware RAID setup would not have done that.
Update:
Replacing the motherboard did not help and I have ruled out hardware as being a cause. Microsoft support believes it to be a corrupt Active Directory (they're going to call back later when an 'expert' is available to discuss it.)
This is a member server, and there is another 2008 R2 backup domain controller. It does not have Exchange. I cannot run dcpromo in safe mode (or DS restore mode) so removing AD isn't possible. Is it possible to simply copy the contents of the NTDS folder from the other (still fine) server to this one without screwing up Exchange?
ActiontechKS
ASKER
I forgot to answer the question from MPSec - it is a hardware RAID, not a software RAID so that didn't apply in this case.
Apologies for not knowing my RAID rules. I thought by software you meant a Windows controlled RAID array. As I said, I'm no expert in that arena.
For clarification - Exchange will not be affected by the removal of AD from the PDC? I can do a forceremoval and seize the AD roles to the backup controller, then reinstall AD on the failed AD machine and gracefully transfer those roles back without impacting people's mailboxes?
Thanks
MPSECSInc -> Every vendor and tech I have spoken to refers to hardware RAID as RAID setup using a hardware controller, and not done using just the OS.
I have never heard that a controller card or on-board controller RAID was not a hardware RAID, and I have been working with RAID for over 10 years with all the major vendors.
Philip Elder
Matt,
Then we shall have to agree to disagree.
Philip
ActiontechKS
ASKER
I had to force the removal of AD, transfer all the FSMO roles via ntdsutil to the backup domain controller, clean up the metadata, and then reinstall AD. Now Windows will load in normal mode, which solves the main problem. Thanks guys.
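For anyone following the same route, here is a rough sketch of what the ntdsutil role step might look like when the old role holder has been forcibly demoted (at that point it is a seizure rather than a graceful transfer). It only builds and prints the command line and does not run anything. The host name "DC2" is hypothetical, and the role names are the ones I would expect at the ntdsutil fsmo maintenance prompt on 2008 R2; check them against the built-in help (?) on your own system before relying on this.

```python
# Sketch only: builds the ntdsutil command line for seizing FSMO roles.
# It prints the command; it does not execute anything.

# Role names as assumed for the "fsmo maintenance" prompt on Server 2008 R2.
FSMO_ROLES = [
    "schema master",
    "naming master",
    "infrastructure master",
    "PDC",
    "RID master",
]


def build_seize_command(target_dc):
    """Return the ntdsutil argument list that seizes every role onto target_dc."""
    if not target_dc or " " in target_dc:
        raise ValueError("target_dc must be a single host name")
    # Enter the roles menu, bind to the surviving DC, then return to the roles menu.
    steps = ["roles", "connections", f"connect to server {target_dc}", "quit"]
    steps += [f"seize {role}" for role in FSMO_ROLES]
    # Leave the roles menu, then leave ntdsutil.
    steps += ["quit", "quit"]
    return ["ntdsutil"] + steps


def render(args):
    """Quote multi-word steps the way they would be typed at a prompt."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


if __name__ == "__main__":
    # "DC2" is a hypothetical name for the surviving domain controller.
    print(render(build_seize_command("DC2")))
```

Printing the plan first makes it easy to review each step before typing it on the surviving domain controller.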
Keep an eye on the replication logs between the two servers.
Take note of the AD replication GUID in DNS for the two servers. If you see three GUIDs listed in the AD portion of DNS, then the old DC GUID is still there. It should not be, but verify just in case.
Philip
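The GUID check described above can be sketched roughly as follows. This is a minimal illustration, assuming you have already copied the GUID alias records from the AD portion of DNS into a dictionary by hand; the GUID strings and host names in the example are made up. It flags any host that no longer exists as a DC, and any DC that has more than one GUID pointing at it (which is what a leftover record from the old installation would look like after a reinstall with the same name).

```python
from collections import defaultdict


def find_suspect_guids(cname_records, live_dcs):
    """Return {host: [guids]} for DNS GUID records that deserve a second look.

    cname_records: dict mapping a GUID string to the host name it points at.
    live_dcs: iterable of host names of the domain controllers that exist now.
    """
    live = {host.lower() for host in live_dcs}
    by_host = defaultdict(list)
    for guid, host in cname_records.items():
        by_host[host.lower()].append(guid)

    suspects = {}
    for host, guids in by_host.items():
        if host not in live:
            # Record points at a DC that is no longer in the domain.
            suspects[host] = sorted(guids)
        elif len(guids) > 1:
            # One DC with several GUIDs: at least one is probably stale.
            suspects[host] = sorted(guids)
    return suspects


if __name__ == "__main__":
    # Hypothetical data: two servers, three GUID records.
    records = {
        "guid-a": "exch1.corp.example",
        "guid-b": "exch1.corp.example",
        "guid-c": "dc2.corp.example",
    }
    print(find_suspect_guids(records, ["exch1.corp.example", "dc2.corp.example"]))
```

The sketch cannot tell which of two GUIDs is the stale one; that still has to be confirmed against what the domain controller itself reports.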
ActiontechKS
ASKER
DNS looks okay. However, removing AD from the Exchange server did cause a massive failure of Exchange. All of the permissions that Exchange sets when it's installed were lost and now the Exchange services will not start. I would caution anyone else who tries this solution to attempt everything else first.
Philip Elder
My bad ... I am working on getting the steps to get things back online for you ASAP.
If you have a good Exchange backup, then I would follow this general process:
[a] format the server – DO NOT REMOVE THE ACCOUNT FROM AD
[b] metadata cleanup
[c] reinstall WITH THE SAME NAME
[d] dcpromo
[e] reinstall Exchange with /RecoverServer
[f] create the databases IN THE SAME PLACE WITH THE SAME NAME as the old ones
[g] stop exchange
[h] copy the original databases and log files in
[i] reboot
Philip
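Step [f] in the list above is the easiest one to get subtly wrong, so a quick comparison before copying the original files back in may help. The sketch below assumes you have written down the original database names and file paths and can list the ones on the rebuilt server; the names and paths in the example are hypothetical. Paths are compared the way Windows treats them (case-insensitive, either slash direction).

```python
import ntpath


def layout_mismatches(original, recreated):
    """Compare database name -> file path maps and describe any differences."""
    problems = []
    for name, path in original.items():
        if name not in recreated:
            problems.append(f"missing database: {name}")
        elif ntpath.normcase(recreated[name]) != ntpath.normcase(path):
            problems.append(
                f"path differs for {name}: {recreated[name]} (expected {path})"
            )
    for name in recreated:
        if name not in original:
            problems.append(f"unexpected database: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical layout noted from the old server and from the rebuilt one.
    old = {"Mailbox Database": r"D:\ExchDB\Mailbox Database.edb"}
    new = {"Mailbox Database": r"E:\ExchDB\Mailbox Database.edb"}
    for line in layout_mismatches(old, new):
        print(line)
```

An empty result means the recreated layout matches the noted one by name and path; it says nothing about the health of the database files themselves.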
ActiontechKS
ASKER
Believe it or not I was able to recover things okay without doing anything too drastic. I had to install the desktop experience feature (odd I know) and add the server's domain controller account to the Exchange Server groups - Exchange Domain Servers, Exchange Enterprise Servers and Exchange Servers. Rebooted a couple of times and voila the thing came back up like it should. So, not as terrible as I thought at first. Overall I think the service I got here was 100% better than what I got when I called Microsoft support. A+.
Philip Elder
I am glad to hear that things worked out positively.