This morning I discovered that our Windows Server 2003 machine (which is also our Active Directory controller) has a failed drive. It has five 36.8 GB drives configured as a RAID-5 array. A replacement drive should arrive within the next four hours.
Windows boots as far as the login screen, then sits at a gray screen. No users can log in on their workstations or access the Exchange server. A single drive failure in a RAID-5 array should only slow the server down, but here it seems to be doing more than that, and I'm concerned.
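To spell out why I expect a single failed drive to be survivable, here is a minimal sketch of the XOR parity idea behind RAID-5. It is only an illustration: the block contents and the names `xor_blocks` and `rebuild` are made up, parity sits in a fixed position instead of rotating across drives, and this is not how the HP controller is actually implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across five drives: four data blocks plus one parity block.
# (Real RAID-5 rotates the parity position from stripe to stripe.)
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, failed_index):
    """Recover the block on the failed drive by XOR-ing the four survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Usable capacity of an n-drive RAID-5 array is (n - 1) * drive size.
DRIVES = 5
DRIVE_GB = 36.8
usable_gb = (DRIVES - 1) * DRIVE_GB
```

Any one missing block can be recomputed from the other four, which is why a degraded array should stay readable, only slower, since every read that touches the failed drive needs that extra work. A second failed drive breaks this, because two unknowns cannot be recovered from one parity block.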
The HP tech recommended that I replace the drive while the machine is on, wait for the array to rebuild (~9 hours), and then see whether the server starts working.
This doesn't make sense to me. It makes more sense to hard shut down the server, replace the drive, and then boot and let the rebuild run, so that the Windows services aren't trying to work off the failed drive.
I also think I will get my AD controller functioning sooner this way.
This morning I restarted the server remotely with a hard reset through the iLO controller (not knowing that a drive had failed). For the first few minutes of the boot process the AD and file sharing services were working; then they stopped, and after that I couldn't log in at the login window.
So, my guess is that the drive is reporting a partial failure to the array controller and still trying to function, when in reality it is not working at all, and the system would function better if the drive were out of the equation.
Following my logic or am I nuts and ignorant? :)
P.S. Another worst-case scenario is that a second drive has failed and the drive lights aren't showing it. I can't access Insight Diagnostics remotely.