Failure of 2003 DC and SBS 2008
Posted on 2011-10-08
I deleted my original question yesterday because I thought I had this figured out. It seemed AD had found its way back onto the SBS 2008 server after a reboot, the fourth or so since I ran a non-authoritative system state restore to recover some OUs that were deleted inadvertently.
The original setup was a 2k3 DC with Exchange (not SBS); we moved to SBS 2008 around the beginning of the year. About 6-8 weeks ago our 2k3 DC died. We intended to fix it and put it back into commission but never had the time.
Since doing the system state restore yesterday I'm getting a whole lot of weirdness: what appeared to be a working server turned, a couple of hours later, into one where AD can no longer be opened and no domain is found. Here's what I get when I open AD Users and Computers: "The following domain controller could not be contacted: dcname. The specified domain either does not exist or could not be contacted." Sites and Services gives an error saying either DNS or replication is configured incorrectly.
This is going on right now. The first time I tried to open Users and Computers it actually opened; the second time I got the error above. So, side by side, one window shows all the OUs under Users and Computers and the other says it can't find a thing.
The SBS server holds all three roles. Both servers were GCs, although it now says neither one is...
Since yesterday I was able to find a server with hardware matching the dead 2k3 DC. Here are my three ideas. Which one of these should I do? If none, that's great too, but please give me a fourth idea...
First, I'm thinking about manually removing the old DC using metadata cleanup and seeing if that clears this up. Doing this will rule out the next option, but it would most likely be my preferred method for obvious reasons.
Second is to swap drives into the server I borrowed, bring up the old 2k3 box, start replication, and then demote it. The borrowed box is a production server for another company, so I need to be very careful about the RAID 5 VD(s): once the original drives are put back into that box they need to boot right up, because that company needs the server back on Monday. By the way, these are HP ProLiant ML350s, in case anyone cares. I also have both RAID cards, so if this method gets the nod I think I'll swap those before continuing.
Last is to do another system state restore, this time authoritative, or even to do this first and then follow up with my first choice (metadata cleanup) afterwards.
I've not been able to locate anything online where anyone else has seen this exact behavior, where everything seems fine and then deteriorates back to dead. I've done all of these things several times in the past, and over the last few months I've done more server recovery than ever due to a higher than normal volume of failures. So I'm comfortable with all the procedures UNLESS there are caveats specific to SBS 2008, or little-known or unpublished extra procedures needed to ensure success. Willing to soak up any info anyone is willing to dish out.
Thanks much for the help. This place is a sawmill in a small rural town and this issue has over 100 people out of work at the moment. They get back on Tuesday and I'm going onsite tonight to try some stuff out. If I don't get any response before tonight, please still answer, because there's a good chance I'll chicken out tonight without solid advice from someone who's been in a similar pickle.