Kerberos/Domain login problems after ADR of 2003 Server - very strange behavior
Posted on 2006-03-29
Here is the history:
The RAID set holding the system disk of a Windows Server 2003 machine (which also runs Exchange) failed, and the system could not be recovered by rebuilding the array. The only available option was to use the ADR and restore to a point two months earlier (the local admin only runs the ADR when he feels there has been a major change in the environment). The DATA array is OK, so it does not need to be recovered, and we had an up-to-date Exchange backup. With this being my best available option, I restored the server using the ADR method.
The server came back up as a spitting image of itself from two months earlier, which is fine. I performed the Exchange restore, which went off without a hitch. At this point the server's event logs look clean, apart from the fact that it is not connected to the network.
Here is the strange part:
We connect the server back to the network. It connects to the Internet, mail begins flowing in and out of the server again, and in general things look good.
Then we try to bring the workstations back online, and they get errors stating that they cannot find the domain controller and cannot log in. After some fiddling, I found that if you disconnect the LAN cable from the (Windows XP SP2) workstations, log into the domain, and then reconnect the cable, you can access the server and Exchange.
We are connected, although I cannot have the users perform this ritual every time they reboot their workstations. A few machines on the network still run Windows 98, and they can connect to the domain with no problem.
But wait, there are other issues:
I cannot connect from the server to the workstations; I get various permission-denied errors depending on how I attempt to connect. Some attempts produce Kerberos errors in the server's event log, while others log errors indicating that there are two computer accounts with the same name on the domain (which I assure you is not the case). There are no other servers on the network that could be out of sync, because this server replicates with no one. DNS does not have multiple entries for the machines.
I can ping the workstations and resolve their names properly, and I have confirmed that DHCP and DNS are set up correctly on the server. The workstations use the server as their only DNS server, and they can all resolve names without issue. I flushed DNS on the workstations, released and renewed their IP addresses, and made sure their clocks were in sync with the server. On the surface, everything looks fine.
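Kerberos rejects authentication when the client and server clocks differ by more than an allowed tolerance, so the time-sync check amounts to a simple comparison. Here is a minimal Python sketch of that comparison. It assumes the Windows default for the Kerberos policy "Maximum tolerance for computer clock synchronization" (5 minutes); the function name and sample timestamps are hypothetical and only illustrate the check.

```python
from datetime import datetime, timedelta

# Assumption: the Windows default Kerberos clock-skew tolerance of 5 minutes.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(server_time: datetime, client_time: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the two clocks differ by at most max_skew (hypothetical helper)."""
    return abs(server_time - client_time) <= max_skew


if __name__ == "__main__":
    # Hypothetical sample times, not taken from the affected network.
    server = datetime(2006, 3, 29, 9, 0, 0)
    print(within_kerberos_skew(server, server + timedelta(minutes=3)))  # True
    print(within_kerberos_skew(server, server - timedelta(minutes=7)))  # False
```

The difference is taken as an absolute value because a client clock that runs ahead of the server is treated the same as one that lags behind.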
Furthermore, we get all of the problems associated with broken domain authentication, such as machines being unable to connect to printers shared on other workstations.
The suggested solution:
The only solution I can come up with is to remove all of the XP workstations' computer accounts from the domain and re-add them, but there is an issue with that. This is a relatively new installation (up and running for four months), and there were difficulties copying the users' profiles on the workstations. The users have a software set that takes quite some time to configure, and they have only recently ironed out all of the bugs from the last migration. Taking them down that path again has to be avoided at all costs.
Any other suggestions?