strafexx asked:

Weird problems - DHCP slow to respond, domain trusts continually failing

G'day

A client of mine recently had a RAID controller crash which resulted in us having to do a bare metal restore of Windows Small Biz Server 2003 - reinstall a fresh O/S, restore file system, then restore AD (all from the same backup sets). Restore was done from tape via BackupExec 12. Crash was on a Monday afternoon, last backup was from a Friday night.

After getting it up and running at the end of June, I have noticed two major issues.

- DHCP responses seem really slow. Event Viewer on user workstations shows many records saying they were unable to get an IP from the server, which causes Netlogon/Group Policy processing errors (as the user is logging on with cached credentials). Most of the time there is only one error in Event Viewer, sometimes two, before an IP is assigned.

- Domain trusts keep failing on various machines, even after removing them from the domain and rejoining, trying the NETDOM RESET commands, etc. When joining new computers to the domain via the ConnectComputer utility in SBS, the domain trust fails straight away after reboot. I rejoined a system yesterday under a new, previously unused computer name, and today its domain trust has failed.

I run a Kixtart script at logon to do a few things, and in the event log of some affected systems this results in:
SOURCE:Kixtart, Event ID 1789: GetPrimaryGroup failed Error : The trust relationship between this workstation and the primary domain failed. (0x6fd/1789).
SOURCE: Userenv, Event ID: 1053: Windows cannot determine the user or computer name. (The RPC server is unavailable. ). Group Policy processing aborted.
SOURCE: AutoEnrollment, Event ID: 15: Automatic certificate enrollment for local system failed to contact the active directory (0x8007003a).  The specified server cannot perform the requested operation.
  Enrollment will not be performed.

Occasionally we get a:
SOURCE: Kerberos, Event ID 7: The kerberos subsystem encountered a PAC verification failure.  This indicates that the PAC from the client XXX15$ in realm XXX.LOCAL had a PAC which failed to verify or was modified.
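
The numeric codes in these events can be unpacked to confirm what they refer to. The following is a minimal sketch, not part of the original post: the helper name `split_hresult` is made up for illustration, and it assumes the standard HRESULT layout (severity bit, 11-bit facility, 16-bit code). Under that assumption 0x8007003a is a wrapped Win32 error 58 (ERROR_BAD_NET_RESP, whose text matches the AutoEnrollment message), and 0x6fd is simply 1789 (ERROR_TRUSTED_RELATIONSHIP_FAILURE) written in hex.

```python
# Sketch: split the error codes quoted in the events above into their parts.
# split_hresult is a hypothetical helper name, not a Windows API.

FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a plain Win32 error


def split_hresult(hresult):
    """Return (severity, facility, code) for a 32-bit HRESULT value."""
    severity = (hresult >> 31) & 0x1    # 1 means failure
    facility = (hresult >> 16) & 0x7FF  # 11-bit facility field
    code = hresult & 0xFFFF             # low 16 bits: the underlying error code
    return severity, facility, code


severity, facility, code = split_hresult(0x8007003A)
print(severity, facility == FACILITY_WIN32, code)  # 1 True 58
print(0x6FD)                                       # 1789
```

Both values are ordinary Win32 errors reported by the workstation, which fits the picture of clients failing to talk to the domain controller rather than a fault in Kixtart or certificate enrollment themselves.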

The server itself isn't reporting any errors like these.

I've come across many Userenv 1053s before and fixed them, however this feels different from previous encounters. There were no registry changes, group policy changes, network changes, or any other changes other than the restore of the filesystem/AD/system state to the Friday night backup.

Any ideas or clues?

DCDiag and Netdiag both return clean; I can provide verbose outputs of these if required.
Jeffrey Kane - TechSoEasy

Something isn't right with the way it was restored.  What procedure was used?

Posting the DCDIAG and NETDIAG would help.  Also, can you please post complete IPCONFIG /all from both the server and a workstation?

Jeff
TechSoEasy
Also, review this KB article:  http://support.microsoft.com/kb/883268

Jeff
TechSoEasy
strafexx (ASKER)

IPCONFIG from the server is in the code snippet; dcdiag /v and netdiag /v are attached as files.

I can't get on a client machine at the moment - away from the office and everyone is using them. Will try to do it in about 40 mins during the lunch break.

In the logs - 192.168.5.1 is a Billion ADSL modem, 192.168.5.4 is the SBS server in question

Will go through the MS KB now

Thanks
Windows IP Configuration
 
   Host Name . . . . . . . . . . . . : superx3650
   Primary Dns Suffix  . . . . . . . : xxx.local
   Node Type . . . . . . . . . . . . : Unknown
   IP Routing Enabled. . . . . . . . : Yes
   WINS Proxy Enabled. . . . . . . . : Yes
   DNS Suffix Search List. . . . . . : xxx.local
 
Ethernet adapter Server Local Area Connection:
 
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Broadcom BCM5708C NetXtreme II GigE (NDIS VBD Client) #2
   Physical Address. . . . . . . . . : 00-1A-64-D3-E1-3A
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 192.168.5.4
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.5.1
   DNS Servers . . . . . . . . . . . : 192.168.5.4
                                       192.168.5.1
   Primary WINS Server . . . . . . . : 192.168.5.4
 
PPP adapter RAS Server (Dial In) Interface:
 
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : WAN (PPP/SLIP) Interface
   Physical Address. . . . . . . . . : 00-53-45-00-00-00
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 192.168.5.71
   Subnet Mask . . . . . . . . . . . : 255.255.255.255
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Disabled


dcdiag.txt
netdiag.txt
I checked the SC QUERY results, they are all 20 WIN32_SHARE_PROCESS, except for ProtectedStorage which returned slightly different results:

C:\Tools>sc query protectedstorage

SERVICE_NAME: protectedstorage
        TYPE               : 120  WIN32_SHARE_PROCESS  (interactive)
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
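
To make the difference between 20 and 120 concrete, here is a small sketch that parses the output above and decodes the TYPE field. It assumes, as I understand it, that `sc query` prints TYPE in hexadecimal, so 120 is 0x20 (SERVICE_WIN32_SHARE_PROCESS) plus 0x100 (SERVICE_INTERACTIVE_PROCESS). The function name `parse_sc_query` is made up for illustration.

```python
import re

# Standard service-type flag values from the Windows SDK.
SERVICE_WIN32_SHARE_PROCESS = 0x20
SERVICE_INTERACTIVE_PROCESS = 0x100

# A shortened copy of the output posted above.
SAMPLE = """\
SERVICE_NAME: protectedstorage
        TYPE               : 120  WIN32_SHARE_PROCESS  (interactive)
        STATE              : 4  RUNNING
        WIN32_EXIT_CODE    : 0  (0x0)
"""


def parse_sc_query(text):
    """Collect the 'KEY : value' lines of sc query output into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z0-9_]+)\s*:\s*(.*\S)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


info = parse_sc_query(SAMPLE)
type_code = int(info["TYPE"].split()[0], 16)  # assumption: value is printed in hex
is_shared = bool(type_code & SERVICE_WIN32_SHARE_PROCESS)
is_interactive = bool(type_code & SERVICE_INTERACTIVE_PROCESS)
print(hex(type_code), is_shared, is_interactive)  # 0x120 True True
```

Read this way, ProtectedStorage is the same share-process type as the other services with the interactive flag added, which is what the "(interactive)" note in the output already suggests.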
ASKER CERTIFIED SOLUTION
Rob Williams
Rob,

In the event that the SBS box goes down there would be no DNS resolution - web browsing etc. would be unavailable by hostname. The router is the secondary DNS.

The SBS box is providing all DHCP services; there are no DHCP relays or other devices listening anywhere on the network.
>>"In the event that the SBS box goes down there would be no DNS resolution - web browsing etc. would be unavailable by hostname."
Correct. Sorry, that is the way it needs to be. If not, you will have name resolution problems, slow logons, and "unable to register" errors as you have in your NetDiag output. The only solution to that is to add a second domain controller.

Make sure DHCP on the SBS also does not hand out the router's IP as a DNS server.
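
As a quick way to check for this on the server or on a workstation, the sketch below scans `ipconfig /all` text for DNS server entries and reports any that are not on an allowed list. It is only an illustration: the function names are made up, it handles IPv4 addresses only, and the sample is the relevant slice of the server output posted earlier in this thread.

```python
import re

IPV4 = r"\d+\.\d+\.\d+\.\d+"


def dns_servers(ipconfig_text):
    """Return the DNS server addresses listed in ipconfig /all output."""
    servers = []
    in_dns_block = False
    for line in ipconfig_text.splitlines():
        if "DNS Servers" in line:
            in_dns_block = True
            servers += re.findall(IPV4, line)
        elif in_dns_block and ":" not in line and line.strip():
            # Continuation line: an extra address printed without a label.
            servers += re.findall(IPV4, line)
        else:
            in_dns_block = False
    return servers


def unexpected_dns(ipconfig_text, allowed):
    """Return DNS servers that are configured but not in the allowed set."""
    return [ip for ip in dns_servers(ipconfig_text) if ip not in allowed]


SAMPLE = """\
   Default Gateway . . . . . . . . . : 192.168.5.1
   DNS Servers . . . . . . . . . . . : 192.168.5.4
                                       192.168.5.1
   Primary WINS Server . . . . . . . : 192.168.5.4
"""

print(unexpected_dns(SAMPLE, {"192.168.5.4"}))  # ['192.168.5.1']
```

With only the SBS address (192.168.5.4) allowed, the router's 192.168.5.1 is flagged, which is the entry being discussed here.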
RobWill,

Have adjusted the DNS on both the server and DHCP so there is no reference to the router other than as a gateway.

But that would not cause domain trust issues like I'm experiencing?
>>"But that would not cause domain trust issues like I'm experiencing?"
No, I am doubtful, but DNS can have some "weird" side effects. It should clean up a few of the NetDiag errors though.
Sorry I don't see any other glaring issues, perhaps someone else will.

Often with server<=>PC trust issues, the resolution is to remove PCs from the domain and rejoin them, but depending on how many machines there are, that can be very time consuming.
Actually, I would believe that the incorrect DNS Server IP would cause these problems.  Your NetDIAG shows a bunch of DNS ERRORS for 192.168.5.1, which indicates that IP address was configured within the SBS's DNS Lookup Zones as a valid name server.  So it's much more than just having your NIC pointing to an additional DNS Server... your actual lookup zones were doing the same.  I'd suspect they still are.

Generally, the best way to clean up DNS on an SBS is to delete both the forward and reverse lookup zones.  Then rerun the Configure Email and Internet Connection Wizard (CEICW -- which is linked as Connect to the Internet in the Server Management Console > To-Do List) to recreate them.

For future reference, if you want to pull an IPCONFIG /ALL from a workstation remotely, you can use PSEXEC
(http://technet.microsoft.com/en-us/sysinternals/bb897553.aspx)
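
A rough sketch of wrapping that in a script is below. It assumes psexec.exe is on the PATH of the machine you run it from and that your account has admin rights on the target; the function names and the workstation name WS01 are placeholders, not anything from this network.

```python
import subprocess


def psexec_ipconfig_command(computer):
    """Build the PsExec command line that runs ipconfig /all on a remote PC."""
    # PsExec takes the target as \\computername, followed by the command to run.
    return ["psexec", "\\\\" + computer, "ipconfig", "/all"]


def remote_ipconfig(computer):
    """Run the command and return whatever ipconfig printed on the remote PC."""
    result = subprocess.run(
        psexec_ipconfig_command(computer),
        capture_output=True,
        text=True,
    )
    return result.stdout


# WS01 is a hypothetical workstation name used only to show the command.
print(" ".join(psexec_ipconfig_command("WS01")))  # psexec \\WS01 ipconfig /all
```

Calling `remote_ipconfig("WS01")` from an admin workstation would then return the text to paste into the thread, without having to sit at the client machine.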

Also, if fixing up the DNS Zones doesn't resolve the problem, please post the results from running gpresult /v on a workstation.

Lastly... if you rejoin workstations, be sure to follow the steps I've outlined here:  http://sbsurl.com/rejoin

Jeff
TechSoEasy
One other thought... when you restored the server, did you reapply all service packs that had previously been installed?

Jeff
TechSoEasy
SOLUTION
Jeff,

When I restored - to be honest I cannot remember (75 hr working week due to server outage), if all the service packs/patches were applied prior to restoring. I did restore the file system first, then AD, as outlined by Symantec.

I re-ran the netdiag /v and it didn't show the errors relating to the router.
If you didn't reinstall service packs and patches then you aren't running the same OS... Get those installed!

Jeff
TechSoEasy
Jeff,

According to Windows Update I'm fully patched - minus a few IE7 and recent patches from after the restore date. These will be applied tonight and the rig rebooted.

One would think that restoring the file system and system state would put things back together again? Nevertheless I'm moving to an image-based protection system.

I will monitor the system for the next few days and report back with any recurrences

Thanks
Have rebooted and, whilst at it, ran a CHKDSK /F, which cleared up a few things. I re-ran the CEICW and it broke RPC over HTTP/Outlook over Web, which I have fixed.

Now the server itself gets Userenv 1030 and 1058 errors and can't browse to \\domain.local\ on the server, but the name resolves correctly in an nslookup.

:-(
SOLUTION
Rob's correct about 1030 and 1058 being caused by incorrectly configured DNS.   I'm wondering, though, how the CEICW managed to break RPC over HTTPS and what you did to fix it... because it shouldn't break it unless you had a misconfiguration to begin with.

Jeff
TechSoEasy
Rob, the last URL you posted was exactly the problem: SMB signing. We turned the applicable entry off as we have a Mac OSX 10.4 client who could not connect to the file shares otherwise. Those settings were changed quite a while ago though, and I've not encountered that error previously.

Jeff, the CEICW changed the access grant/deny permissions to allow only the local IPs 127.0.0.1 and 192.168.5.4 on pretty much the whole Default Website in IIS; however, that may have been because of one of the options I chose when re-running it relating to the default website.

You guys have been brilliant - today I find out if the clients have any issues as the work was done after hours last night. Should have some news in about 3-4 hours.

Thanks so much!
Sounds promising. Let us know how you make out.
--Rob
Thanks guys - have split points up, Rob for the actual solution and subsequent userenv issues, and Jeff for some extra info which I will retain as a KB

Cheers
Steve
Thanks strafexx.
Cheers!
--Rob
FYI, you shouldn't need to worry about what the CEICW changes regarding access/deny settings on the default web site.  It will set them correctly.  

Jeff
TechSoEasy