I raised the domain functional level to WS2k3, and now the domain is offline. DCDIAG on the primary DC reports no GCs or DCs for my domain

chsoriano
I raised the functional level of my Windows Server 2003 domain today. It is a small domain with two domain controllers. After raising the domain functional level, I raised the forest functional level. Within an hour, I could no longer connect to any of my servers through Terminal Services.

After logging on locally, I discovered additional problems. ADUC (on either DC) would not open without errors. I could, however, still open the ADUC console and select "Connect to Domain Controller", which brings up a dialog asking whether I want to manage the domain with that server. I select Yes, and ADUC opens after that.

Now, however, I cannot join workstations or servers to the domain. I get an error that says "A domain controller for the domain could not be contacted." Also, when I run DCDIAG on my primary DC, I get errors stating that it cannot contact a GC or domain controller. I've attached this output to the question as a txt file.

I can still log on to the servers locally, and so far the workstations seem unaffected. Does anyone have advice on getting the domain back up and running?

dcdiag-report.txt

Commented:
This looks like a DNS issue. Make sure both servers point to your internal DNS server that hosts the AD-integrated zones.

Also, have you checked the Global Catalog status on your servers?
- Administrative Tools -> Active Directory Sites and Services
- Expand one of the servers, right-click NTDS Settings, and choose Properties
- Make sure the "Global Catalog" check box is selected
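Both the join error ("A domain controller for the domain could not be contacted") and DCDIAG's DsGetDcName error 1355 most likely mean the DC locator could not find a DC through DNS. The locator does this by querying SRV records that each DC registers. As a minimal sketch, assuming a single-domain forest (so the forest name equals the domain name otrc.tamu.edu), the Python below lists the main SRV names worth checking with nslookup; it only builds the names and does not query DNS itself. The function name locator_srv_names is illustrative.

```python
from __future__ import annotations


def locator_srv_names(domain: str, forest: str | None = None) -> dict[str, str]:
    """Return the main SRV record names the DC locator looks up.

    `forest` defaults to `domain`, which holds for a single-domain forest
    such as otrc.tamu.edu (an assumption for this thread).
    """
    domain = domain.strip().rstrip(".").lower()
    forest = (forest or domain).strip().rstrip(".").lower()
    return {
        "domain controllers": f"_ldap._tcp.dc._msdcs.{domain}",
        "PDC emulator": f"_ldap._tcp.pdc._msdcs.{domain}",
        "KDCs": f"_kerberos._tcp.dc._msdcs.{domain}",
        "global catalogs": f"_ldap._tcp.gc._msdcs.{forest}",
        "global catalogs (forest-wide)": f"_gc._tcp.{forest}",
    }


if __name__ == "__main__":
    for role, name in locator_srv_names("otrc.tamu.edu").items():
        # Run each printed command on a DC or workstation; an empty answer
        # means clients cannot locate that role through DNS.
        print(f"{role:30} nslookup -type=SRV {name}")
```

If the "global catalogs" and "KDCs" names return nothing, that matches the "All GC's are down" and "All the KDCs are down" warnings in the DCDIAG output.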

Commented:
Are you able to supply the whole DCDiag output?

Author

Commented:
I opened DNS Management and found two forward lookup zones, which I don't remember seeing before: "OTRC" and "otrc.tamu.edu". The second zone has records for all of the machines, but the first ("OTRC") has only the SOA, two NS records, and one host record.

It seems that, for some reason, it created a new domain or something. I also noticed that after I logged on to one of the servers, the username was now username.domain.000.
dns.jpg

Commented:
That record is going to interfere with most of your AD functions! You should try to get rid of it (make a copy of the DNS file first); that should fix most of your problems.

Author

Commented:
vahiid, the primary DC is also the DNS server.  Also, DC1 is designated as the GC, but not DC2.

Psy053, the txt file I attached is all of the dcdiag output. Is there something else you would like me to provide?

Author

Commented:
vahiid: do you mean to just delete the "OTRC" forward lookup zone?  Should I do this on both domain controllers?  Does it require a restart?

Commented:
Yes. Before deleting it, right-click the zone and open Properties. Make sure the "Type" is not Active Directory-integrated; if it is, click Change and clear the Active Directory check box. This is required so that the zone is stored in a file you can back up.
Then back up the DNS files in the %windir%\system32\dns folder, and then delete the zone.
This way, if things don't work out, you can just restore the DNS file.
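A minimal sketch of the backup step, assuming the zone has already been switched to a standard (file-backed) zone so that its .dns file exists in %windir%\system32\dns. The function name backup_zone_files and the timestamped backup folder layout are illustrative choices, not part of any Windows tool.

```python
from __future__ import annotations

import os
import shutil
from datetime import datetime
from pathlib import Path


def backup_zone_files(dns_dir: str | os.PathLike | None = None,
                      backup_root: str | os.PathLike | None = None) -> list[Path]:
    """Copy every *.dns zone file into a timestamped backup folder.

    Only file-backed (standard) zones have a .dns file, which is why the
    zone type is changed before backing up. Returns the copied files.
    """
    if dns_dir is None:
        # Default location used by the Windows DNS Server service.
        dns_dir = Path(os.environ.get("WINDIR", r"C:\Windows")) / "system32" / "dns"
    dns_dir = Path(dns_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root or dns_dir / "backup") / stamp
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for zone_file in sorted(dns_dir.glob("*.dns")):
        # copy2 keeps timestamps, which helps when comparing versions later.
        copied.append(Path(shutil.copy2(zone_file, target / zone_file.name)))
    return copied
```

Restoring is the reverse: copy the saved .dns file back into the dns folder and re-create the zone from that existing file.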

Commented:
It does not require a restart; at most, you may need to clear your DNS cache.

Author

Commented:
Okay, I've done that, but I don't think it changed anything. I still get the error shown in the attached screenshot (aduc.jpg) when I try to open ADUC, and DCDIAG still reports that all GCs are down.

Domain Controller Diagnosis
 
Performing initial setup:
   Done gathering initial info.
 
Doing initial required tests
   
   Testing server: OTRC\OTRC-DC1
      Starting test: Connectivity
         ......................... OTRC-DC1 passed test Connectivity
 
Doing primary tests
   
   Testing server: OTRC\OTRC-DC1
      Starting test: Replications
         [Replications Check,OTRC-DC1] A recent replication attempt failed:
            From OTRC-DC2 to OTRC-DC1
            Naming Context: DC=ForestDnsZones,DC=otrc,DC=tamu,DC=edu
            The replication generated an error (1753):
            There are no more endpoints available from the endpoint mapper.
            The failure occurred at 2009-10-22 19:03:45.
            The last success occurred at 2009-10-22 18:59:50.
            1 failures have occurred since the last success.
            The directory on OTRC-DC2 is in the process.
            of starting up or shutting down, and is not available.
            Verify machine is not hung during boot.
         [Replications Check,OTRC-DC1] A recent replication attempt failed:
            From OTRC-DC2 to OTRC-DC1
            Naming Context: CN=Schema,CN=Configuration,DC=otrc,DC=tamu,DC=edu
            The replication generated an error (1753):
            There are no more endpoints available from the endpoint mapper.
            The failure occurred at 2009-10-22 19:03:45.
            The last success occurred at 2009-10-22 18:59:50.
            1 failures have occurred since the last success.
            The directory on OTRC-DC2 is in the process.
            of starting up or shutting down, and is not available.
            Verify machine is not hung during boot.
         ......................... OTRC-DC1 passed test Replications
      Starting test: NCSecDesc
         ......................... OTRC-DC1 passed test NCSecDesc
      Starting test: NetLogons
         Unable to connect to the NETLOGON share! (\\OTRC-DC1\netlogon)
         [OTRC-DC1] An net use or LsaPolicy operation failed with error 1203, No network provider accepted the given network path..
         ......................... OTRC-DC1 failed test NetLogons
      Starting test: Advertising
         Fatal Error:DsGetDcName (OTRC-DC1) call failed, error 1355
         The Locator could not find the server.
         ......................... OTRC-DC1 failed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... OTRC-DC1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... OTRC-DC1 passed test RidManager
      Starting test: MachineAccount
         ......................... OTRC-DC1 passed test MachineAccount
      Starting test: Services
            IsmServ Service is stopped on [OTRC-DC1]
         ......................... OTRC-DC1 failed test Services
      Starting test: ObjectsReplicated
         ......................... OTRC-DC1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... OTRC-DC1 passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
 
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
 
         Group Policy problems. 
         ......................... OTRC-DC1 failed test frsevent
      Starting test: kccevent
         ......................... OTRC-DC1 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0xC0002715
            Time Generated: 10/22/2009   19:03:33
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x0000168F
            Time Generated: 10/22/2009   19:03:45
            Event String: The dynamic deletion of the DNS record
 
         An Error Event occured.  EventID: 0x0000168F
            Time Generated: 10/22/2009   19:03:46
            Event String: The dynamic deletion of the DNS record
 
         An Error Event occured.  EventID: 0x0000168F
            Time Generated: 10/22/2009   19:03:46
            Event String: The dynamic deletion of the DNS record
 
         ......................... OTRC-DC1 failed test systemlog
      Starting test: VerifyReferences
         ......................... OTRC-DC1 passed test VerifyReferences
   
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : otrc
      Starting test: CrossRefValidation
         ......................... otrc passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... otrc passed test CheckSDRefDom
   
   Running enterprise tests on : otrc.tamu.edu
      Starting test: Intersite
         ......................... otrc.tamu.edu passed test Intersite
      Starting test: FsmoCheck
         Warning: DcGetDcName(GC_SERVER_REQUIRED) call failed, error 1355
         A Global Catalog Server could not be located - All GC's are down.
         Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
         A Time Server could not be located.
         The server holding the PDC role is down.
         Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
         A Good Time Server could not be located.
         Warning: DcGetDcName(KDC_REQUIRED) call failed, error 1355
         A KDC could not be located - All the KDCs are down.
         ......................... otrc.tamu.edu failed test FsmoCheck


aduc.jpg
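DCDIAG output this long is easier to read when reduced to the failing tests. The sketch below, assuming the report was saved to a text file (for example with dcdiag > dcdiag-report.txt), pulls out every "failed test" summary line. Run against the output above, it would list NetLogons, Advertising, Services, frsevent and systemlog for OTRC-DC1, plus FsmoCheck for otrc.tamu.edu. The script and function names are illustrative.

```python
from __future__ import annotations

import re
import sys

# Matches DCDIAG summary lines such as
#   ......................... OTRC-DC1 failed test NetLogons
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(dcdiag_output: str) -> list[tuple[str, str]]:
    """Return (target, test) pairs for every test DCDIAG reports as failed."""
    failures = []
    for line in dcdiag_output.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group(2) == "failed":
            failures.append((match.group(1), match.group(3)))
    return failures


if __name__ == "__main__":
    # Usage: python failed_tests.py dcdiag-report.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as report:
        for target, test in failed_tests(report.read()):
            print(f"{target}: {test}")
```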

Commented:
I know this seems like a cop-out, but have you tried rebooting the DCs?

Commented:
Also, can you please have a look in the event logs and report back any errors in there.

Author

Commented:
Yes, I restarted both DCs earlier this evening and am now rebooting them again. I will look at eventvwr as soon as they come back online. Which logs are you interested in? And what's the best way to post them here?

Commented:
Rather than posting the whole event, could you just post the Source and Event ID of any errors or warnings since rebooting the DCs?

Author

Commented:
DNS Errors:
DNS - 4004
DNS - 4015
DNS - 4521

System Errors:
LsaSrv - 40960
DCOM - 10005

Application Errors:
Userenv - 1030
Userenv - 1006
Userenv - 1054

Author

Commented:
I've also noticed something else: I pinged "otrc" from the workstation I've been working on (it hasn't been restarted since the problems started), and it returned the IP of our web server. It has always been like that; I'm not sure why. On a different machine, one that has been rebooted, the same ping command returns the IP of our domain controller DC1.

I flushed the DNS cache on my local machine, and now "ping otrc" resolves to the IP of DC2. If I ping "otrc.tamu.edu", which is the actual domain name, it resolves to the IP of our web server again (which is how it was before the problems began).
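A single-label name such as "otrc" is not looked up as-is: the client appends its DNS suffixes and may fall back to NetBIOS, so it can land on a different host than the FQDN, and cached answers add to the confusion. The sketch below resolves both names side by side with Python's socket.getaddrinfo so the answers can be compared after an ipconfig /flushdns. The resolver is passed in as a parameter (an illustrative design choice) so the comparison can be tried without a network.

```python
from __future__ import annotations

import socket
from typing import Callable


def resolve_ipv4(name: str) -> set[str]:
    """Return the IPv4 addresses this machine's resolver gives for `name`."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()  # name did not resolve at all
    return {info[4][0] for info in infos}


def compare_names(short: str, fqdn: str,
                  resolve: Callable[[str], set[str]] = resolve_ipv4) -> dict[str, set[str]]:
    """Resolve a single-label name and an FQDN side by side."""
    return {short: resolve(short), fqdn: resolve(fqdn)}


if __name__ == "__main__":
    # Run after `ipconfig /flushdns` so cached answers do not skew the result.
    for name, addrs in compare_names("otrc", "otrc.tamu.edu").items():
        print(f"{name:15} -> {', '.join(sorted(addrs)) or 'no answer'}")
```

If "otrc.tamu.edu" returns only the web server's address, that is worth comparing with the "(same as parent folder)" host records in the zone, since clients use the bare domain name to reach domain controllers.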

Commented:
In the remaining DNS zone, can you confirm that there is still an entry similar to:

(same as parent folder)    Name Server (NS)   OTRC-DC1.otrc.tamu.edu

Author

Commented:
Sorry, it wasn't a DNS issue after all (or, if it was, deleting that other zone may have helped); it was a problem with SYSVOL.

SYSVOL did not exist on the servers. I think it's because I performed a non-authoritative restore on both domain controllers while replication wasn't working correctly. After running an authoritative restore and linking DC2 to DC1, it seems to be working.

Not sure this actually had anything to do with raising the functional level of the domain then... :\
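Netlogon normally does not advertise a server as a domain controller until SYSVOL is shared, which fits the NetLogons and Advertising failures in the earlier DCDIAG output. A minimal sketch of the "are the shares there?" check, assuming the text output of net share in its usual column layout (share names start in column 0, wrapped remarks are indented). The function name missing_dc_shares is illustrative.

```python
from __future__ import annotations

REQUIRED_SHARES = {"SYSVOL", "NETLOGON"}


def missing_dc_shares(net_share_output: str) -> set[str]:
    """Return the DC shares that do not appear in `net share` output.

    Share rows start in column 0 while wrapped remarks are indented, so
    only the first token of unindented lines is treated as a share name.
    """
    shares = {
        line.split()[0].upper()
        for line in net_share_output.splitlines()
        if line.strip() and not line[0].isspace()
    }
    return REQUIRED_SHARES - shares


if __name__ == "__main__":
    import subprocess

    # Windows only: capture `net share` on the local DC.
    out = subprocess.run(["net", "share"], capture_output=True, text=True).stdout
    print(sorted(missing_dc_shares(out)) or "SYSVOL and NETLOGON are both shared")
```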

Commented:
Glad you got it working.
Just in case, here is the solution provided by Microsoft support:

1.  Ran the net share command on both domain controllers and found that the SYSVOL and NETLOGON shares were missing.

2.  Performed the following steps: an authoritative restore of FRS on OTRC-DC1 and a non-authoritative restore of FRS on OTRC-DC2.

3.  Stopped the File Replication Service on both domain controllers.

4.  Modified the following registry key on OTRC-DC1:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\<GUID of the replica set>

Set the BurFlags value to D4.

5.  Started the File Replication Service and waited for event ID 13516.

6.  Modified the following registry key on OTRC-DC2:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\<GUID of the replica set>

Set the BurFlags value to D2.

7.  Started the File Replication Service and waited for event ID 13516.

8.  Ran DCDIAG on both servers and found that the Intersite Messaging service was disabled.

9.  Started the Intersite Messaging service on both servers.
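Steps 3 to 9 can be laid out as an ordered command list. This is a sketch, not Microsoft support's actual script: it assumes the per-replica-set BurFlags key quoted above, REG_DWORD values 0xD4 (authoritative) and 0xD2 (non-authoritative), and that each DC's replica-set GUID is read from that DC's own registry. The GUID placeholders and the names restore_plan and _burflags are hypothetical. It only prints commands for an administrator to run on each DC, in order, with REM lines where you must wait for event ID 13516.

```python
from __future__ import annotations

# Per-replica-set BurFlags key from steps 4 and 6; {guid} is filled in with
# the replica-set GUID read from each DC's own registry (assumption).
KEY_TEMPLATE = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
                r"\Cumulative Replica Sets\{guid}")
AUTHORITATIVE = 0xD4      # D4: this DC's SYSVOL becomes the master copy
NON_AUTHORITATIVE = 0xD2  # D2: this DC re-syncs SYSVOL from a partner
WAIT = "rem wait for FRS event ID 13516 before continuing"


def _burflags(guid: str, value: int) -> str:
    key = KEY_TEMPLATE.format(guid=guid)
    return f'reg add "{key}" /v BurFlags /t REG_DWORD /d 0x{value:X} /f'


def restore_plan(auth_dc: str, other_dcs: list[str],
                 guids: dict[str, str]) -> list[tuple[str, str]]:
    """Return (server, command) pairs in the order of steps 3-9."""
    all_dcs = [auth_dc, *other_dcs]
    plan = [(dc, "net stop ntfrs") for dc in all_dcs]               # step 3
    plan += [(auth_dc, _burflags(guids[auth_dc], AUTHORITATIVE)),    # step 4
             (auth_dc, "net start ntfrs"), (auth_dc, WAIT)]          # step 5
    for dc in other_dcs:                                              # steps 6-7
        plan += [(dc, _burflags(guids[dc], NON_AUTHORITATIVE)),
                 (dc, "net start ntfrs"), (dc, WAIT)]
    for dc in all_dcs:                                                # steps 8-9
        plan += [(dc, "sc config IsmServ start= auto"), (dc, "net start IsmServ")]
    return plan


if __name__ == "__main__":
    guids = {"OTRC-DC1": "<GUID-from-DC1>", "OTRC-DC2": "<GUID-from-DC2>"}
    for server, command in restore_plan("OTRC-DC1", ["OTRC-DC2"], guids):
        print(f"[{server}] {command}")
```

The ordering is the important part: FRS is stopped everywhere before any BurFlags value is set, and the non-authoritative DCs are only started once the authoritative DC has logged 13516.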


After that, users were able to log on to the domain, and machines could detect and successfully join it.
