Solved

Exchange problem, DC problems, GPO fails with error.. the system cannot find the file specified

Posted on 2011-03-12
Last Modified: 2012-08-14
Our Exchange server recently went bananas. The Exchange Event, Information Store, MTA Stacks and System Attendant services fail to start. While diagnosing the fault, I found that the domain controllers also have problems, from replication to DNS. We have 3 DCs. The PDC and the Exchange server are both GCs. Now no GC can be found. I ran dcdiag on the 3 servers. The PDC and the Exchange server both failed the NetLogons and frsevent tests. The PDC also failed kccevent and systemlog.
Please note that running netdiag shows that DNS is OK on the PDC and the Exchange server. But I find that the 3rd DC fails a DNS test using nslookup.
I also found that SYSVOL was empty on the PDC. I attempted to use GPMC to back up and restore the GPO, which gave the error "The system cannot find the file specified." To my surprise, the backup using GPMC failed with the same error.
How do I resolve this? Do I find gpofix.exe and use it? On which of the DCs should I run it?
Urgent help needed. Thanks.
Question by:uspiv
129 Comments
 
LVL 74

Expert Comment

by:Glen Knight
First, configure all the servers to use a single DNS server, the same one on every server. Once you have done this, restart the NETLOGON service on all of them.

If your Exchange servers are also domain controllers they will only ever use themselves.

Once you have made the above changes, run DCDIAG on all DCs and post the results.
 

Author Comment

by:uspiv
Since the PDC and the Exchange server both run as DNS servers, should I then remove DNS only from the 3rd DC? Then I'll have 2 DNS servers, the PDC and the Exchange server. Is this OK?
 
LVL 74

Assisted Solution

by:Glen Knight
Glen Knight earned 100 total points
No, I am not suggesting you uninstall DNS.

What I need you to do is configure all 3 servers to point to only one DNS server, and make sure it's the same one on all 3.

It's a troubleshooting step.
 

Author Comment

by:uspiv
dcdiag on 3rd server - looks ok:
Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\XXCS004
      Starting test: Connectivity
         ......................... XXCS004 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\XXCS004
      Starting test: Replications
         ......................... XXCS004 passed test Replications
      Starting test: NCSecDesc
         ......................... XXCS004 passed test NCSecDesc
      Starting test: NetLogons
         ......................... XXCS004 passed test NetLogons
      Starting test: Advertising
         ......................... XXCS004 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... XXCS004 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... XXCS004 passed test RidManager
      Starting test: MachineAccount
         ......................... XXCS004 passed test MachineAccount
      Starting test: Services
         ......................... XXCS004 passed test Services
      Starting test: ObjectsReplicated
         ......................... XXCS004 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... XXCS004 passed test frssysvol
      Starting test: frsevent
         ......................... XXCS004 passed test frsevent
      Starting test: kccevent
         ......................... XXCS004 passed test kccevent
      Starting test: systemlog
         ......................... XXCS004 passed test systemlog
      Starting test: VerifyReferences
         ......................... XXCS004 passed test VerifyReferences

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : XXC
      Starting test: CrossRefValidation
         ......................... XXC passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... XXC passed test CheckSDRefDom

   Running enterprise tests on : XXC.com
      Starting test: Intersite
         ......................... XXC.com passed test Intersite
      Starting test: FsmoCheck
         ......................... XXC.com passed test FsmoCheck
 

Author Comment

by:uspiv
dcdiag on the PDC - errors still occur.

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: Default-First-Site-Name\XXCS001
      Starting test: Connectivity
         ......................... XXCS001 passed test Connectivity

Doing primary tests
   
   Testing server: Default-First-Site-Name\XXCS001
      Starting test: Replications
         ......................... XXCS001 passed test Replications
      Starting test: NCSecDesc
         ......................... XXCS001 passed test NCSecDesc
      Starting test: NetLogons
         Unable to connect to the NETLOGON share! (\\XXCS001\netlogon)
         [XXCS001] An net use or LsaPolicy operation failed with error 1203, No network provider accepted the given network path..
         ......................... XXCS001 failed test NetLogons
      Starting test: Advertising
         ......................... XXCS001 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... XXCS001 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... XXCS001 passed test RidManager
      Starting test: MachineAccount
         ......................... XXCS001 passed test MachineAccount
      Starting test: Services
         ......................... XXCS001 passed test Services
      Starting test: ObjectsReplicated
         ......................... XXCS001 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... XXCS001 passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... XXCS001 failed test frsevent
      Starting test: kccevent
         ......................... XXCS001 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x0000164A
            Time Generated: 03/12/2011   20:44:15
            Event String: The Netlogon service could not create server
         ......................... XXCS001 failed test systemlog
      Starting test: VerifyReferences
         ......................... XXCS001 passed test VerifyReferences
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : XXC
      Starting test: CrossRefValidation
         ......................... XXC passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... XXC passed test CheckSDRefDom
   
   Running enterprise tests on : XXC.com
      Starting test: Intersite
         ......................... XXC.com passed test Intersite
      Starting test: FsmoCheck
         ......................... XXC.com passed test FsmoCheck
 

Author Comment

by:uspiv
dcdiag on Exchange Server :


Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: Default-First-Site-Name\XXCS009
      Starting test: Connectivity
         ......................... XXCS009 passed test Connectivity

Doing primary tests
   
   Testing server: Default-First-Site-Name\XXCS009
      Starting test: Replications
         ......................... XXCS009 passed test Replications
      Starting test: NCSecDesc
         ......................... XXCS009 passed test NCSecDesc
      Starting test: NetLogons
         Unable to connect to the NETLOGON share! (\\XXCS009\netlogon)
         [XXCS009] An net use or LsaPolicy operation failed with error 1203, No network provider accepted the given network path..
         ......................... XXCS009 failed test NetLogons
      Starting test: Advertising
         ......................... XXCS009 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... XXCS009 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... XXCS009 passed test RidManager
      Starting test: MachineAccount
         ......................... XXCS009 passed test MachineAccount
      Starting test: Services
         ......................... XXCS009 passed test Services
      Starting test: ObjectsReplicated
         ......................... XXCS009 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... XXCS009 passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... XXCS009 failed test frsevent
      Starting test: kccevent
         ......................... XXCS009 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x0000164A
            Time Generated: 03/12/2011   20:43:36
            Event String: The Netlogon service could not create server
         ......................... XXCS009 failed test systemlog
      Starting test: VerifyReferences
         ......................... XXCS009 passed test VerifyReferences
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : XXC
      Starting test: CrossRefValidation
         ......................... XXC passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... XXC passed test CheckSDRefDom
   
   Running enterprise tests on : XXC.com
      Starting test: Intersite
         ......................... XXC.com passed test Intersite
      Starting test: FsmoCheck
         ......................... XXC.com passed test FsmoCheck
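
With three full dcdiag reports in the thread, it can help to reduce each one to just its failed tests. The following is a minimal sketch in Python, assuming the report has been saved or pasted as plain text. The function name `failed_tests` and the sample string are illustrative, and the pattern relies only on the `<name> failed test <test>` result lines shown in the reports above.

```python
import re

# Matches result lines such as:
#   ......................... XXCS001 failed test NetLogons
FAILED = re.compile(r"\.{5,}\s+(\S+)\s+failed test\s+(\S+)")

def failed_tests(report: str) -> dict[str, list[str]]:
    """Map each target (server, partition or domain) to the tests it failed."""
    failures: dict[str, list[str]] = {}
    for target, test in FAILED.findall(report):
        failures.setdefault(target, []).append(test)
    return failures

# Excerpt of the PDC report from this thread.
sample = """
      Starting test: Replications
         ......................... XXCS001 passed test Replications
      Starting test: NetLogons
         ......................... XXCS001 failed test NetLogons
      Starting test: frsevent
         ......................... XXCS001 failed test frsevent
      Starting test: systemlog
         ......................... XXCS001 failed test systemlog
"""

print(failed_tests(sample))
```

For the excerpt this lists NetLogons, frsevent and systemlog under XXCS001, which matches the failures reported for the PDC.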
 
LVL 74

Expert Comment

by:Glen Knight
Can you restart them now that you have made the DNS change? Then run DCDIAG again. There's no need to post the results if there is no change.
 
LVL 74

Expert Comment

by:Glen Knight
Only restart the 2 showing errors.

Are they all using the one with no errors for DNS?
 

Author Comment

by:uspiv
No. They are all using the Exchange server for DNS.
Before I restart, please note that I physically checked for the Netlogon folder on the 2 DCs with the errors and could not find it. But it is present on the 3rd server, i.e. \\XXCS004\netlogon.
Also, the SYSVOL folder looks faulty on the 2 DCs with the errors.
 

Author Comment

by:uspiv
On the 3rd server, \\XXCS004\sysvol\XXC.COM shows two folders:
Policies
Scripts

On the PDC, \\XXCS001 shows one folder:
DO_NOT_REMOVE_NTFrs_PreInstall_Directory
(nothing inside the folder)

On the Exchange server, \\XXCS009\sysvol\XXC.COM shows two folders:
DO_NOT_REMOVE_NTFrs_PreInstall_Directory
NtFrs_PreExisting__See_EventLog (inside this 2nd folder are the Policies and Scripts folders)
 
LVL 74

Expert Comment

by:Glen Knight
Please configure them all to use the same DNS server. I would suggest using the one that's working.
 

Author Comment

by:uspiv
I had to leave the Exchange server as DNS (for now). The 3rd DC is not reliable.

I restarted the PDC, then the Exchange server. When the PDC restarted, several errors came up, maybe because the Exchange server (DNS) was off. So I restarted the PDC again after the Exchange server had booted.
When the Exchange server came up, the first good news was that my Outlook connection to Exchange was immediately restored! The Exchange services are back up!
I will have to keep the other tests (replication, GPO, etc.) for tomorrow.
Thanks, man, for your suggestions.
 
LVL 4

Expert Comment

by:pbrane
Hi,

The first thing you should do is look at your Active Directory disaster recovery plan. If you don't have one, I can absolutely guarantee that the first steps in most companies' plans are to find what caused the error and to prevent the issue from spreading.

From what I have read, you have both empty and full SYSVOL folders. Do not perform any knee-jerk fixes at this point; you are on delicate ground. How do we know that, once you fix replication, it won't simply replicate the empty SYSVOL to the rest?

You need to run the following commands on all three domain controllers until we get a grasp of the issue.

repadmin /options DCNAME +DISABLE_INBOUND_REPL
repadmin /options DCNAME +DISABLE_OUTBOUND_REPL
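
The two switches are set per domain controller, and clearing them later uses the same command with `-` in place of `+`. The following is a minimal sketch in Python that only prints the command lines for review and does not run them. The helper name `repl_commands` is illustrative, and the server names are the ones from the dcdiag output earlier in this thread.

```python
FLAGS = ("DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL")

def repl_commands(dcs, disable=True):
    """Build repadmin command lines that set (+) or clear (-) both flags."""
    sign = "+" if disable else "-"
    return [f"repadmin /options {dc} {sign}{flag}" for dc in dcs for flag in FLAGS]

dcs = ["XXCS001", "XXCS009", "XXCS004"]

# Commands that pause replication on every DC.
for line in repl_commands(dcs):
    print(line)

# Commands that resume it once the fault is understood.
for line in repl_commands(dcs, disable=False):
    print(line)
```

Printing the commands first makes it easy to confirm that every DC gets both the inbound and the outbound switch, and that the undo list mirrors the disable list exactly.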

Now that we have stopped the spread of the issue, let's take stock of the situation.

Do you have any system state backups?

Find your latest one. Also, before we end up doing a restore, look at the contents of the SYSVOL folders on all three domain controllers and see if you can tell which one has a full set of policies and scripts. I'm not sure how complicated your domain is or how good your documentation is, so let me know if this isn't possible. Also use the event logs on all three servers to try to ascertain which server, if any, is still functioning.

Now there are a couple of choices for you from here to pick the domain back up again.

First and foremost, you need to find what caused the fault. There is no point in putting a fix in place until you know your fault has gone.
Go through the basics above, as some of the guys have already said.
1.      Check every domain controller. If it is a DNS server, make sure the primary DNS server on its network card is itself and the secondary is another DNS server within the domain. If it is not a DNS server itself, make sure the primary and secondary DNS servers point to two different domain DNS servers.
2.      Make sure each domain controller can ping all the other domain controllers by NetBIOS name and by FQDN.
3.      Run the Dcdiag and Netdiag tests again on all three domain controllers.
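
Step 2 can be partly scripted. The following is a minimal sketch in Python that checks whether each short name and FQDN resolves. It tests name resolution only, not reachability, so it complements `ping` and does not replace it. The helper names are illustrative, and the stub resolver with its single known address is a stand-in so the example runs anywhere; leave out the `resolver` argument to use the machine's real DNS settings.

```python
import socket

def resolve_all(names, resolver=socket.gethostbyname):
    """Return {name: address or None}; None means the name did not resolve."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[name] = None
    return results

# Stand-in data for the demonstration only (hypothetical lookup table).
KNOWN = {"XXCS004": "200.200.200.63", "XXCS004.XXC.com": "200.200.200.63"}

def stub_resolver(name):
    if name not in KNOWN:
        raise socket.gaierror(f"cannot resolve {name}")
    return KNOWN[name]

names = ["XXCS004", "XXCS004.XXC.com", "XXCS001"]
for name, address in resolve_all(names, resolver=stub_resolver).items():
    print(f"{name:20} {address or 'NOT RESOLVED'}")
```

Running the real version on each DC in turn shows which controller cannot resolve which of its partners.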

Now, is there a domain controller that passes all tests? The third DC, for example, as this sounds like your only functioning DC from what you have said?

If we have a good DC, then we have a good, simple way to recover this, called the Burflag method. This will allow the two DCs that are down to reset themselves, copy the contents of DC3 to themselves and reinstate themselves as domain controllers.

From what you have said, I'm pretty sure you'll get away with this. If this is not possible, then we might be going down the authoritative restore route, if you have a working system state backup.

Let me know how it goes.
 
LVL 74

Expert Comment

by:Glen Knight
pbrane,

It looks to me like a simple DNS issue. Let's not jump to any conclusions until we know whether that's the case or not.

Thanks
demazter
 
LVL 4

Expert Comment

by:pbrane
Hi Demazter,

I disagree that this is the only issue. A DNS issue would not cause a SYSVOL folder to become empty. If there were just an issue with DNS, replication would simply stop and errors would be logged in the Directory Service event log.

Stopping replication is a simple safety measure to prevent the spread of any undiscovered issue until the fault has been found. At that point, if it is something simple, replication can be resumed. These are Microsoft best practices and are commonly followed among disaster recovery specialists.

By the time we have finished pondering whether it's something important or not, replication could have magnified the issue.
 

Author Comment

by:uspiv
I haven't done anything else yet, but before rebooting the PDC and the Exchange server, I increased the virtual memory on the two DCs.
After the restarts, the Exchange server fired up the Exchange services that previously refused to start. Unfortunately, running dcdiag on the Exchange server showed the same errors, i.e. NetLogons and frsevent.
I believe that I still need to clear the SYSVOL issue. The DNS change did not take care of that.
 
LVL 74

Expert Comment

by:Glen Knight
DNS issue can stop a sysvol folder to not open/mount/share.  It can also prevent DFS from functioning which is how the SYSVOL folder replicates.

Stopping replication will not help to resolve this issue.

The next step toward a resolution is to stop the File Replication Service on all 3 servers, then on the working DC (no. 3) set the BurFlags value to D4 and on the other 2 set the value to D2. Then start the File Replication Service.

See here for burflags registry keys, values and explanation: http://support.microsoft.com/kb/290762
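
For reference, the two BurFlags values differ in direction: per the KB article linked above, D2 requests a nonauthoritative restore (the member re-fetches SYSVOL from a partner) and D4 an authoritative one (the member's copy becomes the source). The following is a minimal sketch in Python that decodes a value and prints a read-only `reg query` command for checking what is currently set. The helper names are illustrative, and the registry path is an assumption taken from that article, so it should be verified there before anything is changed.

```python
# Assumed location of the global BurFlags value (verify against KB 290762).
BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")

MEANING = {
    0xD2: "nonauthoritative restore: this member re-syncs SYSVOL from a partner",
    0xD4: "authoritative restore: this member's SYSVOL becomes the source",
}

def describe(value: int) -> str:
    """Explain a BurFlags value; anything else is reported as unrecognised."""
    return MEANING.get(value, "no restore requested or unrecognised value")

def query_command() -> str:
    """Read-only command that shows the current BurFlags value."""
    return f'reg query "{BURFLAGS_KEY}" /v BurFlags'

print(query_command())
print(hex(0xD2), "->", describe(0xD2))
print(hex(0xD4), "->", describe(0xD4))
```

Because the query does not modify the registry, it is safe to run on every DC before deciding which member, if any, should be treated as the source.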
 

Author Comment

by:uspiv
I have already disabled replication using the Repadmin /options DCNAME +DISABLE_etc command. Do I need to undo this before stopping the File Replication Service?
 
LVL 74

Expert Comment

by:Glen Knight
Yes. If you leave that disabled, replication will stay stopped and the FRS service will not replicate between the DCs.
 

Author Comment

by:uspiv
I undid the Repadmin disable I did earlier.

I stopped the FRS service on the 3 DCs.
I ran a System State backup on all 3 DCs.
I changed the DNS settings on the DCs so that each has itself as primary DNS and another DC as secondary. All 3 DCs run as DNS servers.
I ran dcdiag /test:DNS on all 3 DCs and found that the PDC is failing the Forwarders/Root hints (Forw) test. The other 2 DCs passed.
Please note that the PDC and the Exchange server are both seen as GCs.
 
LVL 4

Expert Comment

by:pbrane
DON'T DO IT THIS WAY.

I'll write the reason why in a minute. This is a quick message to try to prevent you from running the risk of wiping your SYSVOL clean. The instructions above are wrong.
 
LVL 4

Expert Comment

by:pbrane
Hi Demazter,

I'm going to be a bit blunt with you now. A DNS issue has nothing to do with sharing permissions or availability. Yes, it can stop replication, as I said above, but it will not cause the SYSVOL to "empty" itself.

Stopping replication will prevent the spread of any further issues. As I also said above, how do you know the empty SYSVOL folder won't replicate out to the rest and empty them all?

I don't want to blow my own trumpet, but this is part of what I do for a living. What is called for is a bit less diving in, restarting services and so on before you have even understood the issue correctly.

Now, when you have quite finished trying to hijack the Burflag method, I'd rather we went through it personally with Uspiv than paste in a link, which tends to happen too much on this site these days.

Uspiv, are you comfortable with which server has a good working version of the SYSVOL and Policies folder?

When you are, let me know, and we'll go through the Burflag method and make sure we get it right. The summary above is not the correct way to perform the Burflag method; please do not follow those instructions. You do not EVER perform an authoritative and a nonauthoritative restore at the same time.

If we have a working replica member, we only need to perform a nonauthoritative D2 restore on the down members.
 
LVL 4

Expert Comment

by:pbrane
When you say you ran a system state backup on all three DCs, do you mean you actually ran a system state backup, or that you implemented the Burflag method?
 

Author Comment

by:uspiv
I backed up the System State data as a disaster recovery exercise.
I did the Burflag D2 sometime last week on the PDC and, I think, on the Exchange server.
Right now, don't you think we should first fix the DNS issue on the PDC before continuing?
 
LVL 74

Expert Comment

by:Glen Knight
Hijacking the burflag method? Are you kidding? I am not interested in what you do for a day job. I've seen this issue time and time again in my own day job.

Setting the burflag to D4 on the server that works will cause it to become authoritative. You are right that the D2 shouldn't be set; that was a mistake and shouldn't have been there, and for this I apologise.

Secondly, a DNS issue WILL cause the SYSVOL and/or NETLOGON share (seeing as they are one and the same) to stop replicating, and I have also seen this hundreds of times. The simple fact that DFS and AD DS rely on DNS means that if it's not working, then 3 DCs will not function. Period! Which is why I asked for the DNS to be changed to a single server, to eliminate this. If DNS is proven to be working and the SYSVOL is still not functioning, then we are in a better position to ascertain what the cause is.

The link gives exact instructions and is not difficult to follow. If it were, I would have provided further instructions, or I would probably have my own article on how to do it.

We don't do flame wars here, and as a Zone Advisor I'd appreciate it if you didn't try to start one. I am not interested in points; you will see from my profile that I don't actually need them. However, entering a thread with a "cover all" post, which as far as I am concerned is an attempt to hijack the thread, is plain rude.

It's fairly obvious there are a number of issues here, and clearly one of them is a DNS issue. Otherwise changing the DNS configuration wouldn't have resulted in Exchange starting to work again, would it?
 

Author Comment

by:uspiv
I guess we were a little distracted, and it almost created some confusion here. I have to continue with demazter and hope that we'll get to the end of these issues.

From what I can tell, some of the DC issues were inherited. In fact, there was a time I had to seize the Schema Master role because it was held by a DC that was rebuilt without following the proper steps.

The PDC was the only reliable DC and GC until the Exchange server was installed. The 3rd DC had virus issues and was not reliable. I am surprised that it is the only DC that seems to pass the dcdiag tests.

Let's get back to the drill once more and see how we can solve this problem. I thank you guys for your help so far, but I think that working with demazter may be the better way to go at this moment.

NOW... back on course....

On the PDC, when I set the primary DNS to itself and the secondary to the Exchange server, nslookup fails for external domains like yahoo.com or google.com. When I set DNS to point only to the Exchange server, nslookup can resolve the external domains.
What is wrong with DNS on the PDC? dcdiag /test:dns shows that the Forwarders/Root hints (Forw) test is failing. Can we fix this first?
 
LVL 4

Expert Comment

by:pbrane
Hi,
I'm not interested in the points either; it's a paid-for account. It's not very often I come on here, but when I saw this I thought I'd step in, as everyone seemed to be missing the obvious problem and I didn't want the domain to get written off because something simple was missed. I answer every now and again when I have five minutes free and I think I can help. Not for points.
I'm not interested in an argument with you. I wasn't intending to be rude, so I apologise if you found me to be. I was trying to prevent him from making a mistake that could potentially have cost him his job. I'd certainly lose mine if I made a mistake like that and wiped out the domain of one of our customers.
Plus, this advice is still wrong. Why are you trying to perform an authoritative restore? There is a working replica in the domain. If the DC that's down uses the D2 flag, it will automatically replicate itself from an upstream partner. The only other thing to watch out for is to make sure that any other DCs that also have a corrupted SYSVOL folder are offline as well, so the restoring DC only replicates with a known working DC. Then you can bring the DCs back up one at a time as D2 restores.
The DC doesn't need to be marked as authoritative unless there has been a replica set rebuild.
I'm not trying to start a war; my only interest is making sure the correct information is passed on.
I didn't mean you were hijacking the thread. I meant you just seemed to continue with the Burflag method, without any real understanding of whether we were in the right place or not for it, by just issuing a link, whereas I think it's nice to have it explained, which I was going to do. It's easy to misinterpret.

To answer the points you made:
Sysvol and Netlogon are not the same share, they point to different locations.
You actually stated the following: “DNS issue can stop a sysvol folder to not open/mount/share”
To which I answered that DNS will not stop you opening this folder. It will, however, stop replication, but you didn't say that above. The folder will still be able to open and share with DNS or not.
 

Author Comment

by:uspiv
On the dnsmgmt console, I notice that only the PDC has Cached Lookups, but this seems to be empty. The .(root) folder only has a net folder (empty) and a file (localhost) with 127.0.0.1 as the Host (A) data.
It looks like the cache was either deleted or reset.
 
LVL 4

Expert Comment

by:pbrane
OK, no problem. Well, I'll check in tomorrow to see how you're doing.

Good luck.

Please read my points above, though, regarding the difference between an authoritative and a nonauthoritative restore. At least you have a system state backup now as a fallback, but there's no point in making this any harder than it needs to be.
 

Author Comment

by:uspiv
Thanks, pbrane. I believe that we'll get this problem resolved. Thanks for reminding me about disaster recovery.
 
LVL 74

Expert Comment

by:Glen Knight
>>>Sysvol and Netlogon are not the same share, they point to different locations.

The NETLOGON share is the scripts folder within the SYSVOL.

>>The folder will still be able to open and share with DNS or not.

Without a working DNS this is unlikely, because name resolution will not work.

>>>we were in the right place or not

Interesting point. I refer you to your very first post, where you posted a cover-all post, whereas my first post was diagnostic, an attempt to actually find out what the problem might be.

I'm not going to post in this thread, and will allow you to continue with the failed SYSVOL replication, unless the author specifically wants my input. Incidentally, Exchange has no requirement at all on the SYSVOL, yet this was one of the issues.

But all 3 of the problems in the original question can be put down to a DNS issue.
 

Author Comment

by:uspiv
demazter, please can you backtrack to my last 3 posts and offer your suggestions on the way forward with resolving the DNS issue on the PDC? Thanks.
 
LVL 4

Expert Comment

by:pbrane
LOL, you are argumentative.

As I said above, I'm not arguing with you.

He wants you to continue; he has said as much already.
 
LVL 74

Expert Comment

by:Glen Knight
OK, let's look at the DNS issue first.

And it's probably best to start from the beginning, since I have now lost track.

Can you provide the names of the servers, which ones have DNS installed and where they are currently pointing for DNS? On the servers that have DNS installed, open the properties of the forward lookup zone for your internal domain and check that it says Active Directory Integrated and what types of updates are allowed.

If you can post in the format of:

SERVERNAME
DNS INSTALLED AND AD INTEGRATED
CONFIGURED TO USE ITSELF AND SERVER2
SECURE AND INSECURE UPDATES ALLOWED (we will change this later but for now it should be set as this)

When you open the DNS console, make sure the server listed is the one you are currently attached to. And in the properties of the server, what do you have listed on the Forwarders tab?

Can you also run:

NETDOM QUERY FSMO

and IPCONFIG /ALL on each server and post the results.

What I am trying to determine is whether all the DNS servers are using an AD-integrated zone, and therefore should be using the same configuration, or whether they have been configured separately.
 

Author Comment

by:uspiv
SERVERNAME                                     XXCS001
DNS INSTALLED AND AD INTEGRATED                YES
CONFIGURED TO USE ITSELF AND SERVER2           NO - uses XXCS009
SECURE AND INSECURE UPDATES ALLOWED            SECURE ONLY

SERVERNAME                                     XXCS009
DNS INSTALLED AND AD INTEGRATED                YES
CONFIGURED TO USE ITSELF AND SERVER2           YES
SECURE AND INSECURE UPDATES ALLOWED            SECURE ONLY

SERVERNAME                                     XXCS004
DNS INSTALLED AND AD INTEGRATED                YES
CONFIGURED TO USE ITSELF AND SERVER2           YES
SECURE AND INSECURE UPDATES ALLOWED            SECURE ONLY


Netdom query fsmo is same for all 3 DCs

Schema owner                xxcs001.xxC.COM
Domain role owner           xxcs009.xxC.COM
PDC role                    xxcs001.xxC.COM
RID pool manager            xxcs001.xxC.COM
Infrastructure owner        xxcs001.xxC.COM

The command completed successfully.
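
The role listing is easier to read when grouped by holder. The following is a minimal sketch in Python, assuming the `NETDOM QUERY FSMO` output has been pasted as text. The helper name `roles_by_holder` is illustrative, and the role labels are the ones that appear in the listing above.

```python
from collections import defaultdict

KNOWN_ROLES = ("Schema owner", "Domain role owner", "PDC role",
               "RID pool manager", "Infrastructure owner")

def roles_by_holder(text: str) -> dict[str, list[str]]:
    """Group FSMO roles by the server that holds them."""
    holder_of = {}
    for line in text.splitlines():
        stripped = line.strip()
        for role in KNOWN_ROLES:
            if stripped.startswith(role):
                # Everything after the role label is the holder's FQDN.
                holder_of[role] = stripped[len(role):].strip().lower()
    grouped = defaultdict(list)
    for role, holder in holder_of.items():
        grouped[holder].append(role)
    return dict(grouped)

listing = """
Schema owner               xxcs001.xxC.COM
Domain role owner         xxcs009.xxC.COM
PDC role                         xxcs001.xxC.COM
RID pool manager           xxcs001.xxC.COM
Infrastructure owner     xxcs001.xxC.COM
"""

for holder, roles in roles_by_holder(listing).items():
    print(holder, "->", ", ".join(roles))
```

For this listing, four roles are grouped under xxcs001 and the "Domain role owner" entry under xxcs009, so the roles are split across two of the three DCs.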

IPCONFIG /ALL

DC1 - PDC

   Host Name . . . . . . . . . . . . : XXcs001
   Primary Dns Suffix  . . . . . . . : XXC.COM
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : XXC.COM

Ethernet adapter XXCLAN:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP NC7761 Gigabit Server Adapter
   Physical Address. . . . . . . . . : 00-16-35-38-15-FA
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 200.200.200.66
   Subnet Mask . . . . . . . . . . . : 255.0.0.0
   Default Gateway . . . . . . . . . : 200.200.200.250
   DNS Servers . . . . . . . . . . . : 200.200.200.61
   Primary WINS Server . . . . . . . : 200.200.200.66
   Secondary WINS Server . . . . . . : 200.200.200.61

DC2 - Exchange Server

   Host Name . . . . . . . . . . . . : XXcs009
   Primary Dns Suffix  . . . . . . . : XXC.COM
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : XXC.COM

Ethernet adapter LAN:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP NC373i Multifunction Gigabit Server Adapter #2
   Physical Address. . . . . . . . . : 00-21-5A-5D-95-78
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 200.200.200.61
   Subnet Mask . . . . . . . . . . . : 255.0.0.0
   Default Gateway . . . . . . . . . : 200.200.200.250
   DNS Servers . . . . . . . . . . . : 200.200.200.61
                                       200.200.200.66
   Primary WINS Server . . . . . . . : 200.200.200.61
   Secondary WINS Server . . . . . . : 200.200.200.66

DC3

   Host Name . . . . . . . . . . . . : XXcs004
   Primary Dns Suffix  . . . . . . . : XXC.COM
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : Yes
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : XXC.COM

Ethernet adapter Local Area Connection 2:

   Connection-specific DNS Suffix  . : XXc.com
   Description . . . . . . . . . . . : HP NC7761 Gigabit Server Adapter
   Physical Address. . . . . . . . . : 00-13-21-AE-F8-15
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 200.200.200.63
   Subnet Mask . . . . . . . . . . . : 255.0.0.0
   Default Gateway . . . . . . . . . : 200.200.200.250
   DNS Servers . . . . . . . . . . . : 200.200.200.63
                                       200.200.200.61
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
OK, great, I will digest this.

Whilst I do that, did you check the zone types?
0
 

Author Comment

by:uspiv
Comment Utility
They are all AD-integrated. However, I noticed that there is no . zone in DNS on DC2 and DC3. There used to be one on DC1, but I think it was deleted while trying to solve a problem with zone creation.

Only the PDC has Cached Lookups, but this seems to be empty. The .(root) folder only has a net folder (empty) and a localhost record with 127.0.0.1 as the Host (A) data.
0
 

Author Comment

by:uspiv
Comment Utility
Note also that the replication service is still turned off.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Are forwarders configured on the DNS servers?

The . zone is added during DCPROMO, but it's not required if the servers are doing Internet resolution. In fact, Internet name resolution will fail if the . zone is in place.

Can you change the DNS zones to allow secure and insecure updates on xxcs004? (We will change this back later.)

On Xxcs004, remove the secondary DNS address on the NIC, on the other 2 DNS servers remove all DNS IP addresses from the NIC config and add 200.200.200.63 as the ONLY DNS entry.  Then restart the NETLOGON service to re-register with DNS.
0
 

Author Comment

by:uspiv
Comment Utility
done
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
OK, start the FRS service on one of the other DCs.

Watch for any events in the application log.
0
 

Author Comment

by:uspiv
Comment Utility
Started FRS on the PDC, i.e. XXs001. Note that FRS is still off on XXs004 and the Exchange Server.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Any errors? Check the SYSVOL on the local machine by going to Start > Run > \\localhost\SYSVOL

Anything there?
0
 

Author Comment

by:uspiv
Comment Utility
On the PDC, same as before. Nothing new. No Policies and Scripts. Same on the Exchange Server, but then FRS is not yet started on the Exchange Server.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Let's start the FRS service on the Exchange server as well.

Can you confirm your DNS is working properly now?

Let's run DCDIAG again on one of the servers with an empty SYSVOL and just check that error has gone.
0
 

Author Comment

by:uspiv
Comment Utility
On the PDC, there are several System event errors from MRxSmb, Event ID 3019 - The redirector failed to determine the connection type. The DNS Server event log showed no new errors. The File Replication event log still comes up with NtFrs Warning Event ID 13508 for replication from XXs004 and XXs009.
However, the replication event log earlier showed the positive information Event ID 13516 - The File Replication Service is no longer preventing the computer XXs001 from becoming a domain controller....
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
13516 is good, how long ago was that? Before or after we made the DNS changes?
0
 

Author Comment

by:uspiv
Comment Utility
The dcdiag DNS test passed on the Exchange server but still failed on the PDC. Same forwarder problem.
SYSVOL is still the same on both the Exchange and PDC servers. No new entries.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
What is the forwarder error you are seeing?
0
 

Author Comment

by:uspiv
Comment Utility
Root hints list has invalid root hint server
0
 

Author Comment

by:uspiv
Comment Utility
Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\xxCS001
      Starting test: Connectivity
         ......................... xxCS001 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\xxCS001

DNS Tests are running and not hung. Please wait a few minutes...

   Running partition tests on : Schema

   Running partition tests on : Configuration

   Running partition tests on : xxC

   Running enterprise tests on : xxC.com
      Starting test: DNS
         Test results for domain controllers:

            DC: xxcs001.xxC.COM
            Domain: xxC.com

               TEST: Forwarders/Root hints (Forw)
                  Error: Root hints list has invalid root hint server: a.root-servers.net. (198.41.0.4)
                  Error: Root hints list has invalid root hint server: b.root-servers.net. (128.9.0.107)
                  Error: Root hints list has invalid root hint server: c.root-servers.net. (192.33.4.12)
                  Error: Root hints list has invalid root hint server: d.root-servers.net. (128.8.10.90)
                  Error: Root hints list has invalid root hint server: e.root-servers.net. (192.203.230.10)
                  Error: Root hints list has invalid root hint server: f.root-servers.net. (192.5.5.241)
and so on.....
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Do you have forwarders configured in that server under the properties of the DNS server?
0
 

Author Comment

by:uspiv
Comment Utility
No. Same as other DNS servers on the other 2 DCs. But like I said earlier, the . (dot) zone was deleted on the PDC.
0
 

Author Comment

by:uspiv
Comment Utility
All other DNS Domains
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
As I said before, the .(dot) zone should be removed anyway for internet name resolution. If it's not, you will experience problems, so make sure it doesn't exist on any of the DNS servers. The .(dot) zone basically says the server is responsible for everything, so it doesn't ask anyone else.

Does the server that is returning this error have internet access? Do you have your ISP's DNS servers, and can you add them on the Forwarders tab?
0
 

Author Comment

by:uspiv
Comment Utility
Before the problem, it had access to the internet and the DNS on the NIC was set to itself.
I have tried setting the Forwarders to the ISP's DNS servers but it did not make a difference.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Have you removed the .(dot) zone from the local DNS server? This should not be there if you want to be able to resolve internet names.
0
 

Author Comment

by:uspiv
Comment Utility
I got restless and removed DNS from the PDC. Then I re-installed it and created a forwarder to XXCS004. nslookup works on the PDC but internet browsing does not.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
You do not need a forwarder to the other DC, the forwarders should be your ISP DNS servers.
0
 

Author Comment

by:uspiv
Comment Utility
It's taking so long.
I changed the forwarder on the PDC to the ISP DNS server. Nslookup now fails to resolve any external DNS names.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
Did you delete the .(dot) zone?
0
 

Author Comment

by:uspiv
Comment Utility
I did a test Audit on the 3 DCs and the result is as follows:-
1. DCDIAG /C
XXCS001 (PDC) - Failed Netlogons, frsevent, verifyReplicas & DNS
XXCS009 (Exchange) - Failed Netlogons, Systemlog, VerifyReplicas & DNS
XXCS004 (DC) - Failed Services (the NtFrs service was stopped at your request some time back), VerifyReplicas & DNS

2.EVENTVWR
XXCS001
Application -  Userenv (1030, 1038), AutoEnrollment (13)
System  - Netlogon (5719), DCOM (10006), MRxSmb (8003)
Directory Services - NTDS Replication warning (1232)
DNS Server - Warnings (4521, 7062, 9999, 3000)
File Replication Service - Warning (13508)

XXCS009
Application -  HP systems Insight manager (3), AutoEnrollment (13)
System  - Netlogon (5719), DCOM (10016)
Directory Services - NTDS KCC warning (1308), NTDS General warning (1115), NTDS Replication warning (2008)
DNS Server - Warnings (4521, 9999, 3000)
File Replication Service - NtFrs (13568), NtFrs Warning (13508)

XXC004
Application -  AutoEnrollment (13)
System  - Netlogon (5719, 5723, 5805), DCOM (10006)
Directory Services - NTDS Replication warning (1232, 1188)
DNS Server - Warnings (4521, 9999)
File Replication Service - NtFrs Error (13568)
0
 

Author Comment

by:uspiv
Comment Utility
There's no .(dot) zone. The only place we can find .(dot) is in the Cached Lookups which automatically brings up the .(root) folder when DNS was re-installed.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
We are clearly missing something here.

My final suggestion, do you have any antivirus software on the servers? If so remove it, disable the Windows firewall and reboot.  Does replication start?

Other than this, I would suggest you get someone to take a look at it. We are clearly missing something that posting in a thread cannot uncover. I'd be happy to take a look, if this is something you would be open to.
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi both,

I hope no one minds me posting again, just thought I might be able to put an idea your way.

I thought it might be a good idea to rule out outbound connectivity in general for that PDC server.

It sounds like your server is capable of recursive queries, as it works when you point it internally. (The recursive query function itself seems to be working.)

When you point it outbound it fails.

From the PDC;

Can you actually ping the Google DNS server 8.8.8.8?

If that works can you telnet to the Google DNS server: telnet 8.8.8.8 53?
A blank screen is acceptable as a connection.

If that works, can you receive a referral or A record from the Google DNS server?

Drop into the nslookup command line by typing nslookup in a CMD box and pressing enter.
Set your nslookup session to use the Google DNS server by typing: server 8.8.8.8
Turn on Debugging by typing: set debug
Then try a query, type yahoo.com or bbc.co.uk and paste the results. You should get a debug list of what happened during the query.

I'm just interested in whether your server can actually connect to an external DNS server and make a query, so that we can rule out external connectivity.

Nslookup will bypass using its own DNS once we have set it to use the Google server directly. I thought at least we can tell if the server is just incapable in general or whether it's fine and it's just DNS set up incorrectly still.
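
The telnet check described above can also be scripted. This is a minimal sketch in Python, assuming Python is available on a machine in the same position as the PDC (the thread does not say so). The helper name can_connect is made up for illustration. It only confirms that a TCP connection to the given port opens, which is the same thing the blank telnet screen shows; it does not send a DNS query.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect("8.8.8.8", 53) corresponds to "telnet 8.8.8.8 53". A False result does not say why the connection failed, so a firewall block and a routing problem look the same.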

Thanks,

P.S. If you'd rather I didn't post, please let me know.
0
 

Author Comment

by:uspiv
Comment Utility
It's OK, pbrane. I don't mind support from you or anyone else at this stage. The object of these posts is to resolve the issue. I'll do the tests and post the results asap. Thanks.
0
 
LVL 74

Expert Comment

by:Glen Knight
Comment Utility
The offer of taking a physical look is there if you are open to it?
Contact details in my profile.
0
 

Author Comment

by:uspiv
Comment Utility
While carrying out the tests, I discovered that the PDC did not have access to the internet, and I found out why: its IP address had been placed on the shun list of the PIX firewall.
I uninstalled DNS, then re-installed it.
Pinging 8.8.8.8 returned "Request timed out". It is possible that ping is blocked by the PIX.
Telnet 8.8.8.8 53 worked OK (a blank screen).
The google.com directory of the DNS server Forward Lookup Zones is working OK. NS and A records were returned.
Result of the nslookup session using the Google DNS server:
..\nslookup
Default Server:  ns2.kingston-mediastream.net
Address:  217.10.162.8

> server 8.8.8.8
Default Server:  google-public-dns-a.google.com
Address:  8.8.8.8

> set debug
> yahoo.com
Server:  google-public-dns-a.google.com
Address:  8.8.8.8

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 3, rcode = NOERROR
        header flags:  response, want recursion, recursion avail.
        questions = 1,  answers = 1,  authority records = 0,  additional = 0

    QUESTIONS:
        yahoo.com.WRPC.COM, type = A, class = IN
    ANSWERS:
    ->  yahoo.com.WRPC.COM
        internet address = 64.34.175.158
        ttl = 43200 (12 hours)
------------
Non-authoritative answer:
Name:    yahoo.com.WRPC.COM
Address:  64.34.175.158

I think the major problem was that the IP address for the PDC was blocked. Maybe we have to restart from here...
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

OK, that’s good news then.

So if you put some forwarders in again now on the PDC, say the Google DNS servers, 8.8.8.8 and 8.8.4.4 and then try the simple and recursive tests again from within the DNS management console, do they both pass now?

And if so, does your DCdiag /test:dns pass now as well?

Thanks,
0
 

Author Comment

by:uspiv
Comment Utility
The simple and recursive tests both PASS, with and without the forwarder pointing to the Google DNS servers.
With the forwarder set to the Google DNS servers, the DCDiag DNS test failed.

..\dcdiag /test:dns
Domain Controller Diagnosis
Performing initial setup:
   Done gathering initial info.
Doing initial required tests
   Testing server: Default-First-Site-Name\XXCS001
      Starting test: Connectivity
         ......................... XXCS001 passed test Connectivity
Doing primary tests
   Testing server: Default-First-Site-Name\XXCS001
DNS Tests are running and not hung. Please wait a few minutes...
   Running partition tests on : Schema
   Running partition tests on : Configuration
   Running partition tests on : XXC
   Running enterprise tests on : XXC.com
      Starting test: DNS
         Test results for domain controllers:
            DC: XXcs001.XXC.COM
            Domain: XXC.com
               TEST: Delegations (Del)
                  Error: DNS server: XXcs001.XXc.com. IP:200.200.200.166 [Broken delegated domain newzone.XXC.com.]
         Summary of test results for DNS servers used by the above domain controllers:
            DNS server: 200.200.200.166 (XXcs001.XXc.com.)
               1 test failure on this DNS server
               Delegation is broken for the domain newzone.XXC.com. on the DNS server 200.200.200.166
         Summary of DNS test results:
                                            Auth Basc Forw Del  Dyn  RReg Ext
               ________________________________________________________________
            Domain: XXC.com
               XXcs001                     PASS PASS PASS FAIL PASS PASS n/a

Eventvwr shows DNS Warning ID=4521, 9999
I don't know where DNS keeps fetching the zone "newzone.XXC.COM" from.
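
The summary table at the end of the dcdiag output is easy to misread when the columns drift. A small sketch, assuming Python is at hand, that pairs each column header with its result and lists only the failures. The function name failed_dns_tests is hypothetical, and the two sample strings are copied from the output above.

```python
def failed_dns_tests(header_line, result_line):
    """Pair a dcdiag DNS summary row with its headers and return the failed columns."""
    columns = header_line.split()
    fields = result_line.split()
    server, results = fields[0], fields[1:]
    if len(results) != len(columns):
        raise ValueError("header and result row have different column counts")
    failed = [name for name, value in zip(columns, results) if value == "FAIL"]
    return server, failed

# Header and result row as they appear in the dcdiag summary above.
header = "Auth Basc Forw Del  Dyn  RReg Ext"
row = "XXcs001                     PASS PASS PASS FAIL PASS PASS n/a"
print(failed_dns_tests(header, row))
```

Here the only failing column is Del, which matches the broken delegation reported for newzone.XXC.com.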
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

Do you not have a delegated zone within your XXc.com forward lookup zone?

Should show up like a small folder in the bottom of the zone folder structure.

Like this?

[Image: delegated zone example]
0
 

Author Comment

by:uspiv
Comment Utility
Oh yeah. I see it. "newzone". It really shouldn't be there. I guess I can delete it?
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

Unless you know of any reason not to.

If you don't use any DNS records along the lines of www.newzone.xxc.com in your business then I think it's probably just been made by mistake in the past.

When you have removed that, have one more blast at the DCDiag test again.

Thanks,  
0
 

Author Comment

by:uspiv
Comment Utility
...\dcdiag /test:dns
Domain Controller Diagnosis
Performing initial setup:
   Done gathering initial info.
Doing initial required tests
   Testing server: Default-First-Site-Name\XXCS001
      Starting test: Connectivity
         ......................... XXCS001 passed test Connectivity
Doing primary tests
   Testing server: Default-First-Site-Name\XXCS001
DNS Tests are running and not hung. Please wait a few minutes...
   Running partition tests on : Schema
   Running partition tests on : Configuration
   Running partition tests on : XXC
   Running enterprise tests on : XXC.com
      Starting test: DNS
         ......................... XXC.com passed test DNS
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Happy Days! :-)

So, I'm probably a bit lost now with where we are up to.

Would you be able to do another mini audit of what is failing only on the three servers? Just normal Dcdiags, no switches?

Is the NTFRS Service on XXCS004 Still stopped?

Are there any more services stopped on any of them?

Sorry just trying to catch up on where we are up to.

Thanks,
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Would you be able to do a mini audit on Netdiag as well?

Thanks,
0
 

Author Comment

by:uspiv
Comment Utility

........................................

    Computer Name: XXCS001
    DNS Host Name: XXcs001.XXC.COM
    System info : Microsoft Windows Server 2003 (Build 3790)
    Processor : x86 Family 15 Model 4 Stepping 1, GenuineIntel
    List of installed hotfixes :
       
       abridged...

Netcard queries test . . . . . . . : Passed
Per interface results:
    Adapter : XXCLAN
        Netcard queries test . . . : Passed
        Host Name. . . . . . . . . : XXcs001
        IP Address . . . . . . . . : 200.200.200.166
        Subnet Mask. . . . . . . . : 255.0.0.0
        Default Gateway. . . . . . : 200.200.200.250
        Primary WINS Server. . . . : 200.200.200.166
        Secondary WINS Server. . . : 200.200.200.161
        Dns Servers. . . . . . . . : 200.200.200.166

        AutoConfiguration results. . . . . . : Passed
        Default gateway test . . . : Passed
        NetBT name test. . . . . . : Passed
        [WARNING] At least one of the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names is missing.
        WINS service test. . . . . : Passed
Global results:
Domain membership test . . . . . . : Passed
NetBT transports test. . . . . . . : Passed
    List of NetBt transports currently configured:
        NetBT_Tcpip_{0CB07251-92A1-49D4-9E0C-5BE94AA160BE}
    1 NetBt transport currently configured.
Autonet address test . . . . . . . : Passed
IP loopback ping test. . . . . . . : Passed
Default gateway test . . . . . . . : Passed
NetBT name test. . . . . . . . . . : Passed
    [WARNING] You don't have a single interface with the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names defined.
Winsock test . . . . . . . . . . . : Passed
DNS test . . . . . . . . . . . . . : Passed
    PASS - All the DNS entries for DC are registered on DNS server '200.200.200.166' and other DCs also have some of the names registered.
Redir and Browser test . . . . . . : Passed
    List of NetBt transports currently bound to the Redir
        NetBT_Tcpip_{0CB07251-92A1-49D4-9E0C-5BE94AA160BE}
    The redir is bound to 1 NetBt transport.
    List of NetBt transports currently bound to the browser
        NetBT_Tcpip_{0CB07251-92A1-49D4-9E0C-5BE94AA160BE}
    The browser is bound to 1 NetBt transport.
DC discovery test. . . . . . . . . : Passed
DC list test . . . . . . . . . . . : Passed
Trust relationship test. . . . . . : Skipped
Kerberos test. . . . . . . . . . . : Passed
LDAP test. . . . . . . . . . . . . : Passed
Bindings test. . . . . . . . . . . : Passed
WAN configuration test . . . . . . : Skipped
   No active remote access connections.
Modem diagnostics test . . . . . . : Passed
IP Security test . . . . . . . . . : Skipped
    Note: run "netsh ipsec dynamic show /?" for more detailed information
The command completed successfully
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

So are you happy that XXCS001 has a SYSVOL folder with everything in it (whatever is supposed to be in your Netlogon, policies, etc.) and that this is a working server now?

And if you can remind me, is it XXCS009 and XXCS004 that have empty SYSVOL folders?

Thanks,  
0
 

Author Comment

by:uspiv
Comment Utility
XXCS004 has XXC.COM folder with Policies & Scripts subfolders

XXCS009 has XXC.COM folder with the following:-
1. DO_NOT_REMOVE_NtFrs_PreInstall_Directory subfolder (this is hidden and empty)
2. NtFrs_PreExisting__See_EventLog subdirectory (this contains the Policies & Scripts folders)

XXCS001 has XXC.COM folder with only the DO_NOT_REMOVE_NtFrs_PreInstall_Directory subfolder (this is hidden and empty)

So to answer your first question, XXCS001 still does not have the appropriate folders in SYSVOL and still fails NetLogons test - Unable to connect to the NETLOGON share!
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

OK, that narrows it down then :-)

Can you make sure XXCS004 has all its services started and that it can pass a standard dcdiag and netdiag without any issues, please? It needs to be in good shape, as it's our only reference domain controller now.

Then on XXCS001 and XXCS009, can you make sure they pass the DNS tests for DCDIAG and NETDIAG? I know other stuff will fail. I just want to make sure that when we try the BurFlags restore in a minute they are going to be able to find a replication partner.

Once we are good to go, and this is only in the event that XXCS004 is clean on all tests and XXCS001 and XXCS009 pass their DNS tests, I think this should be a rough outline for what we should do.

1. Stop the FRS services on XXCS001 and XXCS009 only.
2. Pick one of these servers to perform a D2 restore on, forcing it to replicate from the only available (good) domain controller, XXCS004.
3. Once we are happy this has finished and has worked, perform the D2 restore on the remaining domain controller, which at this point has two good domain controllers to replicate from.
4. Once we are happy this has completed and is working, run DCDIAG and NETDIAG tests all round.
5. If these tests are all clean, run around with our T-shirts over our heads, as your domain is back and it's Friday!

Thanks,


0
 

Author Comment

by:uspiv
Comment Utility
Errors encountered.
Event Type:      Warning
Event Source:      NtFrs
Event Category:      None
Event ID:      13508
Date:            4/1/2011
Time:            1:27:48 PM
User:            N/A
Computer:      XXCS001
Description:
The File Replication Service is having trouble enabling replication from XXCS009 to XXCS001 for c:\windows\sysvol\domain using the DNS name XXcs009.XXC.COM. FRS will keep retrying.
 Following are some of the reasons you would see this warning.
 
 [1] FRS can not correctly resolve the DNS name XXcs009.XXC.COM from this computer.
 [2] FRS is not running on XXcs009.XXC.COM.
 [3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.

Event Type:      Warning
Event Source:      NtFrs
Event Category:      None
Event ID:      13508
Date:            4/1/2011
Time:            1:27:48 PM
User:            N/A
Computer:      XXCS001
Description:
The File Replication Service is having trouble enabling replication from XXCS004 to XXCS001 for c:\windows\sysvol\domain using the DNS name XXcs004.XXC.COM. FRS will keep retrying.
 Following are some of the reasons you would see this warning.
 
 [1] FRS can not correctly resolve the DNS name XXcs004.XXC.COM from this computer.
 [2] FRS is not running on XXcs004.XXC.COM.
 [3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.

FRS is running on XXCS004.
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

Shall we move forward with trying to bring the domain back on line then?

Can you make sure you can ping XXCS004 from XXCS001 to make sure you have connectivity and can resolve the name please.

If you could stop the FRS services on XXCS001 and XXCS009

Then on XXCS001 can you go to the following key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

Edit the BurFlags value to D2 (hexadecimal). If the BurFlags value isn't there, create it as a standard DWORD value.

Then start the FRS service on XXCS001 only please.

Check the event log for event ID 13565 so we know it's started the process.

Then when 13516 appears in the event log the process is finished.

After this you should be able to see your NETLOGON and SYSVOL shares when you run NET SHARE at the CMD prompt, and the SYSVOL and NETLOGON folders should have their normal contents.
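
To keep track of where the restore is, the two event IDs named above can be read as a simple state machine. A minimal sketch in Python, assuming the NtFrs event IDs have already been collected in chronological order by some other means. The function name d2_restore_state and the state labels are made up for illustration, and only the two IDs mentioned in this post are interpreted.

```python
# Event IDs named in the post above: 13565 marks the start of the
# non-authoritative (D2) restore, 13516 marks its completion.
STARTED = 13565
FINISHED = 13516

def d2_restore_state(event_ids):
    """Classify a chronological list of NtFrs event IDs."""
    state = "not started"
    for event_id in event_ids:
        if event_id == STARTED:
            state = "in progress"
        elif event_id == FINISHED and state == "in progress":
            state = "finished"
        # Any other ID (warnings and so on) leaves the state unchanged.
    return state
```

A 13516 that appears before any 13565 is deliberately ignored, since it would belong to an earlier start of the service rather than to this restore.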

Then we'll move onto the next one.

Let me know how this one goes.
0
 

Author Comment

by:uspiv
Comment Utility
I did all that before my previous post of error encountered.

I did the BurFlags D2 nonauthoritative mode restore on XXCS001.
I got the  event ID 13565 but next I got ID 13553, 13554 and 13508. I am still waiting for ID 13516 to happen.
I just ran the cmd "net share" but SYSVOL and NETLOGON are not there yet.
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

I noticed that all of your DCs have public IP addresses. I manage networks like this at the minute, where public ranges are used as the private LAN ranges.

The DCs don't go through the PIX to talk to each other, do they? Is your LAN actually on the subnet 200.200.200.0/24, with all DCs talking to each other through a switch?

Can you ping XXCS004 by NetBIOS name and FQDN?

Can you also try to connect to XXCS004 via telnet on ports 139 and 445 (just a couple of randoms) to make sure you can see the DC clearly?

Thanks,

0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Also, you didn't tell me the results of the dcdiag and netdiag tests regarding DNS.

If you run DCDIAG and NETDIAG on XXCS001 does it actually pass the DNS tests?

Thanks,

0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Could you also run a repadmin /options and tell me the result please?

You should get a response of Current DC Options: IS_GC from both XXCS001  and XXCS004
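
If the output is being collected from several DCs, the IS_GC check can be done mechanically. A small sketch in Python, assuming the output contains a line labelled "Current DC Options" as quoted above (the exact label may differ between repadmin versions). The function name is_global_catalog is hypothetical.

```python
def is_global_catalog(repadmin_output):
    """Return True if the 'Current DC Options' line lists the IS_GC flag."""
    for line in repadmin_output.splitlines():
        label, sep, value = line.partition(":")
        # Compare the label case-insensitively; flags are space-separated.
        if sep and label.strip().lower() == "current dc options":
            return "IS_GC" in value.split()
    raise ValueError("no 'Current DC Options' line found")
```

A DC that is not a global catalog is expected to return False here, so a False result is only a problem on a server that is supposed to be a GC.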

Thanks,
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Sorry to bombard you,

One more thing I haven't asked you to check.

In AD sites and services, can you see a replication connection between XXCS001  and XXCS004?

You can find this out by drilling down to the NTDS settings, see picture. No link on mine but that’s where to find it.

[Image: Sites and services]
Thanks,
0
 

Author Comment

by:uspiv
Comment Utility
1. No. The DCs do not go through the PIX to talk to each other.
2. The LAN is not subnetted that way. For now it uses a 255.0.0.0 subnet mask, more like a flat network. (This will change soon.)
3. Yes, I can ping XXCS004 by NetBIOS name and FQDN.
4. Connecting to XXCS004 via telnet on ports 139 and 445 gives a blank screen, meaning it's OK.
5. DCDIAG and NETDIAG on XXCS001 passed the DNS tests.
6. The repadmin /options response is Current DC Options: IS_GC for XXCS001. Response for XXCS004 is Current DC options: is <none>. This is expected because only the PDC and the Exchange server are GCs. XXCS004 is not a GC.
7. The replication connections are OK in AD Sites and Services. I see both XXCS004 and XXCS009 under NTDS Settings for XXCS001.



0
 

Author Comment

by:uspiv
Comment Utility
Don't you think we need to check what is causing this warning on the DNS event...

Event Type:      Warning
Event Source:      DNS
Event Category:      None
Event ID:      4521
Date:            4/1/2011
Time:            11:30:14 PM
User:            N/A
Computer:      XXCS001
Description:
The DNS server encountered error 9002 attempting to load zone . from Active Directory. The DNS server will attempt to load this zone again on the next timeout cycle. This can be caused by high Active Directory load and may be a transient condition.
0
 

Author Comment

by:uspiv
Comment Utility
One more thing. I just observed that all this while, I had left forwarders on XXC001 pointing to 8.8.8.8 and 8.8.4.4 (google). I just removed the forwarders.
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

What's xxcs001 using for primary DNS?

And can you post a repadmin /showrepl as well please.

Thanks,
0
 
LVL 4

Expert Comment

by:pbrane
Comment Utility
Hi,

OK, you'd be fine to keep them in though if you wanted to.

Would you download FRSDIAG: http://www.microsoft.com/downloads/en/details.aspx?FamilyId=43CB658E-8553-4DE7-811A-562563EB5EBF&displaylang=en

and run the default test on both xxcs001 and XXCS004, please.

Can you tell me in the lower right hand window which tests pass and fail or just post those results from each server?

I'm not too concerned about the DNS warning above as long as they are passing the DNS tests in DCDIAG and NETDIAG.

You don't have any entries in the HOSTS file on either one, do you?

Thanks,
0
 

Author Comment

by:uspiv
Comment Utility
XXCS001 is using its own IP as primary DNS.
When I removed the forwarders pointing to the Google DNS servers, the DNS test failed. So I set the forwarders to point to our ISP DNS.
..\repadmin /showrepl
repadmin running command /showrepl against server localhost

Default-First-Site-Name\XXCS001
DC Options: IS_GC
Site Options: (none)
DC object GUID: e2178bbe-9544-45c0-949b-7fae980a099e
DC invocationID: 62444897-6ed5-4605-9828-e082a73b117d
==== INBOUND NEIGHBORS ======================================
DC=XXC,DC=com
    Default-First-Site-Name\XXCS009 via RPC
        DC object GUID: 3dc487d0-03be-41d2-867e-dacc0ad27843
        Last attempt @ 2011-04-02 13:16:54 was successful.
    Default-First-Site-Name\XXCS004 via RPC
        DC object GUID: 86ca4286-2ea6-487d-94bf-8b7ac54fb564
        Last attempt @ 2011-04-02 13:18:48 was successful.

CN=Configuration,DC=XXC,DC=com
    Default-First-Site-Name\XXCS009 via RPC
        DC object GUID: 3dc487d0-03be-41d2-867e-dacc0ad27843
        Last attempt @ 2011-04-02 12:57:25 was successful.
    Default-First-Site-Name\XXCS004 via RPC
        DC object GUID: 86ca4286-2ea6-487d-94bf-8b7ac54fb564
        Last attempt @ 2011-04-02 12:57:44 was successful.

CN=Schema,CN=Configuration,DC=XXC,DC=com
    Default-First-Site-Name\XXCS004 via RPC
        DC object GUID: 86ca4286-2ea6-487d-94bf-8b7ac54fb564
        Last attempt @ 2011-04-02 12:51:52 was successful.
    Default-First-Site-Name\XXCS009 via RPC
        DC object GUID: 3dc487d0-03be-41d2-867e-dacc0ad27843
        Last attempt @ 2011-04-02 12:51:52 was successful.
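
For longer /showrepl listings, the inbound partners and their last results can be pulled out with a few lines of Python. A rough sketch, assuming the layout shown above: a "Site\Server via RPC" line followed later by a "Last attempt @ ..." line. Any attempt line that does not contain "was successful" is treated as a failure, which is an assumption about how failures are worded. The name replication_status is hypothetical.

```python
import re

# Matches lines such as "Default-First-Site-Name\XXCS009 via RPC".
NEIGHBOR = re.compile(r"^\s*\S+\\(\S+) via RPC\s*$")

def replication_status(showrepl_text):
    """Return (partner, succeeded) pairs in the order they appear."""
    results = []
    partner = None
    for line in showrepl_text.splitlines():
        match = NEIGHBOR.match(line)
        if match:
            partner = match.group(1)
        elif "Last attempt @" in line and partner is not None:
            results.append((partner, "was successful" in line))
            partner = None
    return results
```

The same partner appears once per naming context, so three partitions with two partners each give six pairs.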

NETDIAG /TEST:DNS result
........
    Computer Name: XXCS001
    DNS Host Name: XXcs001.XXC.COM
    System info : Microsoft Windows Server 2003 (Build 3790)
    Processor : x86 Family 15 Model 4 Stepping 1, GenuineIntel
    List of installed hotfixes :
        KB2229593
        KB911564
        ....abridged
Netcard queries test . . . . . . . : Passed
Per interface results:
    Adapter : XXCLAN
        Netcard queries test . . . : Passed
Global results:
Domain membership test . . . . . . : Failed
    [WARNING] The system volume has not been completely replicated to the local machine. This machine is not working properly as a DC.
NetBT transports test. . . . . . . : Passed
    List of NetBt transports currently configured:
        NetBT_Tcpip_{0CB07251-92A1-49D4-9E0C-5BE94AA160BE}
    1 NetBt transport currently configured.
DNS test . . . . . . . . . . . . . : Passed
    PASS - All the DNS entries for DC are registered on DNS server '200.200.200.166' and other DCs also have some of the names registered.
The command completed successfully

Author Comment

by:uspiv
FRSDIAG test on XXCS001
------------------------------------------------------------
FRSDiag v1.7 on 4/2/2011 2:22:26 PM
.\XXCS001 on 2011-04-02 at 2.22.26 PM
------------------------------------------------------------
Checking for errors/warnings in FRS Event Log ....       

NtFrs      4/2/2011 12:17:49 AM      Warning      13508      The File Replication Service is having trouble enabling replication  from XXCS009 to XXCS001 for c:\windows\sysvol\domain using the DNS name XXcs009.XXC.COM. FRS will keep retrying.     Following are some of the reasons you would see this warning.         [1] FRS can not correctly resolve the DNS name XXcs009.XXC.COM from this computer.     [2] FRS is not running on XXcs009.XXC.COM.     [3] The topology information in the Active Directory for this replica has not  yet replicated to all the Domain Controllers.         This event log message will appear once per connection, After the problem  is fixed you will see another event log message indicating that the connection  has been established.      

NtFrs      4/2/2011 12:17:48 AM      Warning      13508      The File Replication Service is having...

NtFrs      4/2/2011 12:07:19 AM      Warning      13566      File Replication Service is scanning the data in the system volume. Computer XXCS001  cannot become a domain controller until this process is complete.  The system volume will then be shared as SYSVOL.        To check for the SYSVOL share, at the command prompt, type:    net share        When File Replication Service completes the scanning process, the SYSVOL  share will appear.        The initialization of the system volume can take some time.  The time is dependent on the amount of data in the system volume.      

NtFrs      4/1/2011 1:27:48 PM      Warning      13508      The File Replication Service is having...

NtFrs      4/1/2011 1:26:06 PM      Warning      13565      File Replication Service is initializing the system volume with data from another  domain controller. Computer XXCS001 cannot become a domain controller until this process  is complete. The system volume will then be shared as SYSVOL.        To check for the SYSVOL share, at the command prompt, type:    net share        When File Replication Service completes the initialization process, the SYSVOL  share will appear.        The initialization of the system volume can take some time.  The time is dependent on the amount of data in the system volume,  the availability of other domain controllers, and the replication  interval between domain controllers.
      WARNING: Found Event ID 13508 errors without trailing 13509 ... see above for (up to) the 3 latest entries!

 ......... failed 3
Checking for errors in Directory Service Event Log ....       

NTDS Inter-site Messaging      4/2/2011 12:11:52 AM      Error      1832      The SMTP domain administrative namespace is not available at this time. Mail-based replication cannot be configured until this condition is corrected.        As a result, intersite replication using the SMTP transport between the local domain controller and all domain controllers in other sites will fail.        Replication using SMTP will be tried again later.        Additional Data    Error value:  80004005 Unspecified error      

NTDS Inter-site Messaging      3/30/2011 6:33:41 PM      Error      1832      The SMTP domain administrative namespace is not available at this time. Mail-based replication cannot be configured until this condition is corrected.        As a result, intersite replication using the SMTP transport between the local domain controller and all domain controllers in other sites will fail.        Replication using SMTP will be tried again later.        Additional Data    Error value:  80004005 Unspecified error      

NTDS Replication      3/30/2011 6:30:01 PM      Error      2426919      Active Directory could not resolve the following DNS host name of the  source domain controller to an IP address. This error prevents additions,  deletions and changes in Active Directory from replicating between one or  more domain controllers in the forest. Security groups, group policy, users  and computers and their passwords will be inconsistent between domain  controllers until this error is resolved, potentially affecting logon  authentication and access to network resources.        Source domain controller:     XXcs004    Failing DNS host name:     86ca4286-2ea6-487d-94bf-8b7ac54fb564._msdcs.XXC.COM        NOTE: By default, only up to 10 DNS failures are shown for any given 12 hour  period, even if more than 10 failures occur.  To log all individual failure  events, set the following diagnostics registry value to 1:        Registry Path:    HKLM\System\CurrentControlSet\Services\NTDS\Diagnostics\22 DS RPC Client        User Action:         1) If the source domain controller is no longer functioning or its operating  system has been reinstalled with a different computer name or NTDSDSA object  GUID, remove the source domain controller's metadata with ntdsutil.exe, using  the steps outlined in MSKB article 216498.         2) Confirm that the source domain controller is running Active directory and  is accessible on the network by typing "net view \\<source DC name>" or  "ping <source DC name>".         
3) Verify that the source domain controller is using a valid DNS server for  DNS services, and that the source domain controller's host record and CNAME  record are correctly registered, using the DNS Enhanced version  of DCDIAG.EXE available on http://www.microsoft.com/dns          dcdiag /test:dns         4) Verify that that this destination domain controller is using a valid DNS  server for DNS services, by running the DNS Enhanced version of DCDIAG.EXE  command on the console of the destination domain controller, as follows:          dcdiag /test:dns         5) For further analysis of DNS error failures see KB 824449:       http://support.microsoft.com/?kbid=824449        Additional Data    Error value:     11004 The requested name is valid, but no data of the requested type was found.          

NTDS Inter-site Messaging      3/30/2011 1:38:03 PM      Error      1832      The SMTP domain administrative namespace is not available at this time. Mail-based replication cannot be configured until this condition is corrected.        As a result, intersite replication using the SMTP transport between the local domain controller and all domain controllers in other sites will fail.        Replication using SMTP will be tried again later.        Additional Data    Error value:  80004005 Unspecified error      

NTDS Inter-site Messaging      3/30/2011 9:31:35 AM      Error      1832      The SMTP domain administrative namespace is not available at this time. Mail-based replication cannot be configured until this condition is corrected.        As a result, intersite replication using the SMTP transport between the local domain controller and all domain controllers in other sites will fail.        Replication using SMTP will be tried again later.        Additional Data    Error value:  80004005 Unspecified error
      WARNING: Found Directory Service Errors in the past 15 days! FRS Depends on AD so Check AD Replication!

 ......... failed 5
Checking for minimum FRS version requirement ... passed
Checking for errors/warnings in ntfrsutl ds ... passed
Checking for Replica Set configuration triggers... passed
Checking for suspicious file Backlog size... passed
Checking Overall Disk Space and SYSVOL structure (note: integrity is not checked)... passed
Checking for suspicious inlog entries ... passed
Checking for suspicious outlog entries ... passed
Checking for appropriate staging area size ... passed
Checking for errors in debug logs ...
      ERROR on NtFrs_0003.log : "ERROR_ACCESS_DENIED" : <SndCsMain:                     5884:   904: S0: 18:38:57> :SR: Cmd 010f3898, CxtG e4d302f6, WS ERROR_ACCESS_DENIED, To   XXcs004.XXC.COM Len:  (372) [SndFail - Send Penalty]
      ERROR on NtFrs_0003.log : "ERROR_ACCESS_DENIED" : <SndCsMain:                     5892:   877: S0: 18:39:37> :SR: Cmd 010f4058, CxtG e4d302f6, WS ERROR_ACCESS_DENIED, To   XXcs004.XXC.COM Len:  (372) [SndFail - rpc call]
      ERROR on NtFrs_0003.log : "ERROR_ACCESS_DENIED" : <SndCsMain:                     5892:   904: S0: 18:39:37> :SR: Cmd 010f4058, CxtG e4d302f6, WS ERROR_ACCESS_DENIED, To   XXcs004.XXC.COM Len:  (372) [SndFail - Send Penalty]
      ERROR on NtFrs_0005.log : "ERROR_RETRY" : <SndCsMain:                     1440:   904: S0: 14:15:11> :SR: Cmd 011065c0, CxtG e4d302f6, WS ERROR_RETRY, To   XXcs004.XXC.COM Len:  (372) [SndFail - Send Penalty]
      ERROR on NtFrs_0005.log : "ERROR_RETRY" : <SndCsMain:                     3904:   877: S0: 14:22:00> :SR: Cmd 010dee68, CxtG e4d302f6, WS ERROR_RETRY, To   XXcs004.XXC.COM Len:  (372) [SndFail - rpc call]
      ERROR on NtFrs_0005.log : "ERROR_RETRY" : <SndCsMain:                     3904:   904: S0: 14:22:00> :SR: Cmd 010dee68, CxtG e4d302f6, WS ERROR_RETRY, To   XXcs004.XXC.COM Len:  (372) [SndFail - Send Penalty]
      ERROR on NtFrs_0005.log : "EPT_S_NOT_REGISTERED(This may indicate that DNS returns the IP address of the wrong computer. Check DNS records being returned, Check if FRS is currently running on the target server. Check if Ntfrs is registered with the End-Point-Mapper on target server!)" : <SndCsMain:                     3904:   883: S0: 14:20:25> ++ ERROR - EXCEPTION (000006d9) :  WStatus: EPT_S_NOT_REGISTERED
      ERROR on NtFrs_0005.log : "EPT_S_NOT_REGISTERED(This may indicate that DNS returns the IP address of the wrong computer. Check DNS records being returned, Check if FRS is currently running on the target server. Check if Ntfrs is registered with the End-Point-Mapper on target server!)" : <SndCsMain:                     3904:   884: S0: 14:20:25> :SR: Cmd 010d9150, CxtG 6f19d45e, WS EPT_S_NOT_REGISTERED, To   XXcs009.XXC.COM Len:  (372) [SndFail - rpc exception]
      ERROR on NtFrs_0005.log : "EPT_S_NOT_REGISTERED(This may indicate that DNS returns the IP address of the wrong computer. Check DNS records being returned, Check if FRS is currently running on the target server. Check if Ntfrs is registered with the End-Point-Mapper on target server!)" : <SndCsMain:                     3904:   904: S0: 14:20:25> :SR: Cmd 010d9150, CxtG 6f19d45e, WS EPT_S_NOT_REGISTERED, To   XXcs009.XXC.COM Len:  (372) [SndFail - Send Penalty]

      Found 12 ERROR_ACCESS_DENIED error(s)! Latest ones (up to 3) listed above
      Found 4758 ERROR_RETRY error(s)! Latest ones (up to 3) listed above
      Found 1005 EPT_S_NOT_REGISTERED error(s)! Latest ones (up to 3) listed above

 ......... failed with 5775 error entries
Checking NtFrs Service (and dependent services) state...
      ERROR : Cannot access SYSVOL share on XXCS001
      ERROR : Cannot access NETLOGON share on XXCS001
 ......... failed 2
Checking NtFrs related Registry Keys for possible problems...
      SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Enable Journal Wrap Automatic Restore = 0 :: ERROR: Enabling Journal Wrap Automatic Restore is NOT recommended in post-SP2 version of FRS. Please see KB.292438 (Troubleshooting Journal_Wrap Errors on Sysvol and DFS Replica Sets) for further information!
      SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\SysvolReady = 0 :: ERROR: SysvolReady is not set to 1 :: SYSVOL is likely not Sharing! This key should NOT be changed manually but this should be addressed! See article KB.327781 (How to Troubleshoot Missing SYSVOL and NETLOGON Shares on Windows Server) for further information!
failed with 2 error(s) and 0 warning(s)

Checking Repadmin Showreps for errors...passed
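
For anyone following along, the summary lines in the FRSDiag debug-log check above can be tallied with a few lines of Python. This is only a rough sketch: the regular expression and the function name tally_frsdiag_errors are assumptions based on the "Found N ... error(s)!" lines shown in this thread, not an FRSDiag feature.

```python
import re

# Assumed pattern, taken from the FRSDiag summary lines in this thread, e.g.
#   "Found 4758 ERROR_RETRY error(s)! Latest ones (up to 3) listed above"
SUMMARY_RE = re.compile(r"Found\s+(\d+)\s+(\S+)\s+error\(s\)!")


def tally_frsdiag_errors(report_text):
    """Return {error_name: count} built from the 'Found N NAME error(s)!' lines."""
    counts = {}
    for match in SUMMARY_RE.finditer(report_text):
        count, name = int(match.group(1)), match.group(2)
        counts[name] = counts.get(name, 0) + count
    return counts


# Summary lines copied from the XXCS001 report above.
SAMPLE = """\
      Found 12 ERROR_ACCESS_DENIED error(s)! Latest ones (up to 3) listed above
      Found 4758 ERROR_RETRY error(s)! Latest ones (up to 3) listed above
      Found 1005 EPT_S_NOT_REGISTERED error(s)! Latest ones (up to 3) listed above
"""

if __name__ == "__main__":
    counts = tally_frsdiag_errors(SAMPLE)
    # Print the noisiest error first, then the grand total.
    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{count:6d}  {name}")
    print(f"{sum(counts.values()):6d}  total")
```

The total should agree with the "failed with N error entries" line that FRSDiag prints for the same check.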

Author Comment

by:uspiv
FRSDIAG test on XXCS004
------------------------------------------------------------
FRSDiag v1.7 on 4/2/2011 2:28:19 PM
.\XXCS004 on 2011-04-02 at 2.28.19 PM
------------------------------------------------------------

Checking for errors/warnings in FRS Event Log ....       
NtFrs      3/30/2011 6:40:18 PM      Error      13568      The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR.         Replica set name is    : "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"     Replica root path is   : "c:\windows\sysvol\domain"     Replica root volume is : "\\.\C:"      A Replica set hits JRNL_WRAP_ERROR when the record that it is trying to read  from the NTFS USN journal is not found.  This can occur because of one of the  following reasons.         [1] Volume "\\.\C:" has been formatted.     [2] The NTFS USN journal on volume "\\.\C:" has been deleted.     [3] The NTFS USN journal on volume "\\.\C:" has been truncated. Chkdsk can truncate  the journal if it finds corrupt entries at the end of the journal.     [4] File Replication Service was not running on this computer for a long time.     [5] File Replication Service could not keep up with the rate of Disk IO activity on "\\.\C:".      Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will  cause the following recovery steps to be taken to automatically recover from  this error state.     [1] At the first poll, which will occur in 5 minutes, this computer will be  deleted from the replica set. If you do not want to wait 5 minutes, then  run "net stop ntfrs" followed by "net start ntfrs" to restart the File  Replication Service.     [2] At the poll following the deletion this computer will be re-added to the  replica set. The re-addition will trigger a full tree sync for the replica set.        WARNING: During the recovery process data in the replica tree may be unavailable.  You should reset the registry parameter described above to 0 to prevent  automatic recovery from making the data unexpectedly unavailable if this  error condition occurs again.        To change this registry parameter, run regedit.        Click on Start, Run and type regedit.        Expand HKEY_LOCAL_MACHINE.    
Click down the key path:       "System\CurrentControlSet\Services\NtFrs\Parameters"    Double click on the value name       "Enable Journal Wrap Automatic Restore"    and update the value.        If the value name is not present you may add it with the New->DWORD Value function  under the Edit Menu item. Type the value name exactly as shown above.      

NtFrs      3/30/2011 12:34:05 PM      Error      13568      The File Replication...

NtFrs      3/9/2011 8:14:48 AM      Error      13568      The File Replication...

NtFrs      1/6/2010 4:05:05 PM      Warning      13508      The File Replication Service is having trouble enabling replication  from WRPCS009 to WRPCS004 for c:\windows\sysvol\domain using the DNS name wrpcs009.WRPC.COM. FRS will keep retrying.     Following are some of the reasons you would see this warning.         [1] FRS can not correctly resolve the DNS name wrpcs009.WRPC.COM from this computer.     [2] FRS is not running on wrpcs009.WRPC.COM.     [3] The topology information in the Active Directory for this replica has not  yet replicated to all the Domain Controllers.         This event log message will appear once per connection, After the problem  is fixed you will see another event log message indicating that the connection  has been established.      

NtFrs      1/6/2010 3:58:43 PM      Warning      13508      The File Replication service is having...

NtFrs      1/5/2010 3:39:13 PM      Warning      13508      The File Replication...
      WARNING: Found Event ID 13508 errors without trailing 13509 ... see above for (up to) the 3 latest entries!
 ......... failed 4
Checking for errors in Directory Service Event Log .... passed
Checking for minimum FRS version requirement ... passed
Checking for errors/warnings in ntfrsutl ds ... passed
Checking for Replica Set configuration triggers... passed
Checking for suspicious file Backlog size... passed
Checking Overall Disk Space and SYSVOL structure (note: integrity is not checked)... passed
Checking for suspicious inlog entries ... passed
Checking for suspicious outlog entries ... passed
Checking for appropriate staging area size ... passed
Checking for errors in debug logs ...
      ERROR on NtFrs_0001.log : "RPC_S_CALL_FAILED_DNE(Indicates RPC Session was established to target, but there was a failure to send RPC call package. Check for Networking problems!)" : <FrsDsGetName:                  2308:  4580: S0: 19:59:00> :DS: ERROR - DsCrackNames(cn=XXcs001,ou=domain controllers,dc=XXc,dc=com, 00000002);  WStatus: RPC_S_CALL_FAILED_DNE
      Found 1 RPC_S_CALL_FAILED_DNE error(s)! Latest ones (up to 3) listed above
 ......... failed with 1 error entries
Checking NtFrs Service (and dependent services) state...passed
Checking NtFrs related Registry Keys for possible problems...passed
Checking Repadmin Showreps for errors...passed

LVL 4

Expert Comment

by:pbrane
Hi,

I can see a few problems looking at this lot.

XXCS004 is in Journal wrap and it's the last DC
Looks like it might be struggling to talk to XXCS001 via RPC
and perhaps (but I'm not 100% sure) you might have an SMTP link configured in Sites and services which is failing as well as an IP link.

The Journal Wrap is what’s going to be preventing the D2 restore from working. We need to get this sorted and check on the other things as well to make sure we are in a good place.

Don’t do anything regarding the Journal wrap yet, I need 10 minutes to get a game plan together.

If you could check whether you have an SMTP site link as well as an IP site link in Sites and Services, that would tick one off the list.

Thanks,

LVL 4

Assisted Solution

by:pbrane
pbrane earned 400 total points
Hi,

We need to perform the following tasks:

Disable the FRS service on all three domain controllers.

Working on XXCS004 first, we need to shake it out of the journal wrap it's stuck in. We should perform a D4 restore on this first. We haven't got much choice now.

Before we do this, can you just copy the contents of the sysvol folder somewhere else on the same partition for peace of mind please? (I know you have a sysstate backup as well, but it's easier than extracting it from there.)

Then perform a D4 restore on XXCS004 as above and leave the FRS service disabled on the other two.

Again, we are looking for event ID 13516 in the File Replication Service event log, so we know this has completed successfully.

Then we can move on.

Please don't forget to look at the SMTP site links I mentioned above.
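
The D4 and D2 restores discussed here come down to stopping FRS, setting the BurFlags registry value, and starting FRS again. A minimal Python sketch of that sequence for one DC follows, assuming the standard BurFlags location documented in Microsoft KB 290762. The function names are hypothetical, it defaults to a dry run that only prints the commands, and it assumes the NtFrs service is allowed to start on the DC being restored.

```python
import subprocess

# Registry location of BurFlags as documented in Microsoft KB 290762.
BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")

# D4 = authoritative restore, D2 = non-authoritative restore.
BURFLAGS = {"D4": 0xD4, "D2": 0xD2}


def burflags_commands(mode):
    """Build the command sequence for a D4 or D2 restore on the local DC."""
    value = BURFLAGS[mode.upper()]
    return [
        ["net", "stop", "ntfrs"],
        # The DWORD is passed in decimal (0xD4 == 212, 0xD2 == 210).
        ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
         "/t", "REG_DWORD", "/d", str(value), "/f"],
        ["net", "start", "ntfrs"],
    ]


def run_restore(mode, dry_run=True):
    """Print each command; execute it only when dry_run is False (Windows only)."""
    for cmd in burflags_commands(mode):
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_restore("D4")  # dry run: shows what would be executed
```

In this thread's order, "D4" would apply to XXCS004 first and "D2" to the other two afterwards, waiting for event ID 13516 before moving on to the next DC.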

Thanks,

Author Comment

by:uspiv
AD Sites and Services shows an entry under IP.
SMTP is empty.
Note that when I open AD Sites and Services on XXCS001, it says Active Directory Sites and Services [xxcs004.XXC.COM]

Author Comment

by:uspiv
Performed the D4 BurFlags restore on XXCS004. It was successful, with event ID 13516 appearing almost immediately.

LVL 4

Expert Comment

by:pbrane
Hi,

OK, the SMTP was only a hunch.

It seems that, even through all this, AD itself is still replicating successfully, so I'm hoping the SYSVOLs are the only problem.

Can you perform a D2 restore on XXCS001 again now, please? Still leave the FRS service disabled on XXCS009.

Thanks,

Author Comment

by:uspiv
Whoopee!!! We got some success!!!
XXCS001 passed the BurFlags D2 restore. Event ID 13516 has occurred. I guess I'll go ahead and apply the D2 restore on XXCS009.

LVL 4

Expert Comment

by:pbrane
YES!!!!!!


Yeah, try your luck with 009 now.

Let me know,

Thanks,

Author Comment

by:uspiv
Beautiful! It worked like a charm. Event ID 13516 occurred immediately. The Policies and Scripts folders are now in SYSVOL. What do I do with the DO_NOT_REMOVE_NtFrs_PreInstall_Directory folder?

LVL 4

Expert Comment

by:pbrane
Hi,

That’s really good news.

No, leave those folders where they are. They won't hurt; they're all part of the "system". I've never deleted them, to be honest, but I remember reading somewhere once that you shouldn't.

How do DCDIAG and NETDIAG look now on all three servers?

Obviously all the tests that check event logs will still fail but we can ignore those now.

Thanks,

Author Comment

by:uspiv
The 3 DCs passed both DCDIAG and NETDIAG. XXCS001 failed frsevent. This is ok since it was looking at the eventlog.

I'll proceed to test to see if the issue with the Group Policy Object is cleared.

LVL 4

Expert Comment

by:pbrane
Brill,

Ok, let me know.

Thanks,

Author Comment

by:uspiv
I still have the persistent DNS server warning, event ID 4521. Where is it fetching the . (dot) zone from?

LVL 4

Expert Comment

by:pbrane
Hi,

Possibly a dodgy AD partition.

Can you run: dnscmd <your IP> /zoneinfo .

And let me know what the Directory partition is at the bottom please?

Thanks,

Author Comment

by:uspiv
...\dnscmd 200.200.200.166 /zoneinfo .
Zone query result:
Zone info:
        ptr                   = 00082FF0
        zone name             = .
        zone type             = 0
        update                = 0
        DS integrated         = 1
        data file             = (null)
        using WINS            = 0
        using Nbstat          = 0
        aging                 = 0
          refresh interval    = 0
          no refresh          = 0
          scavenge available  = 0
        Zone Masters
        NULL IP Array.
        Zone Secondaries
        NULL IP Array.
        secure secs           = 0
        directory partition   = AD-Domain     flags 00000015
        zone DN               = DC=RootDNSServers,cn=MicrosoftDNS,DC=DomainDnsZones,DC=XXC,DC=com
Command completed successfully.

LVL 4

Expert Comment

by:pbrane
Hi,

We need to see if anything else is stored in the AD-Domain partition.

Can you post results of: dnscmd /enumzones

Please,

Thanks,

Author Comment

by:uspiv
.\dnscmd /enumzones
Enumerated zone list:

        Zone count = 6

 Zone name                      Type       Storage         Properties

 .                              Cache      AD-Domain
 200.200.200.in-addr.arpa       Primary    AD-Legacy       Rev
 facebook.com                   Stub       AD-Legacy       Down
 google.com                     Stub       AD-Legacy
 xxc.com                        Primary    AD-Legacy       Secure Aging
 yahoo.com                      Stub       AD-Legacy

Command completed successfully.
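
The zone list above can also be filtered by storage with a short script, which makes it easy to see what else (if anything) lives in the AD-Domain partition. This is a rough sketch that assumes the column layout shown in this output; parse_enumzones and the Zone tuple are names made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one row of "dnscmd /enumzones" output.
Zone = namedtuple("Zone", "name type storage properties")


def parse_enumzones(text):
    """Parse the zone table printed by 'dnscmd /enumzones' into Zone tuples."""
    zones = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["Zone", "name"]:   # column header row starts the table
            in_table = True
            continue
        if fields[0] == "Command":           # "Command completed successfully."
            break
        if in_table and len(fields) >= 3:
            zones.append(Zone(fields[0], fields[1], fields[2], fields[3:]))
    return zones


# Output copied from this thread.
SAMPLE = """\
Enumerated zone list:

        Zone count = 6

 Zone name                      Type       Storage         Properties

 .                              Cache      AD-Domain
 200.200.200.in-addr.arpa       Primary    AD-Legacy       Rev
 facebook.com                   Stub       AD-Legacy       Down
 google.com                     Stub       AD-Legacy
 xxc.com                        Primary    AD-Legacy       Secure Aging
 yahoo.com                      Stub       AD-Legacy

Command completed successfully.
"""

if __name__ == "__main__":
    zones = parse_enumzones(SAMPLE)
    print("In AD-Domain:", [z.name for z in zones if z.storage == "AD-Domain"])
    print("Marked Down: ", [z.name for z in zones if "Down" in z.properties])
```

With the output posted here, only the "." zone sits in AD-Domain and only facebook.com is flagged Down.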

Please note that the facebook.com zone is not working properly. I get the error "Zone Not Loaded by DNS Server" when I try to access the zone in dnsmgmt.

LVL 4

Expert Comment

by:pbrane
Hi,

Can you check a couple of places for me.

If you open Active Directory Users and Computers, enable Advanced Features on the View menu, then open System, then MicrosoftDNS, and check for a "." in there.

Also can you use ADSIEdit and connect to the following connection point: dc=domaindnszones,dc=xxc,dc=com

and check for a "." in there.

Thanks,

Author Comment

by:uspiv
Did all that earlier...

No "." in MicrosoftDNS, although there is a "RootDNSServers" folder.

ADSIEdit returned "A referral was returned from the server". I get the same result if I point it at ForestDnsZones.

LVL 4

Expert Comment

by:pbrane
Hi,

Sorry missed that above.

Right so in that case it looks like you have two partitions which aren't playing ball then.

Nothing else is using those partitions so this should be pretty painless.

Just to make sure though, can you run the following commands to back up your current zones to file please?

type dnscmd servername /zoneexport facebook.com facebook.com.dns
type dnscmd servername /zoneexport google.com google.com.dns
type dnscmd servername /zoneexport xxc.com xxc.com.dns
type dnscmd servername /zoneexport yahoo.com yahoo.com.dns

Now we need to remove the domaindnszones and forestdnszones partitions from AD and put them back again. We'll do one zone at a time.

Here's how.

Drop in to the NTDSUTIL command line
type domain management (press enter)
type connections (press enter)
type connect to server <server NetBIOS name> (press enter)
type quit (press enter)
type list NC replicas dc=domaindnszones,dc=xxc,dc=com (press enter)
Write down all the servers which contain this partition from the list produced; each should appear as CN=<server name>. If you're not sure, please post the output.

Now we know which servers contain this partition we'll remove it from each server in the list.

Whilst still in NTDSUTIL:

type remove NC replica dc=domaindnszones,dc=xxc,dc=com NULL (press enter)
This will remove it from the current server. Then repeat for the rest of the servers in the list.

Do this from the current server, for each of the other servers in the list:
type connections (press enter)
type connect to server <next server in list> (press enter)
type quit (press enter)
type remove NC replica dc=domaindnszones,dc=xxc,dc=com NULL (press enter)

After removing it from all of them, Wait 10 minutes to make sure this change has replicated amongst all three domain controllers.

Once we have removed the partition from all servers in the list, we will delete it from one server only

Still in NTDSUTIL:

type connections (press enter)
type connect to server <NetBIOS name of the server you're on> (press enter)
type quit (press enter)
type delete NC dc=domaindnszones,dc=xxc,dc=com (press enter)

Wait 10 minutes to make sure this change has replicated amongst all three domain controllers.

Make sure the partition has gone: come out of NTDSUTIL by typing quit a couple of times until you are back at the normal command line.

type dnscmd servername /directorypartitioninfo domaindnszones.xxc.com

You should get the following error: Command failed:  DNS_ERROR_DP_DOES_NOT_EXIST     9901

which is good, shows it's gone.

Now type dnscmd servername /createbuiltindirectorypartitions /Domain (press enter)

now see if it was created by typing: dnscmd servername /enumdirectorypartitions

you should see domaindnszones.xxc.com listed as one of the partitions again.

Wait 10 minutes to make sure this change has replicated amongst all three domain controllers.

That's one zone taken care of now. You should be able to connect to the Domaindnszones with ADSI edit again and hopefully the error message should be gone.

It might be worth, at this point, trying to connect to the ForestDNSZones in ADSI edit again before we go through all this again for that zone as well.
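
Since the partition name has to be typed several times, both as a DNS name for dnscmd and as a distinguished name for NTDSUTIL, a small helper can reduce typing mistakes. The Python sketch below is illustrative only: the function names are hypothetical, it just builds and checks strings without running anything, and the dnscmd commands are the ones listed in the steps above.

```python
def dns_name_to_dn(dns_name):
    """Convert a DNS name such as 'domaindnszones.xxc.com' to its DN form."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    return ",".join(f"DC={label}" for label in labels)


def partition_is_gone(dnscmd_output):
    """True if /directorypartitioninfo output reports the partition as missing."""
    return "DNS_ERROR_DP_DOES_NOT_EXIST" in dnscmd_output


def verification_commands(server, domain):
    """Return the dnscmd checks from the steps above, in order (strings only)."""
    partition = f"domaindnszones.{domain}"
    return [
        f"dnscmd {server} /directorypartitioninfo {partition}",
        f"dnscmd {server} /createbuiltindirectorypartitions /Domain",
        f"dnscmd {server} /enumdirectorypartitions",
    ]


if __name__ == "__main__":
    print(dns_name_to_dn("domaindnszones.xxc.com"))
    for command in verification_commands("servername", "xxc.com"):
        print(command)
```

The DN printed by dns_name_to_dn is the value to paste into the NTDSUTIL list, remove and delete commands.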

Thanks,

Author Comment

by:uspiv
Did all you requested a long time back, but when I type

dnscmd xxcs001 /directorypartitioninfo domaindnszones.xxc.com, I do not get the Command failed error. I still get a listing for the 3 DCs. I get something like the following on all 3 DC servers:

..\dnscmd xxcs001 /directorypartitioninfo domaindnszones.xxc.com
Directory partition info:
  DNS root:   DomainDnsZones.xxC.com
  Flags:      0x15 Enlisted Auto Domain
  State:      0
  Zone count: 1
  DP head:    DC=DomainDnsZones,DC=xxC,DC=com
  Crossref:   CN=45e49eb4-99eb-4bb0-8810-50db1c329a53,CN=Partitions,CN=Con...
  Replicas:   3
    CN=NTDS Settings,CN=xxCS009,CN=Servers,CN=Default-First-Site-Name,CN=...
    CN=NTDS Settings,CN=xxCS004,CN=Servers,CN=Default-First-Site-Name,CN=...
    CN=NTDS Settings,CN=xxCS001,CN=Servers,CN=Default-First-Site-Name,CN=...
Command completed successfully.


LVL 4

Expert Comment

by:pbrane
Hi,

OK. bummer!

Luckily enough, when I was testing this out before I posted, on one out of the three test rigs, when I tried a second time I got exactly what you said you have now.

I could delete it over and over again but it would always remain.

That now seems to be a good platform for resolving your issue.

Let me have a dig around on this server,

I'll post back when I can see what it is.

Thanks,

LVL 4

Assisted Solution

by:pbrane
pbrane earned 400 total points
Hi,

Oddly enough, I just tried messing around with the order of the commands a little and it deleted it straight away.

This is exactly what I did in this order.

1. Quit all CMD boxes currently open
2. Opened support tools CMD box
3. Went straight into NTDSUTIL and deleted the partition, no remove first, just delete.
4. Checked with dnscmd xxcs001 /directorypartitioninfo domaindnszones.xxc.com (equiv) and it had gone.

Can you just try doing it in this exact order please? Before I did this, I had exactly the same problem as you.

Thanks,

Author Comment

by:uspiv
It worked that way...
..\domain management: Delete NC dc=domainDNSzones,dc=xxc,dc=com
The operation was successful. The partition has been marked for removal from the enterprise. It will be removed over time in the background.

Note: Please do not create another partition with the same name until the servers which hold this partition have had an opportunity to remove it. This will occur when knowledge of the deletion of this partition has replicated throughout the forest, and the servers which held the partition have removed all the objects within that partition. Complete removal of the partition can be verified by consulting the Directory event log on each server.
domain management: quit
ntdsutil: quit
Disconnecting from xxcs001...

C:\Program Files\Windows Resource Kits\Tools>dnscmd xxcs001 /directorypartitioninfo domaindnszones.xxc.com
Directory partition info query failed
    status = 9901 (0x000026ad)

Command failed:  DNS_ERROR_DP_DOES_NOT_EXIST     9901  (000026ad)

Author Comment

by:uspiv
Recreated the domaindnszones partition on xxcs001.

LVL 4

Expert Comment

by:pbrane
Hi,

Ahhh, great!

Perhaps the remove command before was doing something to prevent the delete command from taking effect. It was working for me that way on another test rig just not this one, or yours obviously.

So fingers crossed, you can connect to the domaindnszones using adsiedit now?

and the .(dot) zone message will now stop.

Thanks,

Author Comment

by:uspiv
Yes I was able to connect to the domaindnszones using adsiedit.
The DNS warning 4521 has disappeared.
Event ID 4005 appeared - "The DNS server received indication that zone . was deleted from the Active Directory. Since this zone was an Active Directory integrated zone, it has been deleted from the DNS server."

Now I get several event ID 5504 entries: "The DNS server encountered an invalid domain name in a packet from 218.61.3.155. The packet was rejected...."


LVL 4

Expert Comment

by:pbrane
Hi,

4005 is just a confirmation of exactly what you have just done. Nothing to worry about.

5504 is nothing to worry about either. If you want to try and correct it, though, perhaps you should look at this hotfix: http://support.microsoft.com/kb/920162

Can you connect to the forestdnszones now with adsiedit as well?

If not, you may need to delete and add this partition again as well.

Thanks,

Author Comment

by:uspiv
forestdnszones had the same referral issue in adsiedit, so I deleted the partition and recreated it. I can now connect to the forestdnszones with adsiedit.
Is there a way I can rebuild the facebook.com zone in the Forward Lookup Zones? I still get Zone Not Loaded by DNS Server.

LVL 4

Expert Comment

by:pbrane
Hi,

OK good.

What server IPs are you using for your stub zone to facebook.com?

If you go into the properties of the zone facebook.com, and click Find Names do all the IP addresses in the list resolve?

If not, remove the dead ones. The ones I can get to resolve are:
66.220.151.20
69.63.186.49
66.220.145.65

Once you have removed the dead ones, click Find Names again to make sure what’s left is still resolving, then click OK.

Then right-click the zone again and select Transfer from Master.

This should bring it back online again.
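
A rough way to see which masters are alive from the DNS server itself is to send each one a plain SOA query for the zone and see whether it answers. The sketch below does that with only the Python standard library. It is an approximation of the check described above, not what the DNS console does internally; the function names are hypothetical, and the IP addresses in this thread date from 2011 and may no longer be valid.

```python
import socket
import struct


def build_soa_query(zone, txid=0x1234):
    """Build a minimal non-recursive DNS query for the zone's SOA record."""
    # Header: ID, flags (all zero = standard query, no recursion), 1 question.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.strip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 6, 1)  # QTYPE=SOA (6), QCLASS=IN (1)
    return header + question


def master_answers(ip, zone, timeout=3.0):
    """True if the server at ip replies NOERROR to an SOA query for zone."""
    query = build_soa_query(zone)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    if len(reply) < 12 or reply[:2] != query[:2]:
        return False
    flags = struct.unpack("!H", reply[2:4])[0]
    # QR bit set (it is a response) and RCODE 0 (no error).
    return bool(flags & 0x8000) and (flags & 0x000F) == 0


def live_masters(ips, zone, probe=master_answers):
    """Return the subset of ips for which probe(ip, zone) succeeds."""
    return [ip for ip in ips if probe(ip, zone)]


if __name__ == "__main__":
    # Addresses as posted in this thread; treat them as examples only.
    candidates = ["66.220.151.20", "69.63.186.49", "66.220.145.65"]
    print(live_masters(candidates, "facebook.com"))
```

Anything that does not come back from live_masters is a candidate for removal from the stub zone's master list before retrying the transfer.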

Thanks,


LVL 4

Expert Comment

by:pbrane
P.S. The total list of name servers to have for Facebook:

204.74.66.132
204.74.67.132
66.220.151.20
69.63.186.49
66.220.145.65

You should try all of these in your IP list for a start; just because they didn't work for me doesn't mean they won't for you.