ovprit

asked on

FRS Replication Failures between Windows 2003 SP2 Domain Controllers

-=SETUP=-

--HW--
1 x Windows 2003 SP2 x86 Server FSMO PDC1
1 x Windows 2003 R2 SP2 x86 Server BDC

--CONFIG--
* DFS is configured with the DCs as the "root servers", so that when a client system/domain member types \\<fqdn> they get a "Shared" folder which they can then delve into to their heart's content.

* The FSMO has previously been set to D4 via the BurFlags setting, with no luck resolving the issue

* The BDC has been set to D2 via the BurFlags setting, with no luck resolving the issue
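D4 marks a DC's SYSVOL as the master copy, and D2 makes a DC discard its copy and re-sync from a partner, so at most one DC should be set to D4 at any one time. Here is a minimal Python sketch of that sanity check. It assumes the plan is written down as a hypothetical {DC name: BurFlags value} dict, and the function name is illustrative rather than part of any Windows tool:

```python
# Hypothetical sanity check for a planned FRS BurFlags assignment across DCs.
AUTHORITATIVE = 0xD4     # D4: this DC's SYSVOL becomes the master copy
NONAUTHORITATIVE = 0xD2  # D2: this DC discards its copy and re-syncs from a partner


def validate_burflags_plan(plan: dict[str, int]) -> list[str]:
    """Return a list of problems with a {dc_name: burflags} plan; empty means OK."""
    problems = []
    for dc, flag in plan.items():
        if flag not in (AUTHORITATIVE, NONAUTHORITATIVE):
            problems.append(f"{dc}: unexpected BurFlags value {flag:#x}")
    auth = [dc for dc, flag in plan.items() if flag == AUTHORITATIVE]
    # Two authoritative copies would conflict, so allow at most one D4.
    if len(auth) > 1:
        problems.append(f"more than one D4 (authoritative) DC: {', '.join(auth)}")
    return problems
```

For the setup above, `validate_burflags_plan({"PDC1": 0xD4, "BDC": 0xD2})` returns an empty list.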

-=ISSUE=-

From Event Viewer on the BDC:
* Source: NTDS Replication    Event ID: 2087
* Source: NtFrs    Event ID: 13508

As a result, when members that authenticate to the domain via the BDC type \\<fqdnofBDC>, they don't get the SYSVOL folder... and therefore there are GPO failures etc. etc. *surprise surprise*
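When comparing logs across several restart attempts, it helps to summarize which of the events from this thread appeared and whether FRS ever recovered. Event 13509 is the "connection has been established" message that follows a 13508 once the problem is fixed. Here is a minimal Python sketch. It assumes the events have already been pulled out of Event Viewer as hypothetical (source, event ID) tuples in chronological order, with a single inbound FRS partner. The meanings below are short paraphrases, not the verbatim Windows text:

```python
from collections import Counter

# Short paraphrases (not the verbatim Windows text) of events quoted in this thread.
KNOWN_EVENTS = {
    ("NTDS Replication", 2087): "could not resolve the source DC's DNS name; AD replication blocked",
    ("NtFrs", 13508): "FRS having trouble enabling replication from a partner; retrying",
    ("NtFrs", 13509): "FRS enabled replication from a partner after earlier retries",
    ("NtFrs", 13516): "SYSVOL initialized; FRS no longer blocking the DC role",
}


def summarize(events: list[tuple[str, int]]) -> list[str]:
    """Count (source, event_id) pairs and attach a short meaning to each."""
    report = []
    for (source, event_id), n in sorted(Counter(events).items()):
        meaning = KNOWN_EVENTS.get((source, event_id), "not covered here")
        report.append(f"{source} {event_id} x{n}: {meaning}")
    return report


def frs_still_failing(events: list[tuple[str, int]]) -> bool:
    """True if the newest FRS connection event is a 13508 rather than a 13509.

    Assumes events are oldest-first and that there is a single inbound partner.
    """
    for source, event_id in reversed(events):
        if source == "NtFrs" and event_id in (13508, 13509):
            return event_id == 13508
    return False
```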

-=STEPS TAKEN THUS FAR=-

Followed recommendations, in this order, from: http://eventid.net/display.asp?eventid=13508&eventno=349&source=NtFrs&phase=1

1) Reset the BDC machine account per Kevin Barnas' suggestion. RESULT = NO CHANGE

2) Per Ionut Marin, verified that FRS is running on the PDC and reachable from the BDC by running NTFRSUTL VERSION <fqdnofPDC>, which returned successful output

3) Per Event Viewer, ran DCDIAG /test:dns on the PDC (see attached file for dump)

4) Stopped ntfrs on the BDC, stopped ntfrs on the PDC, set BurFlags on the PDC to D4 and started ntfrs there, which generated event ID 13516 (source: NtFrs) saying it had successfully initialized... then set BurFlags on the BDC to D2 and started ntfrs, which again failed with event ID 13508 stating it cannot talk to the PDC

5) Banged head repeatedly on keyboard for 5 minutes RESULT = Temp. blurry vision & burning feather smell

6) Verified that "failed" DNS entry mentioned in attached dump is pointing to correct record

7) Followed the instructions from http://support.microsoft.com/kb/216498 up to step 12 and verified that ONLY the 2 DCs are showing
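Step 4 follows the usual D4/D2 order: stop FRS everywhere, bring up the authoritative DC first, then start the rest. Here is a minimal Python sketch that only builds the command list for that order and does not run anything. The host names are hypothetical. The registry path is the FRS BurFlags location, and the `/d` data is given in decimal (0xD4 = 212, 0xD2 = 210):

```python
# Registry location of the FRS BurFlags value (REG_DWORD).
BURFLAGS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup"
D4, D2 = 0xD4, 0xD2  # authoritative / non-authoritative


def reg_add_burflags(value: int) -> str:
    """Build a reg.exe command; /d is written in decimal (0xD4 -> 212, 0xD2 -> 210)."""
    return f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f'


def restore_plan(authoritative_dc: str, other_dcs: list[str]) -> list[tuple[str, str]]:
    """Return (host, command) pairs in the order used in step 4."""
    plan = []
    # Stop FRS on every DC before touching BurFlags.
    for dc in [*other_dcs, authoritative_dc]:
        plan.append((dc, "net stop ntfrs"))
    # Authoritative DC first: set D4, start FRS, then wait for NtFrs event 13516.
    plan.append((authoritative_dc, reg_add_burflags(D4)))
    plan.append((authoritative_dc, "net start ntfrs"))
    # Then each remaining DC: set D2 and start FRS so it re-syncs from the D4 copy.
    for dc in other_dcs:
        plan.append((dc, reg_add_burflags(D2)))
        plan.append((dc, "net start ntfrs"))
    return plan


if __name__ == "__main__":
    for host, cmd in restore_plan("PDC1", ["BDC"]):
        print(f"[{host}] {cmd}")
```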

-=SUSPICIONS=-

* This started happening AFTER I removed a failing FSMO... The current PDC was also a BDC back then. I xferred all roles per http://www.petri.co.il/transferring_fsmo_roles.htm, demoted the dying ex-FSMO (bad RAID card) and removed it from the domain, then followed up with a manual clean-up/verification pass, going through ALL DNS and AD records and manually deleting any/all related records. Finally, I turned off the ex-FSMO; it's been off for 2 weeks now.

* The DNS failures are NOT making sense to me... The IPs mentioned for the ISP's name servers are valid, so why are they failing? Is this truly an issue? That failure has been there ever since I took over this domain (I did not build it; I've been the sysadmin here since April '08), so I assumed an app of some sort might be requiring it...
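The manual DNS clean-up described in the first suspicion can be double-checked mechanically. Here is a minimal Python sketch. It assumes the zone contents have been exported into a hypothetical list of (owner, type, data) tuples and that the retired DC's FQDN and IP are known. It flags any record that still names the old DC or carries its IP, matching whole tokens only:

```python
def leftover_references(
    records: list[tuple[str, str, str]], retired_fqdn: str, retired_ip: str
) -> list[tuple[str, str, str]]:
    """Return (owner, type, data) records that still mention the retired DC.

    Compares whole tokens only, so 10.0.0.1 does not match 10.0.0.11,
    and ignores case and a trailing root dot on names.
    """
    target = retired_fqdn.lower().rstrip(".")
    hits = []
    for owner, rtype, data in records:
        for token in [owner, *data.split()]:
            token = token.lower().rstrip(".")
            if token in (target, retired_ip):
                hits.append((owner, rtype, data))
                break
    return hits
```

SRV records such as `0 100 389 oldfsmo.corp.example.` are caught because their target host is one of the whitespace-separated tokens. The names in that example are hypothetical.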

-=REQUESTS=-

* Hopefully you've deduced that I'm not a n00b when it comes to Win2K3 AD architecture (MCSE: 2003), so please don't treat me as such with pointless URL references to definitions.

* I would like clear, concise STEPS for debugging this issue further than I have so far, so I can get to the bottom of it.

* As it is, I've told my users to be patient, and thankfully I've been blessed with an equally patient boss... but I do NOT want to test their limits. So a speedy response is much appreciated, hence the 500 points!

Looking forward to further questions/suggestions... If I don't hear a "good" solution by Tuesday, 7/22/08, I will have to give up and pay the $250 to talk to M$ about it.

Thx,
dcdiag-test-dns.txt
wdelgadop

First, back up and clear the event logs on your system, so you can see what is happening right now. Then run dcdiag; you will see some problems, and after dcdiag you will have a clue where the problem is.
After that, go to Active Directory Sites and Services and check that Sites/Servers contains only your live domain controllers. Then check whether they are global catalogs, and if you have more than one server, check that the connections between them are fine.
Also test connectivity. After every step you take, check Event Viewer; it will be your work mate, recording everything that is going on.
For now, you have work to do...
Let us know how it goes.
ovprit

ASKER

On which server shall I do all these? The PDC or BDC? Logic dictates PDC, but it's the BDC that is failing to sync so... ?
The one that has the FSMO roles.
Darius Ghassem
Do you have only your internal DNS servers listed in the TCP/IP settings on the DCs? You shouldn't have any external DNS servers listed as the Preferred or Alternate DNS server in the TCP/IP settings.
Go through this post; the discussion at the bottom of it includes a fix for this issue.

https://www.experts-exchange.com/questions/21740439/FILE-REPLICATION-SERVICE-is-having-trouble-Event-ID-13508.html
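The rule above (DCs should point only at internal DNS servers) is easy to check against the DNS server list that `ipconfig /all` shows on each DC. Here is a minimal Python sketch. The addresses are hypothetical, and 192.0.2.x is a documentation range standing in for the ISP's servers:

```python
import ipaddress


def external_dns_servers(configured: list[str], internal: list[str]) -> list[str]:
    """Return configured DNS server addresses that are not internal DNS servers."""
    internal_set = {ipaddress.ip_address(ip) for ip in internal}
    return [ip for ip in configured if ipaddress.ip_address(ip) not in internal_set]


if __name__ == "__main__":
    # Hypothetical DC settings: preferred = itself, alternate = an ISP server.
    print(external_dns_servers(["10.0.0.12", "192.0.2.53"], ["10.0.0.10", "10.0.0.12"]))
```

Any address this returns belongs in the DNS server's forwarders, not in the DC's own TCP/IP settings.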
ovprit

ASKER

@dariusg: If you read my post, you would see that I've already done the D4/D2 BurFlags settings... the PDC/FSMO completes the D4 cycle just fine... the BDC, however, does NOT complete the D2 cycle... it keeps dying with the mentioned error codes.
Do you have two NICs in the servers? How about the DNS question above? Sorry, I must have missed that you had already tried that solution.
ovprit

ASKER

@wdelgadop:

Please find attached a copy of the dcdiag output, which I ran on the PDC/FSMO with "dcdiag /c /v /f:dcdiag_comprehensive.txt"
dcdiag-withflags.txt
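With /c /v the dcdiag output is long, and the per-test summary lines are the first thing worth reading. Here is a minimal Python sketch that pulls out the failures. It assumes the usual summary-line format, `......................... <target> passed|failed test <Name>`, and a plain-text log. The sample in the usage note is made up and is not the asker's output:

```python
import re

# dcdiag summary lines look like:
#   "         ......................... PDC1 failed test DNS"
RESULT_LINE = re.compile(r"\.{3,}\s*(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(output: str) -> list[tuple[str, str]]:
    """Return (target, test name) for every 'failed test' summary line."""
    return [
        (m.group(1), m.group(3))
        for m in RESULT_LINE.finditer(output)
        if m.group(2) == "failed"
    ]
```

For example, feeding it the text of `dcdiag_comprehensive.txt` gives a short list such as `[("corp.example", "DNS")]` (hypothetical target), which points straight at the section of the log to read.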
ovprit

ASKER

@dariusg:

On the BDC that is failing, yes, I have 2 NICs: one has the "dead" IP of the ex-FSMO and the other has its OWN IP.

i.e.
ex-FSMO: a.b.c.d
BDC NIC1: a.b.c.e

after removal of ex-FSMO from network and domain:
BDC NIC1: a.b.c.e
BDC NIC2: a.b.c.d
You shouldn't have a multihomed computer running DNS and AD, because of problems like this. Can you disable one of the NICs?
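One quick way to spot the situation described above, where two NICs are active and one of them holds the retired DC's address, is to parse `ipconfig /all` on each DC. Here is a minimal Python sketch. It assumes the Windows 2003-style `IP Address. . . . : x.x.x.x` lines; later Windows versions print `IPv4 Address`, which the pattern also accepts. Disabled adapters should not appear in the output at all, and the adapter names and addresses in the usage note are hypothetical:

```python
import re

ADAPTER = re.compile(r"^\S.*adapter (.+):\s*$")           # e.g. "Ethernet adapter NIC1:"
IPV4 = re.compile(r"IP(?:v4)? Address[ .]*:\s*([\d.]+)")  # e.g. "IP Address. . . : 10.0.0.12"


def addresses_by_adapter(ipconfig_all: str) -> dict[str, list[str]]:
    """Map each adapter name to the IPv4 addresses listed under it."""
    result: dict[str, list[str]] = {}
    current = None
    for line in ipconfig_all.splitlines():
        m = ADAPTER.match(line)
        if m:
            current = m.group(1)
            result[current] = []
            continue
        m = IPV4.search(line)
        if m and current is not None:
            result[current].append(m.group(1))
    return result


def multihoming_problems(ipconfig_all: str, retired_ips=()) -> list[str]:
    """Flag more than one addressed adapter, and any address from a retired DC."""
    by_adapter = addresses_by_adapter(ipconfig_all)
    problems = []
    active = [name for name, ips in by_adapter.items() if ips]
    if len(active) > 1:
        problems.append("multihomed: addresses on " + ", ".join(active))
    for name, ips in by_adapter.items():
        for ip in ips:
            if ip in retired_ips:
                problems.append(f"{name} still holds retired DC address {ip}")
    return problems
```

Run against a BDC whose NIC2 carries the ex-FSMO address (say a hypothetical 10.0.0.11), `multihoming_problems(text, {"10.0.0.11"})` reports both the multihoming and the retired address.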
ovprit

ASKER

@wdelgadop:


>> back up and delete events from your system : DONE

>> run dcdiag, you will see some problems : DONE see above for dump

>> after, go to Active directory sites and services and check if you have in Sites/Servers only your live Domain controllers : DONE, VERIFIED ONLY 2 DCs showing which are the PDC and BDC

>> then...check if they are global catalog and the connection between servers if you have more than one...are fine : DONE, VERIFIED ONLY PDC/FSMO has global catalog (per design)

>> try also testing connectivity: DONE, BOTH WAYS from BDC to PDC, and PDC to BDC checked out

During these checks, Event Viewer showed no errors, only informational messages.
ovprit

ASKER

@dariusg:

OK, I will disable the 2nd NIC, which I had assigned the ex-FSMO IP to. I will also clean up all related entries in DNS and force the BDC to re-register its DNS.
Make sure you run netdiag /fix and restart the netlogon service. Does the other DC/DNS server have two NICs?
ovprit

ASKER

Hmm... looks like I missed the very bottom part of the dcdiag output when copy-pasting, so here it is! Some DNS failures for sure :/
dcdiag-withflags-pt2.txt
ovprit

ASKER

@dariusg:

Yes, it does, but it's unplugged... I'm guessing you'll want me to redo that one as well: disable the NIC, clean up DNS, restart netlogon and run netdiag /fix?
Yes. This has been a big issue lately: the network works fine, then a restart or a small change happens, big issues start, and it all turns out to be DNS servers and/or DCs running on two NICs.
ovprit

ASKER

OK, both DCs now have their 2nd NICs disabled, netdiag /fix has been run, and netlogon has been restarted on both. I also verified that the active NIC is at the very top of the list via Networking-->Advanced Networking
ovprit

ASKER

@dariusg:

After the modifications, I tried the D2 BurFlags setting on the BDC again, and the result was the same... It took 4 minutes for it to fail with:

Source: NtFrs EventID:13508

With the following message:

The File Replication Service is having trouble enabling replication from PDC_FSMO to BDC for c:\shared using the DNS name fsmo.pdc.fqdn. FRS will keep retrying.
Following are some of the reasons you would see this warning.
[1] FRS can not correctly resolve the DNS name fsmo.pdc.fqdn from this computer.
[2] FRS is not running on fsmo.pdc.fqdn.
[3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.

This event log message will appear once per connection, After the problem is fixed you will see another event log message indicating that the connection has been established.

**********************************************************************************************************************************

As I've said previously, I've made sure via Sites and Services that both DCs talk to each other just fine, and I've verified that FRS is running on the FSMO/PDC.

So... GAAAAAAAH! >_<
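Reason [1] in the 13508 text can be tested directly on the BDC. Resolve the partner's FQDN through the system resolver and confirm that it returns exactly the PDC's current address, with no stale extra records (for example, one left over from a now-disabled NIC). Here is a minimal Python sketch. The name fsmo.pdc.fqdn is the placeholder from the event text, and the expected IP is a value you supply:

```python
import socket


def resolve_ipv4(fqdn: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_partner(fqdn: str, expected_ip: str) -> tuple[bool, str]:
    """OK only if fqdn resolves to exactly one address and it is the expected one."""
    try:
        addresses = resolve_ipv4(fqdn)
    except socket.gaierror as exc:
        return False, f"cannot resolve {fqdn}: {exc}"
    if addresses != [expected_ip]:
        return False, f"{fqdn} resolves to {addresses}, expected only {expected_ip}"
    return True, f"{fqdn} -> {expected_ip}"
```

On the BDC, something like `check_partner("fsmo.pdc.fqdn", "<PDC IP>")` with the real values would show whether reason [1] can be ruled out.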
ovprit

ASKER

I've just verified that in the PDC/FSMO event logs, in Security there is a corresponding entry for when the BDC is attempting to rebuild the SYSVOL, and it's a SUCCESSFUL AUDIT!

i.e. Source: Security    Event ID: 566     Notes: Control Access Replicating Directory Changes All domainDNS

So... what is it babbling about saying it can't access it?!?!
When you check the DNS zones, do you see A, SOA, and NS records for both servers? Check this in the DNS zone on both servers. Can you run ipconfig /all on both servers and post the results?
Yes, as dariusg says, test DNS and check that it is working fine. For example, the root hints on DNS1 should point to DNS2 and vice versa, and the same with forwarders: DNS1 should have just DNS2, and DNS2 just DNS1. Check the interfaces and make sure that neither of the NICs you disabled is listed there. Check all of these on both DNS1 and DNS2.
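The zone check dariusg asks for (A, SOA and NS records for both DCs) can be written as a small function. Here is a minimal Python sketch. It assumes the zone has already been exported into a hypothetical list of (owner, type, data) tuples, and all names and addresses in it are made up. It also reports extra A records for a DC name, because a record left behind for a disabled NIC makes the DC resolve to an address it no longer answers on:

```python
def _norm(name: str) -> str:
    """Lower-case a DNS name and drop any trailing root dot."""
    return name.lower().rstrip(".")


def zone_problems(records: list[tuple[str, str, str]], zone: str, dcs: dict[str, str]) -> list[str]:
    """Check one zone for an SOA, plus an A and an NS record for each DC.

    records: (owner, type, data) tuples; dcs: {dc_fqdn: expected_ip}.
    """
    zone = _norm(zone)
    problems = []
    soa = [r for r in records if r[1] == "SOA" and _norm(r[0]) == zone]
    if len(soa) != 1:
        problems.append(f"{zone}: expected one SOA record, found {len(soa)}")
    ns_targets = {_norm(d) for o, t, d in records if t == "NS" and _norm(o) == zone}
    for fqdn, ip in dcs.items():
        a_ips = sorted(d for o, t, d in records if t == "A" and _norm(o) == _norm(fqdn))
        if ip not in a_ips:
            problems.append(f"missing A record {fqdn} -> {ip}")
        stale = [a for a in a_ips if a != ip]
        if stale:
            problems.append(f"extra A record(s) for {fqdn}: {stale}")
        if _norm(fqdn) not in ns_targets:
            problems.append(f"{fqdn} is not an NS for {zone}")
    return problems
```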
ASKER CERTIFIED SOLUTION
ovprit
