DFSR comparing files but not replicating them

Hello all,

I have a rather unique issue here.  Firstly, my DFSR design.

I have a full mesh topology with 2 servers and 2 different replication groups.  We'll call them server1, server2, group1, and group2.

Secondly, my issue:

Server1 Group1 replicates successfully.
Server2 Group1 replicates successfully.
Server1 Group2 replicates successfully.
Server2 Group2 will not replicate.

Looking at the debug logs on Server2, I can see that when I change a file, it is compared against Server1 to see which one "wins".  That is as far as it goes: the file is never copied.  There are no errors in the event log or the debug log.  The file appears to simply sit in the backlog waiting to copy.  Replication was functioning normally this morning and stopped mid-day.

In looking at the debug logs on Server1, the file is not mentioned anywhere.

This has been running without issue for months so there shouldn't be a configuration issue.

I will be restarting server2 when I can get a small window of downtime, but it will be a while.  I currently have about 12K files backlogged.

The attached snippet is from the DFSR debug log file on Server2.

Any help is appreciated.  Thanks!

--Nate
##ADDING THE FILE TO THE REPLICATED FOLDER##

20100330 17:22:57.209  400 LDBX  3548 Ldb::Insert Inserting idRecord:
+	fid               0x13000000074EEB
+	usn               0x208f20000
+	uidVisible        0
+	filtered          0
+	journalWrapped    0
+	slowRecoverCheck  0
+	pendingTombstone  0
+	recUpdateTime     16010101 00:00:00.000 GMT
+	present           1
+	nameConflict      0
+	attributes        0x20
+	gvsn              {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548
+	uid               {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548
+	parent            {FBF35F2F-C712-4083-86A3-4F37D2375DD2}-v1
+	fence             16010101 00:00:00.000 
+	clock             20100330 22:22:57.209
+	createTime        20100330 22:22:57.209 GMT
+	csId              {FBF35F2F-C712-4083-86A3-4F37D2375DD2}
+	hash              00000000-00000000-00000000-00000000
+	similarity        00000000-00000000-00000000-00000000
+	name              eetest2.txt
+	
20100330 17:22:57.209  400 USNC  2448 UsnConsumer::CreateNewRecord ID record created from USN_RECORD:
+	USN_RECORD:
+	RecordLength:        88
+	MajorVersion:        2
+	MinorVersion:        0
+	FileRefNumber:       0x13000000074eeb
+	ParentFileRefNumber: 0x100000000001e
+	USN:                 0x208f20000
+	TimeStamp:           20100330 17:22:57.209 Central Standard Time
+	Reason:              Basic Info Change Close File Create 
+	SourceInfo:          0x0
+	SecurityId:          0x47f
+	FileAttributes:      0x20
+	FileNameLength:      22
+	FileNameOffset:      60
+	FileName:            eetest2.txt
+

##ALTERING THE CONTENTS TO TRIGGER COMPARISON##
20100330 17:30:08.319  400 LDBX  3665 Ldb::Update Updating idRecord:
+	fid               0x13000000074EEB
+	usn               0x20901d2a8
+	uidVisible        0
+	filtered          0
+	journalWrapped    0
+	slowRecoverCheck  0
+	pendingTombstone  0
+	recUpdateTime     20100330 22:30:05.084 GMT
+	present           1
+	nameConflict      0
+	attributes        0x20
+	gvsn              {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388578
+	uid               {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548
+	parent            {FBF35F2F-C712-4083-86A3-4F37D2375DD2}-v1
+	fence             16010101 00:00:00.000 
+	clock             20100330 22:30:08.319
+	createTime        20100330 22:22:57.209 GMT
+	csId              {FBF35F2F-C712-4083-86A3-4F37D2375DD2}
+	hash              00000000-00000000-00000000-00000000
+	similarity        00000000-00000000-00000000-00000000
+	name              eetest2.txt
+	
20100330 17:30:08.319  400 USNC  2202 UsnConsumer::UpdateIdRecord ID record updated from USN_RECORD:
+	USN_RECORD:
+	RecordLength:        88
+	MajorVersion:        2
+	MinorVersion:        0
+	FileRefNumber:       0x13000000074eeb
+	ParentFileRefNumber: 0x100000000001e
+	USN:                 0x20901d2a8
+	TimeStamp:           20100330 17:30:08.319 Central Standard Time
+	Reason:              Close Data Extend 
+	SourceInfo:          0x0
+	SecurityId:          0x47f
+	FileAttributes:      0x20
+	FileNameLength:      22
+	FileNameOffset:      60
+	FileName:            eetest2.txt
+
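For anyone comparing the two idRecord entries above by hand, here is a minimal Python sketch that pulls the fields out of log text in this format. The regular expression and the function names `parse_entries` and `gvsn_version` are my own illustration, not part of any DFSR tooling, and the sample is trimmed to a few fields from the snippet. My reading of the output is that the uid stays the same while the gvsn version advances, which suggests Server2 did record the change locally.

```python
import re

# Header line of a debug log entry, e.g.
# "20100330 17:22:57.209  400 LDBX  3548 Ldb::Insert Inserting idRecord:"
HEADER = re.compile(
    r"^(\d{8} \d{2}:\d{2}:\d{2}\.\d{3})\s+\d+\s+(\w+)\s+\d+\s+(\S+)\s+(.*)$"
)

def parse_entries(text):
    """Split debug log text into entries, each with its '+' field lines."""
    entries = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            entries.append({"time": m.group(1), "module": m.group(2),
                            "function": m.group(3), "fields": {}})
        elif line.startswith("+") and entries:
            # Field lines look like "+<tab>name   value"; skip empty ones.
            parts = line[1:].split(None, 1)
            if len(parts) == 2:
                entries[-1]["fields"][parts[0].rstrip(":")] = parts[1].strip()
    return entries

def gvsn_version(gvsn):
    """Return the integer after '-v' in a value like '{GUID}-v1388548'."""
    return int(gvsn.rsplit("-v", 1)[1])

# Trimmed sample taken from the snippet above (only three fields kept).
SAMPLE = "\n".join([
    "20100330 17:22:57.209  400 LDBX  3548 Ldb::Insert Inserting idRecord:",
    "+\tgvsn              {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548",
    "+\tuid               {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548",
    "+\tname              eetest2.txt",
    "20100330 17:30:08.319  400 LDBX  3665 Ldb::Update Updating idRecord:",
    "+\tgvsn              {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388578",
    "+\tuid               {D9809A24-8DBC-4127-944C-05352E2DDB7F}-v1388548",
    "+\tname              eetest2.txt",
])

if __name__ == "__main__":
    first, second = parse_entries(SAMPLE)
    same_uid = first["fields"]["uid"] == second["fields"]["uid"]
    bump = (gvsn_version(second["fields"]["gvsn"])
            - gvsn_version(first["fields"]["gvsn"]))
    print("same uid:", same_uid)
    print("gvsn version advanced by:", bump)
```

The same parser could be pointed at the Server1 log to confirm that the file name never appears there.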


NateWilliamsAsked:

NateWilliams (Author) commented:
Well, as with most things Microsoft, it seems to have suddenly begun functioning again.  I suspect it may have had something to do with the staging folder being above the "high watermark".  In the event log, there was an informational notice stating that old staging files had been successfully deleted for the replicated folder within the share that I was having issues with.
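To make the watermark idea concrete, here is a rough Python sketch of the arithmetic as I understand it: once staging usage reaches the high watermark, the oldest staged files are purged until usage drops to the low watermark. The 90% and 60% figures are the commonly documented defaults and are an assumption about this environment, as are the quota and usage numbers in the example; `staging_cleanup_target` is a name made up for illustration.

```python
def staging_cleanup_target(quota_mb, used_mb, high_pct=90, low_pct=60):
    """Return how many MB a cleanup would purge, or 0.0 if none is triggered.

    Assumes cleanup starts when usage reaches high_pct of the quota and
    stops once usage is back down to low_pct of the quota.
    """
    high = quota_mb * high_pct / 100
    low = quota_mb * low_pct / 100
    if used_mb < high:
        return 0.0
    return used_mb - low

if __name__ == "__main__":
    # Hypothetical numbers: a 4000 MB quota with 3700 MB staged.
    print(staging_cleanup_target(4000, 3700))  # 1300.0
    # Below the high watermark, nothing is purged.
    print(staging_cleanup_target(4000, 3000))  # 0.0
```

If the staging quota is small relative to the files being changed, usage can sit near the high watermark for long stretches, which would fit the stall-then-recover pattern I saw, though I cannot confirm that was the cause.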

If anyone has experienced this or has some knowledge they would like to share, I will still award points for information that is relevant to my issue.
Justin Owens (ITIL Problem Manager) commented:
We have about 200 terabytes in a DFS replica (which is WAY out of scope for what DFS was designed for).  We typically have about 80% of our files in good replica status and about 20% in a backlog.  The 20% that stays backlogged generally falls into one of two categories: either the full file name is too long (more than 256 characters including the path) or the permissions have been changed.  Our setup is unusual in that each site has complete administrative control of everything below the ROOT share of its DFS structure.  As a result, sites occasionally remove SYSTEM and Administrators from the permissions.
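The path-length category can be checked with a short script. This is only a sketch in Python: the 256-character limit is the figure quoted in this comment, `find_long_paths` is a name chosen for illustration, and the root path in the usage lines is hypothetical. On Windows, very long paths may themselves cause errors while walking the tree, so treat the results as a starting point.

```python
import os

def find_long_paths(root, limit=256):
    """Yield (length, path) for each file under root whose full path exceeds limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                yield len(full), full

if __name__ == "__main__":
    # Hypothetical replicated folder root; replace with a real path.
    root = r"D:\Shares\Group2"
    for length, path in sorted(find_long_paths(root), reverse=True):
        print(length, path)
```

The permissions category (SYSTEM or Administrators removed) needs a different check and is not covered here.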
When files that normally replicate stop and then restart on their own, we generally chalk it up to being so far out of scope that we have to expect some discrepancies.  Without looking at your hidden replica system folders, your log files, and your event viewer, it would be almost impossible to tell you why those errors are happening.  If memory serves correctly, MS doesn't support more than 20 terabytes, and individual file size needs to be considered as well.  Here are a couple of links on general DFS size considerations:
I hope that helps.
Justin
NateWilliams (Author) commented:
Helpful information; unfortunately, the root cause of the issue was not definitively determined.  The answer given could still help other users prevent similar problems.
Windows Server 2003