DFSR is not replicating SOME replicated folders

Posted on 2013-01-19
Last Modified: 2013-02-18
I have two file servers, which we'll call fs-01 and fs-02. They share one replication group, which we'll call RG-01, containing 25 replicated folders. It's clear (from checking manually) that replication is not occurring in 3 of those 25 folders. Here's a dump of my troubleshooting so far:

The servers are virtual machines running Windows Server 2008 R2 in an ESX 5.1 environment. Underlying storage is an iSCSI SAN. The servers are at two different locations (HQ and Hotsite) connected by a 1 Gb P2P link. The network is trunked across the P2P link, though fs-02 is on a different subnet and is defined as such in Sites and Services.
I inherited this system about 5 days ago. Two weeks prior to that, a network architecture change (the trunking) caused some instabilities in the underlying SAN storage system. By all accounts those have been resolved now.
When I run "Create Diagnostic Report" with a propagation test and then generate the subsequent report, the test files in 22 of the 25 folders replicate nearly instantaneously (< 1 sec).
I ran the same propagation test in the three affected folders, and 3 days later they still show as "Incomplete tests", which I take to mean they haven't replicated.
If I run dfsrdiag backlog /Rmem:FS-02 /Smem:FS-01 /RGName:RG-01 /RFName:"Folder name" from the command prompt, it returns "No Backlog - member FS-02 is in sync with partner FS-01. Operation Succeeded." It says this even though the diagnostic test files still show as not yet synced in the report.
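A backlog check only measures one direction: /Smem is the sending member and /Rmem the receiving member, so "No Backlog" from FS-01 to FS-02 says nothing about FS-02 to FS-01. A minimal sketch that builds the command for both directions, assuming Python on a Windows host with the DFS management tools installed; the helper names are illustrative, and it assumes dfsrdiag accepts an argument such as "/RFName:Analysis (Current)" quoted as a whole, which is how Python passes arguments containing spaces.

```python
import subprocess

def backlog_command(sending, receiving, rg_name, rf_name):
    """Build a dfsrdiag backlog command for one direction: sending -> receiving."""
    return [
        "dfsrdiag", "backlog",
        f"/Rmem:{receiving}",  # receiving member
        f"/Smem:{sending}",    # sending member
        f"/RGName:{rg_name}",
        f"/RFName:{rf_name}",
    ]

def backlog_both_ways(member_a, member_b, rg_name, rf_name):
    """Commands for A -> B and B -> A; a clean result one way proves nothing about the other."""
    return [
        backlog_command(member_a, member_b, rg_name, rf_name),
        backlog_command(member_b, member_a, rg_name, rf_name),
    ]

def run(cmd):
    """Run one command and return (exit code, text output)."""
    # On Windows, subprocess wraps arguments that contain spaces in double quotes.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Usage: `for cmd in backlog_both_ways("FS-01", "FS-02", "RG-01", "Analysis (Current)"): print(*run(cmd))`.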
These replicated folders are GIGANTIC: 3.4 TB, 8 TB, and 1 TB, with individual files sometimes as large as 80 GB, but with (relatively speaking) few files in each subfolder. The staging quota for these 3 folders is set at 750 GB. All the other replicated folders have a staging quota of 10 GB.
I am ASSUMING that the files on FS-02 were in fact replicated, not pre-seeded, and that replication worked once upon a time for these folders. That said, I DO notice that these 3 folders all have special characters in their names (parentheses, to be precise).
There are no errors in the Application or System logs on either server for DFS-Svc, DFS Replication, DFSR, or DFSR Audit. There are a handful of informational events for routine things like "DFS Server has finished initializing".
I've looked at the logs in C:\windows\debug, and there are a LOT of them, but nothing really stands out as an error.
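With that many debug logs, a quick filter can help. A rough sketch, assuming the DFSR debug logs are named Dfsr*.log (older ones possibly gzip-compressed as Dfsr*.log.gz), are readable as UTF-8 text, and tag problem lines with [WARN] or [ERROR]. The naming, encoding, and tags are all assumptions to check against your own files; the script only counts and samples matching lines.

```python
import gzip
from collections import Counter
from pathlib import Path

# ASSUMPTIONS (verify against your own logs): files are named Dfsr*.log, older
# ones may be gzip-compressed as Dfsr*.log.gz, text decodes as UTF-8
# (undecodable bytes are replaced), and problem lines contain [WARN] or [ERROR].
MARKERS = ("[WARN]", "[ERROR]")

def open_log(path):
    """Open a plain or gzip-compressed log as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")

def scan_logs(log_dir, max_samples=20):
    """Count lines tagged with each marker and keep the first few as samples."""
    counts = Counter()
    samples = []
    for path in sorted(Path(log_dir).glob("Dfsr*.log*")):
        with open_log(path) as fh:
            for line in fh:
                for marker in MARKERS:
                    if marker in line:
                        counts[marker] += 1
                        if len(samples) < max_samples:
                            samples.append((path.name, line.rstrip()))
                        break  # count each line once
    return counts, samples
```

Usage: `counts, samples = scan_logs(r"C:\Windows\debug")`, then print `counts` and the sampled lines.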
Anyone have any thoughts, or additional diagnostic tests I can run?
Question by: Eric_Price

Accepted Solution

ArneLovius earned 400 total points
Special characters are not an issue. Files and folders with a space at the end of the name (Mac users...) are another matter; then you have the joy of fixing them in two locations....
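A minimal sketch for finding names with a trailing space before they cause trouble, assuming Python runs on a machine that can see the replicated folder. As an extra check not mentioned in the answer, it also flags names ending in a period, since Win32 path handling strips trailing periods and spaces alike. The function name is illustrative.

```python
import os

def suspicious_names(root):
    """Yield full paths whose last component ends with a space or a period."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith((" ", ".")):
                yield os.path.join(dirpath, name)
```

To rename a flagged item on Windows, the `\\?\` extended-length path prefix (for example a hypothetical `\\?\D:\Share\report `) bypasses the normalization that otherwise strips the trailing character.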

If you had "instabilities in the underlying SAN storage system", I'd check for disk corruption. However, with the volume sizes you have posted, I'd test by copying the files to "something else" or "somewhere else" locally at each site, using something that logs the copy output, possibly robocopy?
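A minimal sketch of that copy test, driving robocopy from Python on each server; the source, destination, and log paths are hypothetical. Retries are kept short so unreadable files (a likely symptom of corruption) are logged and skipped rather than retried for a very long time, and the exit code is interpreted using robocopy's convention that 8 or higher means at least one copy failed.

```python
import subprocess

def robocopy_command(source, destination, log_file, threads=8):
    """Build a robocopy command that copies a whole tree and logs every result."""
    return [
        "robocopy", source, destination,
        "/E",                # include all subdirectories, even empty ones
        f"/LOG:{log_file}",  # write the copy output to a log file
        "/TEE",              # ...and echo it to the console as well
        "/R:1", "/W:1",      # one retry after 1 s (defaults: 1,000,000 retries, 30 s wait)
        "/NP",               # no per-file percentage progress cluttering the log
        f"/MT:{threads}",    # multithreaded copy (Windows 7 / Server 2008 R2 and later)
    ]

def robocopy_failed(exit_code):
    """robocopy exit codes are bit flags; 8 or higher means some copies failed."""
    return exit_code >= 8

def run_copy(source, destination, log_file):
    """Run the copy and return True if robocopy reported no failures."""
    result = subprocess.run(robocopy_command(source, destination, log_file))
    return not robocopy_failed(result.returncode)
```

Any file robocopy could not read will show up as an error in the log, which gives a concrete list to compare between the two sites.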

You could try removing one of the folders from the replication group, refreshing both ends, clearing out staging, etc., and then adding it back again.
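A minimal sketch of that sequence as an ordered checklist, assuming the folder removal, staging cleanup, and re-adding are done by hand in DFS Management (the sketch does not script them) and that "refreshing both ends" means forcing each member to poll Active Directory with `dfsrdiag pollad`. The helper name `readd_plan` is hypothetical.

```python
def readd_plan(rg_name, rf_name, members):
    """Return ordered (kind, action) steps; kind is 'manual' or 'run'."""
    def poll_all():
        # Make every member pick up the configuration change from AD now
        # instead of waiting for its next scheduled poll.
        return [("run", ["dfsrdiag", "pollad", f"/Member:{m}"]) for m in members]

    steps = [("manual", f"Remove replicated folder '{rf_name}' from '{rg_name}' in DFS Management")]
    steps += poll_all()
    steps.append(("manual", f"Clear out the staging folder for '{rf_name}' on each member"))
    steps.append(("manual", f"Add '{rf_name}' back to '{rg_name}' in DFS Management"))
    steps += poll_all()
    return steps
```

Usage: `for kind, action in readd_plan("RG-01", "Analysis (Current)", ["FS-01", "FS-02"]): print(kind, action)`. After re-adding, the folder presumably goes through initial synchronization again, which for multi-terabyte folders can take a long time.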

I am presuming that you don't have iSCSI going over the link, just the DFS replication.

With ~13 TB of storage, I'm going to guess that the value of the data is not low; have you considered opening a Microsoft PSS case?

Author Comment

It may quickly come to opening a case. I'm a week into the job here and, to be honest, my past experience has been with one-way replication and the old FRS. The system seems pretty straightforward, and I'm not averse to tinkering, so long as I have good backups and I've taken the time to ask for quick assistance first. No sense reinventing the wheel.

It is just the DFS replication.

Those are a couple of great suggestions. I think I'll try one of them today (Sunday) and then, based on the results, decide on Monday whether to move forward or call Microsoft. Since this replication is "only" to our hotsite, I don't feel QUITE the pressure I would if it were something that more directly affected day-to-day operations, but I still hate the thought of letting it go very long at all. It's almost an invitation for disaster. lol

Author Comment

To add some additional information: the most critical folder (labeled "Analysis (Current)") shows no backlog on fs-01, but on fs-02 (the hot site server) dfsrdiag reports a backlog of over 900,000 files in that folder. It has clearly replicated some changes over the past couple of weeks, because the resulting conflicts on fs-01 end up in the ConflictAndDeleted folder.

The files on fs-01 are the only ones that will ever be modified. They're the ones I want to keep at all costs.

PS: I have no good backup, since my predecessor has all the backups going over the P2P link (AppAssure). It worked great as long as it was happy making incremental backups, but now it seems to want a new base image, and as you can imagine one can't simply make a new base of 14 TB over a 1 Gb P2P link.

Given the sensitivity, a call to MS is in order methinks. And a stiff drink and a backup.

Assisted Solution

ArneLovius earned 400 total points
Nothing wrong with incremental backups that become synthetic full backups (such as Microsoft DPM, rsync with hardlinks to existing files, or snapshots); plain incremental backups, though, are a different kettle of fish...
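A minimal sketch of the "rsync with hardlinks" pattern: each run writes a dated snapshot directory, and `--link-dest` hard-links files that are unchanged since the previous snapshot instead of copying them. The source and backup paths are hypothetical, and it assumes the *nix backup box can read the Windows share (for example via an SMB mount).

```python
import datetime
from pathlib import PurePosixPath

def snapshot_command(source, backup_root, today, previous=None):
    """Build an rsync command that writes a dated snapshot under backup_root.

    With --link-dest, files unchanged since the previous snapshot are
    hard-linked to it rather than copied, so every snapshot directory is a
    complete tree while only new or changed files consume extra space.
    """
    root = PurePosixPath(backup_root)
    # -a: archive mode (recursive; preserves symlinks, permissions, times, owner, group)
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append(f"--link-dest={root / previous}")
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", str(root / today.isoformat())]
    return cmd
```

Usage: run it daily (for example from cron) with `previous` set to the name of yesterday's snapshot directory; deleting an old snapshot only frees the files no other snapshot still links to.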

Presuming you have at least 10 Gb Ethernet at each site, I'd be very tempted to order, first thing on Monday, a box that can take a "quantity" of "inexpensive" disks, set it up as a backup server at the remote site so you can start the backup ASAP, and then bring it back to the main site or redeploy it when complete...
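A quick back-of-the-envelope check on why a local box helps, assuming "1 Gb P2P" means 1 Gbit/s and using decimal terabytes. Even at 100% line rate, which a link also carrying DFSR traffic will not get, 14 TB over 1 Gbit/s takes about 31 hours, against about 3.1 hours at 10 Gbit/s; in practice the disks in an inexpensive backup box may well be the bottleneck at 10 Gbit/s.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Hours to move size_tb decimal terabytes over a link_gbps link.

    efficiency is the fraction of line rate actually achieved (1.0 = ideal).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

for gbps in (1, 10):
    print(f"14 TB at {gbps} Gbit/s: {transfer_hours(14, gbps):.1f} h at line rate")
# 14 TB at 1 Gbit/s: 31.1 h at line rate
# 14 TB at 10 Gbit/s: 3.1 h at line rate
```

Lowering `efficiency` (say to 0.5 for a link shared with replication) doubles the estimate accordingly.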

If you're just backing up files, then rsync to a *nix box running ZFS with daily snapshots can be an inexpensive, simple solution. Robocopy in multithreaded mode can be faster in a LAN environment if you're just comparing modification times, but robocopy only copies whole files, so if you have large files with only small changes, rsync can be significantly faster...
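To illustrate why rsync can win on large files with small changes, here is a simplified sketch of the delta idea rsync is built on: split the old file into fixed-size blocks, index them by a cheap rolling checksum plus a strong hash, then slide a window over the new file one byte at a time and send only block references and literal bytes. This illustrates the technique; it is not rsync's actual code or wire format, and the tiny block size and SHA-256 as the strong hash are choices made for the demo.

```python
import hashlib

MOD = 1 << 16

def weak_checksum(block):
    """rsync-style weak checksum of a whole block: (a, b) sums modulo 2**16."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b

def strong_hash(block):
    return hashlib.sha256(block).digest()

def signature(old, size):
    """Index every full block of the old file by its weak checksum."""
    table = {}
    for idx in range(len(old) // size):
        block = old[idx * size:(idx + 1) * size]
        table.setdefault(weak_checksum(block), []).append((strong_hash(block), idx))
    return table

def delta(old, new, size):
    """Describe new as ('copy', block_index) and ('literal', bytes) operations."""
    table = signature(old, size)
    ops, literal, pos = [], bytearray(), 0
    ab = weak_checksum(new[:size]) if len(new) >= size else None
    while pos + size <= len(new):
        match = None
        for strong, idx in table.get(ab, ()):
            if strong == strong_hash(new[pos:pos + size]):  # confirm weak-checksum hit
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += size
            if pos + size <= len(new):
                ab = weak_checksum(new[pos:pos + size])
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                ab = roll(ab[0], ab[1], new[pos], new[pos + size], size)
            pos += 1
    literal += new[pos:]
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops

def patch(old, ops, size):
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)
```

With an 80 GB file where only a few megabytes changed, the same idea means the other side receives mostly block references rather than the whole file, whereas a whole-file copy moves every byte. Both sides still read the entire file to compute checksums, so the saving is in network traffic, not disk I/O.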

Anyway, call Microsoft PSS, pay the ~£200, and open a case; they are open 24/7.

Assisted Solution

Pber earned 100 total points
It sucks that the logs aren't helping you out.

Have you seen this article:

Is your staging area larger than the combined size of the 32 largest files in the folders in question?
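A minimal sketch for checking that, assuming Python can walk the replicated folder locally and using the 32-largest-files rule from the answer (`count` is a parameter in case that rule needs adjusting). DFS Management shows the staging quota in megabytes; the sketch assumes 1 MB = 1024 × 1024 bytes. As a worst case using the numbers in the question: if 32 files were each 80 GB, that would be 32 × 80 GB = 2,560 GB, more than three times the 750 GB quota.

```python
import heapq
import os

def file_sizes(root):
    """Yield (size_in_bytes, path) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield os.path.getsize(path), path
            except OSError:
                continue  # file vanished or is unreadable; skip it

def largest_files_total(root, count=32):
    """Total size of the `count` largest files, plus the (size, path) list."""
    top = heapq.nlargest(count, file_sizes(root))
    return sum(size for size, _path in top), top

def staging_quota_ok(root, quota_mb, count=32):
    """Compare a staging quota in MB (assumed 1024 * 1024 bytes) to the total."""
    total, _top = largest_files_total(root, count)
    return quota_mb * 1024 * 1024 >= total, total
```

Usage (hypothetical path): `ok, total = staging_quota_ok(r"D:\Analysis (Current)", 750 * 1024)` for a 750 GB quota.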

Author Closing Comment

I'm closing this out and spreading the wealth on the points. Thanks for the assist, guys.
