Solved

DFS Replication Problems

Posted on 2010-08-17
Last Modified: 2013-11-14
Hello Everyone,

It started a few days ago. We would get "staging size too small" errors (the quota was set to about 50 GB), and files would take about 16 hours to replicate on a full 24/7 schedule.

Right now replication isn't happening at all. I've moved the staging folder to another drive on our main file server (it runs Storage Server 2008) and increased the staging quota to 100 GB. I'm not sure if I moved the staging folder correctly. Is there a procedure for this?

Anyways I'm just wondering if anyone has any ideas on how I can get replication going again, and have it not take 16 hours.

The two servers involved are a Storage Server 2008 box, which is replicating with a Windows Server 2003 box.

Thanks.
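
For the "staging size too small" errors, a commonly cited Microsoft rule of thumb for DFSR of this era is that the staging quota should be at least as large as the 32 largest files in the replicated folder combined. The sketch below only estimates that floor; it is not a procedure for moving the staging folder. The function names are made up for illustration, and skipping the DfsrPrivate folder is an assumption about what should be counted.

```python
import heapq
import os


def iter_file_sizes(root):
    """Yield the size in bytes of every file under root, skipping DfsrPrivate."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune DFSR's hidden working folder so staged copies are not counted.
        dirnames[:] = [d for d in dirnames if d.lower() != "dfsrprivate"]
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File vanished or is inaccessible; ignore it for the estimate.
                continue


def largest_file_sizes(root, count=32):
    """Return the sizes of the `count` largest files under root, largest first."""
    return heapq.nlargest(count, iter_file_sizes(root))


def staging_floor_gb(root, count=32):
    """Sum of the `count` largest files, in GB (1 GB = 1024**3 bytes)."""
    return sum(largest_file_sizes(root, count)) / 1024 ** 3
```

Run on the file server itself, `staging_floor_gb(path_to_replicated_folder)` gives a lower bound to compare against the configured quota. A quota below that number would be one plausible explanation for the errors; a quota well above it points elsewhere.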
Question by:Methodman85
7 Comments
 
LVL 10

Expert Comment

by:rscottvan
ID: 33464793
How much total data is there?  How much of it changes each day?  How fast is the link between the servers?
 
LVL 1

Author Comment

by:Methodman85
ID: 33466047
The backlog shows about 55,000 files on the 2003 box, and about 1,300 on the 2008 box. The share is over a TB in size. The staging volume is 750 GB on both servers now.

The two servers are on the same LAN, so 1 Gbps. The servers are connected to two separate SAN boxes, and each has 4 gigabit Ethernet connections to its respective SAN.

The disk queue length for the volume that hosts the share sometimes reaches over 200. So maybe more connections are needed to the SAN? I'm not sure what the bottleneck is here.

How do I force this backlog through?
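
Nothing in the sketch below forces a backlog through. It only wraps the `dfsrdiag backlog` command so the count can be sampled in each direction and watched over time to see whether it is draining. It assumes `dfsrdiag` is installed on the machine where it runs. The helper names and the replication group name in the comment are hypothetical; the replicated folder name "Internal" and the server names come from the health report posted further down this thread.

```python
import subprocess


def backlog_command(group, folder, sending, receiving):
    """Build the argument list for `dfsrdiag backlog` in one direction."""
    return [
        "dfsrdiag", "backlog",
        "/rgname:" + group,
        "/rfname:" + folder,
        "/sendingmember:" + sending,
        "/receivingmember:" + receiving,
    ]


def run_backlog(group, folder, sending, receiving):
    """Run dfsrdiag and return its raw text output (Windows only)."""
    result = subprocess.run(
        backlog_command(group, folder, sending, receiving),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller; just return output
    )
    return result.stdout


# Example call; "MyGroup" is a placeholder for the real replication group name:
# print(run_backlog("MyGroup", "Internal", "CRPTOFS02P", "CRPTORFS01P"))
```

Swapping the sending and receiving members gives the backlog in the other direction. Two samples taken an hour apart give a rough drain rate, which is more useful than a single count when deciding whether replication is stuck or just slow.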
 
LVL 15

Expert Comment

by:whoajack
ID: 33467531
Just to clarify, have you run the health reports to see what warnings and/or errors may be returned? This is still fresh in my mind, as I went through these steps this week.

In case you want to review the steps and run through it: http://www.whoajack.com/greg/Blog/Lists/Posts/Post.aspx?ID=227
 
LVL 1

Author Comment

by:Methodman85
ID: 33469074
That link isn't working. Here's the health report. It was run without the "compare all files" option.
 


Server health:
  Servers with no errors or warnings (0)
  Servers unavailable for reporting (0)
  Servers with DFS Replication errors (1)
  Servers with DFS Replication warnings (2)

ERRORS (1 server with errors)
  CRPTORFS01P (1 error)
    The DFS Replication service is restarting frequently.

WARNINGS (2 servers with warnings)
  CRPTOFS02P (1 warning)
    The DFS Replication service is restarting frequently.
  CRPTORFS01P (2 warnings)
    Pre-existing content is not replicated and is consuming disk space.
    DFS Replication failed to clean up old staging files for replicated folder Internal.

SERVERS UNAVAILABLE FOR REPORTING (All servers reporting)

SERVER DETAILS (2 servers)

CRPTOFS02P

DNS name: CRPTOFS02P.domain.com
Domain name: domain.com
Reference domain controller: CRPTORDC02P.domain.com
IP address: 192.168.7.120, 192.168.7.121, 192.168.7.122, 192.168.7.123, 10.100.121.10
Site: Default-First-Site-Name
Time zone: (GMT-5:00)

ERRORS (There are no errors to report)

WARNINGS (There is 1 warning to report)

The DFS Replication service is restarting frequently.
  Affected replicated folders: All replicated folders on this server.
  Description: The DFS Replication service has restarted 3 times in the past 7 days. This problem can affect the replication of all replicated folders to and from this server. Event ID: 1004
  Last occurred: Tuesday, August 17, 2010 at 5:24:38 PM (GMT-5:00)
  Suggested action: If you restarted the service manually, you can safely ignore this message. For information about troubleshooting frequent service restart issues, see The Microsoft Web Site.

INFORMATIONAL

Service state: Running
DFS Replication service uptime: 20 hr. 14 min.
DFS Replication service version: 5.2.3790.3959

Summary of replicated folder status
The following table provides a high-level overview of replicated folder status on this server.

Replicated Folder | Status | Backlogged Sending Transactions | Backlogged Receiving Transactions | # of Files Received | DFS Replication Bandwidth Savings
Internal | Normal | 1349 | -- | 562 | 97.23%

Data shown about the number of received files and the DFS Replication Bandwidth Savings accumulate from the time the DFS Replication service is started.
Backlogged transactions are relative to member CRPTORFS01P (crptorfs01p.domain.com).

Current used and free disk space on volumes where replicated folders are stored
The following table describes the current used and free disk space on volumes where replicated folders are stored.

Volume Path | Volume Label | Volume Size | Free Space | % Free Space | USN Journal Size
C: | (has no label) | 67.7 GB | 60.1 GB | 88.8% | 0 KB
E: | Data | 2.93 TB | 1.70 TB | 58.1% | 512 MB
S: | DFSStaging | 750 GB | 622 GB | 82.9% | 0 KB

DFS Replication Bandwidth Savings:
The DFS Replication bandwidth savings are computed by determining the total size of data replicated across the network using a combination of remote differential compression (RDC), which sends only byte-level changes, and stream compression. By comparing this figure to the amount of data that would be replicated across the network if RDC and stream compression were not used, you can determine the percentage of bandwidth saved.
Reduction in WAN traffic: 930.50 MB
The following table describes the total on-disk size of files that were replicated and compares them to the amount of data actually received over the network with DFS Replication.

Replicated Folder | Total Size of Data If Received Without DFS Replication | Actual Data Received Across the Network Using DFS Replication | DFS Replication Bandwidth Savings
Internal | 957.04 MB | 26.54 MB | 97.23%
Savings from using DFS Replication | 957.04 MB | 26.54 MB | 97.23%
                  

CRPTORFS01P

DNS name: crptorfs01p.domain.com
Domain name: domain.com
Reference domain controller: CRPTODC04P.domain.com
IP address: fe80::194a:ee91:c196:9260%16, fe80::c42e:1211:b6a:c82f%15, fe80::ec4b:413d:a859:9d67%14, fe80::817d:86f9:cf9f:486c%13, fe80::c51f:b331:8a69:8b27%10, 192.168.7.73, 192.168.7.72, 192.168.7.71, 192.168.7.70, 10.2.1.112
Site: Default-First-Site-Name
Time zone: (GMT-5:00)

ERRORS (There is 1 error to report)

The DFS Replication service is restarting frequently.
  Affected replicated folders: All replicated folders on this server.
  Description: The DFS Replication service has restarted 6 times in the past 7 days. This problem can affect the replication of all replicated folders to and from this server. Event ID: 1004
  Last occurred: Tuesday, August 17, 2010 at 5:10:53 PM (GMT-5:00)
  Suggested action: If you restarted the service manually, you can safely ignore this message. For information about troubleshooting frequent service restart issues, see The Microsoft Web Site.

WARNINGS (There are 2 warnings to report)

Pre-existing content is not replicated and is consuming disk space.
  Affected replicated folders: Internal
  Description: During the initial replication process for replicated folder Internal, the DFS Replication service identified pre-existing local content that was not present on the primary member and moved the content to G:\Internal\DfsrPrivate\PreExisting. The DfsrPrivate\Preexisting folder is a hidden system folder that is located under the local path of the replicated folder. Content in the DfsrPrivate\PreExisting folder will not be replicated to other members of the replication group, nor will the content be deleted by the DFS Replication service during any automatic clean-up.
  Last occurred: Wednesday, August 18, 2010 at 1:39:28 PM (GMT-5:00)
  Suggested action: If you want this content to be replicated to other members, move the content into the replicated folder outside of the DfsrPrivate folder. If you want to reclaim this disk space, delete the pre-existing content in the PreExisting folder.

DFS Replication failed to clean up old staging files for replicated folder Internal.
  Affected replicated folders: Internal
  Description: DFS Replication failed to clean up old staging files. As a result, some large files might fail to replicate, and the replicated folder Internal might become out of sync. The service will automatically try to clean up the staging folder again. This failure has happened 10 times in the past 7 days. Event ID: 4206
  Last occurred: Monday, August 16, 2010 at 9:44:14 PM (GMT-5:00)
  Suggested action: For more information, see The Microsoft Web Site.

INFORMATIONAL

Service state: Running
DFS Replication service uptime: 20 hr. 28 min.
DFS Replication service version: 6.0.6002.18005

Summary of replicated folder status
The following table provides a high-level overview of replicated folder status on this server.

Replicated Folder | Status | # of Files Received | DFS Replication Bandwidth Savings
Internal | Normal | 2 | 94.16%

Data shown about the number of received files and the DFS Replication Bandwidth Savings accumulate from the time the DFS Replication service is started.
No backlog is shown because the backlogged transactions for all members are relative to this server.

Current used and free disk space on volumes where replicated folders are stored
The following table describes the current used and free disk space on volumes where replicated folders are stored.

Volume Path | Volume Label | Volume Size | Free Space | % Free Space | USN Journal Size
C: | (has no label) | 68.2 GB | 33.1 GB | 48.6% | 32.0 MB
G: | EQLData1 | 2.93 TB | 1.54 TB | 52.4% | 512 MB
S: | DFSStaging | 750 GB | 607 GB | 80.9% | 0 KB

DFS Replication Bandwidth Savings:
The DFS Replication bandwidth savings are computed by determining the total size of data replicated across the network using a combination of remote differential compression (RDC), which sends only byte-level changes, and stream compression. By comparing this figure to the amount of data that would be replicated across the network if RDC and stream compression were not used, you can determine the percentage of bandwidth saved.
Reduction in WAN traffic: 72.09 GB
The following table describes the total on-disk size of files that were replicated and compares them to the amount of data actually received over the network with DFS Replication.

Replicated Folder | Total Size of Data If Received Without DFS Replication | Actual Data Received Across the Network Using DFS Replication | DFS Replication Bandwidth Savings
Internal | 76.56 GB | 4.47 GB | 94.16%
Savings from using DFS Replication | 76.56 GB | 4.47 GB | 94.16%
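
The bandwidth savings figures in the report can be checked by hand from the two sizes in each table. The short sketch below redoes that arithmetic for both servers, using only numbers copied from the report; the function name is made up for illustration.

```python
def bandwidth_savings(total_size, received):
    """Return (reduction, percent_saved) for two sizes in the same unit."""
    reduction = total_size - received
    percent = 100.0 * reduction / total_size
    return reduction, percent


# (server, size without DFSR, size actually received, unit) from the report
rows = [
    ("CRPTOFS02P", 957.04, 26.54, "MB"),
    ("CRPTORFS01P", 76.56, 4.47, "GB"),
]

for server, total, received, unit in rows:
    reduction, percent = bandwidth_savings(total, received)
    print(f"{server}: {reduction:.2f} {unit} saved, {percent:.2f}%")

# prints:
# CRPTOFS02P: 930.50 MB saved, 97.23%
# CRPTORFS01P: 72.09 GB saved, 94.16%
```

Both results match the "Reduction in WAN traffic" and savings percentages shown, so the report is internally consistent. Note that these figures only cover data received since the service last started, about 20 hours earlier on both servers.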
                  

 
LVL 15

Accepted Solution

by:
whoajack earned 500 total points
ID: 33469172
Another area to double-check: review any available hotfixes for both servers. There are also good diagnostic examples in this post: http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx?PageIndex=13
 
 
LVL 10

Expert Comment

by:rscottvan
ID: 33469548
Here's what jumps out at me:

"The DFS Replication service is restarting frequently"

What does the event log have to say about the service restarts?
 
LVL 1

Author Comment

by:Methodman85
ID: 33536892
Hey Guys,
It looks like the hotfixes, and possibly deleting all the debug logs, kicked replication back into gear.