jhyiesla

DFSR ConflictAndDeleted folder

So, we've been experimenting with DFSR as a way to replicate our roaming profile folder to another server at our DR site. I tested this in a small way and it worked just perfectly, but, of course when I did it in production we ran into all kinds of issues. I had a ticket open with MS and they helped me with my initial problems and they confirmed that it is OK to replicate roaming profiles using this method.

However, I keep running into issues getting the reverse replication to work.

I set up my environment using the info in the following article: https://blogs.technet.microsoft.com/filecab/2013/08/20/dfs-replication-in-windows-server-2012-r2-if-you-only-knew-the-power-of-the-dark-shell/#comment-103795

It works really well, but, as I said, I have issues with whatever method I use to pre-seed the replicated environment. So, I stopped everything, deleted anything to do with the replication and started over. However, this time I did NOT pre-seed the environment at the DR site. I created the profile folder and then built the DFSR environment according to the article above. I figure I have to spend time and resources initially seeding the DR environment anyway, so why not just let the initial replication from my main site to the DR site be what builds the DR site's folders and files? By the time I have done the pre-seeding, the database is probably already 20 hours too old, so I figured this would be the answer.

Basically it's working. I see the file and folder count in the backlog on the main server going down and I see folders and files being built at the DR site.  However, as of this morning I had about 29,000 files in the ConflictAndDeleted folder. I checked the event log and I see tons of error messages that read a file has changed on multiple servers and we're not sure who the winner should be so we're going to put it in the CAD folder. WHY??? The ONLY place that the file would be changing is at the main site.  No one knows that the DR side even exists and so absolutely no changes should be happening there. This is VERY frustrating as it, at least to me, makes no sense. I would expect that a file would go from main to DR and if the file at main changed or got deleted then it would either change or delete the file at DR.

The only other thing that I can imagine is that it looks like the first thing that happens is that the DFSR process catalogs all of the files at main and then starts the replication process. There are about 2.8 million files in the profiles of our users. And so, of course they're going to change before the initial replication is done.  Could that be what's happening?  If so, I guess that I'd expect the CAD folder to grow on the main server and not the DR one, but who knows.

BTW, no replication is happening from DR to main at the moment; DR's status is waiting for initial replication and I am totally confident that the initial replication down to DR has not finished.
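
To tell whether the ConflictAndDeleted folder is still growing, it can help to sample its file count and size at intervals instead of checking by eye. A minimal Python sketch, assuming the folder sits under the replicated folder's DfsrPrivate directory (the path shown is hypothetical) and that the account running the script can read it:

```python
import os

def folder_stats(root):
    """Return (file_count, total_bytes) for every file under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # File vanished or is locked between listing and stat; skip it.
                continue
    return count, total

if __name__ == "__main__":
    # Hypothetical path; adjust to the replicated folder on the DR server.
    cad = r"D:\Profiles\DfsrPrivate\ConflictAndDeleted"
    files, size = folder_stats(cad)
    print(f"{files} files, {size / 1024**2:.1f} MiB")
```

Running this from a scheduled task and logging the output would show whether the growth tracks user activity at the main site or continues overnight.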
arnold

Not clear what the issue is.
When you pre-seed, you have to make sure that the process is a backup/restore that preserves the time stamps/ownership of the files. When you then establish the replication you must make sure to choose the correct server as the reference server which in your case would be the one in the office and not the one in the HQ.

You have to make sure that your DFS referral is set to prefer/primarily connect the user to the NON-DR site and only connect the users to the DR site when the primary is not available.
Check the DFS group member/target server properties and set the non-DR server first.

If you have resources at the DR site that someone is actively using, it is possible that your GPO/branch settings dictate that this user use the local DR DFS share location.
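
One way to check whether a pre-seeded copy really matches the source is to compare the two trees before replication is configured. The following is only a rough Python sketch: it compares relative paths, sizes and whole-second modification times, and it does not look at ownership, ACLs or alternate data streams, so a clean result here does not prove the copy is identical in every respect. The function names are illustrative, not part of any DFSR tooling.

```python
import os

def snapshot(root):
    """Map each file's path relative to root to (size, whole-second mtime)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return snap

def compare_trees(source, seeded):
    """Return (missing, extra, different) relative paths between two trees."""
    src, dst = snapshot(source), snapshot(seeded)
    missing = sorted(set(src) - set(dst))      # in source only
    extra = sorted(set(dst) - set(src))        # in seeded copy only
    different = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, different
```

Files reported as different or extra are the ones most worth examining if conflicts appear after replication starts.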

ASKER

The ultimate goal is to have a profile folder at the main site and a replicate at the DR site. Since replication is two way, what we want is for people logging in at the main site to use that one and when logging in at the DR site to use the one there. The DR site will be used as a second data center that may run some of our servers as well as giving users a second place to log into View desktops. The thought is that if a user logs in at the main site, changes to their profile will be replicated to the DR site so if they then log in there the next time, they will have their profile and it will replicate back to the main site as well.

From my initial testing, this concept works just fine.
I have done my pre-seeding two ways. Initially I used a Veeam migration which took about a day to finish. So, while it would have been a valid machine, obviously over 12 hours things would have changed. Ultimately I got it working with MS's help, but could never get the files to stop flowing into the ConflictAndDeleted folder. I'd get the error saying a file had changed on multiple servers, but at that point there were no changes being made at the DR site except through the DFSR process from the main site. So, while everything was working in that I could manually make a change at either site and it would replicate, I kept getting tons of files every day in the CAD folder and I didn't know why.

I eventually tore it all down and deleted everything to do with the replication and I created a new blank profile folder. I then used Robocopy, as I found in another DFSR article, to pre-seed the DR site. Again, I kept getting large numbers of files going into the CAD folder with that same error, but this time the DR site machine always had the error in the health report that it was waiting for the initial replication to complete. This went on for over a week. All of the things that I had tried before, when I saw this while working with MS, failed to start the reverse replication.

So this third time, I wiped everything again and started with a blank profile folder and I decided to just let the replication process itself create the info at the DR site. That seems to be working as I am seeing it build the replica at the DR site with the proper permissions and everything. The problem is that I am still getting files into the CAD folder with that same error and once again no changes are being made at the DR site except files being written and deleted by the DFSR process from the main site. I did uncheck the box in the CAD configuration area of DFS management to NOT move deleted files into the CAD folder. And, as far as I can tell, the initial replication is finished with just a small number of files in the backlog, but the DR server is still "waiting for initial replication".

So, my questions are: why are files flowing into the CAD folder, and why can't I get the reverse replication to kick off?
ASKER CERTIFIED SOLUTION
arnold
This solution is only available to members.
Both times I did the Pre-seed, I made sure the restore or robocopy was done before setting up the DFSR environment. This time, I'm letting the DFSR process build the DR site's profile folder.

I made the main site profile server the primary server in the DFSR environment.

According to the DFS management utility, everything is enabled and I have two connections, also enabled. I checked security settings on the top level folder before I started and I've done spot checks on the profile folders as it builds them and the security seems the same on both servers.

I'm running the health report with the opposite server as the focus of the report and it seems that the backlogged transactions on the DR site server are going down.  I'll wait it out to see if the Waiting... message goes away.
OK, it's been way more than enough time for everything to be initially done. The backlog on the primary server is almost nil, but with the number of changes I'm not sure that it ever goes to 0 except maybe in the middle of the night. The secondary server is still in the "waiting..." state and I can't imagine what's keeping it there. What "seemed" to fix it the very first time I tried was increasing the staging space. But the staging space that I have now is 5 GB higher than what I had the very first time. The ConflictAndDeleted folder is still getting files injected into it on the secondary server and in the DFSR event log I am still seeing the 4412 errors that a file was changed on multiple servers... which I totally don't get since the only changes at the DR site are being made by the replication from the primary site.
Double check your replication configuration. The issue might be that a file is replicated from the server on which it changed to a second server, and then the same file is replicated from both of those servers to the new one.
Server A => Server B
Server A => newserver
Server B => newserver
That is a mesh replication topology.
Because of delays, the last two might be the conflict that is being detected.
Limit the replication to the new server to originate only from Server A, until the replication is done.
The staging area usually should be at least as large as the largest file being replicated (double check that).
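
To put a number on the staging rule of thumb above, the sizes of the largest files under the replicated folder can be measured directly. A minimal Python sketch: with `n=1` it returns the size of the single largest file, matching the rule as stated. Passing a larger `n` (for example 32, a figure that is an assumption here and not something stated in this thread) gives a more conservative estimate and should be checked against Microsoft's own guidance.

```python
import heapq
import os

def largest_files_total(root, n=1):
    """Sum the sizes in bytes of the n largest files under root."""
    def sizes():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    yield os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # Skip files that are locked or removed mid-scan.
                    continue
    return sum(heapq.nlargest(n, sizes()))
```

On a tree with millions of small profile files the scan will take a while, but it only keeps the `n` largest sizes in memory.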
There are only two servers involved in the replication. While DFSR is obviously bi-directional, no one even knows the DR server exists and no one would be accessing it to make changes. In the past when the secondary server wasn't waiting for initial replication, and the backlog was reasonable, I could manually make a change in my profile on either server and it would be replicated to the other almost immediately.

MS and I went over the staging space several weeks ago and they suggested, based on what they saw, making it 40 GB; I did 45 and now it's 50 on each server. But even when I got it working, I could never get rid of the files going to the ConflictAndDeleted folder... which, again, from my perhaps ignorant perspective, makes no sense since only one server is making any changes.
SOLUTION
This solution is only available to members.
Basically I am replicating their roaming profiles. When I first got on the phone with MS, the tech checked and confirmed that it was OK to attempt this on roaming profiles. We use VMware View as our virtual desktop environment and we're using the Persona management piece to create the .V2 profile files. The secondary server is at our DR site and no one knows it's there and no process accesses it except DFSR.
SOLUTION
This solution is only available to members.
Ours update more frequently, although I would think, especially at night when our staff is at its lowest, that this would not keep the secondary server from starting its replication; and initially that part did work. The ultimate goal is for the virtual desktops at the secondary site to access the secondary profile server and for DFSR to keep them in sync. However, at the moment, since I can never get it to be stable, desktops at that site are still accessing the primary profile server.

We're looking to go to a different profile system that would only update at logoff, but I don't have a way to test how DFSR would work in that situation because the number of files, folders and changes would be so small in my small testing environment.
SOLUTION
This solution is only available to members.
I'm not worried about the secondary site desktops seeing the secondary profile server. On the View masters for that site, I'd just change the Persona management local security profile to point to the secondary server - we didn't set Persona management up through AD - we did it on the masters. And we do not use folder redirection so yes, you are right, all changes are shipped back to the server and I'm not going to make that sweeping a change at this point.

I think I just need to go back to MS and reopen the ticket with them.
Ultimately I am going to have to go back to MS on this one, but Arnold did a good job of helping me through the issue.