TheLank


BIG FRS DFS ISSUE!!

I am not new to the Win32 environment, but I managed to SEVERELY SCREW UP DFS on 2 Windows 2003 servers. I have 2 DCs, and I created a root \\linkhere.org with links to \\linkhere.org\companyshares (\\dc01\companyshare:\\dc02\companyshare), \\linkhere.org\userstore (\\dc01\userstorage:\\dc02\userstorage), and \\linkhere.org\userprofile (\\dc01\userprofiles:\\dc02\userprofiles). In the beginning the error started on DC02 with a Journal_Wrap_Error, which I attempted to clear up. Being the freaked-out jackass I am, I somehow managed to delete the root, thinking that removing FRS and reinstating the root and links just as they were prior to the error would correct the issue. Now there's an even larger issue: DC01 is not replicating SYSVOL over to DC02 via NTFRS, and other errors are coming up in the event viewer:

-----------------------------------------------------------------------------------------------------------------------------------------------
EVENT ID: 13516
The File Replication Service is no longer preventing the computer DC01 from becoming a domain controller. The system volume has been successfully initialized and the Netlogon service has been notified that the system volume is now ready to be shared as SYSVOL.
 
Type "net share" to check for the SYSVOL share.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

-----------------------------------------------------------------------------------------------------------------------------------------------
EVENT ID: 13552
The File Replication Service is unable to add this computer to the following replica set:
    "COMPANYFILESYSTEM|USERSTORE"
 
This could be caused by a number of problems such as:
  --  an invalid root path,
  --  a missing directory,
  --  a missing disk volume,
  --  a file system on the volume that does not support NTFS 5.0
 
The information below may help to resolve the problem:
Computer DNS name is "dc01.linkhere.org"
Replica set member name is "{E718E3FC-91A7-4B26-A3D4-717D0285B64A}"
Replica set root path is "d:\userstorage"
Replica staging directory path is "c:\frs-staging"
Replica working directory path is "c:\windows\ntfrs\jet"
Windows error status code is  
FRS error status code is FrsErrorSuccess
 
Other event log messages may also help determine the problem.  Correct the problem and the service will attempt to restart replication automatically at a later time.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

-----------------------------------------------------------------------------------------------------------------------------------------------
EVENT ID 13555
The File Replication Service is in an error state. Files will not replicate to or from one or all of the replica sets on this computer until the following recovery steps are performed:
 
 Recovery Steps:
 
 [1] The error state may clear itself if you stop and restart the FRS service. This can be done by performing the following in a command window:
 
    net stop ntfrs
    net start ntfrs
 
If this fails to clear up the problem then proceed as follows.
 
 [2] For Active Directory Domain Controllers that DO NOT host any DFS alternates or other replica sets with replication enabled:
 
If there is at least one other Domain Controller in this domain then restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and make it non-authoritative.
 
If there are NO other Domain Controllers in this domain then restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and choose the Advanced option which marks the sysvols as primary.
 
If there are other Domain Controllers in this domain but ALL of them have this event log message then restore one of them as primary (data files from primary will replicate everywhere) and the others as non-authoritative.
 
 
 [3] For Active Directory Domain Controllers that host DFS alternates or other replica sets with replication enabled:
 
 (3-a) If the Dfs alternates on this DC do not have any other replication partners then copy the data under that Dfs share to a safe location.
 (3-b) If this server is the only Active Directory Domain Controller for this domain then, before going to (3-c),  make sure this server does not have any inbound or outbound connections to other servers that were formerly Domain Controllers for this domain but are now off the net (and will never be coming back online) or have been fresh installed without being demoted. To delete connections use the Sites and Services snapin and look for
Sites->NAME_OF_SITE->Servers->NAME_OF_SERVER->NTDS Settings->CONNECTIONS.
 (3-c) Restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and make it non-authoritative.
 (3-d) Copy the data from step (3-a) above to the original location after the sysvol share is published.
 
 
 [4] For other Windows servers:
 
 (4-a)  If any of the DFS alternates or other replica sets hosted by this server do not have any other replication partners then copy the data under its share or replica tree root to a safe location.
 (4-b)  net stop ntfrs
 (4-c)  rd /s /q  c:\windows\ntfrs\jet
 (4-d)  net start ntfrs
 (4-e)  Copy the data from step (4-a) above to the original location after the service has initialized (5 minutes is a safe waiting time).
 
Note: If this error message is in the eventlog of all the members of a particular replica set then perform steps (4-a) and (4-e) above on only one of the members.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-----------------------------------------------------------------------------------------------------------------------------------------------

I know you guys can help. I would love to repair this issue but am up ****s creek right now. Is there some directory I can restore and/or replace in order to clear this issue up? Please advise.
ASKER CERTIFIED SOLUTION
jmissild

This solution is only available to members.
Netman66
Well, here's my take.

EVENT ID: 13516 - this is telling you that SYSVOL is replicating and DC1 is advertising as a DC.

EVENT ID: 13552 - this is normal considering you nuked the root.

EVENT ID: 13555 - this is telling you that any information in your DFS replica that you are expecting to replicate will not.  This is normal considering you deleted the root.

Remove DFS completely.  Allow the DCs to replicate normally for several hours (maybe overnight) before you try reinstalling DFS - this will give the servers time to clean up AD and all the loose ends that DFS has lingering around.

jmissild

NO, Sorry Netman. Not trying to butt heads with you but what you have suggested is not a good idea.

Do not remove DFS completely as it will stop sharing SYSVOL and stop advertising as a DC.
AD replication is completely separate from DFS replication. Yes, SYSVOL replicates group policy information, but they are two separate beasts. Removing DFS is not going to clear up anything in AD, but it is going to cause you to have to recreate your policies, etc...


TheLank,
Just use the articles that I have sent you and you should be fine. Be sure to do it just as the articles say. You can only D4 once for each server if you do it at the server level or once for each replica set. You have to d2 everything else after you do a d4 or you will end up with morphed directories.

I supported DFS / FRS / AD among many other things at Microsoft and the articles that I sent will help you resolve your issues!
TheLank

ASKER

I will attempt the hex D4 entry this evening as I work for a healthcare organization and cannot play with the files at the moment, JMISSILD, thanks for the suggestions.
You're welcome Lank. Please post your progress. Again I cannot stress this enough. Stop FRS everywhere before beginning to d4 and d2. If you d4 one server you have to d2 all the others.

Just a couple examples

Full Authoritative
serv1 (DC)

serv2 (DC)

Stop ntfrs on both servers, go to the "Process at Startup" registry key and its BurFlags value, set D4 on the authoritative server, then set D2 on the non-authoritative server, then start ntfrs on both servers. There is no need to change the registry entry back because it reverts when the service is started.
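
The order of operations above can be sketched as a small planner. This is only an illustration of the sequence, not an official procedure: it builds the commands as text and runs nothing. The function name `burflags_plan` is made up for this sketch, the `reg add` line is an assumed way to set the value from a command prompt (editing it in regedit works too), and the full key path is taken from article 290762. Starting the authoritative server first is a choice made here; the steps above only say to start ntfrs on both.

```python
# Sketch only: builds the command sequence as text; it does not run anything.
# Key path as described in KB 290762 (the key name really contains a "/").
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
    r"\Parameters\Backup/Restore\Process at Startup"
)


def burflags_plan(authoritative, others):
    """Return ordered (server, command) steps for a full D4/D2 reinitialization."""
    servers = [authoritative] + list(others)
    # Stop FRS everywhere before touching BurFlags on any server.
    steps = [(s, "net stop ntfrs") for s in servers]
    for s in servers:
        # Exactly one server gets D4 (authoritative); every other one gets D2.
        value = "0xD4" if s == authoritative else "0xD2"
        steps.append(
            (s, f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f')
        )
    # Assumption: bring the authoritative copy up first, then the rest.
    steps.append((authoritative, "net start ntfrs"))
    steps += [(s, "net start ntfrs") for s in others]
    return steps


if __name__ == "__main__":
    for server, command in burflags_plan("DC01", ["DC02"]):
        print(f"{server}: {command}")
```

With more than two servers, the same rule holds: one D4, and D2 on all the others.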

If you only want to reinitialize a single replica set, do the same thing for just that replica set, and be sure to follow the "Global vs. Replica Set Specific Reinitialization" section in article 290762 above.

I love this stuff, I thrive on this, my wife hates it though, hehe
TheLank

ASKER

How's it going, jmissild? I'm having another problem. I followed the steps within the user documentation under the "Full Authoritative" scenario (D4 hex value for DC01 and D2 hex value for DC02), but DC02 is reporting the following errors in the event viewer:

---------------------------------------------------------------------------------------------------------------------------------------------
Event ID 13508:

The File Replication Service is having trouble enabling replication from DC01 to DC02 for c:\userstorage using the DNS name dc01.linkhere.org. FRS will keep retrying.
 Following are some of the reasons you would see this warning.
 
 [1] FRS can not correctly resolve the DNS name dc01.projectsamaritan.org from this computer.
 [2] FRS is not running on dc01.linkhere.org.
 [3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.
 
 This event log message will appear once per connection, After the problem is fixed you will see another event log message indicating that the connection has been established.

For more information, see Help and Support Center at
 
---------------------------------------------------------------------------------------------------------------------------------------------

I'm getting the same event ID for three links:
c:\userstorage
c:\userprofiles
c:\companyshareddriectories

DC01 is only confirming an outbound connection for one link in the event viewer. I restarted the service on both machines a couple of times with no luck. You've gotten me this far, so I'm sure there's something I can do to clear this up. Please advise.


FRS is the mechanism used by AD - thus, DFS should have no impact on SYSVOL whatsoever.

DFS is not installed at all until you install it so how come servers work fine without it if what you are saying is correct?

Anyway, we must have been typing at the same time, so it's all yours.
TheLank

ASKER

After the D4\D2 DWORD modification I restarted the boxes, and THE S.O.B.'s STARTED REPLICATING!!! I'm replicating about 30 GB of data on 1GB links; any reason for the lag in replication speed? I changed the replication to outbound only from DC01 to DC02 and changed the priority to High on the outbound from DC01 to DC02 and vice versa. I will re-establish the normal inbound/outbound connections (DC01 to DC02 and DC02 to DC01) once multi-master has been established. Any suggestions on speeding up replication? It is traveling at a snail's pace and always has; the last working occurrence of replication from DC01 to DC02 took over 72 hours.
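
To put numbers on "snail's pace", here is a rough back-of-the-envelope sketch. It assumes the earlier 72-hour run moved about the same 30 GB, that "1GB links" means 1 Gbit/s, and that GB is decimal; none of that is confirmed in the thread, and the function names are made up for the sketch.

```python
def effective_mbit_per_s(gigabytes, hours):
    """Average throughput in Mbit/s for moving `gigabytes` (decimal GB) in `hours`."""
    bits = gigabytes * 1e9 * 8
    return bits / (hours * 3600) / 1e6


def ideal_seconds(gigabytes, link_gbit_per_s):
    """Time to move the data at full line rate, ignoring all protocol overhead."""
    return gigabytes * 8 / link_gbit_per_s


# Assumed figures: 30 GB, 72 hours, 1 Gbit/s link.
observed = effective_mbit_per_s(30, 72)
print(f"observed average: {observed:.2f} Mbit/s")
print(f"line-rate time:   {ideal_seconds(30, 1):.0f} s")
```

Under those assumptions the average works out to just under 1 Mbit/s, against a few minutes at line rate, which suggests the link itself is probably not what is holding replication back.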
TheLank

ASKER

I'll post my last statement as another question.
Thanks Lank. There is really nothing to do about the data replication taking so long. Just let it go. The d4 is authoritative and the d2's source their data from the "authoritative copy". That is one reason that people do the replica set entries instead of the full server. In your scenario it was not just a replica set that had a problem, it was the whole DFS structure.
Lank, the 13508 that you saw is basically saying "I am not replicating with this server for some reason." It could be as simple as the server being restarted, or the service being stopped, or whatever. Look for event 13509 following the 13508, stating that replication is now occurring. If you see 13509 you can usually ignore the 13508s. Wait until all replication is complete and replication is enabled both inbound and outbound; your 13508s will probably go away. If not, let me know.
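
The "look for a 13509 after the 13508" check can be sketched like this. It is a toy example over a hand-typed list, not something that reads the real event log; the function name and the `(event_id, replica_root)` tuple layout are assumptions made for illustration.

```python
def unresolved_13508(events):
    """Return replica roots whose latest 13508 has no later 13509.

    `events` is an iterable of (event_id, replica_root) tuples in
    chronological order, e.g. copied by hand from the FRS event log.
    """
    pending = set()
    for event_id, replica_root in events:
        if event_id == 13508:
            pending.add(replica_root)      # warning raised, not yet cleared
        elif event_id == 13509:
            pending.discard(replica_root)  # connection established, warning cleared
    return sorted(pending)


log = [
    (13508, r"c:\userstorage"),
    (13508, r"c:\userprofiles"),
    (13509, r"c:\userstorage"),
]
print(unresolved_13508(log))
```

Anything the function still returns after replication has had time to settle is a 13508 worth following up on.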

Lank, I just saw the share names that you are replicating, userprofiles, etc...
FRS is not designed to replicate data that changes often. It is meant more for static, consistent data types. You will probably continue to have problems with replication, and people reporting that they are losing data, if this is data that changes a lot. FRS works in a last-writer-wins mode, so consider this:

You have two servers holding replica sets
User1 connects to \\fqdn\share on server 1, changes 500 things in an Excel spreadsheet, and then disconnects

User2 then connects to the same share name but server 2 responds and they change 1 thing before the 500 changes that user1 made replicate to server2. The 1 change on server 2 is timestamped with a later time and will have a higher USN so the 500 changes that user1 made are gone.
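
The scenario above can be sketched in a few lines. This is a deliberately simplified model that compares only a timestamp on the whole file; the class and function names are invented here, and real FRS conflict resolution involves more than this.

```python
from dataclasses import dataclass


@dataclass
class Version:
    server: str
    changes: int      # number of edits saved in this version of the file
    timestamp: float  # time the file was closed, in seconds


def last_writer_wins(a, b):
    """Keep whichever whole-file version was written later; the other is discarded."""
    return a if a.timestamp >= b.timestamp else b


# User1 saves 500 changes on server1; user2 saves 1 change on server2 a
# minute later, before user1's copy has replicated across.
user1 = Version(server="server1", changes=500, timestamp=100.0)
user2 = Version(server="server2", changes=1, timestamp=160.0)

winner = last_writer_wins(user1, user2)
print(f"surviving copy: {winner.server} with {winner.changes} change(s)")
```

The point is that nothing is merged: the later file replaces the earlier one entirely, no matter how much work the earlier one contained.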

Article:
http://support.microsoft.com/default.aspx?scid=kb;en-us;q221089

Some utilities to help you monitor FRS:
SONAR:
http://www.microsoft.com/downloads/details.aspx?FamilyID=158cb0fb-fe09-477c-8148-25ae02cf15d8&DisplayLang=en

ULTRASOUND:
http://www.microsoft.com/downloads/details.aspx?FamilyID=61acb9b9-c354-4f98-a823-24cc0da73b50&displaylang=en



Netman66 - FRS does not work without DFS
You can stop DFS and FRS and Active Directory will still replicate (Group policy will not but AD will)
DFS will work without FRS though as you can have DFS links with no need to replicate.
DFS is installed on a DC by default, check it out.
We always seem to be meeting like this and I would like to be able to get along. I am only here to help people.

I do not try to correct to insult, only to give the best information to our customers. I am new here and I know that you have been around on this site for much longer than I but I am confident that we can help others and be on the same team.
Not to continue this without due cause but have a look at these:

The Windows Server 2003 System Volume (SYSVOL) is a collection of folders and reparse points in the file systems that exist on each domain controller in a domain. SYSVOL provides a standard location to store important elements of Group Policy objects (GPOs) and scripts so that the File Replication service (FRS) can distribute them to other domain controllers within that domain.

Note: Only the Group Policy template (GPT) is replicated by SYSVOL. The Group Policy container (GPC) is replicated through Active Directory replication. To be effective, both parts must be available on a domain controller.

FRS monitors SYSVOL and, if a change occurs to any file stored on SYSVOL, then FRS automatically replicates the changed file to the SYSVOL folders on the other domain controllers in the domain.

The day-to-day operation of SYSVOL is an automated process that does not require any human intervention other than watching for alerts from the monitoring system. Occasionally, you might perform some system maintenance as you change your network.

Taken from here: http://www.microsoft.com/technet/itsolutions/cits/mo/winsrvmg/adpog/adpog3.mspx#EXAA

If you look back at this thread, you asked the poster to rebuild SYSVOL with the burflag method - with what service?  So, your Group Policy replication comment above was just invalidated by yourself - as proven by your own assistance.


DFS is a service unto itself:

http://www.microsoft.com/resources/documentation/windowsserv/2003/all/techref/en-us/w2k3tr_dfs_how.asp

In Server 2003 they have made good on a promise not to make FRS a dependency for the DFS service.


If you want to get along, like I suspect you do, and be part of a team (your words) then before you tell someone they're wrong, make sure you're right.  That's the only issue I have with all this.  I have never jumped into a thread to state you were wrong, but if you review our mutual threads this seems to be what you do to me.  I could care less about our differences, I'm here to help as you are.  

Enough said.  If you want to comment on this, email me directly and I will be glad to chat.

My apologies to TheLank.