David Williamson asked:

DFS and remote sites - part II

This thread was born of
It has been created because the original focus of the thread has shifted, and because a lot of time and effort has gone into the previous thread.  I felt that it warranted a new thread...
Netman66:

I'll be with you shortly.
Post the errors you're getting.

As for the third-party app - I'm not following you.  I think you can restore to the pre-staging area and then start DFS; it will then use the pre-stage area to fill the real DFS share.  However, it still verifies every file against the root before it moves it from pre-stage.

David Williamson:


The article mentions restoring from NTbackup or a comparable 3rd party backup solution as a way of pre-staging the DFS shared folder.  Then, DFS will move the restored backup to the pre-existing folder, and then once it gets the MD5 checksum, will choose to move files from the pre-existing folder if the checksum matches.
Didn't the replication complete the other day?

I'm sorry, I am speaking in reference to Server3!  I wanted to add it to the DFS root (having previously removed it), but the frs-staging folder filled up the C drive, causing the server to act 'weird': not printing, not responding to IIS stuff, etc.  I was looking for a way to move the staging folder to the drive that has enough space, or a way to pre-populate the shared folder, i.e., restore from backup.

Sorry for the confusion....

It turns out that my attempt to restore from backup and start the process over has failed again.  The frs-staging folder has filled up the C drive again, causing the aforementioned issues.  Dang.
I have to ask this...both Server2 and Server3 are at the same location - correct?

Forget Server3 as a DFS replica - it's not worth the effort since they're both on the same LAN.  You can normally bring up a dead server fairly quickly unless it's toast.  Failing that, a backup of the DFS folder can be restored manually to Server3 in a crisis.

Yes, that's correct, they are on the same LAN.  

The full restore of all our data that I did to Server3 just recently took over 5 hours.  5.5 hours x 100 people x about $125/hour/person = over $68,000 that we cannot bill to clients for the time it takes to restore all the data.  My only goal was to give us some redundancy and immediate failover in the case that our ever-getting-older main fileserver bites it.  One thing the boss hates the most is wasted time...
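
The downtime arithmetic above can be written out as a small sketch. The function name is made up for illustration; the figures are the ones quoted in the post (5.5 hours, 100 people, about $125 per person-hour).

```python
def downtime_cost(hours, people, rate_per_hour):
    """Unbillable cost of an outage: hours lost x staff affected x hourly rate."""
    return hours * people * rate_per_hour


# Figures from the post: a 5.5 hour restore, 100 people, about $125/hour each.
cost = downtime_cost(hours=5.5, people=100, rate_per_hour=125)
print(f"${cost:,.0f}")  # prints "$68,750"
```

Plugging in a shorter restore window shows quickly how much a faster failover option would be worth per incident.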

I'm just about to take Irvine off-wire and change its IP....
Once I move Irvine into the Irvine Site, what should the DNS settings on Irvine and on the client machines in Irvine be?
DNS for Irvine server should be itself with the ISP as a Forwarder.

All clients there should point to Irvine only.

As far as the DNS for Irvine as you mentioned before, since Irvine should update its own DNS entries, does that mean I shouldn't go in and manually update them in Server2's DNS?
ewtaylor:

I think if you require zero downtime then you should look into an active/passive cluster solution.
That would be nice, and actually is where I would like to head.  One of the prevailing reasons for that is that I am developing a Cold Fusion based intranet site which will eventually become the nerve center of our business operations, serving up all pertinent data like client info, billing, payroll, timesheets, invoicing, memos, etc.  Once it gets to the point where it is indispensable, it would be nice to have a cluster solution so that if one web server goes down, the system will be able to keep on humming.
Irvine should - in the perfect world.

You can let it try and just monitor and confirm whether it has or has not.  At least you know what to look for now!

Clustering is an idea, but keep in mind you need the Enterprise versions of the OS (either W2K Adv. Server or 2003 Enterprise) to cluster without a third party tool.

Also, in clustering you'll need a shared data array - something I was hinting at earlier.  You could certainly start there.

What kind of device is a shared data array?  Is it a SCSI kind of thing, or a NAS kind of thing?  Are there any performance issues?
The one I have is a shared SCSI bus that connects to both computers.  They then run a continuous ping on the second, private NIC; if there is no reply, the software fails control of the SCSI array over to the inactive cluster node and activates it.  I was running this with Windows NT 4.0 and MSCS 1.0 and had to work out a few bugs.  I was actually running it on a file server and had another one running for Exchange, and after getting the initial bugs worked out they worked really well.  Windows 2K and 2K3 both have clustering technology built in (from Advanced Server up).
ok, Irvine is up and running.  I have NetSupport access to it on its new IP.  What should I check first?  Shall I do the Sysvol/scripts test you mentioned first, or should I wait for Event Viewer to give me a 13509?
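
The heartbeat idea described above (ping the partner over the private NIC, fail over when replies stop) can be sketched roughly as follows. This is only an illustration of the general technique, not how MSCS actually implements it; the function name and the `max_missed` threshold of three consecutive misses are assumptions.

```python
from typing import Iterable, Optional


def failover_point(replies: Iterable[bool], max_missed: int = 3) -> Optional[int]:
    """Return the index of the heartbeat at which the passive node should
    take over, or None if the active node never misses enough in a row.

    `replies` is a sequence of heartbeat results (True = reply received).
    `max_missed` is a hypothetical threshold, not a value from any product.
    """
    missed = 0
    for i, ok in enumerate(replies):
        if ok:
            missed = 0          # any reply resets the counter
        else:
            missed += 1
            if missed >= max_missed:
                return i        # too many consecutive misses: fail over
    return None


# One dropped ping is tolerated; three in a row trigger the failover.
print(failover_point([True, False, True, False, False, False]))
```

Requiring several consecutive misses, rather than one, keeps a single dropped packet from moving the array to the other node.
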
For the shared storage - we have a Fibre channel SAN with a dedicated fibre switch - but you don't need to go that high end.

You'll definitely need to use a SAN that is capable of being accessed from two servers.

Now, with respect to the testing - do the SYSVOL test to make sure the AD replication is happening.  Also, try creating a user from Irvine and see if it replicates to Server2.

Check DNS on Server2 for Irvine and check Irvine's DNS for Irvine.

I have checked the DNS on both, and they both reflect the correct addresses.  I created a text document in the winnt\sysvol\sysvol\\scripts folder.  Repadmin /showreps shows that replication inbound from Server2 was good:

C:\Documents and Settings\Administrator.WSE>repadmin /showreps
DC Options: IS_GC
Site Options: (none)
DC object GUID: 51f814c3-f364-482a-8553-72a476a41261
DC invocationID: ba8b3fc4-dd78-4614-8bf1-0e933e7450e5

==== INBOUND NEIGHBORS ======================================

    Vegas\SERVER2 via RPC
        DC object GUID: 6233f4eb-40c9-47a7-9096-2f1e88d0c8b1
        Last attempt @ 2004-03-30 12:13:47 was successful.

    Vegas\SERVER2 via RPC
        DC object GUID: 6233f4eb-40c9-47a7-9096-2f1e88d0c8b1
        Last attempt @ 2004-03-30 12:13:47 was successful.

    Vegas\SERVER2 via RPC
        DC object GUID: 6233f4eb-40c9-47a7-9096-2f1e88d0c8b1
        Last attempt @ 2004-03-30 12:13:47 was successful.
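
For repeated checks, output like the above can be scanned with a short script instead of by eye. A minimal sketch, assuming the line layout shown in this post ("<Site>\<Server> via RPC" followed by a "Last attempt @ ..." line); the function name is made up, and anything that does not read "was successful" is simply treated as not successful.

```python
import re

# Patterns based only on the output layout shown in this thread.
NEIGHBOR = re.compile(r"^\s*(\S+\\\S+) via RPC")
ATTEMPT = re.compile(r"Last attempt @ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)")


def inbound_status(text):
    """Return (neighbor, timestamp, succeeded) for each inbound neighbor."""
    results = []
    current = None
    for line in text.splitlines():
        m = NEIGHBOR.match(line)
        if m:
            current = m.group(1)
            continue
        m = ATTEMPT.search(line)
        if m and current is not None:
            ok = m.group(2).startswith("was successful")
            results.append((current, m.group(1), ok))
    return results


SAMPLE = r"""
==== INBOUND NEIGHBORS ======================================

    Vegas\SERVER2 via RPC
        DC object GUID: 6233f4eb-40c9-47a7-9096-2f1e88d0c8b1
        Last attempt @ 2004-03-30 12:13:47 was successful.
"""

for neighbor, when, ok in inbound_status(SAMPLE):
    print(neighbor, when, "OK" if ok else "CHECK")
```

Note that this only reflects Active Directory replication. Files in SYSVOL are replicated by FRS, so a clean repadmin report does not by itself mean the test file will show up.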

But, the file (which was there before then) did not appear on Irvine in the same location.  What do you think?
I created a user on Irvine, which did eventually appear on Server2, but it took a couple of cycles.  Still no test files in sysvol appearing on Irvine.

Irvine's event viewer just came up with a couple of 13508's, one for sysvol, and one for datastore.
Looks like the hosts file was the culprit.  I remembered that we had added entries for all the servers to the hosts files, so I went back into them and corrected them.  The files in Server2's sysvol ended up on Irvine.  I put a file in Datastore on Irvine, and it showed up on Server2.  Things are looking good, but there are still a couple of things that I'm not sure of:
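
Since a stale hosts file entry caused the trouble here, a quick script can compare hosts entries against the addresses the servers are supposed to have after a move. A minimal sketch; the function names are made up, and the addresses in the example are placeholders from the documentation ranges, not the real ones from this network.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def stale_entries(hosts_text, expected):
    """Return {name: (ip_in_hosts, expected_ip)} for entries that disagree."""
    hosts = parse_hosts(hosts_text)
    return {
        name: (hosts[name], ip)
        for name, ip in expected.items()
        if name in hosts and hosts[name] != ip
    }


# Placeholder addresses: Irvine has moved, but the hosts file still has the old IP.
hosts_text = "192.0.2.10  irvine   # old address\n192.0.2.2   server2\n"
expected = {"irvine": "198.51.100.10", "server2": "192.0.2.2"}
print(stale_entries(hosts_text, expected))
```

On Windows 2000 the hosts file normally lives under the system directory in system32\drivers\etc; reading it and passing the text to `stale_entries` on each server would flag leftovers like the one found here.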

1) The ghost IrvineServer still comes up when I do repadmin /showreps from Server2 only.

2) In Sites and Services, should there be two connections showing in NTDS for all three servers?  That is NOT the case currently.  
    Under Server2 there is a connection to Server3;
    under Irvine there is a connection to Server2;
    under Server3 there is a connection to Irvine AND a connection to Server2.  That seems strange...

3)  I have the replication interval set to 15 minutes.  Does that mean that replication begins every 15 minutes and keeps going until it has caught up with all files that need replicating?  Or does it mean that it checks every 15 minutes and only replicates for the next 15 minutes?

4) We went through so much so quickly, I can't remember if there is something else I can check, or even where some of the tools were.  Part of our interaction, Netman66, was via Netsupport chat, which did not get recorded.  What do you think?
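
To make question 2 concrete, the connections listed above can be checked against what a full mesh of three servers would contain. This is just bookkeeping, and it rests on two assumptions: that each connection listed under a server is an inbound connection from the named partner, and that a full mesh is what is wanted (the KCC does not necessarily build one).

```python
from itertools import permutations


def missing_for_full_mesh(inbound):
    """Return (destination, source) pairs absent from a full mesh.

    `inbound` maps each server to the set of partners it has a
    connection object from.
    """
    servers = sorted(inbound)
    return [
        (dst, src)
        for dst, src in permutations(servers, 2)
        if src not in inbound[dst]
    ]


# Connections as described in item 2 above.
inbound = {
    "Server2": {"Server3"},
    "Irvine": {"Server2"},
    "Server3": {"Irvine", "Server2"},
}
for dst, src in missing_for_full_mesh(inbound):
    print(f"{dst} has no connection from {src}")
```

Under those assumptions, the two gaps are Irvine from Server3 and Server2 from Irvine.
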
Netman66:
This solution is only available to members of Experts Exchange.
Great job netman66, I learned a lot following these threads. Thanks to both of you.
Is it your opinion that I need to extend the interval beyond 15 minutes?

By the way, I have had several people come up to me this morning telling me that 'the file they worked all day on and saved before leaving work yesterday' did not have all the changes on it that they made.  Their changes were not saved somehow.  I didn't fix the hosts file until about 9 pm, after which I noticed that the replication started working.  Could replication have taken the older file on Irvine and overwritten the one on Server2?  Isn't replication supposed to move last-saved, or most recently changed files--last writer wins?  Or is there some kind of priority setting that needs to be adjusted?  If I check the replication topology on Irvine, it says 'full mesh'.  I assume that that means that the most recent file from anywhere gets replicated everywhere, right?

I was able to restore these couple files from the backup last night, which happens at 8 pm.  I fixed the hosts file at 9 pm....seems fishy...

I wonder how many other people today will be coming up to me saying that their files didn't get saved somehow.

Could this just be some 'settling in' that AD and FRS have to do before they're in good sync?
I just looked in the event viewer and saw event 13503, that FRS had stopped, and so, I went into services and started it.  Then came up 13521, which says:

The File Replication Service cannot enable replication on the computer SERVER2 until a backup/restore application completes.
A backup/restore application has set a registry key that prevents the File Replication Service from starting until the registry key is deleted or the system is rebooted.
The backup/restore application may still be running. Check with your local administrator before proceeding further.
The computer can be rebooted by clicking on Start, Shutdown, and selecting Restart.
WARNING - DELETING THE REGISTRY KEY IS NOT RECOMMENDED! Applications may fail in unexpected ways.
The registry key can be deleted by running regedit.
Click on Start, Run, and type regedit.
Expand HKEY_LOCAL_MACHINE, SYSTEM, CurrentControlSet, Services, NtFrs, Parameters, Backup/Restore,"Stop NtFrs from Starting". On the toolbar, click on Edit and select Delete. Be careful! Deleting a key other than "Stop NtFrs From Starting" can have unexpected sideeffects.

There are no backups currently running, only scheduled for later tonight.  I did, however, just restore a couple of files (as I mentioned above), but those have been done for a while.  We are using Backup Exec 9.
The Restore is likely what set the key so that replication won't continually change files during the backup state.

This should clear itself.  I would certainly watch it.  If it doesn't clear then fix the key - but I think it should.

In terms of replication - if the file was saved into a rep partner with a lower USN (which in all likelihood couldn't happen), then during replication it would get overwritten by the copy with the higher USN.  Windows is "supposed" to prevent this kind of thing.
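
As a rough illustration of the "higher number wins" rule described above, here is a generic sketch. It is not FRS's actual conflict-resolution algorithm; the `FileVersion` type and its `stamp` field are hypothetical stand-ins for whatever ordering value the replication engine compares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    server: str
    stamp: int      # hypothetical ordering value (change counter or timestamp)
    content: str


def resolve(a: FileVersion, b: FileVersion) -> FileVersion:
    """Keep whichever copy carries the higher stamp ("last writer wins")."""
    return a if a.stamp >= b.stamp else b


# If an older copy somehow ends up with a newer stamp, the older content wins.
saved_by_user = FileVersion("Server2", stamp=5, content="yesterday's edits")
stale_copy = FileVersion("Irvine", stamp=9, content="older text")
print(resolve(saved_by_user, stale_copy).content)
```

The point of the sketch is that the rule compares stamps, not content, so anything that re-stamps an old copy can make it the winner.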

Full Mesh is a great sign - it means KCC has reconfigured the topology and it has converged.

Full Mesh is how I set it up to begin with, actually.  

So, how can I avoid this problem in the future?  Is there a way?  I have a steady stream of people coming to me about files.  I'm wondering if I restore them if they will be restored with the same USN value; will I then have the same problem just happen again?  At the moment, FRS is not running, presumably because of all the restores I'm doing.
No sign of that registry key being removed yet.  I wonder how long I should wait before deleting it?
Ok.  Things are getting weird.  This whole replication thing seems way too unpredictable/unstable/unmonitorable.  I had to restart Server2 because an update installer I ran caused the system to freeze.  It took over 20 minutes, but I let it restart by itself.  When it came up, FRS started again, but then moved some of the folders out of DataStore into the 'pre-existing' folder, making them inaccessible to my users, of course.

I don't get it!  I thought the replication was done with Irvine before it left the building?  I CANNOT have DFS moving my files/folders around!!! This is killing me!!  Is there anything I can do, any decent monitoring tool that will show me EXACTLY what is happening, a list of files that are being compared with USN numbers, files that are about to be replicated to and from where and when, etc?  All of this seems way too out of my ability to control and monitor...

It's starting to feel like I'm either in way over my head or I need to consider switching to another OS...(feeling exasperated!)
Take a deep breath and count to ten..........

Replication needs to be configured for every hour or two - 15 minutes is too short.

FRS is likely choking on the whole restructure thing.

I think you'll need to reconfigure DFS from scratch.  Take the time to set up things correctly and prestage the data from what is already there.  

Use the documents we looked at earlier and see if you can find a TechNet article that steps you through the setup from the start.

Personally, 120GB over the WAN is a big deal for anyone.  However, technically, it should work.
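
To put 120GB over the WAN in perspective, here is a back-of-the-envelope estimate. The link speed is an assumption (a 1.544 Mbps T1 is used purely as an example; the thread does not say what the WAN link is), and protocol overhead, compression, and FRS staging are ignored.

```python
def transfer_hours(gigabytes, link_mbps, efficiency=1.0):
    """Hours needed to move `gigabytes` over a link of `link_mbps`.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s).
    `efficiency` is the usable fraction of the link (1.0 = ideal).
    """
    bits = gigabytes * 10**9 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


# 120 GB over a hypothetical T1 line, ideal conditions.
hours = transfer_hours(120, 1.544)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

Even under ideal conditions that is on the order of a week on a link of that size, which is why pre-staging the data locally before the server leaves the building matters so much.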

Post some netdiag /v and dcdiag logs for me or send them to work so I can look at them.

It probably wouldn't hurt to do those large logs for me too.

Thank you for your words of wisdom...

I think you're right, I will tear down the DFS root and recreate it, letting it run over the weekend.  I've been doing some hardware and software updates and such on Server2 tonight, and it seems that every time I restart, DFS moves files into the pre-existing folder.  It seems as if it's almost starting all over with every restart.

Also, that registry key that prevents FRS from starting never did get removed automatically.  I am going to call Veritas and ask them about that; perhaps they know something about it.

I've also been thinking that there has to be a better plan than this; there has to be a better way to accomplish the kind of data availability that we're trying to achieve.  Perhaps we need to break up the data into groups and reduce what we replicate to the bare minimum?

--I have deleted the DFS root.  I will send you the logs you have requested.

What do you think about upgrading Server2 and Server3 to 2003 Standard?  I mentioned it to the boss and he went for it.  It would be nice if our DFS issues got better, but I also want to stay up-to-date.  2000 Server is now 4 years old.  What do you think?

Good idea.

I also think you need to determine what data needs to be where.  Shared data is fine to have accessed across the WAN depending on the size.  We run an entire Province (think State) from one location - so, it's possible.

You really should look at a fibre-channel SAN - it's not cheap, but if you move that kind of data then purchasing one is inevitable.