Solved

DC no longer replicating, one way only

Posted on 2011-05-04
Last Modified: 2012-05-11
Hi there,

We have a very strange issue at a customer’s location with replication between their DCs. Last night they stopped replicating. Below is the setup:

Site A
1 2008 R2 server which is a DC and GC. This also has Exchange 2010 installed on it.
name: ServerA
IP: 192.168.1.100

Site B
1 2008 R2 server which is a DC and GC.
name: ServerB
IP: 192.168.2.100

There are VPNs between A and B, and I can ping each server from each location, by name (including FQDN) and by IP. All traffic over the VPN is permitted both ways.

When I open AD Sites and Services on ServerA and try to pull replication from ServerB to ServerA, I get an error message saying the RPC server is unavailable.

When I try to pull from ServerB in SiteB, it reports success.

When I log into ServerB, open AD Sites and Services and do exactly the same as above, I get success for both.

In the event log on ServerA we are receiving DFS and AD replication errors saying that the RPC server is unavailable when connecting to ServerB. So I did some testing:

1. I can ping ServerB by name from ServerA, and this also appends the DNS domain suffix.

2. I can ping ServerA by name from ServerB, and this also appends the DNS suffix.

This is where it gets strange...

From ServerB I can access and see the shares/printers on ServerA by typing "\\ServerA" in the Run box.

From ServerA I CAN'T access the shares when typing "\\ServerB". It says the path or server cannot be found.

However, when I use the IP address "\\192.168.2.200" it works fine and the shares and printers are displayed.

This leads me to think that this is the issue. Why would ServerA not be able to access the shares using the name or the FQDN of ServerB if it CAN access them using the IP address?

If it was something on the firewall, it would not allow me to access using the IP either, as it would just block the ports. Also, the firewall does not block packets based on name?

The Windows firewall is disabled on both servers and there is no AV or third-party software installed.

If someone can point me in the right direction to help troubleshoot this further, that would be most appreciated.

thanks,

mike
Question by:Bertling
49 Comments
 
LVL 24

Accepted Solution

by:
Awinish earned 1000 total points
ID: 35689065
RPC errors are typically related to network ports or a firewall.
Make sure all the necessary ports are open; you can use the PortQry tool.

http://support.microsoft.com/kb/839880
http://blogs.technet.com/b/abizerh/archive/2009/06/11/troubleshooting-rpc-server-is-unavailable-error-reported-in-failing-ad-replication-scenario.aspx
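
As a rough alternative when PortQry is not at hand, the sketch below tries a plain TCP connect to a handful of ports that AD replication and SMB commonly rely on. The port list and the function names are assumptions for illustration, not taken from the KB articles; RPC also uses dynamically assigned ports after the endpoint mapper on 135, which this does not cover.

```python
import socket

# Ports commonly involved in AD replication and SMB. This list is an
# assumption for illustration, not an exhaustive set.
PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host, ports=PORTS, timeout=3.0):
    """Map each port number to True/False for the given host."""
    return {port: check_port(host, port, timeout) for port in ports}
```

Calling `check_host("ServerB")` and `check_host("192.168.2.100")` from ServerA and comparing the two results would show whether the name and the IP behave differently at the TCP level.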
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689076
Thanks, I will look at this, but the confusing part is: why can I access the shares via IP address and not FQDN?

And I can ping the FQDN?
0
 
LVL 19

Expert Comment

by:Miguel Angel Perez Muñoz
ID: 35689109
Check the DNS config on the clients. Check the DNS suffix in the TCP/IP properties.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689130
There is no DNS suffix set on the TCP/IP adapter on either of the servers. It is appending its AD domain name suffix...

I have also tried adding an entry to the hosts file and it still will not access the shares using the name...

Just out of interest, does AD use NetBIOS names at all anymore for replication?
0
 
LVL 22

Assisted Solution

by:chakko
chakko earned 500 total points
ID: 35689174
I once had a problem similar to this: replication was only working one way.

In the end I found that I had to adjust the MTU. The routers and VPN were causing some fragmentation, I guess. Lowering the MTU fixed the problem. One site had a cheap ADSL SOHO router and a small local ISP (it was on an island), so anything could happen at that site with regard to internet.

You might try some ping tests using the -l parameter to adjust packet sizes, and maybe something useful will come out of it.
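
A minimal sketch of that ping test, assuming a Windows host where `ping -f -l <size>` sends an echo with the don't-fragment bit set. The helper names are hypothetical. It binary-searches for the largest payload that gets through and adds the 28 bytes of IPv4 and ICMP headers to estimate the path MTU.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_command(host, payload):
    """Build a Windows ping command: one echo, don't-fragment set, given payload size."""
    return ["ping", "-n", "1", "-f", "-l", str(payload), host]


def ping_ok(host, payload):
    """Run the ping and report whether a reply came back (Windows only)."""
    result = subprocess.run(ping_command(host, payload), capture_output=True, text=True)
    # The exit code alone is not always a reliable signal, so also look for
    # "TTL=" which appears in a genuine reply line (English-language output).
    return result.returncode == 0 and "TTL=" in result.stdout


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload in [low, high] for which probe(payload) is True.

    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu(payload):
    """Estimated path MTU for a payload that passed unfragmented."""
    return payload + IP_HEADER + ICMP_HEADER
```

For example, `largest_payload(lambda n: ping_ok("192.168.2.100", n))` run from ServerA would probe the path to ServerB. A result well below 1472 would suggest the VPN or a router is reducing the usable MTU.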
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689189
chakko, thanks for this, I will test. The default MTU is 1500, correct?

The issue is that I am working remotely and I worry that I will disconnect them, so I may need to visit onsite.

Just one question... why would the MTU prevent me connecting using the DNS name of the server in SiteB but allow me to connect using the IP?

thanks,
mike
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35689201
Take a look at the ports required. NetBIOS names are used by WINS and are not used for replication.
http://technet.microsoft.com/en-us/library/dd772723%28WS.10%29.aspx

0
 
LVL 24

Expert Comment

by:Awinish
ID: 35689217
Is each domain controller pointing only to local DNS servers?

Take a look at the article below.
http://awinish.wordpress.com/2011/03/08/dns-recommendations-from-microsoft/
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689223
Each DC is pointing to itself first, then to the other DNS server. It is set up correctly.

thanks,

mike
0
 
LVL 22

Expert Comment

by:chakko
ID: 35689228
No idea about not being able to connect with a NETBIOS name at this point.

I just threw that out there because of the replication problem. It was some time ago; I think I used a program called Dr. TCP to view the MTU values on the server and adjust the server's TCP/IP properties.
1500 is the default MTU size most of the time.

I had Windows 2000 servers on each side.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689241
chakko, are you asking me to change the MTU on the local server or on the router/firewall?
0
 
LVL 22

Expert Comment

by:chakko
ID: 35689244

I did all of this remotely (regarding adjusting the TCP MTU).
I only adjusted the side that was having the problem.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689250
Ok, at this point I'm thinking maybe I should try to update the HP NIC drivers on the ProLiant ML150 servers?

The issue is that this is something I'll have to do out of hours...
0
 
LVL 22

Expert Comment

by:chakko
ID: 35689251
I adjusted it on the server.  Being on site may be a good idea though, depends on the character of the people at that office.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35689268
Well, we do have iLO, which is great, but I think I'll upgrade the NIC drivers first.

Awinish, I am just doing the tests you advised at the start and will report back asap.
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35689308
For accessing the share, take a look at the link below. I too had this issue in the past: a server was accessible by IP but not by hostname, even though it could be pinged.

http://www.techsupportforum.com/forums/f31/solved-error-code-0x80070035-network-path-not-found-175665.html
I presume the local Windows firewall is turned off on the DCs.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35690002
I have been doing more digging and have found that if I try to access the shares on ServerA from ServerB I can see the shares.

But if I then click into one, it says that it is not accessible, I may not have permission to use it, or the network name no longer exists.

This also happens when I try to use the IP address. I have never had an issue as strange as this one before...
0
 
LVL 11

Author Comment

by:Bertling
ID: 35692585
Strange issue here. I rebooted ServerA and then AD replicated all the test accounts and I could replicate in Sites and Services without an error.

10 minutes later I got the same issue again.

Can anyone point me in the right direction? It can't be a network issue?
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35693698
Until your last post it sounded like the secure channel on ServerA was broken and needed a reset.

http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_26810356.html

Have you run dcdiag on a DC to see what it says?

e.g. dcdiag /v /e /c /f:dcdiag.txt
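
Verbose dcdiag logs get long, so a small filter can help. This is only a sketch: it assumes the usual summary lines of the form `......................... SERVERA failed test Replications`, and the function name is hypothetical.

```python
import re

# Matches dcdiag summary lines such as:
#   ......................... SERVERA passed test Connectivity
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed) test (\S+)")


def failed_tests(text):
    """Return (target, test name) pairs for every 'failed test' line in the log text."""
    return [
        (m.group(1), m.group(3))
        for m in RESULT_LINE.finditer(text)
        if m.group(2) == "failed"
    ]
```

Something like `failed_tests(open("dcdiag.txt", errors="replace").read())` would list the failing tests per server. The log's encoding can vary, so the `open` call may need an explicit `encoding` argument.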
0
 
LVL 11

Author Comment

by:Bertling
ID: 35693719
I will look at that.

I have restarted the server 2 more times since the first reboot; each time replication starts to work fine and I can even access the other server using \\serverb.

Around 20 minutes - 1hr later it stops working again and we have the same issue.

Can anyone point me in the right direction?
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35693818
can anyone point me in the right direction?

1. Check AD health with dcdiag
2. HW issues? Did you update the NIC driver on ServerA?
3. Antivirus programs exhausting/virus-malware? (Memory/CPU usage -> Performance Monitor)
0
 
LVL 11

Author Comment

by:Bertling
ID: 35694029
Snusgubben, thanks for this. I will try the drivers next as I didn't get round to that yet. Dcdiag does not give away many clues apart from what is in the event log, or to check the firewall, which I'm sure isn't the issue.

There is no AV on the server.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35694030
There is also another confusing issue occurring now with the Outlook clients in this site.

Exchange Server 2010 is installed and running on the server in SiteB.

All Outlook clients in SiteA connect over the VPN to the Exchange server on ServerB. They DON'T use Cached mode and instead work in Online mode.

When a user logs in to their profile and opens Outlook, after around 1 minute they get the retry, work offline or cancel buttons, as if the server is not available.

When they hit retry, after another minute it will go into their Outlook mailbox. It ALWAYS works on the second try.

Once they are in Outlook everything works fine and access to mail is fast.

Can anyone advise why this may be happening? I'm sure it's related, but why would it not connect the first time round and then connect 100% of the time on the second?

Does Outlook try different ways to connect to the Exchange server, so that on the second attempt, after the first is unsuccessful, it manages to find the server?

Also please note that there are NO issues in SiteB where the Exchange server is. All users (including user accounts from SiteA) can connect first time and quickly.

thanks,

Michael
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35695287
Dcdiag will give you a clue. Use: dcdiag /v /c /d /e >>c:\dcdiag.log

http://blogs.technet.com/b/askds/archive/2011/03/22/what-does-dcdiag-actually-do.aspx
0
 
LVL 11

Author Comment

by:Bertling
ID: 35696543
Hi All,

I would like to say thanks for your help and input on this case. I think I have nailed down the issue and it has now been working for the last 12 hours or so, touch wood!

Your comments have been informative and I appreciate that. Following this comment I will advise what I have done to resolve the issue. I also have a question about why this is happening.

thanks,

mike
0
 
LVL 11

Author Comment

by:Bertling
ID: 35696556
Hi all, just an update.

After a lot of messing around I have found that it must be to do with the firewall in SiteA.

but this is what is confusing me:

The problem was that I could not access ServerB using \\ServerB or \\ServerB.domain.local, but I COULD access it using \\192.168.2.200.

When I rebooted ServerA I then COULD access it using \\ServerB. At this point it would stay "working" for around 20 - 60 minutes before reverting to the same issue where I cannot access it via \\ServerB.

Sure enough, when I can access \\ServerB, replication in AD Sites and Services works fine when selecting "Replicate Now" on all connections. As soon as \\ServerB share access is blocked, I can no longer replicate.

If I then reboot the server again, it will work as before and then fail as above.

So for a long shot I thought i would test the following:

1. Reboot the server and wait for it to stop replicating and stop allowing access to shares using \\ServerB.
2. Reboot the firewall (Netgear Prosafe VPN Firewall FVX538).
3. Once the firewall came back up I could then access the shares using \\ServerB and then replicate in AD Sites and Services.

The servers have now been replicating fine for the last 12 hours. Furthermore, the Outlook clients now go into Outlook first time and take only 15 seconds to connect to their mailbox rather than the 2 - 3 minutes stated above.

We will of course upgrade the firmware etc. of the firewalls first and maybe reset to factory defaults to be sure there isn't a corruption or setting lingering, but this will be chargeable to the customer, so we would like to be sure.

My question is: if it is the firewall, how is it possible that when ServerA is rebooted it can connect to ServerB for the first 20 - 60 minutes before it reverts to the same issue where it cannot connect again, yet rebooting the firewall permanently fixed the issue?

thanks,

mike
0
 
LVL 24

Assisted Solution

by:Awinish
Awinish earned 1000 total points
ID: 35696563
In my first post I pointed you to the firewall, and I was sure it's a firewall issue. Rebooting the firewall might have cleared the ARP table, and that's how it started to work.

0
 
LVL 11

Author Comment

by:Bertling
ID: 35696581
Awinish, yes you are right and ARP is likely to be part of the issue. But the confusion is why I could access it for 20 - 60 minutes after rebooting ServerA.

This is one of those problems which mess with your head and make you waste time on other possibilities because it doesn't add up.
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35696588
Rebooting a Windows server resolves 90% of problems, and that's why MS releases patches every month, so servers get rebooted. But I have seen these issues occur when the system is not updated with the latest security patches or the server driver/firmware is old.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35696606
Sure, but if that is the case the problem would be with ServerA and not the SiteA firewall?

What is causing the firewall to allow the traffic from ServerA out over the VPN for the short period after the server was rebooted?
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35696611
I already told you, there might be something that clearing the table fixed, but if you really want to go in depth on this, the firewall vendor is the right one to ask.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35696719
Let's be honest, the manufacturer won't be much use. I just don't know what changes in the packets when the server is rebooted to permit it to connect for the short time...
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35696797
Did you ever think it could be a DNS issue? If you look at the other replies, everyone took it as a DNS issue. So it's not wrong to consult the vendor and get it checked, because we can't rule out an issue with the firewall firmware either. There is nothing concrete until it is proven, and I would still ask them to check the device, because rebooting the device solved the issue.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35697381
Ok, well we are facing the same issue again! It came back, and I rebooted the firewall with no luck. I cannot reboot the server until 18:00 or so.

I think the best place for us to start will be to replace our current firewalls/routers. Each office has 1 router and 1 firewall.

The routers connect to the ADSL network and then port forward all traffic to the WAN port on the firewall. This way we can have the VPN tunnel end points on the firewalls.

We have 1 static IP address at each location.

What we want to do is have just 1 appliance at each end: an ADSL router/modem that has IPsec VPN functionality. This will eliminate 2 points of failure in the future, and we feel it would be a neater setup.

I have been looking at the Draytek Vigor 2820n http://www.draytek.co.uk/products/vigor2820.html

if anyone else has any other recommendations please do advise.

thanks.

Michael
0
 
LVL 22

Expert Comment

by:chakko
ID: 35697588

Where I am, I put the ADSL modem/router into bridge mode.
Then the firewall has PPPoE capability and it will get the public IP on its interface.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35697627
I see, but I'm not sure if our routers have bridge mode functionality?

Also, do some ISPs prevent the use of bridge mode?
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35704130
Can you from Site A reach some other shares in Site B that is shared by another host?

0
 
LVL 11

Author Comment

by:Bertling
ID: 35704416
Yes I can. It's messed up...
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35704715
So (from ServerA), "net view ServerB" is the only thing that doesn't work?

i.e. from a PC in Site A, "net view ServerB" lists its shares?
 
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35704879
Sorry. Got the syntax wrong. "net view \\server"
0
 
LVL 11

Author Comment

by:Bertling
ID: 35705125
snusgubben: Yes, that is correct. It is however slow to access from the clients, but it will eventually connect.

I can confirm that traffic to and from SiteA and SiteB over the VPN transfers at ~100KBS, which is pretty good if you ask me.

We have other customers who have slower speeds that access shares over a VPN and they are very quick.

Please note that from serverA we can ping ServerB and it will resolve and append the domain.local.

There are a lot of strange issues at SiteA, even on the PCs, one of which is that they cannot access random sites. For example, one we found yesterday:

Accessing hp.co.uk would hang then fail.
Accessing www.hp.co.uk would work.
I can access hp.co.uk from any other PC outside that office.

We have ordered 2 new DrayTek ADSL router/firewalls, which should be here tomorrow for us to install at each location.

This will:
A) Rule out the Netgear ADSL modem routers and firewalls at each end as possible causes.
B) Prevent 2 points of failure in the future, as everything will be consolidated on 1 box.
C) Let us host a PPTP VPN on the router, which is a bonus.

Furthermore, SiteB want WiFi and these are WiFi enabled, so that is an extra bonus too.

This site is a head F**k and I have never experienced issues like this before where we can't pinpoint a problem for sure. At least ordering these routers will begin the process of elimination if the issue still arises.
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35705269
This sounds more like a virus thing. Have you tried scanning the DC in Site A with a virus scanner of some sort?

Just to get a quick answer, since you don't have an AV scanner, try e.g. the free version of Malwarebytes to rule out viruses.

http://www.malwarebytes.org/
0
 
LVL 11

Author Comment

by:Bertling
ID: 35705455
snusgubben: I will give that a go.

The strange thing is that it is working again now, and all the PCs connect to Outlook fast again.

How do the local PCs in SiteA rely on the local server being able to contact the SiteB server when opening Outlook?

From my understanding this is what happens:

1. SiteA workstations log in using their usernames and this authenticates to the local DC in the site (Sites and Services is set up correctly with the correct subnets).

2. They click Outlook. This then resolves the name serverb.domain.local to an IP. For the record, I can ping and it resolves instantly. All clients have the primary DNS server set to ServerA and the secondary to ServerB. DNS is working fine on ServerA and ServerB.

3. As the users are authenticated, it will just access the Exchange server on serverB.domain.local and go into their mailbox.

I can't understand why, when there IS a problem, the initial try prompts "retry, work offline and cancel". Surely this must mean that packets over the VPN are slow from the clients to ServerB when accessing Outlook, which when set up like this will use RPC, IIRC?

Simply put, there are strange things going on in the SiteA network, and fingers crossed these new routers will solve the issues.
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 35705517
In your initial post you said Exchange was installed in site A. Is that incorrect?

Have you configured Outlook AnyWhere on the Exchange and the Outlook clients?
0
 
LVL 7

Expert Comment

by:FemSteenkamp
ID: 35705686
Temporarily change ServerA to point to ServerB as its first DNS server and to itself second (I am presuming that ServerB points to itself for DNS here).

Reboot ServerA and see if it works now. If it does, force replication etc.
Use nslookup, and override the default (which should still be ServerB here) to point to itself.
Make sure that ServerA can do DNS resolution for the domain and server names.

0
 
LVL 11

Author Comment

by:Bertling
ID: 35705773
In your initial post you said Exchange was installed in site A. Is that incorrect?

Sorry! No, Exchange is in SiteB on ServerB. Sorry for the confusion.

I have not tested Outlook Anywhere to see if that works, but I understand that when it detects that the server is local and the client is logged onto the domain, it will connect via RPC over the VPN direct to the server?
0
 
LVL 21

Assisted Solution

by:snusgubben
snusgubben earned 500 total points
ID: 35707006
They will use MAPI to connect by default.

On an Outlook client, you can hold CTRL and right-click the Outlook icon in the system tray. Choose "Connection Status" and you'll see the DC the client uses and the Exchange server.

You'll also see whether they use TCP (MAPI) or RPC over HTTP to connect to Exchange.

Any luck in the "virus hunt"?


You previously asked why you could browse \\IP-to-ServerB but not \\FQDN.

Probably because computers communicate with each other using Kerberos. The PC you are sitting at passes CIFS (as the service) and the hostname to the KDC to request a ticket for the file system. SPNs are only registered for the hostname and the FQDN of a host, not for its IP. (On 2008 R1 and newer it's not possible to manually register an SPN with an IP.)

So when you type "\\IP-to-ServerB" you can browse the "status" of the shares. Kerberos is not in play until you try to access the share by opening them.
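
A simplified model of that lookup, for illustration only. It assumes SPNs are plain `cifs/<name>` strings held in a set (in AD, `cifs` is an alias that maps onto the `HOST/` SPN) and that a client given an IP literal does not request a Kerberos ticket by default and falls back to NTLM. The function names and the SPN set are hypothetical.

```python
import ipaddress


def is_ip_literal(target):
    """True if target parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False


def cifs_spn(target):
    """SPN a client would request for a share on target, or None for an IP literal."""
    if is_ip_literal(target):
        return None
    return f"cifs/{target.lower()}"


def expected_auth(target, registered_spns):
    """Guess which protocol the connection would use under the assumptions above."""
    spn = cifs_spn(target)
    if spn is not None and spn in registered_spns:
        return "kerberos"
    return "ntlm"
```

With `registered_spns = {"cifs/serverb", "cifs/serverb.domain.local"}`, `expected_auth("ServerB", registered_spns)` gives `"kerberos"` while `expected_auth("192.168.2.100", registered_spns)` gives `"ntlm"`. That would be consistent with name-based and IP-based access taking different authentication paths.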

I think you ruled out the routers/firewall since any other host in Site A could access Site B. It's only ServerA that has problems. If this is correct, you could assume the problem is ServerA, or the port on the switch ServerA uses.
0
 
LVL 11

Author Comment

by:Bertling
ID: 35754785
Hi all,

Just to clarify, we have resolved the issue.

Replacing both the firewalls and routers at each location with the DrayTek units resolved the issue.
Everything runs fine now, including Outlook, which is much faster.

The ping is now 40ms instead of the 110ms it used to be.

thanks all!
0
 
LVL 24

Expert Comment

by:Awinish
ID: 35755037
Good to know the issue is resolved, and thanks for the update.

Regards
__________________________________
Awinish Vishwakarma
0
