Wiresharking domain controllers to diagnose MPLS latency issues

I have 10 Mbps links between my locations, and we have been experiencing network latency, primarily in the afternoons. I contacted my ISP, and they were rather tight-lipped about what traffic was causing the problem. It took 8 months, countless tickets, and repeated threats to finally get the small amount of information that I have: the primary network congestion is being caused by 4 domain controllers communicating with a domain controller in the hub of our network. I have modified the link costs and replication schedules and am pulling utilization reports tomorrow. I am also going to run a packet capture with Wireshark from a desktop computer connected to a port that monitors the MPLS port, filtering by IP address for each of the DCs communicating back to the hub. What should I look for in the capture that might indicate the source of what is flooding the network?

Aard Vark

I would look first at the data you can get from the hosts themselves via performance counters. If your DCs are causing the problem, you should be seeing significant spikes in traffic, which you can see via perfmon (or, even better, SCOM). You can view your:

Inbound/outbound traffic volume.
Inbound/outbound traffic packets.
Inbound/outbound replication traffic volume.
Inter-site inbound/outbound replication traffic volume (via bridgeheads).

Just to name a few. The built-in Windows performance monitoring tools can tell you a lot about what is happening, given the right context. Wireshark is going to give you volumes of data, most of it irrelevant, that you will need to sift through to find a pattern. If your ISP claims replication traffic between DCs is the cause, this should be visible via perfmon. If you have SCOM, even better: it has built-in dashboards for viewing this sort of information. Also look into the volume of LDAP binds/queries during high-latency periods vs. normal periods.

You need to establish a baseline of what you consider normal traffic and compare that to the times when you are experiencing latency. Tools like pathping will also be helpful for identifying high-latency hops: run it during normal periods and again during the high-latency periods, then compare the results.
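As a rough illustration of the baseline comparison, here is a minimal Python sketch. It assumes you have already exported samples (for example bytes/sec from a perfmon counter, or round-trip times in ms) into two lists; the function name, the 3-sigma threshold, and the sample values are all made up for the example and are not from any particular tool.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, observed, sigmas=3.0):
    """Return (index, value) pairs from `observed` that exceed
    the baseline mean by more than `sigmas` standard deviations."""
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return [(i, v) for i, v in enumerate(observed) if v > threshold]

# Hypothetical samples: "normal" morning readings vs. afternoon readings.
normal_samples = [100, 110, 90, 105, 95]
afternoon_samples = [98, 120, 450, 130]

for index, value in flag_anomalies(normal_samples, afternoon_samples):
    print(f"sample {index}: {value} is well above baseline")
```

The threshold is a judgment call. A few days of normal-period samples will give a more trustworthy baseline than the handful of values shown here.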

"I have 10 Mbps links between my locations and we have been experiencing network latency, primarily in the afternoons."

Do you have your AD sites set up correctly? A DC connecting across an MPLS link should not be in the same site as the DCs at the other location. It should be in its own site, with its own subnet defined. I assume you have done this, but it is worth asking. Intra-site replication is close to instant, because it is triggered by change notification. Inter-site replication runs on a schedule, every 15 minutes at the most frequent, unless you override this with the use-notify setting on the site link.

noci

The same measurements apply to the links themselves.
You may need to collect traffic statistics on all interfaces that feed that MPLS link.
It might be that the normal traffic is OK but some other traffic is pushing those DC links over the edge...
(If DC-to-DC traffic is using 80% of the bandwidth, then anything that adds 20% or more, even if that is not its peak usage, will cause trouble.)

So try to get a traffic grapher that collects data from all the interfaces concerned using SNMP, for example
MRTG, SolarWinds, etc.
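For reference, the utilization figure these graphers plot can be derived from two readings of an interface octet counter (such as the standard `ifInOctets` / `ifOutOctets`). The Python sketch below shows only the arithmetic. The function name and the sample readings are hypothetical, and it assumes the counter wrapped at most once between polls.

```python
def utilization_pct(octets_start, octets_end, interval_s, link_bps, counter_bits=32):
    """Percent utilization of a link from two octet-counter readings.

    The modulo handles a single counter wrap between the two polls.
    """
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 / (interval_s * link_bps) * 100

LINK_BPS = 10_000_000  # the 10 Mbps link from the question

# Hypothetical readings taken 300 seconds apart.
dc_traffic = utilization_pct(0, 300_000_000, 300, LINK_BPS)    # about 80%
other_traffic = utilization_pct(0, 75_000_000, 300, LINK_BPS)  # about 20%
print(f"DC: {dc_traffic:.0f}%, other: {other_traffic:.0f}%, "
      f"total: {dc_traffic + other_traffic:.0f}%")
```

At a full 10 Mbps, a 32-bit octet counter wraps in a little under an hour, so a polling interval of a few minutes keeps the single-wrap assumption safe.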
