glenn22
asked on
Network freezes for a few seconds randomly
I've been having a strange issue on our network that has been causing a lot of problems for some time now. Throughout the work day there will be what I can only describe as network "freezes" at random times. There does not appear to be any regular pattern to the freezes: sometimes there will be 3 in the space of 15 minutes, other times only 1 or 2 a day. These "freezes" only last for a few seconds, but that is long enough to cause internet connections to drop, webpages to fail to load, etc. We have a DSL connection AND a cable connection to the internet, and they are connected to our network in separate locations. Both connections experience these freezes, so it doesn't appear to be a modem issue. Also, these freezes do not occur on weekends, which leads me to believe it's something being used by our staff on weekdays. I've tried using Capsa 7.0 to analyze the network during these freezes, but it hasn't yielded anything useful: the network traffic is low, utilization is at 3% or less, and no flooding is occurring.
Suggestions?
Have you checked that it's not your ISP? Maybe set up a monitoring solution to check? Does it affect internal file transfers?
ASKER
Yes, it does affect file transfers internally. And like I said, we have 2 ISP accounts (with 2 separate companies) and these freezes occur on both, so it's highly unlikely to be the ISP.
Sorry I was assuming VPN between 2 sites.
Are you using a Windows Server, or standalone PCs with Windows shares? Can you detail your network layout a little?
:)
Does every PC on the network get this network freeze? Do you have an old PC somewhere where full / half duplex is set manually?
http://en.wikipedia.org/wiki/Duplex_mismatch
How did you monitor the network traffic? Did you use port mirroring? Maybe there is a network card gone crazy (defective)? I think that only good analysis of the network traffic will help you get out of trouble.
Also, netstat -s and netstat -e could help you...
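One way to use netstat -e here is to run it on each suspect PC and look at the Discards and Errors rows. The sketch below parses that output in Python and flags nonzero error counters. It assumes the usual English-language layout of netstat -e on Windows, and the sample numbers are invented purely for illustration. Nonzero counters are only a hint, not proof of a duplex mismatch; the switch port counters (late collisions, CRC errors) are usually the more direct evidence.

```python
import re

# Illustrative sample only: the numbers below are made up, not taken from this network.
SAMPLE_OUTPUT = """\
Interface Statistics

                           Received            Sent

Bytes                     918273645       192837465
Unicast packets             1203344          998877
Non-unicast packets           45012            3021
Discards                          0               0
Errors                           37               0
Unknown protocols                 0
"""

# A data row is a text label followed by one or two integer columns.
ROW_RE = re.compile(r"^\s*([A-Za-z][A-Za-z\- ]*?)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_netstat_e(text):
    """Return {counter name: (received, sent)} parsed from `netstat -e` output.

    Rows with a single number (such as "Unknown protocols") get sent=None.
    """
    counters = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            name, received, sent = match.groups()
            counters[name] = (int(received), int(sent) if sent is not None else None)
    return counters


def suspicious_counters(counters, names=("Discards", "Errors")):
    """Return the error-type counters that are nonzero in either direction."""
    flagged = {}
    for name in names:
        received, sent = counters.get(name, (0, 0))
        if received or sent:
            flagged[name] = (received, sent)
    return flagged


counters = parse_netstat_e(SAMPLE_OUTPUT)
print(suspicious_counters(counters))
```

Running netstat -e twice a few minutes apart and comparing the counters shows whether the errors are still increasing or are just old history.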
ASKER
No VPN is in use. It is a Windows network with the majority of the PCs on Windows XP SP3, plus some Windows 7 PCs. The servers are all Windows 2008 Enterprise. The physical network is spread out over approximately 1km of land (connecting several buildings), with wireless (Cisco Aironet) connections where we cannot physically wire, and fibre connections where we can.
It is possible there is an old PC with a duplex mismatch, but I'm not sure how to track that down? I'm using port mirroring to monitor the port from which all network traffic enters our server stack and the firewall out to the internet.
Is anyone loading any big Exchange mailboxes?
Are you using Offline Files, and someone is logging off and sending lots of data back to the server?
Had this a couple of times. Antivirus on the server side can also cause this... It's a bit of a needle in a haystack to eliminate. Port mirroring is the right way, but I'm thinking that if it were a duff network card it would be happening more frequently...
ASKER
We do have ESET Antivirus remote administrator running on our server and each client connects to it every 10 minutes to update their status and download updates if necessary. How could this cause this sort of issue?
We have no Exchange servers in use.
It's possible offline files are in use, but again, the port being mirrored is the one which all network traffic passes through to get to the server stack, and it shows very little traffic before, during, and after the network freezing. Yeah, I know this is definitely a needle in a haystack, as I cannot cause the error to happen myself and it occurs at random times for such a short time period.
I'm currently on a site (while writing this) where the server has been crashing every 40 mins to 1 hr. I have disabled ESET NOD4 and the issue has disappeared for the last 3 hours.
It seems that something they have updated is having a bit of an effect on servers. Try disabling the services on the server side for a short period (check the firewall is secure) and see if that helps any?
I can feel your frustration...
Does ping still work while the network freezes?
How many of your servers are multihomed?
ASKER
I will try disabling the ESET services and update later how it goes...
ASKER
I tried running ping with the -t flag, Jelcin, and saw no interruption (response times remained the same, no timeouts) during the freezing times, but the freezing is very short term... so I'm not sure I am getting accurate info there.
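Since ping -t only probes about once a second, a freeze of a few seconds is easy to miss by eye. A rough Python sketch of a finer-grained check: probe a TCP port several times a second, keep timestamped results, and report any stretch where the probes failed. The function names, the 0.2 second interval, and the idea of probing a server's TCP port are assumptions for illustration, not something already tested on this network.

```python
import socket
import time


def probe(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, interval=0.2, duration=60.0):
    """Probe repeatedly and return a list of (timestamp, ok) samples."""
    samples = []
    end = time.time() + duration
    while time.time() < end:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return samples


def find_gaps(samples, min_gap=1.0):
    """Return (start, end) spans where probes failed for at least min_gap seconds.

    A span runs from the first failed probe to the next successful one
    (or to the last failed probe if the samples end during an outage).
    """
    gaps = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            if ts - start >= min_gap:
                gaps.append((start, ts))
            start = None
    if start is not None and last_fail - start >= min_gap:
        gaps.append((start, last_fail))
    return gaps
```

For example, `find_gaps(monitor("172.20.72.12", 445, duration=3600))` would watch one of the servers for an hour; the address is one that appears later in this thread and port 445 (SMB) is just a guess at a port that is open. Running it from two or three PCs in different buildings at the same time should show whether the freeze hits everywhere or only behind one switch or wireless link.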
ASKER
No multihomed servers at all.
Make sure your servers are updated to Service Pack 2. There was a bug in SP1 that caused the same issue.
ASKER
Update to this issue, I tried disabling the Anti-virus services (all of them including the remote administration service) and left it for a day... I still had 2 "freezes" during that time, so it doesn't seem to be the issue unfortunately.
As for Service Pack 2, I checked the servers and it does appear that one of the servers only has service pack 1, so I am updating it now, will check back in and let you all know how it goes.
ASKER
I installed service pack 2 on the server which didn't have it, but today we are experiencing the same "freezes" again, so that didn't seem to fix the issue.
On the DC, go to the command prompt and type:
DCdiag /V
and
DCdiag /test:DNS
Provide the errors you might see.
Another command you might try is Netdiag /v
And look for errors on that.
ASKER
@ ChiefIT
The DCdiag /V returns all tests passed
The DCdiag /test:DNS returned srs-local.local (our domain) failed test DNS.
Here's the snippet of the returned results
* rIDNextRID: 2157
......................... SR-DC-2 passed test RidManager
Starting test: Services
* Checking Service: EventSystem
* Checking Service: RpcSs
* Checking Service: NTDS
* Checking Service: DnsCache
* Checking Service: DFSR
* Checking Service: IsmServ
* Checking Service: kdc
* Checking Service: SamSs
* Checking Service: LanmanServer
* Checking Service: LanmanWorkstation
* Checking Service: w32time
* Checking Service: NETLOGON
......................... SR-DC-2 passed test Services
Starting test: SystemLog
* The System Event log test
Found no errors in "System" Event log in the last 60 minutes.
......................... SR-DC-2 passed test SystemLog
Test omitted by user request: Topology
Test omitted by user request: VerifyEnterpriseReferences
Starting test: VerifyReferences
The system object reference (serverReference)
CN=SR-DC-2,OU=Domain Controllers,DC=srs-local,DC=local and backlink on
CN=SR-DC-2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configurat
ion,DC=srs-local,DC=local
are correct.
The system object reference (serverReferenceBL)
CN=SR-DC-2,CN=Domain System Volume (SYSVOL share),CN=File Replication S
ervice,CN=System,DC=srs-local,DC=local
and backlink on
CN=NTDS Settings,CN=SR-DC-2,CN=Servers,CN=Default-First-Site-Name,CN=Si
tes,CN=Configuration,DC=srs-local,DC=local
are correct.
......................... SR-DC-2 passed test VerifyReferences
Test omitted by user request: VerifyReplicas
Test omitted by user request: DNS
Test omitted by user request: DNS
Running partition tests on : ForestDnsZones
Starting test: CheckSDRefDom
......................... ForestDnsZones passed test CheckSDRefDom
Starting test: CrossRefValidation
......................... ForestDnsZones passed test
CrossRefValidation
Running partition tests on : DomainDnsZones
Starting test: CheckSDRefDom
......................... DomainDnsZones passed test CheckSDRefDom
Starting test: CrossRefValidation
......................... DomainDnsZones passed test
CrossRefValidation
Running partition tests on : Schema
Starting test: CheckSDRefDom
......................... Schema passed test CheckSDRefDom
Starting test: CrossRefValidation
......................... Schema passed test CrossRefValidation
Running partition tests on : Configuration
Starting test: CheckSDRefDom
......................... Configuration passed test CheckSDRefDom
Starting test: CrossRefValidation
......................... Configuration passed test CrossRefValidation
Running partition tests on : srs-local
Starting test: CheckSDRefDom
......................... srs-local passed test CheckSDRefDom
Starting test: CrossRefValidation
......................... srs-local passed test CrossRefValidation
Running enterprise tests on : srs-local.local
Test omitted by user request: DNS
Test omitted by user request: DNS
Starting test: LocatorCheck
GC Name: \\SR-DC-2.srs-local.local
Locator Flags: 0xe00013fc
PDC Name: \\SR-DC-1.srs-local.local
Locator Flags: 0xe00013fd
Time Server Name: \\SR-DC-2.srs-local.local
Locator Flags: 0xe00013fc
Preferred Time Server Name: \\SR-DC-2.srs-local.local
Locator Flags: 0xe00013fc
KDC Name: \\SR-DC-2.srs-local.local
Locator Flags: 0xe00013fc
......................... srs-local.local passed test LocatorCheck
Starting test: Intersite
Skipping site Default-First-Site-Name, this site is outside the scope
provided by the command line arguments provided.
......................... srs-local.local passed test Intersite
C:\Users\srsadmin>DCdiag /test:DNS
Directory Server Diagnosis
Performing initial setup:
Trying to find home server...
Home Server = SR-DC-2
* Identified AD Forest.
Done gathering initial info.
Doing initial required tests
Testing server: Default-First-Site-Name\SR-DC-2
Starting test: Connectivity
......................... SR-DC-2 passed test Connectivity
Doing primary tests
Testing server: Default-First-Site-Name\SR-DC-2
Starting test: DNS
DNS Tests are running and not hung. Please wait a few minutes...
......................... SR-DC-2 passed test DNS
Running partition tests on : ForestDnsZones
Running partition tests on : DomainDnsZones
Running partition tests on : Schema
Running partition tests on : Configuration
Running partition tests on : srs-local
Running enterprise tests on : srs-local.local
Starting test: DNS
Test results for domain controllers:
DC: SR-DC-2.srs-local.local
Domain: srs-local.local
TEST: Basic (Basc)
Warning: The AAAA record for this DC was not found
TEST: Records registration (RReg)
Network Adapter
[00000006] Intel(R) PRO/1000 EB Network Connection with I/O Ac
celeration:
Warning:
Missing AAAA record at DNS server 172.20.72.12:
SR-DC-2.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.dc._msdcs.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._udp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kpasswd._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.Default-First-Site-Name._sites.dc._msdcs.srs
-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.Default-First-Site-Name._sites.srs-local.loc
al
Warning:
Missing AAAA record at DNS server 172.20.72.12:
gc._msdcs.srs-local.local
Warning:
Missing AAAA record at DNS server 172.20.72.250:
SR-DC-2.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.dc._msdcs.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._udp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kpasswd._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.Default-First-Site-Name._sites.dc._msdcs.srs
-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.Default-First-Site-Name._sites.srs-local.loc
al
Warning:
Missing AAAA record at DNS server 172.20.72.250:
gc._msdcs.srs-local.local
Warning:
Missing AAAA record at DNS server 172.20.72.250:
SR-DC-2.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.dc._msdcs.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._udp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kpasswd._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.Default-First-Site-Name._sites.dc._msdcs.srs
-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.Default-First-Site-Name._sites.srs-local.loc
al
Warning:
Missing AAAA record at DNS server 172.20.72.250:
gc._msdcs.srs-local.local
Warning:
Missing AAAA record at DNS server 172.20.72.12:
SR-DC-2.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.dc._msdcs.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._udp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kpasswd._tcp.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.Default-First-Site-Name._sites.dc._msdcs.srs
-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.Default-First-Site-Name._sites.srs-local.loc
al
Warning:
Missing AAAA record at DNS server 172.20.72.12:
gc._msdcs.srs-local.local
Error: Record registrations cannot be found for all the network
adapters
Summary of test results for DNS servers used by the above domain
controllers:
DNS server: 192.168.72.7 (sr-gateway.srs-local.local.)
1 test failure on this DNS server
PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DN
S server 192.168.72.7
Summary of DNS test results:
Auth Basc Forw Del Dyn RReg Ext
_________________________________________________________________
Domain: srs-local.local
SR-DC-2 PASS WARN PASS PASS PASS FAIL n/a
......................... srs-local.local failed test DNS
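The warnings in the output above repeat a lot, so it can help to tally them per record type and DNS server before deciding what to fix. A minimal Python sketch, assuming the DCdiag output has been saved to a text file first; the excerpt in the code is copied from the output above, while the function name is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as "Missing SRV record at DNS server 172.20.72.12:"
MISSING_RE = re.compile(r"Missing (\w+) record at DNS server ([\d.]+):")

# Short excerpt of the DCdiag /test:DNS output posted in this thread.
EXCERPT = """\
Warning:
Missing AAAA record at DNS server 172.20.72.12:
SR-DC-2.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.12:
_kerberos._tcp.dc._msdcs.srs-local.local
Warning:
Missing SRV record at DNS server 172.20.72.250:
_kerberos._tcp.srs-local.local
"""


def summarize_missing(dcdiag_text):
    """Count 'Missing <type> record' warnings per (record type, DNS server)."""
    return Counter(MISSING_RE.findall(dcdiag_text))


for (record_type, server), count in sorted(summarize_missing(EXCERPT).items()):
    print(f"{server}: {count} missing {record_type} record(s)")
```

Feeding it the whole saved output instead of the excerpt gives one line per server and record type, which makes it easier to compare before and after any DNS changes.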
OK, the freezes seem to come from some missing DNS records:
AAAA records are to IPv6 what host (A) records are to IPv4 (I hope you followed that).
SRV records are short for SeRVice records for DNS...
These are the issues you need to fix:
You must re-register your SRV records, then enable IPv6 on the nodes that are not registering AAAA records, and then initiate a registration of DNS records on the server for those nodes. I would fix the SRV records first, because they cover domain services such as the DNS server service and the domain controller authentication services.
SRV records fix:
Go to these domain controllers/DNS servers:
172.20.72.250
172.20.72.12
First off, look in the DNS snap-in's forward lookup zone for any greyed out folders. If the _msdcs folder is greyed out, STOP right here and please report back.
If there are no greyed out folders, let's continue. Next (WITHOUT EVER LOGGING OFF) follow these instructions on both DCs (one at a time).
1) Go to the NIC properties and make sure IPv6 AND IPv4 are both enabled. Go to the advanced properties of IPv6 and IPv4 and select the DNS tab. On both IPv6 and IPv4 select these settings:
-Append primary and connection specific DNS suffixes radio button (selected)
-Append parent suffixes of the primary DNS suffix (checked)
-Register this connection's addresses in DNS (checked)
2) Go to the command prompt with elevated privileges and type these commands (in exact order):
IPconfig /flushDNS
IPconfig /registerDNS
Net Stop Netlogon
Net Start Netlogon
DCdiag /fix:DNS   (I don't remember whether the separator here is a pipe | or a colon :)
3) Go to the other server and do exactly the same.
Missing AAAA records:
Perform the following on any client whose AAAA record is missing from the DNS server:
1) Go to the NIC properties and make sure IPv6 is enabled. Go to the advanced properties of IPv6 and select the DNS tab. In the IPv6 settings select these options:
-Append primary and connection specific DNS suffixes radio button (selected)
-Append parent suffixes of the primary DNS suffix (checked)
-Register this connection's addresses in DNS (checked)
2) Go to the command prompt of the machine:
Type:
IPconfig /registerdns
ONCE DONE: let's check the health of DNS again.
On the DNS servers' command prompt (with elevated privileges), type:
DCDiag /test:DNS
Once the SRV records AND Host records are fixed in DNS, you should see a performance change for the better.
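To double-check the SRV records after the steps above, each one can also be queried directly against both DNS servers. The Python sketch below only builds the nslookup command lines; it covers just the SRV names that the DCdiag output in this thread reported as missing, not the full set a domain controller registers. The function names are made up for this example, and the domain, site, and server addresses are the ones from the posted output.

```python
def expected_srv_names(domain, site):
    """Return the SRV names DCdiag listed as missing, for this domain and AD site."""
    return [
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
        f"_kerberos._tcp.{site}._sites.{domain}",
    ]


def nslookup_commands(domain, site, dns_servers):
    """Build one `nslookup -type=SRV` command per record and DNS server."""
    return [
        f"nslookup -type=SRV {name} {server}"
        for server in dns_servers
        for name in expected_srv_names(domain, site)
    ]


# Values taken from the DCdiag output posted earlier in this thread.
for command in nslookup_commands(
    "srs-local.local",
    "Default-First-Site-Name",
    ["172.20.72.12", "172.20.72.250"],
):
    print(command)
```

Paste the printed commands into a command prompt one at a time; any that still come back without an answer point at the record and the server that did not pick up the registration.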
ASKER
@ ChiefIT
The _msdcs folder is greyed out for the local domain.