It may be an issue with a failing switch flooding the network. Do you have any type of network monitoring software?
Are all your network devices on one switch?
Hello,
We recently began to experience a severe issue within our network. The issue happens very randomly (it could be as infrequent as once every 2 or 3 weeks, or it could happen 2 weeks in a row). ALL network functionality on ALL machines (servers, workstations, IP phones, etc.) completely ceases for 5 to 7 minutes before the problem clears itself and network connectivity is restored. Please note, I can ping the loopback address, but no other addresses (DNS or the default gateway). I have done this on multiple machines to verify. Previously I noted that there were MRxSmb 8003 errors appearing on a secondary DC coming from our PDC, so I disabled the Computer Browser service on all DCs and member servers (note that I did not modify the registries of any server). Unfortunately, there is nothing in the Event Viewer on any machine for the most recent occurrence (7:30pm MST), except for a WINS error saying it was unable to replicate during the issue.
There are 2 DC's and 8 Member Servers as well as 70 W2K clients and 70 XP Pro boxes. We are using a DLINK DFL 800 NetDefend firewall/VPN appliance.
This problem is driving me absolutely bonkers! Any help or suggestions would be greatly appreciated. Please let me know if you need any additional information.
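Since the outage is intermittent and clears itself within minutes, one way to pin down exactly when it starts and stops is to probe the default gateway on a timer and log the results. The following is a minimal sketch, not a tested monitoring tool: `probe` and `outage_windows` are hypothetical helper names, and the ping flags assume the stock Windows or Linux `ping` command.

```python
import platform
import subprocess
from datetime import datetime, timedelta


def probe(host, timeout_s=2):
    """Send one ping to host and return True if it answered.

    Assumption: the stock ping command is on the PATH. On Windows, ping can
    exit with 0 for some "unreachable" replies, so treat this as a rough check.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def outage_windows(samples):
    """Group consecutive failed probes into (first_failure, last_failure) pairs.

    samples: iterable of (timestamp, reachable) tuples in time order.
    """
    windows = []
    start = None
    last_failed = None
    for timestamp, reachable in samples:
        if not reachable:
            if start is None:
                start = timestamp
            last_failed = timestamp
        elif start is not None:
            windows.append((start, last_failed))
            start = None
    if start is not None:
        # The log ended while the host was still unreachable.
        windows.append((start, last_failed))
    return windows
```

In use, a loop would append `(datetime.now(), probe(gateway))` to a list every few seconds, where `gateway` is the default gateway address, and `outage_windows` would then give start and end times to compare against the switch and firewall logs.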
This question has been solved and verified by the asker.
We struck an issue a while back with a D-Link switch that was teamed to a Cisco switch. Breaking the teaming fixed it, along with adjusting Spanning Tree and IGMP Snooping.
On a separate issue, we also found that with a new HP C3000 (until we upgraded the IOS/firmware of the main switch) we had to remove a teamed port from one of the blades to get things working much more efficiently. Both scenarios showed symptoms similar to what you describe, but so randomly that by the time we looked into it, it had either corrected itself or only hit certain PCs on the network.
Maybe try checking that the teaming on any Domain Controller ports is up to date and running fine, and that the switchgear has the latest firmware.
Hi ChiefIT - thanks for the quick response. I imagine you're talking about Windows Server 2003 SP1. I have verified all are running Server 2003 SP2, and our SQL Server is running Server 2008. If you meant the XP clients, I believe all are imaged to XP SP2 and upgraded to SP3.
PriceD - sounds interesting. We are using 8 DLINK DGS-1224 WebSmart Switches. The WebSmart switches provide a simple interface to view collisions, etc... but nothing too fancy. All servers are connected to the first switch, and clients to the rest.
As I was typing this, the issue just occurred again - I will check the event viewer logs and report back what I find.
So far the logs look clean. I do see on the PDC around the exact same time the error starts or clears itself (not sure which at this point) there are two events logged in the System Log. This is an informational message:
Type: Information
Source: e1express
EventID: 42
Message: Intel(R) PRO/1000 PM Network Connection driver had been started.
The servers are multihomed, but one of the NIC's is disabled.
Disregard, I see you have D-link switches.
So, the duplex settings are probably not the issue.
On a client, type IPconfig /all, and let's see what problems we have with DNS.
What it sounds like is you have multiple connections BETWEEN switches, to your backbone switches.
You should have one connection per switch to your wiring backbone. Otherwise switching gets confused and knocks down switches.
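A quick way to check for the duplicate uplinks described above is to write down every switch-to-switch cable and look for any link that closes a loop. This is only a sketch using a union-find structure; the function name and the switch names in the usage note are hypothetical, and a real audit would start from the actual patch panel.

```python
def find_redundant_links(links):
    """Return the links that connect two switches which are already connected.

    links: list of (switch_a, switch_b) pairs, one per physical cable.
    Any link returned here closes a loop (or duplicates an uplink), which is
    only safe if spanning tree or link aggregation is handling it.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of this node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant
```

For example, `find_redundant_links([("backbone", "sw2"), ("backbone", "sw3"), ("sw2", "backbone")])` reports the third cable, because `sw2` already has a path to the backbone.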
Sorry for the delayed reply. I have run an ipconfig /all from my laptop (Vista 32 Business) and this is what shows:
Windows IP Configuration
Host Name . . . . . . . . . . . . : BRIAN-LAPTOP
Primary Dns Suffix . . . . . . . : LUCER.LUCERESEARCH.LOCAL
Node Type . . . . . . . . . . . . : Broadcast
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : LUCER.LUCERESEARCH.LOCAL
LUCER
LUCERESEARCH.LOCAL
Wireless LAN adapter Wireless Network Connection:
Connection-specific DNS Suffix . : LUCER
Description . . . . . . . . . . . : Intel(R) WiFi Link 5100 AGN
Physical Address. . . . . . . . . : 00-22-FB-2C-09-82
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::a891:fde6:a668:b676%
IPv4 Address. . . . . . . . . . . : 192.168.3.100(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Tuesday, September 15, 2009 09:38
Lease Expires . . . . . . . . . . : Wednesday, September 16, 2009 09:37
Default Gateway . . . . . . . . . : 192.168.3.1
DHCP Server . . . . . . . . . . . : 192.168.3.1
DNS Servers . . . . . . . . . . . : 204.130.255.3
64.122.32.71
192.168.3.1
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Local Area Connection:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) 82567LM Gigabit Network Connection
Physical Address. . . . . . . . . : 00-21-70-EC-01-0E
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::1054:b9cb:6b6b:ae61%
IPv4 Address. . . . . . . . . . . : 172.16.1.70(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 172.16.1.1
DNS Servers . . . . . . . . . . . : 172.16.1.2
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Bluetooth Network Connection:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Bluetooth Device (Personal Area Network)
Physical Address. . . . . . . . . : 00-23-4D-EB-2D-1D
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Tunnel adapter Local Area Connection* 7:
Connection-specific DNS Suffix . : LUCER
Description . . . . . . . . . . . : isatap.LUCER
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::5efe:192.168.3.100%2
Default Gateway . . . . . . . . . :
DNS Servers . . . . . . . . . . . : 204.130.255.3
64.122.32.71
192.168.3.1
NetBIOS over Tcpip. . . . . . . . : Disabled
Tunnel adapter Local Area Connection* 11:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : isatap.{082354EA-5AC1-44D3
006}
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Tunnel adapter Local Area Connection* 12:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft ISATAP Adapter #3
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Tunnel adapter Local Area Connection* 13:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Teredo Tunneling Pseudo-Interface
Physical Address. . . . . . . . . : 02-00-54-55-4E-01
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
ChiefIT - It sounds like you may be on to something with the switches. The consultant that set the network up in 2006 used the DLINK WebSmart switches and it looks like a wiring nightmare. In fact, he even has our internet data circuits connected directly into the switches instead of the DLINK DFL 800 NetDefend firewall.
Here is a lame ASCII drawing of what I mean.
[DMARC]===>[Internet T1's]===>[SWITCH1]===>[WAN Port 1 on DFL800]===>[LAN Port 1 on DFL800]===>[SWITCH6]
OK, now I see two problems:
Default Gateway . . . . . . . . . : 192.168.3.1<< GATEWAY/ROUTER
DHCP Server . . . . . . . . . . . : 192.168.3.1<< GATEWAY PROVIDING DHCP
DNS Servers . . . . . . . . . . . : 204.130.255.3<< OUTSIDE DNS SERVER
64.122.32.71<< OUTSIDE DNS SERVER
192.168.3.1<< GATEWAY AS A DNS SERVER
I also see problems with who is supplying DHCP. 192.168.3.1 appears to be your gateway router, which means it is probably the router for your internet.
The problem with a non-Windows device supplying DHCP is that it will also try to supply DNS, and it will not store the DNS SRV records for your servers. SRV records are used for domain services such as authentication and file replication.
Let me tell you what happens:
Your client will go to the router for DNS and will see outside DNS servers (as you can see in your ipconfig output). So it will go to outside DNS for internal DNS resolution. That means it will ask JOE ISP's DNS server how to contact YOUR DOMAIN SERVER. That doesn't work.
So, what you need to do is prevent your router from supplying DHCP. Then make sure your DHCP server tells your clients to go to YOUR DNS server for DNS.
The inability of your DHCP clients to see your DNS server and domain server will cause what appears to be intermittent communication between the clients and servers.
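To check many clients for this, the DNS servers can be pulled out of saved `ipconfig /all` output and any server outside the private address ranges can be flagged. This is a rough sketch: both function names are hypothetical, the parser assumes the English-language layout shown in this thread, and "not a private address" is only a stand-in for "not one of your own DNS servers".

```python
import ipaddress


def dns_servers_from_ipconfig(text):
    """Collect DNS server addresses from `ipconfig /all` style text."""
    servers = []
    in_dns_block = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("DNS Servers"):
            in_dns_block = True
            value = line.split(": ", 1)[1].strip() if ": " in line else ""
        elif in_dns_block and line and " : " not in line and not line.endswith(":"):
            # Continuation line: additional servers are listed one per line.
            value = line
        else:
            in_dns_block = False
            continue
        try:
            servers.append(str(ipaddress.ip_address(value)))
        except ValueError:
            in_dns_block = False
    return servers


def external_dns_servers(servers):
    """Return the servers that are not in a private (RFC 1918 style) range."""
    return [s for s in servers if not ipaddress.ip_address(s).is_private]
```

Run against the wireless adapter section above, this would flag 204.130.255.3 and 64.122.32.71 and leave 192.168.3.1 alone; the wired adapter's 172.16.1.2 would also pass.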
So, here is a list of things for you to do.
THESE CAUSE PROBLEMS WITH COMMUNICATING WITH THE SERVER ON ALL DHCP CLIENTS:
-Configure your Windows servers to supply DHCP.
-Under DHCP scope options, configure YOUR DNS servers within the list of servers. Under forwarders you can put your Gateway router as a forwarder to DNS.
-Disable your router from supplying DHCP on the LAN side of the router, leaving the WAN Side alone.
-Ensure you are on SP2 for all 2003 servers.
THIS CAUSES UP/DOWN TIMES WITH THE ENTIRE NETWORK WHERE NETWORK CONNECTIONS WILL DISAPPEAR:
-Make sure your switches don't have TWO connections to your backbone switches. ONE connection only, switches are not capable of multicast like that.
Also, download DHCPloc.exe. Install it and run this program to find out if there are any other rogue DHCP servers. Yes, your router is considered a rogue DHCP server.
(ONE LAST THING)
This client appears to be multihomed:
This too causes problems unless configured right.
Get your domain's DHCP and switches in order. Then, I will help you with multihomed computers.
Hi ChiefIT - thanks again for the reply. I will correct the settings on the wireless access point (Linksys WRT300N).
Keep in mind, this was an ipconfig /all from my laptop which has multiple interfaces (wifi, ethernet, bluetooth, etc...)
The network crashes were occurring when there were NO devices connected to the WiFi access point.
All IP addresses on the network are static, except for those that connect to the wifi router (these are set by DHCP). I can simply remove the Linksys WRT300N from the network to simplify troubleshooting if necessary.
Exactly. The switches are going to be another story.
The wiring is absolutely atrocious. Looks like I'm going to have to pick a night to redo it. Sadly the patch panels aren't even labelled. *sigh*
I will upgrade the firmware on the switches and then start with the MOST basic connection from the DMARC > T1's > WAN Port on DFL800 > LAN Port on DFL800 > Switch 1. Looks like a lot of work ahead of me. I will get this straightened out and post back.
Thanks again so far ChiefIT.
Please note I have generated another question based on an issue that came up during the re-wiring.
http://www.experts-exchang
ChiefIT - I rewired from the patch panel to the switch.
There is only one connection from the ISP gateway to the firewall, and from the LAN port on the firewall back to the switch. The problem occurred Sunday after the changes had been made. Again, the only event reported was on the PDC at the time of the crash:
Type: Information
Source: e1express
EventID: 42
Message: Intel(R) PRO/1000 PM Network Connection driver had been started.
Additionally, there are 2 wireless routers connected to the switches. The wifi routers are WAN addressed to our private network (172.16.1.X) and LAN addressed to their own subnets (192.168.3.X and 192.168.2.X).
It's the same as before, with the exception that the ISP gateway connection is now plugged into WAN port 1 on the firewall and LAN port 1 on the firewall is plugged directly into the 1st port of the 1st switch. All servers are connected to ports 2-13 on the first switch.
Same symptoms - at a random time the entire network loses connectivity. You can only ping the localhost, but no other server or address. When the loss of connectivity occurs, there are no events logged on the servers except for the PDC:
Type: Information
Source: e1express
EventID: 42
Message: Intel(R) PRO/1000 PM Network Connection driver had been started.
You won't find event log errors in the DC. This is a conflict on how packets are routed over the switches and routers to the outside world.
The symptoms you are describing, I have seen on two different occasions:
1) We had multiple connections (let's say two Ethernet connections) between a remote switch and a backbone switch. In that case, just the remote switch kept crashing intermittently.
2) The second problem was a router conflicting with another router, so packets had no clear default route. The gateways of the routers needed to be set to the gateway of the enterprise router (the WAN router in your case). It's an easy test to UNPLUG all routers except your WAN router to see if the interference is still there.
This sounds like multiple connections between switches, or a router conflict.
It isn't consistent with a domain controller problem.
The log in my firewall is littered with these Warnings:
Date - 2009-09-21 16:15:47
ipdatalen=342 udptotlen=342
Severity - Warning
Category/ID - RULE6000051
Rule - Default_Rule
Proto - UDP
Src/DstIf - lan
Src/DstIP - 172.16.1.85 239.255.255.250
Src/DstPort - 34093 1900
Event/Action - ruleset_drop_packet drop
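Before chasing these warnings, it helps to see what kind of traffic is being dropped. The destination 239.255.255.250 on UDP port 1900 is the SSDP (UPnP discovery) multicast address, so a drop like this is usually routine. The sketch below parses a record in the "Key - value" layout pasted above and classifies the destination; the function names are hypothetical and the layout is assumed from this one sample, not taken from D-Link documentation.

```python
import ipaddress

# Fields copied from the warning above.
RECORD = """\
Proto - UDP
Src/DstIP - 172.16.1.85 239.255.255.250
Src/DstPort - 34093 1900
Event/Action - ruleset_drop_packet drop
"""


def parse_record(text):
    """Split 'Key - value' lines into a dict, skipping lines without ' - '."""
    fields = {}
    for line in text.splitlines():
        if " - " in line:
            key, value = line.split(" - ", 1)
            fields[key.strip()] = value.strip()
    return fields


def classify_destination(fields):
    """Label the destination of a dropped packet."""
    dst_ip = fields["Src/DstIP"].split()[1]
    dst_port = int(fields["Src/DstPort"].split()[1])
    address = ipaddress.ip_address(dst_ip)
    is_udp = fields.get("Proto", "").upper() == "UDP"
    if is_udp and address.is_multicast and dst_port == 1900:
        return "SSDP multicast"
    if address.is_multicast:
        return "multicast"
    return "unicast"
```

Counting the labels across the whole log would show whether anything other than discovery chatter is being dropped around the time of an outage.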
OK, Looks like a suspect.
Tell me something.
Does your entire network crash with all data, or just UDP data?
DNS mostly uses UDP (with TCP for large responses and zone transfers), but your pings, which are ICMP, fail too. If everything drops periodically, I don't think this warning pertains to your root problem. Instead, you probably have a router conflict between your WAN router and the wireless routers.
UDP is a connectionless transport layer protocol. In other words, the sending machine sends the data out and does not require a response from the distant end. The dropped packets in your log are multicast UDP to 239.255.255.250 port 1900 (SSDP/UPnP discovery), which is not meant to leave the LAN anyway, so this UDP warning is not your problem. However, if your firewall were blocking TCP or IP as well as UDP, then I would say this is your problem.
Your issue is the network ceases at various times. In other words, you're having problems with TCP/IP as well as UDP.
That is a physical layer or data link layer problem, not a problem with UDP.
A physical layer or data link layer problem means it's either routing or a hardware conflict between the items that are responsible for routing (like switches and routers).
This is why it would be best to get a networking engineer or Networking expert on EE to help you. I am best at domain administration.
If I were in your shoes, I would seek someone with a bit more experience in networking than me. So, press the request attention button and ask a moderator to move this to the networking zone.
OR create a new post in the networking zone, with routers.
If you choose that option, let me know about the new post and I will show up there to work with the other experts and you on getting this resolved.
If you want to know about the LAYERS I was talking about, you can read this information after you have been helped:
http://en.wikipedia.org/wi
Internal communications are good, but external communications are bad. This sounds more and more like a routing conflict.
Can you ping www.Google.com during the bad times?
@LR Brian:
Just a little extra information on STP:
STP stands for Spanning Tree Protocol; PortFast is a per-port setting that lets an edge port skip spanning tree's listening and learning delay. If PortFast (or an equivalent edge-port setting) is not enabled, a port goes through the full spanning tree process. Spanning tree can cause intermittent communications on the network: with default timers it takes roughly 30 to 50 seconds before a port starts forwarding. That can time out XP machines or newer, so you may see intermittent network connectivity on all XP machines and newer, but not on 2000 machines and older.
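For a feel of where the delay comes from, the arithmetic can be worked from the classic 802.1D default timers (hello 2 s, max age 20 s, forward delay 15 s). This is a sketch under the assumption that the switches use those defaults; the firmware on these D-Link units may use different values or rapid spanning tree, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpTimers:
    """Classic 802.1D default timers, in seconds (assumed, not read from the switches)."""
    hello: int = 2
    max_age: int = 20
    forward_delay: int = 15

    def link_up_delay(self):
        # A port spends one forward delay in listening and one in learning
        # before it starts forwarding traffic.
        return 2 * self.forward_delay

    def indirect_failure_delay(self):
        # After an indirect failure, stale information must first age out
        # (max age) before the listening and learning phases begin.
        return self.max_age + 2 * self.forward_delay


timers = StpTimers()
print(timers.link_up_delay(), timers.indirect_failure_delay())
```

With the assumed defaults this gives 30 seconds for a port coming up and 50 seconds after an indirect failure, which is the range a topology change would block traffic on the affected ports.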
--A network topology diagram will help your networking experts troubleshoot and fix this immensely.
@ikalmar: Thanks for joining us ikalmar. We need a networking tech with us.
Just a little info for you:
The wiring has been changed, and the author has 2 wireless routers within the network. The wiring change was to prevent dual connections to the backbone switches of the network. The inability to multicast on these switches has caused intermittent network connectivity. From what I am seeing, it appears we may have two routers that are conflicting, and therefore the routes are confused. You have seen two routers mess each other up, haven't you?
Issue occurred again. Here is what the switches look like:
ISP Gateway ==> DFL800 Firewall WAN1
DFL800 Firewall LAN1
|| (port 23 of 1st switch)
Switch 9 - 172.16.1.21 (DXS-3227) = STP Enabled
||
Switch 2 - 172.16.1.14 (DXS-1224) = STP Enabled
||
Switch 3 - 172.16.1.15 (DXS-1224) = STP Enabled
||
Switch 4 - 172.16.1.16 (DXS-1224) = STP Enabled
||
Switch 5 - 172.16.1.17 (DXS-1224) = STP Enabled (Priority set to 8092)
||
Switch 6 - 172.16.1.18 (DXS-1224) (Older FW, doesn't have STP option, STP NOT enabled)
||
Switch 7- 172.16.1.19 (DXS-1224) = STP Enabled
||
Switch 8 - 172.16.1.20 (DXS-1224) = STP Enabled
||
Switch 1 - 172.16.1.13 (DXS-1224) = STP Enabled
I have plans tonight to upgrade the firmware on all switches (should bring the .18 switch up with Spanning Tree).
Additionally, I installed the clunky DLINK monitoring software that auto-discovers the WebSmart switches. It said that .13, .14, .15, .16 were "Not alive" during the issue.
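Reading the diagram above as a single daisy chain (an assumption; the real cabling may differ), two things can be checked on paper. First, nine switches in series is longer than the seven-bridge diameter that the default spanning tree timers are commonly described as being sized for. Second, any single break in the chain cuts off everything beyond it. The sketch below models both; the function names are hypothetical.

```python
# Order copied from the diagram above, firewall side first.
CHAIN = [
    ("Switch 9", "172.16.1.21"),
    ("Switch 2", "172.16.1.14"),
    ("Switch 3", "172.16.1.15"),
    ("Switch 4", "172.16.1.16"),
    ("Switch 5", "172.16.1.17"),
    ("Switch 6", "172.16.1.18"),
    ("Switch 7", "172.16.1.19"),
    ("Switch 8", "172.16.1.20"),
    ("Switch 1", "172.16.1.13"),
]

# Assumed guideline: default 802.1D timers are tuned for at most 7 bridges in a row.
RECOMMENDED_MAX_DIAMETER = 7


def exceeds_diameter(chain, limit=RECOMMENDED_MAX_DIAMETER):
    """True if the chain has more switches in series than the guideline."""
    return len(chain) > limit


def cut_off_by_break(chain, broken_link):
    """Return the switches isolated from the firewall side by one broken link.

    broken_link i means the cable between chain[i] and chain[i + 1].
    """
    if not 0 <= broken_link < len(chain) - 1:
        raise ValueError("broken_link must index a cable between two switches")
    return chain[broken_link + 1:]
```

Comparing the output of `cut_off_by_break` for each link against the list of switches the monitoring software reported as "Not alive" may show whether the outage matches a single break or blocked port, or whether the chain model is wrong.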
by: ChiefIT, posted on 2009-09-14 at 19:26:40, ID: 25331139
Intermittent networking can be caused by a number of different factors.
First off, it sounds like you're having problems with Service Pack 1. If using SP1, then download and install SP2.
If this works, I will give you the information on what's wrong. But, let's get this fixed for you first.
If not SP1, then reply back.