TheITGuy


Excessive network broadcast traffic, non-virus related.

Issue: Users are complaining of slowdowns in network applications. The software pauses (but doesn't totally lock up) and then comes back after about 30 seconds. We see a lot of broadcast traffic on the switch activity lights. It seems to be lookup requests: "Who has IP address x?", response "I'm at IP address x".  The DNS server is functioning OK and we can resolve addresses.  We have had this problem for about a year now. We had Intel switches before but replaced them with brand new HP ProCurve switches about a month ago. We're still seeing the same problems even with the new switches.

We do have an unusual DNS and DHCP server setup. We're running Lucent QIP service on a Dell Optiplex as our DNS server and using a Shiva LanRover VPN as our DHCP server.

We also have a frame-relay between our main office and two branch offices.  I thought maybe there could be broadcast traffic coming from either of the two locations, or a loop somewhere in the router causing excessive traffic:

Main Office:
192.168.110.1 - 192.168.111.254 (servers and workstations)
192.168.2.1 - 192.168.3.254 (printers)

Connected to Main office via frame-relay and Cisco 1601 routers:
Branch office #1:  192.168.19.1 - 192.168.19.254
Branch office #2:  192.168.16.1 - 192.168.16.254


-- Start of config ---

#show config
Using 2107 out of 7506 bytes
!
version 12.0
service timestamps debug uptime
service timestamps log uptime
service password-encryption
!
hostname LBPP-OK
!
enable password 7 0014010915
!
!
!
!
!
ip subnet-zero
no ip domain-lookup
!
!
!
process-max-time 200
!
interface Ethernet0
 description connected to Tulsa Office
 ip address 192.168.110.253 255.255.255.0 secondary
 ip address 192.168.111.253 255.255.255.0 secondary
 ip address 192.168.2.5 255.255.255.0
 no ip directed-broadcast
 media-type 10BaseT
!
interface Serial0
 no ip address
 no ip directed-broadcast
 encapsulation frame-relay
 cdp enable
!
interface Serial0.1 point-to-point
 description connected to SHOP
 ip address 192.168.18.1 255.255.255.0
 no ip directed-broadcast
 frame-relay interface-dlci 300
!
interface Serial0.2 point-to-point
 description connected to HC
 ip address 192.168.17.1 255.255.255.0
 no ip directed-broadcast
 frame-relay interface-dlci 200
!
router rip
 version 2
 network 192.168.2.0
 network 192.168.16.0
 network 192.168.17.0
 network 192.168.18.0
 network 192.168.19.0
 network 192.168.110.0
 network 192.168.111.0
 no auto-summary
!
ip classless
ip route 0.0.0.0 0.0.0.0 192.168.2.254
ip route 192.168.0.0 255.255.0.0 Ethernet0
ip route 192.168.16.0 255.255.255.0 192.168.17.2
ip route 192.168.19.0 255.255.255.0 192.168.18.2
ip route 192.168.110.0 255.255.255.0 192.168.17.2
ip route 192.168.111.0 255.255.255.0 192.168.17.2
ip http server
!
snmp-server engineID local 0000000902000030943FBAAA
snmp-server community public RO
snmp-server location Tulsa
banner exec ^CCC

--End of config--
naveedb

You mentioned that you suspect broadcasts. How did you arrive at this conclusion?
ASKER CERTIFIED SOLUTION
Rick Hobbs
This solution is only available to members.
Sorry, but I thought I might mention one more thing: Cisco IOS 12.0 had a few weird issues that were resolved in 12.1.   If you can, flash that sucker.
SOLUTION
This solution is only available to members.
TheITGuy

ASKER

naveedb,

I say broadcasts because when we experience excessive traffic on the network, ALL ports are lighting up with activity.  Of course there is the standard traffic flashing when one port is going point to point with another, but intermittently we see all ports light up on the core switch. It doesn't happen at any certain time during the day either. We thought maybe SMS was kicking off a hardware inventory or the Windows Update Server was pushing out updates, so the issue comes and goes throughout the day.

Rick,

We have found a few loops in our network.  I did a physical walkthrough of our network and made a Visio diagram of every core and end switch including the small 5-port switches we have where people are sharing cubicles. I believe those have been resolved. These new HP ProCurve switches have a great web based monitor and we've used it to track down at least one bad NIC so far (which was causing tons of broadcast traffic until it was replaced, but we're still not quite back to normal).  I'll take your advice on hooking up a hub and my Ethereal notebook to it. I'll see if I can spot any unusual traffic. I'll also see if I can flash the Cisco routers to 12.1.

ECNSSMT,

Just to give you a little background info on our network configuration, I can say the Cisco config was created out of necessity. Basically we lost the config and had to go off an old one and modify it to fit our network IP scheme.

I can tell you that we're running a messed-up subnet mask. It's class C address space with a class B mask: 255.255.0.0, or /16.

For example our main office servers and workstations are on:

192.168.110.0
192.168.111.0
Subnet mask: 255.255.0.0

Printers are on:

192.168.2.0
192.168.3.0
Subnet mask: 255.255.0.0
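
One side effect of the 255.255.0.0 mask is worth spelling out: a host configured this way treats every 192.168.x.x address as local, so it sends an ARP broadcast for the destination itself instead of handing the packet to a gateway. Here is a minimal sketch of that decision using Python's standard `ipaddress` module. The host addresses 192.168.110.25 and 192.168.19.10 are made up for illustration and are not taken from the network above.

```python
import ipaddress

def is_on_link(host_with_mask: str, destination: str) -> bool:
    """Return True if the host would ARP for the destination directly
    (the destination falls inside the host's own network) instead of
    forwarding the packet to its default gateway."""
    host = ipaddress.ip_interface(host_with_mask)
    return ipaddress.ip_address(destination) in host.network

# Hypothetical main-office workstation and a hypothetical branch office PC.
workstation_16 = "192.168.110.25/255.255.0.0"
workstation_24 = "192.168.110.25/255.255.255.0"
branch_pc = "192.168.19.10"

print(is_on_link(workstation_16, branch_pc))      # True: ARPs for the branch PC
print(is_on_link(workstation_24, branch_pc))      # False: goes via the gateway
print(is_on_link(workstation_16, "192.168.2.5"))  # True: printer range is local too
```

With the /16 mask, the branch address counts as on-link, so traffic to the branch offices starts with an ARP broadcast on the main office LAN rather than a unicast to the router.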

As you can see from the config, our Cisco router at the main office has an ethernet0 interface IP address of:
192.168.2.5
192.168.110.253
192.168.111.253

The frame-relay is assigned 192.168.17.1 going out to "HC", which is a remote office, and 192.168.18.1 going out to our "SHOP" remote office.
192.168.17.2 is the interface on the backside, basically the interface for the "HC" router.  192.168.18.2 is the interface for the "SHOP" router.

I was never sent to Cisco training so I've learned by using the terminal help reference, trial and error, and what I could find on the web. I have no idea how to configure RIP, so I have added all the IP addresses that I thought might be used.  So I'm assuming I should go back and modify those only to include the IP addresses for that particular router (which I can easily do).

There is no direct connectivity between 192.168.2.0 and 192.168.16.0, 192.168.19.0, but there IS connectivity between 192.168.110.0 & 192.168.111.0.  This is because our local network printers (HP JetDirects) are sitting on 192.168.2.0 - 192.168.3.0.

If you need any more info, I'll be glad to answer any questions!  We've basically exhausted our troubleshooting ideas and the info you guys are providing is a great help.
Sounds like you are resolving multiple issues.  Keep us posted as to your results.
naveedb,

I say broadcasts because when we experience excessive traffic on the network, ALL ports are lighting up with activity.  Of course there is the standard traffic flashing when one port is going point to point with another, but intermittently we see all ports light up on the core switch. It doesn't happen at any certain time during the day either. We thought maybe SMS was kicking off a hardware inventory or the Windows Update Server was pushing out updates, so the issue comes and goes throughout the day.


--------------

Select a workstation as a test machine and monitor the activity between it and the server. To start, run an extended ping from the machine to the server during the course of the day and see if the response time changes drastically when the application slows down. If it does, then we can try to narrow it down further.
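
As a rough sketch of how the ping results could be checked afterwards, the snippet below flags the moments where a reply was lost or took much longer than usual. It assumes the ping output has already been turned into (timestamp, round-trip time) pairs; the sample values and the factor of 5 are invented for illustration.

```python
from statistics import median

def find_slow_periods(samples, factor=5.0):
    """samples: list of (timestamp, rtt_ms) tuples; rtt_ms is None for a lost ping.
    Returns the timestamps whose reply was lost or took more than
    `factor` times the median round-trip time of the whole run."""
    rtts = [rtt for _, rtt in samples if rtt is not None]
    if not rtts:
        # Every ping was lost, so every timestamp is a problem.
        return [ts for ts, _ in samples]
    baseline = median(rtts)
    return [ts for ts, rtt in samples if rtt is None or rtt > factor * baseline]

# Made-up sample data: one slow reply and one lost reply.
samples = [
    ("09:00:00", 1.0), ("09:00:01", 1.2), ("09:00:02", 0.9),
    ("09:00:03", 48.0), ("09:00:04", None), ("09:00:05", 1.1),
]
print(find_slow_periods(samples))  # ['09:00:03', '09:00:04']
```

If the flagged timestamps line up with the times users report the application pausing, the slowdown is on the network path and not in the application itself.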

As rickhobbs mentioned earlier, you are looking at many issues at the same time, so keep us posted, as it could be any of the above.
Hi TheITGuy,

I've been thinking about the best and safest way to clean up your network.  A lot of the commands that you have in this config are unnecessary and probably consume layer 3 bandwidth needlessly if packets take those routes.

Before saying any thing else, it would be advisable to back up your configs as a general precaution.

Many of the static routes are not needed as far as I can tell.
ip route 0.0.0.0 0.0.0.0 192.168.2.254
ip route 192.168.0.0 255.255.0.0 Ethernet0
ip route 192.168.16.0 255.255.255.0 192.168.17.2
ip route 192.168.19.0 255.255.255.0 192.168.18.2
ip route 192.168.110.0 255.255.255.0 192.168.17.2
ip route 192.168.111.0 255.255.255.0 192.168.17.2

could be gotten rid of as RIP should be maintaining your routes.



 network 192.168.2.0
 network 192.168.17.0
 network 192.168.18.0
 network 192.168.110.0
 network 192.168.111.0
These should be the only networks this router sees and advertises.  Everything else per this topology should be good.  But it would be nice to see the other router configs.  And without giving out your internet IP address, it would be nice to note which router and port the internet connection is on.
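
The trimmed list above can be derived mechanically: keep only the `network` statements that match a network the router is directly attached to. A small sketch using Python's `ipaddress` module, with the interface addresses and RIP statements copied from the posted config (the function name is just for illustration):

```python
import ipaddress

# Interface addresses from the posted config (all 255.255.255.0).
interfaces = [
    "192.168.110.253/24", "192.168.111.253/24", "192.168.2.5/24",
    "192.168.18.1/24", "192.168.17.1/24",
]
# `network` statements currently listed under `router rip`.
rip_networks = [
    "192.168.2.0", "192.168.16.0", "192.168.17.0", "192.168.18.0",
    "192.168.19.0", "192.168.110.0", "192.168.111.0",
]

def unneeded_rip_networks(interfaces, rip_networks):
    """Return the RIP network statements that do not correspond to any
    network directly connected to this router."""
    connected = {ipaddress.ip_interface(i).network.network_address
                 for i in interfaces}
    return [n for n in rip_networks if ipaddress.ip_address(n) not in connected]

print(unneeded_rip_networks(interfaces, rip_networks))
# ['192.168.16.0', '192.168.19.0']
```

The two leftovers are the branch office LANs, which sit behind the remote routers and would be advertised by those routers, not by this one.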

192.168.110.0/255.255.0.0 and 192.168.111.0/255.255.0.0: I do not see that mask in the router config, but I do see a 255.255.255.0 subnet mask there, which is good. I can only assume that the 255.255.0.0 mask is set on your PCs, servers and printers.  Which brings me to the first thing that should be talked about:

In your main office, how many PCs, servers and printers do you have?  And what is the justification for this topology (technical or non-technical)?

I got more thinking (and sleeping) to do...

Regards,


I cleaned up some of the ip routes that were listed there. I left the "router rip" entries alone.  I do recall now why I added that "ip route 0.0.0.0 0.0.0.0 192.168.2.254" statement in there.  That was to allow internet (web surfing) traffic from our remote fabrication shop office.  Basically they are coming in through their Cisco 1601, across the frame-relay into our Cisco router and then on to our ISP provided router sitting on 192.168.2.254 in order to get to a name server and pull up internet web sites.  That's a long way around to get there, but it works.  

I couldn't get anything else to work besides opening it up all the way to allow "0.0.0.0" or "all traffic" into 192.168.2.254, the internet gateway.  The default gateway on the machines at the shop office is set to 192.168.19.5, which is the IP address of the Cisco 1601 out there. I know there has to be a better way to do it, because here's the thing: when I enable that route, it totally maxes out the CPU on the router, and the telnet session becomes very slow when typing. That must be due to all the traffic the Cisco is filtering through.

"192.168.110.0 /255.255.0.0 & 192.168.111.0/255.255.0.0; 255.255.255.0 SM
255.255.0.0 SM on your PCs, Servers and printers"
That's correct, what's listed above. We used to run a subnet mask of 255.255.255.0 back when we only had an IP address range of 192.168.2.0, which is also why the shop office machines are running on 192.168.19.0, SM 255.255.255.0.  Once we moved into more IP addresses, we had to go to a SM of 255.255.0.0 on our PCs, servers, and printers.

I wish I could give you this Visio diagram I have of our network.  You'd probably cringe when you saw the way it's laid out.  Most of the config is due to a combination of technical and non-technical justification. We started out with a small 100-computer LAN running on a 192.168.2.0 IP address range and a subnet mask of 255.255.255.0. Once we outgrew that, we moved on to 192.168.3.0, started using 255.255.0.0, and just kept building from there. It was poor planning at the time. We're being forced to move over to the "corporate" network IP address range at the end of July, so all of this will change at that time.  But we're still having problems with the network ARP traffic at the moment. Which brings me to the next discovery.

Rick suggested above that I should place a hub on the core switch and plug my Ethereal workstation into it. Once I did that and started a capture, I could see 80% of our traffic was ARP.  When I ran Ethereal before, I had it plugged into the switch, but since all that ARP traffic wasn't intended for the machine I was using, I'm assuming it was just dropping the packets, like a switch is supposed to do. But now that hub is re-broadcasting everything coming through the pipe and I'm seeing it.

Just another piece to the puzzle!  

I appreciate any additional input and I think we're on the right track now. I'm going to bump this problem up to 500 points because it is a very difficult question.

Thanks,
-Jason


Is it a specific PC, server, or piece of network hardware generating the ARP traffic?  If you can locate the offending machine or machines, you can determine what on the machine is causing the problem with sysinternals.com tcpview.
I suspected it might be a worm or virus that was searching the network for PCs to attack, but when I sort through the Ethereal logs I don't see any one machine(s) in particular that are causing the traffic. We did have a virus outbreak that slipped past Computer Associates eTrust and Symantec AV last year but we tracked down all those systems and manually removed that virus.

I do notice our PDC is getting almost solid traffic through the day. I'm assuming that's normal because the machines are authenticating as they access network shares and also browsing for network resources.
Yes that is normal.  What percentage of your network bandwidth is being used up by the overhead of ARP and broadcasts?
I've been running Ethereal throughout the day for 30 minutes at a time and I see about 60-80% ARP traffic and about 30% TCP traffic. UDP only shows about 5% or so.

We hired an outside "expert" last week and he suggested we enable broadcast control on the switches. It caps the traffic on a port if it reaches 20% utilization. To me that's a band-aid to the real problem.

I think most of the traffic could be NetBIOS requests/announcements as you suggested.  We really can't disable NetBIOS right now because there are a few legacy (NT 4.0) servers still running on our network and not hanging off our domain.  If we disable NetBIOS the workstations are unable to access those servers.

We are running a WINS server which I think might be unnecessary now that we have a real DNS server in place. We set up a WINS server a few years back so that the computers at our shop office (connected via Cisco router) could resolve server addresses on our network.  Now all the machines out there are running XP and we have our network range opened up to allow traffic between the DNS server and those workstations, and they seem to resolve fine.

I'm going to keep monitoring to see if I can pin down the "top talkers" on our network and find out what the cause of the broadcasts are.
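
A rough sketch of how the "top talkers" could be pulled out of a capture once it has been exported from Ethereal. It assumes each frame has been reduced to a (source MAC, protocol) pair; the MAC addresses and counts below are invented sample data, not real capture results.

```python
from collections import Counter

def summarize(frames, top_n=3):
    """frames: iterable of (source_mac, protocol) tuples.
    Returns (percentage of frames per protocol, top ARP senders)."""
    frames = list(frames)
    total = len(frames)
    by_protocol = Counter(proto for _, proto in frames)
    arp_sources = Counter(src for src, proto in frames if proto == "ARP")
    shares = {proto: 100.0 * n / total for proto, n in by_protocol.items()} if total else {}
    return shares, arp_sources.most_common(top_n)

# Invented sample: 10 frames, 6 of them ARP from two different machines.
frames = (
    [("00:11:22:33:44:aa", "ARP")] * 4
    + [("00:11:22:33:44:bb", "ARP")] * 2
    + [("00:11:22:33:44:cc", "TCP")] * 3
    + [("00:11:22:33:44:dd", "UDP")]
)
shares, top_arp = summarize(frames)
print(shares)   # {'ARP': 60.0, 'TCP': 30.0, 'UDP': 10.0}
print(top_arp)  # [('00:11:22:33:44:aa', 4), ('00:11:22:33:44:bb', 2)]
```

If the ARP senders are spread evenly across many machines, that points to a configuration cause and away from a single bad NIC or infected PC.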
Sounds like you've got it under control.  If you identify the top talkers you may be able to cap the traffic on just their ports (or find out why they are being so damn chatty).  But what I am curious about is what percentage of network bandwidth, not percentage of used bandwidth.   The reason I ask is that if the total of your used bandwidth is less than half of your available bandwidth, then you are fine even if the broadcasts are 60-80% of it.
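
To make the distinction concrete, here is a small worked example of the arithmetic. All the numbers are assumptions chosen for illustration: a 30-minute capture, 450 MB of traffic seen, a 100 Mb/s link, and ARP making up 70% of the captured bytes.

```python
def link_utilization(captured_bytes, seconds, link_bps):
    """Fraction of the link's capacity used during the capture window."""
    return (captured_bytes * 8) / (seconds * link_bps)

# Assumed figures, not measurements from this network.
captured_bytes = 450_000_000   # bytes seen during the capture
seconds = 30 * 60              # 30-minute capture
link_bps = 100_000_000         # 100 Mb/s link
arp_share_of_traffic = 0.70    # ARP as a share of captured bytes

total = link_utilization(captured_bytes, seconds, link_bps)
arp = total * arp_share_of_traffic
print(f"total utilization: {total:.1%}")  # total utilization: 2.0%
print(f"ARP utilization:   {arp:.1%}")    # ARP utilization:   1.4%
```

In this made-up case ARP is 70% of the traffic but only 1.4% of the link, which is the situation described above where a high broadcast percentage is harmless by itself.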
Did you resolve your problem?  If not, go to Community Support and ask for a refund of points.  If yes, close this question.
We're still having problems even at our new office location with brand new wiring.  I've tried everything I know to troubleshoot and fix the issues.  Until they bring in a network expert to analyze our network configuration, we're just getting by as it is.  I appreciate your comments and suggestions. They have helped me understand a little bit more about what's going on in our network system.

Thanks!
Glad I could be of some assistance.  Thanks!