jfdpratt asked:

Network timeout traversing VLAN

We have recently replaced our Dell network with a new Cisco-based network.  We have a weird issue that we can't seem to resolve.  We have a DVR system that has four units that report to one server.  The IP of the server displays the cameras of all four units in a web browser.

All of these units are on the same VLAN as the server except for two of them.  For some reason, if you watch these two units in the web browser they will drop exactly every 10 minutes (600 seconds).  This is the only visible indication that we have; however, we have been told that some users are having issues printing.  They print to a print server that is on another VLAN as well.  There are a few client/server apps that have had issues too.  If we move these DVR units to the server VLAN, there are no issues.

We have a 2821 Router, two 4900 Layer 3 switches that handle most of our routing, and 2960 switches all with a 10G fiber backbone.  I have been looking all over for anything that is triggered or occurs every 10 minutes (600 seconds).  We are at a loss with this.  We have been looking at captures, isolating equipment, and patching things around to make this work.  Looking for any advice.  Thanks.
mat1458

For troubleshooting it would be helpful to have the configurations of the devices along with show version, show interface status of the switches and a small layout of the network.
jfdpratt (ASKER)

I've attached the configs and version information for both our 4900s as well as one of our switch stacks.

I've also attached a small network diagram.  There are only two things I can see that this drawing doesn't reflect.  The connection from our 2821 does not go to the 4900; it goes to gig 1/0/1 on the Mainstack-2960-1.  Each of the stacks also has two separate fiber connections back to each 4900.

If you have any questions, let me know.  I really do appreciate any help that you may be able to provide.
2960-MainStack-Config.txt
2960-MainStack-Version.txt
4900-Switch-Config.txt
4900-Switch-Version.txt
Network.pdf
The configs do not show anything that seems to be the cause of your problems. However (not having a 4900 around to test) don't you have to give the "ip routing" command to turn it into a layer 3 switch? Furthermore the tracking in HSRP does not make much sense to me. You decrement the priority of an interface that already has line protocol down.

You say that if you move the DVR units to the server VLAN, you can watch all units without interruption. Can you watch them from another VLAN, or only from within the server VLAN? In which VLAN are you sitting while having the problems, and in which VLAN are the DVR systems that are not accessible? Do you see the traffic in one VLAN go up regularly?

Maybe show log, show interface | include error, show interface status and show ip interface brief can help further.
I didn’t see anything in the configs either, which is why we are at a loss.  I’m not sure about the “ip routing” command to turn it into a layer 3 switch.  I know that when we had our old network in place, all of this routing was being done on our Cisco 2821 router.  Now that we have the 4900s in place, we have turned off our router to test, and the network still functions between VLANs, so it must be working as a layer 3 switch.  To make sure that there weren’t any conflicts with the router, we powered it down for 30 minutes and the problem still existed on the network.

Honestly, I had some help with HSRP, and a third party vendor took care of that portion for us.  So your statement about “decrementing the priority of an interface that already has line protocol down” isn’t something I’m sure about.

As far as the setup, I have attached a DVR network pdf showing how they are connected.  We have three Kollectors that all the cameras physically connect to, and they relay that information to the Matrix/Server over the network.  2 - 4 are on the 32 VLAN, the same as the Matrix/Server, and have no issues at all.  Kollector 1 is on the 34 VLAN and is the one that cuts out every 10 minutes (600 seconds) exactly.  You can set a stopwatch and it will drop exactly at 10 minutes.  By dropping I mean that if you are watching the cameras via the web on the address of the Matrix/Server, from any VLAN on our network, it drops.  You can immediately put the cameras back in the browser, but they will drop exactly 10 minutes after the last drop.  If we set the IP on Kollector 1 to the 32 VLAN, and jumper it over to a switch on that VLAN, there are no issues at all.
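One way to back up the stopwatch observation is to note the time of each drop and check the gaps between them. The snippet below is only a minimal sketch: the function names, the sample timestamps, and the 2-second tolerance are assumptions for illustration, not measurements from this network.

```python
def drop_intervals(drop_times):
    """Return the gaps in seconds between consecutive drop timestamps."""
    ordered = sorted(drop_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(drop_times, expected=600.0, tolerance=2.0):
    """True if every gap between drops is within `tolerance` seconds of `expected`."""
    gaps = drop_intervals(drop_times)
    return bool(gaps) and all(abs(gap - expected) <= tolerance for gap in gaps)


# Hypothetical drop times, in seconds since the start of observation.
observed = [12.0, 612.4, 1211.9, 1812.1]
print(drop_intervals(observed))
print(looks_periodic(observed))
```

If the gaps hold at 600 seconds over many drops, that points toward a fixed timer somewhere rather than random loss, and the logged times can be lined up against device logs.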

This is just the only visual indication we have, but others have issues with large print jobs (all servers/print servers are on their own 31 VLAN), client/server apps drop all the time, etc.

We are at a loss, and the only workaround at this time is to make sure those users who print large jobs, or have servers they need access to, are patched over to the VLAN of the server they need.  I have attached anything I think may be relevant, in the hope of getting some advice.  Thanks for looking at this.
show-log.txt
show-int--in-error.txt
show-int-status.txt
show-ip-int-brief.txt
DVR-Network.pdf
sh-standby.txt
sh-int.txt
Thanks for the input, everything still looks pretty good. The good news is that I personally doubt that you have a network problem; the bad news is that I do not yet know what the source of your error could be.

Reviewing your outputs has only shown me one thing that seems strange: you have a very high rate of broadcast in your network. In the old times the number of input packets compared to the broadcasts was said to be ok at a ratio of 10/1 in a normal windows network. Now in your case the number of broadcast is much higher than the input packets which seems strange to me but this might be a new way of show interface output. However, for you it might be good to find out what the source(s) of these broadcasts are. You could attach a Wireshark PC to a monitor port and see what comes in (top speakers etc.)
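The comparison of broadcasts to input packets can be pulled out of saved show interface output with a few lines of script. This is a rough sketch only: `broadcast_ratio` is a made-up helper, the sample text is invented, and the two counter lines are assumed to follow the usual "packets input" and "Received N broadcasts" wording, which may differ between platforms and software versions.

```python
import re


def broadcast_ratio(show_interface_text):
    """Return broadcasts divided by input packets for one interface, or None."""
    packets = re.search(r"(\d+) packets input", show_interface_text)
    broadcasts = re.search(r"Received (\d+) broadcasts", show_interface_text)
    if not packets or not broadcasts:
        return None  # counters not found in the expected wording
    total = int(packets.group(1))
    if total == 0:
        return None  # avoid dividing by zero on an idle interface
    return int(broadcasts.group(1)) / total


# Invented counters for illustration, not taken from the attached files.
sample = """
     1000 packets input, 64000 bytes, 0 no buffer
     Received 250 broadcasts (0 multicasts)
"""
print(broadcast_ratio(sample))
```

Running it per interface and sorting by ratio would show which ports deserve a closer look with a capture.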

I am not sure but I think you can't use IP accounting on the switches, however giving it a try might also shed some light on the traffic flows if it works.

Can you also do a ping in parallel from/to different devices in VLAN 32 and 34? Just to see if you experience any outages in the ping.
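A small script can keep such a ping test running unattended and report when replies stopped. The following is a sketch under assumptions: it relies on the system ping command (the `-n`/`-w` options on Windows, `-c`/`-W` as on Linux), it polls the hosts one after another rather than truly in parallel, and all names in it are illustrative.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the system ping; True if it exits with 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) spans."""
    outages = []
    start = last_fail = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp
            last_fail = timestamp
        elif start is not None:
            outages.append((start, last_fail))
            start = None
    if start is not None:
        outages.append((start, last_fail))
    return outages


def monitor(hosts, duration_s=1800, interval_s=1.0):
    """Ping each host in turn until duration_s has passed; return samples per host."""
    samples = {host: [] for host in hosts}
    end = time.time() + duration_s
    while time.time() < end:
        for host in hosts:
            samples[host].append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return samples
```

Calling `monitor` with one address in VLAN 32 and one in VLAN 34 for at least 20 minutes, then passing each host's samples to `find_outages`, would show whether any loss lines up with the 10-minute drops. The exit code of ping is not equally reliable on every platform, so the results are best treated as a hint rather than proof.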

What OS and what application runs on Kollector 1? Can you shed some light on that machine as well? If it's Windows, add a route print and an ipconfig /all; if Linux, a netstat -rn and an ifconfig -a. What product are the cameras and in which VLAN are they located?

So much for now; we will surely find the needle in the haystack.
The Kollectors run an embedded Win XP OS provided by the vendor.  We have run a ping from the Kollector to the Matrix, from the PC viewing the webpage to the Matrix, and from the PC viewing the webpage to the Kollector. None of them drop during this time.

We ran a Wireshark capture and didn't see anything that looked abnormal.  We had a case opened with TAC and they looked at it as well, and found nothing.  We aren't totally sure that this is the same thing that is causing the printing and client/server issues, but at the same time we know it is something that shouldn't be happening and is the only visual indication we have to look at.

At one point, we even started disconnecting switches thinking that it may be some sort of external device causing issues.  Couldn't find anything that pointed to anything in particular.  I agree, that it is becoming a needle in an extremely large haystack.
ASKER CERTIFIED SOLUTION
mat1458

This solution is only available to members.
I would have to replicate a port or see if I can find a hub somewhere to capture the traffic on that port.  I will try and do that sometime today.
We did some deeper troubleshooting, and even had an outside firm come in to take a look at our entire network.  There was nothing found to be wrong, and the problem has just disappeared.  It appears that there may have been some sort of rogue device on the network, or something that is no longer there.