MagnaTPF asked:

Network connectivity issue

We have been having issues tonight in our plant with network connectivity drops.  We're running on a Cisco 4507R core switch.  The drops occur sporadically at random intervals and last about 5 to 10 minutes each time.  None of our switches are showing any errors in the logs, there are no CPU spikes, and there is plenty of memory left on the core switch.  The drops are occurring on several different VLANs, which suggests to me that it's an issue with our core switch.  There isn't very robust logging enabled on the switch, and I'm unable to change that without prior approval.
I have a theory and was hoping for input.  We have an HP ProLiant ML350 G4 running Windows Server 2003 that has been having several software crashes and errors.  This server also happens to be our HP SNMP trap server, which receives the traps from 25 other HP servers on the same network.  If this trap server were unable to process or receive SNMP traps for any reason, could that cause a network drop or spike if the other servers are spamming SNMP requests or heartbeats?  Your input would be greatly appreciated, thanks!
Soulja

Have you ensured there are no loops on your network? It could be as simple as a phone plugged into both a data port and a voice port. How is your spanning tree looking?
MagnaTPF (ASKER)

How can I check for loops?  I'm not a networking guy, but our company Cisco guy looked at our switch and was unable to find a problem.  He is currently unavailable, but we have not had the issue in an hour or so.  It seems to have stopped when our trap server was rebooted, which is why I mentioned it.
There were no errors in the core log; unfortunately, the error logging isn't set up for port logging.
If the server could be the issue, unplug it from the network and see whether the issue recurs. If it does, then it's not the server.
I would like to, but we're in production right now and they won't let me risk additional downtime... they're happy if it's running and don't want to mess with it until shift end... which is at 4am :(
So if it is the server, unplugging it right now isn't a good step...
You sound like you are in a GM factory. That's the line I used to get from plant staff when I supported GM. Okay, then try it after the shift ends. As for loops, they are the hardest to diagnose. Tell your Cisco guy to look for any ports that should be blocking but are not. Look for any trunks that have PortFast configured. Enable BPDU guard on all access-layer ports connected to hosts.
LMFAO. Not GM, but one of the Big Three :)
It's going to be very difficult to see an issue when we're not running, as our network load drops off considerably when the plant software and PLCs aren't in production mode, although it could still happen.  Guess we'll have to wait and see.  Thanks, I will let you know if or when I can isolate this server.  If you come up with anything else I can look at, please let me know!
Hi,

This can be as simple as a misconfigured switch, a patch cord, or a physical loop...

Well, you mentioned SNMP, so perhaps you have SNMP on the switches/router too; check it out.
Clear the counters and review them.
Check the ARP tables to see whether any switch sees itself on one of its own ports.
Verify the configurations, VLAN status, and VTP operating mode.
If possible, restart the switches and/or disconnect all non-critical devices from the network.

Hope this helps.
I checked our core switch for SNMP stats, and I can see we had 10630 "No Such Name" errors on the packet output side.  Could this have caused a network problem?
I also checked the ARP table.  There were some incomplete entries, but only for addresses that are no longer used.
ASKER CERTIFIED SOLUTION
hvillanu

This solution is only available to members.
There are no duplicate IP addresses in the ARP table.
There are some duplicate MAC addresses, but those MAC addresses have multiple IPs assigned to them for cluster management and load balancing reasons.
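
For anyone repeating this ARP check by hand, here is a minimal Python sketch of the idea. It assumes the table was saved as text in the usual Cisco IOS `show ip arp` layout; the sample addresses and the function name `summarize_arp` are made up for illustration and are not from this thread. Feeding it several snapshots pasted together also shows an IP that moved between MACs.

```python
import re
from collections import defaultdict

# Made-up sample in the style of Cisco IOS `show ip arp` output.
SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.10.5               12  0011.2233.4455  ARPA   Vlan10
Internet  10.1.10.6                3  0011.2233.4455  ARPA   Vlan10
Internet  10.1.10.7                0  Incomplete      ARPA
Internet  10.1.20.9                -  00aa.bbcc.ddee  ARPA   Vlan20
"""

# One row: protocol, IP address, age, hardware address, encapsulation type.
ROW = re.compile(r"^Internet[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+ARPA", re.MULTILINE)

def summarize_arp(text):
    """Return (MACs holding several IPs, IPs seen with several MACs, incomplete IPs)."""
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    incomplete = []
    for ip, _age, mac in ROW.findall(text):
        if mac.lower() == "incomplete":
            incomplete.append(ip)  # no reply was received for this address
            continue
        ips_by_mac[mac.lower()].add(ip)
        macs_by_ip[ip].add(mac.lower())
    shared_macs = {m: sorted(i) for m, i in ips_by_mac.items() if len(i) > 1}
    duplicate_ips = {i: sorted(m) for i, m in macs_by_ip.items() if len(m) > 1}
    return shared_macs, duplicate_ips, incomplete

shared_macs, duplicate_ips, incomplete = summarize_arp(SAMPLE)
print("MACs with several IPs:", shared_macs)
print("IPs with several MACs:", duplicate_ips)
print("Incomplete entries:", incomplete)
```

A MAC with several IPs is expected for clustered or load-balanced hosts, as noted above; an IP with several MACs is the case that usually deserves a closer look.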
Can you set up a sniffer to see whether there is a virus or excessive traffic?
I can put Wireshark on a laptop and set up a monitoring session to a port on the core for our server VLAN.  I don't know what I'm looking for, but maybe something will show itself.
Hi,
Yes, look for anything unusual in the traffic, if you have already checked the switches/router and cable connections.
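
One rough way to spot "something unusual" is to count packets per source, per protocol, and per time bucket, then look for a host or protocol that dominates during a drop. The sketch below assumes the capture was exported from Wireshark as CSV with the default columns (No., Time, Source, Destination, Protocol, Length, Info) and that Time is seconds since the start of the capture. The sample rows and the function name `traffic_summary` are invented for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows using Wireshark's default CSV export columns (an assumption).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.1.10.5","10.1.10.20","SNMP","120","trap"
"2","0.400000","10.1.10.5","10.1.10.20","SNMP","120","trap"
"3","0.900000","10.1.10.6","10.1.10.255","UDP","60","broadcast"
"4","1.200000","10.1.10.5","10.1.10.20","SNMP","120","trap"
'''

def traffic_summary(csv_text, bucket_seconds=1.0):
    """Count packets per source, per protocol, and per time bucket."""
    by_source = Counter()
    by_protocol = Counter()
    by_bucket = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_source[row["Source"]] += 1
        by_protocol[row["Protocol"]] += 1
        # Bucket index = whole number of bucket_seconds since capture start.
        by_bucket[int(float(row["Time"]) // bucket_seconds)] += 1
    return by_source, by_protocol, by_bucket

by_source, by_protocol, by_bucket = traffic_summary(SAMPLE_CSV)
print("Top talkers:", by_source.most_common(3))
print("Top protocols:", by_protocol.most_common(3))
print("Busiest buckets:", by_bucket.most_common(3))
```

With a real export, read the file contents into a string first and use a larger `bucket_seconds` (for example 60) so that a 5 to 10 minute event stands out against normal load.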
I had a similar issue a few years back, with no indication of where the problem was...

It turned out to be a bad NIC on one of our servers, which was experiencing other issues too.

There were also no indications in the events or logs on the server that there was an issue with the card.

I remember seeing CRC errors on the switch and finally identifying the port the server was on.
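
As a sketch of how that CRC check could be automated, the Python below scans saved `show interfaces` text for ports with a nonzero CRC count. It assumes the usual Cisco IOS layout, where each interface starts with an "... is up, line protocol is ..." line and the error line reads "N input errors, N CRC, ...". The sample output and the function name `ports_with_crc_errors` are made up for illustration.

```python
import re

# Made-up sample in the style of Cisco IOS `show interfaces` output.
SAMPLE = """\
GigabitEthernet2/1 is up, line protocol is up (connected)
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet2/2 is up, line protocol is up (connected)
     1532 input errors, 1498 CRC, 34 frame, 0 overrun, 0 ignored
"""

HEADER = re.compile(r"^(\S+) is .*line protocol is ")
ERRORS = re.compile(r"(\d+) input errors, (\d+) CRC")

def ports_with_crc_errors(text):
    """Map interface name -> CRC count for interfaces whose CRC counter is nonzero."""
    current = None
    flagged = {}
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)  # remember which interface the next counters belong to
            continue
        errors = ERRORS.search(line)
        if errors and current is not None:
            crc = int(errors.group(2))
            if crc > 0:
                flagged[current] = crc
    return flagged

print(ports_with_crc_errors(SAMPLE))
```

Counters accumulate since they were last cleared, so comparing two snapshots taken some time apart shows which ports are still collecting errors.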
D_Vante

It sounds like the switch is in a production environment, and it would be very costly if it were to die one day.  Request a second switch and divide the load.  Then watch to see whether the problem stays with the old switch.
The network issues stopped after software fixes were applied to several troubled pieces of software on our SNMP server that had been causing faults and crashes. Since then, there have been no issues with that server, no SNMP errors reported on our switches, and no network connectivity losses.
Thank you all for your help and ideas.