Solved

Reoccurring Network Connectivity problem

Posted on 2008-06-16
Last Modified: 2010-04-10
For several months we have had a recurring problem with network connectivity on some workstations.  They lose network connectivity beyond their subnet.  Regardless of the troubleshooting steps tried, changing the workstation to a different IP address has always resolved the problem and restored connectivity beyond the subnet.

This has been occurring on the same subnet.  The IP address is obtained automatically without trouble through DHCP; the DHCP request is forwarded by the ip helper-address command on the Cisco Catalyst switches.  But the workstations are unable to contact services beyond their subnet.  For example, pinging the gateway is successful and printing to a network printer on the same subnet is successful, but the domain controllers and DNS servers, which are on another subnet, are unreachable.  The problem is resolved by nothing more than forcing the workstation to use a different IP address, such as a static IP address outside of the DHCP range.  If the workstation has a static address, configuring it back to DHCP returns the original IP.  Since the problem began it has rotated among some of the same workstations.  Toggling between a static IP and the DHCP IP has been the temporary resolution until they lose network connectivity again.

The DHCP server uses reservations to prevent foreign devices from easily obtaining IP information.

Can anyone relate to this problem?  Do any particular unusual troubleshooting steps come to mind?
Question by:Dovermen
15 Comments
 
LVL 57

Expert Comment

by:giltjr
ID: 21801817
When this occurs, can the PCs ping the IP address of the router for their subnet?

Is the catalyst switch on the same subnet as the computers?  Can the catalyst switch ping the router?

 

Author Comment

by:Dovermen
ID: 21803318
Yes, the PC can ping the gateway for their subnet.

The only IP address assigned to the switch is on a different subnet.  The Catalyst switch can ping the gateways.
 
LVL 57

Expert Comment

by:giltjr
ID: 21803466
O.K., that means that the path between the computer and the router/gateway is fine.

So you need to start looking at the path between the router/gateway and the rest of the network.

When the problem happens, is it a single computer (or a couple of computers) or all computers on the subnet that are having the problem?
 

Author Comment

by:Dovermen
ID: 21804103
When the problem happens it is usually one computer that loses connectivity.  Since the problem started it has generally been the same computers.  In the morning the problem computer will need the IP toggled.  No pattern has been found though.

This seems to be a switching problem.  The computer will have the correct IP information.  The computer is able to contact other devices on the same subnet, and ping the gateway.  The gateway fails to route the traffic to other subnets, while it continues routing traffic for other devices.
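
For anyone wanting to script this check, here is a minimal Python sketch of the reasoning above: ping the gateway, ping one host on another subnet, and classify the result. The function names `ping` and `classify` are made up for illustration, and the ping exit code is only a rough signal (Windows ping in particular can return success on a "Destination host unreachable" reply).

```python
import platform
import subprocess


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the operating system's ping command."""
    if platform.system() == "Windows":
        # -n count, -w timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux iputils syntax assumed: -c count, -W timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def classify(gateway_ok: bool, remote_ok: bool) -> str:
    """Turn the two ping results into a rough fault location."""
    if not gateway_ok:
        return "local: problem between the workstation and its gateway"
    if not remote_ok:
        return "beyond gateway: local subnet works, routed traffic fails"
    return "ok: both the gateway and the remote subnet are reachable"
```

A call such as `classify(ping(gateway_ip), ping(domain_controller_ip))`, with your own addresses substituted, would report "beyond gateway" for the symptom described in this thread.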
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 40 total points
ID: 21804618
It should not be a switching problem, otherwise the computer would not be able to communicate with devices on the same subnet.


The next time it happens, I would suggest that you check the MAC address table and ARP table on the device that is acting as the gateway/router and on the next "upstream" switch.  So if you have:

 COMPUTER <-> SW1 <-> SW2/ROUTER/GATEWAY <-> SW3 <-> Other Subnet
you need to check SW2/ROUTER/GATEWAY and SW3.

I would also suggest doing two trace routes: one from the computer that is having the problem to a device in another subnet, and one from that device back to the computer that is having the problem.
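
The ARP check can be partly scripted. The sketch below is only an illustration: it assumes you paste `show arp` output into a string, and it compares the MAC the gateway has learned for an IP with the MAC the workstation reports in `ipconfig /all`. The addresses in `SAMPLE_ARP` and the helper names are made up.

```python
import re


def normalize_mac(mac: str) -> str:
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def parse_show_arp(text: str) -> dict:
    """Map IP address -> normalized MAC from pasted `show arp` output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with "Internet"; the header row starts with "Protocol".
        if len(fields) >= 4 and fields[0] == "Internet":
            ip, mac = fields[1], fields[3]
            if mac.lower() != "incomplete":  # skip unresolved entries
                table[ip] = normalize_mac(mac)
    return table


def arp_matches(arp_text: str, ip: str, workstation_mac: str) -> bool:
    """True if the gateway's ARP entry for ip is the workstation's own MAC."""
    return parse_show_arp(arp_text).get(ip) == normalize_mac(workstation_mac)


# Made-up sample in the usual IOS column layout.
SAMPLE_ARP = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.0.2.1               -   0011.2233.4401  ARPA   Vlan10
Internet  192.0.2.57             12   0011.2233.44aa  ARPA   Vlan10
"""
```

If `arp_matches` returns False for the problem IP, some other device (or a stale entry) is answering for that address, which would be worth chasing before anything else.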
 

Author Comment

by:Dovermen
ID: 21847580
This problem has occurred on the same two access switches.  They are Catalyst 3524-PWR XL.  Switch A is uplinked to the core switch.  Switch B is uplinked to Switch A.  This cascade arrangement is not preferred, but this is how the switches were originally installed.  There are other switches cascaded similarly, with no problems.

I have checked MAC address and ARP table information in the past.  The show arp command on the core switch contains the correct MAC address, IP address, and Interface of the problem computer.  The show mac-address-table command on each switch contains the MAC address and Interface of the problem computer.

I have checked the trace route information from the problem computer and to the problem computer from a different subnet.  The trace reaches the first gateway and then times out continuously.

Maybe this is a routing problem on the core switch.  The core switch will not route the packets to the correct subnet for the problem computers.
 
LVL 57

Expert Comment

by:giltjr
ID: 21847668
So you have:

    USER <--> SWB <--> SWA <--> CORE

Now, is the user losing contact with devices on SWA that are on a different subnet?  Or is everything on SWA and SWB on the same subnet, and they are losing access to devices that are on, or "beyond", CORE?

I would double check core for the correct route table entries and subnet masks.

Depending on what you have and have not done, you may want to set up mirror port(s) as needed on the core switch to verify that traffic is getting in and out of the CORE.
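
As a rough illustration of the subnet mask check, Python's standard `ipaddress` module can show how a mask that is too narrow on one side leaves some addresses outside the route. The addresses are documentation-range placeholders and the function names are made up for this sketch.

```python
import ipaddress


def same_subnet(host_a: str, host_b: str, mask: str) -> bool:
    """True if both hosts fall in the same network under the given mask."""
    net_a = ipaddress.ip_interface(f"{host_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{host_b}/{mask}").network
    return net_a == net_b


def covered_by_route(host: str, route: str) -> bool:
    """True if a route entry such as '192.0.2.0/25' includes the host."""
    return ipaddress.ip_address(host) in ipaddress.ip_network(route)
```

For example, a workstation at 192.0.2.200 with mask 255.255.255.0 believes it shares a subnet with 192.0.2.10, but a core route entered as 192.0.2.0/25 would not cover it, so only some addresses in the DHCP range would fail.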
 

Author Comment

by:Dovermen
ID: 21859621
The problem computers on both SWA and SWB are on the same subnet and they are losing access to devices that are beyond the CORE.

Each access switch on site is configured similarly.  The core switch has ip routing running.  The route table has each local subnet and the correct route, and a default route to the firewall.  The core switch is able to ping the problem computer and the domain controllers on the destination subnet.  At the same time the problem computer loses access, there are computers on the same switch and subnet without a problem.  The routing problem seems to apply only to the IP of the problem computer.

I will investigate using a protocol analyzer soon.  

The core switch is running the original IOS version of 12.1.  The core switch uptime is over 4 years.  I am considering restarting the core switch, but there are many devices that work trouble free.  I could cycle the interface with the problem switches.  The problem switches have been power cycled though.
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 40 total points
ID: 21861456
-->  The core switch uptime is over 4 years.  I am considering restarting the core switch, ...

Make sure you have done a write mem and that you have an offline backup.  I would hate to see you lose something with it being 4 years since a restart.  What type of switch is the core switch?

In addition to doing port mirroring and checking where the traffic is getting lost:

What happens if you issue "ipconfig /release" and then "ipconfig /renew" on the problem computer?

What happens if you systematically clear the ARP cache, starting at the core switch and going out to the switches that are on a different subnet?
 

Author Comment

by:Dovermen
ID: 21884066
The core switch is a Catalyst 3550 12G.

Using ipconfig to release and renew the IP address has always resulted in obtaining an IP address.  This presumably is because of the ip helper-address command.  The DHCP packets are always passed to the DHCP server.  The problem computers will obtain an IP address automatically, but are unable to reach services beyond their subnet.

I used the clear arp-cache command on the core switch and the access switches.  Initially after this command today's problem computer worked, but it lost connectivity again.  Then several other problem computers lost connectivity.  But toggling the IP resolved the problem on all these computers.

I am planning to restart the switch during off hours.  I am under the impression that this will not resolve the problem, but should be done before escalating.

Thank you for your help thus far!
 

Author Comment

by:Dovermen
ID: 21985292
The core switch has been restarted.  The problem continues though.  I am planning to update the IOS.  I believe the version currently on the device is no longer supported.
 
LVL 57

Expert Comment

by:giltjr
ID: 21987360
An upgrade to the IOS will not hurt, but I would still suggest mirroring ports and doing some packet captures.    If the IOS upgrade does not resolve the issue, then port mirroring and packet captures will be the only way to see where the issue is.
 

Author Comment

by:Dovermen
ID: 22161342
This problem continues.  I have requested help from a technology solutions partner that has helped us in the past.  They have helped set up port mirroring and packet captures.  The captures seem to show that DNS requests have correct responses, but the workstations seem to ignore the response.

We have ordered new access switches to replace each Catalyst 3524-PWR XL because they are end of life equipment.  Once the switches are replaced, if the problem continues, I will continue to investigate it.  My focus now is replacing these access switches.

This question can be closed, or I can continue to post information about the problem.
 
LVL 57

Expert Comment

by:giltjr
ID: 22166777
Hopefully they can help.

A couple of tidbits.

Typically the timeout for a DNS request is 30 seconds.  So if it takes longer than 30 seconds to get a response back, the computer making the request has stopped listening.

Windows-based computers will "shotgun" requests.  That is, if you have 3 DNS servers configured on a Windows box, instead of sending the request to #1 and waiting 30 seconds, sending to #2 and waiting 30 seconds, and then sending to #3 and waiting 30 seconds, Windows will send to #1, #2, and #3 at the same time.  It will then honor the first response.  So if #1 comes back with "unknown host" before #2 or #3 returns "here is the IP address", the "unknown host" response is the one the computer will use, and it will thus return "no such host".
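
To make the "first response wins" point concrete, here is a toy Python model of it. This is not how the Windows resolver is implemented; it only simulates the behavior as described above, with the 30 second timeout taken from this comment rather than verified, and with made-up server names and latencies.

```python
from typing import NamedTuple, Optional, Sequence


class Reply(NamedTuple):
    server: str
    latency_ms: int
    answer: str


def first_reply_wins(replies: Sequence[Reply], timeout_ms: int = 30_000) -> Optional[Reply]:
    """Return the earliest reply that arrives within the timeout, if any."""
    in_time = [r for r in replies if r.latency_ms <= timeout_ms]
    if not in_time:
        return None  # the client has stopped listening
    return min(in_time, key=lambda r: r.latency_ms)


# Hypothetical replies: the fastest server happens to answer negatively.
EXAMPLE = [
    Reply("dns1", 40, "unknown host"),
    Reply("dns2", 120, "192.0.2.10"),
    Reply("dns3", 35_000, "192.0.2.10"),
]
```

With `EXAMPLE`, the model picks the "unknown host" reply from dns1 even though dns2 has the right answer, which matches the failure mode described above.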
 

Accepted Solution

by:Dovermen
Dovermen earned 0 total points
ID: 22488732
We have replaced each switch.  The problem seems to have ended.

Thank you very much for the information and suggestions.

Question has a verified solution.