Solved

Recurring Network Connectivity Problem

Posted on 2008-06-16
Last Modified: 2010-04-10
For several months we have had a recurring problem with network connectivity on some workstations.  They lose network connectivity beyond their subnet.  Regardless of the troubleshooting steps tried, changing to a different IP address has always resolved the problem and restored connectivity beyond the subnet.

This has been occurring on the same subnet.  The IP address is obtained automatically without trouble through DHCP; the DHCP request is forwarded by the ip helper-address command on the Cisco Catalyst switches.  But the workstations are unable to contact services beyond their subnet.  For example, pinging the gateway is successful and printing to a network printer on the same subnet is successful, but the domain controllers and DNS servers, which are on another subnet, are unreachable.  The problem is resolved by nothing more than forcing the workstation to use a different IP address, such as configuring a static IP address outside of the DHCP range.  If the workstation has a static address, configuring it back to DHCP returns the original IP.  Since the problem began it has rotated among some of the same workstations.  Toggling between a static IP and the DHCP IP has been the temporary resolution until they lose network connectivity again.
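To make the symptom concrete, here is a minimal Python sketch of the decision a workstation makes for each packet: destinations inside its own subnet are contacted directly, everything else is handed to the gateway. The addresses, the mask, and the function name `next_hop` are hypothetical and chosen only for illustration; they are not taken from the affected network.

```python
import ipaddress

def next_hop(src_ip, netmask, gateway, dst_ip):
    """Return where the host sends a packet: the destination itself
    if it is on the local subnet, otherwise the default gateway."""
    subnet = ipaddress.ip_network(f"{src_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dst_ip) in subnet:
        return dst_ip
    return gateway

# Hypothetical addressing (assumption, for illustration only).
workstation = "10.1.20.57"
mask = "255.255.255.0"
gateway = "10.1.20.1"

# A printer on the same subnet is reached directly.
print(next_hop(workstation, mask, gateway, "10.1.20.200"))
# A domain controller on another subnet is reached through the gateway.
print(next_hop(workstation, mask, gateway, "10.1.10.5"))
```

In the failing state only the second kind of traffic breaks, which is why the printer keeps working while the domain controllers and DNS do not.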

The DHCP server uses reservations to prevent foreign devices from easily obtaining IP information.

Can anyone relate to this problem?  Do any particular unusual troubleshooting steps come to mind?
Question by:Dovermen
15 Comments
 
LVL 57

Expert Comment

by:giltjr
ID: 21801817
When this occurs, can the PCs ping the IP address of the router for their subnet?

Is the Catalyst switch on the same subnet as the computers?  Can the Catalyst switch ping the router?

 

Author Comment

by:Dovermen
ID: 21803318
Yes, the PC can ping the gateway for their subnet.

The only IP address assigned to the switch is on a different subnet.  The Catalyst switch can ping the gateways.
 
LVL 57

Expert Comment

by:giltjr
ID: 21803466
O.K., that means that the path between the computer and the router/gateway is fine.

So you need to start looking at the path between the router/gateway and the rest of the network.

When the problem happens, is it a single computer (or a couple of computers) or all computers on the subnet that are having a problem?
 

Author Comment

by:Dovermen
ID: 21804103
When the problem happens it is usually one computer that loses connectivity.  Since the problem started it has generally been the same computers.  In the morning the problem computer will need the IP toggled.  No pattern has been found though.

This seems to be a switching problem.  The computer will have the correct IP information.  The computer is able to contact other devices on the same subnet, and ping the gateway.  The gateway fails to route the traffic to other subnets, while it continues to route traffic for other devices.
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 40 total points
ID: 21804618
It should not be a switching problem, otherwise the computer would not be able to communicate with devices on the same subnet.


The next time it happens, what I would suggest is that you check the mac-address list and arp table on the device that is acting as the gateway/router and on the next "upstream" switch.  So if you have:

 COMPUTER <-> SW1 <-> SW2/ROUTER/GATEWAY <-> SW3 <-> Other Subnet
You need to check SW2/ROUTER/GATEWAY and SW3.  

I would also suggest doing two trace routes.  One from the computer that is having the problem to another device in another subnet, and one from that device to the computer that is having the problem.
 

Author Comment

by:Dovermen
ID: 21847580
This problem has occurred on the same two access switches.  They are Catalyst 3524-PWR XL.  Switch A is uplinked to the core switch.  Switch B is uplinked to Switch A.  This cascade arrangement is not preferred, but this is how the switches were originally installed.  There are other switches cascaded similarly, with no problems.

I have checked MAC address and ARP table information in the past.  The show arp command on the core switch contains the correct MAC address, IP address, and Interface of the problem computer.  The show mac-address-table command on each switch contains the MAC address and Interface of the problem computer.
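For anyone repeating this check across several switches, a rough Python sketch of the cross-check might look like the following. It assumes the `show arp` lines follow the usual Protocol / Address / Age / Hardware Addr / Type / Interface layout and that the port is the last field of a MAC table line; table layouts differ between platforms, and the sample text, addresses, and function names below are hypothetical.

```python
def parse_show_arp(text):
    """Map IP -> (MAC, interface) from lines shaped like 'show arp' output."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: Protocol, Address, Age, Hardware Addr, Type, Interface
        if len(fields) == 6 and fields[0] == "Internet":
            entries[fields[1]] = (fields[3].lower(), fields[5])
    return entries

def find_port(mac_table_text, mac):
    """Return the last field of the first line that mentions the MAC, or None."""
    for line in mac_table_text.splitlines():
        fields = line.split()
        if mac.lower() in (f.lower() for f in fields):
            return fields[-1]
    return None

# Hypothetical captures (assumption): pasted output from the core and one access switch.
core_arp = """\
Protocol  Address      Age (min)  Hardware Addr   Type   Interface
Internet  10.1.20.1           -   000a.0b0c.0d01  ARPA   Vlan20
Internet  10.1.20.57          4   0011.2233.4455  ARPA   Vlan20
"""
switch_b_macs = """\
  20    0011.2233.4455    DYNAMIC     Fa0/12
"""

mac, interface = parse_show_arp(core_arp)["10.1.20.57"]
print(mac, interface, find_port(switch_b_macs, mac))
```

If the MAC that the core has cached for the problem IP is missing from an access switch's table, or shows up on an unexpected port, that is the hop to look at more closely.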

I have checked the trace route information from the problem computer and to the problem computer from a different subnet.  The trace reaches the first gateway and then times out continuously.

Maybe this is a routing problem on the core switch.  The core switch will not route the packets to the correct subnet for the problem computers.
 
LVL 57

Expert Comment

by:giltjr
ID: 21847668
So you have:

    USER <--> SWB <--> SWA <--> CORE

Now, is the user losing contact with devices on SWA that are on a different subnet?  Or is everything on SWA and SWB on the same subnet and they are losing access to devices that are on, or "beyond", CORE?

I would double check core for the correct route table entries and subnet masks.

Depending on what you have and have not done, you may want to set up mirror port(s) as needed on the core switch to verify that traffic is getting in and out of the CORE.
 

Author Comment

by:Dovermen
ID: 21859621
The problem computers on both SWA and SWB are on the same subnet and they are losing access to devices that are beyond the CORE.

Each access switch on site is configured similarly.  The core switch has ip routing running.  The route table has each local subnet and the correct route, and a default route to the firewall.  The core switch is able to ping the problem computer and the domain controllers on the destination subnet.  At the same time the problem computer loses access, there are computers on the same switch and subnet without a problem.  The routing problem seems to apply only to the IP of the problem computer.
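As a reference for what "correct route" means here, the following is a simplified Python sketch of a longest-prefix-match lookup over a table with connected subnets and a default route. The prefixes, next-hop labels, and the function name `lookup` are hypothetical, and this is only a model of the general technique, not the switch's actual forwarding logic.

```python
import ipaddress

def lookup(routes, dst_ip):
    """Return the next hop of the most specific route that contains dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        return None
    # The route with the longest prefix (most specific match) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical table (assumption): two connected subnets plus a default route.
routes = [
    (ipaddress.ip_network("10.1.10.0/24"), "Vlan10 (connected)"),
    (ipaddress.ip_network("10.1.20.0/24"), "Vlan20 (connected)"),
    (ipaddress.ip_network("0.0.0.0/0"), "firewall (default)"),
]
print(lookup(routes, "10.1.20.57"))   # falls inside the Vlan20 prefix
print(lookup(routes, "192.0.2.10"))   # only the default route matches

# With a too-narrow mask on Vlan20, part of the subnet falls through to the default route.
wrong_mask = [
    (ipaddress.ip_network("10.1.20.0/25"), "Vlan20 (connected)"),
    (ipaddress.ip_network("0.0.0.0/0"), "firewall (default)"),
]
print(lookup(wrong_mask, "10.1.20.200"))
```

A mask error like the last case would affect a fixed range of addresses rather than individual hosts, so it is one thing that can be ruled in or out by listing which IPs have failed.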

I will investigate using a protocol analyzer soon.  

The core switch is running the original IOS version of 12.1.  The core switch uptime is over 4 years.  I am considering restarting the core switch, but there are many devices that work trouble free.  I could cycle the interface with the problem switches.  The problem switches have been power cycled though.
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 40 total points
ID: 21861456
-->  The core switch uptime is over 4 years.  I am considering restarting the core switch, ...

Make sure you have done a write mem and that you have an offline backup.  I would hate to see you lose something with it being 4 years since a restart.   What type of switch is the core switch?

In addition to doing port mirroring and checking where the traffic is getting lost:

What happens if you issue "ipconfig /release" and then "ipconfig /renew" on the problem computer?

What happens if you systematically clear the arp cache, starting at the core switch and going out to the switches that are on a different subnet?
 

Author Comment

by:Dovermen
ID: 21884066
The core switch is a Catalyst 3550 12G.

Using ipconfig to release and renew the IP address has always resulted in obtaining an IP address.  This is presumably because of the ip helper-address command.  The DHCP packets are always passed to the DHCP server.  The problem computers will obtain an IP address automatically, but are unable to reach services beyond their subnet.

I used the clear arp-cache command on the core switch and the access switches.  Initially after this command today's problem computer worked, but then it lost connectivity again.  Then several other problem computers lost connectivity.  But toggling the IP resolved the problem on all of these computers.

I am planning to restart the switch during off hours.  I am under the impression that this will not resolve the problem, but should be done before escalating.

Thank you for your help thus far!
 

Author Comment

by:Dovermen
ID: 21985292
The core switch has been restarted.  The problem continues though.  I am planning to update the IOS.  I believe the version currently on the device is no longer supported.  
 
LVL 57

Expert Comment

by:giltjr
ID: 21987360
An upgrade to the IOS will not hurt, but I would still suggest mirroring ports and doing some packet captures.    If the IOS upgrade does not resolve the issue, then port mirroring and packet captures will be the only way to see where the issue is.
 

Author Comment

by:Dovermen
ID: 22161342
This problem continues.  I have requested help from a technology solutions partner that has helped us in the past.  They have helped set up port mirroring and packet captures.  The captures seem to show that DNS requests receive correct responses, but the workstations seem to ignore the responses.

We have ordered new access switches to replace each Catalyst 3524-PWR XL because they are end-of-life equipment.  If the problem continues once the switches are replaced, I will continue to investigate.  My focus now is replacing these access switches.

This question can be closed, or I can continue to post information about the problem.
 
LVL 57

Expert Comment

by:giltjr
ID: 22166777
Hopefully they can help.

A couple of tidbits.

Typically the timeout for a DNS request is 30 seconds.  So if it takes longer than 30 seconds to get a response back, the computer making the request has stopped listening.

Windows based computers will "shotgun" requests.  That is, if you have 3 DNS servers configured on a Windows box, instead of sending the request to #1, waiting 30 seconds, sending to #2, waiting 30 seconds, and then sending to #3 and waiting 30 seconds, Windows will send to #1, #2, and #3 at the same time.  It will then honor the first response.  So if #1 comes back with "unknown host" before #2 or #3 returns "here is the IP address", the "unknown host" response is the one the computer will use, and it will thus return no such host.
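The "first response wins" behavior described above can be modeled with a short Python sketch. This only simulates the description in this comment; it is not the actual Windows resolver, and the server names, latencies, answers, and the function name `resolve_shotgun` are hypothetical. The 30 second default is the figure quoted above.

```python
def resolve_shotgun(responses, timeout=30.0):
    """Pick the answer a client would use if it queried every server at once.

    responses: list of (server, latency_seconds, answer) tuples.
    Returns (server, answer) for the earliest response that arrives
    within the timeout, or None if every response is too late.
    """
    in_time = [r for r in responses if r[1] <= timeout]
    if not in_time:
        return None
    server, _, answer = min(in_time, key=lambda r: r[1])
    return server, answer

# Hypothetical case: the fastest server does not know the name.
responses = [
    ("dns1", 0.02, "unknown host"),
    ("dns2", 0.15, "10.1.10.5"),
    ("dns3", 0.40, "10.1.10.5"),
]
print(resolve_shotgun(responses))

# Hypothetical case: every reply arrives after the client stopped listening.
print(resolve_shotgun([("dns1", 45.0, "10.1.10.5")]))
```

In the first case the negative answer from dns1 is used even though two other servers returned an address slightly later.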
 

Accepted Solution

by:
Dovermen earned 0 total points
ID: 22488732
We have replaced each switch.  The problem seems to have ended.

Thank you very much for the information and suggestions.