Solved

Network drop-outs on XP workstation to 2003 Server

Posted on 2006-03-24
Medium Priority
Last Modified: 2010-03-18
Hello all,

I have a Windows 2003 server with a shared directory which my client application needs.  I have XP client machines running on the same network subnet and 10 XP machines running on another subnet/VLAN.  All machines are connected through the same Cisco switch.  All the clients poll the 2003 Server every 20 seconds looking for confirmation the network is still up; this is done through a GetFileAttributes API call to the Server’s shared directory, where the clients are also placing data.  If a client’s API call is not returned within 10 seconds, the client application assumes the network is down and moves into another state.
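For readers who want to reproduce the check outside the application, here is a minimal sketch of the polling logic described above. It is not the poster's code: `os.stat` stands in for the GetFileAttributes call, and the function name and the example share path are made up for illustration. Only the 20-second interval and 10-second timeout come from the post.

```python
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

POLL_INTERVAL_S = 20  # from the post: clients check every 20 seconds
CALL_TIMEOUT_S = 10   # from the post: no answer within 10 seconds means "network down"


def share_is_reachable(path, timeout_s=CALL_TIMEOUT_S):
    """Return True if the attributes of `path` can be read within timeout_s.

    os.stat is an assumed stand-in for the Win32 GetFileAttributes call.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(os.stat, path)
    try:
        future.result(timeout=timeout_s)
        return True
    except (FutureTimeout, OSError):
        # Timed out, or the path could not be read at all.
        return False
    finally:
        # A stalled call keeps running in its worker thread;
        # we only stop waiting for it.
        pool.shutdown(wait=False)
```

A client loop would call something like `share_is_reachable(r"\\server\share")` (hypothetical path) once every `POLL_INTERVAL_S` seconds and log a timestamp for each failure, so the failures can be lined up against switch events later.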

Here is the problem…on occasion my client’s API call fails for unknown reasons.  I’ll explain some of my trouble-shooting:

-      All NIC cards\port settings have been synched (100\full)
-      The server does not have TCP stack issues, it returned 25 consecutive 65k pings in <1 ms
-      The server’s integrated NIC card has been replaced with a PCI NIC
-      Server load is low, so is network load
-      All gateways, dns\wins servers, etc… all network settings are fine
-      No trace problems, goes from source->switch->destination.  Have run all the pingpaths, tracerts, iperf commands with  no issues
-      No errors in the switch logs
-      Here is the kicker!  On consecutive ping tests of 100 pings each at 50k and 3k, about 1 out of every 5 sets of pings has a failure, and about 50% of those failures are on the first ping.  The RTT varies quite a bit too…10 in a row are normal, then a return time goes up to 15ms, then normal times, then another 15ms one.  These long times are about 1 out of 10 pings (within the set of 100).  I’ve also never had a normal 32 byte ping fail, only ones with an increased packet size.
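To compare ping runs like the ones above on the same footing, a small summary helper can be useful. This is only a sketch: the function name, the input format (one RTT in milliseconds per ping, `None` for a timeout) and the 10 ms "slow" threshold are assumptions, not something taken from the tests in the post.

```python
def summarize_pings(rtts_ms, slow_threshold_ms=10.0):
    """Summarize one ping run.

    rtts_ms: one entry per ping sent; a float RTT in ms, or None for a timeout.
    slow_threshold_ms: assumed cut-off for counting a reply as "slow".
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    slow = sum(1 for r in replies if r >= slow_threshold_ms)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "slow": slow,
        "first_ping_lost": bool(rtts_ms) and rtts_ms[0] is None,
        "max_ms": max(replies) if replies else None,
    }


# Made-up sample run: first ping lost, one slow reply.
sample = [None, 1.0, 1.0, 15.0, 1.0]
print(summarize_pings(sample))
```

Tracking `first_ping_lost` separately matters here because a loss that only ever hits the first ping of a run points toward something like ARP or address-table learning rather than steady-state packet loss.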

I’m not sure if this is the switch trying to read\route things around or what?  Unfortunately I don’t have the luxury of putting a dumb switch in and see what happens.  If anyone has any experience\pointers I’d love to know how it worked out.

Thanks
Question by:thesultanofswine
14 Comments
 
LVL 15

Assisted Solution

by:adamdrayer
adamdrayer earned 800 total points
ID: 16286053
I assume you are pinging by IP address and not by name, so I'll rule out name resolution.  Have you tried running the same test with the workstation and server in the same subnet/VLAN?  That would help narrow it down to a switch problem.  It's possible that your switch is a "store-and-forward" switch, and it is running out of buffer space, but that's extremely remote since it should have ample buffer space for what you are doing.  And I'm sure it would log buffer overflows in the switch log.  I would run Network Monitor on the server to try and determine if the problem is evident on the server.  Is it failing before or after it reaches the server?  That sort of thing.

Are there any IPS or IDS devices or software on your network that might be detecting what you are doing as some sort of DoS attack? A new feature of some switches (and possibly some OSes) does just that... take a look at this HP ProCurve Switch:

http://www.hp.com/rnd/products/switches/ProCurve_Switch_3500yl-5400zl_Series/features.htm

"ICMP throttling: defeats ICMP denial-of-service attacks by enabling any switch port to automatically throttle ICMP traffic  NEW!"

I know you don't have a dumb switch handy, but many times you can set one port to 'Monitor' and have it monitor all traffic in and out of another port.  This might help.

Also, turn off any IPSEC that is running on the server.  That can slow TCP/IP communication down.

Also disable NetBEUI or any extra protocols that are not necessary.

You may also want to try updating windows, device drivers, and switch firmware just to be safe.
0
 
LVL 40

Assisted Solution

by:Fatal_Exception
Fatal_Exception earned 400 total points
ID: 16288308
Morning Adam..  

Although I should not think this a problem with your Cisco Switch, you might try pinging with a different packet size, and see if the responses are any different..

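ping IP_Address -f -l Packet_Size

ie:  ping 192.168.1.x -f -l 1472 (1472 bytes of data plus 28 bytes of IP and ICMP headers = 1500, the default ethernet MTU).  Note that on Windows the size switch is -l; -i sets the TTL.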

If you receive a message regarding fragmentation, then try lowering the packet size and ping again, until you discover the largest size that gets through unfragmented (the optimum MTU)...  of course, if you do find problems, then you might need a new switch..  
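The lower-and-retry procedure above can be done as a binary search instead of by hand. The sketch below assumes a `probe(size)` callable that sends one don't-fragment ping with `size` bytes of data and returns True if a reply came back; that callable is hypothetical and is simulated here so the example runs on its own.

```python
IP_ICMP_HEADER_BYTES = 28  # 20-byte IPv4 header + 8-byte ICMP header


def largest_unfragmented_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    probe: hypothetical callable; sends one don't-fragment ping of `size`
           data bytes and returns True on a reply.
    Assumes that if a size passes, every smaller size passes too.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits; look for something larger
        else:
            high = mid - 1  # mid is too big; look lower
    return low


# Simulated path with a 1400-byte MTU, standing in for a real ping.
def fake_probe(size):
    return size + IP_ICMP_HEADER_BYTES <= 1400


payload = largest_unfragmented_payload(fake_probe)
print(payload, payload + IP_ICMP_HEADER_BYTES)
```

With a real probe this needs about 11 pings to cover the whole 0 to 1472 range. Given the intermittent losses described in the question, a real probe should probably retry a failed size a few times before treating it as too large.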

Then again, it is early here, and I might be completely off base!  :)
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16293093
You mention that a number of your machines are on a different subnet. How are these machines connecting? Are you using VLANs or is there a router in the mix here?

Are you getting the same issue from users on both VLANs or just one of them?
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16293095
Sorry, I see you are using VLANs. How are you converging these? Still, how are you connecting these together? Are you getting the same issue from users on both VLANs or just one of them?
0
 

Author Comment

by:thesultanofswine
ID: 16301152
Thanks for the suggestions...I'll try to answer some of your questions with some detail:

- The ping tests were performed with both name and IP, the error rate was similar.
- I also pinged between different workstations (which eliminated the server), the same error rate occurred.
- About the VLANs, the same error rate is occurring between the machines which are in the same subnet and the 10 machines on the VLAN with a different IP address scheme.  There is no router involved, all the clients are hooked up to the Cisco switch and I believe the switch is doing the routing.  I have to admit I do not know the specifics about the VLAN or how it is converged.  

I'll go try out your above suggestions and get back to everyone with the results.
Thanks again.
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16302525
Can you tell me which Cisco switch it is? If it's a layer 3 switch then fine. If it's only a layer 2 then that cannot do the converging and there must be something else in the mix doing the routing. Layer 2 devices cannot route :)
0
 

Author Comment

by:thesultanofswine
ID: 16315946
Keith,

The switch is a Cisco Catalyst 6509.
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16316364
Wooo. we have four of those; layer 3 it is then lol.

Superb bits of kit. Sup 1A's or using the new 720's?

Sorry, back to the question. How are the subnets/vlans connecting? Boxes directly on switchports or devices at the other end of trunks?
If trunks, what are the access layer boxes at the other ends? If Ciscos, is spanning tree enabled? Could be switching out and taking a few seconds or more to re-converge.
0
 

Author Comment

by:thesultanofswine
ID: 16327630
Keith,

This is where my knowledge of the network really drops out.  This network is not at my site and I have no access to the information/setup, besides the basics.  This is also getting over my head in the networking department too. I'll try to bring up the questions and see what I get back; unfortunately some of the people I'm dealing with probably also don't know the specifics to this degree...  I'll try to get back with some answers soon.

I do have one question though. In my experience with trouble-shooting similar issues on my application, I've noticed the big expensive Cisco switches cause some issues.  In situations where we can, we've swapped out the Cisco switches with old dumb Bayview switches and the problem has decreased.  Is there some issue with all the logic/work/routing the smart Cisco switches do which is causing the delay?  Also one more question: is there a way to take that functionality off certain ports on the Cisco switch so frames just pass through?

Thanks
0
 
LVL 51

Accepted Solution

by:Keith Alabaster
Keith Alabaster earned 800 total points
ID: 16329822
OK., no sweat.

the 6509 will likely have a blade with x number of 10/100/1000Mb ports and/or a blade with gigabit fibre ports on it.

these ports can be set as trunk ports (no ip address) that connect to switch devices so as to extend the fabric and you state which vlans will be allowed over the trunk (uses 802.1q or Cisco's proprietary ISL protocol). Alternatively they can be set as ordinary ports whereby you may have a single server or device plugged directly into the port. We use Cisco 2950's and the older 35xx series access layer switches all on trunk ports but we have no issues (that I am aware of) with timeouts/drop outs.

The spanning tree or per vlan spanning tree (stp or pvst) is simply the process to ensure that only the best route for the traffic to take is left in an operational state. Any second/third routes that your network discovers to get to a device are placed into a hold-down (blocking) condition. If something fails/topology changes etc, the algorithm kicks in and the new best route is activated and any others placed into hold-down.

When this change is made (or, more pertinently, when the network 'thinks' this change has been made), it can take a short time for the new routes to propagate round, causing a delay.
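To put rough numbers on that delay, here is a small sketch comparing it with the 10-second client timeout from the question. It assumes classic 802.1D spanning tree with the default timers (max age 20 seconds, forward delay 15 seconds). That is purely an assumption for illustration: nothing in this thread says which spanning tree variant or timer values this network runs, and rapid spanning tree or PortFast on access ports would behave very differently.

```python
MAX_AGE_S = 20        # 802.1D default max age (assumed, not from this network)
FORWARD_DELAY_S = 15  # 802.1D default forward delay (assumed)
CLIENT_TIMEOUT_S = 10 # from the question: client gives up after 10 seconds


def reconvergence_window_s(direct_failure):
    """Approximate time a port takes to start forwarding again.

    A port spends one forward delay in listening and one in learning.
    For an indirect failure, max age must also expire first.
    """
    transition = 2 * FORWARD_DELAY_S
    return transition if direct_failure else MAX_AGE_S + transition


for label, direct in (("direct link failure", True), ("indirect failure", False)):
    window = reconvergence_window_s(direct)
    print(label, window, "s; exceeds client timeout:", window > CLIENT_TIMEOUT_S)
```

Under those assumed defaults either case is well past the 10-second window, so a single topology change would be enough to push a client into its "network down" state.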

0
 

Author Comment

by:thesultanofswine
ID: 16332963
All, thank you for the responses.  I appreciate your efforts helping me through my issue.  
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16334753
welcome :)
0
 
LVL 40

Expert Comment

by:Fatal_Exception
ID: 16338393
Keith..  great explanation of the layer3 switching using vlans!  

and of course, a thanks to sos!

FE
0
 
LVL 51

Expert Comment

by:Keith Alabaster
ID: 16339825
:)
0

Question has a verified solution.