Network drop-outs on XP workstation to 2003 Server

Hello all,

I have a Windows 2003 server with a shared directory that my client application needs. I have XP client machines on the same network subnet as the server and 10 XP machines on another subnet\VLAN. All machines are connected through the same Cisco switch. Every 20 seconds each client checks that the network is still up by making a GetFileAttributes API call against the Server’s shared directory, where the clients are also placing data. If a client’s API call does not return within 10 seconds, the client application assumes the network is down and moves into another state.
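For anyone trying to reproduce the check outside the application, here is a rough sketch of the logic described above. It is not the actual client code: Python's `os.stat` stands in for the GetFileAttributes call, and the names `share_is_reachable`, `probe`, and `monitor` are made up for illustration. Only the 20 second interval and 10 second timeout come from the description.

```python
import os
import threading
import time

CHECK_INTERVAL_S = 20.0  # from the post: one check every 20 seconds
CHECK_TIMEOUT_S = 10.0   # from the post: give up after 10 seconds


def share_is_reachable(path, timeout_s=CHECK_TIMEOUT_S, probe=os.stat):
    """Return True if probe(path) succeeds within timeout_s seconds.

    probe defaults to os.stat, used here as a stand-in for GetFileAttributes.
    """
    result = {}

    def worker():
        try:
            probe(path)
            result["ok"] = True
        except OSError:
            # Path missing or share unavailable: treat as "network down".
            result["ok"] = False

    # Daemon thread, so a call that hangs on a dead share cannot block exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    # If the worker is still running, no result was recorded: report False.
    return result.get("ok", False)


def monitor(path, on_down, interval_s=CHECK_INTERVAL_S, checks=None):
    """Check the share every interval_s seconds; call on_down() on failure.

    checks limits the number of iterations (None means run forever).
    """
    done = 0
    while checks is None or done < checks:
        if not share_is_reachable(path):
            on_down()
        time.sleep(interval_s)
        done += 1
```

Logging a timestamp inside `on_down` would make it possible to line the failures up against the ping results and the switch logs.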

Here is the problem…on occasion my client’s API call fails for unknown reasons. I’ll explain some of my trouble-shooting:

-      All NIC cards\port settings have been synched (100\full)
-      The server does not have TCP stack issues, it returned 25 consecutive 65k pings in <1 ms
-      The server’s integrated NIC card has been replaced with a PCI NIC
-      Server load is low, so is network load
-      All gateways, dns\wins servers, etc… all network settings are fine
-      No trace problems, goes from source->switch->destination. Have run all the pathping, tracert, and iperf commands with no issues
-      No errors in the switch logs
-      Here is the kicker! Running consecutive sets of 100 pings at 50k and 3k, about 1 out of every 5 sets has a failure, and about 50% of those failures are on the first ping. The RTT varies quite a bit too…10 in a row are normal, then a return time goes up to 15ms, then normal times, then another 15ms one. These long times are about 1 out of 10 pings (within the set of 100). I’ve also never had a normal 32 byte ping fail, only ones with an increased packet size.
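To put numbers on that last point, a small helper like the one below could summarise each set of pings. This is only a sketch: `summarize_pings` is a made-up name, the RTT list has to be collected by hand or by a script (with `None` for a lost ping), and the 10 ms "slow" threshold is my own assumption, not something measured.

```python
def summarize_pings(rtts_ms, slow_threshold_ms=10.0):
    """Summarise one set of pings.

    rtts_ms: list of round-trip times in milliseconds, None for a lost ping.
    slow_threshold_ms: replies at or above this count as slow (assumed value).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    slow = sum(1 for r in replies if r >= slow_threshold_ms)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "slow": slow,
        # True when the very first ping of the set was the one that failed.
        "first_lost": bool(rtts_ms) and rtts_ms[0] is None,
        "max_ms": max(replies) if replies else None,
    }


# Illustrative, made-up sample: first ping lost, one 15 ms spike.
sample = [None, 1.0, 1.0, 15.0, 1.0]
summary = summarize_pings(sample)
```

Comparing these summaries per packet size (32 byte, 3k, 50k) and per source/destination pair would show whether the failures follow the packet size, the VLAN, or a particular port.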

I’m not sure if this is the switch trying to read\route things around or what? Unfortunately I don’t have the luxury of putting a dumb switch in and seeing what happens. If anyone has any experience\pointers I’d love to know how it worked out.

Thanks

SOLUTION

adamdrayer


SOLUTION

Fatal_Exception


Keith Alabaster

You mention that a number of your machines are on a different subnet. How are these machines connecting? Are you using VLANs or is there a router in the mix here?

Are you getting the same issue from users on both VLANs or just one of them?

Keith Alabaster

Sorry, I see you are using VLANs. How are you converging these? That is, how are you connecting them together? Are you getting the same issue from users on both VLANs or just one of them?

thesultanofswine

ASKER

Thanks for the suggestions...I'll try to answer some of your questions with some detail:

- The ping tests were performed with both name and IP, the error rate was similar.
- I also pinged between different workstations (which eliminated the server), the same error rate occurred.
- About the VLANs, the same error rate is occurring between the machines which are in the same subnet and the 10 machines on the VLAN with a different IP address scheme. There is no router involved, all the clients are hooked up to the Cisco switch and I believe the switch is doing the routing. I have to admit I do not know the specifics about the VLAN or how it is converged.

I'll go try out your above suggestions and get back to everyone with the results.
Thanks again.

Keith Alabaster

Can you tell me which Cisco switch it is? If it's a layer 3 switch then fine. If it's only a layer 2 then that cannot do the converging and there must be something else in the mix doing the routing. Layer 2 devices cannot route :)

thesultanofswine

ASKER

Keith,

The switch is a Cisco Catalyst 6509.

Keith Alabaster

Wooo. we have four of those; layer 3 it is then lol.

Superb bits of kit. Sup 1A's or using the new 720's?

Sorry, back to the question. How are the subnets/VLANs connecting? Boxes directly on switchports or devices at the other end of trunks?
If trunks, what are the access layer boxes at the other ends? If Ciscos, is spanning tree enabled? Could be switching out and taking a few seconds or more to reconverge.

thesultanofswine

ASKER

Keith,

This is where my knowledge of the network really drops out. This network is not at my site and I have no access to the information\setup, besides the basics. This is also getting over my head in the networking department, so I'll try to bring up the questions and see what I get back. Unfortunately some of the people I'm dealing with probably also don't know the specifics to this degree... I'll try to get back with some answers soon.

I do have one question though. In my experience trouble-shooting similar issues with my application, I've noticed the big expensive Cisco switches cause some issues. In situations where we can, we've swapped out the Cisco switches for old dumb Bayview switches and the problem has decreased. Is there some issue with all the logic\work\routing the smart Cisco switches do which is causing the delay? Also one more question: is there a way to take that functionality off certain ports on the Cisco switch so frames just pass through?

Thanks

ASKER CERTIFIED SOLUTION

Keith Alabaster


thesultanofswine

ASKER

All, thank you for the responses. I appreciate your efforts helping me through my issue.

Keith Alabaster

welcome :)

Fatal_Exception

Keith.. great explanation of the layer3 switching using vlans!

and of course, a thanks to sos!

FE

Keith Alabaster