I have a Windows 2003 server with a shared directory that my client application needs. Some Windows XP clients are on the same subnet as the server, and 10 XP machines are on another subnet/VLAN. All machines are connected through the same Cisco switch. Every 20 seconds, each client "pings" the 2003 server to confirm the network is still up. This check is a GetFileAttributes API call against the server’s shared directory, which is also where the clients write their data. If a client’s API call has not returned within 10 seconds, the client application assumes the network is down and moves into another state.
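A minimal Python sketch of the health check described above, for illustration only. It assumes `os.path.exists` as a portable stand-in for the GetFileAttributes call, and the names `probe_share` and `monitor` are hypothetical. The probe runs on a worker thread so the client can stop waiting after 10 seconds even while the call itself is still blocked.

```python
import os
import threading
import time

def probe_share(path, timeout=10.0, probe=os.path.exists):
    """Return True only if probe(path) succeeds within `timeout` seconds.

    os.path.exists is an assumed stand-in for GetFileAttributes.
    A probe that returns False, raises OSError, or is still running
    when the timeout expires is treated as "network down".
    """
    result = {}

    def worker():
        try:
            result["ok"] = bool(probe(path))
        except OSError:
            result["ok"] = False

    # Daemon thread: a hung call cannot keep the process alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    # No "ok" key means the call had not returned when the wait ended.
    return result.get("ok", False)

def monitor(path, on_down, interval=20.0, timeout=10.0, probe=os.path.exists):
    """Probe `path` every `interval` seconds; call on_down() on the first miss."""
    while probe_share(path, timeout, probe):
        time.sleep(interval)
    on_down()
```

With this design a single call that takes longer than the timeout is enough to flip the client into the "network down" state. On Windows, `probe` could instead wrap `ctypes.windll.kernel32.GetFileAttributesW` and treat a return value of `0xFFFFFFFF` (INVALID_FILE_ATTRIBUTES) as a failure.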
Here is the problem: occasionally the client’s API call fails for no apparent reason. Here is what I have checked so far:
- All NIC and switch-port settings have been matched (100/full)
- The server does not have TCP/IP stack issues: it answered 25 consecutive 65 KB pings in under 1 ms
- The server’s integrated NIC has been replaced with a PCI NIC
- Server load is low, and so is network load
- Gateways, DNS/WINS servers, and all other network settings are correct
- No routing problems: traces go source -> switch -> destination. I have run pathping, tracert, and iperf with no issues
- No errors in the switch logs
- Here is the kicker! When I run sets of 100 consecutive pings with 50 KB and 3 KB packets, about 1 in every 5 sets has a failure, and about half of those failures are on the first ping. The RTT also varies quite a bit: 10 in a row are normal, then one jumps to 15 ms, then normal again, then another 15 ms one. These slow replies are about 1 in 10 pings within each set of 100. I have never had a normal 32-byte ping fail, only ones with an increased packet size.
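To report the numbers above consistently across many runs (for example, output saved from `ping -n 100 -l 50000 <server>`), a small tally script can help. This is a hypothetical Python sketch that assumes the English Windows ping output format ("Reply from ...: bytes=... time=...ms TTL=..." and "Request timed out."); `parse_ping`, `summarize`, and the 15 ms threshold are illustrative choices, not part of the original setup.

```python
import re

# Assumed Windows ping reply format, e.g.:
#   Reply from 10.0.0.5: bytes=50000 time=15ms TTL=128
#   Reply from 10.0.0.5: bytes=50000 time<1ms TTL=128
REPLY = re.compile(r"^Reply from .*time[=<](\d+)ms")

def parse_ping(output):
    """Return one entry per probe: RTT in ms, or None for a timeout."""
    rtts = []
    for line in output.splitlines():
        line = line.strip()
        m = REPLY.match(line)
        if m:
            rtts.append(int(m.group(1)))  # "time<1ms" is recorded as 1
        elif line.startswith("Request timed out"):
            rtts.append(None)
        # Other lines (headers, statistics, unreachable) are ignored.
    return rtts

def summarize(sets, slow_ms=15):
    """Count sets with loss, losses on the first ping, and slow replies."""
    failed = [s for s in sets if None in s]
    first_failed = [s for s in failed if s[0] is None]
    replies = [r for s in sets for r in s if r is not None]
    return {
        "sets": len(sets),
        "sets_with_loss": len(failed),
        "loss_on_first_ping": len(first_failed),
        "slow_replies": sum(1 for r in replies if r >= slow_ms),
        "replies": len(replies),
    }
```

Keeping counts rather than percentages makes it easy to compare the 50 KB, 3 KB, and 32-byte runs side by side.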
I’m not sure whether this is the switch trying to inspect or route traffic, or something else. Unfortunately, I don’t have the luxury of swapping in a dumb switch to see what happens. If anyone has experience with this or any pointers, I’d love to hear how it worked out for you.