Scott_V


Intermittent ping problem

Here's the situation: to test a theory I made a simple batch file that ran ping about 50 times per minute.  I let this file run overnight and log to a text file along with a time stamp (provided by time /t).  This .bat file was run from a client machine (A) against a database server machine (D).
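
For anyone who wants to reproduce this kind of test, a rough equivalent of such a logger might look like the Python sketch below. It is not the original .bat file: the function names are made up for illustration, and the `-n 1` flag assumes the Windows `ping` command (Linux and macOS use `-c 1`). Each probe is written with its own timestamp so that gaps can be lined up with application crashes later.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host):
    """Run a single ping and return its raw text output."""
    # Assumption: Windows ping, where "-n 1" sends one echo request.
    result = subprocess.run(
        ["ping", "-n", "1", host],
        capture_output=True,
        text=True,
    )
    return result.stdout


def summarize(ping_output):
    """Reduce ping output to one status line for the log."""
    for line in ping_output.splitlines():
        line = line.strip()
        if line.startswith("Reply from") or line.startswith("Request timed out"):
            return line
    return "no recognizable reply"


def log_line(when, status):
    """Format one timestamped log entry."""
    return when.strftime("%Y-%m-%d %H:%M:%S") + " " + status


def run_logger(host, path, count, interval=1.2, probe=ping_once):
    """Probe `host` `count` times, appending one line per probe to `path`."""
    with open(path, "a") as log:
        for _ in range(count):
            log.write(log_line(datetime.now(), summarize(probe(host))) + "\n")
            log.flush()  # keep the file current if the run is interrupted
            time.sleep(interval)
```

Calling `run_logger("dbserver", "pinglog.txt", count=50)` (both names hypothetical) would take roughly a minute, since the default 1.2 second spacing works out to about 50 probes per minute.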

D = HP ProLiant running SQL 2000 SP3 and Win2000 SP3.  It has dual Intel NICs running in a team using HP's teaming driver.

A = HP d325 (Athlon XP/512MB DDR/integrated 3Com 3c920B), Win2000 SP3

At various times during the night the ping comes back with "Request timed out."  Normally I wouldn't be so "anal" about this, because it seems to occur in only about 0.1% of the lines in the log.  However, machine A runs a program that communicates directly with the DB server and is VERY sensitive to any disconnect or downtime.  One group of "Request timed out" was about 8 in a row (translating to about 10 seconds).  During this time the program running on A crashed out.
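
The two figures quoted above (the overall loss rate and the longest run of consecutive timeouts) can be pulled out of a log like this with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes every probe shows up in the log as a line containing either "Reply from" or "Request timed out", with everything else (time stamps, headers, blank lines) ignored.

```python
def timeout_runs(lines):
    """Return (loss_fraction, longest_run) for an iterable of log lines."""
    total = 0    # probes seen so far
    lost = 0     # probes that timed out
    longest = 0  # longest run of consecutive timeouts
    current = 0  # length of the run currently in progress
    for line in lines:
        if "Request timed out" in line:
            total += 1
            lost += 1
            current += 1
            longest = max(longest, current)
        elif "Reply from" in line:
            total += 1
            current = 0
        # Any other line (time stamps, headers, blanks) is skipped.
    loss = lost / total if total else 0.0
    return loss, longest
```

With probes spaced a little over a second apart, a longest run of 8 corresponds to the roughly 10 second outage described above.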

Topology:  Both machines (all 3 NICs) are plugged into the same switch (Advance Stack 800T 8 port modular switch, Model#: J3245A)

Can anyone think of any reasons why this could be happening?  Changing the program on A is not an option.  Trust me, I wish it were!

-Scott
ASKER CERTIFIED SOLUTION
P1isken
This solution is only available to members.
Scott_V

ASKER

I probably should have mentioned this before, but the computer "D" is a Windows 2000 server with a static IP address...  The client machine, however, does get its address via DHCP, and the last lease was obtained Tuesday November 4th at 3:54:35 PM...

I am making the log available for download at the following site...

http://www.brennertank.com/pdf/pinglog.zip


Examine it at your leisure.  It was taken beginning on Tuesday the 4th at 2:28 pm and ending some time around 9:18 am on the 5th...

-Scott
SOLUTION
I would not use ping, because a ping reply is dropped as soon as there is too much traffic on the wire ...
So install a "network monitor" tool in promiscuous mode to see what really happens...
Maybe it is just some other heavy network load that is hanging your application (just like the ping).
Setting some NICs to 10Mbit/half duplex may be a good way to slow things down enough to find the error.
Marc
Scott_V

ASKER

Any suggested tools?  The switch is modular and by HP, but it's not managed...  The topology consists mainly of switches.  The reason I'm asking is that I was using some packet sniffers to monitor internet traffic, but these will probably not work too well on a switch, especially a non-managed one.  Also, don't forget (and maybe I didn't say this before) that computers A and D share the same switch.  They are connected as follows...

A -------100Mb------[Switch]======200Mb====== D

so in theory it'd be impossible for D to be overloaded, and at the time of day it's occurring, A should have almost no load whatsoever.  Being on a switch, these two computers' loads should be all that matters, as non-broadcast packets should only go from A to D and not to X or Y.  (correct me if I'm wrong)

-Scott
SOLUTION
Steve Jennings

Scott_V

ASKER

Think I figured it out (it's been running for a week and a half with no "Request timed out" messages).

It was related to our Dual-ported NIC configuration.  Basically, we have a NIC from HP that has dual ports, and is designed for load balancing (which allows the up-to-200Mbit/sec transfers).  The drivers and such are all up-to-date, but it was not that.

It was the way the card initiated the balancing of the load.  Originally we had the card configured in a "split all traffic" manner (so if 150Mbit/sec of traffic came in, 75Mbit/sec would go to port 1 and the other 75 would go to port 2, all the time).  The new configuration, which seems to have solved the problem, is a trigger-based load balance, e.g. when port 1 hits XX% usage, then port 2 takes the traffic above that....
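
To make the difference between the two modes concrete, here is a toy model in Python. It only restates the arithmetic of the description above and is in no way how the HP teaming driver is implemented; the function names, the 100Mbit port capacity default and the trigger value used in the example call are all assumptions for illustration.

```python
def split_evenly(load_mbit):
    """'Split all traffic' mode: each port always carries half the load."""
    half = load_mbit / 2
    return half, half


def spill_over(load_mbit, trigger_percent, port_capacity_mbit=100.0):
    """Trigger-based mode: port 2 carries only the traffic above the trigger.

    trigger_percent stands in for the "XX%" setting, whose real value
    is not given in the post.
    """
    threshold = port_capacity_mbit * trigger_percent / 100
    port1 = min(load_mbit, threshold)
    port2 = load_mbit - port1
    return port1, port2
```

For the 150Mbit/sec case, `split_evenly(150)` gives 75 on each port, while `spill_over(150, trigger_percent=80)` (80 being a made-up value) puts 80 on port 1 and 70 on port 2. Under a light load such as 20Mbit/sec, the trigger-based mode keeps everything on port 1 and leaves port 2 idle.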

And SteveJ, man, I was so sure that I WOULD never figure out what the heck was going on.  Luckily I did because it saved the company a considerable amount of wasted "we-need-new-software-to-replace-the-picky-software" time and effort.  (Which, as you know, translates directly into lost $$.  ;)

Thanks for all your efforts.  Anyone who answered will get 1/x of the points.

-Scott
Scott_V

ASKER

After re-reading the posts, I decided to give the majority of the points to P1isken, but split the remaining points up evenly.

Thanks again,
Scott