Solved

Intermittent ping problem

Posted on 2003-11-05
Last Modified: 2008-02-26
Here's the situation: to test a theory, I made a simple batch file that ran ping about 50 times per minute.  I let it run overnight, logging to a text file along with a time stamp (provided by time /t).  The .bat file was run from a client machine (A) against a database server machine (D).
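
For anyone wanting to reproduce this kind of test, here is a minimal sketch of a comparable logger in Python rather than a .bat file. It is not the original batch file: the function names, the log line format, and the use of "TTL=" to recognize a successful reply are all assumptions, and `ping -n 1` is the Windows form of the command.

```python
import subprocess
import time
from datetime import datetime


def classify(ping_output: str) -> str:
    """Reduce the output of one ping invocation to a short status string."""
    if "Request timed out" in ping_output:
        return "Request timed out."
    if "TTL=" in ping_output:  # assumption: a normal Windows reply line carries TTL=
        return "Reply"
    return "Unknown"


def log_line(status: str, now: datetime) -> str:
    """Format one log record: a time stamp followed by the status."""
    return f"{now:%Y-%m-%d %H:%M:%S} {status}"


def run(host: str, logfile: str, per_minute: int = 50) -> None:
    """Ping `host` roughly `per_minute` times a minute, appending to `logfile`."""
    interval = 60.0 / per_minute
    while True:
        started = time.monotonic()
        # -n 1 sends a single echo request (Windows ping syntax).
        result = subprocess.run(
            ["ping", "-n", "1", host], capture_output=True, text=True
        )
        with open(logfile, "a") as fh:
            fh.write(log_line(classify(result.stdout), datetime.now()) + "\n")
        # Sleep for whatever is left of this interval, if anything.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

Calling `run("D", "pinglog.txt")` would loop until interrupted; the host name and file name there are placeholders.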

D = HP ProLiant running SQL 2000 SP3 and Win2000 SP3.  It has dual Intel NICs running in a team using HP's teaming driver.

A = HP d325 (Athlon XP/512MB DDR/integrated 3Com 3c920B), Win2000 SP3

At various times during the night the ping comes back with "Request timed out."  Normally I wouldn't be so "anal" about this, because it seems to occur in only about 0.1% of the lines in the log; however, machine A runs a program that communicates directly with the DB server and is VERY sensitive to any disconnect or downtime.  One group of "Request timed out" lines was about 8 in a row (translating to about 10 seconds).  During this time the program running on A crashed.

Topology:  Both machines (all 3 NICs) are plugged into the same switch (Advance Stack 800T 8-port modular switch, Model#: J3245A)

Can anyone think of any reasons why this could be happening?  Changing the program on A is not an option.  Trust me, I wish it were!

-Scott
Question by:Scott_V
8 Comments
 
LVL 4

Accepted Solution

by:P1isken
P1isken earned 50 total points
ID: 9690696
Because it is so intermittent, there is no exact point of failure that we can identify at this time... My suggestion would be to possibly switch from the dual-NIC solution to a single-NIC solution, because there may be an issue in the teaming software that causes this... Another thing to look at is whether the failures happen at consistent times... Can you post the times of the failures? Also check whether those times coincide with any other network-level event, for example an IP lease renewal with the DHCP server... Take your DHCP lease time and cut it in half to get the first renewal.

At a command prompt, type "ipconfig /all" (without the quotes, of course)...

This will give you the time the lease was obtained and when it will expire, if you need that to figure out the lease time. Lease renewals have been known to cause issues with VPNs and home DSL/cable routers...
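
To make the "cut it in half" arithmetic concrete, here is a small sketch in Python. By default a DHCP client first tries to renew at 50% of the lease (T1) and falls back to rebinding at 87.5% (T2). The function name is made up for illustration, and you would plug in the "Lease Obtained" and "Lease Expires" values that ipconfig /all reports.

```python
from datetime import datetime


def renewal_times(obtained: datetime, expires: datetime):
    """Return (T1, T2): the default DHCP renewal and rebinding times."""
    lease = expires - obtained
    t1 = obtained + lease * 0.5    # first renewal attempt, half the lease
    t2 = obtained + lease * 0.875  # rebinding attempt if renewal failed
    return t1, t2
```

If the computed T1 lands close to one of the "Request timed out" groups in the log, the lease renewal is worth a closer look; if not, it can probably be ruled out.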

Other than the above, all you can do is look for a correlation with something else... My reasoning is that you say it happens at night, when typical network maintenance occurs.

Good Luck...
 
LVL 1

Author Comment

by:Scott_V
ID: 9690809
I probably should have mentioned this before, but the computer "D" is a Windows 2000 server with a static IP address...  The client machine, however, does get its address from DHCP, and the last lease was obtained Tuesday, November 4th at 3:54:35 PM...

I am making the log available for download at the following site...

http://www.brennertank.com/pdf/pinglog.zip


Examine it at your leisure.  It was taken beginning on Tuesday the 4th at 2:28 pm and ending some time around 9:18 am on the 5th...
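
For picking the failure times out of a log like this, a rough sketch in Python follows. It assumes the log has already been parsed into (time stamp, status) pairs; the exact layout of the file is not described here, so the parsing step is left out, and the function name is hypothetical.

```python
from datetime import datetime

TIMEOUT = "Request timed out."


def timeout_runs(records):
    """Group consecutive timeouts into (first_time, last_time, count) runs.

    `records` is an iterable of (datetime, status) pairs in log order.
    """
    runs = []
    start = last = None
    count = 0
    for when, status in records:
        if status.strip() == TIMEOUT:
            if count == 0:
                start = when  # first timeout of a new run
            last = when
            count += 1
        elif count:
            runs.append((start, last, count))  # a reply ended the run
            count = 0
    if count:  # the log ended in the middle of a run
        runs.append((start, last, count))
    return runs
```

The start times of the longer runs are the ones to line up against lease renewals, scheduled jobs, and other overnight events.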

-Scott
 
LVL 1

Assisted Solution

by:mdecroos
mdecroos earned 25 total points
ID: 9694190
I would not use ping, because a ping reply can be dropped as soon as there is too much traffic on the wire ...
So install a "network monitor" tool in promiscuous mode to see what really happens...
Maybe it is just some other heavy network load that is hanging your application (just like the ping).
 
LVL 1

Expert Comment

by:mdecroos
ID: 9694193
Putting some NICs in 10 Mbit/half duplex may be a good way to slow things down enough to find the error.
Marc
 
LVL 1

Author Comment

by:Scott_V
ID: 9694474
Any suggested tools?  The switch is modular and made by HP, but it's not managed...  The topology consists mainly of switches.  The reason I'm asking is that I was using some packet sniffers to monitor internet traffic, but these will probably not work too well on a switch, especially an unmanaged one.  Also, don't forget (and maybe I didn't say this before) that computers A and D share the same switch.  They are connected as follows...

A -------100Mb------[Switch]======200Mb====== D

So in theory it'd be impossible for D to be overloaded, and at the time of day it's occurring, A should have almost no load whatsoever.  Being a switch, these computers' load should be all that matters, as non-broadcast packets should only go from A to D and not to X or Y.  (Correct me if I'm wrong.)

-Scott
 
LVL 16

Assisted Solution

by:SteveJ
SteveJ earned 25 total points
ID: 9743275
It's a good academic question . . . and you are the one who used the term "anal" . . . but not knowing what's happening on the switch and on both machine interfaces at the exact time of the ping loss makes diagnosing this kind of problem practically impossible. I think I'd probably set up a batch file on "D" doing the same thing that "A" is doing and see if the loss rate is the same for both . . . same time, etc. If it is, you can be fairly certain it's a network issue. If not, then you may be looking at some sort of goofy anomalous condition that you'll never find.
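
The comparison could be sketched in Python along these lines. This is only an illustration: the function names and the 5-second tolerance are assumptions, and both logs are assumed to be already reduced to status strings and timeout time stamps.

```python
from datetime import datetime, timedelta

TIMEOUT = "Request timed out."


def loss_rate(statuses):
    """Fraction of log lines that are timeouts (0.0 for an empty log)."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    lost = sum(1 for s in statuses if s.strip() == TIMEOUT)
    return lost / len(statuses)


def coinciding(a_times, d_times, tolerance=timedelta(seconds=5)):
    """Timeouts seen on A that have a timeout on D within `tolerance`."""
    return [a for a in a_times if any(abs(a - d) <= tolerance for d in d_times)]
```

Similar loss rates with most timeouts coinciding would point at something shared between the two machines; timeouts on only one side would point at that machine or its NIC setup.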

Good luck,
Steve
 
LVL 1

Author Comment

by:Scott_V
ID: 9748627
Think I figured it out (it's been running for a week and a half with no "Request timed out" messages).

It was related to our dual-ported NIC configuration.  Basically, we have a NIC from HP that has dual ports and is designed for load balancing (which allows the up-to-200Mbit/sec transfers).  The drivers and such are all up to date, so it was not that.

It was the way the card balanced the load.  Originally we had the card configured in a "split all traffic" manner (so if 150Mbit/sec of traffic came in, 75Mbit/sec would go to port 1 and the other 75 would go to the other port, all the time).  The new configuration, which seems to have solved the problem, is a trigger-based load balance, e.g. when port 1 hits XX% usage, port 2 takes the traffic above that....
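
To illustrate the difference between the two modes, here is a toy model in Python. It is not how the HP teaming driver is implemented; the function names are made up, and the 80% trigger and 100Mbit/sec port capacity are arbitrary placeholders, not the values from our configuration.

```python
def split_evenly(load_mbit):
    """'Split all traffic' mode: each port always carries half the load."""
    half = load_mbit / 2
    return half, half


def spill_over(load_mbit, port_capacity_mbit=100.0, trigger_percent=80):
    """Trigger-based mode: port 1 carries traffic up to the trigger level,
    and port 2 only takes whatever is above it."""
    threshold = port_capacity_mbit * trigger_percent / 100
    port1 = min(load_mbit, threshold)
    port2 = load_mbit - port1
    return port1, port2
```

With 150Mbit/sec coming in, the first mode gives 75/75 and the toy second mode gives 80/70. The practical difference is at low load: in the trigger-based model a light stream such as ping stays entirely on port 1, while the split mode spreads even light traffic across both ports.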

And SteveJ, man, I was so sure that I WOULD never figure out what the heck was going on.  Luckily I did because it saved the company a considerable amount of wasted "we-need-new-software-to-replace-the-picky-software" time and effort.  (Which, as you know, translates directly into lost $$.  ;)

Thanks for all your efforts.  Anyone who answered will get 1/x of the points.

-Scott
 
LVL 1

Author Comment

by:Scott_V
ID: 9748649
After re-reading the posts, I decided to give the majority of the points to  P1isken, but split the remaining points up evenly.

Thanks again,
Scott