marcum

asked on

What to look for in Wireshark when losing network connection

I have a network running behind a WatchGuard VPN firewall (which serves our internal DHCP) with two 24-port Linksys managed switches. Several remote locations connect in through the WatchGuard VPN firewall. I have three centrally located servers: one Windows domain server, one SCO Unix server and one Linux server. I'm running Wyse 60 terminal emulation on Windows XP machines, as well as Neoware dumb terminals, to connect to the SCO Unix server.

So my problem is as follows.
At this time we seem to be intermittently losing connection to the SCO Unix server on all machines. It seems the SCO box drops off the network for a brief moment, on the internal network as well as the external network.
For example, if I connect to the server with PowerTerm (a Windows XP terminal emulation app) on Monday, I may have no problems until Wednesday morning. On Wednesday morning I will run PowerTerm and it will not be able to reach the server on the first attempt. However, if I open a second window, it will reach the server with no problem.
Similarly, on the Neoware dumb terminals, if I cannot connect I have to reboot the terminal to get a connection.

I checked the logs on both switches, and they have been up for 20 days. I also checked the SCO logs and there was nothing unusual; the link only went down and up when we replaced the two switches 20 days ago.

Finally, I have installed an Ubuntu Linux box as a test machine with Cacti and Wireshark. I mirrored the switch port that the SCO box is plugged into, and I'm using Wireshark to sniff the packets and Cacti to graph the usage through the mirrored port on my test box.

So my questions are:
1. What should I look for in Wireshark to determine the problem?
2. What else can I do to find out what the problem is?
ASKER CERTIFIED SOLUTION
Jan Bacher

(1) Are you sure it has nothing to do with DHCP leases?
(2) Are there any power management settings active on the SCO server? (Look in the BIOS.)
marcum

ASKER

Jesper: I don't see any errors on the switch.

Moor: OK, that is good to know. It doesn't help me with searching through Wireshark, but it helps to eliminate some possibilities.

Another question: if the NIC on the SCO box is bad, how can I tell? And how can I monitor the CPU usage on the SCO box?
You can eliminate DHCP as the problem by:

(1)  Trying fixed IPs on devices

(2)  Using a different method of issuing DHCP

(3)  Using a different lease time; a short lease works well in conjunction with Wireshark filtering

Use filters in Wireshark to home in on DHCP packets.  I believe (quick rummage on Google) that DHCP uses UDP ports 67 and 68, so set up a filter using those criteria.
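
The port-based filtering described above can be sketched roughly as follows. This is only an illustration and not a Wireshark API: the `is_dhcp` helper and the sample packet summaries are hypothetical, and the two filter strings are the usual capture-filter and display-filter forms as I understand them, so check them against your Wireshark version.

```python
# DHCP servers listen on UDP port 67 and clients on UDP port 68.
DHCP_PORTS = {67, 68}

# Filter strings to type into Wireshark (assumed syntax, verify on your version).
CAPTURE_FILTER = "udp port 67 or udp port 68"        # capture (BPF-style) filter
DISPLAY_FILTER = "udp.port == 67 || udp.port == 68"  # display filter


def is_dhcp(protocol, src_port, dst_port):
    """Return True if a packet looks like DHCP, judged by UDP port numbers only."""
    return protocol.lower() == "udp" and (
        src_port in DHCP_PORTS or dst_port in DHCP_PORTS
    )


# Hypothetical packet summaries: (protocol, source port, destination port).
packets = [
    ("udp", 68, 67),    # client -> server
    ("udp", 67, 68),    # server -> client
    ("tcp", 1042, 23),  # terminal session to the server, not DHCP
    ("udp", 1050, 53),  # DNS lookup, not DHCP
]

# Keep only the packets that match the DHCP port criteria.
dhcp_only = [p for p in packets if is_dhcp(*p)]
print(dhcp_only)
```

If DHCP traffic shows up at the same moments the terminals lose their sessions, that points at lease renewal; if it does not, DHCP can probably be crossed off the list.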

If you suspect the NIC is missing packets, put another NIC in the SCO box (easier said than done in Unix, I know that much).  That reminds me: there was a bug with NICs with Realtek chipsets missing packets some years ago. This was, I emphasise, some years ago, and it was the Windows drivers that needed to be tweaked, so it is not directly relevant to Unix, but it might spark some research.
marcum

ASKER

As an update, I analyzed the Wireshark captures, and basically what I'm seeing is the server sending requests for missing packets; Wireshark shows them as out of sequence. However, I can see the client sending the "missing" packets to the server through the mirrored port. So there is an issue somewhere on the server side. Now I have to figure out whether the cable is bad (I'm replacing it), whether the NIC is bad, or whether the server is being overloaded. I don't think it's being overloaded, because according to Cacti peak traffic is only about 2 Mb/s and typical traffic is about 50 Kb/s. Right now the port is syncing up at 1 Gb/s; maybe I should force it to 100 Mb/s.

So after all this, I'm going to say I should replace the cable and then the NIC.
I'll know more on Monday.
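
For anyone following along, the symptom above (the server asking for segments that the mirror port shows were sent) is what Wireshark's TCP analysis flags mark; display filters such as `tcp.analysis.retransmission`, `tcp.analysis.duplicate_ack` and `tcp.analysis.out_of_order` isolate those packets. The check behind those flags can be sketched roughly as below. This is a simplified illustration, not how Wireshark implements it: `find_gaps`, `count_out_of_order` and the sample segments are made up, it looks at one direction of one connection, and it ignores sequence-number wraparound.

```python
def find_gaps(segments):
    """Return (start, end) byte ranges missing from a one-directional TCP stream.

    segments: iterable of (sequence_number, payload_length).
    Simplification: sequence-number wraparound is ignored.
    """
    gaps = []
    expected = None  # next sequence number we expect to see
    for seq, length in sorted(segments):
        if expected is not None and seq > expected:
            gaps.append((expected, seq))  # bytes expected..seq-1 never showed up
        end = seq + length
        if expected is None or end > expected:
            expected = end
    return gaps


def count_out_of_order(segments):
    """Count segments that start below the highest byte already seen.

    Such a segment is either out of order or a retransmission.
    """
    highest = None
    count = 0
    for seq, length in segments:  # arrival order matters here
        if highest is not None and seq < highest:
            count += 1
        highest = max(highest or 0, seq + length)
    return count


# Hypothetical capture: (sequence number, payload length) in arrival order.
segments = [(1000, 100), (1100, 100), (1300, 100), (1200, 100), (1500, 100)]

print(find_gaps(segments))
print(count_out_of_order(segments))
```

Running the same check on a capture taken at the mirror port and, if possible, on the server itself would show whether the segments go missing before or after they reach the NIC.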
I don't know how Cacti is monitoring performance; usually with this kind of thing there is the "participant observer" effect to consider.  Cacti is governed by similar constraints to the problem being monitored, and therefore may be affected in a similar way.  The chronological order in which you are seeing "snooped" traffic is not necessarily gospel if it is coming from different sources.  However, if the server is requesting missing packets, then I would have thought this is a clear indicator of the nature of the problem.

Re the performance indicator: if you were to draw a graph of activity, the mean of that graph might trundle along quite nicely at a reasonable value, but if that activity happens to occur all at the same time then there may be a problem.  It is the time between packets emanating from the same source that is important.  Slugging the system to 100 Mb/s sounds like a grand idea; it might be a handshake/overflow problem.
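
To illustrate the point about averages hiding bursts, here is a small calculation. It is only a sketch under made-up assumptions: `rate_summary`, `inter_arrival_times`, the one-second window and the sample packets are all hypothetical, and in practice the timestamps and sizes would come from an exported capture.

```python
from collections import defaultdict


def rate_summary(packets, window=1.0):
    """Compare mean throughput with the busiest window.

    packets: list of (timestamp_seconds, size_bytes).
    Returns (mean_bits_per_second, peak_bits_per_second).
    """
    if not packets:
        return 0.0, 0.0
    start = min(t for t, _ in packets)
    end = max(t for t, _ in packets)
    per_window = defaultdict(int)  # window index -> bytes seen in that window
    for t, size in packets:
        per_window[int((t - start) // window)] += size
    n_windows = int((end - start) // window) + 1
    total_bits = 8 * sum(size for _, size in packets)
    mean = total_bits / (n_windows * window)
    peak = 8 * max(per_window.values()) / window
    return mean, peak


def inter_arrival_times(packets):
    """Return the gaps, in seconds, between consecutive packets."""
    times = sorted(t for t, _ in packets)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical traffic: ten 1500-byte packets within 10 ms, then one straggler.
packets = [(i * 0.001, 1500) for i in range(10)] + [(9.0, 1500)]

mean_bps, peak_bps = rate_summary(packets)
gaps = inter_arrival_times(packets)
print(mean_bps, peak_bps, min(gaps), max(gaps))
```

With these sample numbers the busiest second carries several times the average rate, and within that second the packets arrive about a millisecond apart. A graph averaged over minutes would flatten that out, which is why the inter-packet times are worth looking at directly.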