Sporadic LAN Disconnects

I've got a network with 8 computers hooked into a hub. Recently I have been getting a lot of client disconnects. What's really strange is that I have a file server that also shares its DSL line via ICS. On many of the computers I will be disconnected from the shared drive on the file server but will still be able to access the internet. The only thing I have found that helps is to swap hubs. Luckily I have a spare. When I swap it in, it works for about eight hours and then I begin getting disconnects again. I have rotated a total of three 8-port hubs, and each time it works for a short time and then stops. Any ideas or questions would be greatly appreciated.

Power spikes?

Does turning the hub off and then on again help you?
egorzik (Author) commented:
No.  I've tried restarting the hub and that seems to have no effect.  It does run from a surge protector as well.
Is something flooding the network, overloading the hub?  You could put a packet sniffer on the network to see if any unusual traffic exists.  You could use a switch in place of the hub to see if that helps...

Can you ever put the same hub back in?
>On many of the computers I will be disconnected from the shared drive
>on the file server but will still be able to access the internet.

>I've tried restarting the hub and that seems to have no effect.

I'd go for the sniffer: www.ethereal.com (free)

btw: A lot of "hubs" today really are switches...
If you're only seeing traffic from the station you install it on, try your other hubs.
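
A minimal sketch of that check in Python, assuming you have already pulled (source MAC, destination MAC) pairs out of the capture; the function names and sample addresses are hypothetical. On a true shared hub the sniffer sees unicast frames between other stations; if nearly everything it sees is either to/from its own MAC or addressed to broadcast/multicast, the box is switching. A switch also floods unicast frames for destinations it has not learned yet, so a handful of foreign frames is not conclusive on its own.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast MACs: the I/G bit (lowest bit of the first octet) is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def foreign_unicast_count(local_mac: str, frames) -> int:
    """Count unicast frames seen by the sniffer that neither came from nor went to local_mac.

    frames: iterable of (src_mac, dst_mac) strings such as "00:0c:29:00:00:01".
    A large count means the device is repeating everyone's traffic (a real hub).
    """
    local = local_mac.lower()
    count = 0
    for src, dst in frames:
        src, dst = src.lower(), dst.lower()
        if is_group_address(dst):
            continue  # broadcasts/multicasts reach every port on hubs and switches alike
        if local not in (src, dst):
            count += 1
    return count
```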

Are you running up to date virus software on all these machines?
Checked for Welchia / Blaster, etc?

Checked for spy/adware?

All patched up?

Checked the port speed and duplex on your server to make sure it matches your switch/hub?
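
To check for Blaster/Welchia from the wire rather than machine by machine: both worms spread by probing TCP port 135 (the DCOM RPC port) on large numbers of addresses, so an infected host stands out as one source sending connection attempts to many distinct destinations on that port. The sketch below assumes you have already extracted (source IP, destination IP, destination port) tuples for TCP SYN packets from a capture; the 20-target threshold is an arbitrary, hypothetical cut-off.

```python
from collections import defaultdict

RPC_PORT = 135  # DCOM RPC endpoint mapper, the port Blaster and Welchia probe


def suspected_scanners(syns, threshold=20):
    """Flag hosts that sent TCP SYNs to port 135 on many distinct addresses.

    syns: iterable of (src_ip, dst_ip, dst_port) for TCP SYN packets.
    threshold: hypothetical cut-off for "many" distinct targets.
    Returns {src_ip: number_of_distinct_targets} for hosts at or above it.
    """
    targets = defaultdict(set)
    for src, dst, port in syns:
        if port == RPC_PORT:
            targets[src].add(dst)
    return {src: len(dsts) for src, dsts in targets.items() if len(dsts) >= threshold}
```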
egorzik (Author) commented:
I've scanned all workstations for viruses and found nothing. No spyware/adware that I am aware of. I keep all machines up to date patch-wise. I'll try a sniffer; I haven't gone that route yet.

Yes, I can put one of the other hubs back in. Is it possible for a hub to overheat? It would be strange if that were the cause since, in my opinion, we have very little traffic for an 8-port hub (i.e., very rarely is everyone accessing the file server or the internet at the same time).

We did have Blaster32 a month or so back, but I cleaned all the infected computers and patched them. Actually, we didn't have this problem even when we had the Blaster virus. The only thing I noticed with Blaster was that internet speeds were slow (we were on dialup then) and our outgoing bps was much higher than our incoming bps (which of course is unusual unless you share files over the internet).
OK. So you can't just reboot the hub, but you can put it back after a period of time. That sure suggests a physical problem to me. Something about the environment is whacking those hubs for a short period of time. Heat wouldn't be out of the question, but it's a damn strange problem, that's for sure.

I'd try moving the whole kit and caboodle somewhere else if you can. Eliminate as much of the physical side as you can. Simply replacing a hub with another one should not fix this sort of problem. (You might also try turning off the offending hub for a few minutes and then restarting it. If the hub reboot is too quick and something nasty is going on on your network, that might account for the "swap it back in later" behavior.)
egorzik (Author) commented:
A little information on the environment. I have it located in a spare office. This building was built about two years ago from the ground up, and all the network and phone lines were installed by a local phone/internet/cable TV company (Fidelity Communications). They're a big company that covers half or more of Missouri. One issue I have wondered about is that the server is located about four feet from our phone system. I have heard of instances where the electromagnetic fields from such devices interfere with network lines (especially when there is a lot of network activity). Again, I'm just throwing this out there. With a phone/internet company doing the install I would assume this is all safe, but what do you guys think?

I appreciate the comments so far. Keep 'em coming!
egorzik (Author) commented:
I do have one more thing to add. The UPS battery died not too long ago. I can't say for sure whether the problems started when this occurred, but I thought I would mention it. I have a replacement battery on order. As I mentioned before, in the meantime everything (computer, hub, the whole deal) is on a standard surge protector.

Further to my previous post, moving the system presents somewhat of a problem, as the building's wiring keeps the hub tethered to the phone system area. I could move the file server and leave the hub by the phone system if you think that may be worth a shot.
Unless there is an electrical transformer or some other heavy-duty equipment in that closet, PBXs generally don't generate interference, as they are as susceptible to it as any computer. If you suspect EMI or RFI, exposed cable is more likely to be the problem than the machine itself. As you experienced this problem with the worm active, the sniffer is where I'd start.
egorzik (Author) commented:
I've got the sniffer running on the file server and one client. Being computer savvy but far from an expert (i.e., I've built a few computers and can solve most hardware/network issues), what am I looking for with the sniffer?
egorzik (Author) commented:
OK, the network is now acting up after being connected successfully for around six hours. I saved the sniffer capture file (it's around 180 MB, I think). Now what exactly am I looking for in the file? Everyone on the network is still able to access the DSL connection shared on the server, but I can't get into the shared files. I'm also able to print to a shared printer. This time I tried to access the shared folders on the other clients from the file server and still got nothing. The hub's LEDs show traffic when I try to access the network but no traffic when I'm not trying. Does this indicate that it is NOT a traffic overflow, or can an overflow happen and then lock the hub regardless of current traffic?

At this point I am going to leave things as they are so if you guys want me to try anything while I'm having the problem, I can.
At this point, I usually start comparing what is going on now with what is going on when things are working properly.  How many packets?  What type of packets?  Who is the big network hog?  Who is he talking to?  What happens when you try to connect to a share?

If you can compare things between when they are good and when they are bad, you can usually figure out who is responsible, even if you don't yet know why.
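
The good-versus-bad comparison can be sketched like this, assuming per-host packet counts (keyed by MAC or IP) and the length of each capture in seconds; normalising to packets per second keeps a six-hour "good" capture comparable to a short "bad" one. The function name and inputs are illustrative, not part of any sniffer's API.

```python
def rate_changes(good, good_secs, bad, bad_secs):
    """Compare per-host packet rates between a 'good' and a 'bad' capture.

    good, bad: dicts mapping a host identifier (MAC or IP) to a packet count.
    good_secs, bad_secs: capture durations, used to convert counts to packets/second.
    Returns (host, good_rate, bad_rate) tuples, biggest increase first.
    """
    rows = []
    for host in set(good) | set(bad):
        good_rate = good.get(host, 0) / good_secs
        bad_rate = bad.get(host, 0) / bad_secs
        rows.append((host, good_rate, bad_rate))
    # Hosts whose rate jumped the most when things broke are the first suspects.
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows
```

A host that appears only in the "bad" capture, or whose rate jumps by an order of magnitude, is the one to look at packet by packet.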
egorzik (Author) commented:
Thanks Robing66066,  I'll start in on it and post my results.
The thing about Ethereal vs. a commercial sniffer is the analysis...
Sniffer Pro et al have neat TOP TEN BANDWIDTH graphs and traffic pattern pie charts...

You're looking for LOTS of traffic from one MAC address
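
If you want a rough top-talkers list without a commercial analyzer, a stdlib-only Python sketch can tally frames and bytes per source MAC from a capture saved in the classic libpcap format (the format Ethereal writes by default). Assumptions: Ethernet frames only; pcapng files and other link types are rejected rather than parsed.

```python
import struct
from collections import Counter

ETHERNET = 1  # libpcap link-layer type for Ethernet


def top_talkers(capture: bytes, n: int = 10):
    """Return up to n (source_mac, frames, bytes) tuples, busiest sender first.

    capture: contents of a classic libpcap file (not pcapng) with Ethernet frames.
    Bytes are the original on-the-wire frame lengths, even if the capture was truncated.
    """
    if len(capture) < 24:
        raise ValueError("too short to be a libpcap file")
    magic = capture[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic libpcap file")
    (linktype,) = struct.unpack_from(endian + "I", capture, 20)
    if linktype != ETHERNET:
        raise ValueError(f"expected Ethernet captures, got link type {linktype}")

    frames, octets = Counter(), Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(capture):
        # Per-record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", capture, offset)
        offset += 16
        frame = capture[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 12:
            continue  # too short to contain a source MAC
        src = ":".join(f"{b:02x}" for b in frame[6:12])
        frames[src] += 1
        octets[src] += orig_len
    return [(mac, count, octets[mac]) for mac, count in frames.most_common(n)]
```

Call it as `top_talkers(open("capture.pcap", "rb").read())`, where the file name is hypothetical; a 180 MB capture fits comfortably in memory on most machines.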
egorzik (Author) commented:
OK, I have started comparing sniffer files. While I was doing that I unplugged the hub's power lead and let it sit for 10 minutes. When I plugged the power lead back in, it worked. So, correct me if I am wrong, but it sounds to me like I have one of the following problems:

1) The hub is overheating (this option is doubtful).

2) The hub is being flooded.

3) The hub is experiencing power spikes.

Does this sound right?
I wouldn't narrow it down quite so far yet.  I'd go here:

1.  Hub has a physical problem

2.  Hub is being flooded.
egorzik (Author) commented:
Two questions come to mind...

Can flooding a hub cause the loss of some network traffic but not all (i.e., would that jibe with my shared folders being inaccessible while I still have full access to the internet)?

Could a hub be affected by power spikes even while running through a surge protector?
egorzik (Author) commented:
I would think we can rule out a physical problem, as I have rotated three hubs and the problem occurs on each. Do you agree?
I agree, it's definitely not a physical problem with the hub itself, as you have used multiple devices. Yes, some traffic will still get through when a hub is flooded/overrun. Analyze the sniffer results for any unusual or heavy traffic. Can you get your hands on a managed switch? It would let you monitor the network much better.
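
To put a number on "flooded", you can compute what fraction of the hub's capacity each second of a capture uses. This sketch assumes a 10 Mbit/s hub (the thread never states the speed; pass `link_bps=100_000_000` for a 100 Mbit/s one) and (timestamp, frame length) pairs taken from the capture. It ignores the preamble and inter-frame gap, so it slightly understates the real load. Seconds sitting near 100% would support the flooding theory, while low numbers during a lockup would argue against it.

```python
from collections import defaultdict


def utilization_per_second(packets, link_bps=10_000_000):
    """Percent of link capacity used in each whole second of a capture.

    packets: iterable of (timestamp_seconds, frame_length_bytes).
    link_bps: assumed hub speed in bits per second (10 Mbit/s by default).
    Returns {second: percent}, sorted by second.
    """
    bits = defaultdict(int)
    for ts, length in packets:
        bits[int(ts)] += length * 8  # bucket each frame into its whole second
    return {sec: 100.0 * b / link_bps for sec, b in sorted(bits.items())}
```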
egorzik (Author) commented:
Great... Looks like I have some work to do with the packet sniffer.
I was thinking of a physical problem in the sense of an environmental problem that causes a physical problem with whatever hub is put there, but I agree, the odds are pretty low.
egorzik (Author) commented:
One thing that stands out is that I see a lot of "LANMAN WPrintQGetInfo Request" entries. Any idea what that might be? I'm seeing this in the client's log.
What does the ip address in that mean to you?
Is this a broadcast?
Can you post the whole line?
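
Whether a destination is a broadcast depends on the subnet. A minimal sketch using Python's `ipaddress` module, assuming the ICS default LAN of 192.168.0.0/24 (adjust `lan_cidr` if the network differs); the function name is hypothetical:

```python
import ipaddress


def classify_destination(dst_ip: str, lan_cidr: str = "192.168.0.0/24") -> str:
    """Describe an IPv4 destination relative to the LAN (ICS default subnet assumed)."""
    net = ipaddress.ip_network(lan_cidr)
    addr = ipaddress.ip_address(dst_ip)
    if addr == ipaddress.ip_address("255.255.255.255"):
        return "limited broadcast"  # never forwarded beyond the local segment
    if addr == net.broadcast_address:
        return "subnet broadcast"   # all-ones host part, e.g. 192.168.0.255
    if addr.is_multicast:
        return "multicast"
    if addr in net:
        return "unicast to a LAN host"
    return "unicast off the LAN"
```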

egorzik (Author) commented:
I didn't save the log from that session, sorry. I rescanned and now I'm not getting it. I'll start saving all my logs from now on.

I don't know if this helps, but I'm getting this error on the file server running Win2K. I cut this from the event log (should have checked that on Friday):

Event Type:      Error
Event Source:      ipnathlp
Event Category:      None
Event ID:      32003
Date:            12/4/2003
Time:            2:43:52 PM
User:            N/A
Computer:      SERVER
The Network Address Translator (NAT) was unable to request an operation of the kernel-mode translation module. This may indicate misconfiguration, insufficient resources, or an internal error. The data is the error code.
0000: 1f 00 00 00               ....    

Don't know if that will help...  I'm getting this from the server when I try to access another computer on the network while the hub is 'locked up'.
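
The event data is a little-endian DWORD, so the bytes `1f 00 00 00` decode to 0x1F, i.e. 31. The event does not say which error space the code belongs to; if it is a Win32 system error code, 31 is ERROR_GEN_FAILURE ("A device attached to the system is not functioning"), and on a Windows machine `ctypes.FormatError(31)` returns the system's own text for it. A small sketch of the decoding:

```python
import struct


def event_data_dword(hex_bytes: str) -> int:
    """Decode the first four bytes of Event Viewer binary data as a little-endian DWORD."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    (value,) = struct.unpack_from("<I", raw)
    return value


code = event_data_dword("1f 00 00 00")  # the data bytes from the ipnathlp 32003 event
print(code, hex(code))  # 31 0x1f
```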
egorzik (Author) commented:
I think I have it fixed and I think it was saturation of the hub as was stated earlier.  I installed a router and disabled ICS.  Seems to have corrected the problem as it has not 'locked up' for the past five days.  Thanks for all your help guys!
PAQed, with points refunded (125)

Community Support Moderator
