GuyOwen

Network Users Lose Internet Connectivity Randomly

I'm not an Expert. Take it easy on me! I'm flying by the seat of my pants here, but we do have a Tech person I can pass this along to. He has tried everything he can think of for the past three months with no results.

We have the following conditions.
Problem: Internet connection is lost at random times throughout the day. Internal network connections remain alive on everyone, but access to the Internet is lost for about ten minutes -- but only certain Users -- and this will occur throughout the day, off and on. Every computer can still access all network drives during the outages.

Server is Windows 2003.
Internet connection is DSL provided by XO Communications.
DSL Modem connects to Firewall box.
Firewall is connected to one of three 3Com Baseline Switches - not managed.
There are about 25 Users out of 40 that have Internet access.
There is a mix of computers running XP Pro, NT, Win98 or Win2K.

Symptoms are...
You always have access to the Network, itself.
Periodically, groups of Internet-access people will not be able to reach the Internet.
Lasts only about 10 minutes during each outage.

We have tried...
XO Communications could not find any problems after monitoring the line for two weeks.
Outages occurred during that time but nothing shows up on their equipment.
We then isolated to one computer connected to the DSL Router, bypassing Firewall and the Network -- connects every time during an outage. Therefore, XO claims it is not them.
Reconnect Router and those who were experiencing an outage still have it until it ends.
We've tried turning off NetBEUI (leftover from an old installation of the original NT Server).
We've replaced all three Switches.
We've replaced the Firewall.
We've replaced the Router.
We've disconnected all Printers during an outage -- no effect.
We've scanned with PestPatrol, SpySweeper, etc. to clear out all Spyware, Malware, etc.
We have Norton Antivirus Corporate Edition feeding down from the Server to all machines.
We've run 3Com and other packet sniffers looking for activity -- all we've found are a large number of Port 1900 activity indicators, but nothing in the way of files being transferred or sent or received.

Only odd behaviour that seems to be happening is all Ports on the back of the Switches are blinking on and off at the exact same time. That seemed to be happening only during the outages when the original Switches were installed. But now it occurs almost all the time on the new Switches.

I saw a similar posting here referring to NAT configuration problems restricting the number of connections.
Anybody think that might be the problem here?
Any other ideas?

Outage seems to move around. It does not appear to be the same people all the time, but the outage seems to last the same period of time. It does not affect only one type of Operating System -- people with XP could be "disconnected" just as easily as those running NT or anything else.

My questions to our Tech guy are...
a) Why can't we monitor all connections and all activity throughout the day and get a report?
b) Can't we set up some kind of constant ping situation so we can see a report of which computer loses the Internet, and when?
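
One way the "constant ping" idea in (b) could be sketched is a small script left running on each workstation. This is only an illustration, not something anyone in this thread ran: the function names are made up for the example, and the target address in the comment is a placeholder. The `ping` flags shown are the usual Windows and Linux ones; other systems may differ, and the exit code is only a rough signal (Windows can report success on some "unreachable" replies).

```python
import platform
import subprocess


def ping_once(host, timeout_s=1):
    """Send one ICMP echo to `host`; True if ping exits with status 0."""
    if platform.system() == "Windows":
        # -n = count, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux-style flags: -c = count, -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def find_outages(samples):
    """Collapse (timestamp, reachable) samples into (start, end) spans.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end gets end=None.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, None))
    return outages


# Hypothetical collection loop (placeholder address, 30-second interval):
#   import time
#   samples = []
#   while True:
#       samples.append((time.time(), ping_once("192.0.2.1")))
#       time.sleep(30)
```

Pinging both the firewall's inside address and an outside address from each machine, then comparing the outage spans across machines, would show who lost the Internet and when, and whether the path to the gateway itself was dropping.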

Ideas have been that it could be a bad NIC somewhere within the Network -- but why would that cause random blocking and not affect the Network connections, only the Internet?
ASKER CERTIFIED SOLUTION
Les Moore
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
GuyOwen

ASKER

Okay, I split the points for the two who bothered to Answer -- and I thank you for the effort!

What this ended up being was a second Router located in a back office, with two printers plugged into it. The reason it was set up this way -- or so I was told -- is that we bought a new color laser for the workgroup back in May of 2004. There was no additional LAN Port nearby, so the installer decided to simply add a small Router to use as a Hub. His "solution" was to grab the closest one he could find -- which happened to be an unused one lying in the Server Room. The only problem was that this was the old $40 Linksys Router we once used to connect our LAN to the Internet last year.

The old $40 Linksys Router still had its IP address configured from when it was our Internet Router. It was eventually replaced with a newer, industrial-strength Router, which was given that same address.
So both were set to the same IP Address.
Once the used one was added back into the Network, it started creating collisions and conflicts with the new one. This occurred, seemingly, only when jobs were sent to the two printers that were also connected to that Router. And it was somehow intermittent -- some days, not much of a problem -- other days, it happened frequently.

How did we discover it?
I eventually got so frustrated that I asked the ISP to reset the flash memory on their new DSL Modem while our outside Tech Guy worked with them, thinking it must be a problem with the ISP's equipment. As soon as we started doing this and tried to reinstall the used IP Address, it reported errors or conflicts that would not allow it to be reestablished on the Network. The Tech Guy we hired couldn't understand what the problem was, other than to say "It's reporting a computer or device on the Network called 'NR-401' as having the same address as the new Router, so I changed the new Router Address to something completely different. It seems to work, now. But we need to find out what device is named NR-401 -- I don't know what that could be."

So in my uneducated naivete, I jumped onto Google and typed in "NR-401" -- and up jumped the devil -- the Linksys 4-Port Router by that same name. Then I remembered seeing it used as a connection point for the two printers. I then went to that area of the shop and removed it. Everything is running as fast as it is supposed to, now.

Moral of this story...
a) Don't leave components lying around.
b) Watch out when more than one Tech Guy works in the building on different aspects of the Network.
c) Google is a goddamned Godsend!! I love those guys!

The only questions that still linger, for me, are...
1) Why did it take from May to July before these conflicts really started becoming noticeable?
2) How come the guys we pay $90 to $120 an hour couldn't find it immediately once we reported the symptoms?
3) Isn't there some simple test they could have run that would report back all of the components and IP Addresses associated with them -- and wouldn't two units have popped up with the same address?
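
On question 3: a duplicate address generally shows up as one IP being answered for by more than one hardware (MAC) address. A minimal sketch of that check follows, assuming you already have a list of (IP, MAC) pairs observed on the LAN, for example from ARP replies seen by a packet sniffer. How the pairs are collected is not shown, and the addresses in the example are made up. Note that the ARP table on a single machine (`arp -a`) holds only one MAC per IP at a time, so a one-off look at it would not reveal the conflict by itself.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `observations` is an iterable of (ip, mac) string pairs, e.g. taken
    from captured ARP replies. MACs are compared case-insensitively.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical observations: two different devices answer for 192.168.1.1.
observations = [
    ("192.168.1.1", "00:11:22:33:44:55"),
    ("192.168.1.1", "00:AA:BB:CC:DD:EE"),
    ("192.168.1.20", "00:11:22:33:44:66"),
    ("192.168.1.20", "00:11:22:33:44:66"),
]
conflicts = find_ip_conflicts(observations)
for ip, macs in conflicts.items():
    print(f"{ip} is claimed by: {', '.join(macs)}")
```

In a case like this one, the two MAC addresses listed for the gateway IP would have pointed at two physical boxes, and the vendor prefix of the unexpected MAC could help identify the stray device.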

Thanks everybody. I hope this story helps someone else someday.

Guy Owen
Washington DC

ASKER

A few more things...
1) I did not mean to belittle the answers given in any way. I was giving the best details I had at the time, so the question may have been a little unfair. Both were good responses, but if one had said "You may have two devices on the Network sharing the same IP Address..." -- THAT would have been spot-on.

2) In my explanation of our attempts to resolve this, you will notice I said we turned off the printers involved. THAT was very near the solution to the problem, but the one thing that would have immediately given us a clue was if we had thought to DISCONNECT that Router. We turned off the printers, true, but the Router was still connected and turned ON. And, apparently, that's why we saw no real change simply because we turned off the printers.

3) The more I think about this, it is such a simplistic issue that it really boggles the mind that so much time and effort passed before we isolated the problem. In fact, if I had not taken a final stance and called XO Communications on my own to ask them to please reflash their DSL Modem, I doubt it would ever have been discovered. It may not be completely fair to my Tech Guy because it was another installer who grabbed the problematic Router, but it frustrates me that a simple Network Report of some kind could not have discovered this -- or, more to the point, that he did not resolve it that way.

4) All of the fears that jumped to the forefront prove, once again, that my theory may be true: Most every "problem" with computers is User Error. It was NOT, after all, an outside attack, not a virus, not a bad NIC, not a problem with Server 2003, etc. It was simply a dumb decision.

Guy
Thanks for the update, Guy!
Glad you finally found it. That was a bugger to track down.

ASKER

Yes, I agree. The one saving moment was when I decided to have the ISP reflash their modem. The reflashing did nothing toward solving the issue, but the mere act of doing it made our outside Tech Guy notice a problem when they couldn't add it back into the Network because there was an IP Address conflict. The other small Router did not give any notice of any kind. But it sure made sense once we found out the problem. The only reason I asked them to reflash the Modem was because XO suddenly decided they were going to charge us $300 to replace the existing one. I almost agreed to pay it (don't ask me why).