Internet connections interrupted

Hello all,

This is long, please bear with me.

For the past week we have been having severe latency and interruptions to our internet connectivity. Messages from R-U-0n and MxWatch show that our resources are periodically unreachable online; the timing varies. We have an MPLS network and all sites are affected, though the main site seems to be affected more. I have removed our own firewall from the mix, but that did not solve the problem. The ISP has their own FrontLine firewall.

These interruptions affect computers on any switch. Switch diagnostics show no errors or packet loss, and Wireshark reports 0 errors. Resources inside the LAN are not affected: all servers and printers are available as usual while the Internet interruptions occur.

Current example: A continuous ping from outside to the NATted address of Server A times out 5 times, then comes back for one reply, then repeats in that pattern (5 off, 1 on) with little variation. A continuous ping to Server A from Server B inside the LAN replies in <1 ms with no timeouts at all. Both servers are on the same gigabit switch. Could this still happen if the switch, a port, or the cable were bad? I was able to remote to Server B at the same time I was unable to remote to Server A. After about 20 minutes I got disconnected from Server B, then after a few minutes it was available again. I am able to remote to Server A from my Server B session inside the LAN with no issues.
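
A pattern like "5 off, 1 on" is easier to confirm when the ping log is collapsed into runs. The following is only a minimal sketch, assuming the ping outcomes have already been collected as a list of booleans (True for a reply, False for a timeout); the function name `run_lengths` and the sample data are hypothetical and not taken from the actual logs.

```python
from itertools import groupby


def run_lengths(results):
    """Collapse a sequence of ping outcomes into (outcome, count) runs.

    `results` is an iterable of booleans: True for a reply, False for a timeout.
    """
    return [(ok, sum(1 for _ in group)) for ok, group in groupby(results)]


# Hypothetical sample mimicking the pattern described above:
# five timeouts followed by one reply, repeated three times.
sample = ([False] * 5 + [True]) * 3

for ok, count in run_lengths(sample):
    print("reply  " if ok else "timeout", count)
```

A very regular run pattern would suggest something periodic (a timer, a table refresh, a flapping path) rather than random loss, though the summary alone does not identify the cause.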

Our ISP says there are no errors on the circuit. However, Verizon came to fix what they said was a circuit problem on 7/31 (they did not touch the data room, only the demarc), and it was right after they left that the constant monitoring messages began.

Any ideas on what could be causing the problem would be appreciated.
ASKER CERTIFIED SOLUTION
Exchange_Geek

ASKER

Thanks Exchange_Geek,

It's happening now. I just put in a ticket and will call momentarily.

Tracerts take 22 hops to a server that is now answering pings well, but 24 hops to a server currently showing the periodic ping timeouts, with 2 tracert timeouts after the trace gets into the ISP network, just before it reaches the server's outside address.

Firewall logs, when the firewall was running, showed the only drops to be from forbidden sites and RBL email servers. No other dropped connections.

Switch diags are OK, but could this still possibly be a switch issue, something subtle? A cable issue? I don't think so, but I would hate to be wrong on that. I just simultaneously pinged 5 NATted boxes on different switches; they all timed out together about 30 times, then all responded at once. To me, that says it's not the switch. Sounds like the outside router.
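
The reasoning above (hosts on different switches failing together points upstream of the switches) can be checked against recorded ping rounds. This is a rough sketch under stated assumptions: each host's results are stored as a list of booleans, one per round, all the same length; the host labels and the `classify_rounds` helper are made up for illustration.

```python
def classify_rounds(samples):
    """Count ping rounds where every host replied, none replied, or results were mixed.

    `samples` maps a host label to a list of booleans (True = reply),
    one entry per ping round, all lists the same length.
    """
    counts = {"all_up": 0, "all_down": 0, "mixed": 0}
    for round_results in zip(*samples.values()):
        if all(round_results):
            counts["all_up"] += 1
        elif not any(round_results):
            counts["all_down"] += 1
        else:
            counts["mixed"] += 1
    return counts


# Hypothetical data: 5 hosts that time out together for 30 rounds, then all reply.
samples = {f"box{i}": [False] * 30 + [True] * 3 for i in range(1, 6)}
print(classify_rounds(samples))
```

Many "all_down" rounds with few "mixed" rounds would be consistent with a shared element such as the outside router or circuit; a high "mixed" count would point back toward individual switches, ports, or hosts.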

No virtual server, and the server event logs show no drops either, except for when I rebooted the switches.

Yes, I have requested that Verizon explain what they did.

Regards,
Quay
Check if you can restart the switch. I know this isn't possible during production hours, but if it becomes possible, you could always tell them the switch was rebooted to check that there is no snag or hung switch in production.

Regards,
Exchange_Geek

ASKER

Yes, I restarted all the switches, more than once, with no hitches (I did it during production too, so I could make sure everyone could reconnect; I have an understanding group of users). The problem always comes back, though it seems to come back more slowly after a reboot. Could be a perception thing though.

The ISP tech says maybe it's the cable. He was in the process of getting into the router, but feels that if it were the router, nothing would be available. I think that getting to the server from the inside may not touch the outside router, which is why the inside session works and the outside session does not. He is escalating to someone who knows more about this. I am going into the office to change the cable to the outside router. May as well...
Thanks again, I will keep you posted.
Best
Quay
All the best :)

Regards,
Exchange_Geek

ASKER

Update: Yesterday morning the ISP tech got into the router that I use for the gateway at this location and said he could not get out of the outside router. (Our MPLS was overbuilt.) He rebooted the outside router and the connectivity issue was gone for a few hours, but it came back. Overnight I could ping NATted machines and get timeouts, but I could ping those same machines from inside the LAN and get good replies. That led me to believe that there is a fault in the router. BUT, how is it that mail is still flowing, that some machines can connect and others cannot, and that some machines can connect to some internet sites but not others?

More... I set up continuous pings to all of the switches from 3 different machines. On the switch with the most users, reply times spike as high as 316 ms (32 bytes), stay up there for 5 minutes at a time, then go back to 1 ms. This moves around, meaning sometimes I see the spikes from one machine, then from another. Does THIS sound like the switch could be the problem? Could that be causing the latency and the disconnects?
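
To compare spikes seen from different machines, it can help to reduce each ping log to a list of spike intervals. The sketch below assumes the log has already been parsed into (timestamp in seconds, round-trip time in ms) pairs; the 100 ms threshold, the `spike_intervals` name, and the sample readings are illustrative assumptions, not measured values apart from the 316 ms peak mentioned above.

```python
def spike_intervals(samples, threshold_ms=100.0):
    """Return (start, end, peak_ms) for each stretch of consecutive readings above threshold_ms.

    `samples` is a list of (timestamp_seconds, rtt_ms) tuples in time order.
    """
    intervals = []
    start = end = peak = None
    for ts, rtt in samples:
        if rtt > threshold_ms:
            if start is None:
                # A new spike begins at this reading.
                start, peak = ts, rtt
            else:
                peak = max(peak, rtt)
            end = ts
        elif start is not None:
            # The reading dropped back under the threshold: close the spike.
            intervals.append((start, end, peak))
            start = end = peak = None
    if start is not None:
        # The log ended while a spike was still in progress.
        intervals.append((start, end, peak))
    return intervals


# Hypothetical readings, one per minute, with a spike lasting several minutes.
readings = [(0, 1), (60, 1), (120, 250), (180, 316), (240, 300),
            (300, 280), (360, 290), (420, 1)]
print(spike_intervals(readings))
```

If the intervals from the three machines overlap in time, the slowdown is more likely on the switch or its uplink; if they never overlap, the source machines or their own ports deserve a closer look.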

ASKER

Thanks for taking the time to answer me. After days of arguing with the ISP and lots of finger pointing, the problem 'mysteriously' went away all by itself. I learned a lot about switching and troubleshooting these types of issues though, due in large part to your suggestions. They put me on the path to good articles about that part of networking.