Internet connections interrupted

Hello all,

This is long, please bear with me.

For the past week we have been having severe latency and interruptions to our internet connectivity. Messages from R-U-0n and MxWatch show that our resources are periodically unreachable online; the timing varies. We have an MPLS network and all sites are affected, though the main site seems to be affected more. I have removed our own firewall from the mix, but that did not solve the problem. The ISP has its own FrontLine firewall.

These interruptions affect computers on any switch. Switch diags show no errors or packet loss. Wireshark reports 0 errors. Resources inside the LAN are not affected: all servers and printers are available as usual while the Internet interruptions occur.

Current example: A continuous ping from outside to the NATed address of Server A times out 5 times, then comes back for one reply, then repeats in that pattern (5 off, one on) with little variation. A continuous ping to Server A from Server B inside the LAN replies in <1ms with no timeouts at all. Both servers are on the same gigabit switch. Could this still happen if the switch, a port or the cable were bad? I was able to remote to Server B at the same time I was unable to remote to Server A. After about 20 minutes I got disconnected from Server B, then after a few minutes it was available again. I am able to remote to Server A from my Server B session inside the LAN with no issues.
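One way to put numbers on a pattern like "5 off, one on" is to collapse a ping log into runs of consecutive replies and timeouts. This is only a minimal sketch, assuming the ping results have already been recorded as a list of booleans (True for a reply, False for a timeout); the function name `run_lengths` and the sample log are made up for illustration.

```python
from itertools import groupby


def run_lengths(results):
    """Collapse a sequence of ping outcomes into (outcome, count) runs.

    `results` is assumed to be an iterable of booleans:
    True = reply received, False = request timed out.
    """
    return [(outcome, sum(1 for _ in group)) for outcome, group in groupby(results)]


# Hypothetical log matching the pattern described above:
# five timeouts followed by one reply, repeated three times.
log = ([False] * 5 + [True]) * 3
for outcome, count in run_lengths(log):
    print("reply  " if outcome else "timeout", "x", count)
```

A list of runs like this is easier to hand to an ISP than a general "it times out sometimes".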

Our ISP says there are no errors on the circuit. However, Verizon came to fix what they said was a circuit problem on 7/31 (they did not touch the data room, only the demarc), and it was right after they left that the constant monitoring messages began.

Any ideas on what could be causing the problem would be appreciated.
quaybj Asked:
 
Exchange_Geek Commented:
Next time this issue happens, I'd suggest running a tracert from outside the network to your internal box and checking the number of hops it goes through.

In practice, any of those hops can cause problems for you. These issues come down to one of two places: a problem at the ISP's end or at yours.

If the ISP says everything is good at their end, you can run simultaneous pings to your network from the internet and from inside the ISP's network, and compare the two.

If the pings from the internet are unreliable but the pings from the ISP's network are fine, the ISP is at fault.

Example
Example 1) Internet - - - > ISP -------> Your network
Example 2) ISP -------> Your network.

Next, if the pings from the internet show glitches and the ISP's pings show them too, that indicates a problem at your end.
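The comparison above can be written down as a small decision rule. This is a rough sketch only: it assumes both ping runs were captured at the same time as lists of booleans (True for a reply), and the 5% loss threshold, the function names and the "inconclusive" case are assumptions added for illustration, not part of the advice itself.

```python
def loss_rate(results):
    """Fraction of pings that timed out; `results` holds True for each reply."""
    results = list(results)
    if not results:
        raise ValueError("no ping results supplied")
    return results.count(False) / len(results)


def likely_fault(from_internet, from_isp, max_loss=0.05):
    """Apply the two-vantage-point comparison to simultaneous ping runs.

    `max_loss` is an assumed threshold for calling a path healthy.
    """
    internet_ok = loss_rate(from_internet) <= max_loss
    isp_ok = loss_rate(from_isp) <= max_loss
    if internet_ok and isp_ok:
        return "no fault observed"
    if not internet_ok and isp_ok:
        return "ISP side"
    if not internet_ok and not isp_ok:
        return "your end"
    # Internet path clean but ISP path lossy: the rule above does not cover this.
    return "inconclusive"


# Hypothetical runs: heavy loss from the internet, clean from the ISP network.
internet = ([False] * 5 + [True]) * 4
isp = [True] * 24
print(likely_fault(internet, isp))
```

The rule is only as good as the two vantage points: both runs need to target the same address over the same time window.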

At your end, check your main firewall logs for connection drop-outs. Check the switch diags too, although you mentioned they show no issues.

Also, you mentioned that Wireshark shows no errors, so my next thought would be to check the server logs. Are these servers virtualized? If so, there could be an issue with the virtualization host's NIC.

Now, you mentioned Verizon came and fixed something. Ask them to explain what they fixed, how they went about troubleshooting it, and how they discovered the problem.

Regards,
Exchange_Geek
 
quaybj (Author) Commented:
Thanks Exchange_Geek,

It's happening now. I just put in a ticket and will call momentarily.

Tracerts take 22 hops to a server that is currently answering pings well, but 24 hops to a server currently showing the periodic ping timeouts, with 2 tracert timeouts after it gets into the ISP network, just before it reaches the server's outside address.

Firewall logs, when the firewall was running, showed the only drops to be from forbidden sites and RBL email servers. No other dropped connections.

Switch diags are OK, but could this still possibly be a switch issue, something subtle? A cable issue? I don't think so, but I would hate to be wrong on that. I just simultaneously pinged 5 NATed boxes on different switches: they all timed out together about 30 times, then all responded at once. To me, that says it's not the switch. Sounds like the outside router.
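The "they all timed out together" observation can be checked mechanically when the pings are logged. The sketch below assumes each host's results are stored as a list of booleans (True for a reply), one entry per probe round, with the rounds lined up in time across hosts; the host labels and data are hypothetical.

```python
def shared_outages(results_by_host):
    """Return the probe rounds in which every host timed out together.

    `results_by_host` maps a host label to a list of booleans
    (True = reply), one entry per probe round, all lists the same length.
    """
    series = list(results_by_host.values())
    if not series:
        return []
    if len({len(s) for s in series}) != 1:
        raise ValueError("all hosts need the same number of probe rounds")
    # A round counts as shared only if no host replied in it.
    return [i for i, round_ in enumerate(zip(*series)) if not any(round_)]


# Hypothetical data: five hosts on different switches, six probe rounds.
pings = {
    "host1": [True, False, False, False, True, True],
    "host2": [True, False, False, False, True, True],
    "host3": [True, False, False, False, True, False],
    "host4": [True, False, False, False, True, True],
    "host5": [True, False, False, False, True, True],
}
print(shared_outages(pings))
```

Rounds where every host drops at once point toward something the hosts have in common, such as the outside router or the circuit, while a drop on a single host (host3 in the last round here) is left out.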

No virtual servers, and the server event logs show no drops either, except for when I rebooted the switches.

Yes, I have requested that Verizon explain what they did.

Regards,
Quay
 
Exchange_Geek Commented:
Check whether you can restart the switch. I know this usually isn't possible during production hours, but if it is possible, you could always tell them that the switch was rebooted to rule out a snag or a hung switch in production.

Regards,
Exchange_Geek
 
quaybj (Author) Commented:
Yes, I restarted all the switches, more than once, with no hitches (I did it during production too, so I could make sure everyone could reconnect; I have an understanding group of users). The problem always comes back, though it seems to come back more slowly after a reboot. That could be a perception thing, though.

The ISP tech says maybe it's the cable. He was in the process of getting into the router, but feels that if it were the router, nothing would be available. I think getting to the server from the inside may not touch the outside router, which would be why the inside session works and the outside session does not. He is escalating to someone who knows more about this. I am going into the office to change the cable to the outside router. May as well...
Thanks again, I will keep you posted.
Best
Quay
 
Exchange_Geek Commented:
All the best :)

Regards,
Exchange_Geek
 
quaybj (Author) Commented:
Update: yesterday morning the ISP tech got into the router that I use as the gateway at this location and said he could not get out of the outside router. (Our MPLS was overbuilt.) He rebooted the outside router and the connectivity issue was gone for a few hours, but it came back. Overnight I could ping NATed machines and get timeouts, but I could ping those same machines from inside the LAN and get good replies. That led me to believe that there is a fault in the router. BUT, how is it that mail is still flowing, that some machines can connect and others cannot, and that some machines can connect to some internet sites but not others?

More... I set up continuous pings to all of the switches from 3 different machines. The switch with the most users spikes, with reply times as high as 316 ms (32 bytes), stays up there for 5 minutes at a time, then goes back to 1 ms. This moves around, meaning sometimes I see the spikes from one machine, then from another. Does THIS sound like the switch could be the problem? Could that be causing the latency and the disconnects?
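To compare spikes like these across switches and source machines, it can help to pull the slow stretches out of each ping log. The following is a sketch under stated assumptions: the round-trip times are already collected as a list of milliseconds (one per ping), and the 100 ms cutoff, the function name `spike_runs` and the sample values are arbitrary choices for illustration.

```python
def spike_runs(rtts_ms, threshold_ms=100.0):
    """Find stretches of consecutive samples whose RTT exceeds the threshold.

    Returns a list of (start_index, length, peak_ms) tuples.
    `rtts_ms` is an assumed list of round-trip times in milliseconds;
    `threshold_ms` is an arbitrary cutoff for what counts as slow.
    """
    runs = []
    start = None
    for i, rtt in enumerate(rtts_ms):
        if rtt > threshold_ms:
            if start is None:
                start = i  # a new spike begins here
        elif start is not None:
            runs.append((start, i - start, max(rtts_ms[start:i])))
            start = None
    if start is not None:  # the series ended while still in a spike
        runs.append((start, len(rtts_ms) - start, max(rtts_ms[start:])))
    return runs


# Hypothetical samples: mostly 1 ms, with two slow stretches.
samples = [1, 1, 250, 316, 290, 1, 1, 180, 1]
print(spike_runs(samples))
```

If timestamps are kept alongside the samples, the spike runs from each machine can be lined up against the times the outside pings failed, which may help show whether the two are related.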
 
quaybj (Author) Commented:
Thanks for taking the time to answer me. After days of arguing with the ISP and lots of finger pointing, the problem 'mysteriously' went away all by itself. I learned a lot about switching and troubleshooting these types of issues though, due in large part to your suggestions. They put me on the path to good articles about that part of networking.