Asked by Xetroximyn

VPN -- every 5 mins, traffic stops for 30 seconds -- sometimes only 1 way

I have 2 VPNs from a remote call center.

The voice was on 1 VPN, and the data (terminal sessions) on the other.

On the voice VPN, traffic stopped every 5 minutes, sometimes only in one direction (e.g. a ping from the remote office through the VPN kept working, while a ping to that PC from HQ dropped 20 packets). Other times traffic stopped in both directions.

The data VPN was totally stable, so we moved the VoIP boxes to that VPN, and suddenly the same thing started happening there.

We have restarted all routing/VPN equipment, and we have verified that we are not exceeding the bandwidth on our T1s.

Any ideas?  



 

Xetroximyn (asker)

One time when this happened, I ran a tracert from HQ to the external IP of the router where the VPN terminates. The tracert ran fine with no bottlenecks, while the pings through the VPN failed. See the attached screenshot (tracrt.PNG).
Also, when the VPN appears to be failing, the green light stays lit on the SonicWall VPN page (I did refresh), and it does not renegotiate Phase 1.
Do you have SonicWall appliances at both ends of the VPN? I've seen this problem before, where Cox was the Internet provider; we had to change the MTU on the WAN interface. Run the following command against an external IP address: ping -f -l 1024 <IP Address>. Adjust the size in steps of 8 to find the largest value that still gets a reply, then change the MTU on the WAN interface based on that number. The default MTU on the SonicWall is 1500.
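A rough way to automate that search, sketched in Python under stated assumptions: `probe(size)` is a hypothetical callback that sends one don't-fragment ping carrying `size` bytes of payload (for example by wrapping `ping -f -l size <IP Address>`) and returns True if a reply came back. Note that the `-l` value is only the ICMP payload; on IPv4 each packet also carries a 20-byte IP header and an 8-byte ICMP header, so the path MTU is the largest working payload plus 28.

```python
from typing import Callable

IPV4_HEADER = 20  # bytes: IPv4 header without options
ICMP_HEADER = 8   # bytes: ICMP echo request header

def largest_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload size in [low, high] for which probe() succeeds.

    Assumes probe(low) succeeds and that any size smaller than a working size
    also works, which holds when a single path MTU is the limit.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits without fragmentation
        else:
            high = mid - 1   # mid is too large
    return low

def path_mtu(probe: Callable[[int], bool], max_mtu: int = 1500) -> int:
    """Largest unfragmented ICMP payload plus the IPv4 and ICMP headers."""
    overhead = IPV4_HEADER + ICMP_HEADER
    return largest_payload(probe, 0, max_mtu - overhead) + overhead

# Simulated path where 1472-byte payloads pass and 1473 needs fragmenting.
print(path_mtu(lambda size: size <= 1472))  # 1500
```

If 1472 bytes is the largest payload that gets a reply (the 1473-byte result reported later in this thread points that way), the arithmetic gives 1472 + 28 = 1500, the same as the SonicWall default. Binary search needs about 11 probes over 0 to 1472 instead of stepping through sizes 8 bytes at a time.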
Actually, I have a Cisco router on one end. Does that make a difference? In which direction should I run the ping, then?
JanSc

I have seen this before with VoIP phones sending marked traffic (i.e. DSCP EF) for QoS. The tunnel stopped accepting that traffic after some period because of a buffer overflow.
It may be worth trying to disable the QoS settings on the voice devices.
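To make "marked traffic" concrete: DSCP EF is code point 46 (RFC 3246), carried in the upper six bits of the IPv4 TOS byte, so an EF-marked packet has a TOS byte of 46 << 2 = 184 (0xB8). The Python sketch below shows that arithmetic. `open_marked_udp_socket` is a hypothetical helper for generating EF-marked test traffic from a host whose platform exposes the `IP_TOS` socket option (Linux, for example); it is not how the phones themselves are configured.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point (RFC 3246), binary 101110

def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be a 6-bit value (0-63)")
    return dscp << 2  # the low 2 bits are the ECN field, left at 0 here

def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Hypothetical test helper: UDP socket whose outgoing packets carry this DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock

print(hex(dscp_to_tos(DSCP_EF)))  # 0xb8
```

Seeing 0xB8 in the TOS/DiffServ field of a capture is a quick way to tell whether the voice boxes are actually marking their traffic.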

Another possible problem with SonicWall appliances is spanning-tree handling on one side of the tunnel, which breaks the logical link. Why? I don't remember. Attach a sniffer on the SonicWall to see whether something happens just before the tunnel breaks.
Are you running the Enhanced OS on your SonicWall? Connecting a Cisco and a SonicWall is not hard, but getting the settings right can be. Did you check the MTU on the SonicWall side?
I have a SonicWall NSA 240. I tried the large ping, but I just get "Packet needs to be fragmented but DF set." once I get to 1480 (1473, really).

JanSc, by disabling QoS settings on the voice devices, do you mean changing the settings on the voice devices themselves?
Then you need to set the MTU to 1473 on the WAN interface.

Unless you have bandwidth management configured on your SonicWall, QoS won't matter, as I understand it anyway. Even if your devices use QoS, your SonicWall will ignore this information and send all your traffic with the same priority.
It sounds like you need to enable bandwidth management on both ends.
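A toy model of that point, as a Python sketch: it only decides the order in which a batch of already-queued packets leaves the interface. With `honor_dscp=False` it behaves like the plain first-in, first-out handling described above (markings ignored); with `honor_dscp=True` it models a simple strict-priority queue that sends EF-marked packets first. This illustrates the idea only and is not SonicWall's bandwidth-management algorithm, which the thread does not describe.

```python
from typing import List, Tuple

DSCP_EF = 46  # Expedited Forwarding, a common marking for voice

Packet = Tuple[str, int]  # (label, DSCP value)

def send_order(queued: List[Packet], honor_dscp: bool) -> List[str]:
    """Order in which a batch of queued packets would leave the interface."""
    if not honor_dscp:
        # Markings ignored: first in, first out.
        return [label for label, _ in queued]
    # Strict priority: all EF packets first, then the rest, each in arrival order.
    voice = [label for label, dscp in queued if dscp == DSCP_EF]
    other = [label for label, dscp in queued if dscp != DSCP_EF]
    return voice + other

queue = [("data1", 0), ("voice1", DSCP_EF), ("data2", 0), ("voice2", DSCP_EF)]
print(send_order(queue, honor_dscp=False))  # ['data1', 'voice1', 'data2', 'voice2']
print(send_order(queue, honor_dscp=True))   # ['voice1', 'voice2', 'data1', 'data2']
```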
Something I should probably mention: we ran fine for many months, then suddenly started having these problems. Could the ISP have changed its MTU size or something similar, and triggered these issues?

It's what we experienced with Cox.  You haven't updated the firmware or OS on either appliance?
Just so I am clear -- are you saying that Cox DID change their MTU size?
I'm getting the following errors over and over again in the SonicWall logs:

Received notify. NO_PROPOSAL_CHOSEN
IKE Initiator: Start Quick Mode (Phase 2)

Does this have any significance? What setting should I look for in the Cisco?

If you are already in Phase 2, then your PSK is correct. Check to make sure the Phase 2 settings are correct.
NO_PROPOSAL_CHOSEN means something does not match between the two ends in one of the phases. Check the following:

Phase 2 encryption/authentication
Phase 2 lifetime (Cisco defaults differ from SonicWall's, and the lifetime sometimes differs between Phase 1 and Phase 2)

Also, how do you have IKE authentication under the first tab configured?
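The checklist above amounts to a field-by-field comparison of the two ends' proposals. A minimal Python sketch, assuming hypothetical field names: the SonicWall values are the Phase 1 settings the asker posts in this thread, while the Cisco values are invented for illustration (a DH group left unset, which is what the thread eventually traces the problem to, plus a different lifetime). Real IKE negotiation is more forgiving on some fields (some implementations accept a shorter lifetime from the peer), so treat this as a checklist, not a model of the protocol.

```python
from typing import Dict, List, Optional

# Field name -> configured value; None means "not set on this device".
Proposal = Dict[str, Optional[object]]

def mismatched_fields(local: Proposal, peer: Proposal) -> List[str]:
    """Return the sorted names of fields that are unset on either side or differ."""
    fields = sorted(set(local) | set(peer))
    return [
        name for name in fields
        if local.get(name) is None or peer.get(name) is None
        or local.get(name) != peer.get(name)
    ]

# SonicWall Phase 1 values as posted by the asker.
sonicwall_phase1: Proposal = {"dh_group": 1, "encryption": "DES", "hash": "MD5", "lifetime_s": 86400}
# Hypothetical Cisco values, for illustration only.
cisco_phase1: Proposal = {"dh_group": None, "encryption": "DES", "hash": "MD5", "lifetime_s": 3600}

print(mismatched_fields(sonicwall_phase1, cisco_phase1))  # ['dh_group', 'lifetime_s']
```

An unset field is flagged on purpose: when a value does not appear in the config, the device falls back to its own default, which may not be what the other end expects.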

I looked through the MySonicWall forums and found this suggestion; I'm just copying and pasting the posts. As they say, I don't know your exact settings, so I'm giving you something to at least look at. Hope it helps:

When you enter something into Cisco and it doesn't display them in the SHOW RUN, that usually means they are defaults. NO PROPOSAL CHOSEN is typically due to mis-matched remote IP networks. Short of giving us a copy of the Cisco 2851 config, and at least describing the TZ170 config (or a TSR), I'm not sure there is much we can do for you here. Not enough info.

Well, I found that the "crypto map tosonicwall" was on the wrong interface; it was supposed to be on the interface that is 10.1.1.1. That made Phase 2 work, but now the traffic isn't flowing: I can't ping 192.168.2.1 and vice versa, so there must be something more in the routing now.

Figured it out with the help of a Cisco tech! So what needed to be done was the following:

1) The "crypto map" command has to go on the interface facing the Internet. It couldn't go on the LAN connection that has the IP address the connection is talking to. For example, it went on the serial interface rather than the Ethernet interface.

2) "reverse-route" was added under the "crypto dynamic-map sonicwall 10" area. Just that command nothing else.

3) And with the two things done above, then the line "crypto map tosonicwall local-address GigabitEthernet0/0" had to be added.

4) Since I had filters on outbound traffic, I needed an access-list entry that allowed access to the host address of the Cisco router (the IP of the Ethernet interface that the SonicWall entry uses as the gateway address for the VPN connection). The line was "permit ip host 1.2.3.4 any", where 1.2.3.4 is the valid IP address (10.1.1.1 in the example above). Again, this applies to an outbound filter; having no outbound filters on the Ethernet interface works too, of course.
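IKE Phase 1 settings from SonicWall:

Aggressive Mode
DH Group 1
Encryption: DES
Authentication: MD5
Lifetime: 86400 seconds

Phase 2:
Protocol: ESP
Encryption: DES
Authentication: MD5
Lifetime: 86400 seconds

"Enable Perfect Forward Secrecy" is not checked.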

On the first tab in SonicWall there are empty fields for Local IKE ID and Peer IKE ID.
Solution from digitap (available only to Experts Exchange members).
I'm actually not familiar enough with telnet interfaces on routers to find this out on my own. But I talked to the tech who is, and he brought the settings into sync. Then turning off keepalive in the SonicWall made the errors in the SonicWall log go away.

But the problem persists.

I'm wary of changing the MTU size, because another VPN runs through the SonicWall and works fine, and I don't want to mess that one up.
Glad to hear the error is gone. If you're not losing Internet access and the other VPN isn't dropping, then it probably isn't the MTU.
Yes. When the issue occurs, pings to the Internet from the remote office still run fine.

To give a little more detail, there are actually two other VPNs: the one we have been talking about, and another one that runs through a second T1 with the same carrier. That one carries telnet data instead of VoIP, and it has been running fine. But when we moved the VoIP boxes to that VPN, it experienced the same issues. We moved the VoIP back, and the issues followed.

The third VPN runs through a different ISP, carries both VoIP and data, and is running fine.
Are you running any kind of bandwidth management or QoS for VoIP or any other traffic?
Where can I check for that in the SonicWall?
It's on the WAN interface and the VPN. Go to where you change the MTU on the WAN interface and check whether bandwidth management is enabled.
Thanks! Nope, neither bandwidth management option is checked.
OK. I don't remember whether the TZ180 has the packet capture option in the standard OS; that's what I would try next.

Question, when you have traffic problems, does it only affect the VoIP traffic?  Have you considered QoS for the VoIP traffic?

I found another forum post.  Have a look at this:

We had something similar with Cisco Call Manager. I think what we did was go under the VoIP settings on the SonicWALL and turn off everything in there: Consistent NAT, SIP Transformations, H.323 Transformations, etc. I believe all of those features are for people who are trying to do VoIP through NAT. Since you are probably doing it via VPN, or at least not NATing, I think these features are unnecessary and can even hose things up.

Ahh... thank goodness for Google Desktop Search. I found this in my archives:

"We spent some time today putting things back the way they were, to see what exactly changed to fix their dropped call problem. Here is what solved our problem: We unchecked the "Enable TCP stateful inspection" under Firewall > TCP Settings on the PRO1260 at the main office."

So, that was the culprit. Take a look for that and see if it helps. Please let me know!
Thanks much!!  You are awesome!  

When the issue happens, it affects all traffic.   No traffic can flow at all on the VPN.  

I don't have "Enable TCP stateful inspection" under TCP Settings, but I can say that nothing under TCP Settings is checked. Not sure if it means much, but the bottom of that page lists a bunch of counters, and among them are about 8000 invalid-flag packets dropped and 16000 invalid-sequence packets lost.
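For context on the "invalid sequence" counter: a stateful firewall tracks each TCP connection and may drop segments whose sequence numbers fall outside the window it expects. The Python sketch below is a simplified form of the RFC 793 acceptability test, RCV.NXT =< SEG.SEQ < RCV.NXT + RCV.WND, computed modulo 2^32 because sequence numbers wrap. That SonicWall counts exactly this condition is an assumption; the thread does not say how the counter is defined.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def seq_in_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Simplified RFC 793 check: RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND (mod 2^32).

    Looks only at the first sequence number of the segment; the full RFC test
    also considers segment length and the zero-window case.
    """
    offset = (seg_seq - rcv_nxt) % SEQ_SPACE  # distance ahead of rcv_nxt, wrap-safe
    return offset < rcv_wnd

print(seq_in_window(1000, rcv_nxt=1000, rcv_wnd=500))         # True
print(seq_in_window(1500, rcv_nxt=1000, rcv_wnd=500))         # False
print(seq_in_window(10, rcv_nxt=2 ** 32 - 100, rcv_wnd=500))  # True (wrapped)
```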

We have an NSA 240, and it does have packet capture. They won't be running again until tomorrow morning, and the issue only occurs when they are running. When I do run the capture, what should I look for? Or should I just export it and post it here?
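One thing worth extracting from a capture (or from a long-running ping log) is the timing of the outages. If the silences recur roughly every 300 seconds, as described in the original question, that regularity may point to something timer-driven rather than random congestion. The Python sketch below assumes you already have the timestamps, in seconds, of packets seen on the affected flow (for example, exported from the capture); the export step itself is not covered here, and the example data is synthetic.

```python
from typing import List, Sequence, Tuple

def find_gaps(timestamps: Sequence[float], min_gap: float) -> List[Tuple[float, float]]:
    """Return (start, duration) for every silence longer than min_gap seconds."""
    ts = sorted(timestamps)
    return [(prev, cur - prev) for prev, cur in zip(ts, ts[1:]) if cur - prev > min_gap]

def gap_spacing(gaps: List[Tuple[float, float]]) -> List[float]:
    """Seconds between the starts of consecutive gaps."""
    starts = [start for start, _ in gaps]
    return [b - a for a, b in zip(starts, starts[1:])]

# Synthetic example: one packet per second for 700 s, silent from 300-330 s and 600-630 s.
seen = [t for t in range(700) if not (300 < t < 330 or 600 < t < 630)]
gaps = find_gaps(seen, min_gap=5)
print(gaps)               # [(300, 30), (600, 30)]
print(gap_spacing(gaps))  # [300]
```

Comparing the gap starts against the VPN's Phase 1/Phase 2 lifetimes and any keepalive interval is one way to decide which setting to look at first.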
First, do you have an option on the left called VoIP?  If so, what settings are selected there?
Just "Enable consistent NAT" is checked.


Deselect everything under VoIP. Can you test it before tomorrow?
One thing I should mention: completely separate from the VoIP boxes that run through the VPN, which are for the call center/interviewer phones, we also use Polycom IP phones for the administrative phones at our HQ. I am not sure whether this setting is needed for those phones. Would you have any idea?

I guess I can give it a try either way and just see if the phones still work.

I can't test until tomorrow, because the issue only occurs when they are actually on the phones.


Have you gotten the NO_PROPOSAL_CHOSEN errors all along? I wonder if there was something wrong with the Cisco that was resolved when the tech re-synced it. You only need these settings if you are actually NAT'ing over your VPN, and I don't think you are. Let's leave them deselected for now and test tomorrow. If all is well, then it's either the VoIP settings or the re-synced Cisco. What do you think?

Otherwise, do you still have sonicwall support on this appliance?
So basically it seems like the DH group was not set in the cisco router.  Now that I have DH group set for both phase 1 and phase 2 in the cisco router, it seems to be working fine.
That's got to be frustrating, for it to come down to the settings! I'm sorry; I should have had you export the Cisco settings so a second pair of eyes could look at them. Who found the setting mismatch?
Asker-certified solution (available only to Experts Exchange members).
Yeah, you gotta watch those CLI guys.  Group 2 is default.  If it works, don't mess with it.  Someone was messing with the settings.  They don't just stop working, at least that's my experience.  Keep me posted.
Well, it is strange: even after the problems started, there were some days we didn't have problems. They were intermittent, and slowly became more constant over a few weeks.

I know it was not an accidental config change, because it is an outside guy who was not doing anything for us when it started. The only way the settings could have changed is if he did it on purpose to generate work for himself, and I think he would have had to keep going back and forth on the settings to cause intermittent issues. I don't know if the guy can be trusted, but he didn't seem to be aching for the work. At times he seemed almost sick of troubleshooting and just wanted us to let him go for the day.

Oh well, I'm just glad it is working now. I had him show me how to get in and at least look at all the settings in the Cisco, so I can verify when or if settings are changed.

Yes, strange, but glad it's working.  I'm sure knowing how to check the settings on the cisco is reassuring.
Thanks for the points!