Solved

Watchguard manual Branch Office VPN stability issues

Posted on 2008-06-24
Last Modified: 2013-11-16
We have several BOVPNs, and they all seem to have stability issues.  They connect and work, but several times a day they drop 5 to 10 packets, which is just long enough to interfere with network applications.
In the SOHO logs I receive the following messages:
2008-06-24-17:01:08 MONITOR Quick Mode processing failed
2008-06-24-17:01:08 MONITOR get_ipsec_pref: Unable to find channel info for remote(x.x.x.x)
2008-06-24-17:01:08 MONITOR ACTION - Verify VPN IPSec Policies for x.x.x.x
2008-06-24-17:01:08 MONITOR WARNING - No Matching IPSec Policy found for x.x.x.x

In the x700 logs I get:
2008-06-24 17:16:48 kernel VPN disconnected: ipsec policy (Location)
2008-06-24 17:16:54 kernel VPN connected: on South-0 for ipsec policy (Location)
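
For anyone reading along, here is a minimal Python sketch for measuring how long each drop lasts in the x700 log. It pairs every "VPN disconnected" line with the next "VPN connected" line for the same policy. The function name `outages` and the parsing rules are illustrative assumptions based only on the two log lines quoted above; this is not a WatchGuard tool.

```python
from datetime import datetime

# The two x700 log lines quoted in the question.
LOG = """\
2008-06-24 17:16:48 kernel VPN disconnected: ipsec policy (Location)
2008-06-24 17:16:54 kernel VPN connected: on South-0 for ipsec policy (Location)
"""

def outages(log_text):
    """Return (policy, seconds_down) for each disconnect/reconnect pair."""
    down_since = {}
    result = []
    for line in log_text.splitlines():
        if "VPN disconnected" not in line and "VPN connected" not in line:
            continue
        # Timestamp is the first 19 characters: YYYY-MM-DD HH:MM:SS
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        # Policy name is the text inside the last pair of parentheses.
        policy = line[line.rindex("(") + 1 : line.rindex(")")]
        if "VPN disconnected" in line:
            down_since[policy] = stamp
        elif policy in down_since:
            delta = stamp - down_since.pop(policy)
            result.append((policy, delta.total_seconds()))
    return result

print(outages(LOG))  # [('Location', 6.0)]
```

For the excerpt above the gap is 6 seconds, which is consistent with losing 5 to 10 pings if the monitoring tool pings about once per second.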

Phase 1 is set to negotiate every 24 hrs and phase 2 does not expire.

I've pored over all the configs and even changed the shared secrets.  I've also changed the phase 1 and 2 settings on multiple devices to different authentication and encryption algorithms.

Is anyone else having these same issues?  Any ideas on what is causing this or how to resolve it?  Any help is much appreciated.
Question by:Quagmire2
 
LVL 32

Assisted Solution

by:dpk_wal
dpk_wal earned 120 total points
ID: 21864246
As the logs show, Quick Mode (Phase 2 of the VPN tunnel negotiation) is failing. Normally, when we create a VPN tunnel we configure it to renegotiate keys for security reasons; this renegotiation happens after a fixed interval of time or after a fixed number of bytes has been transferred.

You mentioned that Phase 2 expiration is set to 0; if that is the case, the logs above should not appear.

Can you double-check that Phase 2 renegotiation is actually set not to expire on both ends?

As a simple test, run a ping from one end of the tunnel to the other with the -t option (continuous pings), and observe whether this causes the tunnel to expire early or gives it some stability.

Please check and update.

Thank you.
 

Author Comment

by:Quagmire2
ID: 21865880
In phase 2, both sides are set to 0 for bytes and time.  We have submitted several incidents with WG support, and they recommended lowering the encryption and authentication methods, as well as setting phase 2 to both zeros on the Edge sites and disabling it on the x700.  I just changed the x700 to enabled and set it to zeros.  Maybe this is the issue.

We have a real-time network monitoring tool that pings the external IP of the WG devices and the internal IP of the devices.  The external IP is stable.

As I understand it, since I have phase 1 set to 24 hours, I should see the tunnel drop and renegotiate once per day?

If I were to set phase 2 to 24 hrs as well, would the tunnel drop 2 times per day, or would both phase 1 and 2 negotiate at the same time?

 

Author Comment

by:Quagmire2
ID: 21865921
The disabled/enabled option on the x700 was not the issue.  I just received the same errors in the logs as above.
 
LVL 32

Expert Comment

by:dpk_wal
ID: 21866062
I am not sure whether both keys would negotiate at the same time if you set it to 24 hours; for security reasons they should not! :)

Since you have set it to 0, the keys never expire on either byte count or time.

Do any of the sites have a dynamic IP address, and does this renegotiation happen when the public IP changes? Or do all the sites have static public IP addresses?

Finally, are all the smaller devices (SOHOs and Edges) running the latest firmware?

Please advise.
 

Author Comment

by:Quagmire2
ID: 21866489
The devices we're concerned with are the static IP sites.  

All of the devices except the x700 and edge e series are running the latest software.  The x700 and edge e series are running 2 versions behind.  I do not see any BOVPN issues in the release notes that seem to match.  I'll update the devices this weekend to the latest and greatest.

dpk_wal, do you manage a lot of WG devices?  Tunnel stability always seems to have been an issue with WG in the 7+ years we've been using their equipment.  Do you find this is the case for you?
 
LVL 32

Expert Comment

by:dpk_wal
ID: 21868383
These days I only help people with problems on WG, but I have seen VPN tunnels be stable on WG. I do feel that the ASA might be a better and more robust device than WG.

I have seen thousands of tunnels on a single box at a customer site in the UK; only a few tunnels were flapping, and overall the tunnels were stable.

VPN tunnel issues are not always device related; sometimes they are related to the ISP's infrastructure.

As you are already planning to migrate to ASA, I think this would be a good place to start. You can configure the VPN between one of the most unreliable sites and the ASA and go from there.

What are your thoughts? Please advise.

Thank you.
 

Author Comment

by:Quagmire2
ID: 21868897
dpk_wal,

I appreciate your advice.  The other post is to start the process of potentially replacing our infrastructure. If we could get the WG tunnels stable, then we most likely will not move to Cisco.  WG is a highly touted UTM solution and has worked pretty well for us.

It really seems that the issue is between all the models and the x700.  When I look at the logs, the SOHO to SOHO connections don't have any of these issues and they are running 3DES.  Since the Data Center is at the x700 site, not much traffic runs between the other sites though.

What would cause the tunnel to rekey besides the button in VPN properties and the regenerate settings?  Do you know if there is a way to determine what caused it to rekey?
 
LVL 32

Expert Comment

by:dpk_wal
ID: 21869083
Not really; it's tough to say what caused the rekey. You can enable debug-level logging on the x700 and we can see whether the logs reveal anything. We can then capture them and try configuring the devices to minimize rekeying. It's tough, though.

What are the load and memory usage on the x700 at any point in time? Also, if you look at the system status page, what are the memory usage trends? Can you check? It might be that processes are getting shunned due to low memory; I am not sure this is actually the reason, I am just guessing and trying to look at all possibilities.

Do you have any details on how much data flows over a single tunnel in an hour, on average? (I understand it is tough to account for, but if you do, we can also look at some other settings.) As a check, we can configure the keys to renegotiate when the maximum byte count (2147483647) has been transferred or after 24 hours, whichever comes first [almost 2 GB of data and/or 24 hours].

Some details about traffic trends would help.

Thank you.
 

Author Comment

by:Quagmire2
ID: 21869240
Below is some of the info
3:35 PM
** Memory
            total:      used:       free:       shared:     buffers:    cached:
Mem:        263114752   243781632   19333120    0           6922240     37896192
** Load Average
1-min   5-min   15-min  run-proc  last-pid
0.20    0.36    0.36    3/49      5037

3:45 PM
** Memory
            total:      used:       free:       shared:     buffers:    cached:
Mem:        263114752   244396032   18718720    0           6922240     38412288
** Load Average
1-min   5-min   15-min  run-proc  last-pid
0.13    0.24    0.31    3/49      6558


As you said, it is difficult to say how much traffic is going through.  I watched it for a few minutes, and here are the results.  We're running 2 bonded T1s for a 3Mb pipe, so there is available bandwidth.
External Send Kbps
1763, 1623, 1624, 922, 1565, 1427

External Rcv Kbps
480, 589, 454, 1501, 1408, 810
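
A rough Python sketch of how these samples relate to the 2147483647-byte rekey limit suggested earlier in the thread. It assumes "Kbps" means kilobits per second and, as a worst case, that all of the external traffic flows through a single tunnel; neither is confirmed in the thread, and the real per-tunnel figure will be lower. The helper name `hours_to_limit` is made up for illustration.

```python
MAX_BYTES = 2147483647  # byte-count limit mentioned earlier in the thread

# External interface samples from the post (assumed to be kilobits/second).
send_kbps = [1763, 1623, 1624, 922, 1565, 1427]
rcv_kbps = [480, 589, 454, 1501, 1408, 810]

def hours_to_limit(samples_kbps, limit_bytes=MAX_BYTES):
    """Hours until limit_bytes is transferred at the average sampled rate."""
    avg_bits_per_sec = sum(samples_kbps) / len(samples_kbps) * 1000
    return limit_bytes * 8 / avg_bits_per_sec / 3600

print(f"limit is {MAX_BYTES / 2**30:.2f} GiB")
print(f"send direction: about {hours_to_limit(send_kbps):.1f} hours to reach it")
print(f"receive direction: about {hours_to_limit(rcv_kbps):.1f} hours to reach it")
```

Under those worst-case assumptions the byte limit would be reached after a few hours, well before a 24-hour timer, so a byte-count rekey would fire several times a day on a busy tunnel.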



 
LVL 32

Expert Comment

by:dpk_wal
ID: 21871723
18718720 bytes of free memory is roughly 18 MB of free RAM. The CPU load is not too high; try setting the maximum byte count and time value on one of the tunnels (on both ends) and see if you get any different results.

The x700 has 256 MB of RAM and goes EOL on 25 Oct 2009; the newer x550e and above have at least 512 MB of RAM. I am not sure whether any throttling is happening with just 18 MB of free RAM; normally you would observe some logs if that were the case. But one thing is sure: this is not an easy fact to live with.
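
A small Python sketch that converts the 3:45 PM memory figures into MiB. The layout of the status output resembles an older Linux /proc/meminfo, so the sketch assumes, without confirmation from WatchGuard, that the "buffers" and "cached" columns are memory the kernel could reclaim under pressure. The function name `summarize` is illustrative only.

```python
MIB = 1024 * 1024

def summarize(total, free, buffers, cached):
    """Convert byte counts to MiB and estimate reclaimable headroom.

    Assumption: buffers and cached behave as on a stock Linux kernel,
    i.e. they can be reclaimed when processes need memory.
    """
    return {
        "total_mib": round(total / MIB, 2),
        "free_mib": round(free / MIB, 2),
        "free_plus_cache_mib": round((free + buffers + cached) / MIB, 2),
    }

# Figures from the 3:45 PM sample posted above.
stats = summarize(total=263114752, free=18718720,
                  buffers=6922240, cached=38412288)
print(stats)
```

If that assumption holds, the box has closer to 61 MiB of headroom than 18, which would make memory starvation a less likely cause of the rekeys.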

I would like to know whether you have configured a full-mesh, hub-and-spoke, or main-branch topology for the VPN. Maybe we can try reducing VPN traffic, which might give better tunnel stability.

FYI, when I said in my earlier post that I had seen thousands of tunnels, that was on FBIII 2500 boxes.

Please advise.

Thank you.
 

Author Comment

by:Quagmire2
ID: 21874841
It is a hub and spoke model.  Some of the sites have manual tunnels between them if they share file shares.

Since there is only 18MB of RAM free, I can add more RAM.  I'm sure I have some PC133 sitting around.  Since the proc is a PIII, I would believe that it is PC133.  Do you know if there are available slots?
We've done this before to extend the life of a FBII a little.

All in all, since our box is 4 years old, we have not made up our mind if we'll replace it with another WG product or the Cisco.  

Just to clarify, are you referring to enabling traffic control or manipulating the VPN negotiation settings?  If I modify the negotiation settings, doesn't that put more of a strain on the box, because it now may renegotiate potentially several times a day?  Do you have any recommendations for them?  Before support recommended setting phase 2 to both zeros, we generally had both phases set to 24 hrs and no byte count.

Thanks!
 
LVL 32

Expert Comment

by:dpk_wal
ID: 21875383
I am just asking what the effect is when we set both the max byte count and 24 hours (it might indicate which limit is reached first); currently, as I understand it, the tunnels are expiring several times a day.
I am not too sure about the memory type.

My hunch is that if we can get one of the tunnels to stay up for at least 24 hours, this might reduce memory consumption on the unit. I am not sure if this would happen; just a thought.

If it is hub and spoke, then when spoke A wants to communicate with spoke B, the traffic goes through the hub. If you wish, you can instead create a manual tunnel between A and B; this way the amount of traffic on the hub is reduced. Or is it that the spokes only communicate with the hub, and for inter-spoke communication you already have specific manual tunnels in place?

I have extended memory on FB II and FB III models myself; I think WG support might be able to advise on memory specifications for the X Core platform. I have never personally opened these boxes, so I am really not sure.

Thank you.
 

Accepted Solution

by:
Quagmire2 earned 0 total points
ID: 22140233
I've submitted an incident with Watchguard and have gotten a response back that may resolve the issue.  In the tunnel config, the x700 was set to Any --> remote IP range, while the remote SOHOs were set to local IP range --> remote IP range.  So far so good.  "Any" is the default setting for the local network on the x700.
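
A minimal Python sketch of the kind of mismatch described here: each end's local/remote ranges should be the mirror image of the other end's. The subnets below are hypothetical placeholders (the thread does not give the real ranges), and the strict-equality rule is a simplifying assumption; some IPsec implementations accept a narrower proposal from the peer.

```python
from ipaddress import ip_network

def selectors_mirror(hub, spoke):
    """True if hub (local, remote) is the exact mirror of spoke (local, remote).

    Each argument is a (local, remote) tuple of CIDR strings.
    """
    hub_local, hub_remote = map(ip_network, hub)
    spoke_local, spoke_remote = map(ip_network, spoke)
    return hub_local == spoke_remote and hub_remote == spoke_local

# Hypothetical addressing: hub LAN 10.0.0.0/24, spoke LAN 192.168.10.0/24.
spoke = ("192.168.10.0/24", "10.0.0.0/24")

hub_with_any = ("0.0.0.0/0", "192.168.10.0/24")      # "Any" as the local side
hub_with_range = ("10.0.0.0/24", "192.168.10.0/24")  # explicit local range

print(selectors_mirror(hub_with_any, spoke))    # False
print(selectors_mirror(hub_with_range, spoke))  # True
```

With "Any" on the hub's local side the two policies do not mirror each other, which would fit the SOHO log message "No Matching IPSec Policy found".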
 
LVL 32

Expert Comment

by:dpk_wal
ID: 22143260
Thank you for the update.