Quagmire2

asked on

Watchguard manual Branch Office VPN stability issues

We have several BOVPNs, and they all seem to have stability issues. They connect and work, but several times a day they drop 5 to 10 packets, which is just long enough to interfere with network applications.
In the SOHO logs I receive the following messages:
2008-06-24-17:01:08 MONITOR Quick Mode processing failed
2008-06-24-17:01:08 MONITOR get_ipsec_pref: Unable to find channel info for remote(x.x.x.x)
2008-06-24-17:01:08 MONITOR ACTION - Verify VPN IPSec Policies for x.x.x.x
2008-06-24-17:01:08 MONITOR WARNING - No Matching IPSec Policy found for x.x.x.x

In the x700 logs I get
2008-06-24 17:16:48 kernel VPN disconnected: ipsec policy (Location)
2008-06-24 17:16:54 kernel VPN connected: on South-0 for ipsec policy (Location)
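
To put numbers on how often the tunnel drops and for how long, the disconnect/connect pairs in the x700 log can be matched up with a small script. This is only a minimal sketch: it assumes the log lines look exactly like the two quoted above, and the names `EVENT` and `outages` are made up for illustration, not part of any WatchGuard tool.

```python
import re
from datetime import datetime

# Sample lines copied from the x700 log quoted above.
LOG = """\
2008-06-24 17:16:48 kernel VPN disconnected: ipsec policy (Location)
2008-06-24 17:16:54 kernel VPN connected: on South-0 for ipsec policy (Location)
"""

# Assumed line format: timestamp, "kernel VPN <state>:", then "ipsec policy (<name>)".
EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) kernel VPN (disconnected|connected):"
    r".*ipsec policy \((.+)\)\s*$"
)

def outages(lines):
    """Yield (policy, went_down_at, seconds_down) for each disconnect/reconnect pair."""
    down = {}  # policy name -> time of the most recent disconnect
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # ignore unrelated log lines
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        state, policy = m.group(2), m.group(3)
        if state == "disconnected":
            down[policy] = stamp
        elif policy in down:
            start = down.pop(policy)
            yield policy, start, (stamp - start).total_seconds()

for policy, start, secs in outages(LOG.splitlines()):
    print(f"{policy}: down at {start} for {secs:.0f} s")
```

Run over a full day of logs, the list of outages shows whether the drops cluster around a fixed interval (which would point at a lifetime setting) or are scattered.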

Phase 1 is set to negotiate every 24 hrs and phase 2 does not expire.

I've pored over all the configs and even changed the shared secrets. I've also changed the phase 1 and 2 settings on multiple devices to different authentication and encryption algorithms.

Is anyone else having these same issues? Any ideas on what is causing this or how to resolve it? Any help is much appreciated.
SOLUTION
dpk_wal
Quagmire2

ASKER

In phase 2, both sides are set to 0 for bytes and time. We have submitted several incidents with WG support, and they recommended lowering the encryption and authentication methods, as well as setting phase 2 to both zeros on the Edge sites and disabling it on the x700. I just changed the x700 to enabled and set it to 0s. Maybe this is the issue.

We have a real-time network monitoring tool that pings the external IP of the WG devices and the internal IP of the devices. The external IP is stable.

As I understand it, since I have phase 1 set to 24 hours, I should see the tunnel drop and renegotiate once per day??  

If I were to set the phase 2 to 24 hrs as well, would the tunnel drop 2 times per day or would both phase 1 and 2 negotiate at the same time?  

The disabled/enabled option on the x700 was not the issue. I just received the same errors in the logs from above.
I am not sure whether both keys would negotiate at the same time if you set phase 2 to 24 hours as well; for security reasons they should not! :)

As you have set it to 0 this means never expire either on byte count or time.

Do any of the sites have a dynamic IP address, and does this renegotiation happen when the public IP changes? Or do all the sites have static public IP addresses?

Finally, are all the smaller devices (SOHOs and Edges) running the latest firmware?

Please advise.
The devices we're concerned with are the static IP sites.  

All of the devices except the x700 and edge e series are running the latest software.  The x700 and edge e series are running 2 versions behind.  I do not see any BOVPN issues in the release notes that seem to match.  I'll update the devices this weekend to the latest and greatest.

dpk_wal, do you manage a lot of WG devices? Tunnel stability always seems to have been an issue with WG in the 7+ years we've been using their equipment. Do you find this is the case for you?
These days I only help people with problems on WG, but I have seen VPN tunnels be stable on WG. I do feel that the ASA might be a better and more robust device compared to WG.

I have seen thousands of tunnels on a single box at a customer site in the UK; only a few tunnels were flapping, and overall the tunnels were stable.

VPN tunnel issues might not just be device related; sometimes they are related to the ISP's infrastructure too.

As you are already planning to migrate to ASA, I think this would be a good point to start. You can configure the VPN between one of the most unreliable sites and the ASA and then go from there.

What are your thoughts? Please advise.

Thank you.
dpk_wal,

I appreciate your advice. The other post is to start the process of potentially replacing our infrastructure. If we could get the WG tunnels stable, then we most likely would not move to Cisco. WG is a highly touted UTM solution and has worked pretty well for us.

It really seems that the issue is between all the models and the x700.  When I look at the logs, the SOHO to SOHO connections don't have any of these issues and they are running 3DES.  Since the Data Center is at the x700 site, not much traffic runs between the other sites though.

What would cause the tunnel to rekey besides the button in VPN properties and regenerate settings? Do you know if there is a way to determine what has caused it to rekey?
Not really; it's tough to say what caused the rekey. You can enable debug-level logging on the x700, and then we can look at the logs to see if they spit out something. We can then capture them and try configuring the devices to minimize rekeys. It's tough, though.

What is the load and memory usage on the x700 at any point in time? Also, if you look at the system status page, what are the memory usage trends? Can you check? It might be that processes are getting shunned due to low memory. I'm not sure this is actually the reason; I'm just guessing and trying to look at all possibilities.

Would you have some details on how much data flows over a single tunnel in an hour, on average? (I understand it is tough to account for, but if you do, we can also look at some other settings.) Just to check, we can configure the keys to renegotiate when the max bytes (2147483647) are transferred or after 24 hours [almost 2 GB of data and/or 24 hours, whichever is earlier].

Some details about traffic trends would help.

Thank you.
Below is some of the info
3:35 PM
** Memory
        total:      used:       free:       shared:  buffers:  cached:
Mem:    263114752   243781632   19333120    0        6922240   37896192
** Load Average
1-min   5-min   15-min  run-proc  last-pid
0.20    0.36    0.36    3/49      5037

3:45 PM
** Memory
        total:      used:       free:       shared:  buffers:  cached:
Mem:    263114752   244396032   18718720    0        6922240   38412288
** Load Average
1-min   5-min   15-min  run-proc  last-pid
0.13    0.24    0.31    3/49      6558


As you said, it is difficult to say how much traffic is going through. From watching it for a few minutes, here are the results. We're running 2 bonded T1s for a 3 Mb pipe, so there is available bandwidth.
External Send Kbps
1763, 1623, 1624, 922, 1565, 1427

External Rcv Kbps
480, 589, 454, 1501, 1408, 810
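
For a rough feel of which limit a tunnel would hit first with the suggested settings (2147483647 bytes or 24 hours), the samples above can be turned into a time-to-byte-limit estimate. Treat this as a sketch under loud assumptions: the figures are for the whole external interface, not a single tunnel, so they overstate per-tunnel traffic; "Kbps" is taken to mean 1000 bits per second; and how the firewall actually counts bytes per security association is not something the thread establishes. The function names are hypothetical.

```python
MAX_BYTES = 2_147_483_647      # byte-count limit suggested in the thread
TIME_LIMIT_S = 24 * 60 * 60    # 24-hour lifetime, in seconds

# External interface samples posted above (assumed to be kilobits per second).
send_kbps = [1763, 1623, 1624, 922, 1565, 1427]
rcv_kbps = [480, 589, 454, 1501, 1408, 810]

def seconds_to_byte_limit(kbps, max_bytes=MAX_BYTES):
    """Seconds until max_bytes are transferred at a steady rate of kbps kilobits/s."""
    bytes_per_second = kbps * 1000 / 8
    return max_bytes / bytes_per_second

def first_limit(kbps):
    """Name the limit that would trigger a rekey first at a steady rate."""
    return "byte count" if seconds_to_byte_limit(kbps) < TIME_LIMIT_S else "time"

avg_send = sum(send_kbps) / len(send_kbps)
avg_rcv = sum(rcv_kbps) / len(rcv_kbps)

for label, rate in (("send", avg_send), ("receive", avg_rcv)):
    hours = seconds_to_byte_limit(rate) / 3600
    print(f"{label}: {rate:.0f} Kbps -> byte limit after about {hours:.1f} h "
          f"({first_limit(rate)} reached first)")
```

If a single tunnel really carried anything close to these rates, the byte limit would trigger a rekey every few hours, well before the 24-hour timer. A sustained average above roughly 199 Kbps is enough for the byte count to win.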



18718720 bytes of free memory would roughly be 18 MB of free RAM. The CPU load is not too high; try setting the maximum values of byte count and time value on one of the tunnels (on both ends) and see if you get any different results.
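
The arithmetic behind that figure can be checked quickly. The sketch below also adds buffers and cache to the free figure, on the assumption that the box uses Linux-style memory accounting where those pages can be reclaimed under pressure; whether the x700 firmware actually behaves that way is a guess, not something confirmed in this thread. The helper names are made up for illustration.

```python
MIB = 1024 * 1024

# Values copied from the two status samples posted above (bytes).
samples = {
    "3:35 PM": {"total": 263114752, "free": 19333120, "buffers": 6922240, "cached": 37896192},
    "3:45 PM": {"total": 263114752, "free": 18718720, "buffers": 6922240, "cached": 38412288},
}

def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB

def reclaimable(sample):
    """Free memory plus buffers and cache (assumes these can be reclaimed)."""
    return sample["free"] + sample["buffers"] + sample["cached"]

for label, s in samples.items():
    print(f"{label}: free {to_mib(s['free']):.1f} MiB, "
          f"free+buffers+cached {to_mib(reclaimable(s)):.1f} MiB "
          f"of {to_mib(s['total']):.1f} MiB total")
```

If the assumption holds, the box has closer to 61 MiB available than 18, which would make memory starvation a less likely cause of the rekeys.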

The x700 has 256 MB of RAM; it is EOL on 25 Oct 2009. The newer x550e and above have at least 512 MB of RAM. I am not sure whether any throttling is happening with just 18 MB of free RAM; normally you would observe some logs if that were the case. But one thing is sure: this is not an easy fact to live with.

I would like to know whether you have configured a full-mesh, hub-and-spoke, or main-branch topology for the VPN. Maybe we can try reducing VPN traffic, and this might give better results for tunnel stability.

FYI, when I said in my earlier post that I had seen thousands of tunnels, that was on FBIII 2500 boxes.

Please advise.

Thank you.
It is a hub-and-spoke model. Some of the sites have manual tunnels if they share file shares.

Since there is only 18 MB of RAM free, I can add more RAM. I'm sure I have some PC133 sitting around. Since the proc is a PIII, I would believe that it is PC133. Do you know if there are available slots?
We've done this before to extend the life of a FBII a little.

All in all, since our box is 4 years old, we have not made up our mind if we'll replace it with another WG product or the Cisco.  

Just to clarify, are you referring to enabling traffic control or manipulating the VPN negotiation settings? If I modify the negotiation settings, doesn't that put more of a strain on the box, because it may now renegotiate potentially several times a day? Do you have any recommendations for them? Before support recommended setting phase 2 to both zeros, we generally had both phases set to 24 hrs and no byte count.

Thanks!
I am just asking what the effect is when we set both the max byte count and 24 hours (it might indicate which limit is reached first); currently, as I understand it, the tunnels are expiring several times a day.
I am not too sure about the memory type.

My hunch is that if we can get one of the tunnels to stay up for at least 24 hours, this might reduce memory consumption on the unit. I am not sure this would happen; it's just a thought.

If it is hub and spoke, then when spoke A wants to communicate with spoke B, the traffic goes through the hub. If you wish, you can instead create a manual tunnel between A and B; this way the amount of traffic on the hub would be reduced. Or is it that the spokes only communicate with the hub, and for inter-spoke communication you already have specific manual tunnels in place?

I myself have extended memory on FB II and FB III models; I think WG support might be able to guide you on memory specifications for the X Core platform. I have never personally opened these boxes, so I am really not sure.

Thank you.
ASKER CERTIFIED SOLUTION
Thank you for the update.