Solved

SonicWall NSA 4600 suddenly stops passing traffic

Posted on 2014-02-24
Last Modified: 2014-03-17
Here's my situation:

I have 2 SonicWall NSA 4600 units in an active/passive failover configuration.
When I move them into Production, everything ticks along without incident for approximately 24-48 hours.

After 24-48 hours (the interval between failures is inconsistent: the longest uptime was around 2 full days, the shortest about 17 hours) the units simply stop passing traffic. The web front ends and SSH stop responding as well.

Powering down the primary unit to force a failover does nothing, as the secondary unit is also non-responsive.

Powering off both units and powering them back on also does not appear to solve the problem. The units become responsive again for about 5 minutes and then cease responding again.

The only thing that gets them functional again is to remove the switch they are connected to and take the units completely off the production network for approximately 30 mins to an hour.  Then, when a laptop is plugged into the switch they are on, everything seems fine.  

There are no errors in the SonicWall logs related to the failure.  In fact, it appears, from log entries, that the units never ceased functioning.

I have tried replacing the network infrastructure to which the units are attached. I swapped a smart switch for an enterprise switch with a 10Gb backplane. The switch does not report any abnormalities either.

I have tried connecting just one of the SonicWall units directly to our Core switches, without HA enabled, and the issue still happened.

Our older Check Point firewall is configured nearly identically and has no issues.

Management refuses to allow the SonicWalls to be placed back into Production without the issue being identified and resolved because it takes down the entire network when it ceases to respond.

I've been over the NATs and Routing 3 times and I see no errors.

There are NATs that put the SonicWall in "Routed Mode", meaning our internal IPs are also public IPs. Beyond that, it is a very typical setup.

I am hoping someone has encountered a similar issue and perhaps found a method to resolve it. Even being pointed in the right direction would help.


Thanks in Advance!
Question by:delpt
9 Comments
 

Author Comment

by:delpt
ID: 39884425
It is worth noting that I have had SNMP monitoring both the Switch and the SonicWall for the duration of the outage.

SNMP continues to respond during the outage, as does the Syslog.  The CPU and RAM are not even close to 100% (about 10% and 15%, respectively).


The switch doesn't even break a sweat.  It hovers around 5-15% capacity at any one given time.
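
For readers chasing a similar symptom (SNMP and syslog still answering while the web and SSH interfaces do not), a minimal sketch of an external probe that timestamps each state change may help pin down the real failure window. This is not something used in this thread; the port names and numbers are placeholder assumptions, and any host passed in would be your own management address.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical service ports to watch; adjust to whatever management
# services your firewall actually exposes.
PORTS = {"https-mgmt": 443, "ssh": 22}


def probe_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def monitor(host, ports, rounds, interval=30.0, probe=probe_tcp, sleep=time.sleep):
    """Probe each port `rounds` times; print and return every state change.

    The first round always records the initial state of each port.
    """
    last_state = {}
    changes = []
    for i in range(rounds):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        for name, port in ports.items():
            is_up = probe(host, port)
            if last_state.get(name) != is_up:
                state = "UP" if is_up else "DOWN"
                print(f"{stamp} {name} ({host}:{port}) -> {state}")
                changes.append((name, state))
                last_state[name] = is_up
        if i < rounds - 1:
            sleep(interval)
    return changes
```

Calling `monitor("192.0.2.1", PORTS, rounds=8640)` (a documentation-only placeholder address) would cover roughly 72 hours at the default 30-second interval. Run it from a host on the production side of the switch so the timestamps can be lined up against the SNMP graphs and the firewall's own log.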
LVL 20

Expert Comment

by:carlmd
ID: 39885340

Author Comment

by:delpt
ID: 39886409
OK... Following the guidelines, I've made the following alterations:

1. Added a 10Gb Twinax cable to X17 (10Gb SFP) on both units (X11 is also connected with crossover CAT6) and changed the HA Data Interface from X11 to X17.

2. Checked the "Enable Virtual MAC" checkbox.  I could not find the automatically generated MAC for the WAN, so I went into the Monitoring section and specified it.

3. Made sure DPI-SSL was disabled (it was already)


It seems a little odd, however, that the issue still occurred when we had just one SonicWall connected directly to our core infrastructure with HA disabled...

I won't be able to move it back into Production for a test until next week.  The users have deadlines this week and won't tolerate another potential outage.


If there is anything else I should be looking for, please let me know... I'd like to make sure all my ducks are in a row before I even attempt putting it back.
LVL 20

Expert Comment

by:carlmd
ID: 39886672
Can't you test the new settings in your test environment? At least drop the primary unit and see what the HA does.

Author Comment

by:delpt
ID: 39886710
Yes I can.  But the units have never failed in the test environment.
I have been able to failover/failback until my fingers hurt without any issue at all in the isolated environment.  I've even slammed it with traffic until the source interfaces overload in the test environment and the SonicWalls and switch do not even blink.

The issue only seems to replicate itself in our production environment, only after about 24-48 hours, and only under normal load (users and SSL VPNs connected). I have never been able to determine the real root cause.
LVL 13

Accepted Solution

by:
Greg Hejl earned 500 total points
ID: 39887529
Have you opened a case with Dell? Their engineers are quite good at fixing these issues. You would receive priority escalation with your model.

Assisted Solution

by:delpt
delpt earned 0 total points
ID: 39924333
Just finished troubleshooting with Dell.
It is a defect in their product.

Also, SonicWall Support is "restructuring" and even their "Priority" queues are at least 1 hour deep.

We are returning the devices and upgrading our CheckPoint Gateways instead.

Thanks everyone for input!

Author Closing Comment

by:delpt
ID: 39933737
The solution was to rid myself of the devices.
LVL 13

Expert Comment

by:Greg Hejl
ID: 39935647
Thanks for the points....

Curious, as I am about to deploy an NSA 3600: what was the defect? Did they offer a solution to resolve your issues? Was it due to the HA implementation?