SonicWall NSA 4600 suddenly stops passing traffic

Posted on 2014-02-24
Medium Priority
Last Modified: 2014-03-17
Here's my situation:

I have 2 SonicWall NSA 4600 units in an active/passive failover configuration.
When I move them into Production, everything ticks along without incident for approximately 24-48 hours.

After 24-48 hours (the actual interval between failures is inconsistent - longest uptime was around 2 full days, shortest was about 17 hours) the units just cease passing traffic.  The Web frontends and the SSH just cease responding.

Powering down the primary unit to force a failover does nothing, as the secondary unit is also non-responsive.

Powering off both units and powering them back on also does not appear to solve the problem. The units become responsive again for about 5 minutes and then cease responding again.

The only thing that gets them functional again is to remove the switch they are connected to and take the units completely off the production network for approximately 30 mins to an hour.  Then, when a laptop is plugged into the switch they are on, everything seems fine.  

There are no errors in the SonicWall logs related to the failure.  In fact, it appears, from log entries, that the units never ceased functioning.

I have tried replacing the network infrastructure to which it is attached.  I swapped a smart switch for a 10GB backplane enterprise switch.  The switch also does not report any abnormalities.

I have tried connecting just one of the SonicWall units directly to our Core switches, without HA enabled, and the issue still happened.

Our older Check Point firewall is configured nearly identically and has no issues.

Management refuses to allow the SonicWalls to be placed back into Production without the issue being identified and resolved because it takes down the entire network when it ceases to respond.

I've been over the NATs and Routing 3 times and I see no errors.

There are NATs that put the SonicWall in "Routed Mode", meaning our internal IPs are also public IPs. Beyond that, it is a very typical setup.

I am hoping someone has encountered a similar issue and perhaps knows a method to resolve it. Even being pointed in the right direction would help.

Thanks in Advance!
Question by:delpt

Author Comment

ID: 39884425
It is worth noting that I have had SNMP monitoring both the Switch and the SonicWall for the duration of the outage.

SNMP continues to respond during the outage, as does the Syslog.  The CPU and RAM are not even close to 100% (about 10% and 15%, respectively).

The switch doesn't even break a sweat.  It hovers around 5-15% capacity at any one given time.
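
Since SNMP and Syslog keep answering while the web front end, SSH, and user traffic stop, it can help to timestamp each kind of failure separately. Below is a minimal Python sketch of such a probe, not a SonicWall tool. The addresses in the usage comment are placeholders, and the choice of "a TCP port on the firewall" versus "a TCP port on a host that is only reachable through the firewall" is an assumption about how to tell management-plane failures from data-plane failures.

```python
import socket
import time
from datetime import datetime, timezone


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, timeout, and DNS errors
        return False


def classify(mgmt_ok: bool, data_ok: bool) -> str:
    """Label the combined result of the management and data-path probes."""
    if mgmt_ok and data_ok:
        return "healthy"
    if mgmt_ok:
        return "data-plane-only failure"
    if data_ok:
        return "management-plane-only failure"
    return "total failure"


def watch(mgmt, data, interval: float = 30.0, rounds=None) -> None:
    """Probe both targets repeatedly and print a line only when the state changes.

    mgmt and data are (host, port) tuples. rounds=None means run until interrupted.
    """
    last = None
    done = 0
    while rounds is None or done < rounds:
        state = classify(tcp_probe(*mgmt), tcp_probe(*data))
        if state != last:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} {state}")
            last = state
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


# Hypothetical usage (placeholder addresses, replace with real ones):
# watch(mgmt=("192.0.2.1", 443), data=("198.51.100.10", 443))
```

Run from a machine on the inside network, the output gives a UTC timestamp for each state change, which can then be lined up against the SNMP graphs and the firewall's own logs.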
LVL 20

Expert Comment

ID: 39885340

Author Comment

ID: 39886409
OK... Following the guidelines, I've made the following alterations:

1. Added a 10Gb Twinax cable to X17 (10Gb SFP) on both units (X11 is also connected with crossover CAT6) and changed the HA Data Interface from X11 to X17.

2. Checked the "Enable Virtual MAC" checkbox.  I could not find the automatically generated MAC for the WAN, so I went into the Monitoring section and specified it.

3. Made sure DPI-SSL was disabled (it was already)

Seems a little odd, however, that the issue still occurred when we had just one Sonicwall connected directly to our core infrastructure with HA disabled...

I won't be able to move it back into Production for a test until next week.  The users have deadlines this week and won't tolerate another potential outage.

If there is anything else I should be looking for, please let me know... I'd like to make sure all my ducks are in a row before I even attempt putting it back.

LVL 20

Expert Comment

ID: 39886672
Can't you test the new settings in your test environment? At least drop the primary unit and see what the HA does.

Author Comment

ID: 39886710
Yes I can.  But the units have never failed in the test environment.
I have been able to failover/failback until my fingers hurt without any issue at all in the isolated environment.  I've even slammed it with traffic until the source interfaces overload in the test environment and the Sonicwalls and switch do not even blink.

The issue only seems to reproduce in our production environment, only after about 24-48 hours, and only under normal load (users and SSL VPNs connected). I was never able to determine the real root cause.
LVL 13

Accepted Solution

Greg Hejl earned 1500 total points
ID: 39887529
Have you opened a case with Dell?  Their engineers are quite good at fixing these issues.  You would receive priority escalation with your model.

Assisted Solution

delpt earned 0 total points
ID: 39924333
Just finished troubleshooting with Dell.
It is a defect in their product.

Also, SonicWall Support is "restructuring" and even their "Priority" queues are at least 1 hour deep.

We are returning the devices and upgrading our CheckPoint Gateways instead.

Thanks everyone for input!

Author Closing Comment

ID: 39933737
The solution was to rid myself of the devices.
LVL 13

Expert Comment

by:Greg Hejl
ID: 39935647
Thanks for the points....

Curious, as I am about to deploy an NSA 3600: what was the defect?  Did they offer a solution to resolve your issues?  Was it due to the HA implementation?


Question has a verified solution.
