SonicWall NSA 4600 suddenly stops passing traffic
Posted on 2014-02-24
Here's my situation:
I have 2 SonicWall NSA 4600 units in an active/passive failover configuration.
When I move them into Production, everything ticks along without incident for roughly one to two days.
Then the units simply stop passing traffic, and both the Web frontends and SSH stop responding. The interval between failures is inconsistent: the longest uptime was around 2 full days, the shortest about 17 hours.
Powering down the primary unit to force a failover does nothing, as the secondary unit is also non-responsive.
Powering off both units and powering them back on also does not appear to solve the problem. The units become responsive again for about 5 minutes and then cease responding again.
The only thing that gets them functional again is to remove the switch they are connected to and take the units completely off the production network for approximately 30 minutes to an hour. After that, when a laptop is plugged into the switch they are on, everything seems fine.
There are no errors in the SonicWall logs related to the failure. In fact, it appears, from log entries, that the units never ceased functioning.
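Since the SonicWall logs show nothing, an external probe that timestamps the outage independently might help correlate it with other events on the network. Below is a minimal sketch of what I have in mind, not something the vendor provides. The address `192.0.2.1` and the port list are placeholders for the real management IP and services, and the function names are just illustrative.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical placeholders: substitute the real management IP and ports.
FIREWALL_IP = "192.0.2.1"
PORTS = (443, 22)  # HTTPS management and SSH


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_transitions(samples):
    """Given (timestamp, is_up) samples in time order, return the samples
    whose state differs from the sample before them."""
    transitions = []
    previous = None
    for stamp, is_up in samples:
        if previous is not None and is_up != previous:
            transitions.append((stamp, is_up))
        previous = is_up
    return transitions


def monitor(host=FIREWALL_IP, ports=PORTS, interval=10.0):
    """Poll forever and print a UTC-stamped line when reachability changes."""
    previous = None
    while True:
        is_up = any(port_is_open(host, p) for p in ports)
        if previous is not None and is_up != previous:
            stamp = datetime.now(timezone.utc).isoformat()
            print(f"{stamp} management interface {'UP' if is_up else 'DOWN'}")
        previous = is_up
        time.sleep(interval)
```

Calling `monitor()` from a machine on the same switch would print one line per state change, which could then be lined up against the SonicWall and switch logs. `find_transitions` does the same comparison on samples that were already recorded.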
I have tried replacing the network infrastructure to which they are attached: I swapped a smart switch for an enterprise switch with a 10 Gb backplane. The new switch does not report any abnormalities either.
I have tried connecting just one of the SonicWall units directly to our Core switches, without HA enabled, and the issue still happened.
Our older Check Point firewall is configured nearly identically and has no issues.
Management refuses to allow the SonicWalls to be placed back into Production until the issue is identified and resolved, because they take down the entire network when they stop responding.
I've been over the NATs and routing three times and I see no errors.
There are NATs that put the SonicWall in "Routed Mode", meaning our internal IPs are also public IPs. Beyond that, it is a very typical setup.
I am hoping someone has encountered a similar issue and knows a method to resolve it. Even being pointed in the right direction would help.
Thanks in advance!