ccsenet asked:

Cisco ASA failover - Primary unit reboots on activation

Hi guys...

We have an Active/Standby ASA 5540 failover pair. A few months back, the secondary unit became active on its own while the primary went into 'Standby Ready' state. Now, whenever we try to make the primary active, either by issuing "failover active" on the primary or "no failover active" on the secondary, the primary reboots immediately. The log on the secondary unit shows the following messages:

May 13 2009 09:24:27: %ASA-1-104002: (Secondary) Switching to STNDBY - Other unit want me Standby
May 13 2009 09:24:27: %ASA-6-210022: LU missed 8485 updates
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface outside waiting
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface inside waiting
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface DMZ-NHS waiting
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface outside
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface inside
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface DMZ-NHS
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface outside Passed
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface DMZ-NHS Passed
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface inside Passed
May 13 2009 09:24:45: %ASA-1-103001: (Secondary) No response from other firewall (reason code = 1).
May 13 2009 09:24:45: %ASA-1-104001: (Secondary) Switching to ACTIVE - HELLO not heard from mate.
May 13 2009 09:28:55: %ASA-1-709003: (Secondary) Beginning configuration replication: Send to mate.
May 13 2009 09:29:07: %ASA-1-709004: (Secondary) End Configuration Replication (ACT)


GigabitEthernet0/3 is used for LAN failover on both firewalls. The failover interfaces are connected through a switch. For troubleshooting, we also connected the two interfaces directly with a crossover cable, but that did not help and the same issue occurred. The LAN failover configuration on the two units is as follows:

Primary Unit:

failover
failover lan unit primary
failover lan interface Statefull-Failover GigabitEthernet0/3
failover key *****
failover replication http
failover link Statefull-Failover GigabitEthernet0/3
failover interface ip Statefull-Failover 10.200.200.1 255.255.255.252 standby 10.200.200.2


Secondary Unit:

failover
failover lan unit secondary
failover lan interface Statefull-Failover GigabitEthernet0/3
failover key *****
failover replication http
failover link Statefull-Failover GigabitEthernet0/3
failover interface ip Statefull-Failover 10.200.200.1 255.255.255.252 standby 10.200.200.2


The following is the "show failover" output on the currently active unit:

Failover On
Failover unit Secondary
Failover LAN Interface: Statefull-Failover GigabitEthernet0/3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 250 maximum
failover replication http
Version: Ours 7.2(1), Mate 7.2(1)
Last Failover at: 09:24:45 KSA May 13 2009
        This host: Secondary - Active
                Active time: 10195375 (sec)
                slot 0: ASA5540 hw/sw rev (1.0/7.2(1)) status (Up Sys)
                  Interface outside (X.X.X.1): Normal
                  Interface inside (Y.Y.Y.1): Normal
                  Interface DMZ-NHS (Z.Z.Z.1): Normal
                  Interface management (0.0.0.0): Link Down (Not-Monitored)
                slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                  IPS, 6.0(3)E1, Up
        Other host: Primary - Standby Ready
                Active time: 0 (sec)
                slot 0: ASA5540 hw/sw rev (1.0/7.2(1)) status (Up Sys)
                  Interface outside (X.X.X.2): Normal
                  Interface inside (Y.Y.Y.2): Normal
                  Interface DMZ-NHS (Z.Z.Z.2): Normal
                  Interface management (0.0.0.0): Normal (Not-Monitored)
                slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                  IPS, 6.0(3)E1, Up

Stateful Failover Logical Update Statistics
        Link : Statefull-Failover GigabitEthernet0/3 (up)
        Stateful Obj    xmit       xerr       rcv        rerr
        General         3509761538 0          3760724    2
        sys cmd         1360203    0          1360196    0
        up time         0          0          0          0
        RPC services    0          0          0          0
        TCP conn        3160458482 0          1931284    0
        UDP conn        284734101  0          423080     0
        ARP tbl         63178436   0          46150      2
        Xlate_Timeout   0          0          0          0
        VPN IKE upd     18605      0          6          0
        VPN IPSEC upd   11698      0          8          0
        VPN CTCP upd    18         0          0          0
        VPN SDI upd     0          0          0          0
        VPN DHCP upd    0          0          0          0

        Logical Update Queue Information
                        Cur     Max     Total
        Recv Q:         0       25      3774426
        Xmit Q:         0       7       3534578209


Any idea what's happening there, or what to look for?
lanboyo

Well, I would open a TAC case ASAP. If you have an HTTP state inspection for failover, you might want to replace it with a generic TCP inspection; this helped for "a guy on the internet".


And is the management interface unused?
nodisco
I'd agree with opening a TAC case on it as well. You have some errors on the stateful interface in the failover output. Do you see any errors in "sh interface" for your Gi0/3 interface on both units?

If your Management0/0 is not in use, you should really have it shut down, but it's unlikely to cause this problem. Do you have any logs from your primary unit that may indicate why it's rebooting?
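
For anyone following along, the checks asked about above might look roughly like this on the CLI. This is only a sketch: the interface name is taken from the configuration posted earlier, the comment lines are annotations rather than device output, and the availability and output of each command can vary by ASA software version.

```
! On both units: look for input/output errors or CRCs on the failover link
show interface GigabitEthernet0/3
! Failover state changes and the reason recorded for each
show failover history
! On the primary, once it is back up: crash data saved before the reboot, if any
show crashinfo
```

If "show crashinfo" does return a saved crash on the primary, that output is usually worth attaching to the TAC case.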
ASKER CERTIFIED SOLUTION
lanboyo

This solution is only available to members.
ccsenet

ASKER


Lanboyo and nodisco,

Thanks for the prompt responses.

Is the management interface unused?
Yes, the management interface is not being used.

"failover replication tcp"? This isn't a command on the Cisco ASA. We can use either "failover replication http" or "no failover replication http". The only difference between the two is that with the latter, the HTTP connection table on the active unit is not transferred to the standby unit.

We have found the following link on best practices for configuring the Cisco ASA:

http://www.checkthenetwork.com/networksecurity%20Cisco%20ASA%20Firewall%20Best%20Practices%20for%20Firewall%20Deployment%201.asp#_Toc218778849

The above link recommends disabling HTTP replication for performance reasons. Anyhow, we will disable it and try again. Let's see...
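
A minimal sketch of the test described above, assuming the change is entered on the currently active unit (the secondary in this thread) so that it replicates to the standby. The comment lines are annotations, not output from these units.

```
! On the active unit (currently the secondary)
configure terminal
no failover replication http
end
! "failover replication http" should no longer be listed in the output
show failover
! On the primary (currently standby): retry taking over the active role
failover active
```

Turning HTTP replication off means HTTP connections are not carried over to the standby unit, so those sessions have to be re-established after a failover.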



ccsenet

ASKER


The following are the log entries on the primary unit (currently standby):

May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface outside
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface inside
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 13 2009 16:10:46: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 13 2009 16:10:47: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 13 2009 16:10:47: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface outside
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface inside
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 16 2009 09:28:00: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 16 2009 09:28:01: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 16 2009 09:28:01: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface outside
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface inside
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface outside
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface inside
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed


Are the active and standby units supposed to communicate over interfaces other than GigabitEthernet0/3 (the LAN failover interface)?

Any clues?
ccsenet

ASKER


Cool... The problem is solved.

"no failover replication http" did it for us. The primary unit is now active. It seems to have been a problem with the poll and hold timers: the short timers probably do not allow the HTTP replication to complete while failing over. We believe that adjusting the timers might also have helped.

As for the second problem posted above ("Lost Failover communications with mate on interface outside/inside/DMZ-NHS"), we have found a good explanation in "Configuring Failover via Cisco ASDM" by Bob Eckhoff. It can be downloaded from:

https://cisco.hosted.jivesoftware.com/servlet/JiveServlet/download/3390-1-2874/Configuring%20Failover%20via%20ASDM_Posted_10-30-08.pdf;jsessionid=C38AF4535FE4F4FA666CDB19A6EEDDAA

Again, adjusting the poll and hold timers for the monitored interfaces seems a possible solution there.
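
The timers discussed in this post appear in the "show failover" output earlier in the thread as "Unit Poll frequency 1 seconds, holdtime 15 seconds" and "Interface Poll frequency 5 seconds, holdtime 25 seconds". The units also exchange hello messages on each monitored data interface, not only on the failover link, which is why the log messages name outside, inside and DMZ-NHS. A sketch of inspecting and lengthening both sets of timers follows. The values are illustrative only and were not tested in this thread, and the syntax is assumed to match the ASA 7.2 command reference, so check it against your version before using it.

```
! Current timers and the status of the monitored interfaces
show failover
show monitor-interface
configure terminal
! Illustrative values only: unit hello every 5 seconds, hold time 30 seconds
failover polltime unit 5 holdtime 30
! Illustrative values only: interface hello every 10 seconds, hold time 50 seconds
failover polltime interface 10 holdtime 50
end
```

Longer timers make the pair slower to detect a genuinely failed unit or interface, so this trades failover speed for tolerance.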


ccsenet

ASKER

"failover replication tcp" didn't work, since no such command is supported on the Cisco ASA 5540. "no failover replication tcp" solved the problem.
ccsenet

ASKER

There was a mistake in the comment on lanboyo's solution. The correct statement is: "no failover replication http" solved the problem.