Solved

Cisco ASA failover - Primary unit reboots on activation

Posted on 2009-05-16
Last Modified: 2013-11-16
Hi guys...

We have an Active/Standby ASA 5540 failover cluster. A few months ago, the secondary unit became active automatically and the primary went into 'Standby Ready' state. Now, whenever we try to make the primary active, either with the command "failover active" on the primary or "no failover active" on the secondary, the primary immediately reboots on its own. The log on the secondary unit shows the following messages:

May 13 2009 09:24:27: %ASA-1-104002: (Secondary) Switching to STNDBY - Other unit want me Standby
May 13 2009 09:24:27: %ASA-6-210022: LU missed 8485 updates
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface outside waiting
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface inside waiting
May 13 2009 09:24:32: %ASA-1-105003: (Secondary) Monitoring on interface DMZ-NHS waiting
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface outside
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface inside
May 13 2009 09:24:42: %ASA-1-105008: (Secondary) Testing Interface DMZ-NHS
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface outside Passed
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface DMZ-NHS Passed
May 13 2009 09:24:42: %ASA-1-105009: (Secondary) Testing on interface inside Passed
May 13 2009 09:24:45: %ASA-1-103001: (Secondary) No response from other firewall (reason code = 1).
May 13 2009 09:24:45: %ASA-1-104001: (Secondary) Switching to ACTIVE - HELLO not heard from mate.
May 13 2009 09:28:55: %ASA-1-709003: (Secondary) Beginning configuration replication: Send to mate.
May 13 2009 09:29:07: %ASA-1-709004: (Secondary) End Configuration Replication (ACT)
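For reference, the switchover attempts described above correspond to the following exec commands. The hostnames "primary" and "secondary" in the prompts are placeholders, not the real device names:

```
! On the primary (currently Standby Ready): ask it to take the active role
primary# failover active

! Or, on the secondary (currently Active): make it give up the active role
secondary# no failover active

! Afterwards, check which unit is Active ("This host" / "Other host" lines)
secondary# show failover
```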


The interface GigabitEthernet0/3 is used for LAN failover on both firewalls, and the failover interfaces are connected to a switch. For troubleshooting, we also connected the two interfaces directly with a crossover cable, but that didn't help and the same issue occurred again. Following is the LAN failover configuration on the two units:

Primary Unit:

failover
failover lan unit primary
failover lan interface Statefull-Failover GigabitEthernet0/3
failover key *****
failover replication http
failover link Statefull-Failover GigabitEthernet0/3
failover interface ip Statefull-Failover 10.200.200.1 255.255.255.252 standby 10.200.200.2


Secondary Unit:

failover
failover lan unit secondary
failover lan interface Statefull-Failover GigabitEthernet0/3
failover key *****
failover replication http
failover link Statefull-Failover GigabitEthernet0/3
failover interface ip Statefull-Failover 10.200.200.1 255.255.255.252 standby 10.200.200.2
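The failover lines should be identical on both units except for the "failover lan unit" line. A quick way to compare them is to display just the failover section of the running configuration on each unit (the prompts are placeholders):

```
! Run on each unit and compare the output line by line;
! only "failover lan unit primary/secondary" should differ
primary# show running-config failover
secondary# show running-config failover
```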


Following is the show failover output from the current active unit:

Failover On
Failover unit Secondary
Failover LAN Interface: Statefull-Failover GigabitEthernet0/3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 250 maximum
failover replication http
Version: Ours 7.2(1), Mate 7.2(1)
Last Failover at: 09:24:45 KSA May 13 2009
        This host: Secondary - Active
                Active time: 10195375 (sec)
                slot 0: ASA5540 hw/sw rev (1.0/7.2(1)) status (Up Sys)
                  Interface outside (X.X.X.1): Normal
                  Interface inside (Y.Y.Y.1): Normal
                  Interface DMZ-NHS (Z.Z.Z.1): Normal
                  Interface management (0.0.0.0): Link Down (Not-Monitored)
                slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                  IPS, 6.0(3)E1, Up
        Other host: Primary - Standby Ready
                Active time: 0 (sec)
                slot 0: ASA5540 hw/sw rev (1.0/7.2(1)) status (Up Sys)
                  Interface outside (X.X.X.2): Normal
                  Interface inside (Y.Y.Y.2): Normal
                  Interface DMZ-NHS (Z.Z.Z.2): Normal
                  Interface management (0.0.0.0): Normal (Not-Monitored)
                slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                  IPS, 6.0(3)E1, Up

Stateful Failover Logical Update Statistics
        Link : Statefull-Failover GigabitEthernet0/3 (up)
        Stateful Obj    xmit       xerr       rcv        rerr
        General         3509761538 0          3760724    2
        sys cmd         1360203    0          1360196    0
        up time         0          0          0          0
        RPC services    0          0          0          0
        TCP conn        3160458482 0          1931284    0
        UDP conn        284734101  0          423080     0
        ARP tbl         63178436   0          46150      2
        Xlate_Timeout   0          0          0          0
        VPN IKE upd     18605      0          6          0
        VPN IPSEC upd   11698      0          8          0
        VPN CTCP upd    18         0          0          0
        VPN SDI upd     0          0          0          0
        VPN DHCP upd    0          0          0          0

        Logical Update Queue Information
                        Cur     Max     Total
        Recv Q:         0       25      3774426
        Xmit Q:         0       7       3534578209


Any idea what's happening there, or what to look for?
Question by:ccsenet
8 Comments
 
LVL 10

Expert Comment

by:lanboyo
ID: 24403193
Well, I would open a TAC case ASAP. If you have HTTP state inspection for failover, you might want to replace it with a generic TCP inspection; this helped "a guy on the internet".


And is the management interface unused?
 
LVL 19

Expert Comment

by:nodisco
ID: 24404977
I agree with opening a TAC case on it too. You have some stateful update errors (the rerr column) in the failover output; do you see any errors in show interface for your Gi0/3 interface on either unit?
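To check Gi0/3 for errors as suggested, the interface counters can be compared on both units. This is only a sketch; the prompts are placeholders:

```
! Run on both units and look at "input errors", "CRC", "overrun",
! "output errors" and "collisions" in the counters
primary# show interface GigabitEthernet0/3
secondary# show interface GigabitEthernet0/3
```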

If your Management0/0 is not in use, you should really have it shut down, but it's unlikely to cause this problem. Do you have any logs from your primary unit that may indicate why it's rebooting?
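Shutting down the unused management interface, as suggested, takes only a few lines. They would be entered on the active unit, since configuration changes made there are replicated to the standby:

```
configure terminal
! Administratively disable the unused management port
interface Management0/0
 shutdown
 exit
end
```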
 
LVL 10

Accepted Solution

by:lanboyo (earned 500 total points)
ID: 24408329
Sorry, I was unclear. You have:

failover replication http

Someone with similar symptoms was able to resolve it by replacing this with:

failover replication tcp

That screams BUG to me, but you might be more interested in stability than code purity.
 
LVL 1

Author Comment

by:ccsenet
ID: 24409585

Lanboyo and nodisco

Thanks for the prompt responses.

Is the management interface unused?
Yes, the management interface is not being used.

failover replication tcp? That isn't a command on the Cisco ASA. We can use either "failover replication http" or "no failover replication http". The only difference between the two is that with the latter, the HTTP connection table on the active unit is not replicated to the standby unit.

We found the following link on best practices for configuring the Cisco ASA:

http://www.checkthenetwork.com/networksecurity%20Cisco%20ASA%20Firewall%20Best%20Practices%20for%20Firewall%20Deployment%201.asp#_Toc218778849

The above link recommends disabling HTTP replication for performance reasons. Anyhow, we will try disabling it and see what happens...
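Disabling HTTP replication, as planned above, is a single configuration command. A minimal sequence, entered on the active unit so that the change is replicated to the standby, might look like this:

```
configure terminal
! Stop replicating HTTP connection state to the standby unit
no failover replication http
end
! Confirm the line no longer appears in the failover configuration
show running-config failover
! Save the running configuration
write memory
```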



LVL 1

Author Comment

by:ccsenet
ID: 24409628

Following are the log entries from the primary unit (standby):

May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 13 2009 16:10:46: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface outside
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface inside
May 13 2009 16:10:46: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 13 2009 16:10:46: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 13 2009 16:10:47: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 13 2009 16:10:47: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 16 2009 09:28:00: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface outside
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface inside
May 16 2009 09:28:00: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 16 2009 09:28:00: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 16 2009 09:28:01: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 16 2009 09:28:01: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 18 2009 07:52:13: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface outside
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface inside
May 18 2009 07:52:13: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 07:52:13: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface outside
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside
May 18 2009 10:25:57: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface DMZ-NHS
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface outside
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface inside
May 18 2009 10:25:57: %ASA-1-105008: (Primary) Testing Interface DMZ-NHS
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface outside Passed
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface inside Passed
May 18 2009 10:25:57: %ASA-1-105009: (Primary) Testing on interface DMZ-NHS Passed


Are the active and standby units supposed to communicate over interfaces other than Gi0/3 (the LAN failover interface)?

Any clues?
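For context on the question above: as far as we understand ASA failover, the units exchange hello messages not only over the LAN failover link but also on each monitored interface (three here, per the "Monitored Interfaces 3 of 250 maximum" line in the show failover output), and message 105005 indicates that those hellos were not received on the named interface. The monitored interfaces and their failover status can be listed with the command below; the prompt is a placeholder:

```
! Lists each interface monitored for failover and its current status
primary# show monitor-interface
```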
 
LVL 1

Author Comment

by:ccsenet
ID: 24411072

Cool... The problem is solved.

"no failover replication http" did it for us, and the primary unit is now active. It seems to have been a problem with the poll and hold timers: the short timers probably do not allow enough time for HTTP replication during failover. We believe that adjusting the timers might also have helped.
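If one wanted to try the timer adjustment mentioned above instead of, or in addition to, disabling HTTP replication, the unit poll and hold times are set with the failover polltime command. The values below are hypothetical illustrations only (the show failover output above reports 1 and 15 seconds). As we understand it, the unit hold time must be at least three times the poll time; check the allowed ranges in the command reference for your release before applying anything:

```
configure terminal
! Hypothetical values: send unit hellos every 2 s and declare the mate
! failed only after 30 s without hearing from it (current: 1 s / 15 s)
failover polltime unit 2 holdtime 30
end
```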

For the second problem posted above ("Lost Failover communications with mate on interface outside/inside/DMZ-NHS"), we found a good explanation in "Configuring Failover via Cisco ASDM" by Bob Eckhoff, which can be downloaded from:

https://cisco.hosted.jivesoftware.com/servlet/JiveServlet/download/3390-1-2874/Configuring%20Failover%20via%20ASDM_Posted_10-30-08.pdf;jsessionid=C38AF4535FE4F4FA666CDB19A6EEDDAA

Again, adjusting the poll and hold timers for monitored interfaces seems to be a possible solution there.
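Likewise, the poll and hold times for monitored interfaces (5 and 25 seconds in the show failover output above) are set with the interface form of the same command. The values below are only a hypothetical example; as we understand the command reference, the interface hold time cannot be less than five times the poll time, and the permitted ranges should be verified for your software release:

```
configure terminal
! Hypothetical values: poll monitored interfaces every 8 s and wait 40 s
! before marking an interface failed (current: 5 s / 25 s)
failover polltime interface 8 holdtime 40
end
```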


 
LVL 1

Author Closing Comment

by:ccsenet
ID: 31582199
"failover replication tcp" didn't work, since no such command is supported on the Cisco ASA 5540. "no failover replication tcp" solved the problem.
 
LVL 1

Author Comment

by:ccsenet
ID: 24418593
I made a mistake in my closing comment on lanboyo's solution. The correct statement is: "no failover replication http" solved the problem.
