Solved

Cisco Switching - Issues with distribution switches during a core switch outage

Posted on 2013-06-20
Last Modified: 2013-08-23
I have (2) Cisco 3750 switches in a stack configuration functioning as a CORE switch.  These switches have about 10 VLANs configured, and layer 3 routing is enabled with the switches functioning as the default gateway for devices on all VLANs.

I have about 18 switches uplinking to these core switches (Cisco 3560G switches) using a 2 port etherchannel link for each uplink with 1 port going to 1 switch in the stack, and the other port going to the 2nd switch in the stack.  This was to maximize redundancy should one of the core switches in the stack go down.

The core is set up as a VTP server, with all other switches as VTP clients.  VTP Pruning is ON.  VTP version is (1).

I recently had an issue where one of the switches in the core stack completely went down (NO POWER).  When this happened, everything should have gone from (2) links in the etherchannel group to (1) and everything should have continued.  This DID happen, but when everything came back up, some switches did not negotiate their etherchannel link correctly and all connectivity was lost until we removed (1) of the (2) etherchannel uplinks or rebooted the switch (the distribution switch, not the core).

What could the issue be here?  I'm able to reproduce this as well if we intentionally take one of the core stacked switches offline.

What does everyone think?
Question by:jkeegan123
8 Comments
 
LVL 50

Expert Comment

by:Don Johnston
ID: 39264754
Sure would be nice to see the config of the 3750.

Absent that, were you using LACP or PAgP? If not, I've seen manual etherchannel do this type of thing on failure recovery.
 
LVL 8

Expert Comment

by:TMekeel
ID: 39264767
LACP is available on stacked switches, while PAgP is not according to here:
http://www.cisco.com/en/US/products/hw/switches/ps5023/products_configuration_example09186a00806cb982.shtml

I was thinking maybe it has something to do with spanning tree?
Maybe it blocks the switch that comes back online?
 
LVL 7

Expert Comment

by:avcontrol
ID: 39266338
Assuming the design was done correctly, the first thing I would suspect is that the etherchannels did not come back up.
You see this once in a while when adding or deleting links from an etherchannel bundle, due to the complexity of the algorithm.
The usual "fix" is to delete and re-add the etherchannel config.
 
LVL 5

Author Comment

by:jkeegan123
ID: 39266364
@TMekeel:  I suspect that this is the case as well; all evidence points to this being an issue with SPANNING TREE after the etherchannel bundle / group comes up, along with the fact that all ports are set as SPANNING TREE PORTFAST.  What I think is happening is:  the stack switch recovers, the link comes up, there's a spanning tree loop that does not get blocked because PORTFAST is on, and the broadcast traffic consumes so much bandwidth / processing power that the etherchannel bundle cannot communicate with the core switch to come online.  I was going to change the etherchannel group to not load balance but to just be fault tolerant instead, but I went with enabling Spanning Tree on the etherchannel interfaces instead and am waiting for a maintenance window to test.  The issue is reproducible every time.

@avcontrol:  The only thing that did bring the units back online besides rebooting the distribution switch was deleting the port-channel group and re-adding it on other ports or on the same ports.
 
LVL 50

Expert Comment

by:Don Johnston
ID: 39266409
Do NOT use portfast on inter-switch links.

From the Cisco Command Reference:

Use this feature only on interfaces that connect to end stations; otherwise, an accidental topology loop could cause a data packet loop and disrupt switch and network operation.
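As a rough sketch of what removing portfast from the uplinks could look like, assuming it was applied per interface. The interface numbers and the port-channel number below are hypothetical, since the real config has not been posted:

```text
! Hypothetical uplink member ports on a distribution switch; substitute the real ones.
interface range GigabitEthernet0/49 - 50
 no spanning-tree portfast
!
! The logical bundle interface, in case portfast was set there as well.
interface Port-channel1
 no spanning-tree portfast
```

Access ports that face workstations and phones are end-station ports, so the quoted guidance does not require touching them.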

Once again, are you creating the etherchannel bundle manually or using LACP?

With etherchannel configured on an interface, that physical interface does not participate in the STP.  

The symptoms you are describing would indicate a failure in the etherchannel functionality.

And it would be very helpful to see the configs of the switches in question.
 
LVL 5

Author Comment

by:jkeegan123
ID: 39266563
@DonJohnston:  Manual creation, I guess...we have created the Port-Channel interface and set it appropriately for VLANs (trunk, encapsulation, etc...) and then we add the physical interfaces to the port-channel group with "channel-group 1 mode on".  Beyond that, the local IT staff has set all ports to portfast for workstations/phones because of impatience.  For a long time that was fine, but now that they have VoIP phones and shared media (PCs plugged into the phone's switchport) there is more of a chance for loops, so we were going to change everything to allow STP to prevent loops.  I wasn't aware that an interface does not participate in STP when you put it in a port-channel group, but after debugging we definitely see that when the issue is happening, the console is reporting the message:

SW_MATM-4-MACFLAP_NOTIF - VLAN 100 is flapping between interfaces G1/0/48 and G1/0/49.

Seems like a loop to me...what's your take?  The config doesn't get more complex than this.
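For readers following along, a static bundle of the kind described in this comment would look roughly like the sketch below. This is not the poster's actual config (which was never shared); the interface numbers and the dot1q trunk settings are assumptions, and only "channel-group 1 mode on" comes from the thread:

```text
! Sketch only: interface numbers and trunk settings are assumed, not taken from the real switches.
interface Port-channel1
 switchport trunk encapsulation dot1q
 switchport mode trunk
!
interface range GigabitEthernet0/49 - 50
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode on
```

With "mode on" the ports are bundled unconditionally, with no PAgP or LACP negotiation between the two ends.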
 
LVL 7

Expert Comment

by:avcontrol
ID: 39266667
Are you seeing the VLAN "flapping" between two ports, or are you seeing a MAC address in VLAN 1 flapping between two ports? You are most likely seeing the same MAC address flapping between two ports. This could be due to the switch learning the same MAC address on two different ports. There could be a loop somewhere, so double check your physical connections. Use the show mac add *mac add* *mac add* command (enter the MAC address twice).


http://www.itcertnotes.com/2011/05/l2-bridging-loop-due-to-etherchannel.html
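A minimal sketch of the MAC table lookup suggested above, written out in unabbreviated form. The MAC address is a placeholder, and VLAN 100 is taken from the log message posted earlier; some older IOS releases spell the command "show mac-address-table" instead:

```text
! 0011.2233.4455 is a placeholder; use the address reported as flapping.
show mac address-table address 0011.2233.4455
! List everything learned in the affected VLAN to see which ports are involved.
show mac address-table vlan 100
```

Running the lookup a few times in a row should show whether the same address keeps moving between the two ports.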
 
LVL 50

Accepted Solution

by:
Don Johnston earned 500 total points
ID: 39267022
> Manual creation, I guess...we have created the Port-Channel interface and set it appropriately for VLANs (trunk, encapsulation, etc...) and then we add the physical interfaces to the port-channel group with "channel-group 1 mode on".

Use LACP to create the channel groups.

"channel-group 1 mode active"

You will need to apply this command to all interfaces which are participating in the specified port channel.

See if that resolves the issue.
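A hedged sketch of the change on one distribution switch, assuming the bundle is currently static ("mode on"). The interface numbers are hypothetical, and the shutdown / no shutdown steps are a cautious choice of mine rather than something stated in the thread:

```text
! Hypothetical member ports on one 3560G; repeat on the matching ports of the 3750 stack.
interface range GigabitEthernet0/49 - 50
 shutdown
 no channel-group
 channel-group 1 mode active
 no shutdown
```

Both ends of the bundle need to run LACP (active on at least one side) before the channel will form, so a link that is static on one end and LACP on the other will stay down until the second switch is changed. That makes this a maintenance-window change.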

> For a long time that was fine, but now that they have VoIP phones and shared media (PC's into the phones switchport) there is more of a chance for loops so we were going to change everything to allow STP to prevent loops.

You already have spanning-tree to prevent loops. It's on by default.

> SW_MATM-4-MACFLAP_NOTIF - VLAN 100 is flapping between interfaces G1/0/48 and G1/0/49.
>
> Seems like a loop to me...what's your take?  The config doesn't get more complex than this.


Is this on the 3750 stack or on one of the 3560's? Once again, we still haven't seen the config. You stated earlier that the two links are terminating on different switches in the stack. So I'm guessing this is from one of the 3560's. Have you issued a "show ether sum" when the problem arises? How about a "show span vlan 100"?
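For reference, the two abbreviated commands above expand to the following; run them on the affected switch while the problem is occurring:

```text
show etherchannel summary
show spanning-tree vlan 100
```

In the etherchannel summary, the flag legend printed at the top of the output explains the letters next to each member port; a member that is not marked as bundled in the port-channel is the one to look at. The spanning-tree output shows the role and state of each port in VLAN 100, including the port-channel interface.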