Solved

Cisco Switching - Issues with distribution switches during a core switch outage

Posted on 2013-06-20
Last Modified: 2013-08-23
I have (2) Cisco 3750 switches in a stack configuration functioning as a CORE switch.  The stack has about 10 VLANs configured, and layer 3 routing is enabled, with the stack functioning as the default gateway for devices on all VLANs.

I have about 18 switches (Cisco 3560G) uplinking to this core stack.  Each uplink is a 2-port etherchannel link, with 1 port going to one switch in the stack and the other port going to the 2nd switch in the stack.  This was to maximize redundancy should one of the core switches in the stack go down.

The core is set up as a VTP server, with all other switches as VTP clients.  VTP Pruning is ON.  VTP version is (1).

I recently had an issue where one of the switches in the core stack completely went down (NO POWER).  When this happened, each etherchannel group should have gone from (2) links to (1) and everything should have continued.  This DID happen, but when the failed core switch came back up, some distribution switches did not negotiate their etherchannel link correctly and all connectivity was lost until we removed (1) of the (2) etherchannel uplinks or rebooted the switch (the distribution switch, not the core).

What could the issue be here?  I'm able to reproduce this as well if we intentionally take one of the core stacked switches offline.

What does everyone think?
Question by:jkeegan123
8 Comments
 
LVL 50

Expert Comment

by:Don Johnston
ID: 39264754
Sure would be nice to see the config of the 3750.

Absent that, were you using LACP or PAgP? If not, I've seen manual etherchannel do this type of thing on failure recovery.
 
LVL 8

Expert Comment

by:TMekeel
ID: 39264767
LACP is available for etherchannels that span stacked switches, while PAgP is not, according to this:
http://www.cisco.com/en/US/products/hw/switches/ps5023/products_configuration_example09186a00806cb982.shtml

I was thinking maybe it has something to do with spanning tree?
Maybe it blocks the switch that comes back online?
 
LVL 7

Expert Comment

by:avcontrol
ID: 39266338
Assuming the design was done correctly, the first thing I would think of is that the etherchannels would not come up.
You see this once in a while when adding/deleting links from an etherchannel bundle, due to algorithm complexity.
The usual "fix" is to delete and re-add the etherchannel configs.
 
LVL 5

Author Comment

by:jkeegan123
ID: 39266364
@TMekeel:  I suspect that this is the case as well, and all evidence points to this being an issue with SPANNING TREE after the etherchannel bundle / group comes up.  That and the fact that all ports are set as SPANNING TREE PORTFAST.  What I think is happening is:  The stack switch recovers, the link comes up, there's a spanning tree loop that does not get blocked because PORTFAST is on, and the broadcast traffic consumes so much bandwidth / processing power that the etherchannel bundle cannot communicate with the core switch to come online.  I was going to change the etherchannel group to not load balance but to just be fault tolerant instead, but I went with enabling Spanning Tree on the etherchannel interfaces instead and am waiting for a maintenance window to test.  The issue is reproducible every time.

@avcontrol:  The only thing that did bring the units back online besides rebooting the distribution switch was deleting the port-channel group and re-adding it on other ports or on the same ports.
 
LVL 50

Expert Comment

by:Don Johnston
ID: 39266409
Do NOT use portfast on inter-switch links.

From the Cisco Command Reference:

Use this feature only on interfaces that connect to end stations; otherwise, an accidental topology loop could cause a data packet loop and disrupt switch and network operation.

Once again, are you creating the etherchannel bundle manually or using LACP?

With etherchannel configured on an interface, that physical interface does not participate in STP on its own; spanning tree runs on the port-channel as a single logical link.

The symptoms you are describing would indicate a failure in the etherchannel functionality.

And it would be very helpful to see the configs of the switches in question.
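
For reference, taking portfast off an uplink could look roughly like the fragment below. This is only a sketch: no configs were posted in this thread, so Port-channel1 and the interface range are assumed names, and it assumes portfast was applied per interface rather than through a global default.

```
! Sketch only: Port-channel1 and Gi0/47 - 48 are assumed names,
! not taken from the poster's config.
interface Port-channel1
 no spanning-tree portfast
!
interface range GigabitEthernet0/47 - 48
 no spanning-tree portfast
```

Access ports that connect only to end stations can keep portfast; the change is meant for the inter-switch links.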
 
LVL 5

Author Comment

by:jkeegan123
ID: 39266563
@DonJohnston:  Manual creation, I guess...we have created the Port-Channel interface and set it appropriately for VLANs (trunk, encapsulation, etc...) and then we add the physical interfaces to the port-channel group with "channel-group 1 mode on".  Beyond that, the local IT Staff has set all ports to port-fast for workstations/phones because of impatience.  For a long time that was fine, but now that they have VoIP phones and shared media (PCs plugged into the phone's switchport) there is more of a chance for loops so we were going to change everything to allow STP to prevent loops.  I wasn't aware that an interface does not participate in STP when you put it in a port-channel group, but after debugging we definitely see that when the issue is happening, the console is reporting the message:

SW_MATM-4-MACFLAP_NOTIF - VLAN 100 is flapping between interfaces G1/0/48 and G1/0/49.

Seems like a loop to me...what's your take?  The config doesn't get more complex than this.
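
Since the actual config was not posted, the setup described in this comment would presumably look something like the following on one of the 3560G switches. This is a hypothetical reconstruction: the port-channel number comes from the quoted "channel-group 1 mode on" command, while the interface numbers are assumptions for illustration.

```
! Hypothetical reconstruction; the real config was not posted.
! Gi0/47 - 48 are assumed uplink ports.
interface Port-channel1
 switchport trunk encapsulation dot1q
 switchport mode trunk
!
interface range GigabitEthernet0/47 - 48
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode on
```

With "mode on" there is no negotiation protocol, so each side bundles its ports unconditionally and has no way to check what the other end is doing.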
 
LVL 7

Expert Comment

by:avcontrol
ID: 39266667
Are you seeing the VLAN "flapping" between two ports, or are you seeing a MAC address in vlan 1 flapping between two ports? You are most likely seeing the same MAC address flapping between two ports. This could be due to the switch learning the same MAC address on two different ports. There could be a loop somewhere, so double check your physical connections. Use "show mac add *mac add* *mac add*" (enter the MAC address twice).


http://www.itcertnotes.com/2011/05/l2-bridging-loop-due-to-etherchannel.html
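
As a rough illustration of that lookup, the unabbreviated form of the MAC table query is shown below. The MAC address is a made-up placeholder, and VLAN 100 is taken from the console message quoted earlier in the thread. Some older IOS releases spell the command "show mac-address-table" instead.

```
! 0011.2233.4455 is a placeholder; substitute the address that is flapping.
show mac address-table address 0011.2233.4455
! List everything learned in the VLAN named in the console message.
show mac address-table vlan 100
```

Running the first command a few times in a row should show whether the port the address is learned on keeps changing.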
 
LVL 50

Accepted Solution

by:
Don Johnston earned 500 total points
ID: 39267022
> Manual creation, I guess...we have created the Port-Channel interface and set it appropriately for VLANs (trunk, encapsulation, etc...) and then we add the physical interfaces to the port-channel group with "channel-group 1 mode on".

Use LACP to create the channel groups.

"channel-group 1 mode active"

You will need to apply this command to all interfaces which are participating in the specified port channel.

See if that resolves the issue.
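
A minimal sketch of that change is below, assuming port-channel 1 on both ends and made-up interface numbers (none of these appear in the thread). It also assumes the IOS release on the stack supports LACP across stack members, which is worth checking first. Both ends have to be changed, because a "mode on" side does not speak LACP and an "active" side will not bundle without a LACP partner.

```
! Distribution switch (3560G). Gi0/47 - 48 are assumed uplink ports.
interface range GigabitEthernet0/47 - 48
 no channel-group
 channel-group 1 mode active
!
! Core stack (3750). Gi1/0/1 and Gi2/0/1 are assumed, one port per
! stack member; the group number on the core may differ per uplink.
interface range GigabitEthernet1/0/1 , GigabitEthernet2/0/1
 no channel-group
 channel-group 1 mode active
```

The members are removed from the group first because a port generally cannot join a group whose existing members use a different mode. Expect the uplink to drop while the two ends disagree, so this is a maintenance-window change.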

> For a long time that was fine, but now that they have VoIP phones and shared media (PCs plugged into the phone's switchport) there is more of a chance for loops so we were going to change everything to allow STP to prevent loops.

You already have spanning-tree to prevent loops. It's on by default.

> SW_MATM-4-MACFLAP_NOTIF - VLAN 100 is flapping between interfaces G1/0/48 and G1/0/49.
>
> Seems like a loop to me...what's your take?  The config doesn't get more complex than this.


Is this on the 3750 stack or on one of the 3560's? Once again, we still haven't seen the config. You stated earlier that the two links are terminating on different switches in the stack. So I'm guessing this is from one of the 3560's. Have you issued a "show ether sum" when the problem arises? How about a "show span vlan 100"?
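
Spelled out in full, the two abbreviated commands above are the following. VLAN 100 is the VLAN from the quoted console message.

```
! Full form of "show ether sum"
show etherchannel summary
! Full form of "show span vlan 100"
show spanning-tree vlan 100
```

In the summary output, the flag next to each member port is the thing to look at: P means the port is bundled in the port-channel, while I (stand-alone), s (suspended) or D (down) means it is not.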