Cisco 3750 Access Layer Switch Clusters Experience Link Flapping

We recently upgraded our infrastructure to two Cisco 6509s and Cisco 3750 switch clusters in all closets. After taking a call where users complained about briefly losing connectivity, I began looking at the switches and discovered that all access layer switches were experiencing link flapping on what appear to be random interfaces at various times of the day. One day a switch will have problems, then it may be fine for three days in a row. The 6509s do not show any link problems, but all of the access layer switch clusters appear to be affected. Most of the time the flap is so fast that users don't notice it. Any ideas as to what could cause this bizarre behavior, or suggestions on how to troubleshoot it, would be greatly appreciated.
Thanks.
The logs look like this...

016530: Feb  9 07:18:00.383 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to down
016531: Feb  9 07:18:03.067 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to up
016532: Feb  9 07:18:04.074 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/6, changed state to up
016533: Feb  9 07:18:18.636 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/6, changed state to down
016534: Feb  9 07:18:19.635 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to down
016535: Feb  9 07:18:22.151 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to up
016536: Feb  9 07:18:23.158 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/6, changed state to up
016537: Feb  9 07:58:07.241 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/29, changed state to down
016538: Feb  9 07:58:08.247 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to down
016539: Feb  9 07:58:12.970 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to up
016540: Feb  9 07:58:13.977 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/29, changed state to up
016541: Feb  9 07:58:21.828 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/29, changed state to down
016542: Feb  9 07:58:22.827 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to down
016543: Feb  9 07:58:24.915 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to up
016544: Feb  9 07:58:25.922 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/29, changed state to up
016545: Feb  9 08:57:16.630 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to down
016546: Feb  9 08:57:18.635 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to up
016547: Feb  9 08:57:21.127 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to down
016548: Feb  9 08:57:22.133 EST: %LINK-3-UPDOWN: Interface FastEthernet3/0/12, changed state to down
016549: Feb  9 08:57:29.423 EST: %LINK-3-UPDOWN: Interface FastEthernet3/0/12, changed state to up
016550: Feb  9 08:57:30.430 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to up
016551: Feb  9 08:58:16.298 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to down
016552: Feb  9 08:58:17.297 EST: %LINK-3-UPDOWN: Interface FastEthernet3/0/12, changed state to down
016553: Feb  9 08:58:19.310 EST: %LINK-3-UPDOWN: Interface FastEthernet3/0/12, changed state to up
016554: Feb  9 08:58:20.317 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet3/0/12, changed state to up
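
Editor's note: because the flaps move between ports and days, it can help to tally them per interface rather than read the log by eye. The following is a minimal sketch, assuming the log has been saved as plain text in the format shown above; the function name `count_link_downs` and the `SAMPLE` excerpt are illustrative and not part of the original post. It counts only the physical `%LINK-3-UPDOWN` "down" events and ignores the `%LINEPROTO-5-UPDOWN` messages that accompany them.

```python
import re
from collections import Counter

# Matches physical-link messages only; LINEPROTO messages do not match.
LINK_RE = re.compile(
    r"%LINK-3-UPDOWN: Interface (?P<intf>[^,\s]+), changed state to (?P<state>up|down)"
)


def count_link_downs(log_text):
    """Return a Counter mapping interface name -> number of link-down events."""
    downs = Counter()
    for match in LINK_RE.finditer(log_text):
        if match.group("state") == "down":
            downs[match.group("intf")] += 1
    return downs


# A few lines copied from the log above, used here only as sample input.
SAMPLE = """\
016530: Feb  9 07:18:00.383 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to down
016531: Feb  9 07:18:03.067 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to up
016534: Feb  9 07:18:19.635 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to down
016535: Feb  9 07:18:22.151 EST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/6, changed state to up
016538: Feb  9 07:58:08.247 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to down
016539: Feb  9 07:58:12.970 EST: %LINK-3-UPDOWN: Interface FastEthernet2/0/29, changed state to up
"""

if __name__ == "__main__":
    # Print interfaces ordered from most to fewest link-down events.
    for interface, count in count_link_downs(SAMPLE).most_common():
        print(interface, count)
```

Run against logs collected over several days, a tally like this could show whether the same ports (or the same cable runs) keep reappearing, which is hard to see from a single day's output.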
MongrulAsked:
td_miles Commented:
I would check to make sure that the speed & duplex are set manually on all of the interfaces that connect the switches together.

Also check your STP setup to make sure something isn't flapping there.

You may need to turn on some more debugging and log it to a syslog host so that you can trawl through the logs from when it happens to see if there are any other events at that time.

Mongrul (Author) Commented:
Thanks for your reply. I ran debug spanning-tree bpdu and got the result shown below. Is it normal for there to be this much activity? And what does "STP(22) port Po1 supersedes 0" mean?

Thanks again.

017879: Feb 16 17:04:39.458 EST: STP(12) port Po2 supersedes 0
017880: Feb 16 17:04:39.466 EST: STP: VLAN0012 Gi1/0/13 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 800D 0100 1400 0200 0F00
017881: Feb 16 17:04:39.466 EST: STP: VLAN0012 Gi1/0/20 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 8014 0100 1400 0200 0F00
017882: Feb 16 17:04:39.466 EST: STP: VLAN0012 Gi1/0/22 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 8016 0100 1400 0200 0F00
017883: Feb 16 17:04:39.466 EST: STP: VLAN0012 Gi1/0/24 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 8018 0100 1400 0200 0F00
017884: Feb 16 17:04:39.466 EST: STP: VLAN0012 St1 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 83E8 0100 1400 0200 0F00
017885: Feb 16 17:04:39.466 EST: STP: VLAN0022 rx BPDU: config protocol = ieee, packet from Port-channel1  , linktype SSTP, enctype 3, encsize 22
017886: Feb 16 17:04:39.466 EST: STP: enc 01 00 0C CC CC CD 00 14 A8 30 E4 AB 00 32 AA AA 03 00 00 0C 01 0B
017887: Feb 16 17:04:39.466 EST: STP: Data     000000000040160011BC1A8C000000000380160011BC1AA40086830100140002000F00
017888: Feb 16 17:04:39.466 EST: STP: VLAN0022 Po1:0000 00 00 00 40160011BC1A8C00 00000003 80160011BC1AA400 8683 0100 1400 0200 0F00
017889: Feb 16 17:04:39.466 EST: STP(22) port Po1 supersedes 0
017890: Feb 16 17:04:39.466 EST: STP: VLAN0012 rx BPDU: config protocol = ieee, packet from Port-channel1  , linktype SSTP, enctype 3, encsize 22
017891: Feb 16 17:04:39.466 EST: STP: enc 01 00 0C CC CC CD 00 14 A8 30 E4 AB 00 32 AA AA 03 00 00 0C 01 0B
017892: Feb 16 17:04:39.475 EST: STP: Data     0000000000400C0011BC1A8C0000000003800C0011BC1AA40086830100140002000F00
017893: Feb 16 17:04:39.475 EST: STP: VLAN0012 Po1:0000 00 00 00 400C0011BC1A8C00 00000003 800C0011BC1AA400 8683 0100 1400 0200 0F00
017894: Feb 16 17:04:39.475 EST: STP(12) port Po1 supersedes 0
017895: Feb 16 17:04:40.599 EST: STP: VLAN0201 rx BPDU: config protocol = ieee, packet from Port-channel2  , linktype SSTP, enctype 3, encsize 22
017896: Feb 16 17:04:40.599 EST: STP: enc 01 00 0C CC CC CD 00 14 F2 15 86 9B 00 32 AA AA 03 00 00 0C 01 0B
017897: Feb 16 17:04:40.599 EST: STP: Data     000000000040C90011BC1A8C000000000040C90011BC1A8C0086830000140002000F00
017898: Feb 16 17:04:40.599 EST: STP: VLAN0201 Po2:0000 00 00 00 40C90011BC1A8C00 00000000 40C90011BC1A8C00 8683 0000 1400 0200 0F00
017899: Feb 16 17:04:40.599 EST: STP(201) port Po2 supersedes 0
017900: Feb 16 17:04:40.599 EST: STP: VLAN0201 St1 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40C90011BC1A8C00 00000003 F0C90014F20EDD80 83E8 0100 1400 0200 0F00
017901: Feb 16 17:04:40.607 EST: STP: VLAN0201 rx BPDU: config protocol = ieee, packet from Port-channel1  , linktype SSTP, enctype 3, encsize 22
017902: Feb 16 17:04:40.607 EST: STP: enc 01 00 0C CC CC CD 00 14 A8 30 E4 AB 00 32 AA AA 03 00 00 0C 01 0B
017903: Feb 16 17:04:40.607 EST: STP: Data     000000000040C90011BC1A8C000000000380C90011BC1AA40086830100140002000F00
017904: Feb 16 17:04:40.607 EST: STP: VLAN0201 Po1:0000 00 00 00 40C90011BC1A8C00 00000003 80C90011BC1AA400 8683 0100 1400 0200 0F00
017905: Feb 16 17:04:40.607 EST: STP(201) port Po1 supersedes 0
017906: Feb 16 17:04:41.454 EST: STP: VLAN0022 rx BPDU: config protocol = ieee, packet from Port-channel2  , linktype SSTP, enctype 3, encsize 22
017907: Feb 16 17:04:41.454 EST: STP: enc 01 00 0C CC CC CD 00 14 F2 15 86 9B 00 32 AA AA 03 00 00 0C 01 0B
017908: Feb 16 17:04:41.454 EST: STP: Data     000000000040160011BC1A8C000000000040160011BC1A8C0086830000140002000F00
017909: Feb 16 17:04:41.454 EST: STP: VLAN0022 Po2:0000 00 00 00 40160011BC1A8C00 00000000 40160011BC1A8C00 8683 0000 1400 0200 0F00
017910: Feb 16 17:04:41.454 EST: STP(22) port Po2 supersedes 0
017911: Feb 16 17:04:41.454 EST: STP: VLAN0022 Gi1/0/1 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40160011BC1A8C00 00000003 F0160014F20EDD80 8001 0100 1400 0200 0F00
017912: Feb 16 17:04:41.454 EST: STP: VLAN0022 Gi1/0/2 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40160011BC1A8C00 00000003 F0160014F20EDD80 8002 0100 1400 0200 0F00
017913: Feb 16 17:04:41.454 EST: STP: VLAN0022 Gi1/0/6 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40160011BC1A8C00 00000003 F0160014F20EDD80 8006 0100 1400 0200 0F00
017914: Feb 16 17:04:41.454 EST: STP: VLAN0022 Gi1/0/9 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40160011BC1A8C00 00000003 F0160014F20EDD80 8009 0100 1400 0200 0F00
017915: Feb 16 17:04:41.463 EST: STP: VLAN0022 St1 tx BPDU: config protocol=ieee
    Data : 0000 00 00 00 40160011BC1A8C00 00000003 F0160014F20EDD80 83E8 0100 1400 0200 0F00
017916: Feb 16 17:04:41.463 EST: STP: VLAN0012 rx BPDU: config protocol = ieee, packet from Port-channel2  , linktype SSTP, enctype 3, encsize 22
Mongrul (Author) Commented:
OK folks, I just upped the points. Please explain what the above debug output means. I ran debug spanning-tree with each individual option in turn; only debug spann bpdu returned results.
td_miles Commented:
I am terribly sorry, I just haven't had time over the last week. I will post a pointer question to get some of the other experts around to take a look.
pedrow Commented:
Are the flapping interfaces on the 3750s home-run interfaces that users connect to, or are some of them trunk ports back to the 6509s?

debug span bpdu will spew tons of stuff if you've got stp running in your topology. Most of it is just informational stuff from other switches. You'd probably see blocking/forwarding messages if it was stp at play here.
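
Editor's note: to see what those informational lines actually carry, the `Data` field of each debug line can be decoded by hand or with a short script. The following is a minimal sketch, assuming the field is a standard 35-byte IEEE 802.1D configuration BPDU and that the switches use the extended system ID (top 4 bits of the priority field are the priority, low 12 bits are the VLAN) and a 4-bit port priority with a 12-bit port number. Those two splits are assumptions about this network, and the function and key names are illustrative.

```python
import struct


def _bridge_id(raw8):
    """Split an 8-byte bridge ID into priority, VLAN (extended system ID), and MAC."""
    prio_field = int.from_bytes(raw8[:2], "big")
    mac = raw8[2:].hex()
    return {
        "priority": prio_field & 0xF000,   # assumed: top 4 bits carry the priority
        "vlan": prio_field & 0x0FFF,       # assumed: low 12 bits carry the VLAN ID
        "mac": ".".join(mac[i:i + 4] for i in range(0, 12, 4)),
    }


def decode_config_bpdu(data):
    """Decode the hex 'Data' field of a config BPDU debug line into named fields."""
    raw = bytes.fromhex(data.replace(" ", ""))
    if len(raw) != 35:
        raise ValueError("expected a 35-byte configuration BPDU, got %d bytes" % len(raw))
    (proto, version, bpdu_type, flags, root, cost, bridge,
     port_id, msg_age, max_age, hello, fwd_delay) = struct.unpack(">HBBB8sI8sHHHHH", raw)
    return {
        "protocol_id": proto,
        "version": version,
        "bpdu_type": bpdu_type,
        "topology_change": bool(flags & 0x01),
        "root": _bridge_id(root),
        "root_path_cost": cost,
        "bridge": _bridge_id(bridge),
        "port_priority": (port_id >> 12) * 16,  # assumed 4-bit priority field
        "port_number": port_id & 0x0FFF,        # assumed 12-bit port number
        # Timer fields are carried in units of 1/256 second.
        "message_age_s": msg_age / 256,
        "max_age_s": max_age / 256,
        "hello_s": hello / 256,
        "forward_delay_s": fwd_delay / 256,
    }


if __name__ == "__main__":
    # First 'Data' line from the debug output posted above (VLAN0012, Gi1/0/13).
    line = "0000 00 00 00 400C0011BC1A8C00 00000003 F00C0014F20EDD80 800D 0100 1400 0200 0F00"
    for key, value in decode_config_bpdu(line).items():
        print(key, value)
```

Under those assumptions, the first `Data` line from the posted debug decodes to a root bridge 0011.bc1a.8c00 with priority 16384 for VLAN 12, a root path cost of 3, a 2-second hello, a 20-second max age, a 15-second forward delay, and the topology-change flag clear. That looks like routine periodic hello traffic rather than a sign of a spanning-tree event.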

So... new infrastructure, including switches. What else was new? Was there new wiring? Were there any issues with the old switches? Just curious whether we've got a magnetic interference issue here. Did it ever work 'as advertised'?

Also, how many closets? How many access switches? Just trying to get a handle on whether it's something like five different closets with 20 switches on three floors, or one switch per closet, etc. How long are the home runs?

Mongrul (Author) Commented:
The interfaces are homerun to users...not trunk ports. (as far as I can tell so far...I'll go look again)

There was new fiber and copper wiring put in as part of the network upgrade.  As far as I know, this problem has been around since the upgrade.  I was not here at the time.

There are 11 different closets holding a total of 19 3750's.  There are 5 clusters of two or three switches each.  All of the 3750's, individuals and clusters, are having this same problem.  There is no distribution layer.  All of the switches/clusters have fiber runs to two 6509's.  The 6509's do not show any sign of problems like those shown above.  

Since the debug results are to be expected for bpdu, what should I try next?

Thanks all.

pedrow Commented:
As a test, try putting a small, dumb intermediate switch between one of the 3750s and the home run to the user, and see whether the drops continue.

That should show whether the problem lies between the 3750 and the dumb switch, or between the dumb switch and the home run.

I'm thinking that you've got bad homeruns :(

Or also, try taking a host and plugging it directly into one of the 3750's and see if the host locally plugged into the switch drops as well. Make sense? Trying to eliminate different parts of the topology to see if it changes behavior.

Mongrul (Author) Commented:
None of the ports that dropped are trunks. I've selected a few switches and nailed up the ports that were dropping. I happened to find one that had just dropped while I was on the switch. After nailing up the port, it stopped. The hospital recently rolled out new IBM PCs; I wonder if the NICs are flaky?
td_miles Commented:
Could be the NIC's. I remember about 6 or 7 years ago we got a bunch of IBM PCs and some of them had duplicate MAC addresses on the NICs, which as you can rightly imagine caused all sorts of strange issues.

I'm not saying IBM are bad, I work for an IBM Business Partner and think that IBM gear is some of the best out there, but everyone has manufacturing glitches now and then.
Mongrul (Author) Commented:
I checked on the switches I made changes on this morning. Some of the interfaces I nailed up are still dropping, so apparently that is not going to solve the problem. Back to the drawing board.
pedrow Commented:
Sorry... I don't really know what you mean by 'nailed up'.

If you have a laptop or something, what happens if you plug it directly into one of the ports that you've found to drop? Does it still happen when you are plugged locally into the switch?

Again, I'm thinking that there might be a problem with the new copper cabling, and plugging directly into the switches would eliminate the home runs from the test path.
td_miles Commented:
pedrow, if you put a comment in the pointer question I posted, I can close it and give you the points, thanks.
pedrow Commented:
done :)
Thanks!
Mongrul (Author) Commented:
"Nailed up" refers to manually setting duplex and speed. There's no way to know when or on what switch the ports will start bouncing. It may happen a couple of times one day, then be OK for a day or two. Then it will move to another port. It does not happen on trunk links, only on local ports.
td_miles Commented:
You've accepted my first answer. Does that mean that the speed/duplex settings were causing the issue?

I just want to make sure that your problem is resolved is all. Thanks.