TimotiSt🇮🇪

STP topology change in exact 5 minute intervals
Hi All,

Before I start to shut down parts of the network to locate the problem, I'd like to hear your opinions on this one:

Layer2 campus network, mostly D-Link DES-3526 and other D-Link DES switches, some DGS-12xxT switches, some 3Com 4400 boxes, one core 3Com 4050.
Protocol of choice is RSTP, except for the DGS boxes, which can only run STP.
Edge ports configured, core 3Com is the root. Not a lot of redundant paths.
Network topology is pretty far from the 3-tier model, at some points diameter is around 7 L2 hops.

Problem: exactly every 5 minutes, most switches directly connected to the core (and 6-8 others further away) report a topology change. It is always the same switches, and the pattern does not change. Some of the switches can't report STP topology changes to syslog, so they might or might not receive TCNs. All switches that do log report the topology change on their root port.

The switches are running the latest firmware. L3 routing is provided by a Linux box with Vyatta, attached over a VLAN-tagged link to the 3Com 4050 in the core.

Based on the exact 5 minute pattern, I'm starting to think it has something to do with MAC aging, but that doesn't really make sense for STP...

Any ideas appreciated!

Tamas
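
One way to confirm the "exact 5 minutes" impression is to compute the gaps between logged topology changes. The sketch below is only an illustration: the timestamp format and the sample values are made up, not taken from the switches in this thread, and the 300-second period and 5-second tolerance are assumptions.

```python
from datetime import datetime

# Hypothetical timestamps of "topology change" syslog entries from one switch.
SAMPLE = [
    "2011-03-01 10:00:02",
    "2011-03-01 10:05:02",
    "2011-03-01 10:10:03",
    "2011-03-01 10:15:02",
]

def intervals_seconds(stamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

def looks_periodic(stamps, period=300, tolerance=5):
    """True if every gap is within `tolerance` seconds of `period`."""
    gaps = intervals_seconds(stamps)
    return bool(gaps) and all(abs(g - period) <= tolerance for g in gaps)

print(intervals_seconds(SAMPLE))
print(looks_periodic(SAMPLE))
```

A tight cluster around 300 seconds points at a timer somewhere rather than a randomly flapping link.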



Don Johnston🇺🇸

You need to determine which switch is issuing the initial TCN. Most switches will let you see which port the last TCN came from. Then track it back to the source switch. Could be a flaky switch or link.

TimotiSt🇮🇪

ASKER

Tried that, but either I'm dumb, or I can't do that with the DES-35xx series. The others have even worse management interfaces...
The syslog messages report TCNs from the root ports, so those should be what the root bridge propagates after it receives the initial TCN from the bridge reporting the change, right?
Also, the TCN traveling upstream does not contain the originating bridge identifier in RSTP, right?

Flaky switch might be it, but flaky link with errors in exact 5 minutes for months?

Don Johnston🇺🇸

Right and right. But you can (usually) see the port it came in on and keep backtracking to its source.

Make sure ALL edge ports are defined.
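
The backtracking idea can be sketched as a simple walk: start at the root, look up the port the last topology change arrived on, hop to the neighbor on that port, and repeat until the trail ends. Everything in this example is hypothetical (switch names, port numbers, and the two lookup tables); in practice the data would have to be collected by hand from each switch that exposes it.

```python
def trace_tc_source(start, last_tc_port, neighbors, max_hops=32):
    """Follow 'last topology change received on port X' from switch to switch.

    last_tc_port: switch name -> port where the last change was seen
    neighbors:    (switch name, port) -> switch attached to that port
    Returns the list of switches visited; the last one is the suspect.
    """
    path = [start]
    current = start
    while len(path) <= max_hops:
        port = last_tc_port.get(current)
        nxt = neighbors.get((current, port))
        if nxt is None or nxt in path:
            break  # trail ends at a host port, unknown port, or loops back
        path.append(nxt)
        current = nxt
    return path

# Hypothetical data for a three-switch chain.
last_tc_port = {"core": 5, "dist1": 12, "access7": 3}
neighbors = {("core", 5): "dist1", ("dist1", 12): "access7"}

print(trace_tc_source("core", last_tc_port, neighbors))
```

The walk stops at the first switch whose reporting port has no switch behind it, which is where a misconfigured edge port or a misbehaving device would be.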



TimotiSt🇮🇪

ASKER

I think I checked all switches for edge setup (around 80), as I was thinking about a PC stuck in a reboot cycle or something, but considering the exact 5 minutes, I don't think that's likely.
I'll try to check the STP statistics again on the 3Com core in the morning...

TimotiSt🇮🇪

ASKER

No luck with the STP info:
Select menu option (bridge): summ


stpVersion:		2 (RSTP)	defaultPathCosts:	802.1D-1998
stpState:		enabled    	agingTime:			300

Time since topology change:		0 hrs 7 mins 43 seconds
Topology Changes:			64741
Bridge Identifier:			1000 000a0496ed40
Designated Root:			1000 000a0496ed40

maxAge:			20		bridgeMaxAge:		20
helloTime:		2		bridgeHelloTime:	2
forwardDelay:		15		bridgeFwdDelay:		15
holdTime:		1		rootCost:		0
rootPort:		No Port		priority:		4096



Tried setting up snmp traps and syslog, but it does not log the changes...
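
Two values in the summary output can be sanity-checked with a few lines of Python. This is a rough sketch under stated assumptions: it assumes the first field of the bridge identifier is the priority in hex (which agrees with "priority: 4096" in the same output), and, purely for the estimate, it assumes one topology change every 300 seconds since the counter was last reset, which the output does not confirm.

```python
def parse_bridge_id(text):
    """Split '1000 000a0496ed40' into (priority, MAC address)."""
    prio_hex, mac_hex = text.split()
    priority = int(prio_hex, 16)  # assumption: priority is printed in hex
    mac = ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))
    return priority, mac

priority, mac = parse_bridge_id("1000 000a0496ed40")
print(priority, mac)

# Topology Changes: 64741, assumed to be one per 300 s (the 5-minute pattern).
topology_changes = 64741
days = topology_changes * 300 / 86400
print(f"{days:.1f} days of changes at one per 5 minutes")
```

Under those assumptions the counter corresponds to several months of continuous changes, in line with the "for months" remark earlier in the thread.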

Don Johnston🇺🇸

Time since topology change:            0 hrs 7 mins 43 seconds

This one has gone over 5 minutes... barely.

Without a mechanism that reports where the TCNs are coming from, it's really hard to troubleshoot. Have you tried a protocol analyzer?



Do any of your switches have different priorities for VLANs or individual ports? One of them may have a hard-coded value that is throwing the rest out of balance. You may need to true up your network diagram and identify each switch's place (or at least that of the ones that are changing their STP settings) in both the physical topology and the STP topology.

TimotiSt🇮🇪

ASKER

Yeah, I was surprised by that. First time I managed to catch the timer to be over 5 minutes in months. Although it didn't get to 8 minutes...
I can wireshark, I'm just not sure where... The initial TCN is only propagated upstream on the root ports, so I'd only see it if I put a laptop with 2 bridged NICs in between the core and the problem switch.

Port priority might be a good idea. I don't exactly see the cause-effect logic to develop 5 minute problems, but I'll check anyways.

Thanks for the ideas so far!

Tamas

Don Johnston🇺🇸

Put Wireshark between the root and any non-root switch. If you don't see the TCN (coming from a non-root switch) then you know it's not coming from there. Then move to the next switch until you see it. Then work outward from there.

You're lucky that this is happening every five minutes. Otherwise it could take a long time. :-o
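
When looking at a capture, the fields that matter are the BPDU type and the topology change bits in the flags byte. The sketch below decodes the start of a BPDU from raw bytes, assuming the payload begins right after the LLC header. The field layout follows 802.1D/802.1w; the sample frame is constructed for illustration (it reuses the core's bridge ID from the summary output) and is not a real capture from this network.

```python
import struct

BPDU_TYPES = {0x00: "Configuration", 0x02: "RST", 0x80: "TCN"}

def _bridge_id(priority, mac):
    return priority, ":".join(f"{b:02x}" for b in mac)

def decode_bpdu(payload):
    """Decode the leading fields of a spanning tree BPDU."""
    proto, version, bpdu_type = struct.unpack_from("!HBB", payload, 0)
    if proto != 0x0000:
        raise ValueError("not a spanning tree BPDU")
    info = {"version": version, "type": BPDU_TYPES.get(bpdu_type, "unknown")}
    if bpdu_type not in (0x00, 0x02):
        return info  # a TCN BPDU carries only the 4-byte header
    flags, r_prio, r_mac, cost, b_prio, b_mac = struct.unpack_from(
        "!BH6sIH6s", payload, 4
    )
    info.update(
        tc_flag=bool(flags & 0x01),   # bit 0: topology change
        tca_flag=bool(flags & 0x80),  # bit 7: topology change acknowledgment
        root=_bridge_id(r_prio, r_mac),
        root_path_cost=cost,
        sender=_bridge_id(b_prio, b_mac),
    )
    return info

# Constructed example: RST BPDU with the TC flag set, sent by the root itself.
sample = bytes.fromhex(
    "0000 02 02 3d 1000 000a0496ed40 00000000 1000 000a0496ed40"
)
print(decode_bpdu(sample))
print(decode_bpdu(bytes.fromhex("0000 00 80")))
```

On a link between the root and a non-root switch, a TCN BPDU or a BPDU with the TC flag whose sender is the non-root switch indicates the change is arriving from that direction.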



TimotiSt🇮🇪

ASKER

Okay, got some nasty skeletons in the closet... :)

D-Link DGS-3324SR distribution switches (two of them), directly connected to the 3Com 4050 layer2 core.
Protocol set to RSTP on both, all vlans tagged on link. These DLinks are MSTP-capable.

Both DGS-3324SR boxes think they are the root (they have default priority, core is 4096).
Around a dozen DES-3250TG switches connected to them, happily participating in RSTP with the DGS boxes.
Double-checked, no STP protection of any kind on any ports...

I'll try to permit an untagged, empty vlan between them, see what happens...

Don Johnston🇺🇸

"Protocol set to RSTP on both, all vlans tagged on link."
Typically, the native VLAN carries the BPDUs. If all VLANs are tagged, then you don't have a native VLAN.

"Both DGS-3324SR boxes think they are the root"
That would indicate the DLinks can't see the BPDUs from the 4050, which is consistent with the previous point.
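
The reasoning here follows from how the root is elected: each bridge compares its own bridge ID with the ones it hears in BPDUs, and the numerically lowest ID (priority first, then MAC) wins. A bridge that hears nothing keeps believing it is the root. The sketch below illustrates this; the core's ID comes from the summary output earlier in the thread, 32768 is the 802.1D default priority, and the DGS MAC address is hypothetical.

```python
def elect_root(own_id, heard_ids):
    """Return the bridge ID this switch considers root.

    IDs are (priority, mac) tuples; the lowest tuple wins.
    heard_ids holds the IDs seen in BPDUs that actually arrived.
    """
    return min([own_id, *heard_ids])

core = (4096, "00:0a:04:96:ed:40")    # from the 3Com summary output
dgs_a = (32768, "00:11:22:33:44:01")  # hypothetical MAC, default priority

# BPDUs from the core are received: the core wins on priority.
print(elect_root(dgs_a, [core]))

# BPDUs from the core never arrive: the DGS declares itself root.
print(elect_root(dgs_a, []))
```

So a distribution switch with default priority claiming to be root is a sign that the core's BPDUs are not being received or not being accepted on that link.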

TimotiSt🇮🇪

ASKER

I agree, but: between the DGS and DES boxes, all links are also fully tagged, no native vlan, but RSTP is still working.
One guess would be that the DLinks send BPDUs both with and without tags, while the 3Com does not?



Don Johnston🇺🇸

"I agree, but: between the DGS and DES boxes, all links are also fully tagged, no native vlan, but RSTP is still working."
Sounds like DLink is doing it their own way. :-)

"One guess would be that the DLinks send BPDUs both with and without tags, while the 3Com does not?"
I think that DLink sends and expects to receive BPDUs on a tagged native VLAN, whereas 3Com is looking for BPDUs on the untagged native VLAN.

TimotiSt🇮🇪

ASKER

Update:
Enabled untagged vlan1 on ports between core 3Com and DLink DGS boxes: RSTP up and running between them.
Topology changes still persist... :(


TimotiSt🇮🇪

ASKER

Had a planned downtime last weekend, so I started unplugging the fibers... :)

Tracked it down to a D-Link DGS-1224 rev.C switch, a model known to be barely manageable and fairly unstable. You can't even define edge ports on them...

Disabled STP on it temporarily (I know, I know...) and budgeted for a replacement. The topology is happy once again after 2+ years of constant change...
