Spanning tree issues

Hi All,

My company recently bought another company whose IT is a bit of a mess.

They've had loads of network issues, mainly broadcasts taking a while to respond (around 30 seconds).  To resolve this, I replaced all their switches with six of these (link).

All six switches are set up the same:

One core and five edge (no special settings, just how they're connected - using port 1).
default vLAN 0 is for the desktops and servers - 192.168.0.0/24
vLAN 1 for voice - 192.168.1.0/24
Port 1 - 23 can be used for both vLANs
Port 24 is set up as a trunk port and assigned to the phone system.  The phone system is connected to the core; port 24 on the other switches is empty.

I manually connected all the cables, so I'm 100% sure each edge switch is only connected to the core once and is not connected to any other switch.


After I replaced the switches, the spanning tree issues were minimal (I could see them in Wireshark) and the network issues were resolved.

14 hours later the network issues returned and spanning tree became more prominent.

The Wireshark logs show the spanning tree issues are all coming from a single MAC to a different single MAC.

I did a port scan and looked at the ARP cache, but couldn't find the MAC against an IP.

The wireshark data highlights the packet as coming from an HP device and the MAC address matches a couple of the switches, but with 07 on the last two digits.

I'm guessing the issue is coming from a specific port.  Does anyone know how to get the MAC addresses assigned to each port?

Also, it looks like spanning tree kicks in after 30 seconds.  However, I notice they support Rapid Spanning Tree, which kicks in after 2 seconds.  While this doesn't remove the issue, it will give me some breathing space.

I know very little about this sort of thing, so all help is welcome on how to find the port MAC addresses and enable Rapid Spanning Tree.


Many thanks
detox1978 asked:
 
Don Johnston (Instructor) commented:
I don't know that I would immediately jump to a STP issue.

The BPDUs all appear to be originating from the root. They are Version 2 BPDUs (802.1w Rapid Spanning Tree).  They are being forwarded by the switch with a Bridge ID of 0x80.00-D0.7E.28.26.66.89.  The BPDUs are transmitted every 2 seconds. The cost to the root (from the capturing host) is 20.  I don't see any TC (Topology Change) BPDUs.
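
For illustration, here is a minimal Python sketch of how a bridge ID written in that form splits into its two parts: a 16-bit priority field followed by the bridge's 48-bit MAC address. The function name `parse_bridge_id` is made up for this example, and the input format is assumed to be exactly the dotted form quoted above.

```python
def parse_bridge_id(text):
    """Split a 'PP.PP-MM.MM.MM.MM.MM.MM' bridge ID into (priority, mac).

    Hypothetical helper: assumes the dotted form quoted in the comment above.
    """
    text = text.lower()
    if text.startswith("0x"):
        text = text[2:]
    priority_part, mac_part = text.split("-")
    # The first two bytes are the priority field (0x8000 = 32768, the 802.1D default).
    priority = int(priority_part.replace(".", ""), 16)
    # The remaining six bytes are the bridge MAC address.
    mac = ":".join(mac_part.split("."))
    return priority, mac


priority, mac = parse_bridge_id("0x80.00-D0.7E.28.26.66.89")
print(priority, mac)
```

A priority of 32768 is the standard default, which suggests nobody has deliberately chosen which switch should be root.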

I don't see anything here that indicates a problem with spanning-tree.

You don't "enable" BPDUs.  BPDUs are part of the spanning-tree protocol. If you have spanning-tree running, you have BPDUs.

The originator of these BPDUs is the root bridge. I can't say for certain where the root is, but given the cost and that it's not local to the segment the packet was captured on, I would say the switch that this switch is connected to is the root.

You won't find the MAC address in any ARP cache because that MAC address is not used to source any traffic. If you do a "show interface" on the port that your wireshark PC is connected to, you will probably see that address.
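
As a rough sketch of how to narrow down which switch a BPDU source MAC belongs to, the Python below compares the source address against each switch's base MAC and keeps the ones it sits just above. It assumes the vendor numbers port MACs upward from the base MAC, which is common but not guaranteed. The function names, the `max_offset` limit and all MAC addresses here are made up for the example.

```python
def mac_to_int(mac):
    """Convert a MAC written with ':', '-' or '.' separators to an integer."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def likely_owner(bpdu_src, base_macs, max_offset=64):
    """Return switch names whose base MAC is at most max_offset below bpdu_src.

    Assumption: port MACs are allocated upward from the switch base MAC.
    """
    src = mac_to_int(bpdu_src)
    candidates = []
    for name, base in base_macs.items():
        offset = src - mac_to_int(base)
        if 0 <= offset <= max_offset:
            candidates.append((offset, name))
    # Closest base MAC first.
    return [name for offset, name in sorted(candidates)]


# Hypothetical base MACs, not taken from the capture in this thread.
switches = {
    "core": "00:11:22:33:44:00",
    "edge1": "00:11:22:33:55:00",
}
print(likely_owner("00:11:22:33:44:07", switches))
```

With these made-up addresses only "core" is within range, so the frame would most likely come from one of its ports.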

One thing that is interesting is that every one of them is showing a bad FCS (Frame Check Sequence). But that could be a decode problem on the PC.
 
Soulja commented:
Interesting, so you have a hub and spoke topology, but are experiencing spanning tree loops and topology changes?
 
Akinsd (Network Administrator) commented:
Enable loop guard or BPDU guard on your access switches.

It looks like someone is plugging in a switch in their cubicle.

You can also start by hardcoding the switch ports to access mode rather than leaving them at the default of auto.

e.g.
int range fa0/2 - 22   (assuming that's all your access ports; make sure to exclude your uplinks)
switchport mode access

 
Don Johnston (Instructor) commented:
"default vLAN 0 is for the desktops and servers - 192.168.0.0/24; vLAN 1 for voice - 192.168.1.0/24"
Is this a typo?  There is no "VLAN 0".

In your post, you mention "spanning tree issues". What do you mean by this?

The "show interface" command will display the MAC address for a port.

"Also, it looks like spanning tree kicks in after 30 seconds.  However I notice they support Rapid Spanning Tree, which kicks in after 2 seconds."
If you don't have any redundant links, this isn't going to help much. These times deal with convergence when an existing link fails and the redundant link takes over.  
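
To make those numbers concrete, here is a small Python sketch of the classic 802.1D timer arithmetic, assuming the standard defaults (hello 2 s, forward delay 15 s, max age 20 s). The switches in this thread may be configured differently, and the function names are made up for the example.

```python
# Standard 802.1D default timers, in seconds (assumed, not read from these switches).
HELLO_TIME = 2
FORWARD_DELAY = 15
MAX_AGE = 20


def time_to_forwarding(forward_delay=FORWARD_DELAY):
    """A port spends one forward delay in listening and one in learning."""
    return 2 * forward_delay


def worst_case_failover(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """The old root information ages out, then the port runs listening + learning."""
    return max_age + 2 * forward_delay


print(time_to_forwarding())   # 30
print(worst_case_failover())  # 50
```

Under those defaults, 30 seconds is the time a newly connected port takes to start forwarding, and 2 seconds is simply the interval between BPDUs.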

It sounds like you may have a rogue switch coming online that is taking over as the root and causing the entire network to re-converge.  In the higher-end switches, a feature known as bpdu-protection is supported which disables a port if a BPDU enters a configured port.  This would be configured on ports that connect to end-stations.  I can't tell if that feature is supported on your model of switch though. But you can try it.  "spanning-tree <port-list> bpdu-protection"
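
As a sketch of why a rogue switch can take over, the Python below models root election: the bridge with the numerically lowest bridge ID (priority first, then MAC address) becomes root. All switch names and MAC addresses are hypothetical, and `elect_root` is a name made up for this example.

```python
def bridge_key(bridge):
    """Order bridges the way spanning tree does: priority first, then MAC."""
    priority, mac = bridge
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridge_key(bridges[name]))


# Hypothetical network where every switch is left at the default priority 32768.
bridges = {
    "core": (32768, "00:1b:3f:00:00:10"),
    "edge1": (32768, "00:1b:3f:00:00:20"),
}
print(elect_root(bridges))  # core

# A switch with a lower MAC (or lower priority) wins the election when plugged in.
bridges["rogue"] = (32768, "00:0a:00:00:00:01")
print(elect_root(bridges))  # rogue
```

Lowering the priority on the intended core switch prevents a default-priority device from winning, since priority is compared before the MAC address.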
 
detox1978 (Author) commented:
Thanks for the suggestions.

<Soulja>
Yes, we have spanning tree issues on our network after I repatched everything, which suggests either a faulty switch or a device (probably a hub/switch) connected to the LAN.
</Soulja>

<Akinsd>
I enabled RSTP and BPDU and it stopped the edge switches from working, so I had to remove it.
</Akinsd>

<donjohnston>
Yes, it was a typo.  It should be vLAN 1 and vLAN 2.

The spanning tree wireshark screen shot is attached.

The switches support BPDU, but when I enabled it (in conjunction with RSTP) the switches stopped working.

As you can see from the Wireshark screenshot, everything appears to be coming from a single MAC address.  However, I did a port scan and looked in my ARP cache, but nothing matched the MAC address.
</donjohnston>

[Screenshot: Wireshark]
 
detox1978 (Author) commented:
Here's a screenshot from inside the packet.  They are all identical.

[Screenshot: Inside the Spanning Tree packet]
 
detox1978 (Author) commented:
Just a quick update.  I enabled spanning tree and BPDU, and 3 of the switches became unreachable.

I'm not on that site again until Monday, so will report back.

I have access to wireshark on that network.  So if you would like me to do any checks let me know.

D
 
Akinsd (Network Administrator) commented:
The uplink ports (most likely configured with portfast also) on those switches probably went into the errdisable state. Bounce (shut and unshut) the port to recover it. You may want to configure errdisable recovery, if not permanently, at least during this troubleshooting phase, to save you trips back and forth when the ports shut down.

Remember to hardcode all ports that are not uplinks to access ports and turn off negotiation.

You may want to enable BPDU guard on specific ports rather than turning it on globally. When turned on globally, the feature will block every access port or portfast-enabled port that receives a BPDU.

Consider disabling portfast on your uplinks if it was configured. Portfast bypasses the spanning-tree listening and learning states and should only be used on uplinks if you are certain that there is no chance of loop formation.
 
detox1978 (Author) commented:
Is there anything I can do to work out what's causing the issue?

I removed all the cables and switches and put new ones in, so in the server room all the switches plug into a single core switch.
 
Don Johnston (Instructor) commented:
Troubleshooting like this can be rather difficult. :-(

Please post the topology (showing all switches and connections between the switches) and the current configurations of the switches.
 
detox1978 (Author) commented:
On site today.  Looks like it's a TCPIP KeepAlive issue with the application.

Many thanks for your time.
 
night crow commented:
Can you please elaborate on how it's related to a TCPIP KeepAlive issue?

I am seeing the same problem.

I have a sniffer (cPacket) which is capturing all packets from a server, and the sniffer is reporting CRC errors. When I look at the capture, I see the same "FCS Bad: True" errors that you displayed in your screenshot. From the bad packets in Wireshark, it seems like they are related to spanning tree.

How did you track them down? Have you got any further info that you can share?

Thanks
 
detox1978 (Author) commented:
Hi nc,

The local antivirus (Kaspersky) managed the OS firewall. Because the app (Lotus Notes) didn't send keepalives, there was a 30-second delay until the firewall would re-establish the connection.
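
For anyone hitting the same symptom in an application they control, here is a hedged Python sketch of turning on TCP keepalives for a socket so that an idle connection keeps generating traffic. It is not how Lotus Notes is configured. The helper name `enable_keepalive` and the timing values are assumptions for illustration, and the per-socket tuning options only exist on some platforms, hence the `hasattr` checks.

```python
import socket


def enable_keepalive(sock, idle=20, interval=5, count=3):
    """Enable TCP keepalive probes on sock (hypothetical helper, example timings)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific, so set them only when present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

The idle time would need to be shorter than the firewall's idle timeout for the probes to keep the connection state alive.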

We got rid of the antivirus as it was due for renewal.


D
 
night crow commented:
Hmmm, I see.

As I mentioned in my previous post:

I have a sniffer (cPacket) which is capturing all packets from a server, and the sniffer is reporting CRC errors. When I look at the capture, I see the same "FCS Bad: True" errors that you displayed in your screenshot. From the bad packets in Wireshark, it seems like they are related to spanning tree.

However, I am almost 100% certain that this has nothing to do with an application.

Any other ideas what could be causing this?

(As a side note, I replaced the fibres to eliminate any physical issue.)
Question has a verified solution.
