Solved

Spanning tree issues

Posted on 2013-10-23
Last Modified: 2016-09-13
Hi All,

My company recently bought another company whose IT is a bit of a mess.

They've had loads of network issues, mainly broadcasts taking a while to respond (around 30 seconds).  To resolve this I replaced all their switches with six of these switches (link).

All six switches are set up the same:

One core and five edge (no special settings, just how they're connected - using port 1).
default vLAN 0 is for the desktops and servers - 192.168.0.0/24
vLAN 1 for voice - 192.168.1.0/24
Port 1 - 23 can be used for both vLANs
Port 24 is set up as a trunk port and assigned for the phone.  The phone system is connected to the core; the other port 24s are empty.

I manually connected all the cables, so I'm 100% sure each edge switch is only connected to the core once and is not connected to any other switch.


After I replaced the switches the spanning tree issues were minimal (I could see them in Wireshark) and the network issues were resolved.

14 hours later the network issues returned and spanning tree became more prominent.

The Wireshark logs show the spanning tree issues are all coming from a single MAC to a different single MAC.

I did a port scan and looked at the ARP cache, but couldn't find the MAC against an IP.

The wireshark data highlights the packet as coming from an HP device and the MAC address matches a couple of the switches, but with 07 on the last two digits.

I'm guessing the issue is coming from a specific port.  Does anyone know how to get the MAC addresses assigned to each port?
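
One way to narrow this down is to compare the source MAC from the capture against each switch's base MAC. The sketch below assumes that a switch derives its per-port MACs by adding a small offset to its base MAC, which is common but not guaranteed for these switches. The function names and the example addresses are made up for illustration.

```python
def mac_to_int(mac):
    """Convert a MAC such as '00:11:22:33:44:07' to an integer.

    Accepts ':', '-' or '.' as separators.
    """
    hex_digits = "".join(ch for ch in mac.strip() if ch not in ":-.")
    if len(hex_digits) != 12:
        raise ValueError("not a 48-bit MAC address: %r" % mac)
    return int(hex_digits, 16)


def closest_switch(frame_mac, switch_base_macs, max_offset=64):
    """Return (switch_name, offset) for the best matching switch, or None.

    Assumption: per-port MACs are base MAC + a small offset (hypothetical
    limit of 64 here). The switch with the smallest non-negative offset wins.
    """
    target = mac_to_int(frame_mac)
    best = None
    for name, base in switch_base_macs.items():
        offset = target - mac_to_int(base)
        if 0 <= offset <= max_offset and (best is None or offset < best[1]):
            best = (name, offset)
    return best


# Hypothetical base MACs, e.g. copied from each switch's management page.
switches = {"core": "00:11:22:33:44:00", "edge1": "00:11:22:33:55:00"}
print(closest_switch("00:11:22:33:44:07", switches))  # ('core', 7)
```

If the offset really does track the port number on a given model, a result like the one above would point at the switch to look at first. That mapping is an assumption and should be checked against the switch's own interface listing.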

Also, it looks like spanning tree kicks in after 30 seconds.  However I notice they support Rapid Spanning Tree, which kicks in after 2 seconds.  While this doesn't remove the issue, it will give me some breathing space.

I know very little about this sort of thing, so all help is welcome on how to detect the port MAC addresses and enable Rapid Spanning Tree.


Many thanks
Question by:detox1978
14 Comments
 
LVL 26

Assisted Solution

by:Soulja
Soulja earned 50 total points
ID: 39595855
Interesting, so you have a hub and spoke topology, but are experiencing spanning tree loops and topology changes?
 
LVL 18

Assisted Solution

by:Akinsd
Akinsd earned 225 total points
ID: 39595874
Enable loopguard or bpdu guard on your access switches.

Looks like someone is plugging a switch in their cubicle

You can also start by hardcoding the switch ports to access mode rather than leaving them at the default of auto.

e.g.
int range fa0/2 - 22   (assuming those are all your access ports; make sure to exclude your uplinks)
switchport mode access
 
LVL 50

Assisted Solution

by:Don Johnston
Don Johnston earned 225 total points
ID: 39596716
"default vLAN 0 is for the desktops and servers - 192.168.0.0/24; vLAN 1 for voice - 192.168.1.0/24"
Is this a typo?  There is no "VLAN 0"

In your post, you mention "spanning tree issues". What do you mean by this?

The "show interface" command will display the MAC address for a port.

"Also, it looks like spanning tree kicks in after 30 seconds.  However I notice they support Rapid Spanning Tree, which kicks in after 2 seconds."
If you don't have any redundant links, this isn't going to help much. These times deal with convergence when an existing link fails and the redundant link takes over.  

It sounds like you may have a rogue switch coming online that is taking over as the root and causing the entire network to re-converge.  In the higher-end switches, a feature known as bpdu-protection is supported which disables a port if a BPDU enters a configured port.  This would be configured on ports that connect to end-stations.  I can't tell if that feature is supported on your model of switch though. But you can try it.  "spanning-tree <port-list> bpdu-protection"
 
LVL 2

Author Comment

by:detox1978
ID: 39597726
Thanks for the suggestions.

<Soulja>
Yes, we have spanning tree issues on our network after I repatched everything, which suggests either a faulty switch or a device (probably a hub/switch) connected to the LAN.
</Soulja>

<Akinsd>
I enabled RSTP and BPDU and it stopped the edge switches from working, so I had to remove it.
</Akinsd>

<donjohnston>
Yes, it was a typo.  It should be vLAN 1 and vLAN 2.

The spanning tree Wireshark screenshot is attached.

The switches support BPDU, but when I enabled it (in conjunction with RSTP) the switches stopped working.

As you can see from the Wireshark screenshot, everything appears to be coming from a single MAC address.  However I did a port scan and looked in my ARP cache, but nothing matched the MAC address.
</donjohnston>

Wireshark
 
LVL 2

Author Comment

by:detox1978
ID: 39597867
Here's a screenshot from inside the packet.  They are all identical.

Inside the Spanning Tree packet
 
LVL 50

Accepted Solution

by:
Don Johnston earned 225 total points
ID: 39597967
I don't know that I would immediately jump to a STP issue.

The BPDUs all appear to be originating from the root. They are Version 2 BPDUs (802.1w Rapid Spanning Tree).  They are being forwarded by the switch with a Bridge ID of 0x80.00-D0.7E.28.26.66.89.  The BPDUs are transmitted every 2 seconds. The cost to the root (from the capturing host) is 20.  I don't see any TC (Topology Change) BPDUs.
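
For reference, a bridge ID is eight bytes: a two-byte priority field followed by a six-byte MAC address. Here is a minimal Python sketch of splitting one apart, assuming the dotted notation shown above. The function name `parse_bridge_id` is made up for illustration.

```python
def parse_bridge_id(text):
    """Split a bridge ID such as '0x80.00-D0.7E.28.26.66.89'.

    Returns (priority, system_id_extension, mac). The two-byte priority
    field holds a 4-bit priority (in steps of 4096) and a 12-bit system
    ID extension, followed by the 6-byte bridge MAC address.
    """
    text = text.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    digits = "".join(ch for ch in text if ch in "0123456789abcdef")
    if len(digits) != 16:
        raise ValueError("expected 8 bytes of hex, got %r" % text)
    field = int(digits[:4], 16)
    priority = field & 0xF000    # upper 4 bits
    sys_id_ext = field & 0x0FFF  # lower 12 bits
    mac_hex = digits[4:]
    mac = ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))
    return priority, sys_id_ext, mac


print(parse_bridge_id("0x80.00-D0.7E.28.26.66.89"))
# (32768, 0, 'd0:7e:28:26:66:89')
```

A priority of 32768 is the usual default, so a root elected with that value was most likely chosen on MAC address alone rather than by deliberate configuration.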

I don't see anything here that indicates a problem with spanning-tree.

You don't "enable" BPDUs.  BPDUs are part of the spanning-tree protocol. If you have spanning-tree running, you have BPDUs.

The originator of these BPDUs is the root bridge. I can't say for certain where the root is, but given the cost and that it's not local to the segment the packet was captured on, I would say the switch that this switch is connected to is the root.

You won't find the MAC address in any ARP cache because that MAC address is not used to source any traffic. If you do a "show interface" on the port that your wireshark PC is connected to, you will probably see that address.

One thing that is interesting is that every one is showing a bad FCS (Frame Check Sequence). But that could be a decode problem on the PC.
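
To check whether a captured frame really has a bad FCS, the value can be recomputed by hand. The Ethernet FCS is a CRC-32 over the frame (destination MAC through payload), stored least significant byte first. The sketch below assumes the capture actually includes the four FCS bytes, which many PC network cards strip before the frame reaches the capture; in that case the last four bytes are payload or padding and will look like a bad FCS. The function names are made up for illustration.

```python
import struct
import zlib


def fcs_for(frame_without_fcs):
    """Return the 4-byte Ethernet FCS for a frame given without its FCS."""
    crc = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return struct.pack("<I", crc)  # least significant byte first


def fcs_is_valid(frame_with_fcs):
    """Return True if the last 4 bytes match the CRC-32 of the rest."""
    if len(frame_with_fcs) < 5:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return fcs_for(body) == fcs


# Dummy 60-byte frame body, standing in for bytes exported from a capture.
body = bytes(range(60))
good_frame = body + fcs_for(body)
bad_frame = bytes([good_frame[0] ^ 0x01]) + good_frame[1:]  # flip one bit

print(fcs_is_valid(good_frame))  # True
print(fcs_is_valid(bad_frame))   # False
```

If frames exported from the capture fail this check consistently but traffic is otherwise healthy, that points towards the capture or decode path rather than the switch.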
 
LVL 2

Author Comment

by:detox1978
ID: 39628707
Just a quick update.  I enabled Spanning tree and BPDU and 3 of the switches became uncontactable.

I'm not on that site again until Monday, so will report back.

I have access to wireshark on that network.  So if you would like me to do any checks let me know.

D
 
LVL 18

Assisted Solution

by:Akinsd
Akinsd earned 225 total points
ID: 39630326
The uplink ports (most likely configured with portfast also) on those switches probably went into errdisable state. Bounce (shut and unshut) the port to recover it. You may want to configure errdisable recovery, if not permanently, at least during this troubleshooting phase, to save you trips back and forth when the ports shut down.

Remember to hardcode all ports that are not uplinks to access ports and turn off negotiation.

You may want to enable bpdu guard on specific ports rather than turning it on globally. When turned on globally, the feature will block every access port or portfast-enabled port that receives a BPDU.

Consider disabling portfast on your uplinks if it was configured. Portfast skips the listening and learning states so a port starts forwarding immediately; it offers no protection against loops and should only be used on uplinks if you are certain that there is no chance of loop formation.
 
LVL 2

Author Comment

by:detox1978
ID: 39630738
Is there anything I can do to work out what's causing the issue?

I removed all the cables and switches and put new ones in, so in the server room all the switches plug into a single core switch.
 
LVL 50

Expert Comment

by:Don Johnston
ID: 39630781
Troubleshooting like this can be rather difficult. :-(

Please post the topology (showing all switches and connections between the switches) and the current configurations of the switches.
 
LVL 2

Author Closing Comment

by:detox1978
ID: 39638504
On site today.  Looks like it's a TCP/IP keepalive issue with the application.

Many thanks for your time.
 

Expert Comment

by:night crow
ID: 41793967
Can you please elaborate on how it's related to a TCP/IP keepalive issue?

I am seeing the same problem.

I have a sniffer (cPacket) which is capturing all packets from a server and the sniffer is reporting CRC errors. When I look at the capture, I see the same "FCS Bad: True" errors that you displayed in your screenshot. From the bad packets in Wireshark it seems like they are related to spanning tree.

How did you track them down? Have you got any further info that you can share?

Thanks
 
LVL 2

Author Comment

by:detox1978
ID: 41794017
Hi nc,

The local antivirus (Kaspersky) managed the OS firewall. Because the app (Lotus Notes) didn't send keepalives, there was a 30 second delay until the firewall would re-establish the connection.

We got rid of the antivirus as it was due for renewal.
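
To illustrate what "sending keepalives" means at the socket level, here is a minimal Python sketch that turns on TCP keepalive for a socket. It is not how Lotus Notes is configured; it only shows the mechanism. The function name and the timer values are placeholders, and the per-socket timer options are only set where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=20, interval=5, probes=3):
    """Turn on TCP keepalive for sock and return it.

    idle:     seconds of silence before the first probe (placeholder value)
    interval: seconds between probes (placeholder value)
    probes:   unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The timer options below are platform specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()
```

With keepalive on, an idle connection still produces periodic traffic, so a stateful firewall is less likely to drop its entry for the connection during quiet periods.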


D
 

Expert Comment

by:night crow
ID: 41795943
Hmmm, I see.

As I mentioned in my previous post:

I have a sniffer (cPacket) which is capturing all packets from a server and the sniffer is reporting CRC errors. When I look at the capture, I see the same "FCS Bad: True" errors that you displayed in your screenshot. From the bad packets in Wireshark it seems like they are related to spanning tree.

However, I am almost 100% certain that this has nothing to do with an application.

Any other ideas what could be causing this?

(As a side note, I replaced the fibres to eliminate any physical issue.)


Question has a verified solution.
