Solved

massive packet loss when adding vlans

Posted on 2011-03-23
Last Modified: 2012-06-21
I have massive packet loss across a 5 Mbps Metro E link.  I currently have 4 VLANs going across it: server, computer, phone, and wireless.  With just those going across I have great speeds, good ping times, and no loss.  When I add a new VLAN for EMC replicated storage I get 40% loss.  I have made sure we are not maxing out the bandwidth (we actually use hardly any of it).  It is Cisco equipment on both sides (3750 on the source side, 3650 on the target side).  QoS is applied only to the phone traffic.  I added the port config for each side below.

Time Warner checked on things and did find issues with their equipment which they replaced at both sites.  When I run a speed test with replication off I get 6-7 mb on up and down, with replication on it is almost nothing.  I am not the greatest Cisco troubleshooter and I feel like I have a problem with how the packets are being tagged.  Any help is greatly appreciated.

Here is the port on the source side:
interface FastEthernet3/0/7
 description Metro E
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2,4,41,50
 switchport mode trunk
 speed 100
 duplex full
 auto qos trust
 spanning-tree bpdufilter enable
end

Here is the port on the target side:
interface FastEthernet0/48
 description Uplink to WAF
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2,4,41,50
 switchport mode trunk
 speed 100
 duplex full
 auto qos voip trust
 spanning-tree bpdufilter enable
end
Question by:ScottJones74
16 Comments
 
LVL 50

Expert Comment

by:Don Johnston
ID: 35198694
What is the speed of your service?

You should have the bandwidth of the service configured on the interfaces connected to that service.

i.e.

int f3/0/7
 bandwidth 5000

 
LVL 2

Author Comment

by:ScottJones74
ID: 35198996
it is a 5mb metro link, the ports are set to 100mb
 
LVL 50

Expert Comment

by:Don Johnston
ID: 35199029
>it is a 5mb metro link,
>When I run a speed test with replication off I get 6-7 mb on up and down

If it's a 5mbps link, how are you getting 6-7mbps?

I think what's happening is that you're running at (or just over) 100% utilization. When you add the replication services, you're running way over 100% utilization and getting packet loss as a result.
 
LVL 2

Author Comment

by:ScottJones74
ID: 35199125
Burst speeds.  Time Warner was having trouble on their end and kept increasing bandwidth.  When they found the problem hardware they just left the pipe as is.  We have run traffic analysis on the pipe and we never get over 1mb of usage with the first 4 vlans.  When we turn on vlan 19 the usage increases slightly but never over 2mb.  Since we are dropping packets like crazy we never get a true sense of what our usage is going to be with vlan 19 (replication) on.

Running speedtest.net or speakeasy.net/speedtest usually shows you burst speeds but is a fair representation of your capabilities.  Before Time Warner replaced the equipment we could hardly get speedtest to work.  I have not seen anything that points to utilization.  I would love to be able to say it was that, or that Time Warner was throttling the connection, but so far we have nothing showing it.
 
LVL 50

Expert Comment

by:Don Johnston
ID: 35199176
Okay, but it's still a 5mbps service. And with the replication off, you're utilizing 100% of the bandwidth. So when you add more traffic to the link...
 
LVL 2

Author Comment

by:ScottJones74
ID: 35199345
No, I am not using 100% at any point in time.  As I stated above, we have run 2 different traffic analysis programs on the pipe and we never get above 2MB even with replication on.

For example, my current utilization is 200k.  That's with the PCs, servers, and phone traffic going across.  Not a lot of traffic occurs.  We have 5 phones and maybe 5 PCs at the remote location.  It's small.  I can copy a 100mb file in both directions at the same time in 3 min.
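
As a rough sanity check on that file-copy figure, the arithmetic can be sketched as below. It assumes "100mb" means a 100-megabyte file in decimal units and that the copy took the full three minutes; the function name is made up for illustration.

```python
def transfer_rate_mbps(size_megabytes: float, seconds: float) -> float:
    """Average one-way throughput in megabits per second."""
    bits = size_megabytes * 1_000_000 * 8  # decimal megabytes to bits
    return bits / seconds / 1_000_000


# 100 MB copied in 3 minutes, measured per direction
rate = transfer_rate_mbps(100, 3 * 60)
print(f"{rate:.2f} Mbit/s per direction")  # 4.44 Mbit/s per direction
```

Under those assumptions the copy averages about 4.4 Mbit/s each way, which is already close to the nominal 5 Mbps of the service.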
 
LVL 50

Expert Comment

by:Don Johnston
ID: 35199517
Please post the output of a "show int f3/0/7" and "show int f0/48" on the respective switches with and without the replication traffic running.
 
LVL 2

Author Comment

by:ScottJones74
ID: 35199727
This is without replication:

FastEthernet3/0/7 is up, line protocol is up (connected)
  Hardware is Fast Ethernet,
  Description: Metro E
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 4/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:30, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 20569
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1706000 bits/sec, 198 packets/sec
  5 minute output rate 273000 bits/sec, 148 packets/sec
     153660685 packets input, 17258760601 bytes, 0 no buffer
     Received 837661 broadcasts (914 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 914 multicast, 0 pause input
     0 input packets with dribble condition detected
     251028418 packets output, 254458370580 bytes, 0 underruns
     0 output errors, 0 collisions, 2 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out



FastEthernet0/48 is up, line protocol is up (connected)
  Hardware is Fast Ethernet,
  Description: Metro E
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 6/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:03, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 161000 bits/sec, 136 packets/sec
  5 minute output rate 2528000 bits/sec, 235 packets/sec
     1234859 packets input, 655347354 bytes, 0 no buffer
     Received 358865 broadcasts (159847 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 159847 multicast, 0 pause input
     0 input packets with dribble condition detected
     957790 packets output, 514583855 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
 
LVL 2

Author Comment

by:ScottJones74
ID: 35202468
This is with the replication VLANs allowed across.  All I did was allow the additional VLANs 8 and 19 across.

FastEthernet3/0/7 is up, line protocol is up (connected)
  Hardware is Fast Ethernet,
  Description: Metro E
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 13/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:18, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 20676
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 294000 bits/sec, 407 packets/sec
  5 minute output rate 5460000 bits/sec, 572 packets/sec
     155164423 packets input, 17428418388 bytes, 0 no buffer
     Received 843730 broadcasts (920 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 920 multicast, 0 pause input
     0 input packets with dribble condition detected
     252880733 packets output, 255593505406 bytes, 0 underruns
     0 output errors, 0 collisions, 2 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out


FastEthernet0/48 is up, line protocol is up (connected)
  Hardware is Fast Ethernet,
  Description: Metro E
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 12/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 4925000 bits/sec, 488 packets/sec
  5 minute output rate 281000 bits/sec, 392 packets/sec
     2505032 packets input, 1601059712 bytes, 0 no buffer
     Received 439239 broadcasts (193909 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 193909 multicast, 0 pause input
     0 input packets with dribble condition detected
     2014618 packets output, 656807446 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
 
LVL 2

Author Comment

by:ScottJones74
ID: 35202471
could this be a problem with too many vlans causing too many broadcasts?
 
LVL 50

Accepted Solution

by:
Don Johnston earned 500 total points
ID: 35202628
>could this be a problem with too many vlans causing too many broadcasts?

The received broadcasts on the f0/48 port are a bit on the high side. But the ratio is worse when it's NOT replicating (17% replicating, 29% non-replicating).  I would definitely try to figure out where all that broadcast traffic is coming from.

But I still think your drop problems are an oversubscription issue.
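
The two percentages can be reproduced from the f0/48 counters posted earlier in the thread ("Received ... broadcasts" divided by "packets input"). This is only a sketch of that division; the function name is illustrative, and the counters are cumulative since they were never cleared.

```python
def broadcast_share(broadcasts: int, packets_input: int) -> float:
    """Fraction of received packets that were broadcasts."""
    return broadcasts / packets_input


# Values copied from the two "show int f0/48" outputs in this thread
without_replication = broadcast_share(358_865, 1_234_859)
with_replication = broadcast_share(439_239, 2_505_032)

print(f"without replication: {without_replication:.1%}")  # 29.1%
print(f"with replication:    {with_replication:.1%}")     # 17.5%
```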
=================
f3/0/7 without replication

5 minute  input rate 1,706,000 bits/sec
5 minute output rate   273,000 bits/sec, 148 packets/sec

f3/0/7 with replication

5 minute  input rate   294,000 bits/sec, 407 packets/sec
5 minute output rate 5,460,000 bits/sec, 572 packets/sec
==================

The output rate is over 5mbps when replicating.  If you've got a 5mbps link, you're going to be dropping packets when you exceed 5mbps of traffic.
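
A minimal sketch of that comparison, assuming the service is policed at exactly 5,000,000 bits/sec (the thread only says "5mb", and the carrier may allow some burst). The names are illustrative.

```python
SERVICE_BPS = 5_000_000  # assumed committed rate of the Metro E service


def utilization(rate_bps: int, service_bps: int = SERVICE_BPS) -> float:
    """Offered load as a fraction of the service rate (above 1.0 is oversubscribed)."""
    return rate_bps / service_bps


# 5 minute output rates on f3/0/7 from the posted outputs
for label, rate in [("without replication", 273_000), ("with replication", 5_460_000)]:
    u = utilization(rate)
    status = "OVER" if u > 1.0 else "ok"
    print(f"{label}: {u:.1%} of service rate ({status})")
```

Because these are 5-minute averages, an average above 100% suggests the instantaneous rate is above the service rate for much of the interval.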

I would clear the counters for these interfaces and check the numbers again.
 
LVL 45

Expert Comment

by:Craig Beck
ID: 35203034
You're replicating a SAN over a 5Mbps Metro-E??? :-S

I agree with Don, it does look like you're maxing out your link.  The load on each end equals roughly 5Mbps (percentage of 100Mbps link).
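
For reference, the txload/rxload values are fractions of 255 relative to the interface's configured bandwidth (BW 100000 Kbit in the outputs above). A small sketch of the conversion, with an illustrative function name:

```python
def load_to_kbps(load: int, bandwidth_kbit: int = 100_000) -> float:
    """Convert a Cisco load value (x/255) to kbit/s for a given interface BW."""
    return load / 255 * bandwidth_kbit


# With replication: txload 13/255 on f3/0/7, rxload 12/255 on f0/48
print(f"f3/0/7 tx: {load_to_kbps(13):.0f} kbit/s")  # 5098 kbit/s
print(f"f0/48  rx: {load_to_kbps(12):.0f} kbit/s")  # 4706 kbit/s
```

Since the load is computed against the configured bandwidth, the same traffic should read close to 255/255 if the interfaces had "bandwidth 5000" set as suggested earlier.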

As for broadcast traffic being higher when not replicating, I'd guess that means you don't have enough bandwidth to send broadcasts across the link when you are replicating.
 
LVL 2

Author Comment

by:ScottJones74
ID: 35206384
Looks like I'm having crow for breakfast :)
OK, EMC designed this setup and said the 5mb link would be fine.  So first action is to pick up a bat and chase them.  I still think the VLAN setup is causing a lot of the broadcasts, but it is weird that it's worse with replication off.  I will give you guys an update soon and I am sure I am going to have more questions.
 
LVL 45

Expert Comment

by:Craig Beck
ID: 35206508
I think the fact that there are fewer broadcasts with replication on just means that there is no available bandwidth to send all the broadcasts when you are replicating.  You are sending lots of unicast traffic when replicating, which is saturating the link.  You have 4 VLANs over the link, all for different things, so I wouldn't be too worried if there's a bit of broadcast traffic.

If you're not doing anything else on the line you could replicate the SAN fine, but you're not, so depending on volumes of data it could take a long time.  Maybe EMC were unaware that you would be doing other things over this line at the same time?
 
LVL 50

Expert Comment

by:Don Johnston
ID: 35206559
>So first action is to pick up a bat and chase them.

Let me know if you catch them. :-)

The thing about the broadcast numbers is that without a defined time frame, they're really not very useful. So clear the counters and then check the numbers again.  You might find it's just an anomaly.

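
If clearing the counters is not convenient, a similar effect can be had by subtracting two snapshots. The sketch below does that with the two f0/48 outputs already posted in this thread. The dictionary layout and function name are made up for illustration, and the time between the two snapshots is not known from the thread, so the result describes only that unknown interval.

```python
def interval_broadcast_share(before: dict, after: dict) -> float:
    """Broadcast fraction of the packets received between two counter snapshots."""
    delta_broadcasts = after["broadcasts"] - before["broadcasts"]
    delta_packets = after["packets_input"] - before["packets_input"]
    return delta_broadcasts / delta_packets


# f0/48 "Received ... broadcasts" and "packets input" from the two posted outputs
first = {"broadcasts": 358_865, "packets_input": 1_234_859}
second = {"broadcasts": 439_239, "packets_input": 2_505_032}

share = interval_broadcast_share(first, second)
print(f"broadcast share between snapshots: {share:.1%}")  # 6.3%
```

That interval figure is much lower than either cumulative ratio, which illustrates why counters that have never been cleared say little about current traffic.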
 
LVL 2

Author Comment

by:ScottJones74
ID: 35369875
Well after much testing, broken equipment, and bloody noses I am still at the same spot I was weeks ago :)

Not exactly the same, as we now know the issues all along were bandwidth and how EMC set up replication.  It’s not fixed (they don’t know how to yet) but it’s in their court.  Time Warner actually helped out a lot and eventually found bad equipment on their end too.  I appreciate your responses and I will try to keep you updated when things get better.  I am just glad it wasn't our switch setup after all :)