Bill Herde (United States) asked:
Point to point connection slow in one direction only

We have a metro-E leased line from our office to the main datacenter, rated at 100 Mbps. A Cisco ASA 5505 is in the office; in the datacenter the line first hits a Force10 layer-3 switch, then a Cisco ASA 5512. An IPsec VPN is set up between the sites using these firewalls. It has been working as expected for 2 years. A week ago, downstream traffic slowed to a maximum of around 20 Mbps, and is not steady. Upstream continues to run at 90 Mbps or better. Traffic is being monitored at the firewalls' outside connections, so all traffic is viewed. The ISP certified the line good at full speed from my office connection to the wire in my datacenter cabinet that plugs into the layer-3 switch.

I have never encountered a one-way speed bump like this.  Ideas on how to troubleshoot this?
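To put hard numbers on an asymmetry like this, independently of any one tool, you can sample interface byte counters twice and compute per-direction throughput. A minimal sketch; the counter values and the 60-second interval below are made up for illustration, not taken from the devices in this thread:

```python
def rates_mbps(sample_a, sample_b, interval_s):
    """Compute per-direction throughput in Mbps from two counter snapshots.

    Each sample is a dict with 'in_bytes' and 'out_bytes' read off
    'show interface' output. Values below are hypothetical.
    """
    din = (sample_b["in_bytes"] - sample_a["in_bytes"]) * 8 / interval_s / 1e6
    dout = (sample_b["out_bytes"] - sample_a["out_bytes"]) * 8 / interval_s / 1e6
    return din, dout

# Illustrative numbers: snapshots 60 s apart, slow inbound, healthy outbound
a = {"in_bytes": 0, "out_bytes": 0}
b = {"in_bytes": 150_000_000, "out_bytes": 675_000_000}
print(rates_mbps(a, b, 60))  # -> (20.0, 90.0)
```

Sampling the same way on both ends of the circuit also shows where traffic is being lost: if one side sends 90 Mbps and the other side only receives 20, the drop is in between.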
ASKER CERTIFIED SOLUTION
Gary Patterson, CISSP (United States)

(Solution text available to Experts Exchange members only.)

Bill Herde (Asker) replied:
Thanks for the response.
CRC errors do not appear to be a factor.
The following is from the Force10 on the connected link, and from the ASA in the office.
I also did a show int on the force10 for the connections to the firewall, which also look clean.

This is from the Force10.

SW-EDMS-SanDiego>show interfaces gigabitethernet 1/45
GigabitEthernet 1/45 is up, line protocol is up
Hardware is DellEth, address is 74:e6:e2:f7:c2:82
    Current address is 74:e6:e2:f7:c2:82
Pluggable media not present
Interface index is 2102787
Internet address is not set
Mode of IPv4 Address Assignment : NONE
DHCP Client-ID :74e6e2f7c282
MTU 1554 bytes, IP MTU 1500 bytes
LineSpeed 1000 Mbit, Mode full duplex
Auto-mdix enabled, Flowcontrol rx off tx off
ARP type: ARPA, ARP Timeout 04:00:00
Last clearing of "show interface" counters 37w6d22h
Queueing strategy: fifo
Input Statistics:
     10366650609 packets, 6892038754125 bytes
     748011766 64-byte pkts, 2640626597 over 64-byte pkts, 2334131862 over 127-byte pkts
     236762277 over 255-byte pkts, 165786587 over 511-byte pkts, 4241331520 over 1023-byte pkts
     50 Multicasts, 115316 Broadcasts, 10366535243 Unicasts
     0 runts, 0 giants, 0 throttles
     0 CRC, 0 overrun, 0 discarded
Output Statistics:
     9452933673 packets, 7509980670863 bytes, 0 underruns
     224600651 64-byte pkts, 3503893757 over 64-byte pkts, 374775117 over 127-byte pkts
     280271711 over 255-byte pkts, 133349679 over 511-byte pkts, 4936042758 over 1023-byte pkts
     11822316 Multicasts, 1337525 Broadcasts, 9439773832 Unicasts
     0 throttles, 0 discarded, 0 collisions, 0 wreddrops
Rate info (interval 299 seconds):
     Input 01.00 Mbits/sec,        719 packets/sec, 0.10% of line-rate
     Output 10.00 Mbits/sec,       1166 packets/sec, 1.00% of line-rate
Time since last interface status change: 16:41:48

Also pulled this from the firewall in the office.
Interface Vlan14 "OUTSIDE-SM", is up, line protocol is up
  Hardware is EtherSVI, BW 100 Mbps, DLY 100 usec
        MAC address 0026.9970.3eb4, MTU 1500
        IP address 104.193.24.114, subnet mask 255.255.255.248
  Traffic Statistics for "OUTSIDE-SM":
        30735966 packets input, 35552983727 bytes
        21308405 packets output, 5520118100 bytes
        11583 packets dropped
      1 minute input rate 892 pkts/sec,  966381 bytes/sec
      1 minute output rate 547 pkts/sec,  193461 bytes/sec
      1 minute drop rate, 0 pkts/sec
      5 minute input rate 1186 pkts/sec,  1338871 bytes/sec
      5 minute output rate 729 pkts/sec,  209839 bytes/sec
      5 minute drop rate, 0 pkts/sec
So far, so good, but you pulled ASA stats for the SVI (virtual) interface.  You need to check the stats on the physical interface: Ethernet / FastEthernet / GigabitEthernet.

Data center: UNI - Force10 Switch- (?) - ASA

Assuming both the UNI and the ASA are connected to the same switch, then there are 3 places to check here: 2 ports on the switch, plus the physical interface (not the SVI) on the ASA. If there are multiple switches in the path, you need to check each intermediate switch, too, especially with a "one way" problem.

Your office: UNI - ASA

Assuming the ASA is directly connected, then just one place to check here.  Same comment about checking physical ASA interface here, though.

Assuming that all comes up clean, can you walk me through how you've determined and measured the problem, and preferably post some screenshots?
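As a quick sanity check on measurements, the "bytes/sec" figures in the ASA rate lines convert directly to Mbps (multiply by 8, divide by 10^6). A small sketch using the 5-minute SVI figures posted above:

```python
import re

def rate_line_to_mbps(line):
    """Convert an ASA '... rate N pkts/sec, M bytes/sec' line to Mbps."""
    m = re.search(r"(\d+)\s+bytes/sec", line)
    if not m:
        raise ValueError("no bytes/sec figure found")
    return int(m.group(1)) * 8 / 1e6

# Figures from the SVI stats above: ~10.7 Mbps in, ~1.7 Mbps out
print(rate_line_to_mbps("5 minute input rate 1186 pkts/sec,  1338871 bytes/sec"))
print(rate_line_to_mbps("5 minute output rate 729 pkts/sec,  209839 bytes/sec"))
```

At the time those stats were captured, the link was nowhere near the 100 Mbps rate, which is why steady-state counters alone don't show the problem; you need readings taken while a load test is running.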
In the office the UNI goes straight to the ASA, and here are the stats for the port it connects to.

Interface Ethernet0/5 "", is up, line protocol is up
  Hardware is 88E6095, BW 100 Mbps, DLY 100 usec
        Auto-Duplex(Full-duplex), Auto-Speed(100 Mbps)
        Input flow control is unsupported, output flow control is unsupported
        Available but not configured via nameif
        MAC address 0026.9970.3eb1, MTU not set
        IP address unassigned
        34658776 packets input, 40217301634 bytes, 0 no buffer
        Received 3017 broadcasts, 0 runts, 0 giants
        0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
        0 pause input, 0 resume input
        0 L2 decode drops
        20277 switch ingress policy drops
        23919102 packets output, 6856273791 bytes, 0 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 rate limit drops
        0 switch egress policy drops
        0 input reset drops, 0 output reset drops

In the datacenter the wire in the cabinet drops into the force10 stack (3048) on port 45 on either switch.

    NUM    Status    Description                     Q Ports
*   1      Inactive
    3      Active    DMZ                             T Gi 1/33-1/40
                                                     U Gi 1/41-1/44
                                                     T Gi 2/33-2/40
                                                     U Gi 2/41-2/44
    100    Active    Switch-to-AR                    U Gi 1/48
                                                     U Gi 2/48
    200    Active    Firewall-External-Ports         U Gi 1/47
                                                     U Gi 2/47
    300    Active    SD-Office-MetroE                U Gi 1/45
                                                     U Gi 2/45
    500    Active    Katy-crossconnect               U Gi 1/46
                                                     U Gi 2/46
    600    Inactive  Katy-Firewall-External
    1000   Active    Internal                        U Gi 1/1-1/40
                                                     U Gi 2/1-2/40
SW-EDMS-SanDiego>

It is tagged VLAN 300 and gets sent to the firewall (PPG-PROD1) on Redundant1, then fed back to the 3048 on Redundant2 into a layer-2 VLAN 1000.  Here are the stats for the firewall ports.  Also clean.
PPG-PROD1# sho int stats
Interface GigabitEthernet0/0 "", is up, line protocol is up
  Hardware is i82574L rev00, BW 1000 Mbps, DLY 10 usec
        Full-Duplex(Full-duplex), 1000 Mbps(1000 Mbps)
        Input flow control is unsupported, output flow control is off
        Active member of Redundant1
        MAC address fc5b.392d.bfac, MTU not set
        IP address unassigned
        81180411317 packets input, 58907728198413 bytes, 0 no buffer
        Received 651311 broadcasts, 0 runts, 0 giants
        4970099 input errors, 0 CRC, 0 frame, 4970099 overrun, 0 ignored, 0 abort
        0 pause input, 0 resume input
        0 L2 decode drops
        159745973858 packets output, 214900154382177 bytes, 58594 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        input queue (blocks free curr/low): hardware (485/362)
        output queue (blocks free curr/low): hardware (511/0)
Interface GigabitEthernet0/1 "", is up, line protocol is up
  Hardware is i82574L rev00, BW 1000 Mbps, DLY 10 usec
        Full-Duplex(Full-duplex), 1000 Mbps(1000 Mbps)
        Input flow control is unsupported, output flow control is off
        Description: Gateway for DATA backbone network
        Standby member of Redundant1
        MAC address fc5b.392d.bfa9, MTU not set
        IP address unassigned
        651974 packets input, 41726272 bytes, 0 no buffer
        Received 651974 broadcasts, 0 runts, 0 giants
        0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
        0 pause input, 0 resume input
        651974 L2 decode drops
        0 packets output, 0 bytes, 0 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        input queue (blocks free curr/low): hardware (466/458)
        output queue (blocks free curr/low): hardware (511/511)
Interface GigabitEthernet0/2 "", is up, line protocol is up
  Hardware is i82574L rev00, BW 1000 Mbps, DLY 10 usec
        Full-Duplex(Full-duplex), 1000 Mbps(1000 Mbps)
        Input flow control is unsupported, output flow control is off
        Active member of Redundant2
        MAC address fc5b.392d.bfad, MTU not set
        IP address unassigned
        158615036612 packets input, 204554576125211 bytes, 0 no buffer
        Received 7527864 broadcasts, 0 runts, 0 giants
        2202565 input errors, 0 CRC, 0 frame, 2202565 overrun, 0 ignored, 0 abort
        0 pause input, 0 resume input
        0 L2 decode drops
        80021335104 packets output, 53073861052899 bytes, 63 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        input queue (blocks free curr/low): hardware (503/362)
        output queue (blocks free curr/low): hardware (511/0)
Interface GigabitEthernet0/3 "", is up, line protocol is up
  Hardware is i82574L rev00, BW 1000 Mbps, DLY 10 usec
        Full-Duplex(Full-duplex), 1000 Mbps(1000 Mbps)
        Input flow control is unsupported, output flow control is off
        Standby member of Redundant2
        MAC address fc5b.392d.bfaa, MTU not set
        IP address unassigned
        7905819 packets input, 642452807 bytes, 0 no buffer
        Received 7905819 broadcasts, 0 runts, 0 giants
        0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
        0 pause input, 0 resume input
        7905819 L2 decode drops
        0 packets output, 0 bytes, 0 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        input queue (blocks free curr/low): hardware (505/390)
        output queue (blocks free curr/low): hardware (511/511)


The 3048 also shows these connections to the firewall as clean.


SW-EDMS-SanDiego>sh int gig 1/1
GigabitEthernet 1/1 is up, line protocol is up
Hardware is DellEth, address is 74:e6:e2:f7:c2:82
    Current address is 74:e6:e2:f7:c2:82
Pluggable media not present
Interface index is 2097155
Internet address is not set
Mode of IPv4 Address Assignment : NONE
DHCP Client-ID :74e6e2f7c282
MTU 1554 bytes, IP MTU 1500 bytes
LineSpeed 1000 Mbit, Mode full duplex
Auto-mdix enabled, Flowcontrol rx off tx off
ARP type: ARPA, ARP Timeout 04:00:00
Last clearing of "show interface" counters 37w6d23h
Queueing strategy: fifo
Input Statistics:
     0 packets, 0 bytes
     0 64-byte pkts, 0 over 64-byte pkts, 0 over 127-byte pkts
     0 over 255-byte pkts, 0 over 511-byte pkts, 0 over 1023-byte pkts
     0 Multicasts, 0 Broadcasts, 0 Unicasts
     0 runts, 0 giants, 0 throttles
     0 CRC, 0 overrun, 0 discarded
Output Statistics:
     35614371 packets, 5352875961 bytes, 0 underruns
     22542815 64-byte pkts, 5453807 over 64-byte pkts, 5661814 over 127-byte pkts
     202076 over 255-byte pkts, 20932 over 511-byte pkts, 1732927 over 1023-byte pkts
     20310288 Multicasts, 12898925 Broadcasts, 2405158 Unicasts
     0 throttles, 0 discarded, 0 collisions, 0 wreddrops
Rate info (interval 299 seconds):
     Input 00.00 Mbits/sec,          0 packets/sec, 0.00% of line-rate
     Output 00.00 Mbits/sec,          2 packets/sec, 0.00% of line-rate
Time since last interface status change: 22w6d19h


SW-EDMS-SanDiego>sho int gig 2/1
GigabitEthernet 2/1 is up, line protocol is up
Hardware is DellEth, address is 74:e6:e2:f7:c2:82
    Current address is 74:e6:e2:f7:c2:82
Pluggable media not present
Interface index is 3145731
Internet address is not set
Mode of IPv4 Address Assignment : NONE
DHCP Client-ID :74e6e2f7c282
MTU 1554 bytes, IP MTU 1500 bytes
LineSpeed 1000 Mbit, Mode full duplex
Auto-mdix enabled, Flowcontrol rx off tx off
ARP type: ARPA, ARP Timeout 04:00:00
Last clearing of "show interface" counters 38w0d0h
Queueing strategy: fifo
Input Statistics:
     114203850533 packets, 62860561899730 bytes
     15119499382 64-byte pkts, 52166167896 over 64-byte pkts, 3581872079 over 127-byte pkts
     921981498 over 255-byte pkts, 444283914 over 511-byte pkts, 41970045764 over 1023-byte pkts
     0 Multicasts, 868641 Broadcasts, 114202981892 Unicasts
     0 runts, 0 giants, 0 throttles
     0 CRC, 0 overrun, 0 discarded
Output Statistics:
     252423806537 packets, 327122869511040 bytes, 0 underruns
     5582713443 64-byte pkts, 4537789210 over 64-byte pkts, 984214706 over 127-byte pkts
     658952473 over 255-byte pkts, 1588735594 over 511-byte pkts, 239071401111 over 1023-byte pkts
     20310309 Multicasts, 12030320 Broadcasts, 252391465908 Unicasts
     0 throttles, 0 discarded, 0 collisions, 0 wreddrops
Rate info (interval 299 seconds):
     Input 24.00 Mbits/sec,       3283 packets/sec, 2.40% of line-rate
     Output 23.00 Mbits/sec,       3205 packets/sec, 2.40% of line-rate
Time since last interface status change: 22w6d19h



Initial indications of the problem were users getting disconnected from RDP and SSMS sessions to the developer servers.  Since our internet egress also crosses this same line via VLAN 100, a quick-and-dirty internet speed test shows an unstable and pathetic download speed, and a good strong upload.  Typical downstream is 10-20 Mbps, and up is over 90 Mbps.  The same imbalance is confirmed by running iperf sessions to selected servers in the datacenter.  All the while, I have the Paessler (PRTG) monitor running on all the firewalls and switches, and the throughput on the interfaces matches the test results.

An interesting thing discovered is that if the metro-e cable is plugged into 2/45, the performance gets worse.  Downstream drops below 10M.

There is no time of day shift in performance, and much of this testing is done after hours, so there is no conflicting traffic.
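If the iperf runs above were done with iperf3, its `--json` output makes the forward/reverse comparison easy to script. A sketch; the field path follows iperf3's JSON schema for TCP tests, and the sample fragment below is hand-made, not a real capture from this network:

```python
import json

def iperf3_mbps(json_text):
    """Pull receiver-side throughput (Mbps) out of `iperf3 --json` output."""
    data = json.loads(json_text)
    bps = data["end"]["sum_received"]["bits_per_second"]
    return bps / 1e6

# Hand-made fragment mimicking a slow downstream result
sample = '{"end": {"sum_received": {"bits_per_second": 19500000.0}}}'
print(iperf3_mbps(sample))  # -> 19.5
```

Running the same test twice, once normally and once with iperf3's `--reverse` option, gives both directions from a single client and makes the asymmetry reproducible on demand.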
Overruns:

ppg-prod1
      Interface GigabitEthernet0/0
      4970099 input errors, 0 CRC, 0 frame, 4970099 overrun, 0 ignored, 0 abort
      
      Interface GigabitEthernet0/2      
      2202565 input errors, 0 CRC, 0 frame, 2202565 overrun, 0 ignored, 0 abort

I'm not familiar with this specific switch.  I suggest you open a TAC case.  Depending on the hardware architecture, you may be able to just move one of these ports so it is handled on a different internal module.  For example, on some switches, 8-port groups share a chunk of memory; if you've got several busy ports, you can oversubscribe that port group.  I don't know if it is the same here or not, but it is pretty easy to test.

You may be able to allocate more buffer space, or determine if flow control is appropriate in your situation.  Flow control needs to be supported at both ends of the wire.

https://www.force10networks.com/CSPortal20/TechTips/0064_FlowControlSampleConfiguration.aspx
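For context on the overrun counters quoted above, it can help to express them as a fraction of total input packets. A tiny sketch using the Gi0/0 numbers from the firewall output:

```python
def overrun_pct(overruns, packets_in):
    """Express an interface's overrun count as a percentage of input packets."""
    return 100.0 * overruns / packets_in

# Gi0/0 counters from above: roughly 0.006% of input packets hit an overrun
print(round(overrun_pct(4970099, 81180411317), 4))
```

A rate that low, accumulated over months, is unlikely to explain a sustained 4x throughput loss by itself, though watching whether the counter increments during a load test is still worthwhile.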
PPG-prod1 is the 5512 firewall.  I will contact Cisco.  Stay tuned!
Well, it was not the overruns.

More testing and searching was done over the weekend. We reset the firewall and the switches just because it's something you do.  No improvement.  

The overruns do not increment when loading traffic on the connection to the office, but do increment when loading up the link to the DR site (different circuit, same 'outside' port on the firewall).  This is not a problem, as the 5512 is only rated to move 210 Mbps encrypted, and with the throttles off on the link to DR, we are able to sustain close to 600 Mbps.  Some overruns would be justified, and flow control would probably fix that.  We will look into it once the big problem is resolved.  The Cisco tech taught me a great deal about getting info out of show int stats.

We put a laptop on the office end of the wire, and iperf showed us the same symptoms, so the office firewall was eliminated as a suspect.

I ran iperf between servers in the DMZ and the inside network in the datacenter, which involves only the firewall and the switch, and it ran at the full gig (926 Mbps).

So the issue appears to be squarely at the connection between my switch and the datacenter switch.

It is looking a lot like Force10 does not like to play nicely with Juniper.  I am meeting with the datacenter team tomorrow to get a plan together for where to go next.
PS: I did not mention that we had Cox and TW come out and verify the link good from end to end prior to the weekend testing.  When their test equipment was being used, it pushed 100 Mbps both ways.
Looks like you've got a pretty big speed drop (10:1) at the data center side, from 1 Gbps on the switch to the MEN.  If the CIR is 100 Mbps on the Metro-E, you should probably configure traffic shaping to that speed on the switchport connected to the UNI if it supports it, and if not, on the outbound port of the ASA.

If you're bursting over the CIR, which would be easy to do in that direction, the MEN provider may be rate-limiting you and dropping packets.  Ask them if you haven't already.
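To illustrate the bursting concern: a provider enforcing a CIR is commonly modeled as a token bucket, and a line-rate burst from a 1 Gbps port into a 100 Mbps CIR overflows it almost immediately. A toy simulation; the CIR and bucket depth below are assumed values for illustration, not the provider's actual policer settings:

```python
def police(packets, cir_bps, bucket_bytes):
    """Simulate a simple token-bucket policer (one token = one byte).

    packets: list of (arrival_time_s, size_bytes) tuples.
    Returns the number of packets the policer would drop.
    """
    tokens = float(bucket_bytes)
    last = 0.0
    dropped = 0
    for t, size in packets:
        # Refill tokens at the CIR, capped at the bucket depth
        tokens = min(bucket_bytes, tokens + (t - last) * cir_bps / 8)
        last = t
        if size <= tokens:
            tokens -= size  # conforming packet: forward it
        else:
            dropped += 1    # bucket empty: policer drops it
    return dropped

# 1500-byte frames back-to-back at 1 Gbps (one every 12 microseconds)
burst = [(i * 12e-6, 1500) for i in range(2000)]
print(police(burst, cir_bps=100e6, bucket_bytes=150_000))  # drops most of the burst
```

This is also why shaping (queueing to the CIR) at the sending switchport usually beats relying on the provider's policer, which simply discards the excess.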
Hadn't considered that.  Will give it a go.
Rate shaping did not help, unfortunately.  Also, I don't think the MEN is dropping packets; wouldn't there be a very high retransmit count otherwise?

SW-EDMS-SanDiego#show tcp statistics
Rcvd: 6209 Total, 0 no port
   0 checksum error, 0 bad offset, 0 too short
   457 packets (24740 bytes) in sequence
   0 dup packets (0 bytes)
   0 partially dup packets (0 bytes)
   0 out-of-order packets (0 bytes)
   0 packets ( 0 bytes) with data after window
   0 packets after close
   0 window probe packets, 0 window update packets
   0 dup ack packets, 0 ack packets with unsend data
   461 ack packets (32982 bytes)
Sent: 50954 Total, 0 urgent packets
   38 control packets
   524 data packets (33033 bytes)
   0 data packets (0 bytes) retransmitted
   24009 ack only packets (26627 delayed)
   0 window probe packets, 24 window update packets
0 Connections initiated, 1 connections accepted, 1 connections established
29 Connections closed (including 0 dropped, 0 embryonic dropped)
0 Total rxmt timeout, 0 connections dropped in rxmt timeout
0 Keepalive timeout, 0 keepalive probe, 0 Connections dropped in keepalive
Can't tell much from TCP stats, since it is such a small sample.  Was that during a test period?

A few thoughts:

1) If you haven't already, continue testing, moving closer to the UNI at each end one step at a time until you are plugged in directly.  Validate the provider's testing, and hopefully isolate the problem to a specific device.  I assume it is going to be the data center switch, but of course that assumption needs to be validated through testing.

2) If you haven't already, engage Dell/Force10 tech support by opening a TAC case.

3) What FTOS version are you running?  Consider upgrading to latest firmware if you aren't already there.

4) Review CPU utilization on switch while testing:  https://www.force10networks.com/CSPortal20/TechTips/0040_HighCPU.aspx

5) If you can without compromising security, reboot the switch, start a test, let it run for a bit, then post the SHOW TECH-SUPPORT output.

6) If you have access to appropriate hardware (a 1GbE aggregating port monitor), capture a short Wireshark trace at the data center end while testing and experiencing the slowdown.  A capture using port mirroring isn't as useful, but if that's all you can do, it wouldn't hurt.  Post the capture file here (zipped; it needs to be under 50MB to post to EE).
Well, this is turning out as I suspected.

I took an old 10/100 switch (actually a hub that is labeled a switch) over to the datacenter to plug in at the connection so I could run Wireshark and see what was happening.  Once it was plugged in, the link ran at full speed!  Took it out, broken again.  Turned off auto-negotiate on the switch port and set 1G full duplex: same problem.  Tried half duplex: would not link.  Tried 100M full: better, but still only 60% of rated speed.

SO.  The band-aid is in place.  As long as traffic goes through the old 5-port, it works.  This is passed back to the datacenter networking guys to figure out.
The initial direction to check the physical layer was right on.  This is pretty clearly a hardware issue that programming and special configurations could not resolve.  If the datacenter team comes up with a clever way to make it work right, I will update this thread.
I'd see if the provider can just configure the UNI port manually for 100 Mbps/full, then do the same on your switchport.