Solved

BGP over L2 MPLS

Posted on 2014-12-09
Last Modified: 2014-12-13
Hello,

We are experiencing a weekly BGP disruption between two sites. We are using a Cisco and a Juniper device. The topology looks like the following:

MX80 (Junos 11.4R7.5) --- Arista DCS-7148S --- Cogent Ckt (L2 MPLS) --- Cisco 2960G 48TC-L --- ASA firewall --- Cisco 2800 IOS 12.4(23)

The flap usually happens in the middle of the week, between about 10 PM and 12 AM, when the BGP peering goes into the Active state. I have contacted the provider, and they have confirmed that there was neither maintenance nor an outage on their end. Attached are the graphs from our PRTG and the provider's own graph for the time of the outage.

Our graph was taken directly from the switch interface that the provider's circuit connects to. We clearly see a noticeable drop, yet they see nothing at all and can't explain it. I can also confirm that there are NO backups or anything else that would saturate the link during those times. They've also confirmed that we peak at about 85 Mbit/s on this link, which is provisioned for 1 Gbit/s, so I don't think it's a bandwidth issue. The only suggestion I've gotten from them was to call during the actual issue.

Has anyone encountered this issue before? Any input is highly appreciated.
provider.jpg
internal-prtg.jpg
Question by:FREDARCE
5 Comments
 
LVL 17

Expert Comment

by:pergr
Clearly you need to troubleshoot during an outage, and I suggest you work according to the OSI layers.

1) Are the interfaces up or down?
2) Can you ping across?
3) Is the routing protocol up or down?
...

If you cannot ping from the MX80 to the 2800, you could also configure VLAN/IRB interfaces on the Arista and the 2960G and try to ping across the Cogent connection.
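
The bottom-up order of that checklist can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a monitoring tool: the check names and the `lowest_failing_layer` function are made up for the example, and the True/False inputs would have to come from whatever probes you already run.

```python
from typing import Optional

# Ordered bottom-up, following the checklist above. The check names are
# illustrative labels, not commands from any vendor CLI.
CHECKS = ("interface", "ping", "bgp")


def lowest_failing_layer(results: dict) -> Optional[str]:
    """Return the first check (bottom-up) that failed, or None if all passed.

    `results` maps each name in CHECKS to True (passed) or False (failed).
    A missing entry is reported as "not tested" and stops the walk, because
    higher layers cannot be trusted until the lower ones are confirmed.
    """
    for name in CHECKS:
        status = results.get(name)
        if status is None:
            return f"{name} (not tested)"
        if not status:
            return name
    return None


# Example observations: interfaces stay up, ping fails, BGP drops.
# The lowest failing layer is then IP reachability, not BGP itself.
observed = {"interface": True, "ping": False, "bgp": False}
print(lowest_failing_layer(observed))  # ping
```

The point of stopping at the first failure is that a BGP session cannot stay up without IP reachability, so a failed ping makes the BGP state a symptom and not the cause.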
 

Author Comment

by:FREDARCE
It's difficult to troubleshoot during the outage because the timing is unpredictable and service is usually restored by the time we get a chance to look at it.
The interfaces never go down.
Our ping monitor reports that no ping packets get across the line during the outage.
BGP goes down.

On the Cisco side of the link, there are repeating log messages for:

bgp_cpu2timeout:

These messages do not appear at the exact time of the outage, but they do precede it.

On the Juniper side, there are entries for:

Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0
Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae1 : Warning: aggregated-ether-options link-speed no kernel value! default to  0

Other forums suggest the Cisco bgp_cpu2timeout message is mostly cosmetic, so I'm not sure whether it is part of the cause.

Please advise.
 
LVL 28

Expert Comment

by:mikebernhardt
I'd suggest you set up some kind of syslog monitoring and make sure that you are automatically paged/texted when those messages or BGP drop messages appear (make sure the devices are set up to send those syslog messages and traps). You need to troubleshoot while it's happening.
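
A minimal sketch of that kind of syslog matching, in Python. The patterns are assumptions modeled on commonly seen Cisco IOS and Junos BGP log messages plus the `bgp_cpu2timeout` string quoted in this thread, so adjust them to what your devices actually emit. The `notify` callback is a placeholder for whatever paging or SMS integration you use, and the sample lines are made up for illustration.

```python
import re
from typing import Callable, Iterable, List

# Patterns are assumptions: tune them to the exact messages on your devices.
ALERT_PATTERNS = [
    re.compile(r"%BGP-\d-ADJCHANGE: .*\bDown\b"),   # typical Cisco IOS neighbor-down log
    re.compile(r"RPD_BGP_NEIGHBOR_STATE_CHANGED"),   # typical Junos peer state change
    re.compile(r"bgp_cpu2timeout"),                  # string quoted in this thread
]


def scan(lines: Iterable[str], notify: Callable[[str], None]) -> int:
    """Call notify(line) for each line matching an alert pattern; return the count."""
    hits = 0
    for line in lines:
        if any(p.search(line) for p in ALERT_PATTERNS):
            notify(line.rstrip("\n"))
            hits += 1
    return hits


# Made-up sample lines for illustration; they are not from the poster's logs.
sample = [
    "Dec 10 23:41:07 rtr1 123: %BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down BGP Notification sent\n",
    "Dec 10 23:41:09 rtr1 124: %LINEPROTO-5-UPDOWN: Line protocol on Interface Gi0/1, changed state to up\n",
    "Dec 10 23:46:30 rtr1 125: %BGP-5-ADJCHANGE: neighbor 192.0.2.1 Up\n",
]
alerts: List[str] = []
scan(sample, alerts.append)
print(len(alerts))  # 1
```

In practice `lines` could be a file object tailing the syslog server's log, and `notify` would hand the line to the paging system instead of appending to a list.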
 

Author Comment

by:FREDARCE
We have syslog and traps in place. We just haven't been able to get to it because it's only about a 5-minute window that mostly occurs close to midnight.
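
With a window that short, one option is to let a script record exactly when reachability was lost, so the times can be lined up against device logs afterwards. The Python sketch below assumes you already have timestamped ping results from some probe; the `outage_windows` function and the sample data are hypothetical and synthetic, not taken from our monitoring.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, bool]  # (probe time, reply received?)


def outage_windows(samples: List[Sample]) -> List[Tuple[datetime, datetime]]:
    """Return (first_failed_probe, first_recovered_probe) pairs.

    Samples must be sorted by time. An outage still open at the end of the
    data is closed at the last sample's timestamp.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends at first good probe
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


# Synthetic data: one probe every 30 s, replies lost from 23:40:00 until 23:45:00.
t0 = datetime(2014, 12, 10, 23, 38, 0)
loss_begin = datetime(2014, 12, 10, 23, 40)
loss_end = datetime(2014, 12, 10, 23, 45)
samples = []
for i in range(20):
    ts = t0 + timedelta(seconds=30 * i)
    samples.append((ts, not (loss_begin <= ts < loss_end)))

for begin, end in outage_windows(samples):
    print(begin.time(), end.time(), end - begin)
```

The resolution of each window is limited by the probe interval, so a shorter interval gives tighter bounds for comparing against syslog timestamps.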
 
LVL 17

Accepted Solution

by:pergr
If you cannot ping the other side, it is not directly a BGP issue: either the link is failing, or the CPU of the other device is too busy to even respond to ping.

I suggest you set up at least two IP addresses on each side, and ping those. That way you can figure out whether the link between the sites is up and possibly only the device that is running BGP has a CPU that is too busy.
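
How the two-address test separates the cases can be written down as a small decision function. This Python sketch is only an illustration of the logic: the `classify` function is made up for the example, the second address is assumed to live on a different device at the same site (for example a switch VLAN interface), and the booleans would come from your own ping monitor.

```python
def classify(router_replies: bool, second_ip_replies: bool) -> str:
    """Interpret ping results for two addresses at the remote site.

    router_replies    -- the BGP-speaking router answered
    second_ip_replies -- an independent address at the same site answered
                         (for example a switch VLAN interface; an assumption)
    """
    if router_replies and second_ip_replies:
        return "path and router both responding"
    if second_ip_replies:
        return "path is up; suspect the router itself (for example a busy CPU)"
    if router_replies:
        return "router reachable; check the second address or its device"
    return "nothing answers; suspect the link or a device in the path"


for router, second in [(True, True), (False, True), (False, False)]:
    print(router, second, "->", classify(router, second))
```

The useful case is the second one: if the extra address keeps answering while the router does not, the circuit itself is carrying traffic during the outage.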


For the Juniper messages (ae0 : Warning: aggregated-ether-options link-speed no kernel value) you may want to try to set the LACP 'link-speed' on both the AE and the GE interfaces, to get rid of that message.