BGP over L2 MPLS


We are experiencing a weekly BGP disruption between two sites. We are using a Cisco and a Juniper device. The topology looks like the following:

MX80 (Junos 11.4R7.5) – Arista DCS-7148S ---- Cogent Ckt (L2 MPLS)----- cisco 2960G 48TC-L--- ASA firewall --- Cisco 2800 IOS 12.4(23)

The flap usually happens in the middle of the week, at around 10 PM to 12 AM, when the BGP peering goes into Active. I have contacted the provider, and they've confirmed that this is neither maintenance nor an outage on their end. Attached are the graphs from our PRTG and the provider's own graph covering the time of the outage.

Our graph was taken directly from the switch interface that the provider's circuit is connected to. We clearly see a noticeable drop, yet they don't see anything at all, and they can't explain the drop. I can also confirm that there are NO backups or anything else that would saturate the link during those times. They've also confirmed that we reach about 85 Mbit/s at most on this link, which is provisioned for 1 Gbit/s, so I don't think it's a bandwidth issue. The only suggestion I've gotten from them was to call during the actual issue.
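As a rough check of the bandwidth figures: SNMP-based traffic graphs are generally computed from the difference between two samples of an interface octet counter. The sketch below assumes 64-bit counters (such as IF-MIB `ifHCInOctets`) and a 60-second polling interval; both are assumptions for illustration, not details from this thread. Because the result is an average over the whole interval, a short outage shows up as a partial dip rather than a drop to zero.

```python
# Minimal sketch: link rate and utilization from two counter samples.
# Counter values and the 60 s interval below are hypothetical.

COUNTER64_MAX = 2**64


def rate_mbps(octets_t0: int, octets_t1: int, interval_s: float) -> float:
    """Average bit rate in Mbit/s between two samples of an octet counter."""
    delta = (octets_t1 - octets_t0) % COUNTER64_MAX  # tolerates one counter wrap
    return delta * 8 / interval_s / 1_000_000


def utilization_pct(rate: float, link_mbps: float) -> float:
    """Rate as a percentage of the provisioned link speed."""
    return 100.0 * rate / link_mbps


# 85 Mbit/s for 60 s is 85e6 * 60 / 8 = 637,500,000 octets.
r = rate_mbps(1_000_000_000, 1_637_500_000, 60)
print(f"{r:.1f} Mbit/s, {utilization_pct(r, 1000):.1f}% of 1 Gbit/s")
```

At 85 Mbit/s on a 1 Gbit/s circuit, utilization is only 8.5%, which supports ruling out saturation.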

Has anyone encountered this issue before? Any input is highly appreciated.
pergr commented:
If you cannot ping the other side, it is not directly a BGP issue: either the link is down, or the CPU of the other device is too busy to even respond to ping.

I suggest you set up at least two IP addresses on each side and ping those. That way you can tell whether the link between the sites is down, or whether the link is up and only the device running BGP has a CPU that is too busy to respond.
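A minimal sketch of how one round of pings to several probe addresses could be interpreted. The `ProbeResult` type, the `diagnose` function and the probe labels are hypothetical; the non-router probes stand for the extra IP addresses suggested here (for example VLAN/IRB interfaces), and the router probe stands for the BGP peering address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    target: str          # hypothetical label or IP of the probed address
    is_bgp_router: bool  # True for the BGP peering address (MX80 / 2800)
    reachable: bool      # did the ping get a reply?


def diagnose(results: List[ProbeResult]) -> str:
    """Interpret one round of pings sent across the inter-site link."""
    path_probes = [r for r in results if not r.is_bgp_router]
    router_probes = [r for r in results if r.is_bgp_router]
    if path_probes and not any(r.reachable for r in path_probes):
        # None of the extra addresses answer either: suspect the circuit.
        return "link down"
    if router_probes and not all(r.reachable for r in router_probes):
        # The path answers but the BGP router does not: suspect the router (e.g. CPU).
        return "link up, BGP router not responding"
    return "all probes reachable"


round1 = [
    ProbeResult("remote-probe-1", False, True),
    ProbeResult("remote-probe-2", False, True),
    ProbeResult("c2800-peer", True, False),
]
print(diagnose(round1))
```

Using two probe addresses rather than one keeps a single misconfigured or filtered probe from being mistaken for a circuit failure.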

For the Juniper messages (ae0 : Warning: aggregated-ether-options link-speed no kernel value) you may want to try to set the LACP 'link-speed' on both the AE and the GE interfaces, to get rid of that message.
Clearly, you need to troubleshoot during an outage, and I suggest you work up through the OSI layers, from the bottom:

1) Are the interfaces up or down?
2) Can you ping across?
3) Is the routing protocol up or down?
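The bottom-up order of these checks can be expressed as a tiny decision function. This is only an illustration (the function name and its inputs are hypothetical observations you would collect by hand during an outage); the idea is to stop at the lowest layer that fails, since every layer above it depends on it.

```python
from typing import Optional


def first_failed_layer(interfaces_up: bool, ping_ok: bool, bgp_up: bool) -> Optional[str]:
    """Walk the checks bottom-up and return the first one that fails,
    or None if everything looks healthy."""
    checks = [
        ("1) interfaces", interfaces_up),
        ("2) ping across", ping_ok),
        ("3) routing protocol", bgp_up),
    ]
    for name, ok in checks:
        if not ok:
            return name
    return None


# Symptoms the author reports in this thread: interfaces stay up, ping fails, BGP drops.
print(first_failed_layer(interfaces_up=True, ping_ok=False, bgp_up=False))
```

With the reported symptoms, the first failing step is the ping, which points at the path between the sites rather than at BGP itself.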

If you cannot ping from the MX80 to the 2800, you could also configure VLAN/IRB interfaces on the Arista and the 2960G and try to ping through the Cogent connection.
FREDARCE (Author) commented:
It's difficult to troubleshoot during the outage as timing is unpredictable and the service is usually restored by the time we get a chance to look at it.
The interfaces never go down.
Our ping monitor reports that no ping packets get across the line during the outage.
BGP goes down.

On the Cisco side of the link, there are repeating log messages for:


These messages do not appear at the exact time of the outage, but they do precede it.

On the Juniper side, there are entries for:

Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0
Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae1 : Warning: aggregated-ether-options link-speed no kernel value! default to  0

Other forums suggest the Cisco bgp_cpu2timeout message is mostly cosmetic, so I'm not sure whether it is part of the cause.

Please Advise.
I'd suggest you set up some kind of syslog monitoring and make sure that you are automatically paged or texted when those messages, or BGP drop messages, appear (make sure the devices are set up to send those syslog traps). You need to troubleshoot while it's happening.
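A minimal sketch of the matching half of such an alert: scan syslog lines for BGP neighbor state changes and the Juniper warning quoted in this thread. The regular expressions are assumptions about the message formats (Cisco IOS `%BGP-5-ADJCHANGE`, Junos `BGP_NEIGHBOR_STATE_CHANGED`); verify them against what your devices actually send. The paging step is left as a hypothetical hook.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed message formats; check them against your own logs.
PATTERNS = {
    # Cisco IOS neighbor change, e.g. "%BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down ..."
    "cisco_bgp_adjchange": re.compile(r"%BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)"),
    # Junos rpd state change, e.g. "BGP_NEIGHBOR_STATE_CHANGED: BGP peer 192.0.2.2 ..."
    "junos_bgp_state": re.compile(r"BGP_NEIGHBOR_STATE_CHANGED: BGP peer (\S+)"),
    # The dcd warning quoted in this thread
    "junos_ae_link_speed": re.compile(r"(ae\d+) : Warning: aggregated-ether-options link-speed"),
}


def match_alerts(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (pattern_name, line) for each line that matches an alert pattern."""
    for line in lines:
        for name, rx in PATTERNS.items():
            if rx.search(line):
                yield name, line.rstrip()
                break  # one alert per line is enough


sample = [
    "Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0",
    "an unrelated log line",
]
for name, line in match_alerts(sample):
    print(f"ALERT [{name}] {line}")  # hypothetical hook: replace print with a pager/SMS call
```

In practice you might feed `match_alerts` the lines of the syslog server's log file (for example by iterating over `sys.stdin` while piping in `tail -F`) and replace the `print` with whatever paging or texting mechanism you already have.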
FREDARCE (Author) commented:
We have syslog and traps in place. We just haven't been able to get to it in time, because it's only about a 5-minute window that mostly occurs close to midnight.
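Since the window is short and nobody is watching at midnight, one option is to let a script record it. This is a minimal sketch, assuming you already log one ping result per minute (the function name and sample data are hypothetical). It groups consecutive failures into outage windows with start and end times, which can then be lined up against the syslog messages and the provider's graphs. The first failed sample would also be a natural trigger for automatically running `show` commands, although that part is not shown.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Window = Tuple[datetime, Optional[datetime]]


def outage_windows(samples: Iterable[Tuple[datetime, bool]]) -> List[Window]:
    """Group time-ordered (timestamp, ping_ok) samples into outage windows.
    Each window is (first failed sample, first successful sample after it);
    the end is None if the link is still down at the last sample."""
    windows: List[Window] = []
    start: Optional[datetime] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                   # outage begins
        elif ok and start is not None:
            windows.append((start, ts))  # outage ends
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Hypothetical data: one ping per minute from 23:55, failing from 23:58 to 00:02.
t0 = datetime(2024, 1, 3, 23, 55)
pings = [(t0 + timedelta(minutes=i), not 3 <= i <= 7) for i in range(11)]
for start, end in outage_windows(pings):
    print(start, "->", end, "duration:", end - start if end else "ongoing")
```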