BGP over L2 MPLS

Hello,

We are experiencing a weekly BGP disruption between two sites.  We are using Cisco and Juniper devices.  The topology looks like this:

MX80 (Junos 11.4R7.5) – Arista DCS-7148S ---- Cogent Ckt (L2 MPLS)----- cisco 2960G 48TC-L--- ASA firewall --- Cisco 2800 IOS 12.4(23)

The flap usually happens midweek between about 10 PM and 12 AM, when the BGP peering goes into the Active state.  I have contacted the provider, and they have confirmed that there was no maintenance or outage on their end.  Attached are the graphs from our PRTG and the provider's own graph covering the time of the outage.

Our graph was taken directly from the switch interface where the provider's circuit is connected.  We clearly see a noticeable drop, yet they don't see anything at all and can't explain it.  I can also confirm that there are NO backups or anything else that would saturate the link during those times.  They have also confirmed that we peak at about 85 Mbit/s on this link, which is provisioned for 1 Gb, so I don't think it's a bandwidth issue.  The only suggestion I've gotten from them was to call during the actual issue.

Has anyone encountered this issue before?  Any input is highly appreciated.
provider.jpg
internal-prtg.jpg
FREDARCE Asked:

pergr Commented:
Clearly you need to troubleshoot during an outage, and I suggest you work according to the OSI layers.

1) are interfaces up or down
2) can you ping across
3) is the routing protocol up or down
...

If you cannot ping from the MX80 to the Cisco 2800, you could also configure VLAN/IRB interfaces on the Arista and the 2960G and try to ping through the Cogent connection from there.
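To make the layer-by-layer order concrete, here is a minimal Python sketch. It assumes you already have some way to run each check (interface status, ping, BGP state); the function name and the lambda placeholders are hypothetical and only stand in for real checks.

```python
from typing import Callable, List, Optional, Tuple

# Each check is (description, callable returning True when that layer is healthy).
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run the checks bottom-up and return the first one that fails.

    Returns None when every check passes.
    """
    for description, check in checks:
        if not check():
            return description
    return None


# Placeholder results only; replace the lambdas with real checks.
example_checks: List[Check] = [
    ("interfaces up", lambda: True),
    ("ping across the link", lambda: False),
    ("BGP session established", lambda: False),
]
lowest_problem = first_failing_layer(example_checks)
```

The point of stopping at the first failure is that anything above a broken layer will fail too, so the lowest failing check is the one worth chasing.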
FREDARCE Author Commented:
It's difficult to troubleshoot during the outage because the timing is unpredictable and service is usually restored by the time we get a chance to look at it.
The interfaces never go down.
Our ping monitor reports that no ping packets get across the line during the outage.
BGP goes down.

On the Cisco side of the link, there are repeating log messages for:

bgp_cpu2timeout:

These messages do not occur at the exact time of the outage, but they do precede it.

On the Juniper side, there are entries for:

Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0
Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae1 : Warning: aggregated-ether-options link-speed no kernel value! default to  0

Other forums suggest the Cisco bgp_cpu2timeout message is mostly cosmetic, so I'm not sure whether it is part of the cause.

Please advise.
mikebernhardt Commented:
I'd suggest you set up some kind of syslog monitoring and make sure you are automatically paged/texted when those messages or BGP drop messages appear (make sure the devices are set up to send those syslog messages and traps). You need to troubleshoot while it's happening.
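As a rough illustration of that kind of monitoring, here is a small Python sketch that decides whether a syslog line should trigger a page. The pattern list is an assumption: bgp_cpu2timeout comes from this thread, while the Cisco %BGP-5-ADJCHANGE and Junos RPD_BGP_NEIGHBOR_STATE_CHANGED strings are the usual neighbor-change messages and should be checked against what your devices actually log. How the page is sent (email, SMS gateway, etc.) is left out.

```python
import re
from typing import Iterable, List

# Assumed patterns; adjust them to the messages your routers really emit.
ALERT_PATTERNS = [
    re.compile(r"bgp_cpu2timeout"),                       # message seen in this thread
    re.compile(r"%BGP-5-ADJCHANGE: neighbor \S+ Down"),   # Cisco IOS neighbor down
    re.compile(r"RPD_BGP_NEIGHBOR_STATE_CHANGED"),        # Junos peer state change
]


def should_page(line: str) -> bool:
    """Return True when a syslog line matches any alert pattern."""
    return any(pattern.search(line) for pattern in ALERT_PATTERNS)


def scan(lines: Iterable[str]) -> List[str]:
    """Return only the lines that should trigger a page."""
    return [line for line in lines if should_page(line)]
```

Feeding this from a tail of the syslog server's log file and hooking should_page() to your paging system would let the alert fire within seconds, which matters when the outage window is short.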
FREDARCE Author Commented:
We have syslog and traps in place.  We just haven't been able to get to it in time, because the outage lasts only about 5 minutes and mostly occurs close to midnight.
pergr Commented:
If you cannot ping the other side, it is not directly a BGP issue: either the link is failing, or the CPU of the other device is too busy to even respond to ping.

I suggest you set up at least two IP addresses on each side and ping those. That way you can figure out whether the link between the sites is up and it is possibly just the device running BGP whose CPU is too busy.
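A minimal Python sketch of that two-address test is below. It assumes a Linux host with the iputils ping command (the -c and -W flags differ on other systems), and the function names are made up for illustration. The second address would be something on the far side other than the BGP router, for example a VLAN interface on the remote switch.

```python
import subprocess


def ping_ok(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo using the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def diagnose(bgp_peer_ok: bool, second_host_ok: bool) -> str:
    """Interpret ping results for two addresses on the far side of the link."""
    if bgp_peer_ok and second_host_ok:
        return "path up"
    if second_host_ok:
        # The link carries traffic, so suspect the BGP device itself (e.g. busy CPU).
        return "path up, BGP device not answering"
    if bgp_peer_ok:
        return "path up, second host not answering"
    return "path down or both hosts unreachable"
```

For example, diagnose(ping_ok(peer_ip), ping_ok(second_ip)) run every few seconds and logged with a timestamp would show after the fact whether the whole path dropped or only the router stopped answering.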


For the Juniper messages (ae0 : Warning: aggregated-ether-options link-speed no kernel value), you may want to try explicitly setting 'link-speed' on both the AE and the member GE interfaces to get rid of that message.