Solved

BGP over L2 MPLS

Posted on 2014-12-09
Last Modified: 2014-12-13
Hello,

We are experiencing a weekly BGP disruption between two sites. One end is a Juniper device and the other is a Cisco. The topology is as follows:

MX80 (Junos 11.4R7.5) --- Arista DCS-7148S --- Cogent circuit (L2 MPLS) --- Cisco 2960G-48TC-L --- ASA firewall --- Cisco 2800 (IOS 12.4(23))

The flap usually happens in the middle of the week, between about 10 PM and 12 AM, and the BGP peering goes into the Active state. I have contacted the provider, and they have confirmed that there was neither maintenance nor an outage on their end. Attached are the graphs from our PRTG and the provider's own graph covering the time of the outage.

Our graph was taken directly from the switch interface where the provider's circuit is connected. We clearly see a noticeable drop, yet they see nothing at all and can't explain it. I can also confirm that there are NO backups or anything else that would saturate the link during those times. They have also confirmed that we peak at about 85 Mbit/s on this link, which is provisioned for 1 Gbit/s, so I don't think it's a bandwidth issue. The only suggestion I got from them was to call while the issue is actually happening.

Has anyone encountered this issue before? Any input is highly appreciated.
Attachments: provider.jpg, internal-prtg.jpg
Question by:FREDARCE
 
LVL 17

Expert Comment

by:pergr
ID: 40491096
Clearly you need to troubleshoot during an outage, and I suggest you work your way up the OSI layers:

1) Are the interfaces up or down?
2) Can you ping across?
3) Is the routing protocol up or down?
...

If you cannot ping from the MX80 to the 2800, you could also configure VLAN/IRB interfaces on the Arista and the 2960G and try to ping through the Cogent connection from there.
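
The layered checklist above can be sketched as a small helper that reports the lowest check that fails. This is only an illustration: the function name and the three boolean inputs are assumptions, and how you obtain them (SNMP, CLI scraping, a ping monitor) is left open.

```python
from typing import Optional

# Checks in the order suggested above, lowest layer first.
LAYER_CHECKS = ("interface", "ping", "bgp")


def first_failing_layer(interface_up: bool, ping_ok: bool,
                        bgp_established: bool) -> Optional[str]:
    """Return the name of the lowest failing check, or None if all pass."""
    results = dict(zip(LAYER_CHECKS, (interface_up, ping_ok, bgp_established)))
    for name in LAYER_CHECKS:
        if not results[name]:
            return name
    return None


# Interfaces stay up, but ping and BGP both fail.
print(first_failing_layer(True, False, False))  # prints "ping"
```

If the result is "ping", the BGP drop is a symptom of lost IP reachability rather than a problem in BGP itself, so the next step is to find where along the path the pings stop.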
 

Author Comment

by:FREDARCE
ID: 40491180
It's difficult to troubleshoot during the outage because the timing is unpredictable and service is usually restored by the time we get a chance to look at it.
The interfaces never go down.
Our ping monitor reports that no ping packets get across the line during the outage.
BGP goes down.

On the Cisco side of the link, there are repeating log messages for:

bgp_cpu2timeout:

These messages do not appear at the exact time of the outage, but they do precede it.

On the Juniper side, there are entries for:

Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0
Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae1 : Warning: aggregated-ether-options link-speed no kernel value! default to  0

Other forums suggest the Cisco bgp_cpu2timeout message is mostly cosmetic, so I'm not sure whether it is part of the cause.

Please advise.
 
LVL 28

Expert Comment

by:mikebernhardt
ID: 40491192
I'd suggest you set up some kind of syslog monitoring and make sure you are automatically paged/texted when those messages or BGP drop messages appear (make sure the devices are configured to send those syslog messages and traps). You need to troubleshoot while it's happening.
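
A minimal sketch of that kind of syslog filter is shown here. It assumes the log lines are already being collected as text. The `bgp_cpu2timeout` pattern comes from this thread, while the `%BGP-5-ADJCHANGE ... Down` pattern is an assumption about how the Cisco side logs a neighbor drop, and the sample lines and the `notify` callback are hypothetical stand-ins for your own logs and paging system.

```python
import re
from typing import Callable, Iterable, List

# Patterns worth paging on. Adjust to what your devices actually log.
ALERT_PATTERNS = [
    re.compile(r"bgp_cpu2timeout"),             # message quoted in this thread
    re.compile(r"%BGP-5-ADJCHANGE.*\bDown\b"),  # assumed neighbor-down message
]


def matching_lines(lines: Iterable[str]) -> List[str]:
    """Return the syslog lines that match any alert pattern."""
    return [line for line in lines
            if any(p.search(line) for p in ALERT_PATTERNS)]


def watch(lines: Iterable[str], notify: Callable[[str], None]) -> int:
    """Call notify() for each matching line and return the number of alerts."""
    hits = matching_lines(lines)
    for line in hits:
        notify(line)
    return len(hits)


# Hypothetical sample lines, for illustration only.
sample = [
    "Dec  8 23:58:01 rtr1: bgp_cpu2timeout:",
    "Dec  8 23:59:10 rtr1: %BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down",
    "Dec  8 23:59:30 rtr1: %BGP-5-ADJCHANGE: neighbor 192.0.2.1 Up",
]
print(watch(sample, notify=print))
```

In practice `notify` would hand the line to whatever sends the page or text. Printing is used here only to keep the sketch self-contained.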
 

Author Comment

by:FREDARCE
ID: 40491209
We have syslog and traps in place. We just haven't been able to get to it in time, because the outage lasts only about 5 minutes and mostly occurs close to midnight.
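
Because the window is so short, one option is to reconstruct it afterwards from stored ping results and line it up against the syslog timestamps. The sketch below assumes the ping monitor can export time-ordered `(timestamp, ok)` samples. The function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, bool]  # (time of the ping, whether it succeeded)


def outage_windows(samples: List[Sample]) -> List[Tuple[datetime, datetime]]:
    """Return (first_failure, last_failure) for each run of failed pings.

    Samples are expected in chronological order.
    """
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            windows.append((start, last))
            start = last = None
    if start is not None:  # an outage still in progress at the end of the data
        windows.append((start, last))
    return windows


# Hypothetical data: one ping per minute, failing from minute 3 through 7.
t0 = datetime(2014, 12, 10, 23, 50)
samples = [(t0 + timedelta(minutes=m), not 3 <= m <= 7) for m in range(10)]
for begin, end in outage_windows(samples):
    print(begin, "->", end)
```

With the start and end times in hand, log entries such as the bgp_cpu2timeout messages can be checked for how long before the outage they appear.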
 
LVL 17

Accepted Solution

by:pergr (earned 500 total points)
ID: 40491268
If you cannot ping the other side, it is not directly a BGP issue: either the link is down, or the CPU of the other device is too busy to even respond to ping.

I suggest you set up at least two IP addresses on each side and ping those. That way you can figure out whether the link between the sites is up and it is possibly just the device running BGP whose CPU is too busy.
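
The reasoning behind the two-address test can be written out as a small decision function. This is a sketch under the assumption that the second address lives on a different device at the remote site (for example a VLAN interface on the switch). The function name and the result strings are illustrative only.

```python
def diagnose(router_ip_ok: bool, second_ip_ok: bool) -> str:
    """Interpret ping results for two addresses at the remote site.

    router_ip_ok: the address on the device running BGP answered.
    second_ip_ok: an address on another device at the same site answered.
    """
    if router_ip_ok and second_ip_ok:
        return "path and router both respond; look at BGP itself"
    if second_ip_ok:
        return "link is up; the BGP router is not answering (possibly busy CPU)"
    if router_ip_ok:
        return "router answers but the second address does not; check that device"
    return "nothing answers; suspect the link or the path between sites"


# Neither address answers during the outage.
print(diagnose(router_ip_ok=False, second_ip_ok=False))
```

If the second address keeps answering while the router address does not, the circuit itself is carrying traffic and attention shifts to the router.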


For the Juniper messages (ae0 : Warning: aggregated-ether-options link-speed no kernel value), you may want to try setting the LACP 'link-speed' on both the AE and the GE interfaces to get rid of that message.
