
BGP over L2 MPLS

Posted on 2014-12-09
Last Modified: 2014-12-13
Hello,

We are experiencing a weekly BGP disruption between two sites. We are using a Cisco device and a Juniper device. The topology looks like this:

MX80 (Junos 11.4R7.5) --- Arista DCS-7148S --- Cogent Ckt (L2 MPLS) --- Cisco 2960G-48TC-L --- ASA firewall --- Cisco 2800 (IOS 12.4(23))

The flap usually happens in the middle of the week, at around 10 PM to midnight, when the BGP peering goes into the Active state. I have contacted the provider, and they've confirmed that this was neither maintenance nor an outage on their end. Attached are the graphs from our PRTG and the provider's own graph covering the time of the outage.

Our graph was taken directly from the switch interface where the provider's circuit is connected. We clearly see a noticeable drop, yet they don't see anything at all and can't explain it. I can also confirm that there are NO backups or anything else that would saturate the link during those times. They've also confirmed that we peak at about 85 Mbit/s on this link, which is provisioned for 1 Gb, so I don't think it's a bandwidth issue. The only suggestion I've had from them is to call during the actual issue.

Has anyone encountered this issue before? Any input is highly appreciated.
provider.jpg
internal-prtg.jpg
Question by:FREDARCE
LVL 17

Expert Comment

by:pergr
ID: 40491096
Clearly you need to troubleshoot during an outage, and I suggest working up through the OSI layers:

1) Are the interfaces up or down?
2) Can you ping across?
3) Is the routing protocol up or down?
...

If you cannot ping from the MX80 to the c2800, you could also configure VLAN/IRB interfaces on the Arista and the 2960G and try to ping through the Cogent connection.
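The bottom-up checklist above can be sketched as a small Python helper that records what was observed during one outage and reports the lowest check that failed. The `Observation` fields and the check names are illustrative assumptions; the real checks are still your own interface, ping, and BGP-state commands run on the devices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What was seen during one outage window (hypothetical structure)."""
    interfaces_up: bool    # 1) are the interfaces up?
    ping_ok: bool          # 2) can you ping across to the far-end peer?
    bgp_established: bool  # 3) is the BGP session Established?


def first_failing_layer(obs: Observation) -> Optional[str]:
    """Walk the checks bottom-up and return the name of the first one that fails."""
    checks = [
        ("interfaces", obs.interfaces_up),
        ("ping", obs.ping_ok),
        ("bgp", obs.bgp_established),
    ]
    for name, ok in checks:
        if not ok:
            return name
    return None  # everything passed: the problem is not visible in these checks


# Symptoms reported in this thread: interfaces stay up, ping fails, BGP drops.
print(first_failing_layer(Observation(interfaces_up=True, ping_ok=False, bgp_established=False)))  # prints: ping
```

With the symptoms the author reports in this thread, the lowest failing check is ping, which points below BGP rather than at the routing protocol itself.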

Author Comment

by:FREDARCE
ID: 40491180
It's difficult to troubleshoot during the outage, as the timing is unpredictable and service is usually restored by the time we get a chance to look at it.
The interfaces never go down.
Our ping monitor reports that no ping packets get across the line during the outage.
BGP goes down.

On the Cisco side of the link, there are repeating log messages for:

bgp_cpu2timeout:

These messages do not appear at the exact time of the outage, but they do precede it.

On the Juniper side, there are entries for:

Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: aggregated-ether-options link-speed no kernel value! default to  0
Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae1 : Warning: aggregated-ether-options link-speed no kernel value! default to  0

Other forums suggest the Cisco bgp_cpu2timeout message is mostly cosmetic, so I'm not sure whether it is part of the cause.

Please advise.
LVL 28

Expert Comment

by:mikebernhardt
ID: 40491192
I'd suggest you set up some kind of syslog monitoring and make sure you are automatically paged/texted when those messages or BGP drop messages appear (make sure the devices are set up to send those syslog traps). You need to troubleshoot while it's happening.
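A minimal Python sketch of that filtering step, assuming the syslog lines are already being collected somewhere you can read them. The pattern list is an assumption: the first two entries come from the messages quoted in this thread, and `%BGP-5-ADJCHANGE` is the usual Cisco IOS neighbor-change message. The actual paging/texting hook is left out.

```python
import re
from typing import Iterable, List

# Assumed alert patterns: the first two are the messages quoted in this thread,
# the third is Cisco IOS's BGP neighbor-down message. Extend for your devices.
ALERT_PATTERNS = [
    re.compile(r"bgp_cpu2timeout", re.IGNORECASE),
    re.compile(r"link-speed no kernel value"),
    re.compile(r"%BGP-5-ADJCHANGE: neighbor \S+ Down"),
]


def lines_to_page(lines: Iterable[str]) -> List[str]:
    """Return only the syslog lines that match an alert pattern."""
    return [line for line in lines if any(p.search(line) for p in ALERT_PATTERNS)]


# 192.0.2.1 is a placeholder documentation address, not a real peer.
sample = [
    "Dec  8 11:43:52  nytprdrtr-wan1 dcd[53371]: ae0 : Warning: "
    "aggregated-ether-options link-speed no kernel value! default to  0",
    "%BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down BGP Notification sent",
    "%BGP-5-ADJCHANGE: neighbor 192.0.2.1 Up",
]
for line in lines_to_page(sample):
    print("PAGE:", line)  # replace with your paging/SMS hook
```

Matching only the "Down" transition keeps the pager quiet when the session recovers on its own, which matters for a short window near midnight.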

Author Comment

by:FREDARCE
ID: 40491209
We have syslog and traps in place. We just haven't been able to get to it in time, because it's only about a 5-minute window that mostly occurs close to midnight.
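That window length matters because of BGP's hold timer. Per RFC 4271, each speaker uses the smaller of its own and its peer's hold time, and tears the session down if nothing arrives from the peer before that timer expires. A rough Python check, assuming both ends still use their default hold times (180 s on Cisco IOS, 90 s on Junos), shows that a 5-minute blackout is far longer than the negotiated value, so the session dropping is expected once pings stop getting through.

```python
# Assumed default hold times (seconds); check your configs, they may differ.
CISCO_IOS_DEFAULT_HOLD = 180
JUNOS_DEFAULT_HOLD = 90


def negotiated_hold_time(local: int, remote: int) -> int:
    """RFC 4271: a speaker uses the smaller of its own and the peer's hold time."""
    return min(local, remote)


def blackout_guarantees_drop(blackout_s: int, hold_s: int) -> bool:
    """True if a total blackout this long must expire the hold timer.

    A hold time of 0 means keepalives are disabled, so the timer never expires.
    Shorter blackouts can still drop the session, depending on when the last
    keepalive arrived; this only checks the case where a drop is certain.
    """
    return hold_s > 0 and blackout_s >= hold_s


hold = negotiated_hold_time(CISCO_IOS_DEFAULT_HOLD, JUNOS_DEFAULT_HOLD)
print(hold, blackout_guarantees_drop(5 * 60, hold))  # prints: 90 True
```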
LVL 17

Accepted Solution

by:pergr
ID: 40491268
If you cannot ping the other side, it is not directly a BGP issue; either the link is down or the CPU of the other device is too busy to even respond to ping.

I suggest you set up at least two IP addresses on each side and ping those. That way you can figure out whether the link between the sites is up and it is possibly just the device running BGP whose CPU is too busy.
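The reasoning behind the two-address test can be written down as a small decision helper, sketched here in Python. The helper name, its return values, and the choice of second address (for example a VLAN interface on the 2960G) are hypothetical. Note that in this topology the ASA sits between the 2960G and the 2800, so "device" means the BGP router or anything in front of it.

```python
def diagnose(bgp_peer_answers: bool, second_address_answers: bool) -> str:
    """Interpret the 'two addresses per side' ping test (hypothetical helper).

    bgp_peer_answers: the far-end BGP router (e.g. the 2800) replies to ping.
    second_address_answers: a second far-end address on a different device
        (e.g. a VLAN interface on the 2960G) replies to ping.
    """
    if bgp_peer_answers:
        return "reachable"  # path is fine; look at BGP itself
    if second_address_answers:
        return "device"     # link is up; the BGP device (or something in front of it) is not answering
    return "link"           # nothing answers across the circuit


print(diagnose(bgp_peer_answers=False, second_address_answers=True))  # prints: device
```

If both addresses go silent together during the midnight window, that points at the circuit rather than at the BGP router's CPU.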


For the Juniper messages (ae0 : Warning: aggregated-ether-options link-speed no kernel value), you may want to try setting the LACP 'link-speed' on both the AE and the GE interfaces to get rid of that message.
