BFD test with OSPF

Dear colleagues,

I am trying to test the Bidirectional Forwarding Detection (BFD) protocol with the OSPF routing protocol on a Cisco 10720 router running Cisco IOS release 12.0(32)S14.  BFD has been configured as follows on both ends of the OSPF routing domain:

Step 1: enable BFD on each interface in the OSPF routing domain

bfd interval 150 min_rx 150 multiplier 3

The above command configures the BFD detect time to a maximum of 150 ms * 3, i.e. 450 milliseconds.
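
As a rough check of that arithmetic, here is a minimal Python sketch of how the detection time is derived in asynchronous mode according to RFC 5880 (section 6.8.4): the remote multiplier times the larger of the local receive interval and the remote transmit interval. The function and parameter names are illustrative only and do not come from IOS.

```python
def bfd_detect_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_multiplier):
    """Detection time in asynchronous mode (RFC 5880, section 6.8.4).

    The interval actually used is the slower of what the local side is
    willing to receive and what the remote side wants to send.
    """
    negotiated_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return remote_multiplier * negotiated_interval_ms


# Values from the interface command above, identical on both ends.
print(bfd_detect_time_ms(150, 150, 3))  # 450
```

If the two ends were configured differently, the slower interval would win, so the detect time seen by one router depends on its peer's settings as well as its own.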

Step 2: enable BFD on the OSPF routing process

router ospf 100
  bfd all-interfaces

The above command enables BFD on OSPF for all BFD-enabled interfaces in the OSPF domain.
-------------------------------------------------------
I used the "show ip ospf neigh" command to check that OSPF is fully operational. I also used "show bfd neigh" or "show bfd neigh details" to check that the BFD protocol is fully functional.

router#sh bfd neighbors details
Cleanup timer hits: 0

OurAddr       NeighAddr     LD/RD RH/RS Holddown(mult)  State     Int
10.0.0.3   10.0.0.1   1/1   Up       430  (3 )    Up        Gi2/1
Session state is UP and not using echo function.
Local Diag: 0, Demand mode: 0, Poll bit: 0
MinTxInt: 150000, MinRxInt: 150000, Multiplier: 3
Received MinRxInt: 150000, Received Multiplier: 3
Holddown (hits): 430(0), Hello (hits): 150(20992)
Rx Count: 20955, Rx Interval (ms) min/max/avg: 116/180/136 last: 20 ms ago
Tx Count: 20965, Tx Interval (ms) min/max/avg: 116/168/136 last: 92 ms ago
Elapsed time watermarks: -1 0 (last: 0)
Last packet: Version: 1            - Diagnostic: 0
             State bit: Up         - Demand bit: 0
             Poll bit: 0           - Final bit: 0
             Multiplier: 3         - Length: 24
             My Discr.: 1          - Your Discr.: 1
             Min tx interval: 150000    - Min rx interval: 150000
             Min Echo interval: 0
Registered protocols: OSPF
Uptime: 00:47:39
Pseudo pre-emptive process count: 351293 min/max/avg: 8/40/8 last: 0 ms ago
Interrupt send count: 20963 min/max/avg: 116/180/136 last: 140 ms ago
 Total Adjs Found: 1
 Total BFD sessions: 1
------------------------------
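
One detail in the output above that can look odd is that the average Tx interval (136 ms) is below the configured 150 ms. RFC 5880 (section 6.8.7) requires each transmit interval to be reduced by a random 0 to 25 percent of jitter, so this is probably expected. The sketch below, with an illustrative function name of my own, computes the window that the spacing should fall into.

```python
def expected_tx_window_ms(negotiated_tx_ms, detect_mult):
    """Return (lowest, highest) packet spacing allowed by RFC 5880 jitter.

    Each interval is reduced by a random 0-25 %. With a multiplier of 1
    the spacing must additionally stay at or below 90 % of the interval.
    """
    lowest = 0.75 * negotiated_tx_ms
    highest = (0.90 if detect_mult == 1 else 1.0) * negotiated_tx_ms
    return lowest, highest


low, high = expected_tx_window_ms(150, 3)
print(low, high)  # 112.5 150.0
```

The observed Tx minimum (116 ms) and average (136 ms) sit inside that window. The observed maximum (168 ms) is above it, which I would assume is local scheduling delay rather than a protocol setting, but that is a guess.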

I have also shut down the OSPF interface with "debug bfd event" turned on, but I cannot make sense of the output well enough to be sure that the configured detect time is working as expected.

*Apr 23 00:52:33.837: %OSPF-5-ADJCHG: Process 100, Nbr 1.1.1.1 on GigabitEthernet2/1 from FULL to DOWN, Neighbor Down: Interface down or detached
*Apr 23 00:52:33.837: bfdV1FSM e:6 s:3
*Apr 23 00:52:33.837: BFD: switching timestamps from 1
*Apr 23 00:52:33.837: Session [10.0.0.3,10.0.0.1,Gi2/1,3], event Session delete, state UP -> ADMIN DOWN
*Apr 23 00:52:33.837: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:33.849: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/1, changed state to down
*Apr 23 00:52:33.853: BFD: switching timestamps on the requested sessions.
*Apr 23 00:52:34.617: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:34.981: %SYS-5-CONFIG_I: Configured from console by console
*Apr 23 00:52:35.493: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:35.825: %LINK-5-CHANGED: Interface GigabitEthernet2/1, changed state to administratively down
*Apr 23 00:52:36.349: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:36.841: bfdV1FSM e:4 s:0
*Apr 23 00:52:36.841: BFD Adj Delinked: Enqueued for Delete for Neighbor 10.0.0.1
*Apr 23 00:52:58.321: %BGP-5-ADJCHANGE: neighbor 1.1.1.1 Down BGP Notification sent
*Apr 23 00:52:58.321: %BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired) 0 bytes

----------------------------------
I am sorry for the long post, but I am trying to provide enough information on the problem.  What I am looking for is a way of measuring whether the detect time works as configured from end to end. Am I using the correct debugging? If so, which events can be tracked to be sure that I am getting the correct detect time behaviour? The debugging timestamps are in milliseconds.
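
One possible way to read elapsed time out of millisecond timestamps like these is sketched below in Python. It parses the leading timestamp of two log lines and returns the difference in milliseconds. The helper names and the placeholder year are assumptions of this sketch (the log lines carry no year), and the parsing relies on English month abbreviations.

```python
from datetime import datetime, timedelta

# Assumption: the log lines carry no year, so a fixed placeholder is used.
# Deltas are only meaningful for events within the same year.
ASSUMED_YEAR = 2000


def parse_ios_timestamp(line):
    """Parse the leading '*Apr 23 00:52:33.837:' stamp of a log line."""
    stamp = line.lstrip("*").split(": ", 1)[0]
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S.%f")


def elapsed_ms(first_line, second_line):
    """Whole milliseconds between the timestamps of two log lines."""
    delta = parse_ios_timestamp(second_line) - parse_ios_timestamp(first_line)
    return delta // timedelta(milliseconds=1)


# Two lines copied from the debug output above.
session_delete = ("*Apr 23 00:52:33.837: Session [10.0.0.3,10.0.0.1,Gi2/1,3], "
                  "event Session delete, state UP -> ADMIN DOWN")
adj_delinked = ("*Apr 23 00:52:36.841: BFD Adj Delinked: "
                "Enqueued for Delete for Neighbor 10.0.0.1")

print(elapsed_ms(session_delete, adj_delinked))  # 3004
```

Note that on the router where the interface was shut down, the session goes from UP to ADMIN DOWN because of a local delete, so these two stamps do not measure detection. The same helper would have to be applied to the log of the peer router, comparing its BFD down event with the time of the shutdown.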

Thanks very much in advance.

Koudry

harbor235 commented:


BFD was developed to provide SONET-like sub-second failure detection and convergence times in a routing protocol. If your test includes only two routers and you shut down the link between them, then I would look at the following:

1) Make sure timing is synchronized on each router via the same time source
2) Add a loopback to R1 and add it to OSPF and the BFD config
3) Shut down the interface and note the time on R1 via the log and BFD events
4) On R2, view the BFD and OSPF events to verify the time it took to converge OSPF; in this case,
     look at how long it took to withdraw the R1 loopback network from OSPF

harbor235 ;}


harbor235 commented:
There is one adjacency and one BFD session

Did you shut down the interface between the two OSPF peers (g2/1)?

Need more info on your physical topology.

I would think you would want to look at the peer on the other side of the link you shut down to verify the times.

harbor235 ;}

koudry (author) commented:
Replying to harbor235:

The topology is as simple as follows:

CPE (RIP)------>C10720 Access router (OSPF)g2/1---------g2/1>C10720 Core router (OSPF)

BFD was enabled on the OSPF participating interfaces between the two OSPF domain routers, i.e. the access and core routers.  The BFD configuration is identical on both devices.

Yes, I did shut down the OSPF participating interface on the access router in order to trigger the BFD events.  The objective was to measure the BFD detect time which has been configured as a maximum of 450ms (i.e. 150ms * 3).

The only problem is that I don't know how to read the detect time from the millisecond timestamped events from the "debug bfd event" output when the interface was shut down.

As you said, the information I am looking for may be available on the remote end (core router). Will the information be available from one of the debugging commands, e.g. "debug bfd events" or "debug bfd packet"?

With the participating interface shut down on the access router, the session will be dead on the core router. Commands such as "sh bfd neigh" or "sh bfd neigh details" will show 0 sessions.

Thanks,

Koudry

chad_r commented:
It's been quite a while since I've done any iptables stuff, so was going to leave this but since nobody has attempted an answer, I figured I'd throw a couple thoughts out there.  Who knows, perhaps one will help.

Since it's working prior, after you enable it, I would run "iptables -L" and take a look at chains, and compare them to whatever you think should be happening to ensure they look ok.  You could post the output here as well.  

You could also try to flush the table which may allow traffic to start flowing again, unless your default policy is drop.  You can attempt this via "iptables -F"  You should still be able to reboot after this and restore your rules, but this might show if there is a particular rule causing the issue, and allow you to start playing with the rules without having to reboot every time.

chad_r commented:
er, oops.  Hehe, disregard my post please.  Not sure how I managed to post a response to the incorrect thread.  :-/

koudry (author) commented:
Replying to harbor235:
Both devices configured with BFD and OSPF already have loopback interfaces configured, and these loopback interfaces are part of the OSPF routing domain. So I think I am already doing some of what you suggest here.
One thing I am not clear on is how to make sure that time is synchronised on both routers. Are you referring to the clock configuration on the routers?
As you can see from the initial question, I turned on BFD event debugging before shutting down the WAN interface that links both devices in the OSPF routing domain. Do I also need to turn on further OSPF debugging in order to get more granularity on the behaviour of the BFD protocol?
I think BFD is working correctly. I just need to be able to measure the convergence time of OSPF with reasonable accuracy, as a way to determine whether BFD has shortened it.
Please advise further if you can.
Thanks.
Koudry
 
harbor235 commented:

>One thing I am not clear with, is how I make sure that time is synchronised on both routers. Are you referring to the clock configuration on the routers?

Yes, if the routers are operating with independent time sources, you will not be able to accurately record
the time it takes to converge. For the sake of your test, you can configure one of the routers as the clock master and the other will synchronize its time with the master; once synchronized, they will both have the same time within a small margin of error.

debug bfd event
debug ip ospf event

Here is a good doc;

http://www.cisco.com/en/US/technologies/tk648/tk365/tk480/technologies_white_paper0900aecd80244005_ps6599_Products_White_Paper.html

harbor235 ;}

koudry (author) commented:
Hello,
At the moment, the time / clock synchronisation is achieved using an NTP server as follows:
  • ntp source Loopback<n>
  • ntp server A.A.A.A   ---> address of alternative NTP server
  • ntp server B.B.B.B prefer    ---> address of preferred NTP server
So if I understand correctly, you are suggesting that I use one of the two devices as the NTP server, with the loopback address of the master as the preferred NTP server. Is that correct? I am going to make this change to the config and see what happens.
Thanks,
Koudry

harbor235 commented:

That's perfect. What we are trying to do is make sure that the time on one box is in sync with the other so we can properly quantify how long it takes to converge; otherwise the time difference between the two devices will add or subtract time, making the convergence number inaccurate.
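
To illustrate why the offset matters, here is a small Python sketch. All names and numbers in it are hypothetical and not taken from any router output; it only shows how a known clock offset between R1 and R2 would be removed from a cross-router measurement.

```python
def corrected_detect_ms(shutdown_ms_r1, down_event_ms_r2, r2_clock_offset_ms=0.0):
    """Elapsed time from the shutdown on R1 to the down event logged on R2.

    r2_clock_offset_ms is how far R2's clock runs ahead of R1's
    (negative if R2 is behind). Both event times are in milliseconds
    on each router's own clock.
    """
    return (down_event_ms_r2 - r2_clock_offset_ms) - shutdown_ms_r1


# Hypothetical values: R2's clock is 200 ms ahead of R1's.
raw = corrected_detect_ms(1000, 1600)        # offset ignored
true = corrected_detect_ms(1000, 1600, 200)  # offset removed
print(raw, true)  # 600.0 400.0
```

In this made-up case the uncorrected figure (600 ms) would look like a miss against a 450 ms detect time, while the corrected figure (400 ms) is within it. With sub-second targets, even a small offset between the two clocks can change the conclusion.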

harbor235 ;}

koudry (author) commented:
Hello harbor235:
Thanks for your assistance. I know this problem is not about NTP, but I would like to resolve the NTP clock synchronisation issue before I go back to checking the behaviour of BFD.
I have used the Loopback<n> of one of the devices as the preferred NTP server on the second device as follows:
On the access router:
ntp source Loopback10
ntp server <loopback10 IP of core router> prefer
The question is: how do I configure NTP on the core router? Do I use Loopback 10 of the access router as the NTP server for the core router, or do I use the Loopback 10 IP address of the core router itself? Alternatively, do I have to use a totally different address?
The reason for the question is that I am getting a "Clock is unsynchronized" message on the access router when I do "sh ntp status".
"sh ntp associations" confirms that the master NTP server is the core router:

sh ntp associations
      address         ref clock     st  when  poll reach  delay  offset    disp
 ~<IP of master>    0.0.0.0          16     7    64    0     0.0    0.00  16000.
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured
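
For reading a row like the one above, a rough Python check is sketched below. It assumes the column order shown in the header (address, ref clock, st, when, poll, reach, delay, offset, disp) and an address without spaces, so 192.0.2.1 is used as a stand-in for the master's IP. The function name is made up for this sketch.

```python
def association_is_synced(row):
    """Rough check of one 'show ntp associations' data row.

    A usable association is flagged '*', has a stratum below 16
    (16 means unsynchronized) and a non-zero reach value. Reach is an
    octal register of the last eight polls, so 0 means no replies.
    """
    fields = row.split()
    flag = fields[0][0]
    stratum = int(fields[2])
    reach = int(fields[5], 8)
    return flag == "*" and stratum < 16 and reach > 0


# Numbers from the row above, with a placeholder address.
row = "~192.0.2.1 0.0.0.0 16 7 64 0 0.0 0.00 16000."
print(association_is_synced(row))  # False
```

Stratum 16 together with a reach of 0 suggests the access router has not received any NTP replies from the configured server yet, which would be consistent with the "Clock is unsynchronized" status.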
 
Thanks.
Koudry

harbor235 commented:

So I assume we will make the core router the NTP master:

core
config t
ntp master
exit
Verify that NTP has started on the correct loopback with "show ip sockets".

This should get things going. All clients should use the core IP in their "ntp server" commands.

harbor235 ;}

koudry (author) commented:
Hello,
The core is indeed the master NTP server. It has been configured as follows:
ntp clock-period 17179896
ntp source Loopback10
ntp master 3
ntp peer <Loopback IP of access router, i.e. NTP client>
ntp server <Loopback 10 address of the master>

The problem here is that I am in a lab so I don't have access to the Internet to synchronise the time with an external NTP server from the master point of view. So I am using the Loopback 10 IP address of the master as its own NTP server. This may not be correct.
I am also configuring the peer to point to the NTP client, in this case the access router. Here again, I am not sure if that is correct.
The access router is configured as follows:
ntp source Loopback10
ntp server <Loopback 10 address of the master>

In terms of operations, i.e. checking with "show ip sockets", I can see that the UDP port 123 is open as follows:
On the NTP master:
Proto    Remote      Port      Local       Port  In Out Stat TTY OutputIF
 17   --listen--          XX.XX.XX.XX      123   0   0    1   0

where XX.XX.XX.XX is the Loopback 10 IP address of the master.
On the NTP client (access router):

Proto    Remote      Port      Local       Port  In Out Stat TTY OutputIF
 17   --listen--          XX.XX.XX.XX       123   0   0    1   0

 
where XX.XX.XX.XX is the Loopback 10 IP address of the client.
I think I am not getting synchronisation because maybe the master is not connected to the outside world, but I don't know enough about NTP to troubleshoot.
Thanks,
Koudry

harbor235 commented:


Yeah, you don't need to do that. It does not really matter whether the time is correct; what matters is that they all have the same time. Remember, with BFD the convergence times are potentially sub-second.

The time is the time on the core; whatever it is, it's fine. All you need to do is sync the clients.

As far as troubleshooting goes, all you need to do is make sure the loopbacks are routable for all systems, e.g. clients can get to the core Loopback 10 and the core can get to the clients. You specified Loopback 10 on them as well to source their NTP traffic; there is no need to do that, but it will work as long as the client loopbacks are reachable as well.

harbor235 ;}

koudry (author) commented:
Hi harbor235:
Thanks for this. I am switching between tasks, so I will be a bit slow in replying. I will provide further updates later.

koudry (author) commented:
To harbor235:

Thanks for your assistance on this question. I have not been able to do further tests on this subject, but I am hoping to get back to it at some point. However, I do not wish to keep the question open forever. I think the things to take away from here, as far as testing BFD is concerned, are as follows:

(1) Configure BFD on the IGP (e.g. OSPF)

router ospf 100
  bfd all-interfaces

(2) Configure BFD on the individual interfaces that belong to the IGP routing domain


interface GigabitEthernetx/x
  bfd interval 150 min_rx 150 multiplier 3

NB: these figures are configurable. Here the BFD detect time is configured at 450 milliseconds, i.e. 150*3

(3) Use "sh bfd neigh" or "sh bfd neigh detail" command to verify the operational states of BFD

(4) "debug bfd event" may also help, as may debugging the IGP (e.g. OSPF) events.

Since we are dealing with milliseconds, it may help to configure the debug and log timestamps with millisecond resolution as follows:

service timestamps debug datetime msec
service timestamps log datetime msec


(5) A dependent feature is NTP: there is a need to make sure that time is synchronised on the BFD participating devices in order to read the time of the events correctly from end to end.

This is not a final answer to this question, so I welcome feedback from members.

Thanks.

Koudry

harbor235 commented:
Koudry,

Feel free to email me on my outside email if you have future questions.

harbor235 ;}
Question has a verified solution.
