BFD test with OSPF

Dear colleagues,

I am trying to test the Bidirectional Forwarding Detection (BFD) protocol with the OSPF routing protocol on a Cisco 10720 router running Cisco IOS release 12.0(32)S14. BFD has been configured as follows on both ends of the OSPF routing domain:

Step 1: enable BFD on each interface in the OSPF routing domain

bfd interval 150 min_rx 150 multiplier 3

The above command configures the BFD detect time to a maximum of 150 ms * 3, i.e. 450 milliseconds.
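
For reference, a rough sketch of the arithmetic behind that figure, in Python. It assumes the asynchronous-mode rule from RFC 5880 section 6.8.4: the detection time is the peer's multiplier times the larger of the local min_rx and the peer's advertised minimum transmit interval. The function name is made up for illustration and is not part of any Cisco tool.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_multiplier):
    """Asynchronous-mode detection time as described in RFC 5880, section 6.8.4."""
    # The peer sends no faster than the slower of "what it wants to send"
    # and "what we are willing to receive".
    agreed_remote_tx_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return remote_multiplier * agreed_remote_tx_ms

# Values from the configuration above: 150 ms both ways, multiplier 3.
print(bfd_detection_time_ms(150, 150, 3))  # 450
```

With identical settings on both ends the result is simply interval * multiplier. If one side asked for a longer interval, the larger value would win.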

Step 2: enable BFD on the OSPF routing process

router ospf 100
bfd all-interfaces

The above command enables BFD on OSPF for all BFD enabled interfaces in the OSPF domain.
-------------------------------------------------------
I used the "show ip ospf neigh" command to check that OSPF is fully operational. I also used "show bfd neigh" and "show bfd neigh details" to check that the BFD protocol is fully functional.

router#sh bfd neighbors details
Cleanup timer hits: 0

OurAddr NeighAddr LD/RD RH/RS Holddown(mult) State Int
10.0.0.3 10.0.0.1 1/1 Up 430 (3 ) Up Gi2/1
Session state is UP and not using echo function.
Local Diag: 0, Demand mode: 0, Poll bit: 0
MinTxInt: 150000, MinRxInt: 150000, Multiplier: 3
Received MinRxInt: 150000, Received Multiplier: 3
Holddown (hits): 430(0), Hello (hits): 150(20992)
Rx Count: 20955, Rx Interval (ms) min/max/avg: 116/180/136 last: 20 ms ago
Tx Count: 20965, Tx Interval (ms) min/max/avg: 116/168/136 last: 92 ms ago
Elapsed time watermarks: -1 0 (last: 0)
Last packet: Version: 1 - Diagnostic: 0
State bit: Up - Demand bit: 0
Poll bit: 0 - Final bit: 0
Multiplier: 3 - Length: 24
My Discr.: 1 - Your Discr.: 1
Min tx interval: 150000 - Min rx interval: 150000
Min Echo interval: 0
Registered protocols: OSPF
Uptime: 00:47:39
Pseudo pre-emptive process count: 351293 min/max/avg: 8/40/8 last: 0 ms ago
Interrupt send count: 20963 min/max/avg: 116/180/136 last: 140 ms ago
Total Adjs Found: 1
Total BFD sessions: 1
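
A side note on the Tx/Rx interval averages of 136 ms in the output above, which are lower than the configured 150 ms. RFC 5880 section 6.8.7 requires each transmit interval to be shortened by a random 0 to 25 percent, so an average below the configured value is expected. The Python sketch below only computes the range that rule allows; the function name is an assumption for illustration. The occasional longer gaps (max 168 and 180 ms) are presumably scheduling delay on the router, which is a guess rather than something the output proves.

```python
def jittered_tx_range_ms(interval_ms, multiplier):
    """Return (shortest, longest) packet spacing allowed by the RFC 5880 jitter rule."""
    # Each interval is reduced by a random 0-25 %.
    # With a multiplier of 1 the reduction must be at least 10 %.
    longest = interval_ms * (0.90 if multiplier == 1 else 1.00)
    shortest = interval_ms * 0.75
    return shortest, longest

low, high = jittered_tx_range_ms(150, 3)
print(low, high)  # 112.5 150.0
```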
------------------------------

I have also shut down the OSPF interface with "debug bfd event" turned on, but I cannot make enough sense of the output to be sure that the configured detect time is working as expected.

*Apr 23 00:52:33.837: %OSPF-5-ADJCHG: Process 100, Nbr 1.1.1.1 on GigabitEthernet2/1 from FULL to DOWN, Neighbor Down: Interface down or detached
*Apr 23 00:52:33.837: bfdV1FSM e:6 s:3
*Apr 23 00:52:33.837: BFD: switching timestamps from 1
*Apr 23 00:52:33.837: Session [10.0.0.3,10.0.0.1,Gi2/1,3], event Session delete, state UP -> ADMIN DOWN
*Apr 23 00:52:33.837: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:33.849: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/1, changed state to down
*Apr 23 00:52:33.853: BFD: switching timestamps on the requested sessions.
*Apr 23 00:52:34.617: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:34.981: %SYS-5-CONFIG_I: Configured from console by console
*Apr 23 00:52:35.493: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:35.825: %LINK-5-CHANGED: Interface GigabitEthernet2/1, changed state to administratively down
*Apr 23 00:52:36.349: BFD:* Could not send bfd packet to 10.0.0.1. result 9
*Apr 23 00:52:36.841: bfdV1FSM e:4 s:0
*Apr 23 00:52:36.841: BFD Adj Delinked: Enqueued for Delete for Neighbor 10.0.0.1
*Apr 23 00:52:58.321: %BGP-5-ADJCHANGE: neighbor 1.1.1.1 Down BGP Notification sent
*Apr 23 00:52:58.321: %BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired) 0 bytes

----------------------------------
I am sorry for the long post but I am trying to provide enough information on the problem. What I am looking for is a way of measuring, from end to end, that the detect time is working as configured. Am I using the correct debugging? If so, which events can be tracked to be sure that I am getting the correct detect time behaviour? The debugging timestamps are in milliseconds.
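
One way to read intervals off millisecond-stamped debug lines is to parse the timestamp prefix and subtract. The Python sketch below does that for two lines copied from the output above. The helper names are hypothetical, and the year is an arbitrary placeholder because the debug lines do not carry one. Note that both sample lines come from the router where the interface was shut, so the result is only the local teardown time; to estimate the detect time, the second line would need to be the session-down event logged by the peer router.

```python
import re
from datetime import datetime

# Matches the "*Apr 23 00:52:33.837:" prefix of an IOS debug line.
STAMP = re.compile(r"^\*?(\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}):")

def stamp_of(line, year=2000):
    """Extract the timestamp prefix of a debug line as a datetime."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("no timestamp in line: " + line)
    # The debug lines omit the year, so an arbitrary one is supplied.
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S.%f")

def elapsed_ms(first_line, second_line):
    """Milliseconds between two timestamped debug lines."""
    delta = stamp_of(second_line) - stamp_of(first_line)
    return round(delta.total_seconds() * 1000)

session_delete = ("*Apr 23 00:52:33.837: Session [10.0.0.3,10.0.0.1,Gi2/1,3], "
                  "event Session delete, state UP -> ADMIN DOWN")
delinked = "*Apr 23 00:52:36.841: BFD Adj Delinked: Enqueued for Delete for Neighbor 10.0.0.1"

print(elapsed_ms(session_delete, delinked))  # 3004
```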

Thanks very much in advance.

Koudry

harbor235

There is one adjacency and one BFD session

Did you shut down the interface between the two OSPF peers? g2/1 ?

Need more info on your physical topology.

I would think that you would want to look at the peer on the other side of the link you shut down to verify the times.

harbor235 ;}

koudry

ASKER

Replying to harbor235:

The topology is as simple as follows:

CPE (RIP)------>C10720 Access router (OSPF)g2/1---------g2/1>C10720 Core router (OSPF)

BFD was enabled on the OSPF participating interfaces between the two OSPF domain routers, i.e. the access and core routers. The BFD configuration is identical on both devices.

Yes, I did shut down the OSPF participating interface on the access router in order to trigger the BFD events. The objective was to measure the BFD detect time which has been configured as a maximum of 450ms (i.e. 150ms * 3).

The only problem is that I don't know how to read the detect time from the millisecond timestamped events from the "debug bfd event" output when the interface was shut down.

As you said, the information I am looking for may be available on the remote end (core router). Will the information be available from one of the debugging commands, e.g. "debug bfd events" or "debug bfd packet"?

With the participating interface shut down on the access router, the session will be dead on the core router. Commands such as "sh bfd neigh" or "sh bfd neigh details" will show 0 sessions.

Thanks,

Koudry

chad_r

It's been quite a while since I've done any iptables stuff, so was going to leave this but since nobody has attempted an answer, I figured I'd throw a couple thoughts out there. Who knows, perhaps one will help.

Since it's working prior, after you enable it, I would run "iptables -L" and take a look at chains, and compare them to whatever you think should be happening to ensure they look ok. You could post the output here as well.

You could also try to flush the table which may allow traffic to start flowing again, unless your default policy is drop. You can attempt this via "iptables -F" You should still be able to reboot after this and restore your rules, but this might show if there is a particular rule causing the issue, and allow you to start playing with the rules without having to reboot every time.

ASKER CERTIFIED SOLUTION

harbor235

This solution is only available to members.


chad_r

er, oops. Hehe, disregard my post please. Not sure how I managed to post a response to the incorrect thread. :-/

koudry

ASKER

Replying to harbor235:
Both devices configured with BFD and OSPF already have a loopback interface configured on them, and these loopback interfaces are part of the OSPF routing domain. So I think I am already doing some of what you have suggested here.
One thing I am not clear about is how to make sure that time is synchronised on both routers. Are you referring to the clock configuration on the routers?
As you can see from the initial question, I turned on BFD event debugging before shutting down the WAN interface that links the two devices in the OSPF routing domain. Do I also need to turn on further OSPF debugging to get more granularity on the behaviour of the BFD protocol?
I think BFD is working correctly. I just need to measure the OSPF convergence time with reasonable accuracy, to determine whether BFD has shortened it.
Please advise further if you can.
Thanks.
Koudry

harbor235

>One thing I am not clear with, is how I make sure that time is synchronised on both routers. Are you referring to the clock configuration on the routers?

Yes, if the routers are operating with independent time sources you will not be able to accurately record the time it takes to converge. For the sake of your test you can configure one of the routers as the clock master and the other will synchronize its time with the master. Once synchronized, they will both have the same time within a small margin of error.

debug bfd event
debug ip ospf event

Here is a good doc;

http://www.cisco.com/en/US/technologies/tk648/tk365/tk480/technologies_white_paper0900aecd80244005_ps6599_Products_White_Paper.html

harbor235 ;}

koudry

ASKER

Hello,
At the moment, the time / clock synchronisation is achieved using an NTP server as follows:

ntp source Loopback<n>
ntp server A.A.A.A ---> address of alternative NTP server
ntp server B.B.B.B prefer ---> address of preferred NTP server

So if I understand, you are suggesting to use one of the two kits as the NTP server using the loopback address of the master as the preferred NTP server. Is that correct? I am going to make this change to the config to see what happens.
Thanks,
Koudry

harbor235

That's perfect. What we are trying to do is the following: make sure that the time on one box is in sync with the other so we can properly quantify how long it takes to converge. Otherwise the time difference between the two devices will add to or subtract from the measurement, making the convergence number inaccurate.
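
To illustrate why the offset matters at these time scales, here is a small Python sketch. All numbers are hypothetical and the function name is made up; it only shows that an uncorrected clock offset between the two routers lands directly in the measured figure.

```python
def measured_detect_ms(shutdown_ms, peer_down_ms, peer_clock_offset_ms=0):
    """Detection time seen across two routers.

    shutdown_ms:          when the interface was shut, by the local clock
    peer_down_ms:         when the peer logged the session down, by its own clock
    peer_clock_offset_ms: how far the peer clock runs ahead of the local one
    """
    return (peer_down_ms - peer_clock_offset_ms) - shutdown_ms

# Hypothetical case: the peer really reacts 400 ms after the shutdown.
print(measured_detect_ms(1000, 1400))        # 400 (clocks in sync)
# Same event, but the peer clock runs 250 ms ahead, so it logs 1650.
print(measured_detect_ms(1000, 1650))        # 650 (offset ignored)
print(measured_detect_ms(1000, 1650, 250))   # 400 (offset corrected)
```

With a 450 ms detect time, an offset of a few hundred milliseconds is enough to make the result meaningless, which is why the clocks need to agree before comparing logs.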

harbor235 ;}

koudry

ASKER

Hello harbor235:
Thanks for your assistance. I know this problem is not about NTP, but I would like to resolve the NTP clock synchronisation issue before I move back to checking the behaviour of BFD.
I have used the Loopback<n> of one of the devices as the preferred NTP server on the second device as follows:
On the access router:
ntp source Loopback10
ntp server <loopback10 IP of core router> prefer
The question is: how do I configure NTP on the core router? Do I use the Loopback 10 of the access router as the NTP server for the core router, or do I use the Loopback 10 IP address of the core router itself? Alternatively, do I have to use a totally different address?
The reason for the question is that I am getting a "Clock is unsynchronized" message on the access router when I do "sh ntp status".
"sh ntp associations" did confirm that the master ntp server is the core router:

sh ntp associations
address ref clock st when poll reach delay offset disp
~<IP of master> 0.0.0.0 16 7 64 0 0.0 0.00 16000.
* master (synced), # master (unsynced), + selected, - candidate, ~ configured
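
For what it is worth, the row above already says quite a lot: a stratum ("st") of 16 means the server is not considered synchronized, and a "reach" of 0 means no replies have been received in the last eight polls (reach is an 8-bit history displayed in octal). A rough Python sketch of that reading follows. The address 10.10.10.1 is a made-up stand-in for the elided master IP, and the function name is hypothetical.

```python
def describe_association(row):
    """Interpret one row of 'sh ntp associations' (columns as in the header above)."""
    fields = row.split()
    # Columns: address, ref clock, st, when, poll, reach, delay, offset, disp
    stratum = int(fields[2])
    reach = int(fields[5], 8)  # reach is an 8-bit poll history shown in octal
    return {
        "stratum_usable": stratum < 16,  # 16 means unsynchronized
        "replies_in_last_8_polls": bin(reach).count("1"),
    }

# 10.10.10.1 is a placeholder address, not the real one from the lab.
row = "~10.10.10.1 0.0.0.0 16 7 64 0 0.0 0.00 16000."
print(describe_association(row))
# {'stratum_usable': False, 'replies_in_last_8_polls': 0}
```

A healthy association would typically show a stratum below 16 and a reach of 377, i.e. eight replies out of eight polls.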

Thanks.
Koudry

harbor235

So I assume we will make the core router the ntp master, so;

core
config t
ntp master
exit
Verify that NTP has started up on the correct loopback with "show ip sockets".

This should get things going. All clients should use the core IP in their "ntp server" commands.

harbor235 ;}

koudry

ASKER

Hello,
The core is indeed the master NTP server. It has been configured as follows:
ntp clock-period 17179896
ntp source Loopback10
ntp master 3
ntp peer <Loopback IP of access router, i.e. NTP client>
ntp server <Loopback 10 address of the master>

The problem here is that I am in a lab so I don't have access to the Internet to synchronise the time with an external NTP server from the master point of view. So I am using the Loopback 10 IP address of the master as its own NTP server. This may not be correct.
I am also configuring the peer to point to the NTP client, in this case the access router. Here again, I am not sure if that is correct.
The access router is configured as follows:
ntp source Loopback10
ntp server <Loopback 10 address of the master>

In terms of operations, i.e. checking with "show ip sockets", I can see that the UDP port 123 is open as follows:
On the NTP master:
Proto Remote Port Local Port In Out Stat TTY OutputIF
17 --listen-- XX.XX.XX.XX 123 0 0 1 0

where XX.XX.XX.XX is the Loopback 10 IP address of the master.
On the NTP client (access router):

Proto Remote Port Local Port In Out Stat TTY OutputIF
17 --listen-- XX.XX.XX.XX 123 0 0 1 0

where XX.XX.XX.XX is the Loopback 10 IP address of the client.
I think I am not getting synchronisation because maybe the master is not connected to the outside world, but I don't know enough about NTP to troubleshoot.
Thanks,
Koudry

harbor235

Yeah, you don't need to do that. It does not really matter that the time is correct; what matters is that they all have the same time. Remember, with BFD the convergence times are potentially sub-second.

The time is the time on the core. Whatever it is, it's fine; all you need to do is sync the clients.

As far as troubleshooting goes, all you need to do is make sure the loopbacks are routable for all systems, e.g. clients can get to the core's Loopback 10 and the core can get to the clients. You specified Loopback 10 on the clients as well to source their NTP traffic. There is no need to do that, but it will work as long as the client loopbacks are reachable as well.

harbor235 ;}

koudry

ASKER

Hi harbor235:
Thanks for this. I am switching between tasks, so I will be a bit slow replying. I will provide further updates later.

koudry

ASKER

To harbor235:

Thanks for your assistance on this question. I have not been able to do further tests on this subject, but I am hoping to get back to it at some point. However, I do not wish to keep the question open forever. I think the things to take away from here, as far as testing BFD is concerned, are as follows:

(1) Configure BFD on the IGP (e.g. OSPF)

router ospf 100
bfd all-interfaces

(2) Configure BFD on the individual interfaces that belong to the IGP routing domain

interface GigabitEthernetx/x
bfd interval 150 min_rx 150 multiplier 3

NB: these figures are configurable. Here the BFD detect time is configured at 450 milliseconds, i.e. 150*3

(3) Use "sh bfd neigh" or "sh bfd neigh detail" command to verify the operational states of BFD

(4) "debug bfd event" may also help. Debugging the IGP events (e.g. "debug ip ospf event") may also help.

Since we are dealing with milliseconds, it may help to configure the debugging timestamps to millisecond resolution as follows:

service timestamps debug datetime msec
service timestamps log datetime msec

(5) A dependent feature is NTP: there is a need to make sure that time is synchronised on the BFD participating devices in order to read the time of the events correctly from end to end.

This is not a final answer to this question, so I welcome feedback from members.

Thanks.

Koudry

harbor235

Koudry,

Feel free to email me on my outside email if you have future questions.

harbor235 ;}