Redundant WAN links

I am setting up a short-haul WAN IP connection between two buildings. I have a 100Mbps wireless infrared laser system as my primary link, and a leased T1 as my backup link. I plan to use Linux boxes as routers on both ends. Both the T1 and laser connections will be back-to-back between the two Linux boxes.

We expect that the laser system will go down on a regular basis, for example during heavy rain storms. I want to be able to revert to the T1 with no interruption to client machines and with the absolute minimum possible number of lost packets. The T1 is always up.

I know that this should be doable with routed, gated, etc., but I don't really know how they work. I know the basics of IP routing; the piece I'm missing is how to set up routing for two links so that packets travel over whichever one is running.

Can someone tell me how this is all supposed to work?
1 Solution
I suggest you use static routes between the two Linux boxes.

If you only use static routes, just use the "route add" command to add a new route and "route del" to remove the original route. You will also need to change the static routing tables on the machines in both networks.

If you use RIP in the two networks, you need to write a gated.conf that broadcasts the RIP information to all the other boxes in both networks. /etc/gated.conf.sample is a good starting point.
ghjm (Author) Commented:
Thanks for your comment. Of course, I already know I could use static routes and change them manually when the connection goes down. What I am looking for is someone with practical experience in dynamic routing to help me set expectations and plan what kind of configuration to use.
As much as it breaks my heart to admit it, Linux is _not_ the answer here.  In the latest and greatest kernel release there is experimental support for the kinds of things you will need to make this work.  If you are supporting a mission critical environment (why else use redundant links?) you need something a little more robust than experimental Linux support.

I strongly suggest that you consider a pair of commercial routers that support OSPF.  Both Bay and Cisco make such routers.

If you really want redundancy, purchase 4 Cisco 2500 series routers and configure a pair at each end of the link with HSRP (Hot-Standby Router Protocol).  HSRP boasts a failover time of as little as three seconds.  Best of all, the clients never know that one or other piece of hardware failed.
ghjm (Author) Commented:
Thanks for the suggestion. Yes, I have a mission critical environment, but there are other factors at work. The setup is for a temporary connection while we transfer labs from one facility to another. I don't want to invest $10,000 or more in Cisco routers, particularly since I will have no use for them when the move is complete.

I have actually got something to work with a failover time of about 20 seconds, which is adequate. It is a bit of a hack job, though, so if someone who knows gated in detail happens to show up, I'd still like to know how to configure what I've described. I am baffled by gated.
I wouldn't hesitate to recommend Linux boxes for this routing task. And you will not be the first in this area ...
I am not sure about the T1 interfaces; I presume you have high speed serial cards.
As I understand it, both interfaces, the 100 Mbps one to your laser and the T1, are (supposed to be) connected to one Linux box, and at the opposite side you have another Linux box terminating these two lines.
If this is the situation, why use a routing protocol? Just add static routes, and set a higher preference for your faster line and a lower preference for the T1. As these are directly attached interfaces, you should get immediate failover at the moment the kernel detects that the laser is down. And the recovery should be instantaneous.
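To make the idea concrete, here is a rough Python sketch of the selection rule I have in mind: among the routes to a destination whose interface is up, the one with the lowest metric wins. This is only an illustration of the logic, not kernel code; the network 10.2.0.0/16, the interface names eth1 and hdlc0, and the metric values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Route:
    dest: str    # destination network (hypothetical addresses)
    iface: str   # outgoing interface name (hypothetical)
    metric: int  # lower metric means higher preference


def pick_route(routes, dest, link_up):
    """Return the lowest-metric route to dest whose interface is up, or None."""
    candidates = [
        r for r in routes
        if r.dest == dest and link_up.get(r.iface, False)
    ]
    return min(candidates, key=lambda r: r.metric, default=None)


# Two static routes to the far-side network: laser preferred, T1 as backup.
routes = [
    Route("10.2.0.0/16", "eth1", 1),    # laser link (assumed name)
    Route("10.2.0.0/16", "hdlc0", 10),  # T1 link (assumed name)
]

both_up = pick_route(routes, "10.2.0.0/16", {"eth1": True, "hdlc0": True})
laser_down = pick_route(routes, "10.2.0.0/16", {"eth1": False, "hdlc0": True})
```

With both links up the laser route is chosen; once the laser interface is marked down, the same lookup falls through to the T1 route. Whether a given kernel actually honours the metric this way is exactly what you would need to verify.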
If you insist on routing protocols, make sure you understand the difference between link-state and distance-vector routing protocols and know what to expect from them.

OSPF, as was suggested, provides near-instantaneous failover. It consumes more memory and CPU power, but this should not be an issue with your two Linux boxes and two lines. On UNIX machines, gated supports OSPF.

RIP (and its updated variant RIP2) is a distance-vector protocol, and by definition (RFC 1058) you may have to wait up to 30 sec before updates are propagated, unless the router (or the routing process on a UNIX box) supports triggered updates and split horizon with poisoned reverse. Also, if a router suddenly goes down, the timeout is 6 times 30 sec, i.e. 180 sec. There is a routed on every UNIX (and Linux), and it is much easier to configure than gated.
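The timer arithmetic above can be written out as a small Python sketch. The 30-second update interval and the factor of 6 are the figures from RFC 1058 quoted above; the function names are just mine for illustration.

```python
UPDATE_INTERVAL = 30      # seconds between periodic RIP updates (RFC 1058)
TIMEOUT_MULTIPLIER = 6    # a route expires after 6 missed update intervals


def route_timeout(update_interval=UPDATE_INTERVAL,
                  multiplier=TIMEOUT_MULTIPLIER):
    """Seconds of silence after which a learned route is declared dead."""
    return update_interval * multiplier


def seconds_until_expiry(seconds_since_last_update):
    """Remaining lifetime of a route, given how long ago it was refreshed."""
    return max(0, route_timeout() - seconds_since_last_update)


worst_case_update_wait = UPDATE_INTERVAL   # wait for the next periodic update
dead_router_timeout = route_timeout()      # neighbour vanished without a word
```

So a silent neighbour costs up to 180 seconds, while a change that the neighbour can still announce costs at most one 30-second update interval.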

But again, in your simple configuration, RIP's timeouts should not worry you. Your Linux boxes are directly attached and the routing process will be immediately notified (and should switch to the operating link).

Routed shouldn't take you more than 15 minutes (including reading the man page).
ghjm (Author) Commented:
I don't see any way to set a preference on a route. The man page for route says that "metric" is not used by recent kernels. How would you go about doing this?
The kernel I have, 2.0.31, is possibly out of date. I see something like this in the ifconfig man page, but not a word in the route man page. I'll take your word for it, and here is what I would do:

1) If the interface metric is not taken into account, this means you are not able to do load balancing between those two Linux boxes.

2) If the metric of each entry in the routing table is not taken into account while routing IP packets, then Linux wouldn't be able to perform useful routing functions. I don't believe this is the case.

There is one more thing to consider: you are not interested in load balancing between the Linux boxes for traffic that originates and terminates at these two machines. You only want to direct the transit traffic through the better link. You may still want to try 'route add ...' with different metrics to see the result.

If this is not workable, then a routing protocol will do the job. Here I'll repeat that RIP should do the job as well as OSPF, but it will be easier to configure. The trick with RIP is (you can read this in the man page) that it keeps only the best route for each destination it is aware of. Going back to the metric: even if the interface metric is not used for load balancing, it may be distributed in the routing updates, or you can use /etc/gateways to assign a metric to those two destinations, the laser and T1 interfaces of the opposite Linux box. This way, the updates that come through the laser will have higher preference than those received via the T1. No matter which one is received first, after 30 seconds only the laser interface should be in the routing table.

If the laser goes down, after no more than 30 sec the neighbouring Linux box will send a RIP update message through the T1, and you will have a route again. When the laser is back in service, the neighbour will send an update announcing that a new route is available, and the routing table will be updated accordingly.
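Here is a toy Python model of that behaviour, just to show the sequence of events. It is a simplification under my own assumptions, not how routed is actually implemented: the table keeps one route per destination, a better metric replaces the current route, and a route is dropped when its interface goes down. The class name, the destination 10.2.0.0/16, and the metric values are invented for the example.

```python
class RipTable:
    """Toy model: at most one (metric, interface) entry per destination."""

    def __init__(self):
        self.routes = {}  # dest -> (metric, iface)

    def receive_update(self, dest, iface, metric):
        current = self.routes.get(dest)
        # Install if there is no route yet, if the new one is better,
        # or if it refreshes the route we are already using.
        if current is None or metric < current[0] or iface == current[1]:
            self.routes[dest] = (metric, iface)

    def interface_down(self, iface):
        # Routes through a failed, directly attached interface are removed.
        self.routes = {d: r for d, r in self.routes.items() if r[1] != iface}

    def via(self, dest):
        route = self.routes.get(dest)
        return route[1] if route else None


DEST = "10.2.0.0/16"  # hypothetical far-side network
table = RipTable()
history = []

table.receive_update(DEST, "t1", 5)      # T1 update happens to arrive first
history.append(table.via(DEST))
table.receive_update(DEST, "laser", 1)   # laser update has the better metric
history.append(table.via(DEST))
table.receive_update(DEST, "t1", 5)      # later T1 updates are ignored
history.append(table.via(DEST))
table.interface_down("laser")            # rain storm
history.append(table.via(DEST))
table.receive_update(DEST, "t1", 5)      # next periodic update, within 30 sec
history.append(table.via(DEST))
table.receive_update(DEST, "laser", 1)   # laser is back in service
history.append(table.via(DEST))
```

The gap between the laser going down and the next T1 update arriving is the up-to-30-second window mentioned above; the rest of the time the table points at whichever live link has the better metric.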