asked on

Linux Wan Based Failover

Ok,

I've been running into a problem. I need to setup a failover system across multiple public ip address. I know this requires dns changes. Since i knew that from the beginning of this project I ensured the domain we use was registered with a Dynamic DNS Provider. Now our service needs to provide a very reliable up time so the server themselves are sitting in different data centers on opposite sides of the country.

Now last I knew Linux-HA did not support WAN and I need a heartbeat monitor for Apache and MySQL that functions on WAN. I would greatly appreciate any advise or insight into this problem.

noci

Linux-HA assumes that the heartbeat interface is immediately connected to the other system. So interface down actualy means other node is down.
As soon as you insert a switch in between there is a problem if the Heartbeat switch goes down, both systems still continue to work (thinking the other is down) This is called a split-brain issue.

This can be solved but you have to look into a different venue. You need some 3rd system that controbutes a vote to your cluster. (RHEL/CentOS based cluster, using DLM...) then you can use pacemaker to manage the load when needed.
Here there is no assumption of that connection where you can see the other system is actualy down.
If you want a really bulltproof solution checkout OpenVMS.

ravenpl

You definitively will not do that without third system.
And BTW: how do You synchronize databases?

noci

Like i said, OpenVMS does this trick allready >25 years also long distance so nothing realy new there.

I known linux is fresh into this kind of cluster business,
A GFS shared disk might be needed to share that database, on mirrored devices over all locations.

Pyromanci

ASKER

Sorry for the late reply, became really busy here.
The databases are synced with master to master replication. the sync it's self is check via a script I wrote that runs at the end of the day to validate information between the 2.

The split brain issue actually is not a concern for me. When i have used HA in the past i've used it inconjuction with DRBD and I had it set to never switch back over if the primary node came back online. reason being was I had to let DRBD get caught up on the master and doing some validation checking on it. then I would manually tell it take over out side of business hours.

See the problem we have right now is every now or then 1 of 2 things will happen. A). The iptables on the machine become overloaded and lock up (this is due to heavy hacking attempt traffic that just overload the nic). B). Our current host provider has a issue with their network at the data center (this is not their fault it's a issue with their backbone provider and they working on resolving issues, though the cause is unknown at the moment).

So when one of those things happen. Typically I go through and do the DNS change to point the secondary server. though this could be anywhere from 10minutes to a hour after the problem has occurred.

ASKER CERTIFIED SOLUTION

noci

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Pyromanci

ASKER

Wasn't a complete solution, but pointed me the direction i needed to go.