jekautz

DNS and Failover

I need to update my public DNS records so that if my primary ISP goes down, our incoming web and FTP traffic will come in through a secondary ISP line.  I understand there is a priority for MX records, but I am uncertain about A records.

How should I go about this?

Thanks in advance.
ASKER CERTIFIED SOLUTION
Jamie McKillop
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If it is OK to distribute the requests, you can point the site to both IPs with a short TTL.
You can then also use the suggestion of having a script that monitors the IPs and dynamically updates the DNS record by removing or adding IPs.
No, that won't work in the event of a failure. You can create two A records and use round robin but in the event of a failure, half the lookups will receive the IP that is down.

JJ
I also use DNS Made Easy. Works like a champ for exactly the same situation you are describing. Failover and failback work great, within the limitations of DNS and caching.
jj,

Two A records do raise the issue of a lookup coming back with the currently bad IP, which is why I included the suggestion of a script that removes it.

You seem to be approaching the issue from the standpoint of: my server points to IP1, so how do I keep the web site and other services up when IP1 goes down?

Having the services always point to both IPs, while a script or other mechanism takes out an IP that is no longer valid, is an approach that
1: distributes your bandwidth usage across both ISPs (unless that is undesirable)
2: keeps the services as close to continuous uptime as possible.

Depending on the service and on the client or browser used to access it, a response containing two records, one of them unreachable, may at times make the service look slow rather than down.

For example, if you FTP to ftp.mydomain.com, which has two IPs, IP1 (down) and IP2:
A connection attempt is made to the first, and upon timeout IP2 is tried and succeeds.
A web browser behaves similarly: either the browser fails to reach the first IP and tries the other or, given the short TTL, the next DNS lookup may change the order in which the two IPs are presented.
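The try-the-next-address behavior described above can be sketched in Python. Whether a particular FTP client or browser actually does this depends on that client; the function name `connect_first_reachable` and the injectable `connect` parameter are illustrative assumptions, not part of any real client.

```python
import socket

def connect_first_reachable(addresses, port, timeout=5.0,
                            connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first that answers.

    `connect` defaults to socket.create_connection and can be swapped out for testing.
    """
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:  # covers timeouts, refused and unreachable hosts
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With IP1 down, the caller waits up to `timeout` seconds on the first address before the second one is tried, which is the "slow while reachable" effect.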

Set test.yourdomain.com with a TTL of 30 seconds pointing to IP1 and IP2,
then run
nslookup test.yourdomain.com
You'll see the two IPs presented in alternating order: at first IP1 is listed first, and after a few queries IP2 is listed first.
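The same check can be scripted instead of repeating nslookup by hand. This is a minimal sketch using the standard library's `socket.gethostbyname_ex`; the function name `address_orders` is made up for illustration, and a caching resolver on the local machine may hide the rotation.

```python
import socket

def address_orders(hostname, attempts=5, resolve=socket.gethostbyname_ex):
    """Resolve hostname repeatedly and return the A-record order seen each time."""
    orders = []
    for _ in range(attempts):
        # gethostbyname_ex returns (canonical name, alias list, IPv4 address list)
        _name, _aliases, addresses = resolve(hostname)
        orders.append(list(addresses))
    return orders

# Example call (placeholder hostname from the post above):
# for order in address_orders("test.yourdomain.com"):
#     print(order)
```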
"The connection attempt to the first will be made and upon timeout an attempt to IP2 will be tried and succeed"

This is incorrect. The SMTP protocol is the only protocol that works like this. With all other record types, this is how it works...

If you create two A records for the same hostname, with different IPs, then when clients query that hostname, the DNS server will alternate which IP it hands out. If one of the IPs is unavailable, every second query will get the IP that is down. That's every second query across all clients, not every second query from each client. Say we reduced the TTL to 30 seconds, as you suggest. That means every 30 seconds the client will query the DNS server. That doesn't necessarily mean the DNS server will return a different IP. In the two-IP example, if there happen to be an odd number of queries from other clients between an individual client's first and second query, that client will get the same IP back both times.
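The odd-number case can be seen in a small simulation. This sketch assumes the strict alternation described above (real DNS servers may rotate or randomize differently); the function name and client labels are made up for illustration.

```python
from itertools import cycle

def simulate_round_robin(ips, query_sequence):
    """Hand out ips in strict rotation to a stream of client queries.

    query_sequence lists client names in the order their queries arrive.
    Returns a list of (client, ip) pairs.
    """
    rotation = cycle(ips)
    return [(client, next(rotation)) for client in query_sequence]

# Client A queries, one other client (B) queries, then A queries again.
answers = simulate_round_robin(["IP1 (down)", "IP2"], ["A", "B", "A"])
a_answers = [ip for client, ip in answers if client == "A"]
print(a_answers)  # ['IP1 (down)', 'IP1 (down)']
```

With an even number of intervening queries (for example `["A", "B", "C", "A"]`), client A would get IP2 on its second lookup instead.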

No matter how you look at it, clients are going to get random errors in client applications if we use this method. The only way to ensure uptime is to create a single A record and change the IP when an outage occurs. Depending on how your infrastructure is set up, you could leverage your own script to do this, but if you are dealing with a critical application with high exposure, I highly recommend you use a commercial service like DNSMadeEasy.
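For anyone scripting the single-record approach themselves, the decision logic might look roughly like this. It is only a sketch: `choose_active_ip` and the `is_up` health-check callback are hypothetical names, the IPs are documentation placeholders, and the step that actually pushes the change to the DNS host depends entirely on the provider, so it is left out.

```python
def choose_active_ip(primary, secondary, is_up):
    """Return the IP the single A record should point to.

    Prefers the primary line, falls back to the secondary only while the
    primary fails its health check, and fails back once the primary recovers.
    """
    if is_up(primary):
        return primary
    if is_up(secondary):
        return secondary
    return primary  # both down: leave the record on the primary

# Hypothetical health state: only the secondary line is reachable.
reachable = {"198.51.100.10"}
print(choose_active_ip("192.0.2.10", "198.51.100.10", lambda ip: ip in reachable))
# 198.51.100.10
```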

JJ
JJ,
Yes, that is what happens if you do not implement a mechanism that removes the currently unavailable IP and adds it back when it becomes available again (taking care not to remove the last remaining IP, even if it is down).
It is better to have a hostname pointing to an IP that is unreachable at the moment than a hostname without any record (a hostname that does not exist).
DNS caches treat the two differently: a host record pointing to an unreachable IP expires based on the record's TTL, whereas a non-existent-hostname response is cached for a longer duration that may vary. Some references point to 15 minutes; some may use the default TTL in the zone.
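The remove-and-re-add rule, including the guard against removing the last remaining IP, can be sketched as follows. The names `records_to_publish` and `is_up` are illustrative assumptions, and the IPs are documentation placeholders; how the result is written back to the zone depends on the DNS host.

```python
def records_to_publish(candidate_ips, currently_published, is_up):
    """Return the A-record IPs to publish after a health check.

    Healthy IPs are published. If none are healthy, the currently published
    set is left unchanged so the hostname never ends up with no record at all.
    """
    healthy = [ip for ip in candidate_ips if is_up(ip)]
    return healthy if healthy else list(currently_published)

both = ["192.0.2.10", "198.51.100.10"]

# IP1 goes down: only IP2 stays in the record.
step1 = records_to_publish(both, both, lambda ip: ip == "198.51.100.10")
# IP2 then goes down as well: keep the last record rather than publish nothing.
step2 = records_to_publish(both, step1, lambda ip: False)
print(step1, step2)  # ['198.51.100.10'] ['198.51.100.10']
```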

When you have one record and it is down, you are 100% down until the replacement is added. When you have two, about 50% of users hit the bad IP; for web or FTP, the user will hit refresh, and during the transition of removing the bad IP you will have at least some functionality.
OK, so if he wants his users to experience intermittent connection drops and hit or miss access to resources, he should round robin the DNS A records. If he wants to ensure his resources remain available and stable during an outage, he should use a service like DNSMadeEasy.

I'm not sure why anyone would go to the expense of paying for a second ISP line for redundancy only to have 50% availability, when the yearly cost of the highest tier of DNSMadeEasy is less than one month's charge on a dedicated internet line. But, hey, if he wants to get his ass fired when the first outage occurs and the CEO asks this same question, have at it.

JJ
jekautz

ASKER

There is a lot of material here to cover, but I will attempt to respond to all of it.

On the subject of Round Robin.  I once employed it for a Citrix farm on a WAN and had the same problems that JJ described.  Not only would the clients cache the bad A record for too long, but the DNS servers would also hand out the bad A record.  I can see how setting a short TTL would help with this, but the DNS server will still hand out the bad record in each Round Robin cycle.

So moving on to the idea of scripting.  I can see how this would work.  You drop the A record for the ISP line that is down, and then you have a Round Robin pool with the one and only good record.  But after all is said and done, the end result wouldn't be any different from a service like DNSMadeEasy.  How many hours would I spend writing and testing the script?  DNSMadeEasy's service priced out at $30 per year.  It would cost the company far more for me to create this custom service.

Now on to my DNS records.  I am not the SOA for my domain, because the company we hired to create and maintain our website claimed they needed to be the NS to provide the level of service that we are paying for.  I never put up an argument, as I only request a change to our DNS records once every five years or so.  But I just asked the hostadmin if they provide any failover services like DNSMadeEasy, and he said they are using Windows 2008 and are not able to do so.  He tossed out a couple of ideas, like creating a subdomain and forwarding the FTP A record to another NS, but said that likely wouldn't work.

I am certain I can wrangle my DNS records away from this service provider if need be, but before I do that, is there some way I could have DNSMadeEasy answer for ftp.mydomain.com but have our current NS answer for *.mydomain.com?
jekautz

ASKER

There are three good, viable solutions presented here.  The current hostmaster has no problems giving me the records, so I will be going with JJ's solution.  Thanks to everyone who helped.