BSODx64

Very odd Microsoft DNS issue

G'day all.

This is my first posting and it's a bit long so be gentle with me :)

I encountered a very odd DNS issue the other day and am stumped as to what occurred, so I am hoping that someone here might be able to explain what happened and how to correct it. Here is our setup. We have one fibre connection and one ADSL connection. The fibre is our primary internet feed. The ADSL is for testing purposes and is provided by a different ISP. There is no bridging between the fibre and ADSL feeds.

We have two sets of DNS servers, one for our internal AD domain, the other for external DNS requests. I'll refer to these as our "private" and "public" DNS servers respectively. The public DNS servers run on a pair of Windows 2003 servers. One is located at our primary site, the other at our DR site. Both are on different networks. We also use our ISP's DNS servers as a third "public" DNS server. I'll refer to these as ns1, ns2 and ns3 respectively. NS1 is the primary, NS2 and NS3 are secondaries. NS1 will only allow transfers to NS2 and NS3. NS3 uses BIND under Linux whereas NS1 and NS2 are using the standard DNS server component that comes with Windows 2003.

The issue was this: our ISP was doing some maintenance work on our fibre feed and we were going to lose all connectivity whilst this work was done. I was requested to have a holding page up in place of our website via a webserver at our DR site. I planned to do this by simply changing the host record for our website on our public DNS server to that of the webserver at our DR site. In preparation for this I changed the TTL of the web server's host record from 2hrs to 10min three hours before the scheduled cut-off time. This should have ensured that most upstream DNS servers would have the shorter TTL version of the host record, so when the IP change took place it should have propagated out quicker.

So, about 30min before cut-off I changed the IP address and queried the secondary DNS servers for the IP of our website, and sure enough they responded with the updated IP. I did this check using our internal network as well as our ADSL network. So, all was looking good.

The engineers cut our fibre and started their work. I soon received a phone call saying our site was not accessible. Sure enough, when I checked, the holding page was not being loaded. I did some DNS checks and discovered that NS2 and NS3 were now refusing to answer DNS queries for our domain! NS2 was actually responding with "query refused" when using nslookup to do the query. NS3 just didn't return anything at all. This had me stumped as it all worked before our fibre feed was cut. Once our fibre feed was reconnected both NS2 and NS3 started answering queries for our domain again.

All three DNS servers are on different networks, so losing the fibre should not have affected NS2 or NS3. For the life of me I cannot work out why this would happen. I have checked our domain records and ns1, ns2 and ns3 are defined as our primary and secondary DNS servers for our domain. Why the secondaries would not respond/reply to DNS queries when the primary is off the air is beyond me. Does anyone have any ideas?

Have I made some fundamental design error in our config?
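
For reference, the timing of the TTL change can be sanity-checked with a little arithmetic. This is only a rough sketch: the cut-off time is a made-up placeholder, the function names are mine, and it assumes every upstream resolver honours TTLs and that the secondaries had already picked up the lowered TTL.

```python
from datetime import datetime, timedelta


def old_ttl_flushed_at(ttl_lowered_at: datetime, old_ttl: timedelta) -> datetime:
    """Latest moment a well-behaved cache can still hold a copy with the old TTL."""
    return ttl_lowered_at + old_ttl


def new_ip_everywhere_at(ip_changed_at: datetime, new_ttl: timedelta) -> datetime:
    """Latest moment a well-behaved cache can still hand out the old IP."""
    return ip_changed_at + new_ttl


# Hypothetical cut-off time; the offsets are the ones described in the post.
cutoff = datetime(2000, 1, 1, 12, 0)
ttl_lowered_at = cutoff - timedelta(hours=3)     # TTL dropped from 2hrs to 10min
ip_changed_at = cutoff - timedelta(minutes=30)   # host record pointed at the DR site

flushed = old_ttl_flushed_at(ttl_lowered_at, timedelta(hours=2))
converged = new_ip_everywhere_at(ip_changed_at, timedelta(minutes=10))

print("old TTL gone from caches by:", flushed.time())
print("new IP seen everywhere by:  ", converged.time())
print("TTL lowered early enough:", flushed <= ip_changed_at)
print("converged before cut-off:", converged <= cutoff)
```

With these offsets the two-hour copies age out an hour before the cut-off and the new IP should be everywhere twenty minutes before it, so the cache timing on its own does not look like the problem.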

Comments/suggestions/criticisms welcome, well maybe not the criticisms :)

Regards

Craig
Kaffiend

TTL = 10 minutes?

When the fibre was cut, the records expired? And the secondaries could not get an "authoritative" answer from the primary because it was "down"? So, they couldn't provide a valid answer.
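
To make the idea concrete, here is a rough sketch. It assumes the timer involved is the zone's SOA expire value (how long a secondary keeps serving its copy without reaching the primary) rather than the host record's own TTL; the thread does not confirm which value was actually changed. The function name and all the numbers are made up for illustration.

```python
def secondary_still_answers(seconds_since_last_contact: int, soa_expire: int) -> bool:
    """A secondary serves its copy of the zone only while the time since it
    last reached the primary is below the SOA expire value."""
    return seconds_since_last_contact < soa_expire


# Hypothetical outage: the primary has been unreachable for two hours.
outage_seconds = 2 * 60 * 60

# Hypothetical expire values: ten minutes versus one day.
for expire in (600, 86400):
    serving = secondary_still_answers(outage_seconds, expire)
    print(f"expire={expire}s -> secondary still answering: {serving}")
```

If the expire value really was in the region of ten minutes, a secondary would give up on the zone soon after the primary dropped off, and pick it up again once the primary came back.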






BSODx64

ASKER

Hi Kaffiend,

Thanks for your reply. I'll admit I'm not a DNS guru but I would have thought that the secondaries should have provided an authoritative answer as well. I would have thought upstream DNS servers would offer non-authoritative answers if they couldn't contact the nameservers defined for our domain. Isn't the whole purpose of secondary DNS servers to answer requests if the primary is down? Perhaps I have a DNS configuration error? Something to look into at any rate.
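
One way to look into it is to ask each nameserver directly and check the response code and the "authoritative answer" flag, rather than going through a resolver. Below is a minimal sketch using only the Python standard library. The function names are mine, it only decodes the 12-byte DNS header (not the answer records), and the addresses in the usage comment are documentation placeholders, not our real servers.

```python
import socket
import struct
from typing import Optional

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal, non-recursive query for an A record."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_header(response: bytes) -> dict:
    """Decode the flags of a DNS response header."""
    query_id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return {
        "id": query_id,
        "is_response": bool(flags & 0x8000),    # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "answers": answers,
    }


def ask(server_ip: str, name: str, timeout: float = 3.0) -> Optional[dict]:
    """Send the query over UDP; None means no usable reply (timeout or network error)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except OSError:
            return None
    return parse_header(response) if len(response) >= 12 else None


# Usage (placeholder addresses and hostname):
# for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
#     print(ip, ask(ip, "www.example.com"))
```

A healthy secondary should come back with NOERROR and the authoritative flag set. A REFUSED code or no reply at all would match what I saw from NS2 and NS3 during the outage.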

Regards

Craig

ASKER CERTIFIED SOLUTION
Kaffiend
BSODx64

ASKER

Ahh, so in effect I shot myself in the foot by setting a short TTL value. Once I changed the IP I should have set the TTL back to 2hrs or so. Ok, this didn't occur to me. Many thanks for taking the time to explain this to me.

Regards

Craig

Kaffiend

BTW, most of the outfits that host DNS zones are pretty affordable these days, and most support SRV records, too.

Let them deal with it for a few dollars a year. This way (as long as your DNS host is pretty reliable and is not under attack like register.com was about a month back) they will handle primary and secondary for you, and their hardware and bandwidth are probably better than what you can/are willing to dedicate to the task. Fewer servers to maintain/patch/worry about is always a good thing.



BSODx64

ASKER

Hi Kaffiend,

Yes, we are considering that. Our current ISP unfortunately requires all DNS changes to be e-mailed 24hrs beforehand. They are working on a client interface so that their customers can manage their own zones, but they still say it's a good 6 - 12 months away. Although we don't do a lot of DNS changes, when we do need them they tend to be for some critical issue where a 24hr turnaround is not acceptable to the company. Perhaps it's time to look at another ISP or someone else to host our DNS.

Once again thanks for all your comments and suggestions.

Craig