Very odd Microsoft DNS issue

G'day all.

This is my first posting and it's a bit long so be gentle with me :)

I encountered a very odd DNS issue the other day and am stumped as to what occurred, so I am hoping that someone here might be able to explain what happened and how to correct it. Here is our setup. We have one fibre connection and one ADSL connection. The fibre is our primary internet feed. The ADSL is for testing purposes and is provided by a different ISP. There is no bridging between the fibre and ADSL feeds.

We have two sets of DNS servers, one for our internal AD domain, the other for external DNS requests. I'll refer to these as our "private" and "public" DNS servers respectively. The public DNS servers run on a pair of Windows 2003 servers. One is located at our primary site, the other at our DR site. They are on different networks. We also use our ISP's DNS servers as a third "public" DNS server. I'll refer to these as NS1 (primary site), NS2 (DR site) and NS3 (ISP) respectively. NS1 is the primary, NS2 and NS3 are secondaries. NS1 will only allow transfers to NS2 and NS3. NS3 uses BIND under Linux whereas NS1 and NS2 are using the standard DNS server component that comes with Windows 2003.

The issue was this: our ISP was doing some maintenance work on our fibre feed and we were going to lose all connectivity whilst this work was done. I was requested to have a holding page up in place of our website via a webserver at our DR site. I planned to do this by simply changing the host record on our public DNS server for our website to that of the webserver at our DR site.

In preparation for this I changed the TTL of the web server's host record from 2hrs to 10min three hours before the scheduled cut off time. This should have ensured that most upstream DNS servers would have the shorter TTL version of the host record, so when the IP change took place it should have propagated out quicker. So, about 30min before cutoff I changed the IP address and queried the secondary DNS servers for the IP of our website and sure enough they responded with the updated IP. I did this check using our internal network as well as our ADSL network. So, all was looking good.

The engineers cut our fibre and started their work. I soon received a phone call saying our site was not accessible. Sure enough, when I checked, the holding page was not being loaded. I did some DNS checks and discovered that NS2 and NS3 were now refusing to answer DNS queries for our domain! NS2 was actually responding with "query refused" when using nslookup to do the query. NS3 just didn't return anything at all. This had me stumped as it all worked before our fibre feed was cut. Once our fibre feed was reconnected both NS2 and NS3 started answering queries for our domain again.

All three DNS servers are on different networks, so losing the fibre should not have affected NS2 or NS3. For the life of me I cannot work out why this would happen. I have checked our domain records and NS1, NS2 and NS3 are defined as our primary and secondary DNS servers for our domain. Why the secondaries would not respond/reply to DNS queries when the primary is off the air is beyond me. Does anyone have any ideas?
Have I made some fundamental design error in our config?
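
The TTL timing described above can be checked with a little arithmetic. The sketch below is only an illustration: the cutoff date is made up, the function names are hypothetical, and it assumes resolvers honour TTLs exactly, which not all of them do.

```python
from datetime import datetime, timedelta


def earliest_safe_change(ttl_lowered_at: datetime, old_ttl: timedelta) -> datetime:
    """A cache that fetched the record just before the TTL was lowered
    may keep the old (long-TTL) copy for up to old_ttl."""
    return ttl_lowered_at + old_ttl


def worst_case_stale_until(ip_changed_at: datetime, new_ttl: timedelta) -> datetime:
    """A cache that fetched the record just before the IP change
    may keep the old IP for up to new_ttl."""
    return ip_changed_at + new_ttl


# Hypothetical cutoff time; the intervals are the ones from the post.
cutoff = datetime(2024, 1, 1, 12, 0)
ttl_lowered_at = cutoff - timedelta(hours=3)     # TTL lowered 3 hours before cutoff
ip_changed_at = cutoff - timedelta(minutes=30)   # IP changed 30 minutes before cutoff
old_ttl = timedelta(hours=2)
new_ttl = timedelta(minutes=10)

safe_from = earliest_safe_change(ttl_lowered_at, old_ttl)
stale_until = worst_case_stale_until(ip_changed_at, new_ttl)

print("Old long-TTL copies gone by:", safe_from)
print("Old IP gone from caches by: ", stale_until)
print("IP change made late enough: ", ip_changed_at >= safe_from)
print("Caches clean before cutoff: ", stale_until <= cutoff)
```

Under those assumptions the TTL plan itself holds up, which suggests the failure came from somewhere other than the resolver caches.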

Comments/suggestions/criticisms welcome, well maybe not the criticisms :)

Regards

Craig
BSODx64 Asked:

Kaffiend Commented:
TTL = 10 minutes?

When the fibre was cut, the records expired? And the secondaries could not get an "authoritative" answer from the primary because it was "down"? So they couldn't provide a valid answer.






BSODx64 Author Commented:
Hi Kaffiend,

Thanks for your reply. I'll admit I'm not a DNS guru but I would have thought that the secondaries should have provided an authoritative answer as well. I would have thought upstream DNS servers would offer non-authoritative answers if they couldn't contact the nameservers defined for our domain. Isn't the whole purpose of secondary DNS servers to answer requests if the primary is down? Perhaps I have a DNS configuration error? Something to look into at any rate.

Regards

Craig

Kaffiend Commented:
They (secondary name servers) are short-lived. After a few failed attempts at a zone transfer from the (now-down) primary, the zone they host will expire.

They do serve the purpose of answering queries if the primary is unreachable, but they rely on the primary for the zone information. If that zone information is no longer valid (because of a short TTL) then they cannot perform name resolution because they no longer "know" what should be in the zone file. In this case, the nameservers were reachable from the internet; they just could not do name resolution because they no longer had a valid copy of the master zone file.
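
For what it's worth, in standard DNS (RFC 1035) the length of time a secondary keeps answering without reaching its primary is set by the EXPIRE field of the zone's SOA record, which is a separate value from the TTL on an individual host record. The thread doesn't say what the SOA timers were here, so the numbers below are invented and the names are hypothetical. Treat this as a rough sketch of the rule, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class SoaTimers:
    """Timer fields from a zone's SOA record, in seconds."""
    refresh: int  # how often the secondary checks the primary for changes
    retry: int    # how long to wait before retrying a failed check
    expire: int   # how long the secondary may serve the zone with no contact


def secondary_still_answers(timers: SoaTimers, seconds_since_last_good_refresh: int) -> bool:
    """True while the secondary's copy of the zone has not yet expired."""
    return seconds_since_last_good_refresh < timers.expire


# Hypothetical values: a two-hour outage against two different EXPIRE settings.
outage = 2 * 60 * 60
short_expire = SoaTimers(refresh=900, retry=600, expire=600)    # expire of 10 minutes
long_expire = SoaTimers(refresh=900, retry=600, expire=86400)   # expire of 1 day

print("10 minute expire, still answering:", secondary_still_answers(short_expire, outage))
print("1 day expire, still answering:    ", secondary_still_answers(long_expire, outage))
```

If the zone's EXPIRE value turns out to be shorter than the outage, that alone would account for the secondaries going quiet, so the SOA record on NS1 is worth a look.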

BSODx64 Author Commented:
Ahh, so in effect I shot myself in the foot by setting a short TTL value. Once I changed the IP I should have set the TTL back to 2hrs or so. Ok, this didn't occur to me. Many thanks for taking the time to explain this to me.

Regards

Craig
Kaffiend Commented:
BTW, most of the outfits that host DNS zones are pretty affordable these days, and most support SRV records, too.

Let them deal with it for a few dollars a year. This way (as long as your DNS host is pretty reliable and is not under attack like register.com was about a month back) they will handle primary and secondary for you, and their hardware and bandwidth are probably better than what you can/are willing to dedicate to the task. Fewer servers to maintain/patch/worry about is always a good thing.



BSODx64 Author Commented:
Hi Kaffiend,

Yes, we are considering that. Our current ISP unfortunately requires all DNS changes to be e-mailed 24hrs beforehand. They are working on a client interface so that their customers can manage their own zones, but they still say it's a good 6 - 12 months away. Although we don't do a lot of DNS changes, when we do need them they tend to be for some critical issue where a 24hr turnaround is not acceptable to the company. Perhaps it's time to look at another ISP or someone else to host our DNS.

Once again thanks for all your comments and suggestions.

Craig