DNS Cache Failing on Server 2008

aiscom
I am having an issue with a Windows Server 2008 SP1 DNS server that I cannot seem to resolve. I have found relevant documentation on the Internet and on Experts Exchange but cannot relate it to my situation. I have a DNS server that resolves via root hints and struggles with caching a single domain, "wildblue.net". It appears that the NS record for the domain, as provided by their name server, persists in the DNS cache longer than the A record for that name server. Once the A record has expired, the name server named in the NS record can no longer be resolved, and a lookup on that domain returns "Server Failed". If I clear the cache, the NS record is deleted and subsequent requests pull a fresh set of records, including the name server's A record, which returns correct responses. I have tried setting MaxCacheTTL to 2 days, as recommended by a number of websites, but this has not resolved the issue. When I clear the cache, both the NS record and the A record start with 24hr TTLs, but at some point the NS record must be getting refreshed. When I looked at the cache this morning, after clearing it yesterday evening, the A record was down to a 6hr TTL and the NS record had a 23hr TTL. Left unchanged, this A record would expire and subsequent DNS requests would fail.

What is causing these NS and A records to become mismatched and allowing the A record to expire before the NS record? The wildblue.net name server is returning a TTL of 15min for the NS record and 24hrs for the A record. Is my DNS overwriting the TTL of the NS record? Is this a problem on wildblue.net's end or a problem with our DNS server?

Thanks for your help!

Commented:
The problem is definitely on their end. You cannot set the TTL of other domains; they set their own TTLs and your server simply copies them from their server. Especially if this is the only domain you are having issues with, do not touch your server; just let them know about the problem. Most probably they are on a dynamic IP, so their A records change often while the NS record stays the same. That is only normal.

Author

Commented:
I accept that the TTL of the records should be set by their name server; however, the facts seem to indicate that the TTLs are getting changed. When I request the NS record from their server, it states the TTL is 15min, but when I go to my DNS cache and look at the TTL remaining, it says 24hrs. I really don't want to change the TTL, but it seems as if something is doing that.

Commented:
Do they have their own name servers or they are hosted somewhere?

Author

Commented:
It looks like they have their own name servers. Maybe I can best describe it this way:

Querying their name server edns01.wildblue.net for the NS record for wildblue.net returns
nameserver = edns03.prod.wdc1.wildblue.net
ttl = 900 (15 mins)

Querying a gTLD server, a.gtld-servers.net, for the NS record for wildblue.net returns
nameserver = edns03.prod.wdc1.wildblue.net
ttl = 306 (5 mins 6 secs)

Querying my nameserver for the NS record for wildblue.net returns
nameserver = edns03.prod.wdc1.wildblue.net
ttl = 85524 (23 hours 45 mins 24 secs)

Now if their server is saying do not cache for more than 15 minutes why the heck is my server holding it for 23hrs and 45min?

I did follow the same procedure for the A record that edns03.prod.wdc1.wildblue.net points to and my server seems to obey that TTL correctly.
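
To make the comparison above easier to read, here is a small sketch that converts the three reported TTLs to hours, minutes, and seconds and flags any that exceed the authoritative value. The TTL numbers are the ones quoted in this thread; the `format_ttl` helper and the labels are made up for illustration.

```python
def format_ttl(seconds: int) -> str:
    """Render a TTL given in seconds as hours, minutes, and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if minutes:
        parts.append(f"{minutes} min")
    if secs or not parts:
        parts.append(f"{secs} sec")
    return " ".join(parts)


# TTLs (in seconds) for the wildblue.net NS record, as reported in this thread.
AUTHORITATIVE = "edns01.wildblue.net (authoritative)"
observed = {
    AUTHORITATIVE: 900,
    "gTLD server": 306,
    "local caching server": 85524,
}

for source, ttl in observed.items():
    # A cached TTL should never be larger than what the authoritative server hands out.
    flag = "  <-- exceeds authoritative TTL" if ttl > observed[AUTHORITATIVE] else ""
    print(f"{source}: {ttl} s = {format_ttl(ttl)}{flag}")
```

Only the local caching server's value is flagged, which is the mismatch being asked about.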

Commented:
I just ran my own test, and their NS record is indeed set to 15 minutes and the A record to 8 hours, which means they are using an unusual DNS server or misconfigured DNS settings.
Anyway, they are using nested child domains, and the name server is in a child domain, so it's only normal that the parent domain has a different TTL than the name server's own record: it is a different zone with different TTL settings.
But my server did not show a 24-hour TTL on their records; it showed exactly what is on their server.
Is it possible that you forced a 24-hour TTL for the cached domain?

Author

Commented:
That's what I can't figure out. It is a standard DNS install on the server. The only change I have made is to set MaxCacheTTL to 2 days, which, as I understand it, means no cached TTL can be longer than 2 days. I do not suspect this is the issue, because the NS record TTL on my server is getting set to 24 hours, not 48 hours. I will, however, try removing this setting and see how the TTLs in the cache change.
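
The reasoning here can be sketched in a few lines, assuming MaxCacheTTL behaves as a plain upper bound on cached TTLs (that is my understanding of the setting, not a description of how the Windows DNS service implements it). The function name `effective_cache_ttl` is hypothetical.

```python
def effective_cache_ttl(record_ttl: int, max_cache_ttl: int) -> int:
    """Assumed behaviour: the cap can shorten a TTL but never lengthen it."""
    return min(record_ttl, max_cache_ttl)


TWO_DAYS = 2 * 24 * 3600  # MaxCacheTTL of 2 days, in seconds

# NS record handed out with a 15 minute TTL: the cap should leave it at 900.
print(effective_cache_ttl(900, TWO_DAYS))
# A record handed out with a 24 hour TTL: also untouched by a 2 day cap.
print(effective_cache_ttl(86400, TWO_DAYS))
```

Under this assumption a 2-day cap cannot turn a 15-minute TTL into a 24-hour one, so the cap alone does not explain what the cache shows.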

Author

Commented:
Removing the MaxCacheTTL value had no effect on the cached record TTL. After updating and restarting the DNS service, I re-requested the wildblue.net record and the TTL is in my cache as 24 hours.

Maybe just as important: if both the NS record and the A record lasted 24 hours, they would expire at the same time and there would be no issue. For some reason the NS record is getting refreshed but the A record is not. The result is that the A record expires while the NS record is still cached. Weird.
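
Here is a toy model of that failure mode, not the actual Windows DNS cache logic. It assumes both records are cached at hour 0 with 24-hour TTLs and that the NS record alone is refreshed at hour 17, which matches the 6hr and 23hr remaining TTLs I saw the morning after clearing the cache. `CachedRecord` and `lookup_outcome` are names invented for this sketch.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class CachedRecord:
    ttl: int        # TTL in seconds at the moment the record was cached
    cached_at: int  # time in seconds when it was cached or last refreshed

    def alive(self, now: int) -> bool:
        return now < self.cached_at + self.ttl


def lookup_outcome(ns: CachedRecord, ns_address: CachedRecord, now: int) -> str:
    """Toy model of resolving a name in the domain from the cache."""
    if not ns.alive(now):
        # The delegation is gone, so everything is fetched again from the top.
        return "re-query from root hints"
    if ns_address.alive(now):
        return "ok"
    # The NS name is still cached, but the address it points to has expired.
    return "server failed"


# Both records cached at hour 0 with 24 hour TTLs.
ns = CachedRecord(ttl=DAY, cached_at=0)
ns_address = CachedRecord(ttl=DAY, cached_at=0)

# Assumption: only the NS record gets refreshed, 17 hours later.
ns.cached_at = 17 * HOUR

for hour in (18, 23, 25, 42):
    print(f"hour {hour}: {lookup_outcome(ns, ns_address, hour * HOUR)}")
```

In this model, lookups work until the A record runs out at hour 24, fail from then until the refreshed NS record finally expires at hour 41, and recover afterwards. Clearing the cache has the same effect as that final expiry.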

Author

Commented:
Even more wicked. I changed the MaxCacheTTL value down to 15 minutes, thinking I might be able to force all of the records down to the same TTL. All of the records I could find in the cache had TTLs under 15 minutes (including other domains, the root servers, everything) except the NS record for wildblue.net. I sat and watched its A records and MX records tick down from 15 minutes and expire, and then, sure enough, name resolution for that domain stopped working. So the NS record is obeying neither the name server's TTL nor the MaxCacheTTL value on the DNS server.

Author

Commented:
Still looking for a solution here. The wildblue.net NS record should definitely be set to a TTL of 15min and this Windows server is setting it to 24hrs.  

Author

Commented:
Any other possible ideas?
Have you tried disabling caching altogether for testing purposes? I've had success with this in a similar situation with one of my clients.
Also, you didn't mention whether you cleared the cache in between changing the MaxCacheTtl registry key.
Hotfix that addresses your issue, I believe. http://support.microsoft.com/kb/2508835
Commented:
BEKSUPPORT: We did disable the cache altogether, and the domain started resolving correctly. This, of course, was not a viable solution due to the increased server load. We did clear the cache after setting the MaxCacheTTL value; this did not resolve the issue. We applied the linked hotfix, but that also did not resolve the issue (despite it describing exactly the problem we were trying to fix).

After we opened a support case with MS, they recommended setting the lame delegation TTL and the cache locking percentage to 0. While this is, in my opinion, a decrease in security, it did resolve the issue. Evidently the DNS server was locking the NS records and preventing them from being refreshed when the A records expired.

       dnscmd /Config /LameDelegationTtl 0
       dnscmd /Config /CacheLockingPercent 0
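
For anyone wondering what the second setting changes, here is a rough sketch of cache locking as Microsoft describes it: a cached record cannot be overwritten until the configured percentage of its TTL has elapsed. This only models that description; it is not the server's implementation, and `overwrite_allowed` is a made-up name.

```python
def overwrite_allowed(elapsed: int, ttl: int, locking_percent: int) -> bool:
    """True once at least locking_percent of the record's TTL has elapsed."""
    return elapsed * 100 >= ttl * locking_percent


ttl = 24 * 3600      # a record cached with a 24 hour TTL
elapsed = 6 * 3600   # 6 hours after it was cached

for percent in (100, 50, 0):
    print(f"CacheLockingPercent {percent}: overwrite allowed = "
          f"{overwrite_allowed(elapsed, ttl, percent)}")
```

With the value at 0 the lock never applies, so fresh data can replace a cached record at any time, which is also why this weakens protection against cache poisoning.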

Author

Commented:
While I don't consider this a fix for the improper TTL values being cached, it did resolve our issue.
