swappedsr

DNS ISSUE ROOT HINTS NOT RESOLVING

For the longest time our internal DNS server has relied solely on root hints for external name resolution. About a week ago we started having problems: our websites, hosted by a company in another state, were not resolving. The DNS server runs on Windows 2003, by the way. So I went to each root hints server and clicked the Resolve button, which gave me a different IP address for one of the servers. I ran dcdiag /test:dns and it returned no errors. This issue was only happening with our own websites; every other website loaded fine. Strangely, when I looked at the cached lookups I would only see one A record, and we have about 5 websites. Every time I tried one of our other websites that didn't have a cached lookup record, name resolution would fail and no record would be created for the site I visited. Eventually I had to suck it up and add our ISP as a forwarder, but lookups seem slow, and I am now receiving an error while trying to resolve certain root hints servers. Thanks in advance, DNS is not my specialty.
NS-ERROR.gif
swappedsr

ASKER

Also, I noticed that the root hints IPs shown in DNS are all up to date when compared against the InterNIC site. The error in the attachment is now random; it sometimes comes up on different root hints servers when I click the Resolve button. Lastly, the IP addresses associated with the root hints servers are correct in DNS, but they aren't correct in the configuration file CACHE.DNS. Here is the content of CACHE.DNS, which says it hasn't been updated since '02, eek. Do I have to replace this file, and will the root hints listed in DNS change if I swap it with the new one?  Let me know, Bob


;   cache.dns -- DNS CACHE FILE
;
;   Initial cache data for root domain servers.
;
;   YOU SHOULD CHANGE:
;   ->  Nothing if connected to the Internet.  Edit this file only when
;       updated root name server list is released.
;           OR
;   ->  If NOT connected to the Internet, remove these records and replace
;       with NS and A records for the DNS server authoritative for the
;       root domain at your site.
;
;   Note, if you are a root domain server, for your own private intranet,
;   no cache is required, and you may edit your boot file to remove
;   it.
;

;       This file holds the information on root name servers needed to
;       initialize cache of Internet domain name servers
;       (e.g. reference this file in the "cache  .  <file>"
;       configuration file of BIND domain name servers).
;
;       This file is made available by InterNIC
;       under anonymous FTP as
;           file                /domain/named.root
;           on server           FTP.INTERNIC.NET
;
;       last update:    Nov 5, 2002
;       related version of root zone:   2002110501
;
;
; formerly NS.INTERNIC.NET
;
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
;
; formerly C.PSI.NET
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
;
; formerly TERP.UMD.EDU
;
.                        3600000      NS    D.ROOT-SERVERS.NET.
D.ROOT-SERVERS.NET.      3600000      A     128.8.10.90
;
; formerly NS.NASA.GOV
;
.                        3600000      NS    E.ROOT-SERVERS.NET.
E.ROOT-SERVERS.NET.      3600000      A     192.203.230.10
;
; formerly NS.ISC.ORG
;
.                        3600000      NS    F.ROOT-SERVERS.NET.
F.ROOT-SERVERS.NET.      3600000      A     192.5.5.241
;
; formerly NS.NIC.DDN.MIL
;
.                        3600000      NS    G.ROOT-SERVERS.NET.
G.ROOT-SERVERS.NET.      3600000      A     192.112.36.4
;
; formerly AOS.ARL.ARMY.MIL
;
.                        3600000      NS    H.ROOT-SERVERS.NET.
H.ROOT-SERVERS.NET.      3600000      A     128.63.2.53
;
; formerly NIC.NORDU.NET
;
.                        3600000      NS    I.ROOT-SERVERS.NET.
I.ROOT-SERVERS.NET.      3600000      A     192.36.148.17
;
; operated by VeriSign, Inc.
;
.                        3600000      NS    J.ROOT-SERVERS.NET.
J.ROOT-SERVERS.NET.      3600000      A     192.58.128.30
;
; housed in LINX, operated by RIPE NCC
;
.                        3600000      NS    K.ROOT-SERVERS.NET.
K.ROOT-SERVERS.NET.      3600000      A     193.0.14.129
;
; operated by IANA
;
.                        3600000      NS    L.ROOT-SERVERS.NET.
L.ROOT-SERVERS.NET.      3600000      A     198.32.64.12
;
; housed in Japan, operated by WIDE
;
.                        3600000      NS    M.ROOT-SERVERS.NET.
M.ROOT-SERVERS.NET.      3600000      A     202.12.27.33
; End of File

Hypercat (Deb)
You could try forcing the system to refresh and reload the root hints from the cache.dns file.  This article describes how to do this:

http://support.microsoft.com/kb/249868/
If I do that, it will just load the old cache.dns file. Should I replace it with the new file from ftp://ftp.rs.internic.net/domain/named.cache

Or

should I just edit the existing cache.dns file at c:\windows\system32\dns and then go on with the steps in that article?
You can download a new one but the last time I checked it was exactly the same as the old one. The process in the article I cited actually has you copy a backup copy of the cache.dns file first from another location on the hard drive into the DNS folder and then reload it.  
If I look at cache.dns, it says it was last updated in 2002, and it shows 2 IP addresses that differ from the list in the DNS Manager Root Hints tab.
Also, the article you linked just copies the file from the c:\windows\system32\dns\samples folder, and when I opened that copy it is an old version (year 2002), just like the cache.dns sitting in c:\windows\system32\dns.
Hmmm, mine are dated 3/25/2003, so it looks like somehow you do have a very old copy.  There's certainly no harm in downloading a new copy. I'd just put it in both places so that you have a backup for future reference in case the problem recurs.
Yeah, I am also running Active Directory on this DNS server, because it is my domain controller.  So, do you know if these locations are where AD looks for the root hints servers?  I'm just afraid of deleting the RootDNSServers folder, argh!  I'm also leaving in 45 minutes, so I should probably wait until tomorrow in case something breaks.
Just to point out the obvious: have you tried clearing your DNS cache? Right-click the server and choose Clear Cache. You wouldn't believe how often this fixes these issues.
Yeah, I already cleared the cache, to no avail.

Hey guys,

I'll be quite surprised if root hints ends up being the cause. But to complete the update of that...

Replace cache.dns in %systemroot%\system32\dns with the updated version from internic:

ftp://ftp.internic.com/domain/named.cache

You'll have to rename named.cache to cache.dns.

Then follow the article posted by Hypercat above (http://support.microsoft.com/kb/249868/) from step 3 onwards. You'll be using the file downloaded from Internic to replace the first steps.
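
Before swapping the file, it can help to see exactly which addresses differ between the old cache.dns and the downloaded named.cache. The following is a minimal Python sketch, not part of the original advice. It assumes both files use the named.root layout shown earlier in the thread. The "new" address for B is a documentation placeholder (192.0.2.1), not the real current value, so read the real one from the file you download.

```python
def parse_root_hints(text: str) -> dict:
    """Return {server name: IPv4 address} from named.root-style text."""
    hints = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments
        fields = line.split()
        # A records look like: NAME TTL [IN] A ADDRESS
        if len(fields) >= 4 and fields[-2].upper() == "A":
            hints[fields[0].upper()] = fields[-1]
    return hints


def diff_root_hints(old: dict, new: dict) -> dict:
    """Return {name: (old address, new address)} for every changed or added entry."""
    return {
        name: (old.get(name), new[name])
        for name in sorted(new)
        if old.get(name) != new[name]
    }


# Two entries copied from the 2002 file quoted in this thread.
OLD_TEXT = """
; formerly NS1.ISI.EDU
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
.                        3600000      NS    M.ROOT-SERVERS.NET.
M.ROOT-SERVERS.NET.      3600000      A     202.12.27.33
"""

# Hypothetical "new" file: 192.0.2.1 is a placeholder, not the real address.
NEW_TEXT = """
B.ROOT-SERVERS.NET.      3600000      A     192.0.2.1
M.ROOT-SERVERS.NET.      3600000      A     202.12.27.33
"""

changes = diff_root_hints(parse_root_hints(OLD_TEXT), parse_root_hints(NEW_TEXT))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

In practice you would read the two files from disk instead of using the inline strings. Only the A records are compared; NS lines and comments are skipped.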

For the issue itself, when you're bumping into this problem can you directly query the authoritative server from your own?

e.g.

C:\> nslookup
> set type=ns
> yourpublicdomain.com
> server <name-server-for-domain-above>
> set debug
> set type=a
> <record-failing-to-resolve>

Chris
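
The nslookup steps above boil down to "send a query straight to one server and see whether anything comes back". As a rough illustration only, here is a small Python sketch that does the same with a raw UDP packet. The function names (encode_name, build_query, probe) are made up for this example, it only reports whether a reply arrived, and it does not parse the answer.

```python
import socket
import struct

QTYPE_A = 1
QTYPE_NS = 2


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Build a non-recursive DNS query: 12-byte header plus one question."""
    # Header fields: id, flags (RD bit left clear), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def probe(server_ip: str, name: str, qtype: int = QTYPE_A, timeout: float = 2.0) -> str:
    """Send one UDP query to server_ip and report what happened."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qtype), (server_ip, 53))
            sock.recvfrom(512)
            return "answered"
        except socket.timeout:
            return "timed out"
        except OSError:
            return "unreachable"
```

Calling probe("10.185.10.15", "www.example.com") against your own server, and then against the address of the domain's name server, shows which hop stops answering. The 2 second timeout mirrors the nslookup default seen in this thread.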
That is the thing, I was having issues contacting their name server using nslookup.

Then it's not your DNS Server, it's theirs or the network between you and them.

Can you tell us the domain name you're attempting to query?

Chris
Here is the output from the steps above. This is the name server they are using for our website. I don't know why this is timing out, but I can hit the site fine, plus I still have forwarders set up. Maybe resolving through our ISP's DNS is slow and that is why it is timing out?

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\rboklewski>nslookup
Default Server:  pfwhome.psip-pfw.local
Address:  10.185.10.15

> set type=ns
> www.example.com
Server:  pfwhome.psip-pfw.local
Address:  10.185.10.15

example.com
        primary name server = charlie.goisg.net
        responsible mail addr = admin.goisg.net
        serial  = 77
        refresh = 600 (10 mins)
        retry   = 600 (10 mins)
        expire  = 86400 (1 day)
        default TTL = 3600 (1 hour)
> server charlie.goisg.net
Default Server:  charlie.goisg.net
Address:  10.161.207.64

> set debug
> set type=a
> www.example.com
Server:  charlie.goisg.net
Address:  10.161.207.64

DNS request timed out.
    timeout was 2 seconds.
timeout (2 secs)
DNS request timed out.
    timeout was 2 seconds.
timeout (2 secs)
*** Request to charlie.goisg.net timed-out
The other thing is that when I set up a forwarder to our ISP, I do see the additional records get added to the lookup cache, but only one record gets added without the forwarder.  It is strange; I've never seen this before.

Charlie isn't a DNS server, it's not responding to DNS requests at all.

Name Servers for that domain are:

goisg.net.              172800  IN      NS      grr-ns1.goisg.net.
goisg.net.              172800  IN      NS      hol-ns1.goisg.net.

Both of those two respond and give addresses for Charlie.

Using the "server" statement in NSLookup tells it to use a different DNS Server for name resolution (within NSLookup). Can you try:

nslookup
server grr-ns1.goisg.net
charlie.goisg.net

And see if that responds at all.

The idea is that we see if the network path between your name server and theirs works.

Chris
I got a response. I took the forwarders out, but there is still only one record being added to the lookup cache. Without the forwarders I can hit www.profootballweekly.com, but now I can't hit our other sites.  If I wait, though, over the course of the day that one record will sometimes change to one of our other sites, and I'm not sure why.

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\rboklewski>nslookup
Default Server:  pfwhome.psip-pfw.local
Address:  10.185.10.15

> server grr-ns1.goisg.net
Default Server:  grr-ns1.goisg.net
Address:  65.167.22.30

> charlie.goisg.net
Server:  grr-ns1.goisg.net
Address:  65.167.22.30

Name:    charlie.goisg.net
Address:  10.161.207.64





See if this helps.  
NS-ERROR-2.gif

Mind telling me a couple of the other sites?

Basically though I don't think this is your problem. Whoever is hosting the public DNS records for you is just not doing it right.

If we have a bit of a look we can see oddities. For instance, if we query name servers for profootballweekly we end up with this:

profootballweekly.com.  172800  IN      NS      grr-ns1.goisg.net.
profootballweekly.com.  172800  IN      NS      hol-ns1.goisg.net.
;; Received 128 bytes from 192.33.14.30#53(B.GTLD-SERVERS.NET) in 46 ms

www.profootballweekly.com. 3600 IN      A       63.161.207.223
profootballweekly.com.  3600    IN      NS      charlie.goisg.net.
;; Received 106 bytes from 65.167.22.30#53(grr-ns1.goisg.net) in 109 ms

Absolutely fine until that very last hop. Suddenly we find we're supposed to go to charlie.goisg.net. Bugger all chance of that happening, it advertises a private IP address (which I should have noticed earlier).

Generally this shouldn't be a huge problem because the Time To Live values are the same: any cached answer for www will expire at the same time as the Name Server record. If the Name Server record outlived the answer, your server would end up attempting to request the record from Charlie, and it would just continually fail to resolve.

That said, anything done incorrectly in DNS tends to have a knock-on effect.

Chris
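
The two oddities described above (the zone naming a server the parent delegation doesn't list, and that server having a private address) are easy to check mechanically once you have the records. Below is a minimal Python sketch using the standard ipaddress module. The function name check_delegation is made up for illustration, and the records are typed in by hand from the outputs in this thread rather than looked up live. No address for hol-ns1 appears in the thread, so it is left out.

```python
import ipaddress


def check_delegation(parent_ns, zone_ns, addresses):
    """Return a list of human-readable problems with a delegation.

    parent_ns: name servers listed in the referral from the parent (TLD) servers
    zone_ns:   name servers the zone itself returns
    addresses: {name server: IPv4 address} for whatever addresses are known
    """
    problems = []
    for name in sorted(set(zone_ns) - set(parent_ns)):
        problems.append(f"{name} is listed by the zone but not by the parent delegation")
    for name in sorted(set(zone_ns) | set(parent_ns)):
        ip = addresses.get(name)
        if ip and ipaddress.ip_address(ip).is_private:
            problems.append(f"{name} resolves to private address {ip}")
    return problems


# Values copied from the query results quoted in this thread.
parent_ns = ["grr-ns1.goisg.net", "hol-ns1.goisg.net"]
zone_ns = ["charlie.goisg.net"]
addresses = {
    "grr-ns1.goisg.net": "65.167.22.30",
    "charlie.goisg.net": "10.161.207.64",
}

for problem in check_delegation(parent_ns, zone_ns, addresses):
    print(problem)
```

With the data from this thread, both checks flag charlie.goisg.net, which matches the diagnosis above.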
Here are the other sites.

www.profootballweekly.com
dev.profootballweekly.com
www.pfwstore.com
nflblogs.profootballweekly.com

So you think it is the guys hosting our websites?  I wouldn't be a bit surprised.

The strange thing is, how come it works with forwarders?  I do know two people had issues from home, though: one had a VPN connection to where the servers are located, and one said he got a "page cannot be displayed" error while accessing our www.profootballweekly.com site.

Also, I didn't mention that we do have a vpn tunnel open to our servers in Michigan.

Further, I see the two name servers above and I know they have offices in both grand rapids MI and holland, MI if that helps.

Can you tell me the version of your DNS executable? %SystemRoot%\System32\DNS.exe. Earlier versions of MS DNS are far less well equipped to deal when things get messy.

I don't have issues resolving the names in question on version 5.2.3790.4171.

Still, all the names above suffer from the same silly configuration issues. Responsibility for that lies with the DNS host and really should be fixed regardless of the outcome here.

Chris
It is the same version you are running.  Can you tell me what command in nslookup you used to query for the name servers?

Thanks again for your help.
ASKER CERTIFIED SOLUTION
Chris Dent
I think I am only seeing charlie for the name server record, which you said has a private IP address assigned, correct? But I know you also said it isn't a name server and that the name servers actually forward to charlie. What is charlie, then? Can you shed some light on that, and on what you meant when you said that it shouldn't matter because the TTLs of the A and NS records are the same?
Never mind the above post; I'm reading the post you wrote at the same time as mine :)

It's a Domain Controller (see above, all name servers for the domain advertise the services of a Domain Controller).

My statement earlier wasn't correct, it is a DNS Server (or it runs the DNS Service), it's just the IP Address is not publicly accessible (meaning we can't see the DNS Service).

As for the TTL, because you end up with charlie cached it's no good. Your first query may well work because you manage to ask the right servers. But one successful query per hour is very poor.

Chris

hehe ditto ;)

Chris
Yeah, goisg.com hosts our websites and our websites are registered with Network Solutions, but I have no idea how ISG's back end is set up as far as DNS goes.  I can probably get the records from Network Solutions to see where they are pointing.

By the way, were you getting timeouts accessing all of these sites or just some of them?
So when I delete the cached lookup for profootballweekly, I can go to any one of our sites and it will register in the cached lookups, but only the first site I query.  Then it will create an NS record for charlie.goisg.net. But if charlie is no good, how come I can access one of the sites?
I should have been more specific: I know for sure my DNS doesn't cache anything other than charlie.goisg.net, so how could it find the right path if that NS record is wrong?
You are better than good, nice to have someone like you around.  I have learned a ton already about dns outside my lan from this post.

Glad I could help out :)

I should cover the last couple of questions though.

> By the way you were getting timeouts accessing these sites or some of them?

All of them, although because of the cached correct response it didn't exhibit for an hour.

> Then, it will create an ns record for charlie.goisg.net, but if charlie is no good
> how come I can access one of the sites?

Because your first query succeeds and the answer is cached, you have that long (the TTL on the record) before it all falls apart.

It's after that, or finding anything else under the same domain that causes problems for us.

But then, because the Time To Live (TTL) is only 1 hour your server soon forgets about it and you begin all over again.

Chris
I think I get it. So somehow, the first query right after the cache is wiped will yield the correct result, even though the correct name server doesn't get registered on my DNS server, but charlie does. Anything after that initial query will not work, because charlie is assigned a private IP address (10.161.207.64). Is that correct? I think you are absolutely right, because after an hour it starts over again and I sometimes see that the one A record changes to one of our other sites.

Lastly, I guess we just can't figure out why our forwarders are working, other than that they somehow manage to find the right route.

You are good :)  Let me know if I'm right in my thinking.

We've got quite a lengthy process here. I'm trying to think of how best to describe it, hopefully this is adequate.

Full Name resolution via Root:

In this instance the server has nothing cached.

1. Client requests name from server. e.g. www.profootballweekly.com
2. Server begins iterative query against root servers
 - Queries Root (responsible for "."). e.g. a.root-servers.net
   -- Root returns referral to TLD (Top Level Domain) servers
 - Queries TLD (responsible for .com, etc) e.g. a.gtld-servers.net.
   -- TLD returns referral to Authoritative plus Glue (A Records for authoritative Name Servers)
 - Queries Authoritative Servers
   -- Authoritative Servers return response and associated NS Records
     --> e.g. charlie.goisg.net
3. Server caches responses (based on TTL, 3600 seconds / 1 hour in this case)
4. Server returns response to Client
5. Client is happy

Name Resolution via Cache:

A simple query for a cached record.

1. Client requests name from server. e.g. www.profootballweekly.com
2. Server checks cache and finds response
3. Server returns response to Client
4. Client is happy

Name Resolution via Cached NS Record:

A query for a record within a cached domain (but not a cached record)

1. Client requests name from server. e.g. nflblogs.profootballweekly.com
2. Server checks cache and finds no valid response for nflblogs.
3. Server checks cache and finds Name Server record for profootballweekly.com
4. Server attempts to query Authoritative Server using cached NS Record

This is where it's falling down at the moment. It should continue with returning the response and making the client happy. Instead it's failing:

5. Server fails to contact Authoritative Server
6. Client becomes upset

You could potentially keep swapping around the two records, www resolves for an hour, nflblogs  resolves for an hour, etc etc. Neither is reliable. The only way to see which is working at a given point in time is by looking in the cache on the server.

The forwarders... it's difficult to know exactly. It would perhaps be nice to think it has a mechanism to discard NS records where the entry includes a private address. That would certainly get around all of this.

As it is I can reproduce this failure (caused by Charlie) on both MS DNS and BIND so it's nothing as simple as using a different DNS service.

If I could give you one piece of advice at this stage it would be to change Web Hosts. There's no excuse for breaking DNS, especially not if you're charging someone for a service.

Chris
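
The three cases walked through above can be reproduced with a toy model of the server's cache. This Python sketch is a deliberate simplification, not how MS DNS or BIND are actually implemented: it assumes the server always starts from grr-ns1.goisg.net when it has no cached NS record, caches the in-zone NS record (charlie) alongside each answer, and treats any server outside the delegated pair as unreachable. The address for nflblogs is a documentation placeholder, and times are plain seconds passed in by the caller.

```python
from dataclasses import dataclass, field

TTL = 3600  # seconds, the TTL seen in this thread

# What the delegated (reachable) servers would answer. The nflblogs address
# is a placeholder; only the www address appears in the thread.
ZONE_RECORDS = {
    "www.profootballweekly.com": "63.161.207.223",
    "nflblogs.profootballweekly.com": "192.0.2.10",
}
REACHABLE_SERVERS = {"grr-ns1.goisg.net", "hol-ns1.goisg.net"}
ZONE_NS = "charlie.goisg.net"  # NS record returned with every answer; private IP


@dataclass
class Resolver:
    cache: dict = field(default_factory=dict)  # key -> (value, expires_at)

    def _cached(self, key, now):
        entry = self.cache.get(key)
        if entry and entry[1] > now:
            return entry[0]
        self.cache.pop(key, None)  # expired or missing
        return None

    def resolve(self, name, zone, now):
        """Return the address for name, or None if resolution fails."""
        answer = self._cached(("A", name), now)
        if answer:
            return answer
        # Prefer a cached NS record; otherwise walk down from the root,
        # which in this model always ends at a reachable delegated server.
        server = self._cached(("NS", zone), now) or "grr-ns1.goisg.net"
        if server not in REACHABLE_SERVERS:
            return None  # query to charlie times out
        address = ZONE_RECORDS[name]
        self.cache[("A", name)] = (address, now + TTL)
        self.cache[("NS", zone)] = (ZONE_NS, now + TTL)
        return address


zone = "profootballweekly.com"
server = Resolver()
print(server.resolve("www.profootballweekly.com", zone, now=0))        # resolves via root
print(server.resolve("nflblogs.profootballweekly.com", zone, now=10))  # fails via charlie
print(server.resolve("www.profootballweekly.com", zone, now=20))       # served from cache
print(server.resolve("nflblogs.profootballweekly.com", zone, now=3601))  # cache expired
print(server.resolve("www.profootballweekly.com", zone, now=3602))     # now www fails
```

The last two calls show the "swapping around" effect: once the hour is up, whichever name is asked for first resolves, and the other fails until the cache expires again.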
Very nice explanation.  So, basically......

(say I deleted cached lookup for profootballweekly.com)
We make the first query to profootballweekly.com  
It finds an authoritative server, but that server sends back charlie.goisg.net, which is not a reachable authoritative server. Any query for one of our other sites thereafter checks the cache and finds no A record, then finds the NS record for charlie and tries to query charlie for the website, but can't, because charlie has a private IP address assigned. So, in the end, Charlie is wrong and shouldn't be the one returned in the first place, correct?  Hopefully I got it :)

By the way, they got it fixed; the ns1/ns2 name servers are finally showing :)  It was on their end.  Thanks a ton.