Gregory Miller

DNS working but .GOV sites will not resolve intermittently

All other TLDs, as far as I can tell, resolve fine when this happens, but after a few days of normal operation the DNS server simply stops resolving any site with a .GOV TLD. The client figured out that rebooting fixes the problem when it happens. This does not help in understanding what the problem is, however. I was originally told by the client that the firewall (ISA 2000) was blocking access, but I was never able to find any logs indicating this was true.

I was finally called when it occurred after hours and I was immediately able to determine the problem was with DNS. There are no errors being logged and I can figure out no reason for this to be occurring. If I restart the DNS service, all is fine for a few days and then we are back to square one.

I have turned on DNS Debug Logging and I am going to hold my breath that this log file may capture some vital data so I can put my finger directly on the source of the problem. I am including the first resolution logged to the main site the office visits regularly as an example.

Question 1: What would be the best items to log in the debugging to be able to track the specific problem and what would I look for?

Question 2: Anyone have a specific answer to solve this problem without having to go through the debug effort?

This is an SBS2003 server. There have been no DNS changes since the original install. I will split points if necessary for good info.

Thank You in Advance for your insight...
Gregory A. Miller
a.k.a. Technodweeb


Example of a working query to "http://pacer.psc.uscourts.gov/":

16:22:22 DA0 PACKET  UDP Snd 209.97.207.48   3a1c   Q [0000       NOERROR] (6)lsmns1(4)gtwy(8)uscourts(3)gov(0)
UDP question info at 01BEA360
  Socket = 408
  Remote addr 209.97.207.48, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0500 (1280)
  Msg length = 0x0035 (53)
  Message:
    XID       0x3a1c
    Flags     0x0000
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        0
      RA        0
      Z         0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(6)lsmns1(4)gtwy(8)uscourts(3)gov(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x002a, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  1280
      TTL    0
      DLEN   0
      DATA   (none)

16:22:22 12C8 PACKET  UDP Snd 207.41.14.62    3a1c   Q [0000       NOERROR] (6)lsmns1(4)gtwy(8)uscourts(3)gov(0)
UDP question info at 01BEA360
  Socket = 408
  Remote addr 207.41.14.62, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0500 (1280)
  Msg length = 0x0035 (53)
  Message:
    XID       0x3a1c
    Flags     0x0000
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        0
      RA        0
      Z         0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(6)lsmns1(4)gtwy(8)uscourts(3)gov(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x002a, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  1280
      TTL    0
      DLEN   0
      DATA   (none)

16:22:22 12C8 PACKET  UDP Snd 207.41.18.68    0224   Q [0000       NOERROR] (5)pacer(3)psc(8)uscourts(3)gov(0)
UDP question info at 01BBAC90
  Socket = 408
  Remote addr 207.41.18.68, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0500 (1280)
  Msg length = 0x0033 (51)
  Message:
    XID       0x0224
    Flags     0x0000
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        0
      RA        0
      Z         0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(5)pacer(3)psc(8)uscourts(3)gov(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0028, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  1280
      TTL    0
      DLEN   0
      DATA   (none)

16:22:22 12C8 PACKET  UDP Snd 192.168.16.2    a7bd R Q [0084 A     NOERROR] (5)pacer(3)psc(8)uscourts(3)gov(0)
UDP response info at 007E7AD0
  Socket = 392
  Remote addr 192.168.16.2, port 44152
  Time Query=453506, Queued=0, Expire=0
  Buf length = 0x0500 (1280)
  Msg length = 0x0087 (135)
  Message:
    XID       0xa7bd
    Flags     0x8400
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        1
      TC        0
      RD        0
      RA        0
      Z         0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    1
    NSCOUNT   2
    ARCOUNT   2
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(5)pacer(3)psc(8)uscourts(3)gov(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
    Offset = 0x0028, RR count = 0
    Name      "[C00C](5)pacer(3)psc(8)uscourts(3)gov(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    3600
      DLEN   4
      DATA   207.41.15.138
    AUTHORITY SECTION:
    Offset = 0x0038, RR count = 0
    Name      "[C016](8)uscourts(3)gov(0)"
      TYPE   NS  (2)
      CLASS  1
      TTL    3600
      DLEN   14
      DATA   (6)resns1(4)gtwy[C016](8)uscourts(3)gov(0)
    Offset = 0x0052, RR count = 1
    Name      "[C016](8)uscourts(3)gov(0)"
      TYPE   NS  (2)
      CLASS  1
      TTL    3600
      DLEN   9
      DATA   (6)lsmns1[C04B](4)gtwy[C016](8)uscourts(3)gov(0)
    ADDITIONAL SECTION:
    Offset = 0x0067, RR count = 0
    Name      "[C05E](6)lsmns1[C04B](4)gtwy[C016](8)uscourts(3)gov(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    3600
      DLEN   4
      DATA   207.41.18.68
    Offset = 0x0077, RR count = 1
    Name      "[C044](6)resns1(4)gtwy[C016](8)uscourts(3)gov(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    3600
      DLEN   4
      DATA   207.41.14.62
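
On Question 1, one way to work with a debug log like the sample above is to reduce it to its one-line PACKET summaries and keep only the .gov names, so that outbound queries and the replies sent back to clients can be compared around the time of a failure. The following is a minimal Python sketch, assuming the summary lines look exactly like the ones in this sample; other Windows versions may prefix a date or lay the fields out differently. `decode_name`, `parse_packet_line`, and `gov_packets` are hypothetical helper names, not part of any Microsoft tool.

```python
import re

# Layout assumed from the sample log above:
# time, thread id, PACKET, protocol, Snd/Rcv, remote IP, XID,
# optional "R" (response), opcode letter, [flags rcode], length-prefixed name.
PACKET_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2})\s+(?P<thread>[0-9A-Fa-f]+)\s+PACKET\s+"
    r"(?P<proto>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+(?P<remote>\S+)\s+"
    r"(?P<xid>[0-9A-Fa-f]{4})\s+(?P<response>R\s+)?(?P<opcode>\w)\s+"
    r"\[(?P<flags>[^\]]*)\]\s+(?P<name>\S+)\s*$"
)


def decode_name(wire):
    """Turn '(5)pacer(3)psc(8)uscourts(3)gov(0)' into 'pacer.psc.uscourts.gov'."""
    return ".".join(re.findall(r"\(\d+\)([^()\[\]]+)", wire))


def parse_packet_line(line):
    """Return a dict for a PACKET summary line, or None for any other line."""
    m = PACKET_RE.match(line.strip())
    if m is None:
        return None
    flags = m.group("flags").split()
    return {
        "time": m.group("time"),
        "direction": m.group("direction"),
        "remote": m.group("remote"),
        "xid": m.group("xid").lower(),
        "is_response": m.group("response") is not None,
        "rcode": flags[-1] if flags else "",
        "name": decode_name(m.group("name")),
    }


def gov_packets(lines):
    """Yield parsed PACKET records whose query name ends in .gov."""
    for line in lines:
        rec = parse_packet_line(line)
        if rec and rec["name"].lower().endswith(".gov"):
            yield rec
```

Feeding the log file's lines to `gov_packets` during a failure would show whether queries for .gov names are still being sent to the outside servers and what, if anything, is sent back to 192.168.16.2 clients; the detail lines under each summary are simply skipped.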
scrathcyboy

Someone, in their infinite "unawareness", probably set up DNS rules in SBS2003 to get to com, net and org, not realizing there is a whole world out there besides those three. You should delete ALL DNS rules from SBS and the master routers for the domain and start from scratch; that is the right way to fix what someone else was too "unaware" to see. Initially, you should not disallow any sites, and until you scrap all the rules on the server, the DNS will continue to be messed up. Besides, why create ANY disallow rules? The items you want to disallow are NOT sites but elements of all sites, like ads, Flash, and that kind of stuff. We find that Adblock is better than master server rules, which are almost always ill-conceived.

ASKER

Thank you scrathcyboy...

Could you be specific about the rules you are mentioning and where they might be created so I can verify? I created the server two years ago and am the only administrator, other than a secretary who has the ability to reboot if the need arises. This problem is very intermittent, so I do not think it would be a rule anyway. But I am very curious and will check it out if you point me in the right direction.

Thanks,
-greg

ASKER

As a side note... If by rules you mean zones, there are no specific zones created except for mydomain.local and _msdcs.mydomain.local, which were both created at the time of installation. Both of these also exist on another server I was using to compare the settings with, and everything is very much the same between the two.

Thanks again,
-greg

scrathcyboy

It totally depends on your network topology. If the router is doing the IP filtering, then you must go into the router setup, which is a lot easier than SBS setup in Windows. But most people set up site filtering with Windows, just to make it more difficult on themselves, believing MS software is better. In SBS, you will find it under something like Start, Settings, Control Panel, Internet Options (but of course it is different in SBS than in a normal OS), and under Content, Privacy and Security you will find some restrictions in place. If not there, it is in the Firewall settings of SBS, which are too complicated to walk you through without it in front of me. You will have to figure this out for yourself.

Basically, if your systems and server are behind a router/firewall, which they should be, then ALL IP filtering and firewalling should be dedicated to the router, which is 1000x better at firewalling than Windows is. The problem is, people make these changes in Windows, then they can't find them again.

The bottom line is simple: if you remove the SBS server and have the workstations go to the internet relying only on the very secure router firewalling, you would NOT have this problem. So it is in the SBS setup; someone has set that up needlessly, and you need to figure out what it is and remove it, or else STOP SBS from firewalling anything. That should be left to your router; hardware beats Windows at this.

ASKER

I appreciate your input, but I think you may be missing the point. The DNS service on this specific server intermittently fails to resolve .GOV sites. This has been verified as not being a firewall or router issue. The DNS service simply is not forwarding requests, nor is it offering cached answers to requests. This is the problem.

Thank you,
-greg

scrathcyboy

Are you aware that a lot of GOV sites that were on the internet have been taken down by Bush, for fear that there might be sensitive information on them? Is this your problem, that old sites are no longer available, or are you having intermittent access to sites you can get to one day but not the next? In that case, you should check with your ISP, the group providing you internet access. It is possible they are having problems. Report this issue to them; if you have eliminated router, firewall and server, and you think your DNS is OK, that is the next logical step. Also, have you tried changing DNS servers 1 and 2 in the TCP/IP properties of the server? Why not use a different pair of DNS servers that work better? You can use ANY DNS servers on the internet that you want. Usually people use the DNS servers of their ISP, but you don't have to; you can use any. Try a .GOV DNS server for one of them.

ASKER

scrathcyboy,

Thank you for your input but your suggestions are not in line with the problem at hand.

It would appear that all .GOV requests to the local domain DNS server are timing out for some reason. This does not happen for ANY domains other than .GOV; everything else is working perfectly fine. There are no DNS-related errors in the application or system logs to speak of. I am really baffled at this time.

I did think of another test I will try the next time this occurs: flushing the DNS Server cache to see if it is just poisoned. I do not think this is the issue, though, since it affects only .GOV TLDs.

I was able to grab some additional details before restarting the DNS Server service.

=======nslookup prior to restarting=======
C:\Documents and Settings\Administrator>nslookup
Default Server:  svr001.lhlaw.local
Address:  192.168.16.2

> pacer.psc.uscourts.gov
Server:  svr001.lhlaw.local
Address:  192.168.16.2

DNS request timed out.
    timeout was 2 seconds.
*** Request to svr001.lhlaw.local timed-out
> set debug
> pacer.psc.uscourts.gov
Server:  svr001.lhlaw.local
Address:  192.168.16.2

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 5, rcode = NXDOMAIN
        header flags:  response, auth. answer, want recursion, recursion avail.
        questions = 1,  answers = 0,  authority records = 1,  additional = 0

    QUESTIONS:
        pacer.psc.uscourts.gov.lhlaw.local, type = A, class = IN
    AUTHORITY RECORDS:
    ->  lhlaw.local
        ttl = 3600 (1 hour)
        primary name server = svr001.lhlaw.local
        responsible mail addr = hostmaster
        serial  = 1868
        refresh = 900 (15 mins)
        retry   = 600 (10 mins)
        expire  = 86400 (1 day)
        default TTL = 3600 (1 hour)

------------
DNS request timed out.
    timeout was 2 seconds.
timeout (2 secs)
*** Request to svr001.lhlaw.local timed-out

> pacer.psc.uscourts.gov
Server:  svr001.lhlaw.local
Address:  192.168.16.2

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 9, rcode = NXDOMAIN
        header flags:  response, auth. answer, want recursion, recursion avail.
        questions = 1,  answers = 0,  authority records = 1,  additional = 0

    QUESTIONS:
        pacer.psc.uscourts.gov.lhlaw.local, type = A, class = IN
    AUTHORITY RECORDS:
    ->  lhlaw.local
        ttl = 3600 (1 hour)
        primary name server = svr001.lhlaw.local
        responsible mail addr = hostmaster
        serial  = 1868
        refresh = 900 (15 mins)
        retry   = 600 (10 mins)
        expire  = 86400 (1 day)
        default TTL = 3600 (1 hour)

------------
DNS request timed out.
    timeout was 2 seconds.
timeout (2 secs)
*** Request to svr001.lhlaw.local timed-out




=======After DNS Server service restart=======
> pacer.psc.uscourts.gov
Server:  svr001.lhlaw.local
Address:  192.168.16.2

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 11, rcode = NXDOMAIN
        header flags:  response, auth. answer, want recursion, recursion avail.
        questions = 1,  answers = 0,  authority records = 1,  additional = 0

    QUESTIONS:
        pacer.psc.uscourts.gov.lhlaw.local, type = A, class = IN
    AUTHORITY RECORDS:
    ->  lhlaw.local
        ttl = 3600 (1 hour)
        primary name server = svr001.lhlaw.local
        responsible mail addr = hostmaster
        serial  = 1868
        refresh = 900 (15 mins)
        retry   = 600 (10 mins)
        expire  = 86400 (1 day)
        default TTL = 3600 (1 hour)

------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 12, rcode = NOERROR
        header flags:  response, auth. answer
        questions = 1,  answers = 1,  authority records = 2,  additional = 2

    QUESTIONS:
        pacer.psc.uscourts.gov, type = A, class = IN
    ANSWERS:
    ->  pacer.psc.uscourts.gov
        internet address = 207.41.15.138
        ttl = 3600 (1 hour)
    AUTHORITY RECORDS:
    ->  uscourts.gov
        nameserver = lsmns1.gtwy.uscourts.gov
        ttl = 3600 (1 hour)
    ->  uscourts.gov
        nameserver = resns1.gtwy.uscourts.gov
        ttl = 3600 (1 hour)
    ADDITIONAL RECORDS:
    ->  lsmns1.gtwy.uscourts.gov
        internet address = 207.41.18.68
        ttl = 3600 (1 hour)
    ->  resns1.gtwy.uscourts.gov
        internet address = 207.41.14.62
        ttl = 3600 (1 hour)

------------
Name:    pacer.psc.uscourts.gov
Address:  207.41.15.138
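
Note that in the session above nslookup appended the lhlaw.local suffix, so the NXDOMAIN answers are for pacer.psc.uscourts.gov.lhlaw.local, while the lookups of the real name are the ones that timed out. To repeat the same check on a schedule without the suffix search getting in the way, the query can be sent directly. This is a minimal Python sketch using only the standard library; `build_query` and `probe` are hypothetical names, and the packet carries no EDNS OPT record, unlike the queries in the debug log.

```python
import socket
import struct


def build_query(name, xid=0x1234, recursion_desired=True):
    """Build a minimal DNS query (type A, class IN) for a fully qualified name."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit only
    header = struct.pack("!HHHHHH", xid, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name, timeout=2.0):
    """Return the numeric RCODE from the server, or None on timeout/error."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return None
    if len(reply) < 12 or reply[:2] != query[:2]:
        return None  # too short, or the transaction ID does not match
    return reply[3] & 0x0F  # low four bits of the flags word are the RCODE
```

Calling `probe("192.168.16.2", "pacer.psc.uscourts.gov")` alongside a control name under another TLD would record when the .gov lookups start returning `None` while the control still answers with RCODE 0.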
IntInc

Technodweeb,
I am having this exact same issue with uscourts.gov domains on one of our client's servers. Clearing the cache or rebooting solves the problem, but only until the next time it happens. Have you found a permanent resolution to the issue yet?

ASKER

No, I am still searching... It only happens every so often with no tracks to follow, so I try a bunch of tests when it does and then restart the DNS service and all is fine. I have not hit upon any tests that really lead me anywhere yet, but I will post here if I do.

My biggest issue is that when it is noticed, it is when a user is trying to file some time-sensitive brief with a court and cannot. This gives me a very narrow window of time to troubleshoot before I must get them online again. If I ever get an extended window, I figure I will open a ticket with Microsoft Support and let them see the issue first hand and then give suggestions to solve it or to track down the real problem.

I am not suggesting that ISA2000 is the problem, but I am going to be ditching it in about a week or so and replacing it with a hardware firewall. Everything else will remain the same. I am half hoping this gets rid of the issue, but then again, it bugs me very much that I cannot track this one down.

-greg

IntInc

Technodweeb,
Thank you for the response. To give your situation a bit of help: there is no need to reboot the server for the temporary fix. We discovered that the issue gets fixed if you clear the cache in the DNS management console. After the cache is cleared, the server immediately starts responding again to uscourts.gov requests. We have also gone as far as manually adding all 1150 subdomains to the hosts file on the affected workstations and servers as a temporary solution. This of course could entail a large amount of maintenance in the future, so we want to find a better plan. I can provide you with this list of uscourts.gov sites if you want. In your earlier posts you mentioned that the issue occurs with all .gov websites and not just the uscourts.gov sites. We have not been able to test for other .gov hosts like irs.gov or other agencies. Can you confirm whether it is uscourts.gov and its subdomains only, or does your issue affect all .gov sites?
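
For anyone considering the same hosts-file stopgap, the entries can be generated from a name-to-address list rather than typed by hand. This is a minimal Python sketch; `hosts_lines` is a hypothetical helper, and the two records in the usage note are simply the addresses that appear in the nslookup output earlier in this thread, which may change over time.

```python
import ipaddress


def hosts_lines(records, comment="temporary workaround for uscourts.gov lookups"):
    """Render a {hostname: ip} mapping as hosts-file lines, sorted by hostname."""
    lines = ["# " + comment]
    for host in sorted(records):
        ip = str(ipaddress.ip_address(records[host]))  # raises ValueError if malformed
        lines.append(ip + "\t" + host)
    return lines
```

For example, `hosts_lines({"pacer.psc.uscourts.gov": "207.41.15.138", "lsmns1.gtwy.uscourts.gov": "207.41.18.68"})` gives a comment line followed by one tab-separated entry per host, ready to append to the hosts file. Static entries like these go stale silently when a site is renumbered, which is the maintenance cost mentioned above.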

ASKER

Next time it occurs I will verify. I am not excited about managing a hosts file, but the thought has crossed my mind as I worked through this. Is the server in question on your side SBS2003? If not, what platform and OS, if you don't mind sharing? I am on a Dell Optiplex P4x1 2GB and SBS 2003, fully patched.

IntInc

Ours is an HP Server ML370 G4 (Xeon 3.2 GHz, 1 GB RAM) running 2003 Server R2, not fully patched. Thank you, I appreciate the help.
We are still attempting to find a resolution for this issue, please leave this posting up. Thank you.

ASKER

I had Microsoft on the Server a couple of weeks ago and I think they fixed it... I will post the solution in a few hours.
OK, I am not exactly certain how the DNS server was functioning at all, for any TLD resolution but here is what solved our issues with the .GOV site resolution problems...

MS CASE_ID_NUM: srx060801606963

After about two hours of watching an MS tech support person poke around the system, the problem became worse, but that was only temporary and a reboot brought it all back online again. What happened next was possibly related but possibly not. About the same time this issue began happening, the SharePoint MSDE database died due to corruption and could not start. I reinstalled the services from the original CD/DVD and all seemed fine. When MS was poking around, it was noticed that the Site Identifier was not the original number for Companyweb and SharePoint Central Administration. He made some alterations in the registry related to the SharePoint stuff, and this caused the Connect to the Internet Wizard in the Server Management console to no longer work. Now, this Wizard may have been broken for a while prior to MS altering the registry entries; I do not know. It had been a long time since I ran that Wizard. Anyway, MS gave me a reference to a KB article (which I cannot find just now) that I used to properly uninstall and then reinstall all of the SharePoint services. This was not a big issue since we do not really use it, and wiping it out and starting over was fine...

Now comes the meat and potatoes... In the DNS Management console under the Forwarders tab, I had no forwarders listed, which I was told would cause the DNS server to go to one of the root servers for resolution. I may have misunderstood the root part, but the fact that I had no forwarders listed was probably causing most of my problem. This was all happening late at night and I did not have my ISP DNS info in front of me as I was working at home, so MS gave me two IPs to use temporarily (4.2.2.1 and 4.2.2.2) but warned that they would be slow. I used these for about 36 hours just to see if the problem went away before I changed back to my proper ISP DNS entries. I had no issues with the DNS servers MS suggested.

I have subsequently changed back to my ISP DNS for forwarded resolution requests and have seen no more problems at all. Previously, if I was to look in the DNS Management console under Cached Lookups, .(root), gov, uscourts, psc it would be empty and I knew the problem was happening. Now if the psc folder exists, it has host detail info inside or the folder content has expired and the psc folder is removed. I have been testing now for two weeks with no failures...
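
The forwarder change itself was made in the DNS Management console, but it could presumably also be scripted. This is a minimal Python sketch that only builds and runs a `dnscmd /ResetForwarders` command line; it assumes `dnscmd` (from the Windows Support Tools) is installed and on the PATH, which the thread does not confirm, and `reset_forwarders_command` and `apply_forwarders` are hypothetical names.

```python
import subprocess


def reset_forwarders_command(forwarders, server=None, timeout_seconds=None):
    """Build the argument list for `dnscmd [server] /ResetForwarders ip ...`."""
    if not forwarders:
        # Guard chosen for this sketch: refuse to build a command with no forwarders.
        raise ValueError("at least one forwarder IP is required")
    cmd = ["dnscmd"]
    if server:
        cmd.append(server)
    cmd.append("/ResetForwarders")
    cmd.extend(forwarders)
    if timeout_seconds is not None:
        cmd.extend(["/TimeOut", str(timeout_seconds)])
    return cmd


def apply_forwarders(forwarders, server=None):
    """Run the command; only meaningful on a Windows DNS server with dnscmd available."""
    return subprocess.run(reset_forwarders_command(forwarders, server), check=True)
```

With the temporary addresses mentioned above, `reset_forwarders_command(["4.2.2.1", "4.2.2.2"])` produces `["dnscmd", "/ResetForwarders", "4.2.2.1", "4.2.2.2"]`; the ISP's own resolvers would be substituted once they are known.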

I have to give props to Nishant Singh of MS support and the SBS Team...

IntInc, I hope this helps you out a bit...

-greg

IntInc

Thanks Greg. I'll definitely take a look.